[mpich-discuss] MPI_File_open fails on the cluster (2 node)
Wesley Bland
wbland at mcs.anl.gov
Tue Aug 13 10:25:13 CDT 2013
Can you update to a recent version of MPICH? 1.4.1p1 is very old at this point. The most recent stable version is 3.0.4.
Wesley
On Tuesday, August 13, 2013 at 9:43 AM, 정재용 wrote:
> I set up a 2-node cluster and installed MPICH using the Intel compiler. The OS is CentOS 6.4.
>
> Each node is connected by Ethernet. The directories where the compiler and MPICH are installed are shared via NFS.
>
> SSH connections work without a password.
>
> As a test, I compiled a simple MPI_File_open example with mpicc and ran it with mpiexec from the NFS directory.
>
> The code is
>
> #include "mpi.h"
> #include <stdio.h>
>
> int main( int argc, char *argv[] )
> {
>     MPI_Fint handleA, handleB;   /* unused in this example */
>     int rc;
>     int errs = 0;                /* unused in this example */
>     int rank;
>     MPI_File cFile;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>     rc = MPI_File_open( MPI_COMM_WORLD, "temp",
>                         MPI_MODE_RDWR | MPI_MODE_DELETE_ON_CLOSE | MPI_MODE_CREATE,
>                         MPI_INFO_NULL, &cFile );
>     if (rc) {
>         printf( "Unable to open file \"temp\"\n" );
>         fflush( stdout );
>     }
>     else {
>         MPI_File_close( &cFile );
>     }
>     MPI_Finalize();
>     return 0;
> }
>
> This example just opens the file and closes it in parallel.
>
> On a single machine, it worked well.
>
> However, it failed when processes on the two machines executed the code in parallel.
>
> The error message:
>
> Internal Error: invalid error code 209e0e (Ring ids do not match) in MPIR_Bcast_intra:1119
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1478)......: MPI_Bcast(buf=0x1deb980, count=1, MPI_CHAR, root=0, comm=0x84000004) failed
> MPIR_Bcast_impl(1321).:
> MPIR_Bcast_intra(1119):
>
> When I added the prefix nfs: to the filename, as "nfs:temp", it worked well.
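>
> For reference, a minimal sketch of that workaround: the only change is the ROMIO file-system prefix in the file name string, and the rest of the program above stays the same.
>
>     rc = MPI_File_open( MPI_COMM_WORLD, "nfs:temp",
>                         MPI_MODE_RDWR | MPI_MODE_DELETE_ON_CLOSE | MPI_MODE_CREATE,
>                         MPI_INFO_NULL, &cFile );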
>
> However, I would rather not add that prefix, because the code I need to run on the cluster is very large and hard to modify.
>
> Could you let me know what the problem is and how it can be solved?
>
> The NFS options are:
> (client)
> mount -t nfs -o noac,nfsvers=3 "server_directory" "mount_directory"
> (server, /etc/exports)
> sync,rw,no_root_squash
>
> I shared /opt and /home/username
>
> The MPICH configuration is:
> MPICH2 Version: 1.4.1p1
> MPICH2 Release date: Thu Sep 1 13:53:02 CDT 2011
> MPICH2 Device: ch3:nemesis
> MPICH2 configure: --prefix=/home/master/lib/mpich/mpich2-1.4.1p1 CC=icc F77=ifort CXX=icpc FC=ifort --enable-romio --with-file-system=nfs+ufs --with-pm=hydra
> MPICH2 CC: icc -O2
> MPICH2 CXX: icpc -O2
> MPICH2 F77: ifort -O2
> MPICH2 FC: ifort -O2
>
> ------------------------------------------------------
> Jaeyong Jeong
> Department of Mechanical Engineering
> Pohang University of Science and Technology
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
>