[mpich-discuss] Fatal error in PMPI_Barrier: A process has failed, error stack:

Tony Ladd tladd at che.ufl.edu
Wed Mar 26 15:53:13 CDT 2014


I get this error when I try to use mpich across two different nodes. My 
program works on a single node. I realize this is a common error but I 
have checked all of the issues mentioned in the FAQ and I did not find 
any further discussion in the archives.

I followed the standard installation procedure, installing to 
/global/usr/bin and /global/usr/lib. The file system /global is 
automounted on the clients, which have passwordless ssh configured.

Running either my network test code or Intel's IMB code produces a 
similar error (see attached). Both codes run on a single node under 
mpich and run across the two nodes using openmpi. Also, netpipe (using 
tcp sockets) works fine.

I tried setting LD_LIBRARY_PATH via the -genv option but that did 
not help. I must have a configuration issue but I cannot see what it is. I 
have had previous versions of mpich/mvapich working without any trouble 
but I am stuck here. I would be grateful for any hints.
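
For reference, the kind of two-node launch described above can be sketched 
as below. This is only an illustration, assuming the Hydra mpiexec that 
ships with mpich and its -genv, -hosts and -n options. The host names 
node1 and node2, the program name ./nettest, and the helper function 
build_mpiexec_command are placeholders and are not taken from the actual 
run or the attached log.

```python
import os
import shlex


def build_mpiexec_command(hosts, nprocs, program, lib_dir):
    """Return an argv list for an mpich (Hydra) mpiexec launch across hosts.

    All arguments are illustrative; nothing here is read from the real setup.
    """
    # Put lib_dir in front of any LD_LIBRARY_PATH already set locally.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    ld_path = lib_dir if not current else lib_dir + os.pathsep + current
    return [
        "mpiexec",
        "-genv", "LD_LIBRARY_PATH", ld_path,  # set the variable for every rank
        "-hosts", ",".join(hosts),            # comma-separated host list
        "-n", str(nprocs),                    # total number of processes
        program,
    ]


if __name__ == "__main__":
    # node1, node2 and ./nettest are hypothetical placeholders.
    argv = build_mpiexec_command(
        ["node1", "node2"], 2, "./nettest", "/global/usr/lib"
    )
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(argv))
```

Printing the command rather than executing it makes it easy to compare 
against what was actually typed on the failing node.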

Tony

-- 
Tony Ladd

Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005
USA

Email: tladd-"(AT)"-che.ufl.edu
Web    http://ladd.che.ufl.edu

Tel:   (352)-392-6509
FAX:   (352)-392-9514

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpich.log
Type: text/x-log
Size: 21367 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140326/a17563fc/attachment.bin>