[mpich-discuss] Error in MPI_Finalize on a simple ring test over TCP
Thomas Ropars
thomas.ropars at epfl.ch
Wed Jul 10 06:53:51 CDT 2013
Hi all,
I get the following error when I try to run a simple application
implementing a ring (each process sends to rank+1 and receives from
rank-1). More precisely, the error occurs during the call to MPI_Finalize():
Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363: sc->pg_is_set
internal ABORT - process 0
Has anybody else noticed the same error?
Here are all the details about my test:
- The error occurs with mpich-3.0.2 (I see the exact same error with
mpich-3.0.4).
- I am using IPoIB for communication between nodes (the same thing
happens over Ethernet).
- The problem comes from TCP links. When all processes are on the same
node, there is no error. As soon as one process is on a remote node, the
failure occurs.
- Note also that the failure does not occur if I run a more complex code
(e.g., a NAS benchmark).
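
For readers who want to try reproducing this, here is a minimal sketch of the kind of ring test described above. It is not the exact program used in this report, which is not attached. It assumes a single MPI_Sendrecv exchange of one integer per process; the helper names ring_next and ring_prev and the RING_WITH_MPI macro are hypothetical and exist only so that the neighbor arithmetic can be compiled and checked without an MPI installation.

```c
#include <stdio.h>

/* Rank this process sends to; wraps from the last rank back to 0. */
static int ring_next(int rank, int size) { return (rank + 1) % size; }

/* Rank this process receives from; wraps from rank 0 to the last rank. */
static int ring_prev(int rank, int size) { return (rank + size - 1) % size; }

#ifdef RING_WITH_MPI   /* hypothetical macro: define it when building with mpicc */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, token, received = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Send our rank to rank+1 and receive from rank-1 in one call,
       so the exchange cannot deadlock on blocking sends. */
    token = rank;
    MPI_Sendrecv(&token, 1, MPI_INT, ring_next(rank, size), 0,
                 &received, 1, MPI_INT, ring_prev(rank, size), 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, received);

    /* The assertion reported above is raised during this call. */
    MPI_Finalize();
    return 0;
}
#endif
```

Something like "mpicc -DRING_WITH_MPI ring.c -o ring" should build it. Since the failure only appears once a TCP link is involved, it needs to be launched with at least one process on a remote node.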
Thomas Ropars