[mpich-discuss] Error in MPI_Finalize on a simple ring test over TCP

Thomas Ropars thomas.ropars at epfl.ch
Wed Jul 10 06:53:51 CDT 2013


Hi all,

I get the following error when I try to run a simple application 
implementing a ring (each process sends to rank+1 and receives from 
rank-1). More precisely, the error occurs during the call to MPI_Finalize():

Assertion failed in file 
src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363: sc->pg_is_set
internal ABORT - process 0

Has anybody else noticed the same error?

Here are all the details about my test:
- The error is generated with mpich-3.0.2 (but I noticed the exact same 
error with mpich-3.0.4)
- I am using IPoIB for communication between nodes (the same thing 
happens over Ethernet)
- The problem comes from TCP links. When all processes are on the same 
node, there is no error. As soon as one process is on a remote node, the 
failure occurs.
- Note also that the failure does not occur if I run a more complex code 
(e.g., a NAS benchmark).
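
For reference, a minimal sketch of a ring test of the kind described 
above. It is not the program that produced the error: the use of 
MPI_Sendrecv, the single-integer token, the helper names ring_next and 
ring_prev, and the USE_MPI guard macro are all assumptions made for 
illustration.

```c
/* ring.c: sketch of a ring exchange (illustrative, not the original test). */
#include <stdio.h>

/* Rank this process sends to; wraps from size-1 back to 0. */
int ring_next(int rank, int size) { return (rank + 1) % size; }

/* Rank this process receives from; wraps from 0 back to size-1. */
int ring_prev(int rank, int size) { return (rank + size - 1) % size; }

#ifdef USE_MPI /* hypothetical macro: build with -DUSE_MPI to get the MPI part */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, token, received = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    token = rank;

    /* A combined send/receive avoids the deadlock that a cycle of
       blocking sends could cause. */
    MPI_Sendrecv(&token, 1, MPI_INT, ring_next(rank, size), 0,
                 &received, 1, MPI_INT, ring_prev(rank, size), 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, received);

    /* The reported assertion fires during this call. */
    MPI_Finalize();
    return 0;
}
#endif
```

Assuming a standard MPICH installation, this could be built with 
"mpicc -DUSE_MPI ring.c -o ring" and launched with "mpiexec -n 4 ./ring", 
placing at least one rank on a remote node to exercise the TCP path. The 
guard only exists so that the neighbor arithmetic can be compiled and 
checked without MPI.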

Thomas Ropars


