[mpich-discuss] Error in MPI_Finalize on a simple ring test over TCP

Wesley Bland wbland at mcs.anl.gov
Wed Jul 10 07:23:34 CDT 2013


Can you send us the smallest chunk of code that still exhibits this error?

Wesley

On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch> wrote:

> Hi all,
> 
> I get the following error when I try to run a simple application implementing a ring (each process sends to rank+1 and receives from rank-1). More precisely, the error occurs during the call to MPI_Finalize():
> 
> Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363: sc->pg_is_set
> internal ABORT - process 0
> 
> Has anybody else noticed the same error?
> 
> Here are all the details about my test:
> - The error is generated with mpich-3.0.2 (but I noticed the exact same error with mpich-3.0.4)
> - I am using IPoIB for communication between nodes (the same thing happens over Ethernet)
> - The problem comes from TCP links. When all processes are on the same node, there is no error. As soon as one process is on a remote node, the failure occurs.
> - Note also that the failure does not occur if I run a more complex code (e.g., a NAS benchmark).
> 
> Thomas Ropars
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


