[mpich-discuss] MPI fail on 20 processes, but not on 10.

Wesley Bland wbland at mcs.anl.gov
Mon Jan 6 10:18:10 CST 2014


Without going through all of your code, I’d suggest making sure that you’re correctly posting sends and receives for all of the messages that you intend. Printing a short debugging message every time a message is sent or received, and then checking that the two sides match up, is usually the quickest way to find the mismatch.
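For example, something along the lines of the sketch below (this is not your mpi_rcv_any_multithread.cpp; the message count, tag, and payload are placeholders) logs every MPI_Send on the slaves and every completed MPI_Irecv round on the master, so the per-rank counts can be compared:

// Minimal sketch, not the original program: each slave sends a fixed number
// of messages with MPI_Send, the master posts one MPI_Irecv per slave per
// round and logs everything so send/receive counts can be cross-checked.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int kMsgsPerSlave = 100;   // placeholder message count
    const int kTag = 0;              // placeholder tag

    if (rank == 0) {                 // master
        std::vector<int> buf(size > 1 ? size - 1 : 0);
        for (int m = 0; m < kMsgsPerSlave; ++m) {
            std::vector<MPI_Request> reqs;
            for (int src = 1; src < size; ++src) {
                MPI_Request r;
                MPI_Irecv(&buf[src - 1], 1, MPI_INT, src, kTag,
                          MPI_COMM_WORLD, &r);
                reqs.push_back(r);
            }
            MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
            fprintf(stderr, "master: completed round %d (%d receives)\n",
                    m, (int)reqs.size());
        }
    } else {                         // slave
        for (int m = 0; m < kMsgsPerSlave; ++m) {
            fprintf(stderr, "rank %d: sending message %d\n", rank, m);
            MPI_Send(&m, 1, MPI_INT, 0, kTag, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

If the number of sends logged by each slave doesn’t match the number of receives the master completes for that rank, that’s the place to start looking.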

Wesley

On Jan 5, 2014, at 10:38 PM, Anatoly G <anatolyrishon at gmail.com> wrote:

> Hi.
> I have written an application that fails with the following MPI error:
> Assertion failed in file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at line 640: pkt->type >= 0 && pkt->type < MPIDI_NEM_PKT_END
> internal ABORT - process 0
> 
> Scenario:
> The master receives messages from the slaves.
> Each slave sends data using MPI_Send.
> The master receives using MPI_Irecv and MPI_Recv.
> 
> There are other errors in the out*.log files.
> The application doesn't fail with 10 processes, but it fails with 20.
> 
> Execution command:
> mpiexec.hydra -genvall -f MpiConfigMachines1.txt -launcher=rsh -n 20 /home/anatol-g/Grape/release_under_constr_MPI_tests_quantum/bin/linux64/rhe6/g++4.4.6/debug/mpi_rcv_any_multithread 100000 1000000 -1 -1 1 out
> 
> Please help,
> 
> Regards,
> Anatoly.
> 
> 
> <MpiConfigMachines1.txt><out_r19.log><mpi_rcv_any_multithread.cpp><out_r0.log><out_r1.log><out_r2.log><out_r3.log><out_r4.log><out_r5.log><out_r6.log><out_r7.log><out_r8.log><out_r9.log><out_r10.log><out_r11.log><out_r12.log><out_r13.log><out_r14.log><out_r15.log><out_r16.log><out_r17.log><out_r18.log>


