[mpich-discuss] MPI fail on 20 processes, but not on 10.
    Jeff Hammond
    jeff.science at gmail.com
    Mon Jan  6 11:14:30 CST 2014

Are you blasting the server (master) with messages from N clients
(slaves)?  At some point that will overwhelm the communication
buffers and fail.  Can you turn off eager using the documented
environment variable?  Rendezvous-only should be much slower, but it
should not fail.  That would let you eliminate the pathological usage
in your application.  (A minimal sketch of the many-to-one pattern
follows the quoted message below.)
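
For example, a minimal sketch assuming the control-variable naming
documented for recent MPICH (MPIR_CVAR_CH3_EAGER_MAX_SZ; check the
CVAR list for your release).  Setting the eager maximum to 0 forces
rendezvous for every message size, and -genvall already forwards the
variable to all ranks:

    # hedged: CVAR name per recent MPICH documentation; 0 disables eager
    MPIR_CVAR_CH3_EAGER_MAX_SZ=0 \
    mpiexec.hydra -genvall -f MpiConfigMachines1.txt -launcher=rsh -n 20 \
        /home/anatol-g/Grape/release_under_constr_MPI_tests_quantum/bin/linux64/rhe6/g++4.4.6/debug/mpi_rcv_any_multithread \
        100000 1000000 -1 -1 1 out
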
Jeff
On Sun, Jan 5, 2014 at 10:38 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
> Hi.
> I have created an application. This application fails on MPI error.
> Assertion failed in file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at
> line 640: pkt->type >= 0 && pkt->type < MPIDI_NEM_PKT_END
> internal ABORT - process 0
>
> Scenario:
> Master receives messages from slaves.
> Each slave sends data using MPI_Send.
> Master receives using MPI_Irecv and MPI_Recv.
>
> There are other errors in the out*.log files.
> The application doesn't fail with 10 processes, but fails with 20.
>
> Execution command:
> mpiexec.hydra -genvall -f MpiConfigMachines1.txt -launcher=rsh -n 20
> /home/anatol-g/Grape/release_under_constr_MPI_tests_quantum/bin/linux64/rhe6/g++4.4.6/debug/mpi_rcv_any_multithread
> 100000 1000000 -1 -1 1 out
>
> Please help,
>
> Regards,
> Anatoly.
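
For reference, a minimal, self-contained sketch of the many-to-one
pattern described above (the message count and payload size here are
invented for illustration, not taken from Anatoly's program).  Under
eager delivery, every slave's sends complete locally and the messages
pile up as unexpected messages at rank 0; under rendezvous, each send
waits for a matching receive, which throttles the flood:

    /* Sketch of the reported scenario: slaves flood rank 0 with
     * MPI_Send; rank 0 drains them with MPI_Irecv on MPI_ANY_SOURCE
     * followed by MPI_Wait. */
    #include <mpi.h>
    #include <stdio.h>

    #define NMSGS 1000   /* hypothetical per-slave message count */
    #define LEN   1024   /* hypothetical payload size in ints (~4 KB, eager-sized) */

    int main(int argc, char **argv)
    {
        int rank, size, i;
        static int buf[LEN];               /* zero-initialized payload */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Master: receive NMSGS messages from each of the size-1
             * slaves, matching from any source. */
            MPI_Request req;
            for (i = 0; i < NMSGS * (size - 1); i++) {
                MPI_Irecv(buf, LEN, MPI_INT, MPI_ANY_SOURCE, 0,
                          MPI_COMM_WORLD, &req);
                MPI_Wait(&req, MPI_STATUS_IGNORE);
            }
            printf("master received all messages\n");
        } else {
            /* Slave: send back-to-back eager-sized messages to rank 0. */
            for (i = 0; i < NMSGS; i++)
                MPI_Send(buf, LEN, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }
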
-- 
Jeff Hammond
jeff.science at gmail.com