[mpich-discuss] MPI_Get error with multiple threads on two nodes

Seo, Sangmin sseo at anl.gov
Tue Jul 29 20:59:36 CDT 2014


Yes. It’s working well with the mpich-dev/new-op-rma branch.

Thanks,
Sangmin

On Jul 28, 2014, at 9:28 AM, Balaji, Pavan <balaji at anl.gov> wrote:

> 
> It seems to work fine with the mpich-dev/new-op-rma branch, which will replace the mpich/master code soon.
> 
> I haven’t tested it with mpich/master.
> 
>  — Pavan
> 
> On Jul 27, 2014, at 3:03 PM, Seo, Sangmin <sseo at anl.gov> wrote:
> 
>> Hi all,
>> 
>> I encountered the following error when running the attached MPI RMA code (simplified to reproduce the error) on two nodes. The code works well on a single node (with multiple processes and multiple threads) and has no problem with a single thread per process on two nodes. I used the current MPICH master branch, built with the default configure options, and ran it on a Linux cluster (MCS Breadboard nodes 70 and 71, Ubuntu 12.04.4, gcc 4.6.3).
>> 
>> $ mpiexec -f hosts_pthread -n 2 ./rma_get_pthread 4
>> num_threads: 4
>> Assertion failed in file src/mpid/ch3/src/ch3u_rma_sync.c at line 5979: MPIU_Object_get_ref(((curr_ptr->request))) >= 0
>> internal ABORT - process 1
>> 
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 22979 RUNNING AT bb71
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> [proxy:0:0@bb70] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
>> [proxy:0:0@bb70] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
>> [proxy:0:0@bb70] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
>> [mpiexec@bb70] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
>> [mpiexec@bb70] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>> [mpiexec@bb70] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
>> [mpiexec@bb70] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
>> 
>> 
>> Could someone take a look at it and tell me what I did wrong or whether it is a bug in MPICH?
>> 
>> Thank you,
>> Sangmin
>> 
>> 
>> 
>> <rma_get_pthread.c>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 


