[mpich-discuss] Run mpich on localhost

Martin Ivanov marto1980 at gmail.com
Sun May 10 23:44:02 CDT 2020


Hello Tony, 
Thank you very much for your help. Interestingly, there are both mpich and hwloc packages for dfly. They should be working ok. 
For now, running with 5 cores is OK for me: it starts successfully about 50% of the time. 
If I manage to compile WRF, I may experiment with updating mpich. But for now I prefer to stick to the library versions recommended by the WRF compilation tutorial.

Best regards, 
Martin

Sent from my iPhone

> On 10.05.2020 at 22:17, Tony Curtis <anthony.curtis at stonybrook.edu> wrote:
> 
> 
> 
>> On May 10, 2020, at 3:16 AM, Martin Ivanov <marto1980 at gmail.com> wrote:
>> 
>> Hello Tony, 
>> Thank you very much for your reply. I am sending you the output of 'ps x', taken when 'mpirun -n 5 mpich-3.0.4/examples/hellow' freezes:
>> 
>> After that freeze, I killed mpirun with Ctrl + C:
>> "
>> marto at dragonfly% mpirun -n 5 mpich-3.0.4/examples/hellow   
>> ^C[mpiexec at dragonfly] Sending Ctrl-C to processes as requested
>> [mpiexec at dragonfly] Press Ctrl-C again to force abort 
>> [proxy:0:0 at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:71): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed 
>> [proxy:0:0 at dragonfly] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event 
>> [mpiexec at dragonfly] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed 
>> [mpiexec at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status 
>> [mpiexec at dragonfly] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event 
>> [mpiexec at dragonfly] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>> "
>> 
>> The next relaunch of mpirun with 5 cores was successful. For completeness, I am attaching the output of 'ps x' after mpirun with 2 cores freezes, which it always does with 2 cores.
>> 
>> I hope this was helpful. I am looking forward to your reply.
>> 
> 
> Hi,
> 
> Well, I took this as an opportunity to play with dfly again, so I replicated mpich 3.0.4 and see much the same behavior.  mpirun -n 2 hangs consistently, -n 4 works reliably (= number of system cores), -n 5 sometimes.
> 
> I’m not an mpich developer so I will leave further prognostication to their capable hands, but your suspicions about hwloc from the warning messages seem to be well-founded.
> 
> Tony
> 