[mpich-discuss] Run mpich on localhost

Tony Curtis anthony.curtis at stonybrook.edu
Sun May 10 15:17:32 CDT 2020



> On May 10, 2020, at 3:16 AM, Martin Ivanov <marto1980 at gmail.com> wrote:
> 
> Hello Tony, 
> Thank you very much for your reply. I am posting you the output of 'ps x', when 'mpirun -n 5 mpich-3.0.4/examples/hellow' freezes:
> 
> After that freeze, I killed mpirun with Ctrl + C:
> "
> marto at dragonfly% mpirun -n 5 mpich-3.0.4/examples/hellow   
> ^C[mpiexec at dragonfly] Sending Ctrl-C to processes as requested 
> [mpiexec at dragonfly] Press Ctrl-C again to force abort 
> [proxy:0:0 at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:71): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed 
> [proxy:0:0 at dragonfly] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event 
> [mpiexec at dragonfly] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed 
> [mpiexec at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status 
> [mpiexec at dragonfly] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event 
> [mpiexec at dragonfly] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
> "
> 
> The next launch of mpirun with 5 processes was successful. For completeness, I am attaching the output of 'ps x' taken after mpirun with 2 processes freezes, which it always does with 2 processes.
> 
> I hope this was helpful. I am looking forward to your reply.
> 

Hi,

Well, I took this as an opportunity to play with dfly again, so I set up mpich 3.0.4 there and see much the same behavior: mpirun -n 2 hangs consistently, -n 4 (= number of system cores) works reliably, and -n 5 works only sometimes.
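
For anyone who wants to repeat this kind of survey without watching the terminal, here is a minimal sketch rather than a tested tool. It assumes mpirun is on the PATH and that the example binary lives at the path from the original report; the function names, the 30-second timeout, and the repeat count are my own choices. Any run that has not exited within the timeout is counted as a hang.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s=30.0):
    """Run cmd once and return 'ok', 'failed', or 'hang'.

    'hang' only means the command did not exit within timeout_s seconds;
    the timeout value is an assumption, not something from the report.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child (mpirun) on timeout.
        # Proxy or rank processes it spawned may survive, so check 'ps x'.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


def survey(binary, counts, repeats=5, timeout_s=30.0):
    """Launch 'mpirun -n N binary' several times for each N in counts."""
    results = {}
    for n in counts:
        cmd = ["mpirun", "-n", str(n), binary]
        results[n] = [classify_run(cmd, timeout_s) for _ in range(repeats)]
    return results


if __name__ == "__main__":
    # Default path taken from the original report; override with argv[1].
    binary = sys.argv[1] if len(sys.argv) > 1 else "mpich-3.0.4/examples/hellow"
    for n, outcomes in survey(binary, [2, 4, 5]).items():
        print(f"-n {n}: {outcomes}")
```

Running each count several times matters here because the -n 5 case is intermittent, so a single launch says little either way.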

I’m not an mpich developer so I will leave further prognostication to their capable hands, but your suspicions about hwloc from the warning messages seem to be well-founded.

Tony
