[mpich-discuss] Run mpich on localhost
Tony Curtis
anthony.curtis at stonybrook.edu
Sun May 10 15:17:32 CDT 2020
> On May 10, 2020, at 3:16 AM, Martin Ivanov <marto1980 at gmail.com> wrote:
>
> Hello Tony,
> Thank you very much for your reply. I am posting the output of 'ps x' taken while 'mpirun -n 5 mpich-3.0.4/examples/hellow' is frozen:
>
> After that freeze, I killed mpirun with Ctrl + C:
> "
> marto at dragonfly% mpirun -n 5 mpich-3.0.4/examples/hellow
> ^C[mpiexec at dragonfly] Sending Ctrl-C to processes as requested
> [mpiexec at dragonfly] Press Ctrl-C again to force abort
> [proxy:0:0 at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:71): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed
> [proxy:0:0 at dragonfly] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at dragonfly] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> [mpiexec at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at dragonfly] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec at dragonfly] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
> "
>
> The next relaunch of mpirun with 5 processes was successful. For completeness, I am attaching the output of 'ps x' taken after mpirun with 2 processes freezes, which it always does.
>
> I hope this was helpful. I am looking forward to your reply.
>
Hi,
Well, I took this as an opportunity to play with dfly again, so I replicated your mpich 3.0.4 setup and see much the same behavior: mpirun -n 2 hangs consistently, -n 4 (the number of system cores) works reliably, and -n 5 works only sometimes.
I’m not an mpich developer, so I will leave further prognostication to their capable hands, but your suspicions about hwloc, based on the warning messages, seem to be well-founded.
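In case it helps anyone else reproduce this, here is a rough sketch of how the hang rate per process count could be tallied. It is not what I actually ran; the function names (classify_run, survey), the 30-second timeout, and the repeat count are all assumptions made for illustration. Only 'mpirun -n N' and the hellow example path come from this thread.

```python
import subprocess
import sys
from collections import Counter


def classify_run(cmd, timeout_s):
    """Run cmd once and return 'ok', 'failed', or 'hang'.

    'hang' means the command did not exit within timeout_s seconds.
    The timeout is an arbitrary cutoff (an assumption), not an mpich setting.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the launcher itself on timeout; helper
        # processes it started may linger and need manual cleanup.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


def survey(program, counts, repeats=5, timeout_s=30, launcher="mpirun"):
    """Tally outcomes of `launcher -n N program` for each N in counts."""
    results = {}
    for n in counts:
        outcomes = Counter()
        for _ in range(repeats):
            outcome = classify_run([launcher, "-n", str(n), program], timeout_s)
            outcomes[outcome] += 1
        # Progress goes to stderr so stdout stays free for the caller.
        print(f"-n {n}: {dict(outcomes)}", file=sys.stderr)
        results[n] = dict(outcomes)
    return results
```

Calling survey("mpich-3.0.4/examples/hellow", [2, 4, 5]) would then give one tally per process count, which makes an intermittent case like -n 5 easier to describe than a single run.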
Tony