[mpich-discuss] Run mpich on localhost
marto1980 at gmail.com
Sun May 10 23:44:02 CDT 2020
Thank you very much for your help. Interestingly, there are both mpich and hwloc packages for dfly. They should be working ok.
For now, running with 5 processes is OK for me: it starts successfully about 50% of the time.
If I manage to compile WRF I may experiment updating mpich. But for now I prefer to stick to the library versions recommended by the WRF compilation tutorial.
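Since the 5-process launch only starts about half of the time, a small retry wrapper might help in the meantime. This is a minimal sketch and not something I have tested on dfly. The function name run_with_retries, the 30-second timeout and the limit of 5 attempts are my own assumptions. Note that the timeout only kills the mpirun process itself, so leftover proxy processes may still need to be cleaned up by hand.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_s=30.0, max_attempts=5):
    """Run cmd; if it has not exited after timeout_s seconds, kill it and retry.

    Returns (attempt_number, return_code) for the first attempt that
    finishes in time, or (None, None) if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the direct child when the timeout expires
            # and then raises TimeoutExpired.
            done = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: no exit after {timeout_s}s, retrying",
                  file=sys.stderr)
            continue
        return attempt, done.returncode
    return None, None
```

With the command from this thread, the call would be run_with_retries(["mpirun", "-n", "5", "mpich-3.0.4/examples/hellow"]). A return value of (None, None) means every attempt hung.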
Sent from my iPhone
> On 10.05.2020 at 22:17, Tony Curtis <anthony.curtis at stonybrook.edu> wrote:
>> On May 10, 2020, at 3:16 AM, Martin Ivanov <marto1980 at gmail.com> wrote:
>> Hello Tony,
>> Thank you very much for your reply. I am sending you the output of 'ps x', taken while 'mpirun -n 5 mpich-3.0.4/examples/hellow' is frozen:
>> After that freeze, I killed mpirun with Ctrl + C:
>> marto at dragonfly% mpirun -n 5 mpich-3.0.4/examples/hellow
>> ^C[mpiexec at dragonfly] Sending Ctrl-C to processes as requested
>> [mpiexec at dragonfly] Press Ctrl-C again to force abort
>> [proxy:0:0 at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:71): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP & ~POLLERR)) failed
>> [proxy:0:0 at dragonfly] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
>> [mpiexec at dragonfly] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
>> [mpiexec at dragonfly] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [mpiexec at dragonfly] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>> [mpiexec at dragonfly] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>> The next launch of mpirun with 5 processes was successful. For completeness, I am attaching the output of 'ps x' after mpirun with 2 processes freezes, which it always does.
>> I hope this was helpful. I am looking forward to your reply.
> Well, I took this as an opportunity to play with dfly again, so I replicated mpich 3.0.4 and see much the same behavior: mpirun -n 2 hangs consistently, -n 4 (= number of system cores) works reliably, and -n 5 works sometimes.
> I’m not an mpich developer so I will leave further prognostication to their capable hands, but your suspicions about hwloc from the warning messages seem to be well-founded.