[mpich-discuss] Binding ranks to CPUs + Hydra-Bug

Ken Raffenetti raffenet at mcs.anl.gov
Wed Dec 18 12:43:17 CST 2013


I think the misunderstanding is in


----- Original Message -----
> From: "Jan Bierbaum" <jan.bierbaum at tudos.org>
> To: discuss at mpich.org
> Sent: Wednesday, December 18, 2013 11:25:34 AM
> Subject: [mpich-discuss] Binding ranks to CPUs + Hydra-Bug
> 
> Hi!
> 
> Can somebody please explain the CPU binding in MPICH in more detail than
> 'mpiexec -bind-to -help' does? In particular I'd like to know how to use
> '-bind-to user' correctly.
> 
> What I want to do is manually assign each rank to a CPU. Not all
> available CPUs will necessarily be used and there may be more ranks than
> CPUs. What I learned from the help text and some trial-and-error testing
> is the following:
> 
> - The general syntax I need is '-bind-to user:0,1,2,...', which assigns
>    CPU0 to rank 0, CPU1 to rank 1, and so on.
> - Specifying more CPU ids than ranks does not matter - they are simply
>    ignored.
> - Specifying too few CPU ids will leave the remaining ranks unbound and
>    thus free to migrate from CPU to CPU.
> 
> Now assume I want to run r ranks on a machine with c CPUs and utilize
> only c/2 CPUs. This works fine as long as r <= c. Otherwise, when I
> specify *exactly* c CPU ids, MPICH seems to "extend" the given list to
> the remaining ranks. Is this expected and reliable behavior or mere
> coincidence?
> 
> The main problem is that as soon as I try to hand '-bind-to' more than c
> CPU ids, it triggers the following assertion failure:
> 
> > [mpiexec at local] control_cb (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> > [mpiexec at local] HYDT_dmxu_poll_wait_for_event (/home/user/mpich-3.0.4/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> > [mpiexec at local] HYD_pmci_wait_for_completion (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> > [mpiexec at local] main (/home/user/mpich-3.0.4/src/pm/hydra/ui/mpich/mpiexec.c:331): process manager error waiting for completion
> 
> As you can see from the output I'm running MPICH 3.0.4. During the
> 'configure' step I used '--with-pm=hydra --enable-hydra-procbind' to
> enable CPU binding.
> 
> 
> Regards, Jan
> 
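
For reference, the binding rules described in the quoted message can be
written down as a small model. The Python sketch below encodes only what
the poster reports observing (rank i gets the i-th id, surplus ids are
ignored, ranks beyond the end of the list stay unbound). It is not taken
from the Hydra sources, and it deliberately does not model the "extend"
behavior or the case of more ids than CPUs, since those are exactly what
the question leaves open. The function names are hypothetical, and the
PMI_RANK environment variable is assumed to be set by Hydra for each
launched process.

```python
import os


def expected_binding(cpu_ids, num_ranks):
    """Model the rank-to-CPU mapping as reported in the quoted message.

    Rank i maps to cpu_ids[i]. Ids beyond num_ranks are ignored, and
    ranks past the end of the list map to None (unbound). This mirrors
    the poster's observations, not documented MPICH behavior.
    """
    return {
        rank: (cpu_ids[rank] if rank < len(cpu_ids) else None)
        for rank in range(num_ranks)
    }


def actual_binding():
    """Return the sorted CPUs this process may run on, or None.

    os.sched_getaffinity is only available on some platforms (Linux),
    so fall back to None elsewhere.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    # Assumption: Hydra exports PMI_RANK to every process it launches.
    rank = os.environ.get("PMI_RANK", "?")
    print(f"rank {rank}: allowed CPUs {actual_binding()}")
```

Saved under a hypothetical name such as check_binding.py, the script could
be launched with the syntax from the quoted message, for example
'mpiexec -n 4 -bind-to user:0,2,4,6 python3 check_binding.py', and each
rank's printed CPU set compared against expected_binding([0, 2, 4, 6], 4).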


