[mpich-discuss] Binding ranks to CPUs + Hydra-Bug

Ken Raffenetti raffenet at mcs.anl.gov
Wed Dec 18 12:49:15 CST 2013


Sigh, hit enter early on my last reply.

The comma-separated list in the user binding option specifies one CPU binding per rank. For example,

'-bind-to user:0,1,0,1,0,1' binds rank 0 to cpu 0, rank 1 to cpu 1, rank 2 to cpu 0, rank 3 to cpu 1, etc.

Does that help? In the case you describe, you will have to specify a binding for every one of your ranks.
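Since every rank needs an explicit entry, it can help to generate the list rather than type it. Below is a minimal Python sketch that expands a chosen subset of CPUs round-robin over all ranks, so rank i gets cpus[i % len(cpus)]. The function names `user_binding_arg` and `mpiexec_argv` and the program name `./a.out` are hypothetical, and round-robin is only one possible assignment policy. The script builds the command line; it does not launch anything.

```python
from itertools import cycle, islice


def user_binding_arg(num_ranks, cpus):
    """Build a '-bind-to' value of the form 'user:<cpu>,<cpu>,...'.

    Emits exactly one CPU id per rank, cycling through `cpus`,
    so rank i is assigned cpus[i % len(cpus)].
    """
    if num_ranks < 1:
        raise ValueError("need at least one rank")
    if not cpus:
        raise ValueError("need at least one CPU id")
    per_rank = islice(cycle(cpus), num_ranks)
    return "user:" + ",".join(str(cpu) for cpu in per_rank)


def mpiexec_argv(num_ranks, cpus, program):
    """Hypothetical helper: assemble (but do not run) an mpiexec command line."""
    return ["mpiexec", "-n", str(num_ranks),
            "-bind-to", user_binding_arg(num_ranks, cpus), program]


if __name__ == "__main__":
    print(user_binding_arg(6, [0, 1]))  # user:0,1,0,1,0,1
    print(" ".join(mpiexec_argv(6, [0, 1], "./a.out")))
```

With six ranks and CPUs [0, 1] this reproduces the example above: rank 0 on CPU 0, rank 1 on CPU 1, rank 2 on CPU 0, and so on.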

Ken

----- Original Message -----
> From: "Ken Raffenetti" <raffenet at mcs.anl.gov>
> To: discuss at mpich.org
> Sent: Wednesday, December 18, 2013 12:43:17 PM
> Subject: Re: [mpich-discuss] Binding ranks to CPUs + Hydra-Bug
> 
> I think the misunderstanding is in
> 
> 
> ----- Original Message -----
> > From: "Jan Bierbaum" <jan.bierbaum at tudos.org>
> > To: discuss at mpich.org
> > Sent: Wednesday, December 18, 2013 11:25:34 AM
> > Subject: [mpich-discuss] Binding ranks to CPUs + Hydra-Bug
> > 
> > Hi!
> > 
> > Can somebody please explain the CPU binding in MPICH in more detail
> > than 'mpiexec -bind-to -help' does? In particular, I'd like to know
> > how to use '-bind-to user' correctly.
> > 
> > What I want to do is manually assign each rank to a CPU. Not all
> > available CPUs will necessarily be used, and there may be more ranks
> > than CPUs. What I learned from the help text and some trial-and-error
> > testing is the following:
> > 
> > - The general syntax I need is '-bind-to user:0,1,2,...', which
> >    assigns CPU 0 to rank 0, CPU 1 to rank 1, and so on.
> > - Specifying more CPU ids than ranks does not matter; the extra ids
> >    are simply ignored.
> > - Specifying too few CPU ids leaves the remaining ranks unbound and
> >    thus free to migrate from CPU to CPU.
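The three observations above can be captured in a few lines. This Python sketch models the mapping as Jan reports it from trial and error with MPICH 3.0.4; it is an assumption, not documented Hydra semantics, and it deliberately does not model the list "extension" he describes for exactly c CPU ids. The names `effective_bindings` and `unbound_ranks` are hypothetical; `None` stands for an unbound rank.

```python
from typing import Dict, List, Optional


def effective_bindings(num_ranks: int, cpu_ids: List[int]) -> Dict[int, Optional[int]]:
    """Model of the observed '-bind-to user:...' mapping (assumption).

    Rank i is bound to cpu_ids[i]; extra ids beyond num_ranks are ignored,
    and ranks past the end of the list map to None (unbound, free to migrate).
    """
    return {rank: (cpu_ids[rank] if rank < len(cpu_ids) else None)
            for rank in range(num_ranks)}


def unbound_ranks(num_ranks: int, cpu_ids: List[int]) -> List[int]:
    """Ranks that would be left without a binding under this model."""
    return [rank for rank, cpu in effective_bindings(num_ranks, cpu_ids).items()
            if cpu is None]


if __name__ == "__main__":
    print(effective_bindings(3, [0, 1, 2, 3]))  # {0: 0, 1: 1, 2: 2}
    print(effective_bindings(4, [0, 1]))        # {0: 0, 1: 1, 2: None, 3: None}
```

A check like `unbound_ranks` is one way to catch a too-short list before launching a job.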
> > 
> > Now assume I want to run r ranks on a machine with c CPUs and
> > utilize only c/2 CPUs. This works fine as long as r <= c. Otherwise,
> > when I specify *exactly* c CPU ids, MPICH seems to "extend" the given
> > list to the remaining ranks. Is this expected and reliable behavior
> > or mere coincidence?
> > 
> > The main problem is that as soon as I try to hand '-bind-to' more
> > than c CPU ids, it triggers the following assertion failure:
> > 
> > > [mpiexec at local] control_cb
> > > (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:202):
> > > assert (!closed) failed
> > > [mpiexec at local] HYDT_dmxu_poll_wait_for_event
> > > (/home/user/mpich-3.0.4/src/pm/hydra/tools/demux/demux_poll.c:77):
> > > callback returned error status
> > > [mpiexec at local] HYD_pmci_wait_for_completion
> > > (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197):
> > > error waiting for event
> > > [mpiexec at local] main
> > > (/home/user/mpich-3.0.4/src/pm/hydra/ui/mpich/mpiexec.c:331):
> > > process manager error waiting for completion
> > 
> > As you can see from the output, I'm running MPICH 3.0.4. During the
> > 'configure' step I used '--with-pm=hydra --enable-hydra-procbind' to
> > enable CPU binding.
> > 
> > 
> > Regards, Jan
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> > 


