[mpich-discuss] Binding ranks to CPUs + Hydra-Bug

Jan Bierbaum jan.bierbaum at tudos.org
Wed Dec 18 11:25:34 CST 2013


Hi!

Can somebody please explain the CPU binding in MPICH in more detail than 
'mpiexec -bind-to -help' does? In particular I'd like to know how to use 
'-bind-to user' correctly.

What I want to do is manually assign each rank to a CPU. Not all
available CPUs will necessarily be used, and there may be more ranks than
CPUs. What I learned from the help text and some trial-and-error testing
is the following:

- The general syntax I need is '-bind-to user:0,1,2,...', which assigns
   CPU0 to rank 0, CPU1 to rank 1, and so on.
- Specifying more CPU ids than ranks does not matter - they are simply
   ignored.
- Specifying too few CPU ids will leave the remaining ranks unbound and
   thus free to migrate from CPU to CPU.
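
One way to check observations like these is to have every rank report the
CPUs it is allowed to run on. The following is only a minimal sketch: it
assumes Linux, and it assumes that the process manager exports the rank in
the PMI_RANK environment variable (an assumption, not something confirmed
in this message). The function name is made up for illustration.

```python
import os


def describe_binding(environ=os.environ):
    """Return a one-line report of this process's rank and allowed CPUs."""
    # PMI_RANK is assumed to be set by the process manager; "?" if it is not.
    rank = environ.get("PMI_RANK", "?")
    # sched_getaffinity(0) gives the set of CPUs the calling process
    # may be scheduled on (available on Linux only).
    cpus = sorted(os.sched_getaffinity(0))
    return f"rank {rank}: allowed CPUs {cpus}"


if __name__ == "__main__":
    print(describe_binding())
```

Saved under a hypothetical name such as show_binding.py and started with
'mpiexec -n 4 -bind-to user:0,1 python3 show_binding.py', a bound rank
should list a single CPU, while an unbound rank should list every CPU
available to it.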

Now assume I want to run r ranks on a machine with c CPUs but use only
c/2 of them. This works fine as long as r <= c. When r > c and I
specify *exactly* c CPU ids, MPICH seems to "extend" the given list to
the remaining ranks. Is this expected and reliable behavior or mere
coincidence?
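
For illustration, an id list of this kind could be generated by handing out
a chosen subset of CPUs to the ranks round-robin. This is only a sketch of
the string construction; the function name is hypothetical, and the only
MPICH-specific part is the 'user:' prefix quoted above.

```python
from itertools import cycle, islice


def bind_to_user_arg(num_ranks, cpu_ids):
    """Build a 'user:...' value with one CPU id per rank, cycling over cpu_ids."""
    if not cpu_ids:
        raise ValueError("at least one CPU id is required")
    # Repeat the chosen CPU ids until every rank has exactly one entry.
    ids = islice(cycle(cpu_ids), num_ranks)
    return "user:" + ",".join(str(i) for i in ids)


if __name__ == "__main__":
    # 4 ranks on CPUs 0 and 1 of a 4-CPU machine (r <= c).
    print(bind_to_user_arg(4, [0, 1]))
```

For 4 ranks on CPUs 0 and 1 this yields 'user:0,1,0,1'. Note that for
r > c the result contains more than c ids.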

The main problem is that as soon as I pass more than c CPU ids to
'-bind-to', it triggers the following assertion failure:

> [mpiexec@local] control_cb (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> [mpiexec@local] HYDT_dmxu_poll_wait_for_event (/home/user/mpich-3.0.4/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@local] HYD_pmci_wait_for_completion (/home/user/mpich-3.0.4/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec@local] main (/home/user/mpich-3.0.4/src/pm/hydra/ui/mpich/mpiexec.c:331): process manager error waiting for completion

As you can see from the output, I'm running MPICH 3.0.4. During the
'configure' step I used '--with-pm=hydra --enable-hydra-procbind' to
enable CPU binding.


Regards, Jan


