[mpich-discuss] mpiexec.hydra and oversubscription on a TORQUE/PBS cluster

Arjen van Elteren info at arjenvanelteren.com
Wed Mar 11 09:48:46 CDT 2015


Hello,

I would like to use mpiexec.hydra on a cluster managed with TORQUE. I
have compiled mpiexec.hydra with support for PBS as both the resource
manager and the launcher. This works, and I can submit and run jobs on
the cluster.

Unfortunately, mpiexec.hydra does not respect the available nodes given
to it via the PBS_NODEFILE environment variable. Instead it happily
oversubscribes, even if I give it the -ppn option.

Apparently the torque system expects the codes to abide by the rules and
keep within their boundaries, but does not enforce this.
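
To make the expected check concrete, here is a minimal Python sketch of
a pre-launch test. It assumes the usual TORQUE convention that the file
named by PBS_NODEFILE lists one hostname per line for every allocated
slot. The function names (count_slots, check_request) are made up for
illustration and are not part of Hydra or TORQUE, and the handling of
ppn is a conservative guess at what "fits" should mean.

```python
from collections import Counter


def count_slots(nodefile_text):
    """Map each hostname to the number of slots allocated on it.

    Assumes one hostname per line, repeated once per allocated slot.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def check_request(nodefile_text, nprocs, ppn=None):
    """Raise ValueError if nprocs does not fit in the allocation.

    If ppn is given, at most ppn slots per host are counted as usable
    (an assumption made for this sketch).
    """
    slots = count_slots(nodefile_text)
    if ppn is None:
        usable = sum(slots.values())
    else:
        usable = sum(min(ppn, n) for n in slots.values())
    if nprocs > usable:
        raise ValueError(
            f"requested {nprocs} processes but only {usable} slots are usable"
        )
    return slots
```

In a job script the text would come from the file named by
os.environ["PBS_NODEFILE"], and the check would run before mpiexec.hydra
is called.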

The mpiexec normally used with PBS
(https://www.osc.edu/~djohnson/mpiexec/index.php) keeps track of the
allocation and fails if you oversubscribe, but its development seems to
have stalled.

There are cases where oversubscription is a useful option, but I would
like mpiexec to fail when it happens. (BTW, it would also be nice if it
failed when you give it a host file; this seems to go through the same
logic in the code.)

This problem seems to be located in src/pm/hydra/utils/alloc/alloc.c,
in HYDU_create_proxy_list, where the nodes are simply oversubscribed
instead of the launch failing. Maybe the processor binding logic could
also limit this, but I could not find any good reference for that.
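
The behavior in question can be illustrated with a toy Python model.
This is not the C code of HYDU_create_proxy_list; it only mimics the
effect described above, where ranks are dealt out over the node list
and the assignment wraps around once every core is in use. The strict
flag is hypothetical and stands in for the requested "fail instead of
oversubscribe" option.

```python
def assign_ranks(nodes, nprocs, strict=False):
    """Deal ranks out over nodes, filling each node's cores in turn.

    nodes  -- list of (hostname, cores) pairs
    nprocs -- number of ranks to place
    strict -- hypothetical switch: refuse to place more ranks than cores

    Returns a dict mapping hostname -> list of ranks placed there.
    """
    capacity = sum(cores for _, cores in nodes)
    if capacity <= 0:
        raise ValueError("no cores available")
    if strict and nprocs > capacity:
        raise RuntimeError(
            f"{nprocs} ranks requested, only {capacity} cores allocated"
        )

    placement = {host: [] for host, _ in nodes}
    rank = 0
    while rank < nprocs:
        # One pass over the node list; further passes oversubscribe.
        for host, cores in nodes:
            for _ in range(cores):
                if rank == nprocs:
                    break
                placement[host].append(rank)
                rank += 1
            if rank == nprocs:
                break
    return placement
```

With two nodes of two cores each and six ranks, the non-strict call
places ranks 4 and 5 back on the first node, while strict=True raises
before anything is placed.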

Am I right in assuming that the place to change this is the
HYDU_create_proxy_list function, and would it be possible to have a
compile option that disables oversubscription? (That would make system
management on this cluster happier.) I'm a bit unsure if and how this
would work for hierarchical proxies: does the launch node do all the
accounting?

Kind regards,

Arjen