[mpich-discuss] mpiexec.hydra and oversubscription on a TORQUE/PBS cluster

Balaji, Pavan balaji at anl.gov
Wed Mar 11 11:28:08 CDT 2015


Arjen,

I'm not sure I understand exactly what you are doing.

Suppose you allocated 4 nodes, each with 4 cores --

 * If you just do "mpiexec ./foo", it should automatically pick up the number of cores allocated and run a 16-process job.

 * If you do "mpiexec -n 100 ./foo", it'll run 100 processes on 4 nodes (ranks 0-3 on node 0, ranks 4-7 on node 1, ..., ranks 12-15 on node 3, then ranks 16-19 back on node 0, and so on; see the sketch below).  But, well, you did ask mpiexec to run 100 processes, so I'm not sure it's doing anything wrong here.
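
A quick way to check the actual placement is a minimal MPI program that prints each rank's host; this uses only standard MPI calls, nothing MPICH-specific:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        /* each rank reports the node it was placed on */
        printf("rank %d of %d runs on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }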

Is this behavior not what you are seeing?  Or are you expecting a different behavior?

  -- Pavan

> On Mar 11, 2015, at 9:48 AM, Arjen van Elteren <info at arjenvanelteren.com> wrote:
> 
> Hello,
> 
> I would like to use mpiexec.hydra on a cluster managed with torque. I
> have compiled mpiexec.hydra with support for pbs as the resource manager
> and launcher. All of this works fine and I can submit and run jobs just
> fine on the cluster.
> 
> Unfortunately, mpiexec.hydra does not respect the available nodes given
> to it via the PBS_NODEFILE environment variable. Instead, it happily
> oversubscribes them even if I pass it the -ppn option.
> 
> Apparently the torque system expects the codes themselves to abide by the
> rules and stay within their boundaries, but it does not enforce this.
> 
> The normal mpiexec provided with PBS
> (https://www.osc.edu/~djohnson/mpiexec/index.php) does exactly this kind
> of accounting and will fail if you oversubscribe (although its
> development seems to have stalled).
> 
> There are cases where oversubscription is a useful option, but I would
> like mpiexec to fail when this happens. (By the way, it would also be
> nice if it failed when you give it a host file; this appears to go
> through the same logic in the code.)
> 
> This problem seems to be located in src/pm/hydra/utils/alloc/alloc.c, in
> HYDU_create_proxy_list, where the nodes are simply oversubscribed instead
> of the launch failing.  Maybe the processor binding logic could also
> limit this, but I could not find any good reference for that.
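> 
> To make the suggestion concrete, here is a rough sketch of the kind of
> check I have in mind; the function and parameter names are only
> illustrative, not the actual hydra data structures:
> 
>     #include <stdio.h>
> 
>     /* Hypothetical guard: refuse to start more processes than the
>      * allocation provides.  "core_count" stands in for whatever per-node
>      * core information hydra's node list actually carries. */
>     int check_oversubscription(int total_procs,
>                                const int *core_count, int num_nodes)
>     {
>         int available = 0;
>         for (int i = 0; i < num_nodes; i++)
>             available += core_count[i];
> 
>         if (total_procs > available) {
>             fprintf(stderr,
>                     "error: %d processes requested, only %d cores allocated\n",
>                     total_procs, available);
>             return -1;   /* the caller would abort the launch here */
>         }
>         return 0;
>     }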
> 
> Am I right in assuming that the place to change this is the
> HYDU_create_proxy_list function, and would it be possible to have a
> compile option that disables oversubscription? (That would make system
> management on this cluster happier.) I'm a bit unsure if and how this
> would work for hierarchical proxies; does the launch node do all the
> accounting?
> 
> Kind regards,
> 
> Arjen
> 

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

