[mpich-discuss] resource allocation and multiple mpi_comm_spawn's

Junchao Zhang jczhang at mcs.anl.gov
Tue Jun 16 17:54:14 CDT 2015


A ticket has been created for it:
https://trac.mpich.org/projects/mpich/ticket/2277
Thanks for your patch.

--Junchao Zhang

On Tue, Jun 16, 2015 at 7:43 AM, Arjen van Elteren <info at arjenvanelteren.com
> wrote:

> Hello,
>
> I've patched the hydra process manager (MPICH release 3.1) to support
> my case (see attachment).
>
> It's a bit ugly (I have not prefixed the sort and compare function names,
> as these are local to alloc.c), but it works for me and makes resource
> allocation fairer across consecutive MPI_Comm_spawn calls.
>
> All I had to patch was 'utils/alloc/alloc.c' in the hydra source-code.
>
> The new procedure for allocating proxies and assigning executables is:
>
> 1. Create the proxies (NEW: create a proxy for every node; do not stop
> once the requested number of processes is reached).
> 2. (NEW) Sort the proxies by their node's number of active processes,
> ascending, so the first proxy is on the node with the fewest active
> processes.
> 3. Allocate executables to the proxies.
>
> Should I create a bug report for this? (I can't find a login button on
> the trac website)
>
> Kind regards,
>
> Arjen
>
> On 12-06-15 11:23, Arjen van Elteren wrote:
> > Hello,
> >
> > I'm working with an application that makes multiple MPI_Comm_spawn
> > calls.
> >
> > I'm using mpiexec on a cluster without a resource manager or job queue,
> > so plain ssh and fork calls and everything is managed by mpich.
> >
> > It looks like mpiexec (both hydra and mpd) re-use the hostfile from the
> > start (and do not look at already allocated resources/used nodes).
> >
> > For example I have a hostfile like this:
> >
> > node01:1
> > node02:1
> > node03:1
> >
> > When I run a call like this:
> >
> >  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, number_of_workers, /* = 2 */
> >                  MPI_INFO_NULL, 0, /* root is 0 for MPI_COMM_SELF */
> >                  MPI_COMM_SELF, &worker,
> >                  MPI_ERRCODES_IGNORE);
> >
> > I get an allocation like this:
> >
> > node              process
> > ---------------    ------------------
> > node01          manager
> > node02          worker 1
> > node03          worker 2
> >
> > Which is what I expected.
> >
> > But when I instead do 2 calls like this (i.e. each worker has one
> > process, but there are 2 workers):
> >
> >  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, number_of_workers, /* = 1 */
> >                  MPI_INFO_NULL, 0, /* root is 0 for MPI_COMM_SELF */
> >                  MPI_COMM_SELF, &worker,
> >                  MPI_ERRCODES_IGNORE);
> >
> >  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, number_of_workers, /* = 1 */
> >                  MPI_INFO_NULL, 0, /* root is 0 for MPI_COMM_SELF */
> >                  MPI_COMM_SELF, &worker,
> >                  MPI_ERRCODES_IGNORE);
> >
> > I get an allocation like this (both hydra and mpd):
> >
> > node              process
> > ---------------    ------------------
> > node01          manager
> > node02          worker 1   + worker 2
> > node03
> >
> > Which is not what I expected at all!
> >
> > In fact, when I try a more complex example, I conclude that
> > MPI_Comm_spawn simply re-interprets the hostfile on every spawn and
> > does not account for previous allocations in the same application.
> >
> > I know I could set the host in the MPI_Comm_spawn call (via an info
> > key), but then I'm moving deployment information into my application
> > (and I don't want to recompile or add a command-line argument for
> > something that should be handled by mpiexec).
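For reference, the workaround declined above would look roughly like the sketch below. "host" is the MPI-reserved info key for MPI_Comm_spawn, but the node name ("node03") and worker binary path ("./worker") are hypothetical, and hard-coding them is exactly the deployment detail the author wants to keep out of the application:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm worker;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Pin this spawn to a specific node via the reserved "host" key. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node03");

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1,
                   info, 0, MPI_COMM_SELF, &worker,
                   MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

Running this requires an MPI installation, a worker binary, and the named node in the hostfile; it only illustrates the shape of the per-spawn placement the author would rather have mpiexec decide.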
> >
> > Is there an option or easy fix for this problem? (I looked at the
> > hydra code, but I'm unsure how the different proxies and processes
> > divide this spawning work among themselves; I could not easily find
> > one "grand master" that does the allocation.)
> >
> > Kind regards,
> >
> > Arjen
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
>

