[mpich-discuss] resource allocation and multiple mpi_comm_spawn's
Arjen van Elteren
info at arjenvanelteren.com
Fri Jun 12 04:23:25 CDT 2015
Hello,
I'm working with an application that makes multiple MPI_Comm_spawn calls.
I'm using mpiexec on a cluster without a resource manager or job queue,
so processes are started with plain ssh and fork calls and everything is managed by MPICH.
It looks like mpiexec (both Hydra and MPD) re-reads the hostfile from the
top for every spawn and does not account for already allocated resources/used nodes.
For example I have a hostfile like this:
node01:1
node02:1
node03:1
When I run a call like this:
MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 2,   /* maxprocs: 2 workers in one spawn */
               MPI_INFO_NULL, 0,        /* root: rank 0 of MPI_COMM_SELF */
               MPI_COMM_SELF, &worker,
               MPI_ERRCODES_IGNORE);
I get an allocation like this:
node process
--------------- ------------------
node01 manager
node02 worker 1
node03 worker 2
Which is what I expected.
But when I instead make two calls like this (i.e. each worker has one
process, but there are two workers in total):
MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1,   /* first worker: 1 process */
               MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &worker1,
               MPI_ERRCODES_IGNORE);
MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1,   /* second worker: 1 process */
               MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &worker2,
               MPI_ERRCODES_IGNORE);
I get an allocation like this (both hydra and mpd):
node process
--------------- ------------------
node01 manager
node02 worker 1 + worker 2
node03 (idle)
Which is not what I expected at all!
In fact, when I try this with a more complex example, I conclude that the
hostfile is simply re-interpreted from the top for every MPI_Comm_spawn, and
spawns done earlier in the same application are not accounted for.
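To make the report self-contained, my manager boils down to roughly the
sketch below, including how I launch it (the executable name "./worker" and
the hostfile name are just placeholders):

/* manager.c -- a minimal sketch of the spawning side.
 * Launched (with Hydra) as:  mpiexec -f hostfile -n 1 ./manager
 */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm worker1, worker2;

    MPI_Init(&argc, &argv);

    /* Two separate spawns of one worker process each. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &worker1, MPI_ERRCODES_IGNORE);
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &worker2, MPI_ERRCODES_IGNORE);

    /* ... communicate with the workers over the two intercommunicators ... */

    MPI_Comm_disconnect(&worker1);
    MPI_Comm_disconnect(&worker2);
    MPI_Finalize();
    return 0;
}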
I know I could set the "host" info key in the MPI_Comm_spawn call, but then
I'm moving deployment information into my application (and I don't want to
recompile or add a command-line argument for something that should be
handled by mpiexec).
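For completeness, the workaround I mean looks roughly like this (a sketch,
pinning the second worker to node03; the node name is hard-coded here only
to illustrate why I don't like it):

MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node03");   /* deployment detail baked into the code */

MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1, info, 0,
               MPI_COMM_SELF, &worker2, MPI_ERRCODES_IGNORE);

MPI_Info_free(&info);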
Is there an option or an easy fix for this problem? (I looked at the Hydra
code, but I'm unsure how the different proxies and processes divide this
spawning work between them; I could not easily detect one "grand master"
that does the allocation...)
Kind regards,
Arjen