<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">A ticket has been created for it: <a href="https://trac.mpich.org/projects/mpich/ticket/2277">https://trac.mpich.org/projects/mpich/ticket/2277</a><div>Thanks for your patch.</div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div>


<br><div class="gmail_quote">On Tue, Jun 16, 2015 at 7:43 AM, Arjen van Elteren <span dir="ltr"><<a href="mailto:info@arjenvanelteren.com" target="_blank">info@arjenvanelteren.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>


<br>


I've patched the hydra process manager (mpich release 3.1) to support my case (see attachment).<br>


<br>


It's a bit ugly (I have not prefixed the sort and compare function names, as these are local to the alloc.c file), but it works for me and makes resource allocation across consecutive MPI_COMM_SPAWN calls fairer.<br>


<br>


All I had to patch was 'utils/alloc/alloc.c' in the hydra source code.<br>


<br>


The new procedure for allocating proxies and assigning executables is:<br>


<br>


1. Make proxies (NEW: make a proxy for every node; do not stop when the number of processes is reached)<br>
2. (NEW) Sort the proxies by the number of active processes on their node (increasing order: the first proxy is on the node with the fewest processes)<br>
3. Allocate executables to the proxies<br>
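The three steps above can be sketched in plain C as follows. This is not the code from the attached patch or from utils/alloc/alloc.c: the struct node_load type and the cmp_load and allocate functions are hypothetical names, and the sketch assumes the number of processes already running on each node is known when a spawn request arrives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-node bookkeeping (not hydra's real data structures). */
struct node_load {
    const char *host; /* host name from the hostfile */
    int slots;        /* slot count, e.g. 1 for "node01:1" */
    int active;       /* processes already running on this node */
    int index;        /* position in the hostfile, used to break ties */
};

/* Step 2: least-loaded node first; equal loads keep hostfile order. */
static int cmp_load(const void *a, const void *b)
{
    const struct node_load *x = a, *y = b;
    if (x->active != y->active)
        return (x->active < y->active) ? -1 : 1;
    return (x->index > y->index) - (x->index < y->index);
}

/* Steps 1-3: consider every node, sort by current load, then place
 * nprocs new processes.  placement[p] receives the host of process p. */
static void allocate(struct node_load *nodes, int nnodes,
                     int nprocs, const char **placement)
{
    assert(nnodes > 0 && nprocs >= 0);
    qsort(nodes, (size_t)nnodes, sizeof *nodes, cmp_load);

    int p = 0;
    /* Fill free slots, starting with the least-loaded node. */
    for (int i = 0; i < nnodes && p < nprocs; i++) {
        while (nodes[i].active < nodes[i].slots && p < nprocs) {
            placement[p++] = nodes[i].host;
            nodes[i].active++;
        }
    }
    /* All slots taken: oversubscribe round-robin in sorted order. */
    for (int i = 0; p < nprocs; i = (i + 1) % nnodes) {
        placement[p++] = nodes[i].host;
        nodes[i].active++;
    }
}
```

With the hostfile from the earlier message (node01:1, node02:1, node03:1) and the manager already counted on node01, two consecutive one-process spawns land on node02 and then node03. The tie-break on hostfile position keeps placement deterministic when several nodes are equally loaded.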


<br>


Should I create a bug report for this? (I can't find a login button on<br>


the trac website)<br>


<br>


Kind regards,<br>


<br>


Arjen<br>


<div class="HOEnZb"><div class="h5"><br>


On 12-06-15 11:23, Arjen van Elteren wrote:<br>


> Hello,<br>


><br>


> I'm working with an application that makes multiple MPI_Comm_spawn calls.<br>


><br>


> I'm using mpiexec on a cluster without a resource manager or job queue, so everything is started with plain ssh and fork calls and managed by mpich.<br>


><br>


> It looks like mpiexec (both hydra and mpd) reuses the hostfile from the start and does not take already allocated resources or used nodes into account.<br>


><br>


> For example, I have a hostfile like this:<br>


><br>


> node01:1<br>


> node02:1<br>


> node03:1<br>
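Each line above names a host, optionally followed by a colon and a slot (process) count. A minimal C sketch of splitting one such line is shown below; parse_host_line is a hypothetical helper, and treating a line without ":slots" as a single slot is an assumption of this sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a "host:slots" line such as "node01:1".
 * Assumption: a bare host name counts as one slot.
 * Returns 0 on success, -1 on malformed input. */
static int parse_host_line(const char *line, char *host, size_t hostlen,
                           int *slots)
{
    assert(line && host && slots);
    const char *colon = strchr(line, ':');
    size_t n = colon ? (size_t)(colon - line) : strcspn(line, "\r\n");
    if (n == 0 || n >= hostlen)
        return -1;                 /* empty or too-long host name */
    memcpy(host, line, n);
    host[n] = '\0';
    *slots = 1;
    if (colon && (sscanf(colon + 1, "%d", slots) != 1 || *slots < 1))
        return -1;                 /* missing or non-positive count */
    return 0;
}
```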


><br>


> When I run a call like this:<br>


><br>


>  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 2 /* maxprocs */,<br>
>                  MPI_INFO_NULL, 0 /* root */,<br>
>                  MPI_COMM_SELF, &worker,<br>
>                  MPI_ERRCODES_IGNORE);<br>


><br>


> I get an allocation like this:<br>


><br>


> node              process<br>


> ---------------    ------------------<br>


> node01          manager<br>


> node02          worker 1<br>


> node03          worker 2<br>


><br>


> This is what I expected.<br>


><br>


> But when I instead make two calls like this (i.e. each spawn starts one worker process, so there are two workers in total):<br>


><br>


>  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1 /* maxprocs */,<br>
>                  MPI_INFO_NULL, 0 /* root */,<br>
>                  MPI_COMM_SELF, &worker,<br>
>                  MPI_ERRCODES_IGNORE);<br>
><br>
>  MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1 /* maxprocs */,<br>
>                  MPI_INFO_NULL, 0 /* root */,<br>
>                  MPI_COMM_SELF, &worker,<br>
>                  MPI_ERRCODES_IGNORE);<br>


><br>


> I get an allocation like this (both hydra and mpd):<br>


><br>


> node              process<br>


> ---------------    ------------------<br>


> node01          manager<br>


> node02          worker 1   + worker 2<br>


> node03<br>


><br>


> This is not what I expected at all!<br>


><br>


> In fact, from a more complex example I conclude that MPI_Comm_spawn simply re-interprets the hostfile for every spawn and does not account for previous allocations in the same application.<br>
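The placements in both tables are consistent with an allocator that keeps no state between spawns. The toy model below is an assumption for illustration, not hydra's or mpd's actual logic: naive_spawn is a hypothetical function that always walks the host list from the same starting node (here node02, index 1), so one two-process spawn is spread over two nodes while two one-process spawns land on the same node.

```c
#include <assert.h>

/* Toy model (assumption, not real mpiexec code): every spawn starts at
 * the same host index and ignores processes placed by earlier spawns.
 * placement[p] receives the hostfile index chosen for process p. */
static void naive_spawn(int nhosts, int start, int nprocs, int *placement)
{
    assert(nhosts > 0 && start >= 0 && start < nhosts && nprocs >= 0);
    for (int p = 0; p < nprocs; p++)
        placement[p] = (start + p) % nhosts;
}
```

Because the function has no memory of previous calls, the only way to spread consecutive spawns is to feed it the current load, which is what the sorting step in the patch does.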


><br>


> I know I could set the host in the MPI_Comm_spawn call (through the "host" info key), but then I'm moving deployment information into my application, and I don't want to recompile or add a command-line argument for something that should be handled by mpiexec.<br>


><br>


> Is there an option or easy fix for this problem? I looked at the hydra code, but I'm unsure how the different proxies and processes divide the spawning work between them; I could not easily find one "grand master" that does the allocation.<br>


><br>


> Kind regards,<br>


><br>


> Arjen<br>


><br>


><br>


> _______________________________________________<br>


> discuss mailing list     <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>


> To manage subscription options or unsubscribe:<br>


> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>


<br>


</div></div></blockquote></div><br></div>