[mpich-discuss] MPI_Comm_Spawn causing zombies of hydra_pmi_proxy

Silvan Brändli silvan.braendli at tuhh.de
Thu Mar 7 09:44:31 CST 2013


PS: The attached programs are a simplified version of my code; they
reproduce the zombie problem. Waiting for the 32k zombies takes a
while... but I expect the same behaviour as with my original code.

Am I missing something when finishing the called program? I just use 
MPI_Comm_disconnect and MPI_Finalize.

Best regards
Silvan

main.cpp:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
   int          myrank;
   int spawnerror;
   int value = 123;
   void *buf = &value;
   MPI_Comm child_comm;

   if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
   {
     printf("MPI_Init failed\n");
   }

   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

   char *hiargv[] = {(char *)"23", NULL}; /* argv for the spawned program */
   for(int i = 1; i <= 200000; i++)
   {
     value = i;
     printf("Main before spawn %d\n",i);
     /* The root argument is a rank in the communicator passed here;
        MPI_COMM_SELF only has rank 0, so use 0 rather than the
        MPI_COMM_WORLD rank. */
     MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
                    MPI_COMM_SELF, &child_comm, &spawnerror);
     MPI_Send(buf, 1, MPI_INT, 0, 1, child_comm); /* MPI_INT, not Fortran's MPI_INTEGER */
     MPI_Comm_disconnect(&child_comm);
   }

   MPI_Finalize();
   return 0;
}

hi.cpp:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
   MPI_Comm parent;
   MPI_Status status;
   int err;
   int value = -1;
   void* buf= &value;

   if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
   {
     printf("MPI_Init failed\n");
   }
   MPI_Comm_get_parent(&parent);
   if (parent == MPI_COMM_NULL) printf("No parent!\n");

   MPI_Recv(buf, 1, MPI_INT, 0, MPI_ANY_TAG, parent, &status); /* MPI_INT in C */
   MPI_Comm_disconnect(&parent);
   err = MPI_Finalize();
   printf("hi finalized %d %d \n",err, value);
   return 0;
}



On 07.03.2013 12:38, Silvan Brändli wrote:
> Dear all,
>
> again I have a question related to spawning processes. I understand the
> situation as follows:
>
> My program A spawns program B. Program B spawns programs C1, C2, ...,
> C10000, ...
> Program Cx terminates correctly before Cx+1 is started, but returns
> 1 to mpiexec. To handle this I use the workaround described in
> http://lists.mpich.org/pipermail/discuss/2013-February/000429.html
>
> Now it looks like every spawn starts a "hydra_pmi_proxy" process whose
> parent is mpiexec. When program Cx finishes, this "hydra_pmi_proxy"
> remains as a zombie until programs A, B and mpiexec finish. Once
> approx. 32k of those "hydra_pmi_proxy" zombies exist, I run into
> problems (too many processes or something similar).
>
> What can I do to finish "hydra_pmi_proxy" while my programs A, B and
> mpiexec are still running?
>
> I'm glad about every hint.
>
> Best regards
> Silvan
>


-- 
Dipl.-Ing. Silvan Brändli
Numerische Strukturanalyse mit Anwendungen in der Schiffstechnik (M-10)

Technische Universität Hamburg-Harburg
Schwarzenbergstraße 95c
21073 Hamburg

Tel.  : +49 (0)40 42878 - 6187
Fax.  : +49 (0)40 42878 - 6090
e-mail: silvan.braendli at tuhh.de
www   : http://www.tuhh.de/skf

5th GACM Colloquium on Computational Mechanics
http://www.tu-harburg.de/gacm2013
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zombies.zip
Type: application/zip
Size: 1561 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130307/1d35a14d/attachment.zip>

