[mpich-discuss] MPI_Comm_Spawn causing zombies of hydra_pmi_proxy

Pavan Balaji balaji at mcs.anl.gov
Thu Mar 7 10:23:23 CST 2013


I just tried this with the mpich master and it seems to work correctly,
and there are no zombie processes (though I reduced the number of
iterations to 10000, instead of 200000).  This was a problem in mpich
once upon a time, but that was a few years ago.  Are you using the
latest version of mpich (3.0.2)?

 -- Pavan

On 03/07/2013 09:44 AM US Central Time, Silvan Brändli wrote:
> PS: The attached programs are a simplification of my code. They
> reproduce the zombie problem. Waiting for the 32k zombies takes a
> while... but I expect the same behaviour as with my original code.
> 
> Am I missing something when finishing the called program? I just use
> MPI_Comm_disconnect and MPI_Finalize.
> 
> Best regards
> Silvan
> 
> main.cpp
> 
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char *argv[])
> {
>   int          myrank;
>   int spawnerror;
>   int value = 123;
>   void *buf = &value;
>   MPI_Comm child_comm;
> 
>   if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
>   {
>     printf("MPI_Init failed\n");
>     return 1;
>   }
> 
>   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> 
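>   char arg[] = "23";   // writable buffer; a string literal is not a char*
>   char* hiargv[] = {arg, NULL};
>   for(int i = 1; i <= 200000; i++)
>   {
>     value = i;
>     printf("Main before spawn %d\n",i);
>     // root is relative to MPI_COMM_SELF, whose only rank is 0
>     MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
>                    MPI_COMM_SELF, &child_comm, &spawnerror);
>     // MPI_INT is the C datatype; MPI_INTEGER is the Fortran one
>     MPI_Send(buf, 1, MPI_INT, 0, 1, child_comm);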
>     MPI_Comm_disconnect(&child_comm);
>   }
> 
>   MPI_Finalize();
>   return 0;
> }
> 
> hi.cpp:
> 
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char** argv) {
>   MPI_Comm parent;
>   MPI_Status status;
>   int err;
>   int value = -1;
>   void* buf= &value;
> 
>   if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
>   {
>     printf("MPI_Init failed");
>   }
>   MPI_Comm_get_parent(&parent);
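>   if (parent == MPI_COMM_NULL)
>   {
>     // not started via MPI_Comm_spawn, so there is nothing to receive
>     printf("No parent!\n");
>     MPI_Finalize();
>     return 1;
>   }
> 
>   // MPI_INT is the C datatype; MPI_INTEGER is the Fortran one
>   MPI_Recv(buf, 1, MPI_INT, 0, MPI_ANY_TAG, parent, &status);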
>   MPI_Comm_disconnect(&parent);
>   err = MPI_Finalize();
>   printf("hi finalized %d %d \n",err, value);
>   return 0;
> }
> 
> 
> 
> On 07.03.2013 12:38, Silvan Brändli wrote:
>> Dear all,
>>
>> again I have a question related to spawning processes. I understand the
>> situation as follows:
>>
>> My program A spawns program B. Program B spawns program C1, C2 ...
>> C10000 ...
>> Program Cx terminates correctly before Cx+1 is called, however returning
>> 1 to mpiexec. To handle this I use the workaround as described in
>> http://lists.mpich.org/pipermail/discuss/2013-February/000429.html
>>
>> Now it looks like every spawn starts a "hydra_pmi_proxy" whose parent
>> process is mpiexec. When program Cx finishes, this "hydra_pmi_proxy"
>> remains as a zombie until programs A, B and mpiexec have finished. Once
>> approx. 32k of these "hydra_pmi_proxy" zombies exist, I run into
>> problems (too many processes or something similar).
>>
>> What can I do to finish "hydra_pmi_proxy" while my programs A, B and
>> mpiexec are still running?
>>
>> I would be grateful for any hints.
>>
>> Best regards
>> Silvan
>>
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
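
For readers unfamiliar with the term used in the thread: on POSIX systems a
child process that has exited stays in the process table as a zombie until
its parent collects its exit status with a wait call. The sketch below
shows that general mechanism only. It is not Hydra's or mpiexec's code, it
makes no claim about why hydra_pmi_proxy was left unreaped, and it assumes
Linux (it reads /proc/<pid>/stat). The names process_state, make_zombie,
reap and demo are made up for this illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <fstream>
#include <string>

// Returns the one-letter state from /proc/<pid>/stat (Linux only),
// or '?' if the entry cannot be read. 'Z' means zombie.
char process_state(pid_t pid) {
  std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
  std::string line;
  if (!std::getline(stat, line)) return '?';
  // Layout is "pid (comm) state ..."; comm may contain ')' or spaces,
  // so locate the last ')' and read the field that follows it.
  std::size_t close = line.rfind(')');
  if (close == std::string::npos || close + 2 >= line.size()) return '?';
  return line[close + 2];
}

// Forks a child that exits at once, then blocks until it has exited
// WITHOUT reaping it, so the child is left as a zombie.
// Returns the child's pid, or -1 on failure.
pid_t make_zombie() {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) _exit(0);  // child: terminate immediately
  siginfo_t info;
  // WNOWAIT waits for the exit but leaves the child waitable (a zombie).
  if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) != 0) return -1;
  return pid;
}

// Collects the child's exit status, which removes the zombie entry.
// Returns true if waitpid reaped exactly this child.
bool reap(pid_t pid) {
  int status = 0;
  return waitpid(pid, &status, 0) == pid;
}

// Prints the child's state before and after it is reaped.
void demo() {
  pid_t pid = make_zombie();
  if (pid < 0) { std::printf("fork/waitid failed\n"); return; }
  std::printf("before wait: state %c\n", process_state(pid));
  reap(pid);
  std::printf("after wait:  state %c\n", process_state(pid));
}
```

Calling demo() from a main function prints the state before and after the
wait call. A parent that forks many short-lived children and never waits
on them accumulates one such entry per child, which matches the symptom
described in the original message.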


