[mpich-discuss] MPI_Comm_Spawn causing zombies of hydra_pmi_proxy
Pavan Balaji
balaji at mcs.anl.gov
Thu Mar 7 10:23:23 CST 2013
I just tried this with the mpich master and it seems to work correctly,
and there are no zombie processes (though I reduced the number of
iterations to 10000, instead of 200000). This was a problem in mpich
once upon a time, but that was a few years ago. Are you using the
latest version of mpich (3.0.2)?
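
For anyone unfamiliar with the term: a zombie is a child process that
has exited but whose parent has not yet collected its exit status with
wait()/waitpid(). The sketch below is plain POSIX and is not taken from
Hydra; the helper names spawn_short_lived_child and
reap_finished_children are made up for illustration. It only shows the
general mechanism by which a launcher process can clear such entries.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: fork a child that exits immediately with `code`.
// Until the parent waits on it, the exited child stays in the process
// table as a zombie.
pid_t spawn_short_lived_child(int code) {
    assert(code >= 0 && code <= 255);  // exit statuses are 8 bits wide
    pid_t pid = fork();
    if (pid == 0) {
        _exit(code);                   // child: terminate at once
    }
    return pid;                        // parent: child's pid, or -1 on failure
}

// Hypothetical helper: reap every child that has already exited, without
// blocking. Returns how many children were reaped.
int reap_finished_children() {
    int reaped = 0;
    int status = 0;
    while (true) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {                 // one zombie collected, look for more
            ++reaped;
            continue;
        }
        if (pid == -1 && errno == EINTR) {
            continue;                  // interrupted by a signal, retry
        }
        break;  // 0: children exist but none has exited; -1: no children left
    }
    return reaped;
}
```

Calling something like reap_finished_children() periodically, or from a
SIGCHLD handler, keeps exited children from piling up in the process
table. Whether and where Hydra does this is not something the sketch
claims to show.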
-- Pavan
On 03/07/2013 09:44 AM US Central Time, Silvan Brändli wrote:
> PS: The attached programs are a simplification of my code. They
> reproduce the zombie problem. Waiting for the 32k zombies takes a
> while... but I expect the same behaviour as with my original code.
>
> Am I missing something when finishing the called program? I just use
> MPI_Comm_disconnect and MPI_Finalize.
>
> Best regards
> Silvan
>
> main.cpp
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     int myrank;
>     int spawnerror;
>     int value = 123;
>     void *buf = &value;
>     MPI_Comm child_comm;
>
>     if (MPI_Init(&argc, &argv) != MPI_SUCCESS)
>     {
>         printf("MPI_Init failed\n");
>         return 1;
>     }
>
>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>
>     /* argv for the child; a writable buffer avoids converting a
>        string literal to char* in C++ */
>     char arg0[] = "23";
>     char *hiargv[] = {arg0, NULL};
>     for (int i = 1; i <= 200000; i++)
>     {
>         value = i;
>         printf("Main before spawn %d\n", i);
>         /* root is 0 because MPI_COMM_SELF contains only this process */
>         MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
>                        MPI_COMM_SELF, &child_comm, &spawnerror);
>         /* MPI_INT is the C datatype; MPI_INTEGER is the Fortran one */
>         MPI_Send(buf, 1, MPI_INT, 0, 1, child_comm);
>         MPI_Comm_disconnect(&child_comm);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>
> hi.cpp:
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char** argv) {
>     MPI_Comm parent;
>     MPI_Status status;
>     int err;
>     int value = -1;
>     void* buf = &value;
>
>     if (MPI_Init(&argc, &argv) != MPI_SUCCESS)
>     {
>         printf("MPI_Init failed\n");
>         return 1;
>     }
>     MPI_Comm_get_parent(&parent);
>     if (parent == MPI_COMM_NULL)
>     {
>         /* not started via MPI_Comm_spawn: nothing to receive from */
>         printf("No parent!\n");
>         MPI_Finalize();
>         return 1;
>     }
>
>     /* MPI_INT is the C datatype; MPI_INTEGER is the Fortran one */
>     MPI_Recv(buf, 1, MPI_INT, 0, MPI_ANY_TAG, parent, &status);
>     MPI_Comm_disconnect(&parent);
>     err = MPI_Finalize();
>     printf("hi finalized %d %d \n", err, value);
>     return 0;
> }
>
>
>
> On 07.03.2013 12:38, Silvan Brändli wrote:
>> Dear all,
>>
>> again I have a question related to spawning processes. I understand the
>> situation as follows:
>>
>> My program A spawns program B. Program B spawns program C1, C2 ...
>> C10000 ...
>> Program Cx terminates correctly before Cx+1 is called, but it returns
>> 1 to mpiexec. To handle this I use the workaround described in
>> http://lists.mpich.org/pipermail/discuss/2013-February/000429.html
>>
>> Now it looks like every Spawn starts a "hydra_pmi_proxy" whose parent
>> process is mpiexec. When program Cx has finished, this
>> "hydra_pmi_proxy" remains as a zombie until the programs A, B and
>> mpiexec have finished. Once approx. 32k of those "hydra_pmi_proxy"
>> zombies exist I get problems (too many processes or something similar).
>>
>> What can I do to finish "hydra_pmi_proxy" while my programs A, B and
>> mpiexec are still running?
>>
>> I would appreciate any hint.
>>
>> Best regards
>> Silvan
>>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji