[mpich-discuss] MPI_Comm_spawn causing zombies of hydra_pmi_proxy
Silvan Brändli
silvan.braendli at tuhh.de
Thu Mar 7 09:44:31 CST 2013
PS: The attached programs are a simplified version of my code and
reproduce the zombie problem. Waiting for the 32k zombies takes a
while, but I expect the same behaviour as with my original code.
Am I missing something when shutting down the spawned program? I just
call MPI_Comm_disconnect and MPI_Finalize.
Best regards
Silvan
main.cpp:
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    int myrank;
    int spawnerror;
    int value = 123;
    MPI_Comm child_comm;

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS)
    {
        printf("MPI_Init failed\n");
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* String literals are not writable, so build argv from a char array. */
    char arg0[] = "23";
    char *hiargv[] = {arg0, NULL};

    for (int i = 1; i <= 200000; i++)
    {
        value = i;
        printf("Main before spawn %d\n", i);
        /* With MPI_COMM_SELF the root must be 0. */
        MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &child_comm, &spawnerror);
        /* MPI_INT is the C datatype; MPI_INTEGER is the Fortran one. */
        MPI_Send(&value, 1, MPI_INT, 0, 1, child_comm);
        MPI_Comm_disconnect(&child_comm);
    }
    MPI_Finalize();
    return 0;
}
hi.cpp:
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Comm parent;
    MPI_Status status;
    int err;
    int value = -1;

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS)
    {
        printf("MPI_Init failed\n");
    }
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL)
        printf("No parent!\n");

    MPI_Recv(&value, 1, MPI_INT, 0, MPI_ANY_TAG, parent, &status);
    MPI_Comm_disconnect(&parent);
    err = MPI_Finalize();
    printf("hi finalized %d %d\n", err, value);
    return 0;
}
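
A sketch of a restructuring I could try in the meantime, assuming the
values can be streamed to a single spawned child instead of spawning a
fresh one per iteration, so only one hydra_pmi_proxy is ever created.
The tag convention (1 = work item, 2 = stop) and the looping child are
my own assumptions, untested:

#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Comm child_comm;
    MPI_Init(&argc, &argv);

    char arg0[] = "23";
    char *hiargv[] = {arg0, NULL};

    /* Spawn the worker once: one proxy in total, not one per iteration. */
    MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

    for (int i = 1; i <= 200000; i++)
        MPI_Send(&i, 1, MPI_INT, 0, 1, child_comm);  /* tag 1: work item */

    int stop = 0;
    MPI_Send(&stop, 1, MPI_INT, 0, 2, child_comm);   /* tag 2: stop */

    MPI_Comm_disconnect(&child_comm);
    MPI_Finalize();
    return 0;
}

hi.cpp would then loop on MPI_Recv with MPI_ANY_TAG and break once
status.MPI_TAG == 2, before calling MPI_Comm_disconnect and MPI_Finalize
as before.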
On 07.03.2013 12:38, Silvan Brändli wrote:
> Dear all,
>
> again I have a question related to spawning processes. I understand the
> situation as follows:
>
> My program A spawns program B. Program B spawns programs C1, C2, ...,
> C10000, ...
> Program Cx terminates correctly before Cx+1 is started; however, it
> returns 1 to mpiexec. To handle this I use the workaround described in
> http://lists.mpich.org/pipermail/discuss/2013-February/000429.html
>
> Now it looks like every spawn starts a "hydra_pmi_proxy" process whose
> parent is mpiexec. When program Cx finishes, this "hydra_pmi_proxy"
> remains as a zombie until programs A, B and mpiexec have finished. Once
> approximately 32k of those "hydra_pmi_proxy" zombies exist, I run into
> problems (too many processes or something similar).
>
> What can I do to clean up the "hydra_pmi_proxy" zombies while my
> programs A, B and mpiexec are still running?
>
> I'm glad about every hint.
>
> Best regards
> Silvan
>
--
Dipl.-Ing. Silvan Brändli
Numerische Strukturanalyse mit Anwendungen in der Schiffstechnik (M-10)
Technische Universität Hamburg-Harburg
Schwarzenbergstraße 95c
21073 Hamburg
Tel. : +49 (0)40 42878 - 6187
Fax. : +49 (0)40 42878 - 6090
e-mail: silvan.braendli at tuhh.de
www : http://www.tuhh.de/skf
5th GACM Colloquium on Computational Mechanics
http://www.tu-harburg.de/gacm2013
-------------- next part --------------
Attachment: zombies.zip (application/zip, 1561 bytes)
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130307/1d35a14d/attachment.zip>