[mpich-discuss] MPI_Comm_Spawn causing zombies of hydra_pmi_proxy
Silvan Brändli
silvan.braendli at tuhh.de
Tue Mar 19 04:47:13 CDT 2013
Dear Pavan,
in the attached example I use mpich3, but I still get the zombies. Is
there something wrong with
- my MPI function calls (MPI_Comm_disconnect, MPI_Finalize)?
- my linked libraries (see below)?
Thanks in advance!
Best regards
Silvan
ldd_hi
linux-vdso.so.1 (0x00007fffa01ff000)
libmpichcxx.so.10 => /opt/mpich3/lib/libmpichcxx.so.10 (0x00007fdfec0e4000)
libmpich.so.10 => /opt/mpich3/lib/libmpich.so.10 (0x00007fdfebc6f000)
libopa.so.1 => /opt/mpich3/lib/libopa.so.1 (0x00007fdfeba6d000)
libmpl.so.1 => /opt/mpich3/lib/libmpl.so.1 (0x00007fdfeb868000)
libaio.so.1 => /lib64/libaio.so.1 (0x00007fdfeb666000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdfeb449000)
libm.so.6 => /lib64/libm.so.6 (0x00007fdfeb14e000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6 (0x00007fdfeae47000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdfeac30000)
libc.so.6 => /lib64/libc.so.6 (0x00007fdfea882000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fdfea67e000)
libgfortran.so.3 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libgfortran.so.3 (0x00007fdfea361000)
libquadmath.so.0 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libquadmath.so.0 (0x00007fdfea12b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdfec308000)
librt.so.1 => /lib64/librt.so.1 (0x00007fdfe9f23000)
ldd_main
linux-vdso.so.1 (0x00007fffeb1ff000)
libmpichcxx.so.10 => /opt/mpich3/lib/libmpichcxx.so.10 (0x00007f6fc797e000)
libmpich.so.10 => /opt/mpich3/lib/libmpich.so.10 (0x00007f6fc7509000)
libopa.so.1 => /opt/mpich3/lib/libopa.so.1 (0x00007f6fc7307000)
libmpl.so.1 => /opt/mpich3/lib/libmpl.so.1 (0x00007f6fc7102000)
libaio.so.1 => /lib64/libaio.so.1 (0x00007f6fc6f00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6fc6ce3000)
libm.so.6 => /lib64/libm.so.6 (0x00007f6fc69e8000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6 (0x00007f6fc66e1000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6fc64ca000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6fc611c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6fc5f18000)
libgfortran.so.3 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libgfortran.so.3 (0x00007f6fc5bfb000)
libquadmath.so.0 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.2/libquadmath.so.0 (0x00007f6fc59c5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6fc7ba2000)
librt.so.1 => /lib64/librt.so.1 (0x00007f6fc57bd000)
On 07.03.2013 17:23, Pavan Balaji wrote:
>
> I just tried this with the mpich master and it seems to work correctly,
> and there are no zombie processes (though I reduced the number of
> iterations to 10000, instead of 200000). This was a problem in mpich
> once upon a time, but that was a few years ago. Are you using the
> latest version of mpich (3.0.2)?
>
> -- Pavan
>
> On 03/07/2013 09:44 AM US Central Time, Silvan Brändli wrote:
>> PS: The attached programs are a simplification of my code. They
>> reproduce the zombie problem. Waiting for the 32k zombies takes a
>> while... but I expect the same behaviour as with my original code.
>>
>> Am I missing something when finishing the called program? I just use
>> MPI_Comm_disconnect and MPI_Finalize.
>>
>> Best regards
>> Silvan
>>
>> main.cpp
>>
>> #include <mpi.h>
>> #include <cstdio> // for printf
>>
>> int main(int argc, char *argv[])
>> {
>> int myrank;
>> int spawnerror;
>> int value = 123;
>> void *buf = &value;
>> MPI_Comm child_comm;
>>
>> if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
>> {
>> printf("MPI_Init failed");
>> }
>>
>> MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>
>> char arg0[] = "23"; // avoid passing a string literal as char*
>> char* hiargv[] = {arg0, NULL};
>> for(int i = 1; i <= 200000; i++)
>> {
>> value = i;
>> printf("Main before spawn %d\n",i);
>> // root must be 0: MPI_COMM_SELF has exactly one rank
>> MPI_Comm_spawn("./hi", hiargv, 1, MPI_INFO_NULL, 0,
>> MPI_COMM_SELF, &child_comm, &spawnerror);
>> MPI_Send(buf, 1, MPI_INT, 0, 1, child_comm); // MPI_INT, not Fortran's MPI_INTEGER
>> MPI_Comm_disconnect(&child_comm);
>> }
>>
>> MPI_Finalize();
>> return 0;
>> }
>>
>> hi.cpp:
>>
>> #include <mpi.h>
>> #include <cstdio> // for printf
>>
>> int main(int argc, char** argv) {
>> MPI_Comm parent;
>> MPI_Status status;
>> int err;
>> int value = -1;
>> void* buf= &value;
>>
>> if (MPI_Init(&argc,&argv)!=MPI_SUCCESS)
>> {
>> printf("MPI_Init failed");
>> }
>> MPI_Comm_get_parent(&parent);
>> if (parent == MPI_COMM_NULL) printf("No parent!");
>>
>> MPI_Recv(buf, 1, MPI_INT, 0, MPI_ANY_TAG, parent, &status); // MPI_INT, not MPI_INTEGER
>> MPI_Comm_disconnect(&parent);
>> err = MPI_Finalize();
>> printf("hi finalized %d %d \n",err, value);
>> return 0;
>> }
>>
>>
>>
>> On 07.03.2013 12:38, Silvan Brändli wrote:
>>> Dear all,
>>>
>>> again I have a question related to spawning processes. I understand the
>>> situation as follows:
>>>
>>> My program A spawns program B. Program B spawns program C1, C2 ...
>>> C10000 ...
>>> Program Cx terminates correctly before Cx+1 is called, but it returns
>>> 1 to mpiexec. To handle this I use the workaround described in
>>> http://lists.mpich.org/pipermail/discuss/2013-February/000429.html
>>>
>>> Now it looks like every spawn starts a "hydra_pmi_proxy" whose parent
>>> process is mpiexec. When program Cx finishes, this "hydra_pmi_proxy"
>>> remains as a zombie until programs A, B and mpiexec have finished. Once
>>> approx. 32k of those "hydra_pmi_proxy" zombies exist I run into
>>> problems (too many processes or something similar).
>>>
>>> What can I do to finish "hydra_pmi_proxy" while my programs A, B and
>>> mpiexec are still running?
>>>
>>> I'm glad about every hint.
>>>
>>> Best regards
>>> Silvan
>>>
>>
>
--
Dipl.-Ing. Silvan Brändli
Numerische Strukturanalyse mit Anwendungen in der Schiffstechnik (M-10)
Technische Universität Hamburg-Harburg
Schwarzenbergstraße 95c
21073 Hamburg
Tel. : +49 (0)40 42878 - 6187
Fax. : +49 (0)40 42878 - 6090
e-mail: silvan.braendli at tuhh.de
www : http://www.tuhh.de/skf
5th GACM Colloquium on Computational Mechanics
http://www.tu-harburg.de/gacm2013
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zombies.zip
Type: application/zip
Size: 1821 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130319/53930c75/attachment.zip>