[mpich-discuss] hydra crashes with high number of processes
Thomas Ropars
thomas.ropars at epfl.ch
Wed Jul 24 08:01:30 CDT 2013
Hi,
I'm working with MPICH 3.0.4 and I get a segfault in Hydra when I try to
run an application on a large number of processes (8192).
I simply run the following command:
mpirun -f ~/machine_list -n 8192 my_exec_file
I ran mpirun under valgrind to identify the problem, and here is
the output:
Invalid read of size 1
==44266== at 0x4A077F2: __GI_strlen (mc_replace_strmem.c:284)
==44266== by 0x3D774802B5: strdup (in /lib64/libc-2.12.so)
==44266== by 0x40EBA5: HYD_pmcd_pmi_fill_in_exec_launch_info (pmiserv_utils.c:375)
==44266== by 0x40A5C2: HYD_pmci_launch_procs (pmiserv_pmci.c:121)
==44266== by 0x403A1E: main (mpiexec.c:326)
==44266== Address 0x0 is not stack'd, malloc'd or (recently) free'd
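As far as I can tell, the trace says that strdup() was handed a NULL pointer: strlen() reads one byte at address 0x0, called from strdup(), called from HYD_pmcd_pmi_fill_in_exec_launch_info. I have not found which string is NULL at that point. To make the failure pattern concrete, here is a minimal sketch of a NULL-tolerant duplicate. It is not the actual Hydra code, and the helper name dup_or_null is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup() under strict C modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of Hydra or MPICH.
 * Calling strdup(NULL) directly is undefined behavior and typically
 * faults inside strlen(), which matches the valgrind report above.
 * This wrapper returns NULL for a NULL input instead of crashing.
 * The caller owns the returned copy and must free() it. */
char *dup_or_null(const char *s)
{
    if (s == NULL) {
        return NULL;
    }
    return strdup(s);
}
```

A guard like this would only hide the symptom. The real question is why the pointer is NULL with 8192 processes.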
If I run on a smaller number of processes (e.g., 512), everything
works fine.
Any suggestions on how to solve the problem?
Thomas