[mpich-discuss] hydra crashes with high number of processes

Antonio J. Peña apenya at mcs.anl.gov
Wed Jul 24 10:30:21 CDT 2013


Hi Thomas,

Thanks for reporting. Could you please give the current development version a
try and let us know whether you still experience the segfault?

git clone git://git.mpich.org/mpich.git

or, alternatively:

http://git.mpich.org/mpich.git

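For what it's worth, the valgrind trace below (an invalid read of size 1 at 
address 0x0, in strlen called from strdup at pmiserv_utils.c:375) looks like 
strdup() being handed a NULL pointer. Just to illustrate that failure mode 
(this is a made-up sketch, not the actual MPICH code; the variable names are 
hypothetical):

#define _POSIX_C_SOURCE 200809L   /* for strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *exec_arg = NULL;   /* hypothetical stand-in for a launch-info string */

    /* strdup(exec_arg) here would call strlen(NULL), which is exactly the
     * "Invalid read of size 1 ... Address 0x0" pattern valgrind reports. */

    char *copy = exec_arg ? strdup(exec_arg) : NULL;   /* defensive NULL check */
    if (copy == NULL)
        fprintf(stderr, "string is NULL, nothing to duplicate\n");

    free(copy);
    return 0;
}

If that reading is right, some string in the exec launch info ends up NULL 
only at very large process counts, which is what we would like to confirm 
against the current development tree.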

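Also, since the crash is inside mpiexec itself (the trace ends in 
HYD_pmci_launch_procs and main in mpiexec.c), your application binary should 
not matter for re-testing; any trivial MPI program would exercise the same 
launch path at 8192 processes. Something like this, just as a sketch (the 
file name is arbitrary):

/* trivial_test.c: minimal MPI program, only meant to exercise the launcher */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        printf("launched %d processes\n", size);

    MPI_Finalize();
    return 0;
}

Build it with mpicc from the freshly installed development version and launch 
it with the same mpirun line you used before.
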
Thanks,
  Antonio


On Wednesday, July 24, 2013 03:01:30 PM Thomas Ropars wrote:
> Hi,
> 
> I'm working with mpich 3.0.4 and I get a segfault in Hydra when I try to
> run an application on a large number of processes (8192).
> 
> I simply run the following command:
> mpirun -f ~/machine_list -n 8192 my_exec_file
> 
> I tried to run mpirun in valgrind to identify the problem and here is
> the output:
> ==44266== Invalid read of size 1
> ==44266==    at 0x4A077F2: __GI_strlen (mc_replace_strmem.c:284)
> ==44266==    by 0x3D774802B5: strdup (in /lib64/libc-2.12.so)
> ==44266==    by 0x40EBA5: HYD_pmcd_pmi_fill_in_exec_launch_info (pmiserv_utils.c:375)
> ==44266==    by 0x40A5C2: HYD_pmci_launch_procs (pmiserv_pmci.c:121)
> ==44266==    by 0x403A1E: main (mpiexec.c:326)
> ==44266==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> If I try to run on a smaller number of processes (e.g., 512), everything
> works fine.
> 
> Any suggestions on how to solve the problem?
> 
> Thomas
> 
-- 
Antonio J. Peña
Postdoctoral Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240, Of. 3148
Argonne, IL 60439-4847
(+1) 630-252-7928
apenya at mcs.anl.gov
www.mcs.anl.gov/~apenya
