[mpich-discuss] Problems running MPICH jobs under SLURM

Jeff Hammond jeff.science at gmail.com
Sat Jun 1 10:15:40 CDT 2013


Have you tried MPICH 3.0.4? Hydra has been improved a great deal since
the 1.4 release.

Jeff

On Jun 1, 2013, at 9:21 AM, Markus Geimer <m.geimer at fz-juelich.de> wrote:

> Dear MPICH developers,
>
> We are experiencing some problems getting MPICH jobs to run under
> SLURM (Debian package slurm-llnl 2.3.4-2+b1) on our small test
> cluster. When starting an MPI job with more than one rank, the
> program crashes immediately with the following output:
>
> ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
>
> *** glibc detected *** ./hello: double free or corruption (fasttop):
> 0x00000000014c9680 ***
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f4ee51fed76]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f4ee5203aac]
> /opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPIDI_Populate_vc_node_ids+0x3f9)[0x7f4ee5dec5d9]
> /opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPID_Init+0x136)[0x7f4ee5de6dd6]
> /opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPIR_Init_thread+0x23f)[0x7f4ee5e9cf7f]
> /opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPI_Init+0xae)[0x7f4ee5e9c90e]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fa889cddead]
> ./hello[0x400799]
>
> ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
>
> Single rank jobs run fine, but they are obviously of little interest ;-)
> The test program is a simple 'hello world' printing the rank, started
> using a minimal batch script:
>
> ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
>
> #!/bin/sh
> #SBATCH -n 4
> mpiexec ./hello
>
> ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
>
> We have tried with MPICH 3.0.4 as well as MPICH2 1.5, configured using
> only
>
>        --prefix=... --enable-shared --enable-debuginfo
>
> Both are showing the same symptoms. MPICH2 1.4.1p1, however, works
> without problems. Any idea what's going wrong in the newer versions?
>
> Thanks,
> Markus
>
> --
> Dr. Markus Geimer
> Juelich Supercomputing Centre
> Institute for Advanced Simulation
> Forschungszentrum Juelich GmbH
> 52425 Juelich, Germany
>
> Phone:  +49-2461-61-1773
> Fax:    +49-2461-61-6656
> E-mail: m.geimer at fz-juelich.de
> WWW:    http://www.fz-juelich.de/jsc/
>
>
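For anyone trying to reproduce the report above, a slightly more talkative variant of the quoted batch script may help narrow things down. This is only a sketch under a few assumptions: `describe_allocation` is a hypothetical helper name, `SLURM_NTASKS` and `SLURM_JOB_NODELIST` are assumed to be exported by SLURM inside the batch job, and `-info` and `-verbose` are assumed to be accepted by the Hydra `mpiexec` that is first on the PATH. `./hello` is the test program from the original message.

```shell
#!/bin/sh
#SBATCH -n 4

# Hypothetical helper: report the allocation as SLURM describes it to the job.
# The "unset" fallbacks only matter when the script runs outside a SLURM job.
describe_allocation() {
    echo "tasks:    ${SLURM_NTASKS:-unset}"
    echo "nodelist: ${SLURM_JOB_NODELIST:-unset}"
}

describe_allocation

# Only attempt the launch if an mpiexec is actually on the PATH.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec -info              # assumed Hydra option: show which build is picked up
    mpiexec -verbose ./hello   # assumed Hydra option: log the launch steps
else
    echo "mpiexec not found on PATH" >&2
fi
```

Comparing the reported node list and the `mpiexec -info` output between the working 1.4.1p1 installation and the crashing ones should at least show whether both launchers see the same allocation.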


