[mpich-discuss] Problems running MPICH jobs under SLURM

Markus Geimer m.geimer at fz-juelich.de
Sat Jun 1 09:21:35 CDT 2013


Dear MPICH developers,

We are experiencing some problems getting MPICH jobs to run under
SLURM (Debian package slurm-llnl 2.3.4-2+b1) on our small test
cluster. When starting an MPI job with more than one rank, the
program crashes immediately with the following output:

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

*** glibc detected *** ./hello: double free or corruption (fasttop): 0x00000000014c9680 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f4ee51fed76]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f4ee5203aac]
/opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPIDI_Populate_vc_node_ids+0x3f9)[0x7f4ee5dec5d9]
/opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPID_Init+0x136)[0x7f4ee5de6dd6]
/opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPIR_Init_thread+0x23f)[0x7f4ee5e9cf7f]
/opt/mpich/3.0.4-gcc/lib/libmpich.so.10(MPI_Init+0xae)[0x7f4ee5e9c90e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fa889cddead]
./hello[0x400799]

----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----

Single-rank jobs run fine, but they are obviously of little interest ;-)
The test program is a simple 'hello world' printing the rank, started
using a minimal batch script:

----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

#!/bin/sh
#SBATCH -n 4
mpiexec ./hello

----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----

We have tried MPICH 3.0.4 as well as MPICH2 1.5, both configured with
only the following options:

        --prefix=... --enable-shared --enable-debuginfo

Both show the same symptoms. MPICH2 1.4.1p1, however, works without
problems. Any idea what's going wrong in the newer versions?

Thanks,
Markus

--
Dr. Markus Geimer
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone:  +49-2461-61-1773
Fax:    +49-2461-61-6656
E-mail: m.geimer at fz-juelich.de
WWW:    http://www.fz-juelich.de/jsc/

