[mpich-discuss] Problems running MPICH jobs under SLURM

Markus Geimer m.geimer at fz-juelich.de
Fri Jun 7 09:02:25 CDT 2013


Hi Pavan,

The nightly snapshot fixes the issue for me. Many thanks!

@John: Did you also recompile your sample application? From what
I understand, the issue is not in hydra but in the MPI library
(please correct me if I'm wrong, Pavan).

Best regards,
Markus

On 06/07/13 08:40, Pavan Balaji wrote:
> 
> FYI, I believe this is now fixed.  Please try out the latest nightly
> snapshot and let us know if you are still running into this issue:
> 
> http://www.mpich.org/static/tarballs/nightly/master/hydra/
> http://www.mpich.org/static/tarballs/nightly/master/mpich/
> 
>  -- Pavan
> 
> On 06/06/2013 01:39 AM, Biddiscombe, John A. wrote:
>> Just FYI: I am also getting the double free error when I run under
>> SLURM (MPICH 3.0.4). Please don't take the correspondence off list, as
>> I'm following the thread.
>>
>> I can't add anything more useful than Markus has already provided with
>> his stack trace and logs.
>>
>> [I did find that if I configure --with-slurm and use srun instead of
>> mpiexec, then everything works as expected, but I need mpiexec to pass
>> environment variables to processes using MPMD syntax.]
>>
>> JB
>>
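John needs mpiexec rather than srun because Hydra's MPMD syntax lets each colon-separated section of the command line set its own environment. As a minimal sketch, assuming a hypothetical variable named APP_ROLE (not something MPICH or SLURM defines), a single C binary can read that per-section value at start-up; in a real MPI program this lookup would sit right after MPI_Init.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of environment variable `name`, or `fallback` if it is
 * unset or empty. Each MPMD section can give its processes a different
 * value, so the same binary can choose its role at start-up. */
static const char *env_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL && fallback != NULL);
    value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Hypothetical role selection: APP_ROLE is an illustrative name only.
 * Returns 1 if this process was started as a "producer", 0 otherwise. */
static int is_producer(void)
{
    return strcmp(env_or_default("APP_ROLE", "consumer"), "producer") == 0;
}
```

Launched with Hydra's per-executable -env option, for example "mpiexec -n 2 -env APP_ROLE producer ./app : -n 2 -env APP_ROLE consumer ./app" (./app being a placeholder), the first two ranks would see "producer" and the other two "consumer". Reading the value through getenv() keeps the code independent of which launcher set it.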
>> -----Original Message-----
>> From: discuss-bounces at mpich.org [mailto:discuss-bounces at mpich.org] On
>> Behalf Of Markus Geimer
>> Sent: 03 June 2013 16:36
>> To: Pavan Balaji
>> Cc: discuss at mpich.org
>> Subject: Re: [mpich-discuss] Problems running MPICH jobs under SLURM
>>
>> Pavan,
>>
>>> 1. Can you run your application processes using "ddd" or some other
>>> debugger to see where the double free is coming from?  You might have
>>> to build mpich with --enable-g=dbg to get the debug symbols in.
>>
>> Here is the full stack backtrace:
>>
>> ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
>>
>> #0  0x00007ffff6deb475 in *__GI_raise (sig=<optimized out>)
>>      at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x00007ffff6dee6f0 in *__GI_abort () at abort.c:92
>> #2  0x00007ffff6e2652b in __libc_message (do_abort=<optimized out>,
>>      fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
>> #3  0x00007ffff6e2fd76 in malloc_printerr (action=3,
>>      str=0x7ffff6f081e0 "double free or corruption (fasttop)",
>>      ptr=<optimized out>) at malloc.c:6283
>> #4  0x00007ffff6e34aac in *__GI___libc_free (mem=<optimized out>)
>>      at malloc.c:3738
>> #5  0x00007ffff7a1d5d9 in populate_ids_from_mapping (
>>      did_map=<synthetic pointer>, num_nodes=<synthetic pointer>,
>>      mapping=<optimized out>, pg=<optimized out>)
>>      at src/mpid/ch3/src/mpid_vc.c:1063
>> #6  MPIDI_Populate_vc_node_ids (pg=pg@entry=0x604910,
>>      our_pg_rank=our_pg_rank@entry=0) at src/mpid/ch3/src/mpid_vc.c:1193
>> #7  0x00007ffff7a17dd6 in MPID_Init (argc=argc@entry=0x7fffffffd97c,
>>      argv=argv@entry=0x7fffffffd970, requested=requested@entry=0,
>>      provided=provided@entry=0x7fffffffd8e8,
>>      has_args=has_args@entry=0x7fffffffd8e0,
>>      has_env=has_env@entry=0x7fffffffd8e4)
>>      at src/mpid/ch3/src/mpid_init.c:156
>> #8  0x00007ffff7acdf7f in MPIR_Init_thread (argc=argc@entry=0x7fffffffd97c,
>>      argv=argv@entry=0x7fffffffd970, required=required@entry=0,
>>      provided=provided@entry=0x7fffffffd944)
>>      at src/mpi/init/initthread.c:431
>> #9  0x00007ffff7acd90e in PMPI_Init (argc=0x7fffffffd97c, argv=0x7fffffffd970)
>>      at src/mpi/init/init.c:136
>> #10 0x000000000040086d in main ()
>>
>> ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
>>
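Frames #3 to #5 show glibc aborting with "double free or corruption (fasttop)" inside a free() called from populate_ids_from_mapping (mpid_vc.c:1063); that message typically means the same heap block was passed to free() twice. The hypothetical C sketch below is not MPICH's code and not the fix in the nightly snapshot. It only illustrates the bug class (a buffer released on an early-exit path and again in shared cleanup) and one common defensive idiom: resetting the pointer to NULL after freeing it, since free(NULL) is a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free *p and clear the caller's pointer, so a second call on the same
 * variable becomes free(NULL), which the C standard defines as a no-op. */
static void free_and_null(char **p)
{
    assert(p != NULL);
    free(*p);
    *p = NULL;
}

/* Heap copy of s (portable stand-in for POSIX strdup); NULL on failure. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical parser with two exit paths: counts comma-separated fields,
 * returning -1 for empty input or allocation failure. */
static int count_fields(const char *input)
{
    char *copy = dup_string(input);
    int fields = 0;

    if (copy == NULL)
        return -1;
    if (copy[0] == '\0') {
        free_and_null(&copy);   /* early-exit cleanup ...            */
        goto cleanup;           /* ... then fall into shared cleanup */
    }
    fields = 1;
    for (const char *c = copy; *c != '\0'; ++c)
        if (*c == ',')
            ++fields;

cleanup:
    /* With a plain free(copy) on both paths, empty input would free the
     * same block twice: the pattern glibc reports as a double free. */
    free_and_null(&copy);
    return fields > 0 ? fields : -1;
}
```

Clearing the pointer does not fix an ownership mistake by itself, but it turns a second release into a harmless no-op instead of heap corruption, which makes the remaining logic error easier to find under a debugger.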
>>> 2. Can you send me the output with the ssh launcher as well?
>>
>> See mail sent off-list.
>>
>> Thanks,
>> Markus
>>
>> -- 
>> Dr. Markus Geimer
>> Juelich Supercomputing Centre
>> Institute for Advanced Simulation
>> Forschungszentrum Juelich GmbH
>> 52425 Juelich, Germany
>>
>> Phone:  +49-2461-61-1773
>> Fax:    +49-2461-61-6656
>> E-mail: m.geimer at fz-juelich.de
>> WWW:    http://www.fz-juelich.de/jsc/
>>
> 





