[mpich-discuss] Problems running MPICH jobs under SLURM

Antonio J. Peña apenya at mcs.anl.gov
Thu Jun 6 01:44:22 CDT 2013


Thanks for your input, JB. Everything related to this issue will be discussed
in this thread. You can also check the corresponding ticket:

http://trac.mpich.org/projects/mpich/ticket/1871


  Antonio


On Thursday, June 06, 2013 06:39:20 AM Biddiscombe, John A. wrote:
> Just FYI. I am also getting the double free error when I run under slurm
> (mpich 3.0.4). Please don't take correspondence off list as I'm following
> the thread.
> 
> I can't add anything more useful than Markus has already provided with his
> stack trace and logs.
> 
> [I did find that if I configure --with-slurm and use srun instead of mpiexec,
> then everything works as expected, but I need mpiexec to pass env vars to
> processes using mpmd syntax]
> 
> JB
> 
> -----Original Message-----
> From: discuss-bounces at mpich.org [mailto:discuss-bounces at mpich.org]
> On Behalf Of Markus Geimer
> Sent: 03 June 2013 16:36
> To: Pavan Balaji
> Cc: discuss at mpich.org
> Subject: Re: [mpich-discuss] Problems running MPICH jobs under SLURM
> 
> Pavan,
> 
> > 1. Can you run your application processes using "ddd" or some other
> > debugger to see where the double free is coming from?  You might have
> > to build mpich with --enable-g=dbg to get the debug symbols in.
> 
> Here is the full stack backtrace:
> 
> ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
> 
> #0  0x00007ffff6deb475 in *__GI_raise (sig=<optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007ffff6dee6f0 in *__GI_abort () at abort.c:92
> #2  0x00007ffff6e2652b in __libc_message (do_abort=<optimized out>,
>     fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
> #3  0x00007ffff6e2fd76 in malloc_printerr (action=3,
>     str=0x7ffff6f081e0 "double free or corruption (fasttop)",
>     ptr=<optimized out>) at malloc.c:6283
> #4  0x00007ffff6e34aac in *__GI___libc_free (mem=<optimized out>)
>     at malloc.c:3738
> #5  0x00007ffff7a1d5d9 in populate_ids_from_mapping (
>     did_map=<synthetic pointer>, num_nodes=<synthetic pointer>,
>     mapping=<optimized out>, pg=<optimized out>)
>     at src/mpid/ch3/src/mpid_vc.c:1063
> #6  MPIDI_Populate_vc_node_ids (pg=pg@entry=0x604910,
>     our_pg_rank=our_pg_rank@entry=0) at src/mpid/ch3/src/mpid_vc.c:1193
> #7  0x00007ffff7a17dd6 in MPID_Init (argc=argc@entry=0x7fffffffd97c,
>     argv=argv@entry=0x7fffffffd970, requested=requested@entry=0,
>     provided=provided@entry=0x7fffffffd8e8,
>     has_args=has_args@entry=0x7fffffffd8e0,
>     has_env=has_env@entry=0x7fffffffd8e4)
>     at src/mpid/ch3/src/mpid_init.c:156
> #8  0x00007ffff7acdf7f in MPIR_Init_thread (argc=argc@entry=0x7fffffffd97c,
>     argv=argv@entry=0x7fffffffd970, required=required@entry=0,
>     provided=provided@entry=0x7fffffffd944) at src/mpi/init/initthread.c:431
> #9  0x00007ffff7acd90e in PMPI_Init (argc=0x7fffffffd97c, argv=0x7fffffffd970)
>     at src/mpi/init/init.c:136
> #10 0x000000000040086d in main ()
> 
> ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
> 
> > 2. Can you send me the output with the ssh launcher as well?
> 
> See mail sent off-list.
> 
> Thanks,
> Markus
> 
> --
> Dr. Markus Geimer
> Juelich Supercomputing Centre
> Institute for Advanced Simulation
> Forschungszentrum Juelich GmbH
> 52425 Juelich, Germany
> 
> Phone:  +49-2461-61-1773
> Fax:    +49-2461-61-6656
> E-mail: m.geimer at fz-juelich.de
> WWW:    http://www.fz-juelich.de/jsc/
> 
> 
> 
> ------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Registered office: Juelich
> Registered in the commercial register of the Dueren District Court,
> No. HR B 3498
> Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
> Board of Directors: Prof. Dr. Achim Bachem (Chairman), Karsten Beneke
> (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M.
> Schmidt
> ------------------------------------------------------------------------
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss



