[mpich-discuss] Mpich failing electric-fence check

Balaji, Pavan balaji at anl.gov
Fri Mar 7 09:10:08 CST 2014


Thanks, we’ll look into it.  We try to avoid malloc(0) because it’s not portable, but it sometimes slips through.

https://trac.mpich.org/projects/mpich/ticket/2054

We also have our own memory tracing integrated into MPICH, which is tested regularly.  We should probably update it to check for zero-byte allocations as well.

  — Pavan

On Mar 7, 2014, at 2:45 AM, Matthieu Dorier <matthieu.dorier at irisa.fr> wrote:

> Hi,
> 
> I wanted to debug a memory corruption in an MPI program using the electric-fence tool, and noticed that electric-fence already reports an error inside MPI_Init (so the program stops, and I cannot debug the actual memory corruption that happens later). The following minimal program reproduces the error:
> 
> #include <mpi.h>
> int main(int argc, char** argv) {
>   MPI_Init(&argc,&argv);
>   MPI_Finalize();
>   return 0;
> }
> 
> The error output by electric-fence:
> 
> ElectricFence Aborting: Allocating 0 bytes, probably a bug.
> 
> And the backtrace output by gdb:
> 
> Program received signal SIGILL, Illegal instruction.
> 0x0012d422 in __kernel_vsyscall ()
> (gdb) backtrace
> #0 0x0012d422 in __kernel_vsyscall ()
> #1 0x0040c976 in kill () at ../sysdeps/unix/syscall-template.S:82
> #2 0x0012fc54 in EF_Abort () from /usr/lib/libefence.so.0
> #3 0x0012f71b in memalign () from /usr/lib/libefence.so.0
> #4 0x0012f88b in malloc () from /usr/lib/libefence.so.0
> #5 0x001e3b6b in MPID_nem_init () from /home/mdorier/deploy/lib/libmpich.so.10
> #6 0x001d2f4c in MPIDI_CH3_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #7 0x001c8c57 in MPID_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #8 0x0029d435 in MPIR_Init_thread () from /home/mdorier/deploy/lib/libmpich.so.10
> #9 0x0029cd33 in PMPI_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #10 0x0804859f in main (argc=1, argv=0xbffff994) at m.c:4
> 
> The MPICH version is 3.0.4, built with gcc 4.6.4 on Ubuntu 10.04, Linux kernel 2.6.32.
> 
> I suspect a call to malloc with 0 as its argument, whose result MPICH checks properly, but which makes electric-fence think there is an error.
> 
> Matthieu Dorier
> PhD student at ENS Rennes
> http://people.irisa.fr/Matthieu.Dorier
