[mpich-discuss] Mpich failing electric-fence check
Balaji, Pavan
balaji at anl.gov
Fri Mar 7 09:10:08 CST 2014
Thanks, we’ll look into it. We try to avoid malloc(0) because it’s not portable, but it sometimes slips through.
https://trac.mpich.org/projects/mpich/ticket/2054
We also have our own memory tracing integrated into MPICH, which is tested regularly. We should probably update it to flag zero-byte allocations as well.
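To illustrate the portability issue: the C standard leaves the result of malloc(0) implementation-defined (either a null pointer or a unique pointer that must not be dereferenced), so a NULL return cannot be told apart from a real failure. A minimal sketch of one common way to sidestep that is below. The name alloc_nonzero is hypothetical and this is not MPICH's actual code, only an assumption about how such a guard might look.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of MPICH): never passes a size of 0
 * down to malloc(). A zero-byte request is rounded up to one byte, so
 * a NULL return always means "out of memory" and tools that trap
 * malloc(0) are not triggered. */
void *alloc_nonzero(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}
```

A caller would use alloc_nonzero(n) wherever n may legitimately be 0 and release the result with free() as usual. Separately, the Electric Fence man page documents an EF_ALLOW_MALLOC_0 environment variable; setting it to a non-zero value should stop the tool from aborting on zero-byte requests, which may be enough to get past MPI_Init while debugging.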
— Pavan
On Mar 7, 2014, at 2:45 AM, Matthieu Dorier <matthieu.dorier at irisa.fr> wrote:
> Hi,
>
> I wanted to debug a memory corruption in an MPI program using the electric-fence tool, and noticed that electric-fence already detects an error in MPI_Init (so the program stops and I cannot debug the actual memory corruption that happens later). The following minimal program reproduces the error:
>
> #include <mpi.h>
> int main(int argc, char** argv) {
>     MPI_Init(&argc, &argv);
>     MPI_Finalize();
>     return 0;
> }
>
> The error output by electric-fence:
>
> ElectricFence Aborting: Allocating 0 bytes, probably a bug.
>
> And the backtrace output by gdb:
>
> Program received signal SIGILL, Illegal instruction.
> 0x0012d422 in __kernel_vsyscall ()
> (gdb) backtrace
> #0 0x0012d422 in __kernel_vsyscall ()
> #1 0x0040c976 in kill () at ../sysdeps/unix/syscall-template.S:82
> #2 0x0012fc54 in EF_Abort () from /usr/lib/libefence.so.0
> #3 0x0012f71b in memalign () from /usr/lib/libefence.so.0
> #4 0x0012f88b in malloc () from /usr/lib/libefence.so.0
> #5 0x001e3b6b in MPID_nem_init () from /home/mdorier/deploy/lib/libmpich.so.10
> #6 0x001d2f4c in MPIDI_CH3_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #7 0x001c8c57 in MPID_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #8 0x0029d435 in MPIR_Init_thread () from /home/mdorier/deploy/lib/libmpich.so.10
> #9 0x0029cd33 in PMPI_Init () from /home/mdorier/deploy/lib/libmpich.so.10
> #10 0x0804859f in main (argc=1, argv=0xbffff994) at m.c:4
>
> The version of MPICH is 3.0.4, built with gcc 4.6.4, on Ubuntu 10.04 with Linux kernel 2.6.32.
>
> I suspect that MPICH calls malloc with a size of 0 and checks the result properly, but that electric-fence treats the zero-byte request itself as an error.
>
> Matthieu Dorier
> PhD student at ENS Rennes
> http://people.irisa.fr/Matthieu.Dorier