[mpich-discuss] Problems running MPICH jobs under SLURM

Pavan Balaji balaji at mcs.anl.gov
Sat Jun 8 15:23:38 CDT 2013


Thanks, John.  I'll look into it and get back to you if I need any more 
information.

Btw, you should not need sudo at all.  The permission errors are most
likely caused by leftover files from an earlier install that are owned
by root.  If you delete the entire directory and start from scratch,
the problem should go away.
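
To confirm that before deleting anything, something like the sketch
below lists the entries in the install tree that are not owned by the
current user.  The helper name list_foreign_owned is made up for this
example; it only uses the standard find and id utilities.

```shell
# List entries under a directory that are NOT owned by the current user.
# Hypothetical helper for illustration; it is not part of MPICH.
list_foreign_owned() {
    dir=$1
    # "-user" accepts a numeric user ID, so "id -u" works even when the
    # ID has no entry in the password database.
    find "$dir" ! -user "$(id -u)" -print
}

# Example, using the install path from the error output in this thread:
#   list_foreign_owned "$HOME/build/mpich-master-v3.0.4-259-gf322ce79/install"
```

If that prints anything, note that a root-owned directory (such as
install/etc here) may need elevated privileges one last time to be
removed; after a clean rebuild as a regular user, sudo should no longer
be needed.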

  -- Pavan

On 06/08/2013 03:02 PM, Biddiscombe, John A. wrote:
> Following your instructions (I only have 1 node, so I changed N2 n4 to N 1 n2), I get the same error, listed below...
>
> [NB: The one interesting thing is that I cannot run make as user biddisco and have to use sudo make, because otherwise it gives me a permission error:
> mkdir -p '/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc'
>   /usr/bin/install -c -m 644 src/env/mpicc.conf src/env/mpif77.conf src/env/mpif90.conf src/env/mpicxx.conf '/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc'
> /usr/bin/install: cannot remove `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc/mpicc.conf': Permission denied
> /usr/bin/install: cannot remove `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc/mpif77.conf': Permission denied
> /usr/bin/install: cannot remove `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc/mpif90.conf': Permission denied
> /usr/bin/install: cannot remove `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/etc/mpicxx.conf': Permission denied
> make[3]: *** [install-sysconfDATA] Error 1
> make[3]: Leaving directory `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79'
> make[2]: *** [install-am] Error 2
> make[2]: Leaving directory `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79'
> make: *** [install] Error 2
> ]
>
> JB
>
> biddisco at breno2 ~/build/mpich-master-v3.0.4-259-gf322ce79 $ ./install/bin/mpiexec -n 2 ./hello
> *** glibc detected *** ./hello: double free or corruption (fasttop): 0x00000000017b2340 ***
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fcaeb4a2b96]
> *** glibc detected *** ./hello: double free or corruption (fasttop): 0x00000000011e7340 ***
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fd1a77ddb96]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPIDI_Populate_vc_node_ids+0x3f9)[0x7fd1a7bc75a9]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPID_Init+0x136)[0x7fd1a7bc1da6]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPIR_Init_thread+0x22f)[0x7fd1a7c78f1f]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPI_Init+0xae)[0x7fd1a7c788be]
> ./hello[0x40081e]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPIDI_Populate_vc_node_ids+0x3f9)[0x7fcaeb88c5a9]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPID_Init+0x136)[0x7fcaeb886da6]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPIR_Init_thread+0x22f)[0x7fcaeb93df1f]
> /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10(MPI_Init+0xae)[0x7fcaeb93d8be]
> ./hello[0x40081e]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fcaeb44576d]
> ./hello[0x400719]
> ======= Memory map: ========
> 00400000-00401000 r-xp 00000000 08:01 10625669                           /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/hello
> 00600000-00601000 r--p 00000000 08:01 10625669                           /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/hello
> 00601000-00602000 rw-p 00001000 08:01 10625669                           /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/hello
> 017b1000-017d2000 rw-p 00000000 00:00 0                                  [heap]
> 7fcaeabe1000-7fcaeabe4000 rw-p 00000000 00:00 0
> 7fcaeabe4000-7fcaeabf9000 r-xp 00000000 08:01 9047556                    /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7fcaeabf9000-7fcaeadf8000 ---p 00015000 08:01 9047556                    /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7fcaeadf8000-7fcaeadf9000 r--p 00014000 08:01 9047556                    /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7fcaeadf9000-7fcaeadfa000 rw-p 00015000 08:01 9047556                    /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7fcaeadfa000-7fcaeae12000 r-xp 00000000 08:01 9050338                    /lib/x86_64-linux-gnu/libpthread-2.15.so
> 7fcaeae12000-7fcaeb011000 ---p 00018000 08:01 9050338                    /lib/x86_64-linux-gnu/libpthread-2.15.so
> 7fcaeb011000-7fcaeb012000 r--p 00017000 08:01 9050338                    /lib/x86_64-linux-gnu/libpthread-2.15.so
> 7fcaeb012000-7fcaeb013000 rw-p 00018000 08:01 9050338                    /lib/x86_64-linux-gnu/libpthread-2.15.so
> 7fcaeb013000-7fcaeb017000 rw-p 00000000 00:00 0
> 7fcaeb017000-7fcaeb01e000 r-xp 00000000 08:01 9050343                    /lib/x86_64-linux-gnu/librt-2.15.so
> 7fcaeb01e000-7fcaeb21d000 ---p 00007000 08:01 9050343                    /lib/x86_64-linux-gnu/librt-2.15.so
> 7fcaeb21d000-7fcaeb21e000 r--p 00006000 08:01 9050343                    /lib/x86_64-linux-gnu/librt-2.15.so
> 7fcaeb21e000-7fcaeb21f000 rw-p 00007000 08:01 9050343                    /lib/x86_64-linux-gnu/librt-2.15.so
> 7fcaeb21f000-7fcaeb223000 r-xp 00000000 08:01 8661134                    /home/biddisco/apps/mpich-3.0.4/lib/libmpl.so.1.0.0
> 7fcaeb223000-7fcaeb422000 ---p 00004000 08:01 8661134                    /home/biddisco/apps/mpich-3.0.4/lib/libmpl.so.1.0.0
> 7fcaeb422000-7fcaeb423000 r--p 00003000 08:01 8661134                    /home/biddisco/apps/mpich-3.0.4/lib/libmpl.so.1.0.0
> 7fcaeb423000-7fcaeb424000 rw-p 00004000 08:01 8661134                    /home/biddisco/apps/mpich-3.0.4/lib/libmpl.so.1.0.0
> 7fcaeb424000-7fcaeb5d9000 r-xp 00000000 08:01 9050358                    /lib/x86_64-linux-gnu/libc-2.15.so
> 7fcaeb5d9000-7fcaeb7d8000 ---p 001b5000 08:01 9050358                    /lib/x86_64-linux-gnu/libc-2.15.so
> 7fcaeb7d8000-7fcaeb7dc000 r--p 001b4000 08:01 9050358                    /lib/x86_64-linux-gnu/libc-2.15.so
> 7fcaeb7dc000-7fcaeb7de000 rw-p 001b8000 08:01 9050358                    /lib/x86_64-linux-gnu/libc-2.15.so
> 7fcaeb7de000-7fcaeb7e3000 rw-p 00000000 00:00 0
> 7fcaeb7e3000-7fcaeba03000 r-xp 00000000 08:01 11675463                   /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10.0.4
> 7fcaeba03000-7fcaebc03000 ---p 00220000 08:01 11675463                   /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10.0.4
> 7fcaebc03000-7fcaebc10000 r--p 00220000 08:01 11675463                   /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10.0.4
> 7fcaebc10000-7fcaebc16000 rw-p 0022d000 08:01 11675463                   /home/biddisco/build/mpich-master-v3.0.4-259-gf322ce79/install/lib/libmpich.so.10.0.4
> 7fcaebc16000-7fcaebc4e000 rw-p 00000000 00:00 0
> 7fcaebc4e000-7fcaebc70000 r-xp 00000000 08:01 9050344                    /lib/x86_64-linux-gnu/ld-2.15.so
> 7fcaebe54000-7fcaebe56000 rw-p 00000000 00:00 0
> 7fcaebe6d000-7fcaebe70000 rw-p 00000000 00:00 0
> 7fcaebe70000-7fcaebe71000 r--p 00022000 08:01 9050344                    /lib/x86_64-linux-gnu/ld-2.15.so
> 7fcaebe71000-7fcaebe73000 rw-p 00023000 08:01 9050344                    /lib/x86_64-linux-gnu/ld-2.15.so
> 7fff671e5000-7fff67206000 rw-p 00000000 00:00 0                          [stack]
> 7fff673b4000-7fff673b5000 r-xp 00000000 00:00 0                          [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 6
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
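
One detail visible in the memory map above: libmpl.so.1.0.0 is mapped
from /home/biddisco/apps/mpich-3.0.4/lib, while libmpich.so.10.0.4
comes from the new build's install/lib.  Whether that mix is related to
the crash is not established here, but it is easy to check for.  As a
rough sketch, the function below (mapped_objects is a made-up name)
reduces a pasted memory map to the distinct files mapped, so that
libraries coming from different install prefixes stand out:

```shell
# Print the distinct file paths found in a glibc "Memory map" dump or in
# /proc/<pid>/maps output read from standard input.
# Hypothetical helper for illustration; it is not part of MPICH.
mapped_objects() {
    # Drop an optional leading "> " mail quote marker, then keep the
    # last field of each line only when it is an absolute path.
    sed 's/^> *//' | awk '$NF ~ /^\// { print $NF }' | sort -u
}

# Example: save the quoted memory map to a file, then run
#   mapped_objects < memory_map.txt
```

On a live system, "ldd ./hello" shows the same kind of information for
the libraries the dynamic loader would resolve.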

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


