[mpich-discuss] 3.2 build help

Balaji, Pavan balaji at anl.gov
Mon Mar 28 12:23:34 CDT 2016


Michael,

MPICH's InfiniBand support goes through MXM, so it won't work directly with native verbs.  The configure line below is therefore essentially falling back to "tcp".
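
For reference, here is a rough sketch of what an MXM-enabled configure line might look like. It only prints the command (a dry run) so it can be reviewed before use. The MXM location /opt/mellanox/mxm is a hypothetical path, and the "ch3:nemesis:mxm" device string and "--with-mxm" option are given as I recall them from the MPICH 3.2 README, so please check them against the README in your source tree. The other options are copied from the configure line quoted below.

```shell
#!/bin/sh
# Dry-run sketch: print a candidate configure line instead of running it.
# MXM_DIR is a hypothetical install location; adjust it for your site.
MXM_DIR="${MXM_DIR:-/opt/mellanox/mxm}"
PREFIX="${PREFIX:-/software/tools/apps/mpich/gnu/3.2}"

mxm_configure_line() {
    # ch3:nemesis:mxm asks for the MXM netmod (assumed spelling, see note
    # above); plain ch3:nemesis ends up on the tcp netmod.
    printf '%s\n' "./configure --prefix=$PREFIX --with-pm=hydra --with-device=ch3:nemesis:mxm --with-mxm=$MXM_DIR --with-pbs=/opt/torque"
}

mxm_configure_line
```

If the printed line looks right for your system, it can be pasted into the MPICH source directory and run as usual.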

With respect to the error, what's "mpi_script_launcher.run"?  Can you try using "mpiexec" and see if the error still shows up?
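
A minimal sketch of such a check, assuming mpiexec was installed under the --prefix from the configure line quoted below (so in its bin/ subdirectory). The helper name run_with_mpiexec is made up for illustration. If that mpiexec is not present on the machine, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch: launch with the mpiexec that belongs to the MPICH 3.2 build.
# The path assumes the --prefix used in the configure line in this thread.
MPICH_PREFIX="${MPICH_PREFIX:-/software/tools/apps/mpich/gnu/3.2}"
MPIEXEC="$MPICH_PREFIX/bin/mpiexec"

run_with_mpiexec() {
    # $1 = number of processes; remaining arguments = program and its args.
    np="$1"; shift
    if [ -x "$MPIEXEC" ]; then
        "$MPIEXEC" -n "$np" "$@"
    else
        # No such install here: show the command instead of running it.
        echo "would run: $MPIEXEC -n $np $*"
    fi
}

# Start with a non-MPI program to confirm the launcher itself works.
run_with_mpiexec 2 hostname
```

If "hostname" launches cleanly, the same helper can be pointed at mpi_script_launcher.run with its script argument to see whether the signal 7 still appears.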

  -- Pavan

> On Mar 28, 2016, at 11:33 AM, Galloway, Michael D. <gallowaymd at ornl.gov> wrote:
> 
> Good Day All,
> 
> We’re trying to get a build of 3.2 for our CentOS 7 HPC environment using IB. We don’t have MXM installed, so I’m trying this:
> 
> ./configure --prefix=/software/tools/apps/mpich/gnu/3.2 --with-pm=hydra -with-device=ch3:nemesis --with-ibverbs=/usr  --with-pbs=/opt/torque
> 
> But we end up with backtraces of the form:
> 
>  mpi_script_launcher.run:17440 terminated with signal 7 at PC=7f8caca067f4 SP=7fff7c4ffb20. Backtrace:
>  mpi_script_launcher.run:17441 terminated with signal 7 at PC=7fece77297f4 SP=7ffd3b5ff100.  Backtrace:
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPID_nem_init+0x964)[0x7f8caca067f4]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPID_nem_init+0x964)[0x7fece77297f4]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPIDI_CH3_Init+0x29)[0x7f8cac9f7609]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPIDI_CH3_Init+0x29)[0x7fece771a609]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPID_Init+0x18b)[0x7f8cac9ececb]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPID_Init+0x18b)[0x7fece770fecb]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPIR_Init_thread+0x34c)[0x7f8cac95345c]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPIR_Init_thread+0x34c)[0x7fece767645c]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPI_Init+0x7e)[0x7f8cac952ede]
> /home/m8a/mpi_script_launcher/mpi_script_launcher.run[0x4008d0]
> /software/tools/apps/mpich/gnu/3.2/lib/libmpi.so.12(MPI_Init+0x7e)[0x7fece7675ede]
> /home/m8a/mpi_script_launcher/mpi_script_launcher.run[0x4008d0]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f8cac4fcb15]
> /home/m8a/mpi_script_launcher/mpi_script_launcher.run[0x4007c9]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fece721fb15]
> /home/m8a/mpi_script_launcher/mpi_script_launcher.run[0x4007c9]
> 
> 
> Similarly with mvapich we get failures of this form:
> 
> mpirun -n 4 /home/m8a/mpi_script_launcher/MVAPICH/mpi_script_launcher.run  /home/m8a/mpi_script_launcher/mpi_bash_script_example.sh  
> [cli_0]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(514)..........: 
> MPID_Init(365).................: channel initialization failed
> MPIDI_CH3_Init(495)............: 
> MPIDI_CH3I_SHMEM_Helper_fn(921): write: Success
>  ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 17901 RUNNING AT mod-condo-login02.ornl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> 
> 
> I suspect we are doing something silly here, but I’m not sure what; Open MPI code on the same cluster runs fine.
> 
> Is there a current recommendation for IB/pbs/torque build flags?
> 
> — michael
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
