[mpich-discuss] Fatal error in PMPI_Reduce

Michael Colonno mcolonno at stanford.edu
Fri Jan 11 15:31:48 CST 2013


Hi All ~

I've compiled MPICH2 3.0 with the Intel compiler (v. 13) on a CentOS 6.3 x64 system using SLURM as the process manager. My configure was simply:

./configure --with-pmi=slurm --with-pm=no --prefix=/usr/local/apps/MPICH2
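
(For completeness: configure picks up the compilers from the standard CC/CXX/F77/FC environment variables, so an invocation spelling out the Intel compilers explicitly would look something like the line below. This is purely illustrative; the line above is what I actually ran.)

CC=icc CXX=icpc F77=ifort FC=ifort ./configure --with-pmi=slurm --with-pm=no --prefix=/usr/local/apps/MPICH2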

 

No errors during build or install. When I compile and run the example program cxxcpi I get the following (truncated; a minimal reproducer of the failing call is included after the output):

$ srun -n32 /usr/local/apps/cxxcpi
Fatal error in PMPI_Reduce: A process has failed, error stack:
PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0x7fff4ad18120, rbuf=0x7fff4ad18128, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(779)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(144).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(612): Communication error with rank 16
MPIR_Reduce_intra(799)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(206).......: Failure during collective
srun: error: task 0: Exited with exit code 1
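
In case it helps, the failure is in a plain MPI_Reduce. A minimal standalone test along these lines (my own sketch, not the shipped cxxcpi source; the file name reduce_test.cpp is just for illustration) exercises the same call:

/* reduce_test.cpp: each rank contributes 1.0 and rank 0 sums them,
 * i.e. the same MPI_Reduce(..., MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD)
 * call that fails in the stack above. */
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double mine = 1.0, total = 0.0;
    MPI_Reduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("sum over %d ranks = %g\n", size, total);

    MPI_Finalize();
    return 0;
}

(Built with mpicxx reduce_test.cpp -o reduce_test and launched the same way, e.g. srun -n32 ./reduce_test.)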

 

I'm seeing this error with many of my MPI programs. A different application yields:

PMPI_Bcast(1525)......: MPI_Bcast(buf=0x7fff545be5fc, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1369).:
MPIR_Bcast_intra(1160):
MPIR_SMP_Bcast(1077)..: Failure during collective

 

Can anyone point me in the right direction?

 

Thanks,
~Mike C.
