[mpich-discuss] Fatal error in PMPI_Reduce
Michael Colonno
mcolonno at stanford.edu
Fri Jan 11 15:31:48 CST 2013
Hi All ~
I've compiled MPICH2 3.0 with the Intel compiler (v. 13) on a
CentOS 6.3 x64 system using SLURM as the process manager. My configure was
simply:
./configure --with-pmi=slurm --with-pm=no --prefix=/usr/local/apps/MPICH2
No errors during build or install. When I compile and run the example
program cxxcpi I get (truncated):
$ srun -n32 /usr/local/apps/cxxcpi
Fatal error in PMPI_Reduce: A process has failed, error stack:
PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0x7fff4ad18120,
rbuf=0x7fff4ad18128, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD)
failed
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(779)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(144).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(612): Communication error with rank 16
MPIR_Reduce_intra(799)..........:
MPIR_Reduce_impl(1029)..........:
MPIR_Reduce_intra(835)..........:
MPIR_Reduce_binomial(206).......: Failure during collective
srun: error: task 0: Exited with exit code 1
I run into this error with many of my MPI programs. A
different application yields:
PMPI_Bcast(1525)......: MPI_Bcast(buf=0x7fff545be5fc, count=1, MPI_INT,
root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1369).:
MPIR_Bcast_intra(1160):
MPIR_SMP_Bcast(1077)..: Failure during collective
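In case it helps isolate things, both failures are in basic collectives, so I put
together a minimal test that exercises the same two calls (a rough sketch along the
lines of what cpi/cxxcpi does; the file name and the sample values are just mine),
built with the mpicxx from this install:

    // collective_test.cxx -- minimal Bcast + Reduce test (sketch)
    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[])
    {
        MPI::Init(argc, argv);                 // C++ bindings, same as cxxcpi
        int rank = MPI::COMM_WORLD.Get_rank();
        int size = MPI::COMM_WORLD.Get_size();

        // Broadcast one int from rank 0, like the second application does.
        int n = 100;
        MPI::COMM_WORLD.Bcast(&n, 1, MPI::INT, 0);

        // Sum-reduce one double to rank 0, like cxxcpi does.
        double mine = static_cast<double>(rank);
        double total = 0.0;
        MPI::COMM_WORLD.Reduce(&mine, &total, 1, MPI::DOUBLE, MPI::SUM, 0);

        if (rank == 0)
            std::cout << "Bcast got " << n << ", Reduce over " << size
                      << " ranks gave " << total << std::endl;

        MPI::Finalize();
        return 0;
    }

I'd compile it with the installed mpicxx and launch it the same way
(srun -n32 ./collective_test) if that's useful for narrowing this down.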
Can anyone point me in the right direction?
Thanks,
~Mike C.