[mpich-discuss] Fatal error in PMPI_Reduce

Pavan Balaji balaji at mcs.anl.gov
Fri Jan 11 16:19:23 CST 2013


Michael,

Did you try just using mpiexec?

mpiexec -n 32 /usr/local/apps/cxxcpi

 -- Pavan

On 01/11/2013 03:31 PM US Central Time, Michael Colonno wrote:
> Hi all,
>
> I've compiled MPICH2 3.0 with the Intel compiler (v. 13) on a CentOS 6.3
> x64 system, using SLURM as the process manager. My configure line was
> simply:
>
> ./configure --with-pmi=slurm --with-pm=no --prefix=/usr/local/apps/MPICH2
>
> There were no errors during build or install. When I compile and run the
> example program cxxcpi, I get (truncated):
>
> $ srun -n32 /usr/local/apps/cxxcpi
>
> Fatal error in PMPI_Reduce: A process has failed, error stack:
> PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0x7fff4ad18120, rbuf=0x7fff4ad18128, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1029)..........:
> MPIR_Reduce_intra(779)..........:
> MPIR_Reduce_impl(1029)..........:
> MPIR_Reduce_intra(835)..........:
> MPIR_Reduce_binomial(144).......:
> MPIDI_CH3U_Recvq_FDU_or_AEP(612): Communication error with rank 16
> MPIR_Reduce_intra(799)..........:
> MPIR_Reduce_impl(1029)..........:
> MPIR_Reduce_intra(835)..........:
> MPIR_Reduce_binomial(206).......: Failure during collective
> srun: error: task 0: Exited with exit code 1
>
> I see this error with many of my MPI programs. A different application
> yields:
>
> PMPI_Bcast(1525)......: MPI_Bcast(buf=0x7fff545be5fc, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast_impl(1369).:
> MPIR_Bcast_intra(1160):
> MPIR_SMP_Bcast(1077)..: Failure during collective
>
> Can anyone point me in the right direction?
>
> Thanks,
> ~Mike C.

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji