[mpich-discuss] MPI_REDUCE with MPI_IN_PLACE does not always work

Michael.Rachner at dlr.de
Fri Sep 13 03:31:35 CDT 2013


Dear MPI community,

I found a problem when calling MPI_REDUCE with the keyword MPI_IN_PLACE in my Fortran 95 code (compiled with the Intel Fortran 12 compiler).
Whether the problem occurs depends on the MPI implementation and the operating system.

This is my experience so far:
     MPICH2 v1.4.1p1                     on Win7 (64-bit) PC   :  it works
     Microsoft MPI (v. of 12/11/2012)    on Win7 (64-bit) PC   :  it fails (either the contribution of the root is silently taken as zero, i.e. a wrong result, or there is an access violation or floating exception on the root in MPI_REDUCE)
     Open MPI v1.6.2-2                   on Win7 (64-bit) PC   :  it fails (in the same manner as with MS-MPI)
     Open MPI v1.4.3 and v1.6.3          on 2 Linux clusters   :  it works
     Intel MPI v4.0.3 and v4.1.0         on 2 Linux clusters   :  it works

My question is:  Is the failure possibly caused by erroneous ('dangerous') Fortran coding that makes some MPI implementations fail and others not?
                 Or is the problem actually caused by a bug in the different MPI implementations?

This is my Fortran 95 code:

!
      subroutine mpiw_reduce_sumfast_real8( rbuffarr, nelem )
!
!===============================================================================
!
!     sbr mpiw_reduce_sumfast_real8  is a wrapper for the MPI-routine  MPI_REDUCE
!     applied for summing element-wise a real(REAL8) 1d-array  rbuffarr(nelem)
!     from all processes of communicator  commSPRAY
!     and store the sums on master in the same array  rbuffarr(nelem) ,
!     i.e. on the master we overwrite the original contribution of the master by:
!
!       for i=1..nelem:   rbuffarr(i) = SUM_over_iproc (rbuffarr(i) )  , with iproc=1,numprocs
!
!
!     mpiw_reduce_sumfast_real8 calls    :  MPI_REDUCE
!
!                                                       last update: 03.09.2013
!===============================================================================
!
      use MPIHEADER   , only:  MPI_SUM, MPI_IN_PLACE
      use NUMBER_MODEL, only:  INT4,REAL8
      use MPARAL      , only:  lmaster, commSPRAY, ierr_mpi, mpiusertype_REAL8
!
      implicit none
!
      integer (INT4) , intent(IN)                      ::  nelem
      real    (REAL8), intent(INOUT), dimension(nelem) ::  rbuffarr  ! input on master&slaves, result only on master
!
      real    (REAL8) ::  rdummyarr(1)

      if(lmaster) then
        call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, mpiusertype_REAL8, MPI_SUM &
                        ,0_INT4, commSPRAY, ierr_mpi )
      else  ! slaves
        call MPI_REDUCE( rbuffarr, rdummyarr, nelem, mpiusertype_REAL8, MPI_SUM &
                        ,0_INT4, commSPRAY, ierr_mpi )
      endif
!
      return
      end subroutine mpiw_reduce_sumfast_real8


Note that the problem does not depend on the chosen reduce operator (here MPI_SUM).
The problem does not occur when I call MPI_REDUCE without the MPI_IN_PLACE option (see the sketch below).
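
For reference, the working variant without MPI_IN_PLACE looks roughly as follows. This is only a sketch using the same module layout as my wrapper above; the temporary array rtmparr is introduced here purely for illustration and does not exist in my code:

      subroutine mpiw_reduce_sum_real8_noinplace( rbuffarr, nelem )
!     Sketch: every process (including the root) passes rbuffarr as sendbuf,
!     the root receives the element-wise sums into a temporary array and
!     copies them back into rbuffarr, so MPI_IN_PLACE is not needed.
      use MPIHEADER   , only:  MPI_SUM
      use NUMBER_MODEL, only:  INT4,REAL8
      use MPARAL      , only:  lmaster, commSPRAY, ierr_mpi, mpiusertype_REAL8
      implicit none
      integer (INT4) , intent(IN)                      ::  nelem
      real    (REAL8), intent(INOUT), dimension(nelem) ::  rbuffarr
      real    (REAL8), dimension(nelem)                ::  rtmparr   ! temporary receive buffer (illustrative only)

      call MPI_REDUCE( rbuffarr, rtmparr, nelem, mpiusertype_REAL8, MPI_SUM &
                      ,0_INT4, commSPRAY, ierr_mpi )
      if(lmaster) rbuffarr = rtmparr   ! result is only needed on the master
      return
      end subroutine mpiw_reduce_sum_real8_noinplace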

As a reference I cite here the MPI 2.2 standard (of Sept 4, 2009, p. 164):
  The "in place" option for intracommunicators is specified by passing the value
  MPI_IN_PLACE to the argument sendbuf at the root. In such a case, the input data is taken
  at the root from the receive buffer, where it will be replaced by the output data.

Does my code strictly conform to this?
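
For comparison, here is a minimal standalone sketch of the in-place pattern as I read the quoted text, taking MPI_IN_PLACE and the other constants directly from the standard mpi module (my wrapper above gets them via our MPIHEADER module). The program name and the values are purely illustrative:

      program test_reduce_inplace
!     The root passes MPI_IN_PLACE as sendbuf, so its own contribution is
!     taken from (and the result is returned in) the receive buffer res;
!     all other ranks pass a normal send buffer.
      use mpi
      implicit none
      integer          :: myrank, nprocs, ierr
      double precision :: val(3), res(3)

      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myrank, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )

      val = dble(myrank + 1)      ! each rank contributes rank+1 in every element

      if(myrank == 0) then
        res = val                 ! root's contribution must already sit in the receive buffer
        call MPI_REDUCE( MPI_IN_PLACE, res, 3, MPI_DOUBLE_PRECISION, MPI_SUM &
                        ,0, MPI_COMM_WORLD, ierr )
        print *, 'per-element sum =', res(1), '  expected =', nprocs*(nprocs+1)/2
      else
        call MPI_REDUCE( val, res, 3, MPI_DOUBLE_PRECISION, MPI_SUM &
                        ,0, MPI_COMM_WORLD, ierr )
      endif

      call MPI_FINALIZE( ierr )
      end program test_reduce_inplace

With this pattern the expected per-element result on rank 0 is simply the sum 1+2+...+nprocs.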

Greetings
Michael Rachner




