[mpich-discuss] Issue with MPI_IALLTOALL on Cray XE6

Rattermann . Dale drattermann at drc.com
Thu Jun 12 13:47:01 CDT 2014


Hi,

I'm not sure if anyone can help me with this question, but I'll ask anyway. I'm working on optimizing a code that uses MPI on a Cray XE6. The code has a subroutine that calls MPI_ALLTOALL 16 different times. I changed it to use non-blocking MPI_IALLTOALLs instead, so that computation could be overlapped with communication, but found that this actually slowed the code down considerably. I wrote a little test program in Fortran (attached below) to time the two scenarios. I've tried both the cray-mpich/6.0.0 and cray-mpich/6.3.0 modules. Running this code, I've noticed two things:
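
For context, the change in the real subroutine looks roughly like the sketch below (the buffer names, the count cnt, the request array reqs, and do_independent_work are placeholders for what the actual code does, not the real names):

!     Before: 16 blocking calls, one after another
!       call MPI_ALLTOALL(sbuf1,cnt,MPI_REAL8,rbuf1,cnt,MPI_REAL8,
!    >       MPI_COMM_WORLD,ierr)
!       ... repeated for the other 15 buffers
!     After: start all 16, do independent work, then wait on all
!     (reqs declared as: integer reqs(16))
      call MPI_IALLTOALL(sbuf1,cnt,MPI_REAL8,rbuf1,cnt,MPI_REAL8,
     >     MPI_COMM_WORLD,reqs(1),ierr)
!     ... 15 more MPI_IALLTOALL calls filling reqs(2) through reqs(16)
      call do_independent_work() ! work not touching the buffers
      call MPI_WAITALL(16,reqs,MPI_STATUSES_IGNORE,ierr)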

1) MPI_IALLTOALL plus MPI_WAIT runs slower than MPI_ALLTOALL on the Cray, even when computation is overlapped with the non-blocking communication (see the MPI_TEST sketch just after this list)
2) MPI_ALLTOALL runs very slowly the first time it is called (this can be seen by commenting out the warm-up MPI_ALLTOALL call on lines 26 and 27 of the attached code and recompiling and running)
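
Regarding observation 1: since the overlap may depend on MPI only making progress from inside MPI calls, one variant of the timed compute loop in the attached program would poll the outstanding requests with MPI_TEST. This is only a sketch, not something reflected in the timings below (flag is an extra LOGICAL variable; rqst, f, i, and maxnum are the same variables as in the attached program):

!     Sketch: poll the requests while computing, to drive MPI progress
      do i=1,maxnum
        f(i) = real(i)
        if (mod(i,100000) .eq. 0) then
          call MPI_TEST(rqst(1),flag,MPI_STATUS_IGNORE,ierr)
          call MPI_TEST(rqst(2),flag,MPI_STATUS_IGNORE,ierr)
        end if
      end do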

I'm using an interactive batch session on Garnet with 2 compute nodes (64 cores total). I've compiled and run the code using the following:

ftn mpi_test.f
aprun -n 64 ./a.out 6400000

Here is an example of results I get while running with lines 26 and 27 uncommented:

nratter at batch04-wlm:~/temp/mpi_test> make clean; make; aprun -n 64 ./a.out 6400000
rm a.out
ftn mpi_test.f
            0 : dt for MPI_ALLTOALL section =    0.9902300834655762
            0 : dt for MPI_IALLTOALL section =     1.116391181945801

And here is an example of results I get while running with lines 26 and 27 commented out:

nratter at batch04-wlm:~/temp/mpi_test> make clean; make; aprun -n 64 ./a.out 6400000
rm a.out
ftn mpi_test.f
            0 : dt for MPI_ALLTOALL section =     2.331636905670166
            0 : dt for MPI_IALLTOALL section =     1.070204973220825

Does anyone know of a way to improve the efficiency of MPI_IALLTOALL/MPI_WAIT and also to keep the first call to MPI_ALLTOALL from taking so long? In terms of priority, I am much more interested in improving the non-blocking communications. I've tried many different combinations of environment variables with no success. Thanks for your time; I'm hoping someone can help here.

Thanks,
Nick Rattermann

-------------------------

      program hello_world
        implicit none
        include 'mpif.h'
        integer ierr, num_procs, my_id,rqst(2),cnt,i,N,maxnum
        real*8,dimension(:),allocatable :: a,atemp,b,btemp,
     >         c,ctemp,d,dtemp,e,f
        character(len=100) :: arg
        double precision :: t1,t2

       CALL GETARG(1,arg) !Grab the first command-line argument
       ! (the array size) and store it in the
       ! temporary variable 'arg'

       read(arg,*) maxnum !Now convert string to integer

        allocate(a(maxnum),atemp(maxnum),b(maxnum),btemp(maxnum),
     >    c(maxnum),ctemp(maxnum),d(maxnum),dtemp(maxnum),
     >    e(maxnum),f(maxnum))

        call MPI_INIT ( ierr )

      !  find out MY process ID, and how many processes were started.
        call MPI_COMM_RANK (MPI_COMM_WORLD, my_id, ierr)
        call MPI_COMM_SIZE (MPI_COMM_WORLD, num_procs, ierr)

        call MPI_ALLTOALL(c,maxnum/num_procs,MPI_REAL8,ctemp,
     >       maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr) ! warm-up

!- MPI_ALLTOALL SECTION

        a = 1.
        atemp = 0.
        b = 2.
        btemp = 0.

        call MPI_BARRIER(MPI_COMM_WORLD,ierr)

        t1 = MPI_WTIME()

        call MPI_ALLTOALL(a,maxnum/num_procs,MPI_REAL8,atemp,
     >       maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr)
        call MPI_ALLTOALL(b,maxnum/num_procs,MPI_REAL8,btemp,
     >       maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr)
        do i=1,maxnum
          e(i) = real(i)
        end do

        cnt = 0
        do i=1,maxnum
          if (atemp(i) .ne. 1.) cnt = cnt + 1
        end do
        if (cnt .ne. 0) print *,my_id,": ATEMP NOT EQUAL TO 1!!!"

        cnt = 0
        do i=1,maxnum
          if (btemp(i) .ne. 2.) cnt = cnt + 1
        end do
        if (cnt .ne. 0) print *,my_id,": BTEMP NOT EQUAL TO 2!!!"

        t2 = MPI_WTIME()

        if (my_id == 0) print*,my_id,": dt for MPI_ALLTOALL section = ",
     >    t2-t1

!- MPI_IALLTOALL SECTION

        c = 1.
        ctemp = 0.
        d = 2.
        dtemp = 0.

        call MPI_BARRIER(MPI_COMM_WORLD,ierr)

        t1 = MPI_WTIME()

        call MPI_IALLTOALL(c,maxnum/num_procs,MPI_REAL8,ctemp,
     >       maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,rqst(1),ierr)
        call MPI_IALLTOALL(d,maxnum/num_procs,MPI_REAL8,dtemp,
     >       maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,rqst(2),ierr)
        do i=1,maxnum
          f(i) = real(i)
        end do

        call MPI_WAIT(rqst(1),MPI_STATUS_IGNORE,ierr)
        cnt = 0
        do i=1,maxnum
          if (ctemp(i) .ne. 1.) cnt = cnt + 1
        end do
        if (cnt .ne. 0) print *,my_id,": ATEMP NOT EQUAL TO 1!!!"

        call MPI_WAIT(rqst(2),MPI_STATUS_IGNORE,ierr)
        cnt = 0
        do i=1,maxnum
          if (dtemp(i) .ne. 2.) cnt = cnt + 1
        end do
        if (cnt .ne. 0) print *,my_id,": BTEMP NOT EQUAL TO 2!!!"

        t2 = MPI_WTIME()

        if (my_id == 0) print*,my_id,": dt for MPI_IALLTOALL section = "
     >    ,t2-t1

        call MPI_FINALIZE ( ierr )

        deallocate(a,atemp,b,btemp,c,ctemp,d,dtemp,e,f)

      end
