[mpich-discuss] Issue with MPI_IALLTOALL on Cray XE6

Balaji, Pavan balaji at anl.gov
Fri Jun 13 05:43:56 CDT 2014


Hello,

While Cray MPI is derived from MPICH, it is closed source, so we don't know what modifications they have made relative to the base MPICH version.

I’m afraid you might have to ask the Cray folks about this.

  — Pavan

On Jun 12, 2014, at 1:47 PM, Rattermann, Dale <drattermann at drc.com> wrote:

> Hi,
> 
> I'm not sure if anyone can help me with this question, but I'll ask anyway. I'm working on optimizing a code that uses MPI on a Cray XE6. The code has a subroutine that calls MPI_ALLTOALL 16 different times. I changed that routine to use non-blocking MPI_IALLTOALL calls instead, so that computation could be overlapped with communication, but found that this actually slowed the code down considerably. To time the two scenarios I wrote a small Fortran test program (attached below), and I've tried it with both the cray-mpich/6.0.0 and cray-mpich/6.3.0 modules. Running it, I've noticed two things (a simplified sketch of the change follows the list):
> 
> 1) MPI_IALLTOALL plus MPI_WAIT runs slower than MPI_ALLTOALL on the Cray… even when computation and communication are overlapped with the non-blocking calls.
> 2) MPI_ALLTOALL runs very slowly the first time it is called (this can be seen by commenting out the warm-up MPI_ALLTOALL call on lines 26 and 27 of the attached code, just after MPI_COMM_SIZE, then recompiling and rerunning).
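>
> For reference, here is that simplified sketch of the change; it is taken from the attached program, with declarations omitted and only one of the two alltoalls shown:
>
>        call MPI_IALLTOALL(a,maxnum/num_procs,MPI_REAL8,atemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,rqst(1),ierr)
>        ! ... computation that does not touch a or atemp, intended
>        !     to overlap with the communication ...
>        call MPI_WAIT(rqst(1),MPI_STATUS_IGNORE,ierr)
>
> whereas the original version calls MPI_ALLTOALL and only then starts the computation.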
> 
> I'm using an interactive batch session on Garnet with 2 compute nodes (64 cores total). I've compiled and run the code using the following:
> 
> ftn mpi_test.f
> aprun -n 64 ./a.out 6400000
> 
> Here is an example of results I get while running with lines 26 and 27 uncommented:
> 
> nratter at batch04-wlm:~/temp/mpi_test> make clean; make; aprun -n 64 ./a.out 6400000
> rm a.out
> ftn mpi_test.f
>            0 : dt for MPI_ALLTOALL section =    0.9902300834655762
>            0 : dt for MPI_IALLTOALL section =     1.116391181945801
> 
> And here is an example of results I get while running with lines 26 and 27 commented out:
> 
> nratter at batch04-wlm:~/temp/mpi_test> make clean; make; aprun -n 64 ./a.out 6400000
> rm a.out
> ftn mpi_test.f
>            0 : dt for MPI_ALLTOALL section =     2.331636905670166
>            0 : dt for MPI_IALLTOALL section =     1.070204973220825
> 
> Does anyone know of a way both to improve the efficiency of MPI_IALLTOALL/MPI_WAIT and to keep the first call to MPI_ALLTOALL from taking so long? In terms of priority, I am much more interested in improving the non-blocking communication. I've tried many different combinations of environment variables with no success (one example is shown below). Thanks for your time… I'm hoping someone can help here.
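>
> The settings I understand are supposed to enable asynchronous progress (per the intro_mpi man page) are roughly the following, though I may have the details wrong:
>
> export MPICH_NEMESIS_ASYNC_PROGRESS=1
> export MPICH_MAX_THREAD_SAFETY=multiple
> aprun -n 64 -r 1 ./a.out 6400000
>
> where -r 1 is meant to reserve a core on each node for the progress threads.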
> 
> Thanks,
> Nick Rattermann
> 
> -------------------------
> 
>      program hello_world
>        implicit none
>        include 'mpif.h'
>        integer ierr, num_procs, my_id,rqst(2),cnt,i,N,maxnum
>        real*8,dimension(:),allocatable :: a,atemp,b,btemp,
>      &   c,ctemp,d,dtemp,e,f
>        character(len=100) :: arg
>        double precision :: t1,t2
> 
>       CALL GETARG(1,arg) !Grab the first command-line argument
>       ! (argument 0 is the program name) and store it in the
>       ! character variable 'arg'
> 
>       read(arg,*) maxnum !Now convert string to integer
> 
>        allocate(a(maxnum),atemp(maxnum),b(maxnum),btemp(maxnum),
>      &   c(maxnum),ctemp(maxnum),d(maxnum),dtemp(maxnum),
>      &   e(maxnum),f(maxnum))
> 
>        call MPI_INIT ( ierr )
> 
>      !  find out MY process ID, and how many processes were started.
>        call MPI_COMM_RANK (MPI_COMM_WORLD, my_id, ierr)
>        call MPI_COMM_SIZE (MPI_COMM_WORLD, num_procs, ierr)
> 
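>      ! Warm-up call (lines 26-27 referred to above): the very first
>      ! MPI_ALLTOALL is much slower, so absorb that cost here, outside
>      ! the timed sections.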
>        call MPI_ALLTOALL(c,maxnum/num_procs,MPI_REAL8,ctemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr)
> 
> !- MPI_ALLTOALL SECTION
> 
>        a = 1.
>        atemp = 0.
>        b = 2.
>        btemp = 0.
> 
>        call MPI_BARRIER(MPI_COMM_WORLD,ierr)
> 
>        t1 = MPI_WTIME()
> 
>        call MPI_ALLTOALL(a,maxnum/num_procs,MPI_REAL8,atemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr)
>        call MPI_ALLTOALL(b,maxnum/num_procs,MPI_REAL8,btemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,ierr)
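>        ! Independent work; with blocking alltoalls it can only start
>        ! after both calls above have completed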
>        do i=1,maxnum
>          e(i) = real(i)
>        end do
> 
>        cnt = 0
>        do i=1,maxnum
>          if (atemp(i) .ne. 1.) cnt = cnt + 1
>        end do
>        if (cnt .ne. 0) print *,my_id,": ATEMP NOT EQUAL TO 1!!!"
> 
>        cnt = 0
>        do i=1,maxnum
>          if (btemp(i) .ne. 2.) cnt = cnt + 1
>        end do
>        if (cnt .ne. 0) print *,my_id,": BTEMP NOT EQUAL TO 2!!!"
> 
>        t2 = MPI_WTIME()
> 
>        if (my_id == 0) print*,my_id,": dt for MPI_ALLTOALL section = ",
>      &   t2-t1
> 
> !- MPI_IALLTOALL SECTION
> 
>        c = 1.
>        ctemp = 0.
>        d = 2.
>        dtemp = 0.
> 
>        call MPI_BARRIER(MPI_COMM_WORLD,ierr)
> 
>        t1 = MPI_WTIME()
> 
>        call MPI_IALLTOALL(c,maxnum/num_procs,MPI_REAL8,ctemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,rqst(1),ierr)
>        call MPI_IALLTOALL(d,maxnum/num_procs,MPI_REAL8,dtemp,
>      &      maxnum/num_procs,MPI_REAL8,MPI_COMM_WORLD,rqst(2),ierr)
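>        ! The same independent work, intended to overlap with the two
>        ! outstanding alltoalls started above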
>        do i=1,maxnum
>          f(i) = real(i)
>        end do
> 
>        call MPI_WAIT(rqst(1),MPI_STATUS_IGNORE,ierr)
>        cnt = 0
>        do i=1,maxnum
>          if (ctemp(i) .ne. 1.) cnt = cnt + 1
>        end do
>        if (cnt .ne. 0) print *,my_id,": CTEMP NOT EQUAL TO 1!!!"
> 
>        call MPI_WAIT(rqst(2),MPI_STATUS_IGNORE,ierr)
>        cnt = 0
>        do i=1,maxnum
>          if (dtemp(i) .ne. 2.) cnt = cnt + 1
>        end do
>        if (cnt .ne. 0) print *,my_id,": DTEMP NOT EQUAL TO 2!!!"
> 
>        t2 = MPI_WTIME()
> 
>        if (my_id == 0) print*,my_id,": dt for MPI_IALLTOALL section = ",
>      &   t2-t1
> 
>        call MPI_FINALIZE ( ierr )
> 
>        deallocate(a,atemp,b,btemp,c,ctemp,d,dtemp,e,f)
> 
>      end
> 



