[mpich-discuss] Poor performance of Waitany / Waitsome

Wed Jan 15 12:17:00 CST 2014

Hi all,

I have run into unexpectedly poor performance from MPI_Waitany() (Mac OS X 10.9.1, MPICH v3.0.4 installed via MacPorts).

I found that “wbland” had already added relevant information in this Trac ticket:

http://trac.mpich.org/projects/mpich/ticket/1988

… so I downloaded his example code, modified it slightly, and wrote a wrapper script to compare the different wait routines (both files are attached; apologies in advance for anything dumb in them).

Results:

./time_routines.sh 4 50

nprocs = 4, ntokens = 16, ncycles = 50
Method          : Time         Relative
    MPI_Waitall : 1.358000e-03    1.000x
    MPI_Waitany : 1.491000e-03    1.098x
   MPI_Waitsome : 3.243000e-03    2.388x
   PMPI_Waitall : 9.860000e-04    0.726x
   PMPI_Waitany : 1.421000e-03    1.046x
  PMPI_Waitsome : 4.432000e-03    3.264x

nprocs = 4, ntokens = 64, ncycles = 50
Method          : Time         Relative
    MPI_Waitall : 2.075000e-03    1.000x
    MPI_Waitany : 5.746000e-03    2.769x
   MPI_Waitsome : 1.314400e-02    6.334x
   PMPI_Waitall : 3.142000e-03    1.514x
   PMPI_Waitany : 5.450000e-03    2.627x
  PMPI_Waitsome : 1.891500e-02    9.116x

nprocs = 4, ntokens = 128, ncycles = 50
Method          : Time         Relative
    MPI_Waitall : 5.159000e-03    1.000x
    MPI_Waitany : 1.615100e-02    3.131x
   MPI_Waitsome : 5.004100e-02    9.700x
   PMPI_Waitall : 3.480000e-03    0.675x
   PMPI_Waitany : 2.564000e-02    4.970x
  PMPI_Waitsome : 3.799700e-02    7.365x

nprocs = 4, ntokens = 512, ncycles = 50
Method          : Time         Relative
    MPI_Waitall : 1.949800e-02    1.000x
    MPI_Waitany : 2.431020e-01   12.468x
   MPI_Waitsome : 3.643640e-01   18.687x
   PMPI_Waitall : 1.869800e-02    0.959x
   PMPI_Waitany : 2.491870e-01   12.780x
  PMPI_Waitsome : 3.500600e-01   17.954x

nprocs = 4, ntokens = 1024, ncycles = 50
Method          : Time         Relative
    MPI_Waitall : 2.749100e-02    1.000x
    MPI_Waitany : 1.223122e+00   44.492x
   MPI_Waitsome : 1.554282e+00   56.538x
   PMPI_Waitall : 3.329800e-02    1.211x
   PMPI_Waitany : 1.232125e+00   44.819x
  PMPI_Waitsome : 1.531198e+00   55.698x

… so it seems the performance gap between the different approaches (Waitall / Waitany / Waitsome) grows with ntokens: at ntokens = 16 Waitany is within about 10% of Waitall, but at ntokens = 1024 it is roughly 44x slower.

This is a bit of a problem for me, as I make heavy use of Waitany() to overlap communication with calculations. Is there any way to avoid this behavior?

Cheers,

J.

Attachments (scrubbed by the list archive):
nb_ring.c (6542 bytes): http://lists.mpich.org/pipermail/discuss/attachments/20140115/e8021eba/attachment.obj
time_routines.sh (840 bytes): http://lists.mpich.org/pipermail/discuss/attachments/20140115/e8021eba/attachment-0001.obj