[mpich-discuss] Poor performance of Waitany / Waitsome
John Grime
jgrime at uchicago.edu
Wed Jan 15 12:17:00 CST 2014
Hi all,
I'm seeing unexpectedly poor performance from the MPI_Waitany() routine (Mac OS X 10.9.1, MPICH v3.0.4 via MacPorts).
I see that “wbland” has already added relevant information to the “trac” system:
http://trac.mpich.org/projects/mpich/ticket/1988
… so I downloaded his example code, modified it a little and wrote a wrapper script to examine the different routines (attached to this email, apologies in advance for any dumb contents).
Results:
./time_routines.sh 4 50

nprocs = 4, ntokens = 16, ncycles = 50
Method        : Time          Relative
MPI_Waitall   : 1.358000e-03    1.000x
MPI_Waitany   : 1.491000e-03    1.098x
MPI_Waitsome  : 3.243000e-03    2.388x
PMPI_Waitall  : 9.860000e-04    0.726x
PMPI_Waitany  : 1.421000e-03    1.046x
PMPI_Waitsome : 4.432000e-03    3.264x

nprocs = 4, ntokens = 64, ncycles = 50
Method        : Time          Relative
MPI_Waitall   : 2.075000e-03    1.000x
MPI_Waitany   : 5.746000e-03    2.769x
MPI_Waitsome  : 1.314400e-02    6.334x
PMPI_Waitall  : 3.142000e-03    1.514x
PMPI_Waitany  : 5.450000e-03    2.627x
PMPI_Waitsome : 1.891500e-02    9.116x

nprocs = 4, ntokens = 128, ncycles = 50
Method        : Time          Relative
MPI_Waitall   : 5.159000e-03    1.000x
MPI_Waitany   : 1.615100e-02    3.131x
MPI_Waitsome  : 5.004100e-02    9.700x
PMPI_Waitall  : 3.480000e-03    0.675x
PMPI_Waitany  : 2.564000e-02    4.970x
PMPI_Waitsome : 3.799700e-02    7.365x

nprocs = 4, ntokens = 512, ncycles = 50
Method        : Time          Relative
MPI_Waitall   : 1.949800e-02    1.000x
MPI_Waitany   : 2.431020e-01   12.468x
MPI_Waitsome  : 3.643640e-01   18.687x
PMPI_Waitall  : 1.869800e-02    0.959x
PMPI_Waitany  : 2.491870e-01   12.780x
PMPI_Waitsome : 3.500600e-01   17.954x

nprocs = 4, ntokens = 1024, ncycles = 50
Method        : Time          Relative
MPI_Waitall   : 2.749100e-02    1.000x
MPI_Waitany   : 1.223122e+00   44.492x
MPI_Waitsome  : 1.554282e+00   56.538x
PMPI_Waitall  : 3.329800e-02    1.211x
PMPI_Waitany  : 1.232125e+00   44.819x
PMPI_Waitsome : 1.531198e+00   55.698x

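… and so it seems the performance gap between Waitall and Waitany / Waitsome widens as the buffer size (ntokens) increases: MPI_Waitany goes from about 1.1x the Waitall time at 16 tokens to about 44x at 1024 tokens, and MPI_Waitsome from about 2.4x to about 57x.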
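
For reference, the "Relative" column appears to be each routine's time divided by the MPI_Waitall time for the same ntokens. The sketch below recomputes that ratio for MPI_Waitany from the timings posted above. The names `timing_row`, `posted_rows`, `relative_to_waitall` and `waitany_slowdown` are made up for this illustration and are not part of nb_ring.c or time_routines.sh.

```c
#include <stddef.h>

/* MPI_Waitall / MPI_Waitany timings copied from the results above. */
struct timing_row {
    int    ntokens;
    double t_waitall;
    double t_waitany;
};

static const struct timing_row posted_rows[] = {
    {   16, 1.358000e-03, 1.491000e-03 },
    {   64, 2.075000e-03, 5.746000e-03 },
    {  128, 5.159000e-03, 1.615100e-02 },
    {  512, 1.949800e-02, 2.431020e-01 },
    { 1024, 2.749100e-02, 1.223122e+00 },
};

static const size_t posted_row_count =
    sizeof posted_rows / sizeof posted_rows[0];

/* Assumed definition of the "Relative" column: a routine's time
 * divided by the MPI_Waitall time measured for the same ntokens. */
double relative_to_waitall(double t_routine, double t_waitall)
{
    return t_routine / t_waitall;
}

/* Waitany slowdown for row i of posted_rows.
 * Returns -1.0 if i is out of range. */
double waitany_slowdown(size_t i)
{
    if (i >= posted_row_count) {
        return -1.0;
    }
    return relative_to_waitall(posted_rows[i].t_waitany,
                               posted_rows[i].t_waitall);
}
```

Calling `waitany_slowdown(0)` through `waitany_slowdown(4)` walks the five ntokens settings in order, and the values match the MPI_Waitany entries in the Relative column to rounding.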
This is a bit of a problem for me, as I make heavy use of Waitany() to overlap communication with calculations. Is there any way to avoid this behavior?
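
To illustrate why this usage pattern could be sensitive to the number of requests, here is a toy cost model in plain C. It does not call MPI and makes no claim about how MPICH actually implements these routines. It only counts slot inspections under the assumption that draining n requests with Waitany takes n calls, each scanning the request array from the front until it finds a completed entry, whereas a single Waitall pass looks at each entry once. The function names and the worst-case completion order are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: states of a slot in a hypothetical request array. */
enum slot_state { SLOT_PENDING, SLOT_DONE, SLOT_NULL };

/* One Waitall-style pass: each of the n slots is inspected once. */
size_t model_waitall_inspections(size_t n)
{
    return n;
}

/* n Waitany-style calls over the same n-slot array.
 * Assumed worst case for a front-to-back scan: requests complete
 * in reverse index order, so every call walks past all the
 * still-pending slots before it reaches the completed one. */
size_t model_waitany_inspections(size_t n)
{
    enum slot_state *slots = malloc(n * sizeof *slots);
    size_t inspected = 0;

    if (slots == NULL) {
        return 0;
    }
    for (size_t i = 0; i < n; ++i) {
        slots[i] = SLOT_PENDING;
    }
    for (size_t call = 0; call < n; ++call) {
        /* The next completion arrives at the highest pending index. */
        slots[n - 1 - call] = SLOT_DONE;

        /* One modelled Waitany call: scan until a done slot is found,
         * then retire it (as Waitany does with MPI_REQUEST_NULL). */
        for (size_t i = 0; i < n; ++i) {
            ++inspected;
            if (slots[i] == SLOT_DONE) {
                slots[i] = SLOT_NULL;
                break;
            }
        }
    }
    free(slots);
    return inspected;
}
```

Under these assumptions the Waitany-style loop inspects n(n+1)/2 slots against n for the Waitall-style pass, so the ratio grows roughly linearly with n. Whether that is what drives the timings above is not something this model can show.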
Cheers,
J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nb_ring.c
Type: application/octet-stream
Size: 6542 bytes
Desc: nb_ring.c
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140115/e8021eba/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time_routines.sh
Type: application/octet-stream
Size: 840 bytes
Desc: time_routines.sh
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140115/e8021eba/attachment-0001.obj>