<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="BodyFragment"><font size="2"><span style="font-size:10pt;">
<div class="PlainText">Hi all,<br>
<br>
I noticed unexpectedly poor performance of the MPI_Waitany() routine (Mac OS X 10.9.1, MPICH v3.0.4 via MacPorts).<br>
<br>
I saw that “wbland” had added relevant information to the Trac system:<br>
<br>
<a href="http://trac.mpich.org/projects/mpich/ticket/1988">http://trac.mpich.org/projects/mpich/ticket/1988</a><br>
<br>
… so I downloaded his example code, modified it a little, and wrote a wrapper script to time the different routines (attached to this email; apologies in advance for any dumb contents).
<br>
<br>
Results:<br>
<br>
./time_routines.sh 4 50<br>
<br>
nprocs = 4, ntokens = 16, ncycles = 50<br>
Method : Time Relative<br>
MPI_Waitall : 1.358000e-03 1.000x<br>
MPI_Waitany : 1.491000e-03 1.098x<br>
MPI_Waitsome : 3.243000e-03 2.388x<br>
PMPI_Waitall : 9.860000e-04 0.726x<br>
PMPI_Waitany : 1.421000e-03 1.046x<br>
PMPI_Waitsome : 4.432000e-03 3.264x<br>
<br>
<br>
nprocs = 4, ntokens = 64, ncycles = 50<br>
Method : Time Relative<br>
MPI_Waitall : 2.075000e-03 1.000x<br>
MPI_Waitany : 5.746000e-03 2.769x<br>
MPI_Waitsome : 1.314400e-02 6.334x<br>
PMPI_Waitall : 3.142000e-03 1.514x<br>
PMPI_Waitany : 5.450000e-03 2.627x<br>
PMPI_Waitsome : 1.891500e-02 9.116x<br>
<br>
<br>
nprocs = 4, ntokens = 128, ncycles = 50<br>
Method : Time Relative<br>
MPI_Waitall : 5.159000e-03 1.000x<br>
MPI_Waitany : 1.615100e-02 3.131x<br>
MPI_Waitsome : 5.004100e-02 9.700x<br>
PMPI_Waitall : 3.480000e-03 0.675x<br>
PMPI_Waitany : 2.564000e-02 4.970x<br>
PMPI_Waitsome : 3.799700e-02 7.365x<br>
<br>
<br>
nprocs = 4, ntokens = 512, ncycles = 50<br>
Method : Time Relative<br>
MPI_Waitall : 1.949800e-02 1.000x<br>
MPI_Waitany : 2.431020e-01 12.468x<br>
MPI_Waitsome : 3.643640e-01 18.687x<br>
PMPI_Waitall : 1.869800e-02 0.959x<br>
PMPI_Waitany : 2.491870e-01 12.780x<br>
PMPI_Waitsome : 3.500600e-01 17.954x<br>
<br>
<br>
nprocs = 4, ntokens = 1024, ncycles = 50<br>
Method : Time Relative<br>
MPI_Waitall : 2.749100e-02 1.000x<br>
MPI_Waitany : 1.223122e+00 44.492x<br>
MPI_Waitsome : 1.554282e+00 56.538x<br>
PMPI_Waitall : 3.329800e-02 1.211x<br>
PMPI_Waitany : 1.232125e+00 44.819x<br>
PMPI_Waitsome : 1.531198e+00 55.698x<br>
<br>
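… and so it seems the performance gap between the different approaches (Waitall / Waitany / Waitsome) grows with the buffer size: at ntokens = 16, MPI_Waitany takes about 1.1x as long as MPI_Waitall, but at ntokens = 1024 it takes about 44x as long.<br>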
<br>
This is a bit of a problem for me, as I make heavy use of Waitany() to overlap communication with calculations. Is there any way to avoid this behavior?<br>
<br>
Cheers,</div>
</span></font></div>
<div class="BodyFragment"><font size="2"><span style="font-size:10pt;">
<div class="PlainText"><br>
<br>
J.</div>
</span></font></div>
</body>
</html>