[mpich-discuss] low performance in an asynchronous, mixed MPI/pthreads app

Geoffrey Irving irving at naml.us
Tue Jan 8 01:09:10 CST 2013


On Mon, Jan 7, 2013 at 11:39 AM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Hi Geoff,
>
> Is it possible for you to try this with fewer worker threads per node (leaving some idle cores on each node)?

Yes. This should have been an easy thing to try, but the run crashed
after passing a null pointer to MPI.  That may indicate that I ran out
of memory and an unchecked malloc failed.  I'll track that down and
retry.
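
A minimal sketch of the kind of check that would catch this before the
pointer reaches MPI, assuming plain C and malloc.  The helper name
checked_alloc and its "what" label are hypothetical, made up for
illustration; they are not taken from the application.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the application): allocate room for
 * `count` items of `size` bytes each.  Returns NULL and prints a message
 * to stderr on failure, so the caller can stop before handing the
 * pointer to an MPI call. */
static void *checked_alloc(size_t count, size_t size, const char *what)
{
    assert(what != NULL);

    /* Reject requests whose total byte count would overflow size_t. */
    if (size != 0 && count > (size_t)-1 / size) {
        fprintf(stderr, "allocating %s: %zu x %zu bytes overflows size_t\n",
                what, count, size);
        return NULL;
    }

    /* malloc(0) may legally return NULL, so ask for at least one byte;
     * that way NULL always means failure. */
    size_t bytes = count * size;
    if (bytes == 0)
        bytes = 1;

    void *p = malloc(bytes);
    if (p == NULL)
        fprintf(stderr, "allocating %s: malloc(%zu) failed\n", what, bytes);
    return p;
}
```

A caller would test the result against NULL and bail out (for example
with MPI_Abort) instead of posting a send or receive on the buffer.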

> What's the scale of the x-axis? (about how much time is one tick?)

The scale is shown at the top, next to the ticks.  One tick is
1e-2 s = 0.01 s in both figures.

> Do the worker threads receive messages, or does only the comm thread receive all messages?

The comm thread receives all messages.

> How many isends are performed by the worker in the "wakeup" section?

Just one 8-byte Isend.

> Aside from the isends in "wakeup," are there any other MPI calls made by the worker threads?

No.

> How many Isends are performed by the comm thread in "response_send," and how many in "output_send?"

It's usually one per response_send region, but can be more if two
happen to occur back to back (they're separate blocks, but you can't
tell in the screenshot without selecting one of them).  In output_send
several occur at once (usually 10 or 20 in the figure, I think).  The
scatter pattern in the lines towards the right of the following shows
one example.

> Does "wait" do any MPI other than MPI_Waitsome?

No, just MPI_Waitsome.

> Does the worker do any MPI calls other than isends in "response_send" and "output_send," and waitsome in "wait?"

No for output_send and wait, but yes for response_send: after the data
Isend, I repost a wildcard Irecv for a small 8-byte message.

Thanks for all the questions!
Geoffrey


