[mpich-discuss] CPU usage versus Nodes, Threads

Thu Oct 23 11:05:34 CDT 2014

Hi Ruben,

You are right! My algorithm does what you described.

I will record the timestamp for each thread and every event.  Thanks for your suggestions!

Qiguo

From: parasietje at gmail.com [mailto:parasietje at gmail.com] On Behalf Of Ruben Faelens
Sent: Thursday, October 23, 2014 10:58 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] CPU usage versus Nodes, Threads

Hi Qiguo,

You should try to collect performance statistics, especially regarding your specific nodes and what they are doing at every moment in time.
If I understand correctly, your algorithm does the following:
- Master thread: read in data, split it up into pieces, transfer pieces to slaves
- Slave thread: do calculation, transfer data back to master
- Master: recombine data, do calculation, split data back up, transfer pieces to slaves
- etc...

The reason you do not see linear performance scaling could be due to the following:
- The master thread recombining and splitting the data set may be responsible for a large part of the work (and therefore is the bottleneck)
- Work is not divided equally. A significant part of the time is spent waiting on one slave node that has a harder problem (and so takes longer) than the rest.
- All slaves read a common dataset, so I/O takes a larger share of the time as more slave nodes are used.

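To get a feel for the first point, here is a rough sketch using Amdahl's law. It assumes that some fixed fraction of the run time is serial work in the master (recombining and splitting the data) and that the rest scales perfectly. The 20% figure and the function name are made up for illustration, not measured from the actual program.

```python
def amdahl_speedup(serial_fraction, workers):
    """Ideal speedup when serial_fraction of the run time cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Hypothetical assumption: 20% of the run time is master-side recombine/split work.
for workers in (24, 64):
    print(workers, "threads ->", round(amdahl_speedup(0.2, workers), 2), "x")
```

Under that assumption, going from 24 to 64 threads barely changes the speedup, because the master's serial share dominates.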
The only way to know for sure is to generate a log file that shows when every process starts and ends each processing step. Output the time when each slave
- starts receiving data
- starts calculation
- starts sending results back
- starts waiting for its next piece of data
This will clearly show you what each node is doing at each moment in time, and should identify the bottleneck.
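As a minimal sketch of such a log, the snippet below records one (rank, phase, timestamp) entry per event and then adds up how long each rank spent in each phase. The class name `PhaseLog` and the phase labels are hypothetical. In a real MPI program each rank would typically take its timestamps with `MPI_Wtime()` and write its own log file, and clocks on different nodes are not necessarily synchronized, so compare durations within a rank rather than absolute times across ranks.

```python
import time
from collections import defaultdict


class PhaseLog:
    """Collects (rank, phase, timestamp) events; hypothetical helper for illustration."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.events = []

    def mark(self, rank, phase):
        """Record that `rank` has just started `phase`."""
        self.events.append((rank, phase, self.clock()))

    def durations(self):
        """Total time per (rank, phase); a phase lasts until the same rank's next mark.

        Assumes events for each rank were recorded in time order.
        """
        totals = defaultdict(float)
        last = {}
        for rank, phase, t in self.events:
            if rank in last:
                prev_phase, prev_t = last[rank]
                totals[(rank, prev_phase)] += t - prev_t
            last[rank] = (phase, t)
        return dict(totals)


log = PhaseLog()
for phase in ("recv", "calc", "send", "wait"):
    log.mark(1, phase)  # in a real run, call this at the start of each step
print(log.durations())
```

A rank whose "wait" total is large is starved by the master; a rank whose "calc" total is much larger than the others has the harder piece of work.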

/ Ruben

On Thu, Oct 23, 2014 at 5:27 PM, Qiguo Jing <qjing at trinityconsultants.com> wrote:
Hi Bob,

Thanks for your suggestions. Here are more tests. We actually have three clusters.

Cluster 1 and 2: 8 nodes, each with 2 processors, 4 cores/processor, no HT (8 threads per node)
Cluster 3: 8 nodes, each with 1 processor, 4 cores/processor, HT (8 threads per node)

We also have a standalone machine:  2 processors, 6 cores/processor, HT – total 24 threads.

For one particular case:

Cluster 1 and 2 take 48 min to finish with 8 nodes, 8 threads/node, 60% CPU usage; 53 min to finish with 3 nodes, 8 threads/node, 90% CPU usage.

Cluster 3 takes 227 min to finish with 8 nodes, 8 threads/node, 20% CPU usage; 207 min to finish with 3 nodes, 8 threads/node, 50% CPU usage.

Standalone machine takes 82 min to finish with 24 threads, 100% CPU usage.

It looks like with 24 threads the nodes should be pretty busy? Could the above be more of a hardware issue than a software one?
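To put the numbers above in terms of scaling, here is a small sketch that computes the speedup from 3 to 8 nodes and compares it with the ideal 8/3. The function name `scaling` is just for illustration; the timings are the ones listed above.

```python
def scaling(t_small, n_small, t_large, n_large):
    """Return (speedup, fraction of ideal speedup) when growing from n_small to n_large nodes."""
    speedup = t_small / t_large
    ideal = n_large / n_small
    return speedup, speedup / ideal


# (minutes on 3 nodes, 3, minutes on 8 nodes, 8), taken from the runs above
runs = {
    "Cluster 1 and 2": (53, 3, 48, 8),
    "Cluster 3": (207, 3, 227, 8),
}
for name, args in runs.items():
    speedup, fraction = scaling(*args)
    print(f"{name}: speedup {speedup:.2f}x, {fraction:.0%} of ideal")
```

Going from 3 to 8 nodes gives only about 1.10x on Clusters 1 and 2, and Cluster 3 actually gets slower (about 0.91x).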

Qiguo

From: Bob Ilgner [mailto:bobilgner at gmail.com]
Sent: Thursday, October 23, 2014 1:11 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] CPU usage versus Nodes, Threads

Hi Qiguo,