[mpich-discuss] CPU usage versus Nodes, Threads
Qiguo Jing
qjing at trinityconsultants.com
Thu Oct 23 11:05:34 CDT 2014
Hi Ruben,
You are right! My algorithm does what you described.
I will record the timestamp for each thread and every event. Thanks for your suggestions!
Qiguo
From: parasietje at gmail.com [mailto:parasietje at gmail.com] On Behalf Of Ruben Faelens
Sent: Thursday, October 23, 2014 10:58 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] CPU usage versus Nodes, Threads
Hi Qiguo,
You should collect performance statistics, especially on what each of your nodes is doing at every moment in time.
If I understand correctly, your algorithm does the following:
- Master thread: read in data, split it up into pieces, transfer pieces to slaves
- Slave thread: do calculation, transfer data back to master
- Master: recombine data, do calculation, split data back up, transfer pieces to slaves
- etc...
You may not be seeing linear performance scaling for any of the following reasons:
- The master thread's recombining and splitting of the data set may account for a large part of the work (and therefore be the bottleneck).
- Work is not divided equally. A significant part of the time is spent waiting on one slave node that has a harder problem (and takes longer) than the rest.
- There is a common dataset, and I/O takes a larger share of the time as more slave nodes are used.
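A rough way to see how the first two causes limit scaling is to model one round as the master's serial recombine/split time plus the time of the slowest slave. This is a hypothetical model: the function names and work units are made up for illustration, and it ignores communication and the shared-I/O effect.

```python
def round_time(master_serial, slave_times):
    """Wall-clock time of one round in a simple master/slave model.

    Assumes the master's recombine/split step is fully serial and that the
    next round cannot start until the slowest slave has finished.
    """
    return master_serial + max(slave_times)


def slave_utilization(master_serial, slave_times):
    """Fraction of the slaves' available time spent doing calculation."""
    busy = sum(slave_times)
    available = len(slave_times) * round_time(master_serial, slave_times)
    return busy / available


# Hypothetical numbers: 80 units of slave work, 10 units of serial master work.
print(round_time(10, [10] * 8), slave_utilization(10, [10] * 8))      # 8 slaves, even split
print(round_time(10, [5] * 16), slave_utilization(10, [5] * 16))      # 16 slaves, even split
uneven = [8] * 7 + [24]                                               # same 80 units, one slave overloaded
print(round_time(10, uneven), slave_utilization(10, uneven))
```

With an even split, doubling the slaves from 8 to 16 only shortens a round from 20 to 15 units, because the 10 serial units do not shrink. With the same 80 units split unevenly over 8 slaves, the one overloaded slave stretches the round to 34 units.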
The only way to know for sure is to generate a log file that records when each process starts and ends each processing step. Output the time when the slave:
- starts receiving data
- starts calculation
- starts sending results back
- starts waiting for its next piece of data
This will clearly show you what each node is doing at each moment in time, and should identify the bottleneck.
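A minimal sketch of such a log, assuming each process appends `(rank, phase, timestamp)` records that are merged afterwards. `PhaseLog` and `time_per_phase` are hypothetical helpers; `time.perf_counter` stands in for an MPI clock such as `MPI_Wtime`, and timestamps from different nodes are only comparable if their clocks are synchronized (MPI reports this through the `MPI_WTIME_IS_GLOBAL` attribute).

```python
import time
from collections import defaultdict


class PhaseLog:
    """Collects (rank, phase, start_time) records for one process."""

    def __init__(self, rank, clock=time.perf_counter):
        self.rank = rank
        self.clock = clock
        self.events = []

    def mark(self, phase):
        # Record the moment this process enters a new phase,
        # e.g. "recv", "calc", "send" or "wait" (hypothetical names).
        self.events.append((self.rank, phase, self.clock()))


def time_per_phase(events, end_time):
    """Total time each rank spent in each phase.

    A phase lasts from its start record until the next record on the same
    rank, or until end_time for the last record.
    """
    by_rank = defaultdict(list)
    for rank, phase, t in events:
        by_rank[rank].append((t, phase))
    totals = defaultdict(float)
    for rank, records in by_rank.items():
        records.sort()
        ends = [t for t, _ in records[1:]] + [end_time]
        for (start, phase), end in zip(records, ends):
            totals[(rank, phase)] += end - start
    return dict(totals)
```

A slave whose `"wait"` total dominates suggests the bottleneck is the master or another slave; a large `"recv"` total suggests I/O or communication.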
/ Ruben
On Thu, Oct 23, 2014 at 5:27 PM, Qiguo Jing <qjing at trinityconsultants.com> wrote:
Hi Bob,
Thanks for your suggestions. Here are the results of more tests. We actually have three clusters.
Clusters 1 and 2: 8 nodes; each node has 2 processors, 4 cores/processor, no HT (8 threads total per node).
Cluster 3: 8 nodes; each node has 1 processor, 4 cores/processor, HT (8 threads total per node).
We also have a standalone machine: 2 processors, 6 cores/processor, HT (24 threads total).
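For comparing the machines it can help to separate physical cores from hardware threads, since hyper-threaded threads share a core's execution units. A small sketch, assuming 2 hardware threads per core when HT is enabled; the helper name `core_counts` is hypothetical.

```python
def core_counts(processors, cores_per_processor, hyperthreading):
    """Return (physical cores, hardware threads) for one machine or node.

    Assumes 2-way hyper-threading, i.e. 2 hardware threads per core when enabled.
    """
    physical = processors * cores_per_processor
    threads = physical * (2 if hyperthreading else 1)
    return physical, threads


machines = {
    "Cluster 1/2 node": core_counts(2, 4, False),
    "Cluster 3 node": core_counts(1, 4, True),
    "Standalone": core_counts(2, 6, True),
}
for name, (physical, threads) in machines.items():
    print(f"{name}: {physical} physical cores, {threads} hardware threads")
```

By this count, a Cluster 3 node runs its 8 threads on 4 physical cores, half the physical cores of a Cluster 1/2 node.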
For one particular case:
- Clusters 1 and 2 take 48 min to finish with 8 nodes (8 threads/node, 60% CPU usage) and 53 min with 3 nodes (8 threads/node, 90% CPU usage).
- Cluster 3 takes 227 min to finish with 8 nodes (8 threads/node, 20% CPU usage) and 207 min with 3 nodes (8 threads/node, 50% CPU usage).
- The standalone machine takes 82 min to finish with 24 threads (100% CPU usage).
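The timings above can be turned into speedup and parallel efficiency relative to the 3-node runs. This sketch treats nodes as equal units of resource (it ignores HT and the hardware differences between clusters); the helper name `scaling` is hypothetical.

```python
def scaling(t_base, n_base, t_new, n_new):
    """Speedup and parallel efficiency of a run on n_new nodes vs n_base nodes."""
    speedup = t_base / t_new
    efficiency = speedup / (n_new / n_base)
    return speedup, efficiency


# Wall-clock minutes reported above (8 threads/node in every run).
runs = {"Clusters 1 and 2": (53, 48), "Cluster 3": (207, 227)}
for name, (t3, t8) in runs.items():
    s, e = scaling(t3, 3, t8, 8)
    print(f"{name}: speedup {s:.2f}, efficiency {e:.0%} going from 3 to 8 nodes")
```

Going from 3 to 8 nodes gives about a 1.10 times speedup (roughly 41% efficiency) on Clusters 1 and 2 and a slowdown on Cluster 3, which would be consistent with a serial, imbalance, or communication bottleneck rather than a lack of CPU capacity.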
It looks like with 24 threads they should be fairly busy, shouldn't they? Could the behavior above be more of a hardware issue than a software one?
Qiguo
From: Bob Ilgner [mailto:bobilgner at gmail.com]
Sent: Thursday, October 23, 2014 1:11 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] CPU usage versus Nodes, Threads
Hi Qiguo,