[mpich-discuss] mpi_gather slow on a single node

Keith.Bannister at csiro.au Keith.Bannister at csiro.au
Thu Dec 8 16:38:52 CST 2016


Hi Halim,

Thanks for getting back to me.

On 6 Dec 2016, at 3:48 am, Halim Amer <aamer at anl.gov> wrote:
> > 2) The latency changes by > 10x over 100 iterations. Is that normal?
> 
> What is the baseline you are comparing against? Do you mean memory latency? If yes, how do you measure it and from where do you fetch the data? 

I mean the (max - min)/min latency for a call to MPI_Gather() as reported by the osu_gather [1] benchmark, for a 16 MiB message size with 12 ranks running on the same node:

mpirun -n 12 ./osu_gather -m 33554432 -f -M 1073741842
# Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
...
16777216            19515.15           3522.29          49930.88         100
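(For concreteness, the jitter figure I quote below is just (max - min)/min computed from that 16 MiB row:)

```python
# Latencies from the 16 MiB row of the osu_gather output above (microseconds)
min_us = 3522.29
max_us = 49930.88

jitter = (max_us - min_us) / min_us
print(f"jitter ratio: {jitter:.1f}x")  # ~13.2x
```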

I haven’t looked at the osu_gather source; I imagine it initialises some arrays and just calls MPI_Gather in a loop.

I don’t know how to find out whether MPICH is using shared memory or going over a network interface. In either case, the jitter in latency (~13x) is much larger on this MPICH setup than I’ve seen on another machine (where it’s a few percent).

Moreover, the latency should be much smaller if it is using shared memory. If I understand it right, the average throughput = message_size / avg_latency ≈ 6.9 Gbit/s, which looks more like network throughput to me. I’m sure the memory bus on this machine can sustain much more than that.
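(As a sanity check on that throughput estimate, treating the 16 MiB message as 2^24 bytes and using the average latency reported above:)

```python
# Implied average MPI_Gather throughput from the benchmark numbers above
msg_bytes = 16 * 1024 * 1024   # 16 MiB message size
avg_latency_s = 19515.15e-6    # average latency reported by osu_gather

gbits_per_s = msg_bytes * 8 / avg_latency_s / 1e9
print(f"{gbits_per_s:.1f} Gbit/s")  # ~6.9 Gbit/s
```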

vmstat doesn’t report any swapping, but I’m wondering whether there’s some problem with how the shared memory is working? Some virtual memory setup problem?

> What is your hardware?

Cray XC30
Single node:
Memory: 64 GB
CPU
model name           : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Architecture         : x86_64
cpu MHz              : 1999.873
cache size           : 20480 KB (Last Level)

Total Number of Sockets             	: 2
Total Number of Cores               	: 16	(8 per socket)
Hyperthreading                      	: ON
Total Number of Physical Processors 	: 16
Total Number of Logical Processors  	: 32	(2 per Phys Processor)

> 
> > MPICH configure: --prefix=/group/astronomy856/ban115/mpich/build-ingest-debug --enable-error-messages=all --enable-timing=all --enable-g=most
> 
> You are trying to understand if there is a performance anomaly, yet you build MPICH in debugging mode. I suggest building with *--enable-fast=O3,ndebug* and remove the other flags you supplied.

I’ve recompiled as you suggest, and got essentially the same results. :-(



[1] http://mvapich.cse.ohio-state.edu/benchmarks/
--
KEITH BANNISTER
CSIRO Astronomy and Space Science
T +61 2 9372 4295
E keith.bannister at csiro.au




