[mpich-devel] ROMIO collective i/o memory use

Bob Cernohous bobc at us.ibm.com
Mon May 6 15:05:10 CDT 2013


> From: "Rob Latham" <robl at mcs.anl.gov>
> 
> On Mon, May 06, 2013 at 02:30:15PM -0500, Bob Cernohous wrote:
> > > From: Rob Ross <rross at mcs.anl.gov>
> > > 
> > > Should we consider this as interest in working on this problem on 
> > > the IBM side :)? -- Rob
> > 
> > Say what?! ;)
> 
> RobR's excited that IBM's looking at the ROMIO piece of DCMF.  We
> thought we were on our own with that one. 
> 
> 
> > I was looking more for agreement that collective i/o is 'what it
> > is'... and maybe some idea if we just have some known limitations on
> > scaling it.  Yes, that BG alltoallv is a bigger problem that we can
> > avoid with an env var -- is that just going to have to be 'good
> > enough'?  (I think that Jeff P wrote that on BG/P and got good
> > performance with that alltoallv.  Trading memory for performance,
> > not unusual, and at least it's selectable.)
> 
> I can't test while our Blue Gene is under maintenance.  I know the
> environment variable selection helps only a little bit (it improves
> scaling from 4k to 8k, maybe?  I don't have the notes offhand).

Ouch.  So you've seen the scaling failures at 8k... ranks? racks?  Kevin
is failing at 16 racks x 16 ranks per node, I think -- with 1024 nodes
per rack that's 16 x 1024 x 16, so 256k ranks.
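
For the archives, here's roughly what "trading memory for performance"
looks like from the application side -- a minimal sketch of bounding
ROMIO's two-phase aggregation buffers through MPI_Info hints instead of
(or alongside) the BG env var.  The hint names are standard ROMIO hints;
the file name, buffer size, and aggregator count are made-up
placeholders, not tuned values:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* 1 MiB of per-rank data; the size is a placeholder */
        static char buf[1 << 20];
        MPI_File fh;
        MPI_Info info;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Info_create(&info);
        /* Cap each aggregator's two-phase staging buffer (default is
         * usually 16 MiB) -- this is the memory/performance knob. */
        MPI_Info_set(info, "cb_buffer_size", "4194304");
        /* Limit how many ranks act as aggregators. */
        MPI_Info_set(info, "cb_nodes", "64");
        /* Force collective buffering on so the hints actually apply. */
        MPI_Info_set(info, "romio_cb_write", "enable");

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(buf),
                              buf, sizeof(buf), MPI_BYTE,
                              MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }

If memory serves, the alltoallv-vs-point-to-point exchange itself is
picked by the BGMPIO_COMM environment variable in the BG driver; the
hints above only bound how much data each aggregator stages per cycle.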