[mpich-devel] tackling "large datatype" change

Rob Latham robl at mcs.anl.gov
Fri Jul 12 10:53:36 CDT 2013


On Fri, Jul 12, 2013 at 10:32:49AM -0500, Jeff Hammond wrote:
> Pavan indicated to me in another conversation that the internal overflow issues were fixed.  You can fight with him about the correctness of that statement :-)

He might mean Dave's large-count branch from April.  

Oddly, Dave's large-count work and my work overlap in about one line.

Neither branch is (yet) sufficient to pass the tests I've described.

I think I'm going to want to incorporate large-count into
ticket-1742-bigio, but I can't get my head around 'rebase --onto'.

==rob

> 
> Jeff
> 
> ----- Original Message -----
> > From: "David Goodell (dgoodell)" <dgoodell at cisco.com>
> > To: "<devel at mpich.org>" <devel at mpich.org>
> > Sent: Friday, July 12, 2013 10:13:37 AM
> > Subject: Re: [mpich-devel] tackling "large datatype" change
> > 
> > Jeff,
> > 
> > How could this possibly work correctly with broken (i.e., uses "int"s
> > internally for certain calculations) datatype and communication
> > engines?
> > 
> > The hard part of large count types is not constructing the actual
> > type, it's processing that type correctly throughout the entire
> > stack...
> > 
> > -Dave
> > 
> > On Jul 12, 2013, at 9:57 AM, Jeff Hammond <jhammond at alcf.anl.gov>
> > wrote:
> > 
> > > Hi Rob,
> > > 
> > > I started working on something to address this a few weeks ago.  I
> > > was going to write a portable (i.e. above MPI) implementation of
> > > 
> > > MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype old_type,
> > > MPI_Datatype *newtype)
> > > 
> > > and then reimplement it efficiently in MPICH.  I will also try to
> > > get JeffS or DaveG to create it for OtherMPI so e.g. PETSc can use
> > > it more broadly.
> > > 
> > > Does this sound like an okay plan to you?  I've not pushed the
> > > latest code to Github but when I do, I'll send you the link.
> > > 
> > > Best,
> > > 
> > > Jeff
> > > 
> > > 
> > > ----- Original Message -----
> > >> From: "Rob Latham" <robl at mcs.anl.gov>
> > >> To: devel at mpich.org
> > >> Sent: Friday, July 12, 2013 6:25:14 AM
> > >> Subject: [mpich-devel] tackling "large datatype" change
> > >> 
> > > >> I've been trying to tackle tt #1742, #1890, and #1893 as part of
> > > >> some I/O work that uses large datatypes.
> > >> 
> > > >> Am I stepping on anyone's toes here?  Pavan, I know I bugged you
> > > >> about these tickets.  Hope I didn't waste my Thursday on this...
> > >> 
> > > >> Pavan told me there are two problems here, but I think there is
> > > >> really one solution to both:
> > >> 
> > > >> - large datatype means a datatype that describes more than 2 gigs
> > > >>   of data.  A million contigs of a million contigs, say
> > > >> 
> > > >> - large count means datatypes that use MPI_Count to say how many
> > > >>   elements they have:  a contig of 3 billion MPI_BYTE elements, say
> > >> 
> > >> Internally, though, there's no way to decouple those two problems.
> > >> 
> > >> The changes are pervasive and make me really nervous.
> > >> 
> > > >> I've been pushing changes to the ticket-1742-bigio branch from
> > > >> time to time.
> > >> 
> > > >> Who can review datatype changes once I'm done?
> > >> 
> > >> ==rob
> > >> 
> > >> --
> > >> Rob Latham
> > >> Mathematics and Computer Science Division
> > >> Argonne National Lab, IL USA
> > >> 
> > > 
> > > --
> > > Jeff Hammond
> > > Argonne Leadership Computing Facility
> > > University of Chicago Computation Institute
> > > jhammond at alcf.anl.gov / (630) 252-5381
> > > http://www.linkedin.com/in/jeffhammond
> > > https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> > > ALCF docs: http://www.alcf.anl.gov/user-guides
> > 
> > 
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

