[mpich-devel] tackling "large datatype" change

Jeff Hammond jhammond at alcf.anl.gov
Fri Jul 12 10:32:49 CDT 2013


Pavan indicated to me in another conversation that the internal overflow issues were fixed.  You can fight with him about the correctness of that statement :-)

Jeff

----- Original Message -----
> From: "David Goodell (dgoodell)" <dgoodell at cisco.com>
> To: "<devel at mpich.org>" <devel at mpich.org>
> Sent: Friday, July 12, 2013 10:13:37 AM
> Subject: Re: [mpich-devel] tackling "large datatype" change
> 
> Jeff,
> 
> How could this possibly work correctly with broken (i.e., uses "int"s
> internally for certain calculations) datatype and communication
> engines?
> 
> The hard part of large count types is not constructing the actual
> type, it's processing that type correctly throughout the entire
> stack...
> 
> -Dave
> 
> On Jul 12, 2013, at 9:57 AM, Jeff Hammond <jhammond at alcf.anl.gov>
> wrote:
> 
> > Hi Rob,
> > 
> > I started working on something to address this a few weeks ago.  I
> > was going to write a portable (i.e. above MPI) implementation of
> > 
> > MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype old_type,
> > MPI_Datatype *newtype)
> > 
> > and then reimplement it efficiently in MPICH.  I will also try to
> > get JeffS or DaveG to create it for OtherMPI so e.g. PETSc can use
> > it more broadly.
> > 
> > Does this sound like an okay plan to you?  I've not pushed the
> > latest code to Github but when I do, I'll send you the link.
> > 
> > Best,
> > 
> > Jeff
> > 
> > 
> > ----- Original Message -----
> >> From: "Rob Latham" <robl at mcs.anl.gov>
> >> To: devel at mpich.org
> >> Sent: Friday, July 12, 2013 6:25:14 AM
> >> Subject: [mpich-devel] tackling "large datatype" change
> >> 
> >> I've been trying to tackle tickets #1742, #1890, and #1893 as part
> >> of some I/O work that uses large datatypes.
> >> 
> >> Am I stepping on anyone's toes here?  Pavan, I know I bugged you
> >> about these tickets.  I hope I didn't waste my Thursday on this...
> >> 
> >> Pavan told me there are two problems here, but I think there is
> >> really one solution to both:
> >> 
> >> - large datatype means a datatype that describes more than 2 GiB
> >>   of data (more bytes than an int can count): a million contigs
> >>   of a million contigs, say.
> >> 
> >> - large count means datatypes that use MPI_Count to say how many
> >>   elements they have: a contig of 3 billion MPI_BYTEs, say.
> >> 
> >> Internally, though, there's no way to decouple those two problems.
> >> 
> >> The changes are pervasive and make me really nervous.
> >> 
> >> I've been pushing changes to the ticket-1742-bigio branch from
> >> time to time.
> >> 
> >> Who can review datatype changes once I'm done?
> >> 
> >> ==rob
> >> 
> >> --
> >> Rob Latham
> >> Mathematics and Computer Science Division
> >> Argonne National Lab, IL USA
> >> 
> > 
> > --
> > Jeff Hammond
> > Argonne Leadership Computing Facility
> > University of Chicago Computation Institute
> > jhammond at alcf.anl.gov / (630) 252-5381
> > http://www.linkedin.com/in/jeffhammond
> > https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> > ALCF docs: http://www.alcf.anl.gov/user-guides
> 
> 



