[mpich-devel] tackling "large datatype" change
Jeff Hammond
jhammond at alcf.anl.gov
Fri Jul 12 10:32:49 CDT 2013
Pavan indicated to me in another conversation that the internal overflow issues were fixed. You can fight with him about the correctness of that statement :-)
Jeff
----- Original Message -----
> From: "David Goodell (dgoodell)" <dgoodell at cisco.com>
> To: "<devel at mpich.org>" <devel at mpich.org>
> Sent: Friday, July 12, 2013 10:13:37 AM
> Subject: Re: [mpich-devel] tackling "large datatype" change
>
> Jeff,
>
> How could this possibly work correctly with broken (i.e., uses "int"s
> internally for certain calculations) datatype and communication
> engines?
>
> The hard part of large count types is not constructing the actual
> type, it's processing that type correctly throughout the entire
> stack...
>
> -Dave
>
> On Jul 12, 2013, at 9:57 AM, Jeff Hammond <jhammond at alcf.anl.gov>
> wrote:
>
> > Hi Rob,
> >
> > I started working on something to address this a few weeks ago. I
> > was going to write a portable (i.e. above MPI) implementation of
> >
> > MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype old_type,
> > MPI_Datatype *newtype)
> >
> > and then reimplement it efficiently in MPICH. I will also try to
> > get JeffS or DaveG to create it for OtherMPI so e.g. PETSc can use
> > it more broadly.
> >
> > Does this sound like an okay plan to you? I've not pushed the
> > latest code to Github but when I do, I'll send you the link.
> >
> > Best,
> >
> > Jeff
> >
> >
> > ----- Original Message -----
> >> From: "Rob Latham" <robl at mcs.anl.gov>
> >> To: devel at mpich.org
> >> Sent: Friday, July 12, 2013 6:25:14 AM
> >> Subject: [mpich-devel] tackling "large datatype" change
> >>
> > >> I've been trying to tackle tt #1742, #1890, and #1893 as part of
> > >> some I/O work that uses large datatypes.
> >>
> > >> Am I stepping on anyone's toes here? Pavan, I know I bugged you
> > >> about these tickets. Hope I didn't waste my Thursday on this...
> >>
> > >> Pavan told me there are two problems here, but I think there is
> > >> really one solution to both:
> >>
> > >> - large datatype means a datatype that describes more than 2 GB
> > >>   of data: a million contigs of a million contigs, say
> > >>
> > >> - large count means datatypes that use MPI_Count to say how many
> > >>   elements they have: a contig of 3 billion MPI_BYTE elements, say
> >>
> >> Internally, though, there's no way to decouple those two problems.
> >>
> >> The changes are pervasive and make me really nervous.
> >>
> > >> I've been pushing changes to the ticket-1742-bigio branch from
> > >> time to time.
> >>
> > >> Who can review datatype changes once I'm done?
> >>
> >> ==rob
> >>
> >> --
> >> Rob Latham
> >> Mathematics and Computer Science Division
> >> Argonne National Lab, IL USA
> >>
> >
> > --
> > Jeff Hammond
> > Argonne Leadership Computing Facility
> > University of Chicago Computation Institute
> > jhammond at alcf.anl.gov / (630) 252-5381
> > http://www.linkedin.com/in/jeffhammond
> > https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> > ALCF docs: http://www.alcf.anl.gov/user-guides
>
>
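The arithmetic behind the two cases discussed in the thread above can be sketched in plain C, without MPI. This is only an illustration under stated assumptions, not the code from the ticket-1742-bigio branch or from any MPIX_Type_contiguous_x implementation: the names count_split, split_count, join_count and fits_in_int are hypothetical, and splitting into INT_MAX-sized blocks is just one plausible way a layer above MPI could describe a count that does not fit in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit element count expressed as full
 * INT_MAX-element blocks plus a leftover that fits in an int. */
typedef struct {
    int64_t chunks;    /* number of full INT_MAX-element blocks */
    int     remainder; /* leftover elements, always < INT_MAX   */
} count_split;

/* Split a non-negative 64-bit count so that each piece could be passed
 * to an int-based interface. A layer above MPI might then describe the
 * blocks with MPI_Type_vector, the leftover with MPI_Type_contiguous,
 * and join the two with MPI_Type_create_struct (an assumption about one
 * possible design, not taken from the thread). */
static count_split split_count(int64_t count)
{
    count_split s;
    assert(count >= 0);
    s.chunks    = count / INT_MAX;
    s.remainder = (int)(count % INT_MAX);
    return s;
}

/* Recombine the pieces in 64-bit arithmetic; doing this multiplication
 * in int is exactly the kind of internal calculation that overflows. */
static int64_t join_count(count_split s)
{
    return s.chunks * (int64_t)INT_MAX + (int64_t)s.remainder;
}

/* Returns 1 if the count can be stored in an int, 0 otherwise. */
static int fits_in_int(int64_t count)
{
    return count >= 0 && count <= INT_MAX;
}
```

Both examples from the thread fail fits_in_int on a platform with 32-bit int: 3 billion elements (the "large count" case) and a million times a million elements (the "large datatype" case). That matches the point that building the type is the easy part; every internal size or offset computation also has to be done in a wide enough integer.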
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides