[mpich-devel] tackling "large datatype" change

Rob Latham robl at mcs.anl.gov
Fri Jul 12 10:31:15 CDT 2013


On Fri, Jul 12, 2013 at 10:24:48AM -0500, Pavan Balaji wrote:
> Rob,
> 
> Sorry, I'll look into this next week.

no need to apologize.  I've been there -- twice.

> Amount of time left is proportional to age of baby.
> 
> The first solution works for some cases in MPICH, but there are
> still many bugs.  I think your ticket is to fix this part?
> 
> The second solution is known not to work.

OK, I don't think there's any way to decouple these two solutions,
though.

There are lots of places in the datatype/dataloop code where
optimizations are applied: "oh, these two contig types are next to
each other.  Let's combine them into a single contig type... with a
count that overflows."

==rob

>  -- Pavan
> 
> On 07/12/2013 06:25 AM, Rob Latham wrote:
> >I've been trying to tackle tickets #1742, #1890, and #1893 as part of
> >some I/O work that uses large datatypes.
> >
> >Am I stepping on anyone's toes here?  Pavan, I know I bugged you about these
> >tickets.  Hope I didn't waste my Thursday on this...
> >
> >Pavan told me there are two problems here, but I think there is
> >really one solution to both:
> >
> >- large datatype means a datatype that describes more than 2 gigs of
> >   data: a million contigs of a million contigs, say.
> >
> >- large count means a datatype that uses MPI_Count to say how many
> >   elements it has: a contig of 3 billion MPI_BYTE elements, say.
> >
> >Internally, though, there's no way to decouple those two problems.
> >
> >The changes are pervasive and make me really nervous.
> >
> >I've been pushing changes to the ticket-1742-bigio branch from time to time.
> >
> >Who can review datatype changes once I'm done?
> >
> >==rob
> >
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

