[mpich-devel] tackling "large datatype" change

Pavan Balaji balaji at mcs.anl.gov
Fri Jul 12 12:27:00 CDT 2013


Not quite.  I said that it works for some cases, but there are bugs.

  -- Pavan

On 07/12/2013 10:32 AM, Jeff Hammond wrote:
> Pavan indicated to me in another conversation that the internal overflow issues were fixed.  You can fight with him about the correctness of that statement :-)
>
> Jeff
>
> ----- Original Message -----
>> From: "David Goodell (dgoodell)" <dgoodell at cisco.com>
>> To: "<devel at mpich.org>" <devel at mpich.org>
>> Sent: Friday, July 12, 2013 10:13:37 AM
>> Subject: Re: [mpich-devel] tackling "large datatype" change
>>
>> Jeff,
>>
>> How could this possibly work correctly with broken (i.e., uses "int"s
>> internally for certain calculations) datatype and communication
>> engines?
>>
>> The hard part of large count types is not constructing the actual
>> type, it's processing that type correctly throughout the entire
>> stack...
>>
>> -Dave
>>
>> On Jul 12, 2013, at 9:57 AM, Jeff Hammond <jhammond at alcf.anl.gov>
>> wrote:
>>
>>> Hi Rob,
>>>
>>> I started working on something to address this a few weeks ago. I
>>> was going to write a portable (i.e., above MPI) implementation of
>>>
>>> MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype old_type,
>>> MPI_Datatype *newtype)
>>>
>>> and then reimplement it efficiently in MPICH. I will also try to
>>> get JeffS or DaveG to create it for OtherMPI so that, e.g., PETSc
>>> can use it more broadly.
>>>
>>> Does this sound like an okay plan to you? I haven't pushed the
>>> latest code to GitHub yet, but when I do, I'll send you the link.
>>>
>>> Best,
>>>
>>> Jeff
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Rob Latham" <robl at mcs.anl.gov>
>>>> To: devel at mpich.org
>>>> Sent: Friday, July 12, 2013 6:25:14 AM
>>>> Subject: [mpich-devel] tackling "large datatype" change
>>>>
>>>> I've been trying to tackle tickets #1742, #1890, and #1893 as part
>>>> of some I/O work that uses large datatypes.
>>>>
>>>> Am I stepping on anyone's toes here? Pavan, I know I bugged you
>>>> about these tickets. I hope I didn't waste my Thursday on this...
>>>>
>>>> Pavan told me there are two problems here, but I think there is
>>>> really one solution to both:
>>>>
>>>> - large datatype means a datatype that describes more than INT_MAX
>>>>   bytes (about 2 GiB) of data: a million contigs of a million
>>>>   contigs, say
>>>>
>>>> - large count means a datatype that uses MPI_Count to say how many
>>>>   elements it has: a contig of 3 billion MPI_BYTE elements, say
>>>>
>>>> Internally, though, there's no way to decouple those two problems.
>>>>
>>>> The changes are pervasive and make me really nervous.
>>>>
>>>> I've been pushing changes to the ticket-1742-bigio branch from
>>>> time to time.
>>>>
>>>> Who can review the datatype changes once I'm done?
>>>>
>>>> ==rob
>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division
>>>> Argonne National Lab, IL USA
>>>>
>>>
>>> --
>>> Jeff Hammond
>>> Argonne Leadership Computing Facility
>>> University of Chicago Computation Institute
>>> jhammond at alcf.anl.gov / (630) 252-5381
>>> http://www.linkedin.com/in/jeffhammond
>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>> ALCF docs: http://www.alcf.anl.gov/user-guides
>>
>>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

