[mpich-discuss] MPI-IO bug
Rob Latham
robl at mcs.anl.gov
Sat May 3 12:27:23 CDT 2014
On 05/02/2014 05:24 PM, Wei-keng Liao wrote:
> Hi, Rob,
>
> Now we are all clear about the lower bound for indexed data type.
> I wonder if the correct solution (at least for calculating n_filetypes) should change
> from
> n_filetypes = (offset - flat_file->indices[0]) / filetype_extent;
> to
> n_filetypes = (offset - filetype_lb) / filetype_extent;
that's certainly more clear.
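
To make the difference concrete, here is a minimal sketch of the two formulas side by side. The type name `sketch_off_t` and both function names are hypothetical stand-ins (ROMIO uses `ADIO_Offset` and computes this inline); only the arithmetic is taken from the thread.

```c
#include <assert.h>

typedef long long sketch_off_t;  /* hypothetical stand-in for ADIO_Offset */

/* Original formula: distance measured from the first flattened entry. */
sketch_off_t n_filetypes_from_first_index(sketch_off_t offset,
                                          sketch_off_t first_index,
                                          sketch_off_t extent)
{
    assert(extent > 0);
    return (offset - first_index) / extent;
}

/* Proposed formula: distance measured from the type's lower bound. */
sketch_off_t n_filetypes_from_lb(sketch_off_t offset,
                                 sketch_off_t lb,
                                 sketch_off_t extent)
{
    assert(extent > 0);
    return (offset - lb) / extent;
}
```

Plugging in rank 0's numbers from the quoted messages (offset 16, indices[0] == 0, extent 16, LB 16), the first function yields 1, i.e. "tiled once", while the second yields 0.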
==rob
>
> Wei-keng
>
> On May 1, 2014, at 6:03 PM, Wei-keng Liao wrote:
>
>>
>> The flattened filetype offset-length lists are: (i.e. flat_file->indices[] and flat_file->blocklens[])
>> rank0: (0, 0) (4, 0) (8, 0) (12, 0) (16, 8) (28, 4)
>> rank1: (0, 16) (20, 0) (24, 4) (32, 4)
>>
>> What should their type extents be?
>>
>> MPI_Type_extent() returns values below:
>> rank 0: filetype_extent == 16 LB == 16
>> rank 1: filetype_extent == 36 LB == 0
>>
>> Is this correct? Shouldn't rank 0's extent be 32 and its LB be 0?
>>
>> If filetype_extent 16 is correct, then the way of calculating n_filetypes (below) is wrong.
>> n_filetypes = (offset - flat_file->indices[0]) / filetype_extent;
>>
>> The divisor should be the "range extent" of flat_file, i.e.
>> (flat_file->indices[flat_file->count-1] + flat_file->blocklens[flat_file->count-1]) - flat_file->indices[0]
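
The "range extent" expression above can be sketched as a small standalone function. The names `sketch_off_t` and `flat_range_extent` are hypothetical, and the arrays stand in for flat_file->indices[] and flat_file->blocklens[]; this only illustrates the arithmetic, not ROMIO code.

```c
#include <assert.h>
#include <stddef.h>

typedef long long sketch_off_t;  /* hypothetical stand-in for ADIO_Offset */

/* Span from the first flattened offset to the end of the last block. */
sketch_off_t flat_range_extent(const sketch_off_t *indices,
                               const sketch_off_t *blocklens,
                               size_t count)
{
    assert(count > 0);
    return (indices[count - 1] + blocklens[count - 1]) - indices[0];
}
```

For the two offset-length lists quoted above, this gives 32 for rank 0 and 36 for rank 1.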
>>
>>
>> Wei-keng
>>
>> On May 1, 2014, at 3:22 PM, Rob Latham wrote:
>>
>>>
>>>
>>> On 05/01/2014 12:02 PM, Wei-keng Liao wrote:
>>>> They are reasonable doubts.
>>>> I tested this patch against test/mpi/io/resized.c and src/mpi/romio/test/hindexed.c
>>>>
>>>> Could you run tests (both MPI and ROMIO) and see if the patch failed for any tests?
>>>> At least we will have an idea if it breaks anything first.
>>>
>>> OK, happy to do so. We've groused about the sparseness of the ROMIO test coverage for 12 years, but it's enough to catch a few things.
>>>
>>> This change introduces no new test failures, but I think precisely zero tests tile the file view.
>>>
>>> I still want to better understand how this works
>>>
>>> the indexed datatype looks like this (view in fixed-width font please)
>>>
>>> rank 0: |----11-1-|
>>> rank 1: |1111--1-1|
>>>
>>> ADIOI_Calc_my_off_len() should return a list of 3 offset-length tuples for rank 0 and a list of 3 for rank 1. For this test, it does the right thing for rank 1, returning 3 offset-length tuples: (0,16) (24,4) (32,4)
>>>
>>> but rank 0 returns the entirely incorrect
>>> (16,12)
>>>
>>> instead of
>>> (0,0) (16, 8) (28, 4)
>>>
>>> My concern is mostly with tiling, or in the context of Calc_my_off_len, how n_filetypes is computed (there's also filetype_extent, but ROMIO calls the MPI library for that, so I have high confidence it is correct)
>>>
>>> Fun finding: if I alter your test case to test the effects of tiling a file view -- by resizing the indexed type -- ROMIO handles that just fine without your change.
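
For readers less familiar with the term: tiling a file view means the filetype repeats end to end in the file, with each copy shifted by one type extent from the previous one. A minimal sketch of that offset arithmetic follows; `sketch_off_t` and `tiled_block_offset` are hypothetical names, and `disp` is the displacement of the file view.

```c
#include <assert.h>
#include <stddef.h>

typedef long long sketch_off_t;  /* hypothetical stand-in for ADIO_Offset */

/* Absolute file offset of flattened block `block` in tile number `tile`. */
sketch_off_t tiled_block_offset(sketch_off_t disp,
                                sketch_off_t extent,
                                sketch_off_t tile,
                                const sketch_off_t *indices,
                                size_t block)
{
    assert(extent > 0 && tile >= 0);
    return disp + tile * extent + indices[block];
}
```

With rank 1's flattened offsets (0, 20, 24, 32), an extent of 36 and disp 0, the first block of the second tile starts at 36 and its last block at 68.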
>>>
>>> So actually the problem with Calc_my_off_len seems to be that the indexed type puts the underlying file offset at '16', and ROMIO thinks that means the datatype was tiled once -- it seems to be ignoring the lower bound marker.
>>>
>>> It's just a small tweak to your approach but I'd like to make it a bit more explicit that the flattened representation uses a zero-length item to indicate the LB of the type. What do you think of the attached patch?
>>>
>>> I'm still not convinced this properly handles tiling but we fix a known issue and can deal with the tiling-when-not-resized issue later.
>>>
>>> ==rob
>>>
>>>
>>>>
>>>> Wei-keng
>>>>
>>>> On May 1, 2014, at 10:53 AM, Rob Latham wrote:
>>>>
>>>>>
>>>>>
>>>>> On 04/09/2014 02:52 PM, Wei-keng Liao wrote:
>>>>>
>>>>>>
>>>>>> The patch below can fix this problem. Hope it does not break other tests.
>>>>>
>>>>> I'm uncertain about this patch...
>>>>>
>>>>>> Index: adio/common/ad_read_coll.c
>>>>>
>>>>> it should have been in the write path, correct?
>>>>>> @@ -368,13 +368,16 @@
>>>>>> #endif
>>>>>> if (file_ptr_type == ADIO_INDIVIDUAL) {
>>>>>> /* Wei-keng reworked type processing to be a bit more
>>>>>> efficient */
>>>>>> + for (i=0; i<flat_file->count; i++) /* skip blocklens[] == 0 */
>>>>>> + if (flat_file->blocklens[i] > 0) break;
>>>>>> +
>>>>>
>>>>> the flat_file->blocklens[] array might contain a UB and LB marker. The marker will be stored in the flattened representation with a zero-length element at a particular offset. So if you resize a type, your flattened representation would be like this:
>>>>> (offset: 50, length: 0),
>>>>> (offset: 100, length: 1024),
>>>>> (offset: 200, length: 0)
>>>>>
>>>>> furthermore, ADIOI_Optimize_flatten will coalesce multiple zero-length elements into one, but will leave the first and last elements alone.
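
A rough sketch of the effect described above, under loud assumptions: this is not ADIOI_Optimize_flatten itself. It simply keeps the first and last entries untouched and drops every zero-length entry in between (rather than coalescing runs of them into one), and it does not merge adjacent data blocks. The names `sketch_off_t` and `drop_interior_zero_blocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long long sketch_off_t;  /* hypothetical stand-in for ADIO_Offset */

/* Compacts the offset-length list in place and returns the new count.
 * The first and last entries are always kept, even if zero-length,
 * because they may be LB/UB markers. */
size_t drop_interior_zero_blocks(sketch_off_t *indices,
                                 sketch_off_t *blocklens,
                                 size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; i++) {
        int is_edge = (i == 0 || i == count - 1);
        if (is_edge || blocklens[i] > 0) {
            indices[out] = indices[i];
            blocklens[out] = blocklens[i];
            out++;
        }
    }
    assert(out <= count);
    return out;
}
```

Applied to the rank 0 list (0,0) (4,0) (8,0) (12,0) (16,8) (28,4) from this thread, the simplification leaves (0,0) (16,8) (28,4); applied to the three-entry resized example above, it changes nothing.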
>>>>>
>>>>>> offset = fd->fp_ind - disp;
>>>>>> - n_filetypes = (offset - flat_file->indices[0]) / filetype_extent;
>>>>>> + n_filetypes = (offset - flat_file->indices[i]) / filetype_extent;
>>>>>
>>>>> doesn't this mess up tiling of the file view?
>>>>>
>>>>> ==rob
>>>>>
>>>>>> offset -= (ADIO_Offset)n_filetypes * filetype_extent;
>>>>>> /* now offset is local to this extent */
>>>>>>
>>>>>> /* find the block where offset is located, skip
>>>>>> blocklens[i]==0 */
>>>>>> - for (i=0; i<flat_file->count; i++) {
>>>>>> + for (; i<flat_file->count; i++) {
>>>>>> ADIO_Offset dist;
>>>>>> if (flat_file->blocklens[i] == 0) continue;
>>>>>> dist = flat_file->indices[i] + flat_file->blocklens[i] - offset;
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>
>>>>> --
>>>>> Rob Latham
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Lab, IL USA
>>>>
>>>>
>>>
>>> --
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
>>> <0001-Better-deal-with-file-view-types-with-lb.patch>
>>
>
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA