[mpich-discuss] MPI-IO bug

Rob Latham robl at mcs.anl.gov
Thu May 1 15:22:01 CDT 2014



On 05/01/2014 12:02 PM, Wei-keng Liao wrote:
> They are reasonable doubts.
> I tested this patch against test/mpi/io/resized.c and src/mpi/romio/test/hindexed.c
>
> Could you run tests (both MPI and ROMIO) and see if the patch failed for any tests?
> At least we will have an idea if it breaks anything first.

OK, happy to do so.  We've groused about the sparseness of the ROMIO 
test coverage for 12 years, but it's enough to catch a few things.

This change introduces no new test failures, but I think precisely zero 
tests tile the file view.

I still want to better understand how this works.

The indexed datatype looks like this (view in a fixed-width font, please):

rank 0:  |----11-1-|
rank 1:  |1111--1-1|

ADIOI_Calc_my_off_len() should return a list of 3 things for rank 0 and
a list of 3 things for rank 1.  For this test, it does the right thing
for rank 1, returning 3 offset-length tuples: (0,16) (24,4) (32,4)

but rank 0 returns the entirely incorrect
(16,12)

instead of
(0,0) (16,8) (28,4)
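
To make the tuples above easy to check by hand, here is a minimal sketch
that turns one of the pictures into offset-length pairs. It is not ROMIO
code: the off_len struct and pattern_to_off_len() are hypothetical names,
and the 4-byte element size is an assumption inferred from the tuples
themselves. The helper only reports data blocks, so it does not produce
the zero-length (0,0) entry expected at the front of rank 0's list.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous piece of a file view: byte offset and byte length.
 * Hypothetical type for illustration, not a ROMIO structure. */
typedef struct {
    long offset;
    long length;
} off_len;

/*
 * Convert a picture such as "----11-1-" into offset-length pairs.
 * Each character stands for one element of elem_size bytes; '1' marks an
 * element that belongs to the view and any other character is a hole.
 * Adjacent '1' characters are merged into a single pair.
 * Returns the number of pairs written to out (never more than max_out).
 */
size_t pattern_to_off_len(const char *pattern, long elem_size,
                          off_len *out, size_t max_out)
{
    size_t n = 0;
    int in_run = 0;

    assert(elem_size > 0);
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == '1') {
            if (in_run) {
                /* extend the pair that is currently open */
                out[n - 1].length += elem_size;
            } else {
                if (n == max_out)
                    break;          /* no room for another pair */
                out[n].offset = (long)i * elem_size;
                out[n].length = elem_size;
                n++;
                in_run = 1;
            }
        } else {
            in_run = 0;             /* a hole closes the open pair */
        }
    }
    return n;
}
```

Calling pattern_to_off_len("1111--1-1", 4, buf, 8) fills buf with the
three rank 1 pairs, and "----11-1-" gives the two data pairs for rank 0.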

My concern is mostly with tiling, or in the context of Calc_my_off_len,
how n_filetypes is computed.  (There's also filetype_extent, but ROMIO
calls the MPI library for that, so I have high confidence it is correct.)

Fun finding: if I alter your test case to test the effects of tiling a 
file view -- by resizing the indexed type -- ROMIO handles that just 
fine without your change.

So actually the problem with Calc_my_off_len seems to be that the 
indexed type puts the underlying file offset at '16', and ROMIO thinks 
that means the datatype was tiled once -- it seems to be ignoring the 
lower bound marker.
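
The arithmetic behind that "tiled once" reading can be sketched as
follows. This is an illustration only, not the ROMIO source: flat_view,
first_data_block() and tiles_before() are hypothetical names, with the
indices/blocklens/count fields modelled on the quoted diff. The thread
does not state the filetype extent, so any extent passed in is an
assumption made purely to show the effect of which entry the offset is
measured from.

```c
#include <assert.h>

/* Hypothetical stand-in for a flattened filetype; the field names follow
 * the quoted diff (count, indices[], blocklens[]). */
typedef struct {
    int count;
    const long *indices;    /* byte offset of each entry */
    const long *blocklens;  /* byte length of each entry; 0 marks LB/UB */
} flat_view;

/* Index of the first entry that carries data (length > 0).
 * Falls back to 0 when every entry is zero-length, a case the quoted
 * diff does not guard against. */
int first_data_block(const flat_view *f)
{
    for (int i = 0; i < f->count; i++)
        if (f->blocklens[i] > 0)
            return i;
    return 0;
}

/* Number of whole filetype tiles that precede offset, measured from the
 * entry at position start. Mirrors the n_filetypes expression in the
 * quoted diff, with start = 0 for the old form and
 * start = first_data_block() for the patched form. */
long tiles_before(const flat_view *f, long offset, long extent, int start)
{
    assert(extent > 0);
    return (offset - f->indices[start]) / extent;
}
```

With entries (0,0) (16,8) (28,4), a file pointer at offset 16 and an
assumed extent of 16, measuring from entry 0 yields one tile, while
measuring from the first data block yields zero.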

It's just a small tweak to your approach but I'd like to make it a bit 
more explicit that the flattened representation uses a zero-length item 
to indicate the LB of the type.  What do you think of the attached patch?

I'm still not convinced this properly handles tiling, but it fixes a
known issue, and we can deal with the tiling-when-not-resized issue later.

==rob


>
> Wei-keng
>
> On May 1, 2014, at 10:53 AM, Rob Latham wrote:
>
>>
>>
>> On 04/09/2014 02:52 PM, Wei-keng Liao wrote:
>>
>>>
>>> The patch below can fix this problem. Hope it does not break other tests.
>>
>> I'm uncertain about this patch...
>>
>>> Index: adio/common/ad_read_coll.c
>>
>> it should have been in the write path, correct?
>>> @@ -368,13 +368,16 @@
>>>   #endif
>>>           if (file_ptr_type == ADIO_INDIVIDUAL) {
>>>              /* Wei-keng reworked type processing to be a bit more efficient */
>>> +            for (i=0; i<flat_file->count; i++) /* skip blocklens[] == 0 */
>>> +                if (flat_file->blocklens[i] > 0) break;
>>> +
>>
>> the flat_file->blocklens[] array might contain a UB and LB marker.  The marker will be stored in the flattened representation with a zero-length element at a particular offset.  So if you resize a type, your flattened representation would be like this:
>> (offset: 50, length: 0),
>> (offset: 100, length: 1024),
>> (offset: 200, length: 0)
>>
>> furthermore, ADIOI_Optimize_flatten will coalesce multiple zero-length elements into one, but will leave the first and last elements alone.
>>
>>>               offset       = fd->fp_ind - disp;
>>> -            n_filetypes  = (offset - flat_file->indices[0]) / filetype_extent;
>>> +            n_filetypes  = (offset - flat_file->indices[i]) / filetype_extent;
>>
>> doesn't this mess up tiling of the file view?
>>
>> ==rob
>>
>>>               offset      -= (ADIO_Offset)n_filetypes * filetype_extent;
>>>               /* now offset is local to this extent */
>>>
>>>               /* find the block where offset is located, skip blocklens[i]==0 */
>>> -            for (i=0; i<flat_file->count; i++) {
>>> +            for (; i<flat_file->count; i++) {
>>>                   ADIO_Offset dist;
>>>                   if (flat_file->blocklens[i] == 0) continue;
>>>                   dist = flat_file->indices[i] + flat_file->blocklens[i] - offset;
>>>
>>> Wei-keng
>>>
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list     discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Better-deal-with-file-view-types-with-lb.patch
Type: text/x-patch
Size: 1405 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140501/3e91fda0/attachment.bin>