[mpich-discuss] MPI-IO bug

Rob Latham robl at mcs.anl.gov
Wed Apr 30 16:06:29 CDT 2014


Wei-keng, I've already let this slip longer than I intended. I've
opened a ticket [1], which I intend to close pretty quickly.

[1] http://trac.mpich.org/projects/mpich/ticket/2073

==rob


On 04/09/2014 02:52 PM, Wei-keng Liao wrote:
> (This bug is probably caused by a patch of mine from long ago.)
> Attached is a program, extracted from an application, that reproduces
> the problem observed in a large run. When a filetype is defined with
> MPI_Type_indexed and the first few elements of its blocklens[] argument
> are zero, a collective write misses writing some data.
>
> The test program first fills a file with 9 integers, all set to -999.
> It then defines a filetype and writes to the file in parallel from user
> buffers filled with 1s. Finally, it reads the file back and checks the
> contents.
>
> Command used to compile and run the test program:
>      mpicc -g -o bug_indexed_io bug_indexed_io.c
>      mpiexec -n 2 bug_indexed_io
>
> Stdout:
>     0: Error: unexpected value at buf[7] == -999
>
>
> The patch below fixes this problem. I hope it does not break other tests.
>
> Index: adio/common/ad_read_coll.c
> @@ -368,13 +368,16 @@
>   #endif
>           if (file_ptr_type == ADIO_INDIVIDUAL) {
>              /* Wei-keng reworked type processing to be a bit more efficient */
> +            for (i=0; i<flat_file->count; i++) /* skip blocklens[] == 0 */
> +                if (flat_file->blocklens[i] > 0) break;
> +
>               offset       = fd->fp_ind - disp;
> -            n_filetypes  = (offset - flat_file->indices[0]) / filetype_extent;
> +            n_filetypes  = (offset - flat_file->indices[i]) / filetype_extent;
>               offset      -= (ADIO_Offset)n_filetypes * filetype_extent;
>               /* now offset is local to this extent */
>
>               /* find the block where offset is located, skip blocklens[i]==0 */
> -            for (i=0; i<flat_file->count; i++) {
> +            for (; i<flat_file->count; i++) {
>                   ADIO_Offset dist;
>                   if (flat_file->blocklens[i] == 0) continue;
>                   dist = flat_file->indices[i] + flat_file->blocklens[i] - offset;
>
> Wei-keng

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


