[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."

pramod kumbhar pramod.s.kumbhar at gmail.com
Tue Aug 22 15:05:36 CDT 2017


Hi Rob,

Thanks! The issue below is not exactly the same error, but it is related:

While constructing a derived datatype (the filetype used for MPI_File_set_view),
do the displacements/offsets need to be in ascending order?
Suppose I am creating a derived datatype using MPI_Type_create_hindexed
(or MPI_Type_create_struct) with block lengths/displacements as:

        blocklengths[0] = 8;
        blocklengths[1] = 231670;
        blocklengths[2] = 116606;

        displacements[0] = 0;
        displacements[1] = 8;
        displacements[2] = 231678;

The above displacements are in ascending order. Now suppose I shuffle the order a bit:

        blocklengths[0] = 8;
        blocklengths[1] = 116606;
        blocklengths[2] = 231670;

        displacements[0] = 0;
        displacements[1] = 231678;
        displacements[2] = 8;

The blocks are the same; I have only changed the order in which the
block lengths/offsets are specified. (The resulting file will have its data in a
different order, but that is not the concern here.)
Isn't this still a valid specification? This second example results in a segfault
(in ADIOI_GEN_WriteStrided / ADIOI_GEN_WriteStridedColl).
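
In both cases the filetype itself is built and attached in the same way, roughly
as in the sketch below (MPI_BYTE as the elementary type and error checking omitted;
the real code may use a different etype):

        MPI_Datatype filetype;
        MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);
        /* fh was opened with MPI_File_open earlier; displacements[] is MPI_Aint */
        MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);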

I quickly wrote the attached program; let me know if I have missed anything
obvious here.

Regards,
Pramod

p.s. You can compile and run it as:

Not working => mpicxx test.cpp && mpirun -n 2 ./a.out
Working     => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
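
In case the attachment does not come through, the test is essentially along the
following lines (a sketch, not the exact file; the output file name, the per-rank
view displacement, the buffer fill pattern, and MPI_BYTE as etype are incidental
choices on my side):

// Sketch of the test: three byte blocks, described either in ascending
// displacement order (-DUSE_ORDER) or shuffled, then written collectively.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int blocklengths[3];
    MPI_Aint displacements[3];
#ifdef USE_ORDER
    /* ascending displacements: works */
    blocklengths[0] = 8;      displacements[0] = 0;
    blocklengths[1] = 231670; displacements[1] = 8;
    blocklengths[2] = 116606; displacements[2] = 231678;
#else
    /* same blocks, shuffled order: segfaults in the strided write path */
    blocklengths[0] = 8;      displacements[0] = 0;
    blocklengths[1] = 116606; displacements[1] = 231678;
    blocklengths[2] = 231670; displacements[2] = 8;
#endif

    MPI_Datatype filetype;
    MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);

    /* total bytes described by the filetype (8 + 231670 + 116606) */
    int total = blocklengths[0] + blocklengths[1] + blocklengths[2];
    std::vector<char> buf(total, (char)('A' + rank));

    /* give each rank a disjoint region of the file via the view displacement */
    MPI_Offset disp = (MPI_Offset)rank * total;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, disp, MPI_BYTE, filetype, "native", MPI_INFO_NULL);

    MPI_Status status;
    MPI_File_write_all(fh, buf.data(), total, MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}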



On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:

> On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > Dear All,
> >
> > In one of our applications I am seeing the following error while using the
> > collective call MPI_File_write_all:
> >
> > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1)
> > fd_size=102486061 off=102486469
> >
> > The non-collective version works fine.
> >
> > While looking at the call stack I came across the comment below in
> > mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> >
> >     /* we index into fd_end with rank_index, and fd_end was allocated to be no
> >      * bigger than fd->hints->cb_nodes.   If we ever violate that, we're
> >      * overrunning arrays.  Obviously, we should never ever hit this abort */
> >     if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> >         FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
> >             rank_index, fd->hints->cb_nodes, fd_size, off);
> >         MPI_Abort(MPI_COMM_WORLD, 1);
> >     }
> >
> > I am going to look into the application and see if there is an issue with
> > offset overflow. But looking at the above comment ("Obviously, we should
> > never ever hit this abort") I thought I should ask whether there is
> > anything obvious I am missing.
>
> That's my comment.  The 'rank_index' array is allocated based on the
> 'cb_nodes' hint.  I definitely would like to know more about how the
> code is manipulating rank_index, cb_nodes, and fd_end.
>
> If there is a reduced test case you can send me, that will be a huge
> help.
>
> ==rob
>
> >
> > Regards,
> > Pramod
> >
> > p.s. I will provide a reproducer after looking into this more carefully.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.cpp
Type: text/x-c++src
Size: 2499 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170822/2b1349d8/attachment.bin>
-------------- next part --------------
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list