[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."

pramod kumbhar pramod.s.kumbhar at gmail.com
Wed Aug 23 00:35:25 CDT 2017


Hi Rajeev, Rob,

Thanks for the clarification. I see that the etype/filetype specification in
the standard mentions the "monotonically nondecreasing" constraint.
Just curious: is (or was) there any specific reason for this constraint?

@Rob: the error message I posted in the first email was triggered by the
application violating this constraint. Depending on the number of ranks and
the order of the offsets, the application deadlocks, crashes, etc.
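
For the archives, Rob's suggested workaround (below) looks roughly like this
(a sketch, not the code from test.cpp; the helper name and the MPI_BYTE etype
are placeholders): sort the file displacements into ascending order and apply
the same permutation to the memory description, so each file block still
pairs with its original memory block.

    #include <mpi.h>
    #include <algorithm>
    #include <vector>

    // Sketch: build an hindexed filetype whose displacements ascend, and
    // permute a matching memory type the same way.
    void build_sorted_types(int n, const int *blocklens,
                            const MPI_Aint *file_disps,
                            const MPI_Aint *mem_disps,
                            MPI_Datatype *filetype, MPI_Datatype *memtype)
    {
        std::vector<int> order(n);
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort block indices by file displacement (ascending).
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return file_disps[a] < file_disps[b]; });

        std::vector<int> lens(n);
        std::vector<MPI_Aint> fdisp(n), mdisp(n);
        for (int i = 0; i < n; i++) {
            lens[i]  = blocklens[order[i]];
            fdisp[i] = file_disps[order[i]];
            mdisp[i] = mem_disps[order[i]];  // memory follows the same permutation
        }

        MPI_Type_create_hindexed(n, lens.data(), fdisp.data(), MPI_BYTE, filetype);
        MPI_Type_create_hindexed(n, lens.data(), mdisp.data(), MPI_BYTE, memtype);
        MPI_Type_commit(filetype);
        MPI_Type_commit(memtype);
    }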

Thanks,

Pramod

On Wed, Aug 23, 2017 at 3:07 AM, Latham, Robert J. <robl at mcs.anl.gov> wrote:

> On Tue, 2017-08-22 at 22:27 +0000, Thakur, Rajeev wrote:
> > Yes, displacements for the filetype must be in “monotonically
> > nondecreasing order”.
>
> ... which sounds pretty restrictive, but there is no constraint on
> memory types.  Folks work around this by shuffling the memory addresses
> to match the ascending file offsets.
>
> ==rob
>
> >
> > Rajeev
> >
> > > On Aug 22, 2017, at 3:05 PM, pramod kumbhar <pramod.s.kumbhar at gmail.com> wrote:
> > >
> > > Hi Rob,
> > >
> > > Thanks! Below is not exactly the same issue/error, but it is related:
> > >
> > > While constructing a derived datatype (the filetype used for set_view),
> > > do we need the displacements/offsets to be in ascending order?
> > > I mean, suppose I am creating a derived datatype using
> > > MPI_Type_create_hindexed (or MPI_Type_create_struct) with block lengths
> > > and displacements as:
> > >
> > >         blocklengths[0] = 8;
> > >         blocklengths[1] = 231670;
> > >         blocklengths[2] = 116606;
> > >
> > >         displacements[0] = 0;
> > >         displacements[1] = 8;
> > >         displacements[2] = 231678;
> > >
> > > The above displacements are in ascending order. Suppose I shuffle the
> > > order a bit:
> > >
> > >         blocklengths[0] = 8;
> > >         blocklengths[1] = 116606;
> > >         blocklengths[2] = 231670;
> > >
> > >         displacements[0] = 0;
> > >         displacements[1] = 231678;
> > >         displacements[2] = 8;
> > >
> > > It's still the same data, but I changed the order in which the
> > > block-lengths/offsets are specified. (The resulting file will have data
> > > in a different order, but that's ignored here.)
> > > Isn't this a valid specification? This second example results in a
> > > segfault (in ADIOI_GEN_WriteStrided / ADIOI_GEN_WriteStridedColl).
> > >
> > > I quickly wrote the attached program; let me know if I have missed
> > > anything obvious here.
> > >
> > > Regards,
> > > Pramod
> > >
> > > p.s. you can compile & run as:
> > >
> > > Not working => mpicxx test.cpp && mpirun -n 2 ./a.out
> > > Working     => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
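
(The attached test.cpp is not reproduced inline in the archive. The following
is a hypothetical sketch of the pattern described above, with -DUSE_ORDER
selecting the ascending displacements and "out.dat" as a placeholder file
name:)

    #include <mpi.h>
    #include <vector>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int      blocklengths[3];
        MPI_Aint displacements[3];
    #ifdef USE_ORDER
        // Ascending file displacements: works.
        blocklengths[0] = 8;      displacements[0] = 0;
        blocklengths[1] = 231670; displacements[1] = 8;
        blocklengths[2] = 116606; displacements[2] = 231678;
    #else
        // Shuffled file displacements: violates the "monotonically
        // nondecreasing" constraint on the filetype.
        blocklengths[0] = 8;      displacements[0] = 0;
        blocklengths[1] = 116606; displacements[1] = 231678;
        blocklengths[2] = 231670; displacements[2] = 8;
    #endif

        MPI_Datatype filetype;
        MPI_Type_create_hindexed(3, blocklengths, displacements,
                                 MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_Aint extent = 8 + 231670 + 116606;      // 348284 bytes per rank
        std::vector<char> buf(extent, (char)rank);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, rank * extent, MPI_BYTE, filetype,
                          "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, buf.data(), (int)extent, MPI_BYTE,
                           MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        MPI_Finalize();
        return 0;
    }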
> > >
> > >
> > >
> > > On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
> > > On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > > > Dear All,
> > > >
> > > > In one of our applications I am seeing the following error while
> > > > using the collective call MPI_File_write_all:
> > > >
> > > > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1) fd_size=102486061 off=102486469
> > > >
> > > > The non-collective version works fine.
> > > >
> > > > While looking at the call stack I came across the comment below in
> > > > mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> > > >
> > > >     /* we index into fd_end with rank_index, and fd_end was allocated
> > > >      * to be no bigger than fd->hints->cb_nodes.  If we ever violate
> > > >      * that, we're overrunning arrays.  Obviously, we should never ever
> > > >      * hit this abort */
> > > >     if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> > > >         FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): "
> > > >                 "rank_index(%d) >= fd->hints->cb_nodes (%d) "
> > > >                 "fd_size=%lld off=%lld\n",
> > > >                 rank_index, fd->hints->cb_nodes, fd_size, off);
> > > >         MPI_Abort(MPI_COMM_WORLD, 1);
> > > >     }
> > > >
> > > > I am going to look into the application and see if there is an issue
> > > > with offset overflow. But looking at the above comment ("Obviously,
> > > > we should never ever hit this abort") I thought I should ask if there
> > > > is any obvious thing I am missing.
> > >
> > > that's my comment.  The 'fd_end' array that rank_index indexes into is
> > > allocated based on the 'cb_nodes' hint.  I definitely would like to
> > > know more about how the code is manipulating rank_index, cb_nodes, and
> > > fd_end.
> > >
> > > If there is a reduced test case you can send me, that will be a huge
> > > help.
> > >
> > > ==rob
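
(Background for the cb_nodes value in the guard above: it is a standard
MPI-IO hint controlling the number of collective-buffering aggregators. A
minimal sketch of setting it and reading back what the implementation
actually chose; the file name is a placeholder:)

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_nodes", "2");   // request 2 I/O aggregators

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",   // placeholder file name
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

        // Read back the hints the implementation actually applied.
        MPI_Info used;
        MPI_File_get_info(fh, &used);
        char value[MPI_MAX_INFO_VAL + 1];
        int flag;
        MPI_Info_get(used, "cb_nodes", MPI_MAX_INFO_VAL, value, &flag);
        if (flag) std::printf("cb_nodes = %s\n", value);

        MPI_Info_free(&used);
        MPI_Info_free(&info);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }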
> > >
> > > >
> > > > Regards,
> > > > Pramod
> > > >
> > > > p.s. I will provide a reproducer after looking into this more
> > > > carefully.
> > >
> > > <test.cpp>
> >
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

