[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."
Thakur, Rajeev
thakur at anl.gov
Tue Aug 22 17:27:19 CDT 2017
Yes, displacements for the filetype must be in “monotonically nondecreasing order”.
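
For illustration, here is a minimal sketch of a filetype whose displacements satisfy this, using the block lengths / displacements from the example below (plain MPI calls, error checking omitted; the output file name and buffer contents are just placeholders):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* byte displacements are monotonically nondecreasing */
        int          blocklengths[3]  = {8, 231670, 116606};
        MPI_Aint     displacements[3] = {0, 8, 231678};

        MPI_Datatype filetype;
        MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* the three blocks are back-to-back, so one filetype covers 348284 bytes;
           give each rank its own disjoint region of the file */
        MPI_Offset total = 8 + 231670 + 116606;
        MPI_File_set_view(fh, rank * total, MPI_BYTE, filetype, "native", MPI_INFO_NULL);

        std::vector<char> buf(total, 'x');
        MPI_File_write_all(fh, buf.data(), (int)buf.size(), MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        MPI_Finalize();
        return 0;
    }
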
Rajeev
> On Aug 22, 2017, at 3:05 PM, pramod kumbhar <pramod.s.kumbhar at gmail.com> wrote:
>
> Hi Rob,
>
> Thanks! Below is not exactly the same issue/error, but it is related:
>
> While constructing a derived datatype (the filetype used for set_view), do the displacements / offsets need to be in ascending order?
> I mean, suppose I am creating a derived datatype using MPI_Type_create_hindexed (or an MPI struct type) with block lengths / displacements as:
>
> blocklengths[0] = 8;
> blocklengths[1] = 231670;
> blocklengths[2] = 116606;
>
> displacements[0] = 0;
> displacements[1] = 8;
> displacements[2] = 231678;
>
> The above displacements are in ascending order. Suppose I shuffle the order a bit:
>
> blocklengths[0] = 8;
> blocklengths[1] = 116606;
> blocklengths[2] = 231670;
>
> displacements[0] = 0;
> displacements[1] = 231678;
> displacements[2] = 8;
>
> The described regions are the same, but I have specified the block lengths / offsets in a different order. (The resulting file will have its data in a different order, but that's ignored here.)
> Isn't this a valid specification? This second example results in a segfault (in ADIOI_GEN_WriteStrided / Coll).
>
> I quickly wrote the attached program; let me know if I have missed anything obvious here.
>
> Regards,
> Pramod
>
> p.s. you can compile & run as:
>
> Not working => mpicxx test.cpp && mpirun -n 2 ./a.out
> Working => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
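>
> Roughly, USE_ORDER just selects which ordering is used when building the filetype (a fragment of the idea only, not the attached test.cpp itself):
>
>     #ifdef USE_ORDER   /* ascending displacements: works */
>     int      blocklengths[3]  = {8, 231670, 116606};
>     MPI_Aint displacements[3] = {0, 8, 231678};
>     #else              /* shuffled displacements: segfaults in the collective write */
>     int      blocklengths[3]  = {8, 116606, 231670};
>     MPI_Aint displacements[3] = {0, 231678, 8};
>     #endif
>     MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
>     MPI_Type_commit(&filetype);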
>
>
>
> On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
> On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > Dear All,
> >
> > In one of our applications I am seeing the following error while using
> > the collective call MPI_File_write_all:
> >
> > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1) fd_size=102486061 off=102486469
> >
> > The non-collective version works fine.
> >
> > While looking at the call stack I came across the below comment in mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> >
> > /* we index into fd_end with rank_index, and fd_end was allocated to be no
> >  * bigger than fd->hints->cb_nodes. If we ever violate that, we're
> >  * overrunning arrays. Obviously, we should never ever hit this abort */
> > if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> >     FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
> >             rank_index, fd->hints->cb_nodes, fd_size, off);
> >     MPI_Abort(MPI_COMM_WORLD, 1);
> > }
> >
> > I am going to look into the application and see if there is an issue with
> > offset overflow. But looking at the above comment ("Obviously, we should
> > never ever hit this abort") I thought I should ask if there is anything
> > obvious I am missing.
>
> That's my comment. The array indexed by 'rank_index' is allocated based on the
> 'cb_nodes' hint. I definitely would like to know more about how the
> code is manipulating rank_index, cb_nodes, and fd_end.
>
> If there is a reduced test case you can send me, that will be a huge
> help.
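>
> For reference, cb_nodes is the standard MPI-IO hint controlling how many
> aggregator nodes are used for collective buffering; it can be set explicitly
> through an MPI_Info object passed to MPI_File_open. A minimal sketch, with an
> arbitrary example value and file name:
>
>     MPI_Info info;
>     MPI_Info_create(&info);
>     MPI_Info_set(info, "cb_nodes", "2");   /* ask for 2 collective-buffering aggregators */
>     MPI_File fh;
>     MPI_File_open(MPI_COMM_WORLD, "out.dat",
>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
>     MPI_Info_free(&info);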
>
> ==rob
>
> >
> > Regards,
> > Pramod
> >
> > p.s. I will provide a reproducer after looking into this more carefully.
>
> <test.cpp>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss