[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."
pramod kumbhar
pramod.s.kumbhar at gmail.com
Tue Aug 22 15:05:36 CDT 2017
Hi Rob,
Thanks! The error below is not exactly the same issue, but it is related:
While constructing a derived datatype (the filetype used for MPI_File_set_view),
do the displacements/offsets need to be in ascending order?
For example, suppose I create a derived datatype
using MPI_Type_create_hindexed (or MPI_Type_create_struct) with block lengths/displacements as:
blocklengths[0] = 8;
blocklengths[1] = 231670;
blocklengths[2] = 116606;
displacements[0] = 0;
displacements[1] = 8;
displacements[2] = 231678;
The displacements above are in ascending order. Now suppose I shuffle the order a bit:
blocklengths[0] = 8;
blocklengths[1] = 116606;
blocklengths[2] = 231670;
displacements[0] = 0;
displacements[1] = 231678;
displacements[2] = 8;
It's still the same data; I have only changed the order in which the
block lengths/offsets are specified. (The resulting file will have its data
in a different order, but that is not the concern here.)
Isn't this a valid specification? The second example results in a segfault
(in ADIOI_GEN_WriteStrided / ADIOI_GEN_WriteStridedColl).
I quickly wrote the attached program; let me know if I have missed anything
obvious here.
Regards,
Pramod
p.s. you can compile & run as:
Not working => mpicxx test.cpp && mpirun -n 2 ./a.out
Working     => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
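Since the attachment itself is only reachable through the archive link further
below, here is a minimal sketch of the kind of program described above. This is
an assumed reconstruction, not the actual test.cpp; the output file name
"test.out" and the per-rank placement are placeholders.

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int      blocklengths[3];
    MPI_Aint displacements[3];
#ifdef USE_ORDER
    /* displacements in ascending order: works */
    blocklengths[0] = 8;      displacements[0] = 0;
    blocklengths[1] = 231670; displacements[1] = 8;
    blocklengths[2] = 116606; displacements[2] = 231678;
#else
    /* same blocks, displacements shuffled: reported to segfault in the
     * collective write path */
    blocklengths[0] = 8;      displacements[0] = 0;
    blocklengths[1] = 116606; displacements[1] = 231678;
    blocklengths[2] = 231670; displacements[2] = 8;
#endif

    /* filetype for MPI_File_set_view, built with MPI_Type_create_hindexed */
    MPI_Datatype filetype;
    MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);

    /* total number of bytes described by the three blocks */
    int total = blocklengths[0] + blocklengths[1] + blocklengths[2];
    std::vector<char> buf(total, (char)('A' + rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "test.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* each rank writes through the same filetype at a rank-specific offset */
    MPI_File_set_view(fh, (MPI_Offset)rank * total, MPI_BYTE, filetype,
                      "native", MPI_INFO_NULL);

    MPI_Status status;
    MPI_File_write_all(fh, buf.data(), total, MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}

(Error checking is omitted for brevity.)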
On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
> On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > Dear All,
> >
> > In one of our applications I am seeing the following error while using
> > the collective call MPI_File_write_all:
> >
> > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1)
> > fd_size=102486061 off=102486469
> >
> > The non-collective version works fine.
> >
> > While looking at the call stack I came across the comment below in
> > mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> >
> > /* we index into fd_end with rank_index, and fd_end was allocated to be no
> >  * bigger than fd->hints->cb_nodes.  If we ever violate that, we're
> >  * overrunning arrays.  Obviously, we should never ever hit this abort */
> > if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> >     FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
> >             rank_index, fd->hints->cb_nodes, fd_size, off);
> >     MPI_Abort(MPI_COMM_WORLD, 1);
> > }
> >
> > I am going to look into the application and see if there is an issue with
> > offset overflow. But given the comment above ("Obviously, we should
> > never ever hit this abort"), I thought I should ask whether there is
> > anything obvious I am missing.
>
> That's my comment. The array indexed by 'rank_index' is allocated based on
> the 'cb_nodes' hint. I definitely would like to know more about how the
> code is manipulating rank_index, cb_nodes, and fd_end.
>
> If there is a reduced test case you can send me, that will be a huge
> help.
>
> ==rob
>
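For context on the reply above: "cb_nodes" is the reserved MPI-IO info key that
specifies the number of target nodes used for collective buffering (the I/O
aggregators). Below is a minimal sketch of setting and then querying that hint;
the file name "hints.out" and the value "2" are arbitrary illustrations, not
values taken from the report.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* request two collective-buffering aggregators via the cb_nodes hint */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_nodes", "2");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "hints.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* report the hint value actually in effect for this file */
    MPI_Info used;
    MPI_File_get_info(fh, &used);
    char value[MPI_MAX_INFO_VAL];
    int  flag;
    MPI_Info_get(used, "cb_nodes", MPI_MAX_INFO_VAL - 1, value, &flag);
    if (rank == 0 && flag)
        std::printf("cb_nodes in effect: %s\n", value);

    MPI_Info_free(&used);
    MPI_Info_free(&info);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}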
> >
> > Regards,
> > Pramod
> >
> > p.s. I will provide a reproducer after looking into this more
> > carefully.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.cpp
Type: text/x-c++src
Size: 2499 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170822/2b1349d8/attachment.bin>
-------------- next part --------------
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss