[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."

Latham, Robert J. robl at mcs.anl.gov
Tue Aug 22 20:07:52 CDT 2017


On Tue, 2017-08-22 at 22:27 +0000, Thakur, Rajeev wrote:
> Yes, displacements for the filetype must be in “monotonically
> nondecreasing order”.

... which sounds pretty restrictive, but there is no such constraint on the
memory datatype.  Folks work around it by shuffling the memory addresses to
match the ascending file offsets.
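Something along these lines, for example (a minimal sketch; the Block struct
and the write_blocks() helper are only illustrative, not code from this
thread):

    #include <mpi.h>
    #include <algorithm>
    #include <vector>

    struct Block {
        MPI_Offset file_off;   // where the block goes in the file
        MPI_Aint   mem_disp;   // where the block lives in memory, relative to buf
        int        len;        // number of MPI_BYTEs in the block
    };

    // Sort the block descriptions by file offset so the filetype displacements
    // are monotonically nondecreasing, as the standard requires.  The memory
    // displacements are carried along and may end up in any order, which is fine.
    static void write_blocks(MPI_File fh, const void *buf, std::vector<Block> blocks)
    {
        std::sort(blocks.begin(), blocks.end(),
                  [](const Block &a, const Block &b) { return a.file_off < b.file_off; });

        std::vector<int>      lens;
        std::vector<MPI_Aint> file_disps, mem_disps;
        for (const Block &b : blocks) {
            lens.push_back(b.len);
            file_disps.push_back((MPI_Aint) b.file_off);
            mem_disps.push_back(b.mem_disp);
        }

        MPI_Datatype filetype, memtype;
        MPI_Type_create_hindexed((int) blocks.size(), lens.data(), file_disps.data(),
                                 MPI_BYTE, &filetype);
        MPI_Type_create_hindexed((int) blocks.size(), lens.data(), mem_disps.data(),
                                 MPI_BYTE, &memtype);
        MPI_Type_commit(&filetype);
        MPI_Type_commit(&memtype);

        MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, buf, 1, memtype, MPI_STATUS_IGNORE);

        MPI_Type_free(&filetype);
        MPI_Type_free(&memtype);
    }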

==rob

> 
> Rajeev
> 
> > > On Aug 22, 2017, at 3:05 PM, pramod kumbhar <pramod.s.kumbhar at gmail.com> wrote:
> > 
> > Hi Rob,
> > 
> > Thanks! Below is not exactly the same issue/error, but it is related:
> > 
> > While constructing a derived datatype (the filetype used for set_view),
> > do the displacements / offsets need to be in ascending order?
> > I mean, suppose I am creating a derived datatype using
> > MPI_Type_create_hindexed (or MPI_Type_create_struct) with
> > block lengths / displacements as:
> > 
> >         blocklengths[0] = 8;
> >         blocklengths[1] = 231670;
> >         blocklengths[2] = 116606;
> > 
> >         displacements[0] = 0;
> >         displacements[1] = 8;
> >         displacements[2] = 231678;
> > 
> > The above displacements are in ascending order. Suppose I shuffle the
> > order a bit:
> > 
> >         blocklengths[0] = 8;
> >         blocklengths[1] = 116606;
> >         blocklengths[2] = 231670;
> > 
> >         displacements[0] = 0;
> >         displacements[1] = 231678;
> >         displacements[2] = 8;
> > 
> > It describes the same blocks; I only changed the order in which the
> > block lengths / offsets are specified. (The resulting file will have the
> > data in a different order, but that's ignored here.)
> > Isn't this a valid specification? This second example results in a
> > segfault (in ADIOI_GEN_WriteStrided / ADIOI_GEN_WriteStridedColl).
> > 
> > I quickly wrote the attached program; let me know if I have missed
> > anything obvious here.
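> > 
> > For reference, a condensed sketch of roughly what the attached program
> > does (the real test.cpp may differ, e.g. in how ranks are offset):
> > 
> >     #include <mpi.h>
> >     #include <vector>
> > 
> >     int main(int argc, char **argv) {
> >         MPI_Init(&argc, &argv);
> >         std::vector<char> buf(8 + 231670 + 116606, 0);   // one byte per element
> > 
> >     #ifdef USE_ORDER
> >         int      lens[3]  = {8, 231670, 116606};         // ascending displacements
> >         MPI_Aint disps[3] = {0, 8, 231678};
> >     #else
> >         int      lens[3]  = {8, 116606, 231670};         // shuffled displacements
> >         MPI_Aint disps[3] = {0, 231678, 8};
> >     #endif
> > 
> >         MPI_Datatype ftype;
> >         MPI_Type_create_hindexed(3, lens, disps, MPI_BYTE, &ftype);
> >         MPI_Type_commit(&ftype);
> > 
> >         MPI_File fh;
> >         MPI_File_open(MPI_COMM_WORLD, "out.dat",
> >                       MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
> >         MPI_File_set_view(fh, 0, MPI_BYTE, ftype, "native", MPI_INFO_NULL);
> >         MPI_File_write_all(fh, buf.data(), (int) buf.size(), MPI_BYTE,
> >                            MPI_STATUS_IGNORE);
> > 
> >         MPI_File_close(&fh);
> >         MPI_Type_free(&ftype);
> >         MPI_Finalize();
> >         return 0;
> >     }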
> > 
> > Regards,
> > Pramod
> > 
> > p.s. you can compile & run as:
> > 
> > Not working => mpicxx  test.cpp && mpirun -n 2 ./a.out
> > Working => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
> > 
> > 
> > 
> > On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
> > On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > > Dear All,
> > > 
> > > In one of our applications I am seeing the following error while using
> > > the collective call MPI_File_write_all:
> > > 
> > > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1) fd_size=102486061 off=102486469
> > > 
> > > The non-collective version works fine.
> > > 
> > > While looking at the call stack I came across the comment below in
> > > mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> > > 
> > >     /* we index into fd_end with rank_index, and fd_end was allocated to be no
> > >      * bigger than fd->hins->cb_nodes.   If we ever violate that, we're
> > >      * overrunning arrays.  Obviously, we should never ever hit this abort */
> > >     if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> > >         FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
> > >             rank_index,fd->hints->cb_nodes,fd_size,off);
> > >         MPI_Abort(MPI_COMM_WORLD, 1);
> > >     }
> > > 
> > > I am going to look into the application and see if there is an issue
> > > with offset overflow. But looking at the above comment ("Obviously, we
> > > should never ever hit this abort"), I thought I should ask if there is
> > > anything obvious I am missing.
> > 
> > that's my comment.  The array that 'rank_index' indexes into is allocated
> > based on the 'cb_nodes' hint.  I would definitely like to know more about
> > how the code ends up manipulating rank_index, cb_nodes, and fd_end.
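> > 
> > For what it's worth, the cb_nodes hint can be set and read back through
> > the info object, which may help while narrowing this down.  A sketch,
> > with the value "2" chosen arbitrarily:
> > 
> >     #include <mpi.h>
> >     #include <stdio.h>
> > 
> >     int main(int argc, char **argv) {
> >         MPI_Init(&argc, &argv);
> > 
> >         MPI_Info info;
> >         MPI_Info_create(&info);
> >         MPI_Info_set(info, "cb_nodes", "2");   /* request 2 I/O aggregators */
> > 
> >         MPI_File fh;
> >         MPI_File_open(MPI_COMM_WORLD, "out.dat",
> >                       MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
> > 
> >         /* read back the hints the library actually ended up using */
> >         MPI_Info used;
> >         MPI_File_get_info(fh, &used);
> >         char val[MPI_MAX_INFO_VAL + 1];
> >         int flag;
> >         MPI_Info_get(used, "cb_nodes", MPI_MAX_INFO_VAL, val, &flag);
> >         if (flag) printf("cb_nodes = %s\n", val);
> > 
> >         MPI_Info_free(&used);
> >         MPI_Info_free(&info);
> >         MPI_File_close(&fh);
> >         MPI_Finalize();
> >         return 0;
> >     }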
> > 
> > If there is a reduced test case you can send me, that will be a huge help.
> > 
> > ==rob
> > 
> > > 
> > > Regards,
> > > Pramod
> > > 
> > > p.s. I will provide a reproducer after looking into this more
> > > carefully.
> > 
> > <test.cpp>
> 
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

