[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."

Thakur, Rajeev thakur at anl.gov
Tue Aug 22 17:27:19 CDT 2017


Yes, displacements for the filetype must be in “monotonically nondecreasing order”.
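
For example, a filetype built as follows satisfies that requirement (a minimal sketch using the block lengths/displacements from your first example below; it assumes MPI_BYTE as the element type, and the file name "out.dat" is illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Displacements 0 <= 8 <= 231678 are monotonically nondecreasing,
         * so this filetype is valid for MPI_File_set_view. */
        int          blocklengths[3]  = { 8, 231670, 116606 };
        MPI_Aint     displacements[3] = { 0, 8, 231678 };
        MPI_Datatype filetype;
        MPI_File     fh;

        MPI_Type_create_hindexed(3, blocklengths, displacements, MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);
        /* ... MPI_File_write_all(...) with a matching memory type ... */

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        MPI_Finalize();
        return 0;
    }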

Rajeev

> On Aug 22, 2017, at 3:05 PM, pramod kumbhar <pramod.s.kumbhar at gmail.com> wrote:
> 
> Hi Rob,
> 
> Thanks! Below is not exactly the same issue/error, but it is related:
> 
> While constructing a derived datatype (the filetype used for set_view), do the displacements/offsets need to be in ascending order?
> I mean, suppose I am creating a derived datatype using MPI_Type_create_hindexed (or MPI_Type_create_struct) with block lengths/displacements as:
> 
>         blocklengths[0] = 8;
>         blocklengths[1] = 231670;
>         blocklengths[2] = 116606;
> 
>         displacements[0] = 0;
>         displacements[1] = 8;
>         displacements[2] = 231678;
> 
> The above displacements are in ascending order. Now suppose I shuffle the order a bit:
> 
>         blocklengths[0] = 8;
>         blocklengths[1] = 116606;
>         blocklengths[2] = 231670;
> 
>         displacements[0] = 0;
>         displacements[1] = 231678;
>         displacements[2] = 8;
> 
> The datatype describes the same blocks; only the order in which the block lengths/offsets are specified has changed. (The resulting file will have the data in a different order, but that's ignored here.)
> Isn't this a valid specification? The second example results in a segfault (in ADIOI_GEN_WriteStrided / ADIOI_GEN_WriteStridedColl).
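> 
> For reference, the construction in question looks roughly like this (a
> sketch of the idea, not the attached test.cpp verbatim; it assumes MPI_BYTE
> elements and an already-opened file handle fh):
> 
>         int          blocklengths[3];
>         MPI_Aint     displacements[3];
>         MPI_Datatype filetype;
> 
> #ifdef USE_ORDER
>         /* ascending displacements: works */
>         blocklengths[0] = 8;      displacements[0] = 0;
>         blocklengths[1] = 231670; displacements[1] = 8;
>         blocklengths[2] = 116606; displacements[2] = 231678;
> #else
>         /* shuffled displacements: segfaults inside ROMIO */
>         blocklengths[0] = 8;      displacements[0] = 0;
>         blocklengths[1] = 116606; displacements[1] = 231678;
>         blocklengths[2] = 231670; displacements[2] = 8;
> #endif
> 
>         MPI_Type_create_hindexed(3, blocklengths, displacements,
>                                  MPI_BYTE, &filetype);
>         MPI_Type_commit(&filetype);
>         MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);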
> 
> I quickly wrote the attached program; let me know if I have missed anything obvious here.
> 
> Regards,
> Pramod
> 
> p.s. you can compile & run as:
> 
> Not working => mpicxx test.cpp && mpirun -n 2 ./a.out
> Working     => mpicxx test.cpp -DUSE_ORDER && mpirun -n 2 ./a.out
> 
> 
> 
> On Tue, Aug 22, 2017 at 5:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
> On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> > Dear All,
> >
> > In one of our applications, I am seeing the following error while using
> > the collective call MPI_File_write_all:
> >
> > Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1) fd_size=102486061 off=102486469
> >
> > The non-collective version works fine.
> >
> > While looking at the call stack, I came across the comment below in
> > mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
> >
> >     /* we index into fd_end with rank_index, and fd_end was allocated to
> >      * be no bigger than fd->hints->cb_nodes.  If we ever violate that,
> >      * we're overrunning arrays.  Obviously, we should never ever hit
> >      * this abort */
> >     if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
> >         FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
> >                 rank_index, fd->hints->cb_nodes, fd_size, off);
> >         MPI_Abort(MPI_COMM_WORLD, 1);
> >     }
> >
> > I am going to look into the application and see if there is an issue with
> > offset overflow. But given the above comment ("Obviously, we should never
> > ever hit this abort"), I thought I should ask whether there is anything
> > obvious I am missing.
> 
> That's my comment. The array indexed by 'rank_index' (fd_end) is allocated
> based on the 'cb_nodes' hint. I definitely would like to know more about
> how the code is manipulating rank_index, cb_nodes, and fd_end.
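> 
> As background, "cb_nodes" is the ROMIO hint that controls the number of
> collective-buffering aggregators; it can be set through an MPI_Info object
> when the file is opened. A minimal sketch (the file name is illustrative):
> 
>     MPI_Info info;
>     MPI_File fh;
> 
>     MPI_Info_create(&info);
>     MPI_Info_set(info, "cb_nodes", "4");   /* request 4 I/O aggregators */
>     MPI_File_open(MPI_COMM_WORLD, "out.dat",
>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
>     /* ROMIO's fd->hints->cb_nodes now reflects this value */
>     MPI_File_close(&fh);
>     MPI_Info_free(&info);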
> 
> If there is a reduced test case you can send me, that will be a huge
> help.
> 
> ==rob
> 
> >
> > Regards,
> > Pramod
> >
> > p.s. I will provide a reproducer after looking into this more carefully.
> 
> <test.cpp>

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

