[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."
Latham, Robert J.
robl at mcs.anl.gov
Tue Aug 22 10:25:37 CDT 2017
On Mon, 2017-08-21 at 17:45 +0200, pramod kumbhar wrote:
> Dear All,
>
> In one of our applications I am seeing the following error while using
> the collective call MPI_File_write_all:
>
> Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1) fd_size=102486061 off=102486469
>
> The non-collective version works fine.
>
> While looking at the call stack I came across the comment below in
> mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
>
>     /* we index into fd_end with rank_index, and fd_end was allocated to be no
>      * bigger than fd->hints->cb_nodes.  If we ever violate that, we're
>      * overrunning arrays.  Obviously, we should never ever hit this abort */
>     if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
>         FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): "
>                 "rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
>                 rank_index, fd->hints->cb_nodes, fd_size, off);
>         MPI_Abort(MPI_COMM_WORLD, 1);
>     }
>
> I am going to look into the application and see if there is an issue
> with offset overflow. But given the comment above ("Obviously, we
> should never ever hit this abort"), I thought I should ask if there is
> anything obvious I am missing.
That's my comment. The 'fd_end' array that 'rank_index' indexes into is
allocated based on the 'cb_nodes' hint. I would definitely like to know
more about how the code ends up manipulating rank_index, cb_nodes, and
fd_end.
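One quick sanity check: you can ask the file handle what ROMIO actually
settled on for cb_nodes after the open. A rough sketch (the helper name
and how you call it are just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Print the cb_nodes value the MPI-IO layer reports for an open file. */
    static void print_cb_nodes(MPI_File fh, int rank)
    {
        MPI_Info info;
        char value[MPI_MAX_INFO_VAL + 1];
        int flag;

        MPI_File_get_info(fh, &info);
        MPI_Info_get(info, "cb_nodes", MPI_MAX_INFO_VAL, value, &flag);
        if (rank == 0)
            printf("cb_nodes = %s\n", flag ? value : "(not set)");
        MPI_Info_free(&info);
    }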
If there is a reduced test case you can send me, that will be a huge
help.
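For reference, the shape I'm after is roughly the following: a handful of
ranks doing one MPI_File_write_all through a file view that mirrors your
hints and offsets (the file name, the per-rank count, and the cb_nodes
value below are placeholders, not a claim about your setup). Note the
displacement is computed in MPI_Offset arithmetic, which also rules out
the offset overflow you mention:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int count = 1 << 20;   /* placeholder: ints written per rank */
        int *buf;
        MPI_File fh;
        MPI_Info info;
        MPI_Offset disp;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Mirror the hints the application sets; "1" is a placeholder. */
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_nodes", "1");

        buf = malloc(count * sizeof(int));
        for (int i = 0; i < count; i++)
            buf[i] = rank;

        /* Compute the displacement in MPI_Offset arithmetic, not int. */
        disp = (MPI_Offset) rank * count * sizeof(int);

        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_File_set_view(fh, disp, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, buf, count, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Info_free(&info);
        free(buf);
        MPI_Finalize();
        return 0;
    }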
==rob
>
> Regards,
> Pramod
>
> p.s. I will provide a reproducer after looking into this more
> carefully.
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss