[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."
pramod kumbhar
pramod.s.kumbhar at gmail.com
Mon Aug 21 10:45:52 CDT 2017
Dear All,
In one of our applications I am seeing the following error while using the
collective call MPI_File_write_all:
Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1)
fd_size=102486061 off=102486469
The non-collective version works fine.
While looking at the call stack I came across the comment below
in mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:
    /* we index into fd_end with rank_index, and fd_end was allocated to be
     * no bigger than fd->hints->cb_nodes.  If we ever violate that, we're
     * overrunning arrays.  Obviously, we should never ever hit this abort
     */
    if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
        FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) "
                ">= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
                rank_index, fd->hints->cb_nodes, fd_size, off);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
I am going to look into the application and check whether there is an issue
with offset overflow. But given the comment above ("Obviously, we should never
ever hit this abort"), I thought I should ask whether there is anything obvious
I am missing.
Regards,
Pramod
p.s. I will provide reproducer after looking into this more carefully.