[mpich-discuss] About an error while using mpi i/o collectives : "Error in ADIOI_Calc_aggregator(): rank_index(1)..."

pramod kumbhar pramod.s.kumbhar at gmail.com
Mon Aug 21 10:45:52 CDT 2017


Dear All,

In one of our applications I am seeing the following error while using
the collective call MPI_File_write_all:

Error in ADIOI_Calc_aggregator(): rank_index(1) >= fd->hints->cb_nodes (1)
fd_size=102486061 off=102486469

The non-collective version works fine.

While looking at the call stack I came across the comment below
in mpich-3.2/src/mpi/romio/adio/common/ad_aggregate.c:

    /* we index into fd_end with rank_index, and fd_end was allocated to be no
     * bigger than fd->hints->cb_nodes.  If we ever violate that, we're
     * overrunning arrays.  Obviously, we should never ever hit this abort */
    if (rank_index >= fd->hints->cb_nodes || rank_index < 0) {
        FPRINTF(stderr, "Error in ADIOI_Calc_aggregator(): rank_index(%d) >= fd->hints->cb_nodes (%d) fd_size=%lld off=%lld\n",
                rank_index, fd->hints->cb_nodes, fd_size, off);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

I am going to look into the application and check whether there is an
offset-overflow issue. But given the comment above ("Obviously, we should
never ever hit this abort"), I thought I should ask whether there is any
obvious thing I am missing.

Regards,
Pramod

P.S. I will provide a reproducer after looking into this more carefully.