[mpich-devel] MPI IO Error
Rob Latham
robl at mcs.anl.gov
Mon Mar 28 09:47:08 CDT 2016
On 03/15/2016 08:14 PM, Dominic Kedelty wrote:
> Hello,
>
> I am wondering if I can get help with an error I am receiving when using
> MPI IO. I was referred here by openmpi saying that this was a possible
> ROMIO bug. I am receiving the following error
>
> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes
> (40) fd_size=213909504 off=8617247540
> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes
> (40) fd_size=213909504 off=8617247540
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
>
> This is happening, I believe, when the code I am using writes an
> EnSight Gold format file for a large problem that I am running on 640
> cores. I can provide more information if need be. I am using Open MPI
> 1.8.7, but I have also tried MVAPICH2 1.9 and get the same error; those
> are the two MPI implementations currently available on the cluster I
> am using.
ROMIO selects a subset of processes called I/O aggregators; these
aggregators carry out the I/O on behalf of all the other ranks.
Generally this works great: fewer clients are banging on the file
system, and the average I/O request size increases.
In this assertion, something in ROMIO selected the 41st I/O aggregator,
but only 40 I/O aggregators were available.
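While tracking this down, you can experiment with the aggregator count
yourself. ROMIO reads hints from a file named by the ROMIO_HINTS
environment variable; a sketch like the following (the value 64 and the
file names are just examples) overrides how many aggregators get used:

```shell
# Example only: write ROMIO hints to a file and point ROMIO at it.
cat > romio_hints.txt <<'EOF'
cb_nodes 64
romio_cb_write enable
EOF
export ROMIO_HINTS=$PWD/romio_hints.txt
# then launch the application as usual, e.g.:
# mpiexec -n 640 ./your_app
```

The same hints can also be set per-file through an MPI_Info object
passed to MPI_File_open, if you can modify the application.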
Since this happens pretty far down inside ROMIO, it's going to be a lot
easier to debug if you can provide me a reproducer or a reproducing
recipe. Since EnSight Gold comes from a commercial CFD package, it
might be difficult for me to know what it's trying to do.
Do other applications using MPI-IO work on your system?
thanks
==rob
>
>
> _______________________________________________
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/devel
>