[mpich-devel] MPI IO Error

Rob Latham robl at mcs.anl.gov
Mon Mar 28 09:47:08 CDT 2016



On 03/15/2016 08:14 PM, Dominic Kedelty wrote:
> Hello,
>
> I am wondering if I can get help with an error I am receiving when using
> MPI IO. I was referred here from the Open MPI list, which suggested this
> may be a ROMIO bug. I am receiving the following error:
>
> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes
> (40) fd_size=213909504 off=8617247540
> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes
> (40) fd_size=213909504 off=8617247540
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
>
> I believe this happens when the code I am using writes an EnSight Gold
> format file for a large problem running on 640 cores. I can provide more
> information if need be. I am using Open MPI 1.8.7, but I have also tried
> MVAPICH2 1.9 and get the same error; those are the two MPI installations
> currently available on the cluster I am using.


ROMIO selects a subset of processes called I/O aggregators; these 
aggregators carry out the actual I/O on behalf of everyone else. 
Generally this works great: fewer clients bang on the file system, and 
the average I/O request size increases.
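For background, the number of aggregators is controlled by the "cb_nodes" hint. Purely as an illustration of how that knob is exposed (not a claim that changing it fixes this bug): ROMIO-based MPI-IO implementations can read hints from a file named by the ROMIO_HINTS environment variable. The filename `romio_hints` below is arbitrary.

```shell
# Illustration only: supply MPI-IO hints to a ROMIO-based implementation
# via a hints file.  ROMIO reads this file at MPI_File_open time.
cat > romio_hints <<'EOF'
cb_nodes 64
romio_cb_write enable
EOF
export ROMIO_HINTS=$PWD/romio_hints
```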

In this assertion, something in ROMIO selected the 41st I/O aggregator 
(rank_index 40), but only 40 I/O aggregators were available (valid 
indices 0..39).
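The numbers in the error message show the out-of-range index directly. As a back-of-the-envelope sketch, assuming the simplified offset-to-file-domain mapping `off // fd_size` (a simplification, not ROMIO's exact ADIOI_Calc_aggregator code):

```python
# Numbers taken verbatim from the error message.
fd_size = 213_909_504    # size of each aggregator's file domain
off = 8_617_247_540      # offending file offset
cb_nodes = 40            # number of I/O aggregators available

# Simplified mapping of a file offset to an aggregator index.
rank_index = off // fd_size
print(rank_index)                 # 40
assert rank_index >= cb_nodes     # the check ROMIO trips: valid indices are 0..39
```

So the offset lands in the 41st file domain while only 40 aggregators exist, which is exactly the assertion that fires.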

Since this happens pretty far down inside ROMIO, it's going to be a lot 
easier to debug if you can provide me a reproducer or a reproducing 
recipe. EnSight Gold being a commercial CFD package's format, it might 
be difficult for me to know what it's trying to do.

Do other applications using MPI-IO work on your system?

thanks
==rob


