[mpich-discuss] Issue with MPICH 4.0b1 and MPI I/O and ROMIO...maybe?

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Fri Dec 17 13:10:14 CST 2021


MPICH Discuss,

So this is an odd one. Thanks to Hui Zhou from this list, I was able to build MPICH 4.0b1. I then built our libraries and the application I work on, and everything seemed to compile just fine. I then ran my model and...crash:

  Error opening file species.data        -115
  NetCDF: Error initializing for parallel access

I did a bit of debugging and found that the crash was due to an environment variable that was set because my application mistakenly thought I was running Intel MPI (mainly because we didn't have detection for MPICH, so it fell back to this cluster's default, Intel MPI). When it sees Intel MPI, it sets:

  ROMIO_FSTYPE_FORCE="gpfs:"

which we've found is useful when running with Intel MPI on our GPFS system.
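
For illustration, the missing MPICH detection could look something like the sketch below. It is only a sketch under stated assumptions, not what the application actually does: the function names `detect_mpi_flavor` and `set_romio_hint` are hypothetical, and the banner strings it matches ("Intel(R) MPI" for Intel MPI, "HYDRA" for MPICH's launcher, "Open MPI" for Open MPI) are assumptions about what `mpirun --version` prints and should be checked on the actual system.

```shell
#!/bin/sh
# Classify an MPI launcher from its version banner (hypothetical helper).
# Intel MPI is checked first because it is MPICH-derived and its banner
# could otherwise match the more general MPICH patterns.
detect_mpi_flavor() {
    case "$1" in
        *"Intel(R) MPI"*) echo intelmpi ;;
        *HYDRA*|*MPICH*)  echo mpich ;;
        *"Open MPI"*)     echo openmpi ;;
        *)                echo unknown ;;
    esac
}

# Export the ROMIO hint only for the flavor where it is known to help;
# clear it for everything else so a stale value cannot leak through.
set_romio_hint() {
    if [ "$1" = intelmpi ]; then
        export ROMIO_FSTYPE_FORCE="gpfs:"
    else
        unset ROMIO_FSTYPE_FORCE
    fi
}
```

It would be called before launching the model, for example as `set_romio_hint "$(detect_mpi_flavor "$(mpirun --version 2>&1)")"`. Treating an unrecognized launcher as "do not set the hint" is a deliberate choice here, since the failure above came from defaulting to Intel MPI behavior.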

I whipped up a little netCDF reader executable that essentially does:

  call check( nf90_open(FILE_NAME, IOR(NF90_NOWRITE, NF90_MPIIO), comm=MPI_COMM_WORLD, info=MPI_INFO_NULL, ncid=ncid) )

and then:

  $ mpirun -np 2 ./simple_xy_rd_mpiio.exe
   OX_in(1,1:5):    7.1448361893544643E-008   1.7177123368128377E-007   2.8831971121690003E-007   3.7688394627366506E-007   4.6076729631749913E-007
   OX_in(1,1:5):    7.1448361893544643E-008   1.7177123368128377E-007   2.8831971121690003E-007   3.7688394627366506E-007   4.6076729631749913E-007
   *** SUCCESS reading example file species.data!
   *** SUCCESS reading example file species.data!
  $ ROMIO_FSTYPE_FORCE="gpfs:" mpirun -np 2 ./simple_xy_rd_mpiio.exe
   NetCDF: Error initializing for parallel access
   NetCDF: Error initializing for parallel access
  STOP Stopped
  STOP Stopped

So, of course the "right" thing to do is not to set that. (Doctor, it hurts when I do this. So stop doing that.)

But it got me wondering: is there perhaps a "better" way I should be building MPICH? Should this flag cause this sort of crash? Or does it mean I built MPICH/ROMIO incorrectly or incompletely (no GPFS support, say)?

Thanks,
Matt
--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson