[mpich-discuss] Issue with MPICH 4.0b1 and MPI I/O and ROMIO...maybe?

Latham, Robert J. robl at mcs.anl.gov
Mon Dec 20 09:20:48 CST 2021


On Fri, 2021-12-17 at 19:10 +0000, Thompson, Matt (GSFC-610.1)[SCIENCE
SYSTEMS AND APPLICATIONS INC] via discuss wrote:
> MPICH Discuss,
> NetCDF: Error initializing for parallel access
>  
> I did a bit of debugging and found that the crash was due to an
> environment variable that was set because my application mistakenly
> thought I was running Intel MPI (mainly because we didn't have
> detection for MPICH, so it defaulted to our "default" on this cluster
> of Intel MPI). When it sees Intel MPI, it sets:
>  
>   ROMIO_FSTYPE_FORCE="gpfs:"
>  
> which we've found is useful when running with Intel MPI on our GPFS
> system.

Hah, I had no idea anyone besides me was using this little feature. Here's a bit more information for context:

https://press3.mcs.anl.gov/romio/2019/02/20/useful-environment-variables/

 
> So, of course the "right" thing to do is not to set that. (Doctor, it
> hurts when I do this. So stop doing that.)
>  
> But it got me wondering, is there perhaps a "better" way I should be
> building MPICH? Should this flag cause this sort of crash? Or does it
> mean I build MPICH/ROMIO incorrectly or incompletely (no GPFS
> support, say)?

Intel MPI has its own way of supporting extra file systems, via the "I_MPI_EXTRA_FILESYSTEM" environment variable.  

https://press3.mcs.anl.gov/romio/2014/06/12/romio-and-intel-mpi/

I'm open to suggestions on how ROMIO should best handle this situation.  You requested a file system (gpfs) that your build of ROMIO did not support.  ROMIO could fall back to "generic unix file system", but you asked explicitly for GPFS, presumably for a good reason (or in your case, by accident, but ROMIO is not a mind reader...).
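
On the application side, one way to avoid the accident is to set ROMIO_FSTYPE_FORCE only when the MPI library positively identifies itself as Intel MPI, and to set nothing otherwise. A rough sketch, not a tested recipe: `romio_env_for` is a hypothetical helper name, and the "Intel(R) MPI" prefix is my assumption about what MPI_Get_library_version reports, so check it against your own installation.

```python
def romio_env_for(library_version):
    """Return extra environment settings for the detected MPI library.

    library_version is the string reported by MPI_Get_library_version;
    how the application obtains it is left out of this sketch.
    """
    # Assumption: Intel MPI's version string starts with "Intel(R) MPI".
    if library_version.startswith("Intel(R) MPI"):
        return {"ROMIO_FSTYPE_FORCE": "gpfs:"}
    # Anything else (MPICH, or an unrecognized library): force nothing
    # and let ROMIO detect the file system on its own.
    return {}
```

The point of the design is that an unrecognized MPI gets no forced file system type, rather than inheriting the Intel MPI default.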

If you are building your own MPICH, add the `--with-file-system=...` flag.  That is a '+'-delimited list of file systems.  For example:

`--with-file-system=ufs+testfs+gpfs+lustre+panfs+pvfs2+unify`

I try to build everything I can so my list is quite long.  Yours will be shorter -- if you pick a file system for which you do not have development headers and libraries, your build will fail (hopefully at configure time).
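
If your build is scripted, the flag can be assembled from a list of names. This is only a sketch: `file_system_flag` is a hypothetical helper, and it does not check that the headers and libraries for each file system are actually installed.

```python
def file_system_flag(file_systems):
    """Build the --with-file-system configure flag from a list of names."""
    selected = []
    for fs in file_systems:
        # Keep the first occurrence of each name, preserving order.
        if fs not in selected:
            selected.append(fs)
    if not selected:
        raise ValueError("need at least one file system")
    # The flag takes a '+'-delimited list.
    return "--with-file-system=" + "+".join(selected)


if __name__ == "__main__":
    # A shorter list than mine: generic unix, the test file system, and GPFS.
    print(file_system_flag(["ufs", "testfs", "gpfs"]))
```

The resulting string is passed to MPICH's configure along with your other options.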

==rob

