[mpich-discuss] [EXTERNAL] Re: Issue with MPICH 4.0b1 and MPI I/O and ROMIO...maybe?

Latham, Robert J. robl at mcs.anl.gov
Wed Dec 22 12:59:05 CST 2021


On Wed, 2021-12-22 at 17:35 +0000, Thompson, Matt (GSFC-610.1)[SCIENCE
SYSTEMS AND APPLICATIONS INC] wrote:
> Rob,
> 
> Thanks. So for a cluster that is mainly "regular linux" (not sure
> what they run for non-GPFS), GPFS, some tmpfs probably, and a whole
> bunch of NFS, what would recommend as a good set of filesystems?
> Would something like:
> 
>   --with-file-system=gpfs
> 
> be enough, or once you have that flag does that do enough?

If you have any gpfs anywhere, then yeah, you can add the 'gpfs' flag.

You probably want "--with-file-system=ufs+gpfs"

ROMIO has an NFS driver, but I'll warn you it is not -- and in fact cannot be -- very good.  There are many cases where writing data to NFS in parallel will leave you with corrupted data.  Even if you manage to avoid corruption, your performance will be terrible, because ROMIO tries really hard to ensure data correctness: it wraps each and every I/O call with a lock/unlock in an effort to force client-side data cache flushes.
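
Just to illustrate the pattern (this is a rough sketch, not the actual ADIO code):

    /* Simplified sketch of the byte-range locking ROMIO's NFS driver
     * wraps around every I/O call.  Taking and releasing an fcntl lock
     * is what forces the NFS client to flush/revalidate its cache for
     * that range of the file. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    static ssize_t locked_pwrite(int fd, const void *buf, size_t len,
                                 off_t offset)
    {
        struct flock lk;
        lk.l_type   = F_WRLCK;        /* exclusive lock over the region */
        lk.l_whence = SEEK_SET;
        lk.l_start  = offset;
        lk.l_len    = (off_t) len;
        fcntl(fd, F_SETLKW, &lk);     /* wait until the lock is granted */

        ssize_t ret = pwrite(fd, buf, len, offset);

        lk.l_type = F_UNLCK;          /* unlock: client cache gets flushed */
        fcntl(fd, F_SETLKW, &lk);
        return ret;
    }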

> Also, I don't see --with-file-system in the configure help? Does the
> MPICH configure "pass down" the option to (I assume) ROMIO?

Correct.  ROMIO is both its own thing and part of MPICH.  Uh, sorry about that.

==rob

> Matt
> -- 
> Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
> NASA GSFC,    Global Modeling and Assimilation Office
> Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
> Phone: 301-614-6712                 Fax: 301-614-6246
> http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
> 
> On 12/20/21, 10:21 AM, "Latham, Robert J." <robl at mcs.anl.gov> wrote:
> 
>     On Fri, 2021-12-17 at 19:10 +0000, Thompson, Matt (GSFC-610.1)[SCIENCE
>     SYSTEMS AND APPLICATIONS INC] via discuss wrote:
>     > MPICH Discuss,
>     > NetCDF: Error initializing for parallel access
>     >  
>     > I did a bit of debugging and found that the crash was due to an
>     > environment variable that was set because my application mistakenly
>     > thought I was running Intel MPI (mainly because we didn't have
>     > detection for MPICH, so it defaulted to our "default" on this cluster
>     > of Intel MPI). When it sees Intel MPI, it sets:
>     >  
>     >   ROMIO_FSTYPE_FORCE="gpfs:"
>     >  
>     > which we've found is useful when running with Intel MPI on our GPFS
>     > system.
> 
>     Hah I had no idea anyone besides me was using this little feature.
>     Here's a bit more information for context:
> 
>     https://press3.mcs.anl.gov/romio/2019/02/20/useful-environment-variables/
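> 
>     (The same forcing also works on a per-file basis: prefix the file
>     name you pass to MPI_File_open with the file system type.  A tiny
>     sketch, with a made-up path, assuming a GPFS-enabled build:)
> 
>         /* Sketch: forcing ROMIO's GPFS driver for one file by prefixing
>          * the file name instead of setting ROMIO_FSTYPE_FORCE. */
>         #include <mpi.h>
> 
>         int main(int argc, char **argv)
>         {
>             MPI_File fh;
>             MPI_Init(&argc, &argv);
>             MPI_File_open(MPI_COMM_WORLD, "gpfs:/gpfs/scratch/example.out",
>                           MPI_MODE_CREATE | MPI_MODE_WRONLY,
>                           MPI_INFO_NULL, &fh);
>             MPI_File_close(&fh);
>             MPI_Finalize();
>             return 0;
>         }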
> 
> 
>     > So, of course the "right" thing to do is not to set that. (Doctor,
>     > it hurts when I do this. So stop doing that.)
>     >  
>     > But it got me wondering, is there perhaps a "better" way I should be
>     > building MPICH? Should this flag cause this sort of crash? Or does it
>     > mean I build MPICH/ROMIO incorrectly or incompletely (no GPFS
>     > support, say)?
> 
>     Intel MPI has its own way of supporting extra file systems, via the
>     "I_MPI_EXTRA_FILESYSTEM" environment variable.
> 
>     https://press3.mcs.anl.gov/romio/2014/06/12/romio-and-intel-mpi/
> 
>     I'm open to suggestions on how ROMIO should best handle this
>     situation.  You requested a file system (gpfs) that ROMIO did not
>     support.  ROMIO could fall back to "generic unix file system" but you
>     asked explicitly for GPFS, presumably for a good reason (or in your
>     case, by accident but ROMIO is not a mind reader...)
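> 
>     (If you want a quick way to see what your build actually supports,
>     you can just try the open and print the error string: file handles
>     default to MPI_ERRORS_RETURN, so a forced-but-unsupported file
>     system should come back as an error code rather than an abort.  A
>     quick sketch, with a made-up path:)
> 
>         #include <mpi.h>
>         #include <stdio.h>
> 
>         int main(int argc, char **argv)
>         {
>             MPI_File fh;
>             char msg[MPI_MAX_ERROR_STRING];
>             int err, len;
> 
>             MPI_Init(&argc, &argv);
>             /* made-up path; the point is the forced "gpfs:" prefix */
>             err = MPI_File_open(MPI_COMM_WORLD, "gpfs:/gpfs/scratch/probe.out",
>                                 MPI_MODE_CREATE | MPI_MODE_WRONLY,
>                                 MPI_INFO_NULL, &fh);
>             if (err != MPI_SUCCESS) {
>                 MPI_Error_string(err, msg, &len);
>                 printf("open failed: %s\n", msg);
>             } else {
>                 MPI_File_close(&fh);
>             }
>             MPI_Finalize();
>             return 0;
>         }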
> 
>     If you are building your own MPICH, add the `--with-file-system=...`
>     flag.  That is a '+'-delimited list of file systems.  For example:
> 
>     `--with-file-system=ufs+testfs+gpfs+lustre+panfs+pvfs2+unify`
> 
>     I try to build everything I can, so my list is quite long.  Yours
>     will be shorter -- if you pick a file system for which you do not
>     have development headers and libraries, your build will fail
>     (hopefully at configure time).
> 
>     ==rob
> 


