[mpich-discuss] ROMIO filesystem check during MPI_File_open
Rob Latham
robl at mcs.anl.gov
Fri Mar 14 10:23:10 CDT 2014
On 03/13/2014 04:28 PM, Jeff Squyres (jsquyres) wrote:
> We had a user run into a situation with ROMIO that caused a bunch of head-scratching before we figured it out.
>
> The user was running a trivial C MPI program that failed with a truncation error in MPI_Bcast. After much poking around, we finally figured out that it was actually failing deep within MPI_File_open. The reason it was getting a truncation error is because of the user's setup:
>
> - the cluster head node is *also* a compute node
> - the cluster head node is the NFS server
> - in MPI_COMM_WORLD rank 0 (on the head node), ROMIO saw a UFS filesystem
> - all other MCW process ranks saw an NFS filesystem
> - hence, during ROMIO's MPI_File_open, MCW rank 0 chose to do a UFS file open and everyone else chose to do an NFS file open
>
> Things go downhill from there, eventually resulting in an MPI_Bcast mismatch.
>
> Is there any thought to supporting this kind of scenario? All MPI processes are accessing the same filesystem; it's just that some of the MPI processes happen to see that filesystem as local, and some see it as NFS.
>
I thought we handled this? We certainly seem to have made an effort:
https://trac.mpich.org/projects/mpich/browser/src/mpi/romio/adio/common/ad_fstype.c#L644
If someone configured ROMIO without NFS (good for them!), then we will
do a "scalable stat" from one process, instead of from all zillion
processes (be nice to your file system...)
If NFS is enabled, then we have no choice: everyone must stat the file
system.
So everyone goes and stats the file system to see what kind of driver to
use. Did everyone agree? Check out line 709:
708 /* ensure everyone came up with the same file system type */
709 MPI_Allreduce(&file_system, &min_code, 1, MPI_INT,
710 MPI_MIN, comm);
711 if (min_code == ADIO_NFS) file_system = ADIO_NFS;
And yes, ADIO_NFS is ROMIO's first parallel file system -- despite it
being evil and we hates it.
Now, you know as well as I the old adage about tested code and broken
code. I can't imagine the last time this code was exercised...
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA