[mpich-discuss] ROMIO filesystem check during MPI_File_open

Jeff Squyres (jsquyres) jsquyres at cisco.com
Thu Mar 13 16:37:21 CDT 2014


Good to know.
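
For anyone finding this thread later: as I understand it, the workaround described below amounts to every rank passing "nfs:" plus the path as the file name, so that ROMIO picks the same driver everywhere instead of detecting the filesystem per process. A minimal sketch of building such a name follows. The helper prefixed_filename and the example path are hypothetical and not part of MPI or ROMIO.

```c
#include <stdio.h>

/* Hypothetical helper (not an MPI or ROMIO function): writes
 * "<prefix>:<path>" into out.  Returns 0 on success, or -1 if the
 * result would not fit in outlen bytes (including the terminator). */
static int prefixed_filename(const char *prefix, const char *path,
                             char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s:%s", prefix, path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Intended use inside an MPI program (path is made up for illustration):
 *
 *     char name[4096];
 *     MPI_File fh;
 *     if (prefixed_filename("nfs", "/shared/data.bin",
 *                           name, sizeof name) == 0)
 *         MPI_File_open(MPI_COMM_WORLD, name, MPI_MODE_RDONLY,
 *                       MPI_INFO_NULL, &fh);
 */
```

The point is only that all ranks, including the one on the NFS server, must pass the same prefixed name.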

Is it easy to add error detection to MPI_File_open so that such a mismatch is reported?
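
To make the question concrete, here is a rough sketch of the kind of check I mean. It is plain C that simulates the reduction over an array of per-rank values so it can be tried without an MPI installation; it is not ROMIO code. The enum values and the function name fs_kinds_agree are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical codes for the filesystem kind each rank detects.
 * ROMIO's internal identifiers are not reproduced here. */
enum fs_kind { FS_UFS = 1, FS_NFS = 2 };

/* Returns 1 if every rank reported the same filesystem kind, else 0.
 * kinds[i] stands in for the value detected by rank i.  In an MPI
 * program the loop would be replaced by reductions with MPI_MIN and
 * MPI_MAX (for example via MPI_Allreduce), so that every rank reaches
 * the same verdict and can raise the same error. */
static int fs_kinds_agree(const int *kinds, size_t nranks)
{
    if (nranks == 0)
        return 1;               /* nothing to disagree about */

    int lo = kinds[0];
    int hi = kinds[0];
    for (size_t i = 1; i < nranks; i++) {
        if (kinds[i] < lo) lo = kinds[i];
        if (kinds[i] > hi) hi = kinds[i];
    }
    return lo == hi;            /* min == max means all values are equal */
}
```

With the setup described below, rank 0 would contribute FS_UFS and everyone else FS_NFS, so the minimum and maximum differ and the open could fail with a clear message instead of a later MPI_Bcast truncation.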


On Mar 13, 2014, at 5:33 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

> In such cases, if the user prepends the file name with "nfs:" it should work.
> 
> Rajeev
> 
> On Mar 13, 2014, at 4:28 PM, Jeff Squyres (jsquyres) <jsquyres at cisco.com> wrote:
> 
>> We had a user run into a situation with ROMIO that caused a bunch of head-scratching before we figured it out.
>> 
>> The user was running a trivial C MPI program that failed with a truncation error in MPI_Bcast.  After much poking around, we finally figured out that it was actually failing deep within MPI_File_open.  The truncation error was caused by the user's setup:
>> 
>> - the cluster head node is *also* a compute node
>> - the cluster head node is the NFS server
>> - in MPI_COMM_WORLD rank 0 (on the head node), ROMIO saw a UFS filesystem
>> - all other MCW process ranks saw an NFS filesystem
>> - hence, during ROMIO's MPI_File_open, MCW rank 0 chose to do a UFS file open and everyone else chose to do an NFS file open
>> 
>> Things go downhill from there, eventually resulting in an MPI_Bcast mismatch.
>> 
>> Is there any thought to supporting this kind of scenario?  All MPI processes are accessing the same filesystem; it's just that some of the MPI processes happen to see that filesystem as local, and some see it as NFS.
>> 
>> Regardless, if this can't/won't be supported, can there be some kind of Allreduce before calling ADIOI_Open that determines, "Hey, we're not viewing this as the same filesystem -- we should throw an error (and possibly print a helpful error message)"?  Such cluster configurations are not entirely uncommon; erroring out with a clear, user-facing reason would be most helpful.
>> 
>> -- 
>> Jeff Squyres
>> jsquyres at cisco.com
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss


-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/



