[mpich-discuss] ROMIO filesystem check during MPI_File_open

Rob Latham robl at mcs.anl.gov
Mon Mar 24 16:54:40 CDT 2014



On 03/24/2014 04:37 PM, Jeff Squyres (jsquyres) wrote:
> On Mar 24, 2014, at 5:17 PM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>> Every process will call ADIO_FileSysType_fncall() in the case where have_nfs_enabled==1.  Your interpretation of the code is so different from mine that I'm just going to have to paste what I'm looking at and you can tell me where I'm wrong.  These lines come from MPICH's ad_fstype.c.  The content is the same in openmpi-1.6.4, just shifted by 25 or so lines:
>>
>> 637     ADIO_FileSysType_fncall(filename, &file_system, &myerrcode);
>> 638     if (myerrcode != MPI_SUCCESS) {
>
> This is where our disconnect is occurring: _fncall returns myerrcode==MPI_SUCCESS in all cases.
>
> Meaning: the block with the Allreduce is not entered.  This causes everyone to just use the file_system value they got from _fncall: one will have ADIO_UFS, and the rest will have ADIO_NFS.
>
> /Me checks again just to make sure I'm not goofing this up... Yep: I ran this through DDT and verified that process0 gets file_system==ADIO_UFS/myerrcode==MPI_SUCCESS and process1 gets file_system==ADIO_NFS/myerrcode==MPI_SUCCESS.
>

Thanks, Jeff.  Big help pointing me to the crux of the problem. I don't
know why I only reduce the error code when the local call was not
successful.  We need to reduce the error code in all cases, and *then*
reduce the detected file system type.
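
To make the intended ordering concrete, here is a rough sketch that
simulates the two reductions over plain arrays instead of calling MPI, so
it can be compiled anywhere.  Every name in it (sketch_agree_on_fs, the
SKETCH_* constants and their values) is hypothetical and not taken from
ad_fstype.c, and picking the minimum file system code so that NFS wins is
an assumption made for illustration.  In real code each helper would
presumably be an MPI_Allreduce over the communicator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUCCESS / an MPI error class. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_IO = 1 };
/* Hypothetical file system codes; the real ADIO_* values are not used. */
enum { SKETCH_FS_NFS = 1, SKETCH_FS_UFS = 2 };

/* Simulates an allreduce with a "max" operator: one int per rank in. */
static int reduce_max(const int *vals, size_t n)
{
    int m = vals[0];
    for (size_t i = 1; i < n; i++)
        if (vals[i] > m)
            m = vals[i];
    return m;
}

/* Simulates an allreduce with a "min" operator: one int per rank in. */
static int reduce_min(const int *vals, size_t n)
{
    int m = vals[0];
    for (size_t i = 1; i < n; i++)
        if (vals[i] < m)
            m = vals[i];
    return m;
}

/*
 * Step 1: reduce the error codes unconditionally, so every rank takes
 *         part even when its own detection call succeeded.
 * Step 2: only if all ranks succeeded, reduce the detected file system
 *         type so every rank ends up with the same answer.
 * Returns SKETCH_SUCCESS and stores the agreed type in *agreed_fs, or
 * returns the worst error code and leaves *agreed_fs untouched.
 */
int sketch_agree_on_fs(const int *errcodes, const int *fstypes,
                       size_t nranks, int *agreed_fs)
{
    assert(nranks > 0);

    int worst = reduce_max(errcodes, nranks);
    if (worst != SKETCH_SUCCESS)
        return worst;

    /* Assumption: the lower code (NFS here) wins in a mixed setup. */
    *agreed_fs = reduce_min(fstypes, nranks);
    return SKETCH_SUCCESS;
}
```

With Jeff's case (both ranks succeed, one detects UFS and the other NFS),
both ranks would come out of this with the same file system type rather
than each keeping its own local answer.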

Please try the patch below -- I don't have a mixed NFS/UFS environment I 
can try this out on.

==rob




-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-fs-detection-when-multiple-fs-exist.patch
Type: text/x-patch
Size: 3194 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140324/15685e0f/attachment.bin>