[mpich-discuss] ROMIO filesystem check during MPI_File_open
Rob Latham
robl at mcs.anl.gov
Mon Mar 24 16:54:40 CDT 2014
On 03/24/2014 04:37 PM, Jeff Squyres (jsquyres) wrote:
> On Mar 24, 2014, at 5:17 PM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>> Every process will call ADIO_FileSysType_fncall() in the case where have_nfs_enabled==1. Your interpretation of the code is so different from mine that I'm just going to have to paste what I'm looking at and you can tell me where I'm wrong. These lines come from MPICH's ad_fstype.c. The content is the same in openmpi-1.6.4, just shifted by 25 or so lines:
>>
>> 637 ADIO_FileSysType_fncall(filename, &file_system, &myerrcode);
>> 638 if (myerrcode != MPI_SUCCESS) {
>
> This is where our disconnect is occurring: _fncall returns myerrcode==MPI_SUCCESS in all cases.
>
> Meaning: the block with the Allreduce is not entered. This causes everyone to just use the file_system value they got from _fncall: one will have ADIO_UFS, and the rest will have ADIO_NFS.
>
> /Me checks again just to make sure I'm not goofing this up... Yep: I ran this through DDT and verified that process0 gets file_system==ADIO_UFS/myerrcode==MPI_SUCCESS and process1 gets file_system==ADIO_NFS/myerrcode==MPI_SUCCESS.
>
Thanks, Jeff. Big help pointing me to the crux of the problem. I don't
know why I reduce the error code only when it's not successful. We need to
reduce the error code in all cases, and *then* reduce the detected file
system type.
Please try the patch below -- I don't have a mixed NFS/UFS environment I
can try this out on.
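
To make the "reduce the error code in all cases, then reduce the file
system type" ordering concrete, here is a minimal sketch. It is not the
attached patch and it does not call MPI: the two helpers stand in for
MPI_Allreduce with MPI_MAX and MPI_MIN by looping over an array holding
one value per rank. All sketch_* names and the SKETCH_* constants are
hypothetical, and letting the smaller value (NFS here) win the second
reduction is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real constants live in the MPI and ROMIO headers. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR = 1 };
enum { SKETCH_FS_NFS = 150, SKETCH_FS_UFS = 152 };

/* Stand-in for MPI_Allreduce(..., MPI_MAX, comm): one int per rank. */
int sketch_allreduce_max(const int *vals, size_t nranks)
{
    assert(nranks > 0);
    int result = vals[0];
    for (size_t i = 1; i < nranks; i++)
        if (vals[i] > result)
            result = vals[i];
    return result;
}

/* Stand-in for MPI_Allreduce(..., MPI_MIN, comm): one int per rank. */
int sketch_allreduce_min(const int *vals, size_t nranks)
{
    assert(nranks > 0);
    int result = vals[0];
    for (size_t i = 1; i < nranks; i++)
        if (vals[i] < result)
            result = vals[i];
    return result;
}

/*
 * Step 1: reduce the error code unconditionally, so every rank takes
 *         part whether or not its own detection succeeded.
 * Step 2: only if every rank succeeded, reduce the detected file system
 *         type so all ranks end up with the same value.
 * Returns the agreed error code; *agreed_fs is written only on success.
 */
int sketch_agree_on_fs(const int *fs, const int *err, size_t nranks,
                       int *agreed_fs)
{
    int max_err = sketch_allreduce_max(err, nranks);
    if (max_err != SKETCH_SUCCESS)
        return max_err;

    /* Assumption: the smallest type value wins (NFS in this sketch). */
    *agreed_fs = sketch_allreduce_min(fs, nranks);
    return SKETCH_SUCCESS;
}
```

With two ranks reporting UFS and NFS and both reporting success (the
case Jeff saw in DDT), both ranks would come out of this sketch with the
same file system type instead of keeping their local values.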
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-fs-detection-when-multiple-fs-exist.patch
Type: text/x-patch
Size: 3194 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140324/15685e0f/attachment.bin>