[mpich-discuss] [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Rob Latham
robl at mcs.anl.gov
Mon Jan 12 15:10:05 CST 2015
On 12/17/2014 07:04 PM, Eric Chamberland wrote:
> Hi!
>
> Here is a "poor man's fix" that works for me (the idea is not from me,
> thanks to Thomas H.):
>
> #1- char* lCwd = getcwd(0,0);
> #2- chdir(lPathToFile);
> #3- MPI_File_open(...,lFileNameWithoutTooLongPath,...);
> #4- chdir(lCwd);
> #5- ...
>
> I think there are some limitations but it works very well for our
> uses... and until a "real" fix is proposed...
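
For anyone who wants to try the workaround quoted above, here is a minimal C sketch of the same steps. It is only an illustration, not code from ROMIO or Open MPI: open_from_parent_dir, open_fn and record_name are hypothetical names, and the callback stands in for whatever call opens the file by its short name (MPI_File_open in the steps above). Two things to keep in mind: getcwd(NULL, 0) allocating its own buffer is an extension (glibc and the BSDs support it), and chdir() changes the working directory of the whole process, which is presumably one of the limitations mentioned.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback type: stands in for the call that opens the file
 * by its short name, e.g. a wrapper around MPI_File_open(). */
typedef int (*open_fn)(const char *short_name, void *ctx);

/* Change into the directory part of full_path, call open_cb with only the
 * final path component, then return to the original working directory.
 * Returns open_cb's result, or -1 if an allocation or chdir() fails. */
int open_from_parent_dir(const char *full_path, open_fn open_cb, void *ctx)
{
    const char *slash = strrchr(full_path, '/');
    if (slash == NULL)              /* no directory part: nothing to shorten */
        return open_cb(full_path, ctx);

    /* Copy the directory part; keep "/" itself for files in the root. */
    size_t dir_len = (slash == full_path) ? 1 : (size_t)(slash - full_path);
    char *dir = malloc(dir_len + 1);
    if (dir == NULL)
        return -1;
    memcpy(dir, full_path, dir_len);
    dir[dir_len] = '\0';

    char *saved_cwd = getcwd(NULL, 0);   /* extension: buffer is malloc'ed */
    if (saved_cwd == NULL || chdir(dir) != 0) {
        free(saved_cwd);
        free(dir);
        return -1;
    }

    int rc = open_cb(slash + 1, ctx);    /* only the short name is passed on */

    if (chdir(saved_cwd) != 0)           /* always try to go back */
        rc = -1;
    free(saved_cwd);
    free(dir);
    return rc;
}

/* Demo callback (assumption, for illustration only): records the name it
 * was given into a caller-supplied buffer of at least 64 bytes. */
int record_name(const char *short_name, void *ctx)
{
    snprintf((char *)ctx, 64, "%s", short_name);
    return 0;
}
```

With MPI, the callback would wrap MPI_File_open and every rank would make the same call; record_name is only there to show which name the callback receives.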
A bit of a delay on my part due to the winter break, but I have returned
to this topic.
I have an approach that will at least tell you something went wrong
while processing the shared file pointer name. The path is so long that
it truncates the error message, but enough survives to tell you what
happened:
ERROR Returned by MPI: 1006695702
ERROR_string Returned by MPI: Invalid file name, error stack:
ADIOI_Shfp_fname(60): Pathname
this/is/a_very/long/path/that/contains/a/not/so/long/filename
/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/
invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simpimple/use
r
At least you get "invalid file name".
Furthermore, I'm changing that code to use PATH_MAX instead of the
hard-coded 256, which would have fixed the specific problem you
encountered (and might be enough to last us 10 more years, at which
point someone might try to create a file with a 1000-character path).
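
To illustrate the kind of check I mean, here is a minimal sketch, assuming a derived name is built by appending a suffix to the user's file name. It is not the actual ROMIO code: build_aux_name and the ".shfp" suffix used in the usage note are hypothetical. The point is only that snprintf reports the length it needed, so the caller can return an error instead of writing past the end of the buffer.

```c
#include <limits.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback assumption where <limits.h> lacks it */
#endif

/* Build "<filename><suffix>" into out, which holds out_size bytes.
 * Returns 0 on success, or -1 if the result (plus the terminating NUL)
 * would not fit, so the caller can report "invalid file name". */
int build_aux_name(char *out, size_t out_size,
                   const char *filename, const char *suffix)
{
    /* snprintf never writes more than out_size bytes and returns the
     * length the full string would have had. */
    int n = snprintf(out, out_size, "%s%s", filename, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;      /* truncated: refuse rather than overflow */
    return 0;
}
```

A caller would declare the buffer as char name[PATH_MAX], call build_aux_name(name, sizeof name, filename, ".shfp"), and turn a -1 into an MPI error. A larger buffer only moves the limit; the length check is what prevents the overflow.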
==rob
>
> Thanks for helping!
>
> Eric
>
>
> On 12/15/2014 11:42 PM, Gilles Gouaillardet wrote:
>> Eric and all,
>>
>> That is clearly a limitation in romio, and this is being tracked at
>> https://trac.mpich.org/projects/mpich/ticket/2212
>>
>> In the meantime, what we can do in OpenMPI is update
>> mca_io_romio_file_open() to fail with a user-friendly error message
>> if strlen(filename) is larger than 225.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/12/16 12:43, Gilles Gouaillardet wrote:
>>> Eric,
>>>
>>> thanks for the simple test program.
>>>
>>> i think i see what is going wrong and i will make some changes to avoid
>>> the memory overflow.
>>>
>>> that being said, there is a hard coded limit of 256 characters, and your
>>> path is bigger than 300 characters.
>>> bottom line, and even if there is no more memory overflow, that cannot
>>> work as expected.
>>>
>>> i will report this to the mpich folks, since romio is currently imported
>>> from mpich.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/12/16 0:16, Eric Chamberland wrote:
>>>> Hi Gilles,
>>>>
>>>> just created a very simple test case!
>>>>
>>>> with this setup, you will see the bug with valgrind:
>>>>
>>>> export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed
>>>>
>>>>
>>>> mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
>>>>
>>>> mkdir -p $too_long
>>>> echo "header of a text file" > $too_long/toto.txt
>>>>
>>>> mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt
>>>>
>>>> and watch the errors!
>>>>
>>>> Unfortunately, the memory corruption here doesn't seem to segfault
>>>> this simple test case, but in my case it is fatal, and valgrind
>>>> does report it...
>>>>
>>>> OpenMPI 1.6.5, 1.8.3rc3 are affected
>>>>
>>>> MPICH-3.1.3 also has the error!
>>>>
>>>> thanks,
>>>>
>>>> Eric
>>>>
>>> _______________________________________________
>>> users mailing list
>>> users at open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/26005.php
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss