[mpich-devel] Two issues with MPI_File error handling

Rob Latham robl at mcs.anl.gov
Thu May 7 14:00:58 CDT 2015



On 05/07/2015 06:58 AM, Lisandro Dalcin wrote:
> This is using the preview release 3.2b2. I discovered these issues
> months ago and it seems I forgot to report them.

Thanks for reporting these two issues.  The two attached patches 
address your test case, but this is a part of the code I don't play in 
often, so I'm going to need a review of them before committing.

==rob

>
> 1) MPI_File_call_errhandler(file, value) should return MPI_SUCCESS
> instead of value under normal operation (as the MPI spec says), and
> particularly if the error handler is set to MPI_ERRORS_RETURN. This
> bug is trivial to fix.
>
> 2) Setting the predefined error handler MPI_ERRORS_ARE_FATAL and
> calling MPI_File_call_errhandler(file, errcode) should "gracefully"
> abort the processes the MPI way, instead of segfaulting. The problem
> here is that ROMIO has to handle MPI_ERRORS_ARE_FATAL in some
> special way that I'm not sure how to implement. Right now it is
> trying to invoke a callback that is set to NULL.
>
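
The behavior asked for in 1) and 2) can be sketched as a small standalone
model. This is only an illustration under stated assumptions, not the ROMIO
code and not the contents of the attached patches: every name here
(sketch_file, handler_kind, sketch_file_call_errhandler, record_fatal) is
hypothetical, and abort_fn stands in for whatever MPI_Abort-style exit a real
implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for this sketch; not MPICH or ROMIO identifiers. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_FILE = 27 /* arbitrary nonzero code */ };

typedef enum {
    HANDLER_ERRORS_RETURN,    /* models MPI_ERRORS_RETURN */
    HANDLER_ERRORS_ARE_FATAL, /* models MPI_ERRORS_ARE_FATAL */
    HANDLER_USER              /* models a user-created error handler */
} handler_kind;

struct sketch_file;
typedef void (*user_handler_fn)(struct sketch_file *fh, int *errcode);

typedef struct sketch_file {
    handler_kind kind;
    user_handler_fn user_fn;       /* only meaningful for HANDLER_USER */
    void (*abort_fn)(int errcode); /* NULL means "use default_abort" */
} sketch_file;

/* Default fatal path: report and exit instead of jumping through NULL. */
void default_abort(int errcode)
{
    fprintf(stderr, "fatal file error %d, aborting\n", errcode);
    exit(EXIT_FAILURE);
}

/* Recording hook so the fatal path can be exercised without exiting. */
int last_fatal_code = -1;
void record_fatal(int errcode)
{
    last_fatal_code = errcode;
}

/* Returns SKETCH_SUCCESS whenever the handler was dispatched, never errcode. */
int sketch_file_call_errhandler(sketch_file *fh, int errcode)
{
    assert(fh != NULL);
    switch (fh->kind) {
    case HANDLER_ERRORS_RETURN:
        /* Predefined handler with no callback: nothing to invoke. */
        break;
    case HANDLER_ERRORS_ARE_FATAL:
        /* Predefined handler with no callback: abort explicitly.
         * With default_abort this call does not return. */
        (fh->abort_fn ? fh->abort_fn : default_abort)(errcode);
        break;
    case HANDLER_USER:
        /* Only a user handler carries a function pointer worth calling. */
        if (fh->user_fn != NULL)
            fh->user_fn(fh, &errcode);
        break;
    }
    return SKETCH_SUCCESS;
}
```

The point of the sketch is that the two predefined handlers are recognized
before any function pointer is dereferenced, and that the return value
reports whether dispatch succeeded rather than echoing the error code.
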
> I'm attaching a test case showcasing the two issues, and pasting output below:
>
> $ cat file_errhdl.c
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>    int ierr;
>    MPI_File fh;
>    MPI_Init(&argc, &argv);
>
>    MPI_File_open(MPI_COMM_WORLD,"/tmp/datafile",
>                  MPI_MODE_CREATE|MPI_MODE_RDWR|MPI_MODE_DELETE_ON_CLOSE,
>                  MPI_INFO_NULL,&fh);
>
>    MPI_File_set_errhandler(fh,MPI_ERRORS_RETURN);
>    ierr = MPI_File_call_errhandler(fh,MPI_ERR_FILE);
>    printf("ierr: %d, expected: %d\n", ierr, (int)MPI_SUCCESS);
>
>    MPI_File_set_errhandler(fh,MPI_ERRORS_ARE_FATAL);
>    MPI_File_call_errhandler(fh,MPI_ERR_FILE); /* should abort */
>    MPI_File_close(&fh);
>
>    MPI_Finalize();
>    return 0;
> }
>
> $ mpicc file_errhdl.c
>
> $ ./a.out
> ierr: 27, expected: 0
> Segmentation fault (core dumped)
>
> $ valgrind -q ./a.out
> ierr: 27, expected: 0
> ==30036== Jump to the invalid address stated on the next line
> ==30036==    at 0x0: ???
> ==30036==    by 0x4CE8EDC: PMPI_File_call_errhandler (file_call_errhandler.c:105)
> ==30036==    by 0x400951: main (in /home/dalcinl/Devel/BUGS-MPI/mpich/a.out)
> ==30036==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==30036==
> ==30036==
> ==30036== Process terminating with default action of signal 11 (SIGSEGV)
> ==30036==  Bad permissions for mapped region at address 0x0
> ==30036==    at 0x0: ???
> ==30036==    by 0x4CE8EDC: PMPI_File_call_errhandler (file_call_errhandler.c:105)
> ==30036==    by 0x400951: main (in /home/dalcinl/Devel/BUGS-MPI/mpich/a.out)
> Segmentation fault (core dumped)

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-avoid-segfault-when-calling-built-in-error-handler.patch
Type: text/x-patch
Size: 952 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20150507/d3cea772/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-proper-return-value-for-MPI_File_call_errhandler.patch
Type: text/x-patch
Size: 2632 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20150507/d3cea772/attachment-0001.bin>

