[mpich-devel] Issue with MPI_Error_string() for user-defined codes/classes
Rob Latham
robl at mcs.anl.gov
Tue Apr 22 15:01:57 CDT 2014
On 04/22/2014 04:04 AM, Lisandro Dalcin wrote:
> The MPI-3 standard says (pp.354, lines 39-40):
>
> """
> If MPI_ERROR_STRING is called when no string has been set, it will
> return a empty
> string (all spaces in Fortran, "" in C).
> """
>
> The following simple test segfaults. A quick fix would be to return
> an empty string in MPIR_Err_get_dynerr_string() (file dynerrutil.c),
> e.g.:
Now that I look at this a bit more closely, shouldn't user_code_msgs[]
and user_class_msgs[] be initialized to "", not 0? That would break some
other assumptions in the error-code handling code, though.
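A minimal sketch of that alternative, assuming a fixed-size table and an explicit init routine (C cannot portably default every element of a pointer array to "" in a declaration). The names `class_msgs`, `init_class_msgs`, `add_class`, `set_class_msg`, `get_class_msg`, and `HYPOTHETICAL_MAX_CLASSES` are hypothetical stand-ins, not MPICH's actual `user_class_msgs[]` code; the point is only that the lookup can never return NULL.

```c
/* Hypothetical stand-in for a table of user-defined class messages.
 * Every slot is set to "" instead of being left as NULL (0). */
#define HYPOTHETICAL_MAX_CLASSES 8

static const char *class_msgs[HYPOTHETICAL_MAX_CLASSES];
static int first_free_class = 0;

/* Fill every slot with "" so no lookup can ever return NULL. */
static void init_class_msgs(void)
{
    for (int i = 0; i < HYPOTHETICAL_MAX_CLASSES; i++)
        class_msgs[i] = "";
}

/* Allocate a new class with no message attached (like MPI_Add_error_class).
 * Returns -1 when the hypothetical table is full. */
static int add_class(void)
{
    return first_free_class < HYPOTHETICAL_MAX_CLASSES ? first_free_class++ : -1;
}

/* Attach a message to an already-allocated class (like MPI_Add_error_string). */
static void set_class_msg(int cls, const char *msg)
{
    if (cls >= 0 && cls < first_free_class)
        class_msgs[cls] = msg;
}

/* Look up a message; unset slots hold "", and unknown classes also yield "". */
static const char *get_class_msg(int cls)
{
    if (cls >= 0 && cls < first_free_class)
        return class_msgs[cls];
    return "";
}
```

The trade-off is that "" now doubles as the "no string set" marker, so any existing code that tests a slot for 0 to mean "unset" would need to change.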
==rob
>
> diff --git a/src/mpi/errhan/dynerrutil.c b/src/mpi/errhan/dynerrutil.c
> index 943e8c3..fb40469 100644
> --- a/src/mpi/errhan/dynerrutil.c
> +++ b/src/mpi/errhan/dynerrutil.c
> @@ -297,11 +297,13 @@ const char *MPIR_Err_get_dynerr_string( int code )
> if (errcode) {
> if (errcode < first_free_code) {
> errstr = user_code_msgs[errcode];
> + if (!errstr) errstr = "";
> }
> }
> else {
> if (errclass < first_free_class) {
> errstr = user_class_msgs[errclass];
> + if (!errstr) errstr = "";
> }
> }
>
>
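The patch above keeps the table entries as NULL and substitutes "" only at lookup time. A minimal standalone sketch of that guard follows; `sketch_class_msgs`, `sketch_first_free_class`, and `sketch_get_class_string` are hypothetical names, and returning NULL for an unallocated class is an assumption about how the surrounding code handles that case.

```c
#include <stddef.h>

/* Hypothetical table mirroring user_class_msgs[]: static storage is
 * zero-initialized, so every slot starts as NULL, as in the current code. */
#define SKETCH_MAX_CLASSES 8
static const char *sketch_class_msgs[SKETCH_MAX_CLASSES];
static int sketch_first_free_class = 0;

/* Lookup with the patch's guard: an allocated class whose message was
 * never set yields "" instead of NULL. Unallocated classes still yield
 * NULL and are left to the caller. */
static const char *sketch_get_class_string(int errclass)
{
    const char *errstr = NULL;
    if (errclass >= 0 && errclass < sketch_first_free_class) {
        errstr = sketch_class_msgs[errclass];
        if (!errstr)
            errstr = "";
    }
    return errstr;
}
```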
> [dalcinl at kw2060 ~]$ cat error_string2.c
> #include <stdio.h>
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>     int errorclass;
>     char errorstring[MPI_MAX_ERROR_STRING] = {64,0};
>     int slen;
>     MPI_Init(&argc, &argv);
>     MPI_Add_error_class(&errorclass);
>     MPI_Error_string(errorclass, errorstring, &slen);
>     printf("errorclass:%d errorstring:'%s' len:%d\n", errorclass,
>            errorstring, slen);
>     MPI_Finalize();
>     return 0;
> }
>
> [dalcinl at kw2060 ~]$ mpicc error_string2.c
> [dalcinl at kw2060 ~]$ ./a.out
> Segmentation fault (core dumped)
>
> [dalcinl at kw2060 ~]$ valgrind -q ./a.out
> ==14176== Invalid read of size 1
> ==14176== at 0x4C6F5E7: MPIU_Strncpy (safestr.c:65)
> ==14176== by 0x4C61864: MPIR_Err_get_string (errutil.c:601)
> ==14176== by 0x4DA4DB4: PMPI_Error_string (error_string.c:80)
> ==14176== by 0x400888: main (in /home/dalcinl/Devel/BUGS-MPI/~/a.out)
> ==14176== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==14176==
> ==14176==
> ==14176== Process terminating with default action of signal 11 (SIGSEGV)
> ==14176== Access not within mapped region at address 0x0
> ==14176== at 0x4C6F5E7: MPIU_Strncpy (safestr.c:65)
> ==14176== by 0x4C61864: MPIR_Err_get_string (errutil.c:601)
> ==14176== by 0x4DA4DB4: PMPI_Error_string (error_string.c:80)
> ==14176== by 0x400888: main (in /home/dalcinl/Devel/BUGS-MPI/~/a.out)
> ==14176== If you believe this happened as a result of a stack
> ==14176== overflow in your program's main thread (unlikely but
> ==14176== possible), you can try to increase the size of the
> ==14176== main thread stack using the --main-stacksize= flag.
> ==14176== The main thread stack size used in this run was 8720384.
> Segmentation fault (core dumped)
>
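The valgrind trace shows MPIU_Strncpy reading address 0x0, i.e. the string lookup handed a NULL pointer to the copy routine. A minimal sketch of a NULL-tolerant copy step is below; `copy_error_string` is a hypothetical helper, not MPICH's `MPIU_Strncpy` or `MPIR_Err_get_string`, and its truncation behavior is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy step of an MPI_Error_string-like routine. A NULL source
 * (no string registered) is treated as "", so the caller gets an empty
 * string and length 0 instead of a segfault. */
static void copy_error_string(const char *src, char *dst, size_t dstlen,
                              int *resultlen)
{
    assert(dst != NULL && dstlen > 0 && resultlen != NULL);
    if (src == NULL)
        src = "";                   /* same fallback as the proposed patch */
    size_t n = strlen(src);
    if (n >= dstlen)
        n = dstlen - 1;             /* truncate, leaving room for '\0' */
    memcpy(dst, src, n);
    dst[n] = '\0';
    *resultlen = (int)n;
}
```

With this guard, the test program above would print an empty errorstring and len:0, which is what the MPI-3 text quoted at the top requires.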
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA