[mpich-devel] Issue with MPI_Error_string() for user-defined codes/classes

Rob Latham robl at mcs.anl.gov
Tue Apr 22 15:01:57 CDT 2014



On 04/22/2014 04:04 AM, Lisandro Dalcin wrote:
> The MPI-3 standard says (p. 354, lines 39-40):
>
> """
> If MPI_ERROR_STRING is called when no string has been set, it will
> return a empty string (all spaces in Fortran, "" in C).
> """
>
> The following simple test segfaults. A quick fix would be to return
> an empty string in MPIR_Err_get_dynerr_string() (file dynerrutil.c),
> e.g.:

Now that I look at this more closely, shouldn't user_code_msgs[] and
user_class_msgs[] be initialized to "" rather than 0?  That would break
some other assumptions in the error-code handling code, though.
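
To make the trade-off concrete, here is a minimal standalone sketch, not
MPICH code. The table msgs[], its size NSLOTS and all helper names are
hypothetical. It assumes that some code uses a NULL pointer to mean "no
string set"; the thread does not say which assumptions are actually
affected, so treat this only as an illustration of the kind of check that
would change meaning if every slot started out as "".

```c
#include <stddef.h>

#define NSLOTS 4                      /* hypothetical table size */

/* Hypothetical stand-in for user_code_msgs[] / user_class_msgs[].
 * Static storage, so every slot is NULL until something is stored. */
static const char *msgs[NSLOTS];

/* The alternative discussed above: point every slot at "" up front. */
static void init_msgs_empty(void)
{
    for (int i = 0; i < NSLOTS; i++)
        msgs[i] = "";
}

/* Store a message pointer in a slot (the string is not copied). */
static void set_msg(int i, const char *s)
{
    if (i >= 0 && i < NSLOTS)
        msgs[i] = s;
}

/* A check that treats NULL as "unset". After init_msgs_empty() it
 * reports every slot as set, even though no message was registered. */
static int is_set_by_pointer(int i)
{
    return msgs[i] != NULL;
}

/* The check one would need instead if "" became the "unset" marker. */
static int is_set_by_content(int i)
{
    return msgs[i] != NULL && msgs[i][0] != '\0';
}
```

With "" as the initial value, every pointer-based "is it set?" test would
have to be rewritten along the lines of is_set_by_content().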


==rob

>
> diff --git a/src/mpi/errhan/dynerrutil.c b/src/mpi/errhan/dynerrutil.c
> index 943e8c3..fb40469 100644
> --- a/src/mpi/errhan/dynerrutil.c
> +++ b/src/mpi/errhan/dynerrutil.c
> @@ -297,11 +297,13 @@ const char *MPIR_Err_get_dynerr_string( int code )
>       if (errcode) {
>    if (errcode < first_free_code) {
>       errstr = user_code_msgs[errcode];
> +    if (!errstr) errstr = "";
>    }
>       }
>       else {
>    if (errclass < first_free_class) {
>       errstr = user_class_msgs[errclass];
> +    if (!errstr) errstr = "";
>    }
>       }
>
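
The guard in the patch can be tried outside of MPICH with a small
standalone sketch. Everything here (class_msgs[], first_free,
MAX_USER_CLASSES and the three helpers) is a hypothetical stand-in that
only mirrors the shape of the quoted code; it is not the MPICH
implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_USER_CLASSES 8            /* hypothetical table size */

/* Hypothetical stand-in for user_class_msgs[]: static storage, so each
 * slot is NULL until a message is attached. */
static const char *class_msgs[MAX_USER_CLASSES];
static int first_free = 0;            /* next unused slot */

/* Reserve a slot without attaching a message, like adding an error
 * class and never setting its string. Returns -1 if the table is full. */
static int add_class(void)
{
    return (first_free < MAX_USER_CLASSES) ? first_free++ : -1;
}

/* Attach a message to a reserved slot. The pointer is stored, not copied. */
static void set_class_msg(int idx, const char *msg)
{
    if (idx >= 0 && idx < first_free)
        class_msgs[idx] = msg;
}

/* Lookup with the guard from the patch: a reserved slot without a
 * message yields "" instead of NULL. Unreserved slots still yield NULL. */
static const char *get_class_msg(int idx)
{
    const char *s = NULL;
    if (idx >= 0 && idx < first_free) {
        s = class_msgs[idx];
        if (!s)
            s = "";
    }
    return s;
}
```

A caller that copies the result into a buffer, for example with strlen()
or strncpy(), no longer dereferences NULL for a class that was added but
never given a string.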
>
> [dalcinl at kw2060 ~]$ cat error_string2.c
> #include <stdio.h>
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>    int errorclass;
>    char errorstring[MPI_MAX_ERROR_STRING] = {64,0};
>    int slen;
>    MPI_Init(&argc, &argv);
>    MPI_Add_error_class(&errorclass);
>    MPI_Error_string(errorclass, errorstring, &slen);
>    printf("errorclass:%d errorstring:'%s' len:%d\n",
>           errorclass, errorstring, slen);
>    MPI_Finalize();
>    return 0;
> }
>
> [dalcinl at kw2060 ~]$ mpicc error_string2.c
> [dalcinl at kw2060 ~]$ ./a.out
> Segmentation fault (core dumped)
>
> [dalcinl at kw2060 ~]$ valgrind -q ./a.out
> ==14176== Invalid read of size 1
> ==14176==    at 0x4C6F5E7: MPIU_Strncpy (safestr.c:65)
> ==14176==    by 0x4C61864: MPIR_Err_get_string (errutil.c:601)
> ==14176==    by 0x4DA4DB4: PMPI_Error_string (error_string.c:80)
> ==14176==    by 0x400888: main (in /home/dalcinl/Devel/BUGS-MPI/~/a.out)
> ==14176==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==14176==
> ==14176==
> ==14176== Process terminating with default action of signal 11 (SIGSEGV)
> ==14176==  Access not within mapped region at address 0x0
> ==14176==    at 0x4C6F5E7: MPIU_Strncpy (safestr.c:65)
> ==14176==    by 0x4C61864: MPIR_Err_get_string (errutil.c:601)
> ==14176==    by 0x4DA4DB4: PMPI_Error_string (error_string.c:80)
> ==14176==    by 0x400888: main (in /home/dalcinl/Devel/BUGS-MPI/~/a.out)
> ==14176==  If you believe this happened as a result of a stack
> ==14176==  overflow in your program's main thread (unlikely but
> ==14176==  possible), you can try to increase the size of the
> ==14176==  main thread stack using the --main-stacksize= flag.
> ==14176==  The main thread stack size used in this run was 8720384.
> Segmentation fault (core dumped)
>
>
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

