[mpich-discuss] can't MPI_Abort a program due to deadlock at pthread_mutex_lock

Jeff Hammond jhammond at alcf.anl.gov
Wed Jan 2 11:24:25 CST 2013

> If you really need this to work, you can also disable pthread cancellation
> when the thread is in MPI calls:
>         pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancel_val);
>                   MPI_STATUS_IGNORE);
>         pthread_setcancelstate(cancel_val, &junk);
> With that change, the program you sent works correctly.
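
A minimal sketch of the pattern quoted above, assuming plain POSIX threads. PTHREAD_CANCEL_DISABLE is a cancel *state*, so the call that accepts it is pthread_setcancelstate(). The helper name call_with_cancel_disabled and the callback count_call are hypothetical, and the callback stands in for whatever blocking MPI call the thread makes; no MPI function is used here.

```c
#include <pthread.h>

/*
 * Hypothetical helper: run fn(arg) with cancellation disabled for the
 * calling thread, then restore the previous cancel state.
 * Returns 0 on success or the pthread error code.
 */
int call_with_cancel_disabled(void (*fn)(void *), void *arg)
{
    int old_state, junk;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    if (rc != 0)
        return rc;

    fn(arg); /* stand-in for a blocking MPI call made by this thread */

    /* Restore whatever state the caller had before. */
    return pthread_setcancelstate(old_state, &junk);
}

/* Illustrative callback: counts how many times it was invoked. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

A cancellation request that arrives while the state is disabled stays pending and is acted on at the next cancellation point after the state is re-enabled, so the thread is never cancelled while it is inside the protected call.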

Thanks.  This is useful info but I have a better way to work around
this issue in the code that I actually care about.  I'm writing active
messages over p2p and it's trivial to tell the polling thread to
pthread_exit via an active-message.
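
The idea of shutting down the polling thread with a message can be sketched as follows. This is only an illustration under stated assumptions, not the actual code: the message layer is simulated with a mutex-protected one-slot mailbox instead of MPI p2p, and every name (am_tag, mailbox, mailbox_send, poll_loop, run_demo) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

enum am_tag { AM_WORK = 0, AM_SHUTDOWN = 1 }; /* hypothetical message tags */

struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             pending;   /* 1 while tag holds an unread message */
    enum am_tag     tag;
    int             work_seen; /* number of AM_WORK messages handled */
};

/* Post one message, waiting until the single slot is free. */
void mailbox_send(struct mailbox *mb, enum am_tag tag)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->pending)
        pthread_cond_wait(&mb->ready, &mb->lock);
    mb->tag = tag;
    mb->pending = 1;
    pthread_cond_broadcast(&mb->ready);
    pthread_mutex_unlock(&mb->lock);
}

/* Polling thread: handle messages until a shutdown message arrives. */
void *poll_loop(void *arg)
{
    struct mailbox *mb = arg;
    for (;;) {
        pthread_mutex_lock(&mb->lock);
        while (!mb->pending)
            pthread_cond_wait(&mb->ready, &mb->lock);
        enum am_tag tag = mb->tag;
        mb->pending = 0;
        if (tag == AM_WORK)
            mb->work_seen++;
        pthread_cond_broadcast(&mb->ready);
        pthread_mutex_unlock(&mb->lock);

        if (tag == AM_SHUTDOWN)
            pthread_exit(NULL); /* the thread exits itself, holding no lock */
    }
}

/* Send two work messages and a shutdown; return the work count seen. */
int run_demo(void)
{
    struct mailbox mb = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                          0, AM_WORK, 0 };
    pthread_t poller;
    if (pthread_create(&poller, NULL, poll_loop, &mb) != 0)
        return -1;
    mailbox_send(&mb, AM_WORK);
    mailbox_send(&mb, AM_WORK);
    mailbox_send(&mb, AM_SHUTDOWN);
    pthread_join(poller, NULL);
    return mb.work_seen;
}
```

Because the thread ends itself at a point it chooses, it cannot be interrupted while it holds a lock that other threads need, which is the hazard with pthread_cancel.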

> I think that adding setcancelstate() to the MPICH thread safety macros might
> be all that we have to do to make the library cancel-safe, but I'm not a
> pthreads expert.  It could have a nonzero performance cost, so it would
> likely need to be a compile-time option.

This might be useful as a debugging option but I wouldn't prioritize it.

>> Pthread_cancel is unsafe in any context where the cancelled thread can
>> hold a mutex needed by other threads.  This looks to me like a bad
>> programming practice (e.g. you will also likely leak memory, since you
>> can't free MPI objects allocated by the thread from inside of a
>> cancellation handler), not an MPICH-specific problem; there are safer
>> ways to cancel/cleanup threads.

I don't care about memory leaks since the use case was merely to exit
the program.

>> Do you have a good case for why such functionality is needed?

The use of pthread_cancel is entirely optional.  I knew there were
many ways to work around this and I've already implemented the one
that makes the most sense to me.

Thanks for all your comments.



Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
