[mpich-discuss] can't MPI_Abort a program due to deadlock at pthread_mutex_lock

Jeff Hammond jhammond at alcf.anl.gov
Wed Jan 2 11:24:25 CST 2013


> If you really need this to work, you can also disable pthread cancellation
> when the thread is in MPI calls:
>
>         pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancel_val);
>         MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
>                   MPI_STATUS_IGNORE);
>         pthread_setcancelstate(cancel_val, &junk);
>
> With that change, the program you sent works correctly.
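
For reference, here is a pthreads-only sketch of the pattern quoted above,
without MPI. PTHREAD_CANCEL_DISABLE is a cancel *state*, so the sketch uses
pthread_setcancelstate(). Everything named here is an assumption for
illustration: guarded_call() is a hypothetical stand-in for a blocking
library call that takes an internal mutex, and cancel_demo() is only a
driver that checks the mutex is not left locked after pthread_cancel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a blocking library call: it holds a mutex
 * across nanosleep(), which is a cancellation point. */
static void guarded_call(void)
{
    struct timespec ts = { 0, 1000000 };   /* 1 ms */
    pthread_mutex_lock(&shared_lock);
    nanosleep(&ts, NULL);
    pthread_mutex_unlock(&shared_lock);
}

static void *cancel_safe_poller(void *arg)
{
    int old_state, ignored;
    (void)arg;
    for (;;) {
        /* Defer any cancellation request while the mutex may be held. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        guarded_call();
        pthread_setcancelstate(old_state, &ignored);
        /* Act on a pending request only here, with no lock held. */
        pthread_testcancel();
    }
    return NULL;
}

/* Returns 0 if the thread was cancelled and the mutex was left unlocked. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    struct timespec ts = { 0, 20000000 };  /* 20 ms */
    int rc = pthread_create(&tid, NULL, cancel_safe_poller, NULL);
    assert(rc == 0);

    nanosleep(&ts, NULL);                  /* let the poller run a while */
    pthread_cancel(tid);
    pthread_join(tid, &result);

    if (result != PTHREAD_CANCELED)
        return -1;
    if (pthread_mutex_trylock(&shared_lock) != 0)
        return -1;                         /* lock leaked by the dead thread */
    pthread_mutex_unlock(&shared_lock);
    return 0;
}
```

Calling cancel_demo() from main() should return 0. Without the two
pthread_setcancelstate() calls the thread could be cancelled inside
nanosleep() while holding shared_lock, which is the deadlock this thread
is about.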

Thanks.  This is useful info but I have a better way to work around
this issue in the code that I actually care about.  I'm writing active
messages over p2p and it's trivial to tell the polling thread to
pthread_exit via an active message.
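
A rough sketch of that shutdown-by-message idea, not my actual code. To
keep it runnable without MPI, an in-process single-slot mailbox stands in
for the p2p receive (where the real thing would sit in MPI_Probe/MPI_Recv).
The tags TAG_WORK and TAG_SHUTDOWN, struct mailbox, and shutdown_demo() are
all hypothetical names made up for this example.

```c
#include <assert.h>
#include <pthread.h>

enum { TAG_WORK = 1, TAG_SHUTDOWN = 2 };   /* hypothetical message tags */

/* Single-slot mailbox; stand-in for the MPI point-to-point channel. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int full;
    int tag;
};

static void mailbox_send(struct mailbox *mb, int tag)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->full)                        /* wait for the slot to drain */
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->tag = tag;
    mb->full = 1;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

static int mailbox_recv(struct mailbox *mb)
{
    int tag;
    pthread_mutex_lock(&mb->lock);
    while (!mb->full)                       /* block until a message arrives */
        pthread_cond_wait(&mb->changed, &mb->lock);
    tag = mb->tag;
    mb->full = 0;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
    return tag;
}

struct poller_args { struct mailbox *mb; int handled; };

static void *am_poller(void *p)
{
    struct poller_args *a = p;
    for (;;) {
        int tag = mailbox_recv(a->mb);
        if (tag == TAG_SHUTDOWN)
            pthread_exit(NULL);             /* no lock is held at this point */
        a->handled++;                       /* "run" the active message */
    }
    return NULL;
}

/* Sends nwork work messages, then a shutdown message; returns the number
 * of work messages the poller handled before it exited. */
int shutdown_demo(int nwork)
{
    struct mailbox mb;
    struct poller_args args = { &mb, 0 };
    pthread_t tid;
    int i, rc;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);
    mb.full = 0;
    mb.tag = 0;

    rc = pthread_create(&tid, NULL, am_poller, &args);
    assert(rc == 0);
    for (i = 0; i < nwork; i++)
        mailbox_send(&mb, TAG_WORK);
    mailbox_send(&mb, TAG_SHUTDOWN);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return args.handled;
}
```

The point is that the polling thread exits on its own, between receives,
so it can never be killed while holding a lock that another thread needs.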

> I think that adding pthread_setcancelstate() to the MPICH thread safety macros might
> be all that we have to do to make the library cancel-safe, but I'm not a
> pthreads expert.  It could have a nonzero performance cost, so it would
> likely need to be a compile-time option.

This might be useful as a debugging option but I wouldn't prioritize it.

>> pthread_cancel is unsafe in any context where the cancelled thread can
>> hold a mutex needed by other threads.  This looks to me like a bad
>> programming practice (e.g. you will also likely leak memory, since you
>> can't free MPI objects allocated by the thread from inside of a
>> cancellation handler), not an MPICH-specific problem; there are safer
>> ways to cancel/cleanup threads.

I don't care about memory leaks since the use case was merely to exit
the program.

>> Do you have a good case for why such functionality is needed?

The use of pthread_cancel is entirely optional.  I knew there were
many ways to work around this and I've already implemented the one
that makes the most sense to me.

Thanks for all your comments.

Best,

Jeff

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


