[mpich-discuss] [PATCH v2] test: add attrdeleteget, MPI_Attr_get called from delete_fn

Pavan Balaji balaji at mcs.anl.gov
Sat May 18 16:02:17 CDT 2013


On 05/18/2013 03:41 PM US Central Time, Fab Tillier wrote:
> I wonder what other implementations would do if you initiated an
> MPI_Bcast or similar on such a communicator from the delete callback
> (say you wanted to notify everyone the attribute was being deleted,
> or whatnot).  I'm reluctant to special case certain functions as
> working on a freed communicator, but perhaps the structure of our
> implementation could change to not invalidate the communicator.

This should be valid, IMO.  See below.

>> Yes, the test relies on that.  The problem is MPI_Attr_get returning an
>> error when called from delete_fn, presumably due to eager invalidation
>> of the communicator.
> 
> The issue here is that the communicator is reference counted. It is 
> deleted when the reference count reaches zero (at which point it is 
> no longer valid, as communicator lookup checks that the reference 
> count is non-zero). Having the reference count bounce through zero 
> (to initiate destruction), then back to 1 so that the attribute 
> delete callbacks can access it, then back to zero, is really messy, 
> especially if the delete callbacks were to initiate I/O operations.

In MPICH, here's what we do:

1. Run the user callback.  If it returned an error, return that error to
the user.

2. Decrement the communicator ref-count.

3. If the ref-count has reached zero, free the attributes and then free
the communicator.

If the user did an MPI_BCAST, there's no problem: the BCAST is blocking,
so it has completed by the time the callback returns and holds no
reference on the communicator.  On the other hand, if the user did an
MPI_IBCAST, the pending operation holds a reference, so the ref-count
will not reach zero and the communicator is not actually deleted
internally.

The attributes might not actually get deleted if the ref-count didn't
reach zero, but you can't access them anyway, since the comm handle is
not valid.  They'll eventually get freed when the ref-count reaches zero.
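
To make the ordering above concrete, here is a toy model in plain C.  It
is only a sketch of the three steps as I described them, not MPICH code:
every name in it (toy_comm, toy_comm_free, toy_comm_release,
toy_attr_get, the two callbacks) is made up for illustration, the
"communicator" carries a single integer attribute, and a "destroyed"
flag stands in for actually freeing memory.  I am also assuming that the
user's handle becomes invalid as soon as the free call succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes for this toy model only. */
enum { TOY_SUCCESS = 0, TOY_ERR_COMM = 1 };

typedef struct toy_comm toy_comm;
typedef int (*toy_delete_fn)(toy_comm *comm, int attr_val);

struct toy_comm {
    int refcount;          /* user handle + pending nonblocking operations */
    bool handle_valid;     /* false once the user has freed the handle */
    bool destroyed;        /* stands in for actually freeing the object */
    bool attr_present;     /* attribute storage still attached? */
    int attr_val;
    int seen_in_callback;  /* value the delete callback managed to read */
    toy_delete_fn delete_fn;
};

static toy_comm toy_comm_make(int attr_val, toy_delete_fn fn) {
    toy_comm c = { 1, true, false, true, attr_val, -1, fn };
    return c;
}

/* Attribute lookup fails on an invalid handle or a zero ref-count. */
static int toy_attr_get(const toy_comm *c, int *val, bool *found) {
    if (!c->handle_valid || c->refcount <= 0) return TOY_ERR_COMM;
    *found = c->attr_present;
    if (c->attr_present) *val = c->attr_val;
    return TOY_SUCCESS;
}

/* Steps 2 and 3: drop one reference; on zero, free the attributes and
 * then the communicator. */
static void toy_comm_release(toy_comm *c) {
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        c->attr_present = false;
        c->destroyed = true;
    }
}

static int toy_comm_free(toy_comm *c) {
    if (!c->handle_valid) return TOY_ERR_COMM;
    /* Step 1: the callback runs while the communicator is still valid. */
    if (c->attr_present && c->delete_fn != NULL) {
        int err = c->delete_fn(c, c->attr_val);
        if (err != TOY_SUCCESS) return err;  /* nothing else is touched */
    }
    c->handle_valid = false;  /* assumption: handle dies on success */
    toy_comm_release(c);
    return TOY_SUCCESS;
}

/* Callback that reads the attribute back, like the attrdeleteget test. */
static int cb_attr_get(toy_comm *c, int attr_val) {
    bool found = false;
    int val = -1;
    int err = toy_attr_get(c, &val, &found);
    (void)attr_val;
    if (err != TOY_SUCCESS) return err;
    c->seen_in_callback = found ? val : -1;
    return TOY_SUCCESS;
}

/* Callback that leaves a nonblocking operation pending (IBCAST-like);
 * the pending operation holds its own reference. */
static int cb_start_nonblocking(toy_comm *c, int attr_val) {
    (void)attr_val;
    c->refcount++;
    return TOY_SUCCESS;
}
```

With cb_attr_get, toy_comm_free succeeds and the object is destroyed in
the same call, because the lookup happens before the ref-count is
touched.  With cb_start_nonblocking, toy_comm_free also succeeds, but
the object and its attribute storage survive until the pending operation
calls toy_comm_release; in the meantime toy_attr_get on the freed handle
fails.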

 -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


