[mpich-discuss] [PATCH v2] test: add attrdeleteget, MPI_Attr_get called from delete_fn

Fab Tillier ftillier at microsoft.com
Sat May 18 16:19:36 CDT 2013


Pavan Balaji wrote on Sat, 18 May 2013 at 14:02:17

> In MPICH, here's what we do:
> 
> 1. Run the user callback.  If it returned an error, return that error to
> the user.

So you don't actually free the attribute if the callback succeeds?

What happens if the callback fails, but some preceding attributes' callbacks succeeded?  If you wanted to provide some level of fault tolerance (since a failure in an attribute delete callback could be a recoverable error), what happens on a subsequent call to MPI_COMM_FREE?  Do the attributes whose callbacks already succeeded get a second delete call?

It seems that once an attribute delete callback returns success, that attribute should be deleted (that is, you don't need to delay the deletion until the ref count of the communicator reaches zero).

> 2. Decrement the communicator ref-count.
> 
> 3. If the ref-count has reached zero, free the attributes and then free
> the communicator.

Delaying freeing the attributes (when the delete callback has already been invoked and has freed whatever state it had) seems odd to me.  Wouldn't it require extra bookkeeping?  Do you see a problem with freeing the attribute immediately after the delete callback succeeds (regardless of comm ref count)?  This would solve any issues with a subsequent call to MPI_COMM_FREE, in that only attributes whose delete callback failed or was never called would have their callback invoked.
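
To make the suggestion concrete, here is a rough sketch of the "free on success" behaviour in plain C. It is a toy model only, not MPICH code: the names toy_attr, toy_comm, toy_comm_free and the two sample callbacks are all made up for illustration, and a nonzero return simply stands in for an error from the delete callback.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTRS 8

/* Hypothetical delete callback: returns 0 on success, nonzero on error. */
typedef int (*toy_delete_fn)(void *value);

typedef struct {
    toy_delete_fn del;  /* user-supplied delete callback */
    void *value;        /* attribute value handed to the callback */
    int live;           /* 1 while the attribute is still attached */
} toy_attr;

typedef struct {
    toy_attr attrs[TOY_MAX_ATTRS];
    int nattrs;
    int refcount;
} toy_comm;

/* Run the delete callbacks in order. An attribute is detached as soon as
 * its callback succeeds. On the first failure, stop and return the error,
 * leaving that attribute and all later ones attached. */
int toy_delete_attrs_eager(toy_comm *c)
{
    for (int i = 0; i < c->nattrs; i++) {
        toy_attr *a = &c->attrs[i];
        if (!a->live)
            continue;           /* already deleted by an earlier attempt */
        int err = a->del(a->value);
        if (err != 0)
            return err;
        a->live = 0;            /* detach now, regardless of refcount */
    }
    return 0;
}

/* Toy stand-in for the free path: callbacks first, then drop one reference.
 * If a callback fails, the reference is kept so the caller can retry. */
int toy_comm_free(toy_comm *c)
{
    assert(c->refcount > 0);
    int err = toy_delete_attrs_eager(c);
    if (err != 0)
        return err;
    c->refcount--;
    return 0;
}

/* Sample callbacks: one always succeeds, one fails on its first call only.
 * Both count their invocations through the int that value points to. */
int toy_count_calls(void *value)
{
    (*(int *)value)++;
    return 0;
}

int toy_fail_once(void *value)
{
    int *calls = value;
    (*calls)++;
    return (*calls == 1) ? 1 : 0;
}
```

With one attribute using toy_count_calls and a second using toy_fail_once, the first toy_comm_free fails and leaves only the second attribute attached; a retry invokes only the second callback, so nothing gets a second delete call.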
 
> If the user did a MPI_BCAST, there's no problem since after the
> callback, the communicator ref-count does not include the BCAST.  On the
> other hand, if the user did an MPI_IBCAST, then the ref-count will not
> touch zero, so the communicator is not actually deleted internally.
> 
> The attribute might not actually get deleted if the ref-count didn't
> reach zero, but you can't access them anyway, since the comm handle is
> not valid.  They'll eventually get freed when the ref-count reaches zero.

Right, so why not just delete the attribute from the context of the MPI_COMM_FREE call, rather than delaying until the ref count reaches zero?  I didn't think attributes could affect internal MPI operations, so I thought they could be freed early.
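
For comparison, this is how I read the deferred scheme described in the quoted text, again as a toy model in plain C rather than anything taken from MPICH. The rc_comm type and every function name are hypothetical; attr_storage is just a stand-in for whatever internal attribute list the library keeps, and the pending nonblocking operation is modelled as one extra reference.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *attr_storage;  /* stand-in for the internal attribute list */
    int refcount;       /* user handle plus any pending operations */
    int handle_valid;   /* 0 once the user has freed the handle */
    int callbacks_run;  /* 1 once the delete callbacks have been invoked */
} rc_comm;

/* Create a communicator model holding one reference for the user handle. */
rc_comm rc_comm_create(void)
{
    rc_comm c;
    c.attr_storage = malloc(sizeof *c.attr_storage);
    c.refcount = 1;
    c.handle_valid = 1;
    c.callbacks_run = 0;
    return c;
}

/* Drop one reference; attribute storage goes away only at zero. */
void rc_release(rc_comm *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        free(c->attr_storage);
        c->attr_storage = NULL;
    }
}

/* A pending nonblocking operation (e.g. an IBCAST) holds a reference. */
void rc_start_nonblocking(rc_comm *c)
{
    c->refcount++;
}

void rc_complete_nonblocking(rc_comm *c)
{
    rc_release(c);
}

/* Free path in the deferred scheme: the callbacks run and the handle is
 * invalidated right away, but the storage survives while references remain. */
int rc_comm_free(rc_comm *c)
{
    c->callbacks_run = 1;   /* assume every delete callback succeeded */
    c->handle_valid = 0;
    rc_release(c);
    return 0;
}
```

In this model, calling rc_comm_free while a nonblocking operation is outstanding leaves attr_storage allocated until rc_complete_nonblocking runs, even though the callbacks have already run and the handle is no longer usable. That gap is what I am asking about.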

Thanks,
-Fab


