[mpich-devel] progress in flushall

Jeff Hammond jeff.science at gmail.com
Wed Apr 30 16:29:13 CDT 2014


On Wed, Apr 30, 2014 at 3:58 AM, Balaji, Pavan <balaji at anl.gov> wrote:
>
> On Apr 29, 2014, at 10:20 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> Yeah, I know that's what MPICH gives me.  My point is that is
>> undesirable.  All sync operations should make progress, no matter
>> what.
>
> They are not required to make progress from the spec perspective.  It’s a bad assumption to make that Win_flush will always make progress.  I think it’s OK for it to be mostly a no-op when there are no operations posted.  If you need progress, why don’t you use test or something similar?

I want flushall to make progress because there might be incoming RMA
that won't be processed otherwise.  While today MPICH makes progress
on all Ch3 ops in Barrier, if that ever changes, ARMCI_Barrier is
going to deadlock because it does win_flush_all then mpi_barrier.  If
flush_all doesn't process incoming RMA, the initiators cannot return
from win_flush_all and will never proceed to barrier, while the
processes that haven't initiated any RMA will be in mpi_barrier, not
making progress on RMA.

If win_unlock_all behaves like win_flush_all and win_free doesn't make
progress, then I can get the same deadlock scenario in ARMCI_Free.

I'm already adding Iprobe for progress inside of ARMCI-MPI, but it
seems silly to have to test for p2p to get progress in RMA and for RMA
sync ops to not guarantee progress.  I'd much prefer the design where
p2p and rma progress are decoupled and my RMA sync ops are burdened
with progress on p2p and vice versa.

Best,

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com

