[mpich-devel] Deadlock when using MPIX_Grequest interface

Jeff Hammond jeff.science at gmail.com
Wed Mar 8 17:17:29 CST 2017


On Wed, Mar 8, 2017 at 2:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:

> On Wed, 2017-03-08 at 09:25 +0000, Giuseppe Congiu wrote:
> > Hello Rob,
> >
> > > I'm excited to see someone using the MPIX_Grequest interface. We used
> > > the MPIX_Grequest interface to implement non-blocking collective I/O,
> > > and had some bad interactions between libc's aio and the grequest
> > > callbacks. I don't know if you are running into something similar.
> >
> > Maybe. Do you have a description of the problem somewhere?
>
> The guy who did that work just left last Friday. I'll have to dig up
> the archives. Looks like it was a hard-to-debug segfault:
> https://trac.mpich.org/projects/mpich/ticket/2201
>
>
That link returns a 404. Not sure if that's temporary or permanent.

Jeff


> >
> > > Do you have any desire or plans to submit these changes into
> > > upstream ROMIO?
> >
> > The idea would be to push these changes to upstream ROMIO if this is
> > relevant for the community.
>
> I don't encounter many BeeGFS users, but ROMIO file system drivers are
> fairly self-contained, so it wouldn't be a burden to ship a BeeGFS
> driver in ROMIO.
>
>
> > In principle I have the same intent here. The difference is that I
> > cannot check on progress, since BeeGFS does not provide a way to check
> > the status of a single request. Instead, it only offers a blocking wait
> > interface for all the requests submitted for a given file (identified
> > by the filename). Thus I need to invoke deeper_cache_flush_wait() from
> > inside one of the callbacks.
>
> Blocking the progress engine when it expects to repeatedly call non-
> blocking functions could work as long as deeper_cache_flush_wait()
> eventually finishes and nothing needs MPI.
>
> Now, all I know about DEEP-ER is what I just read on
> https://www.beegfs.com/wiki/cacheAPI, so I'm sure I don't know the whole
> picture, but can you call deeper_cache_flush_is_finished() and
> deeper_cache_flush() without the WAIT flag? Stick those two routines in
> the poll_fn().
>
> The generalized request extensions provide a wait_fn() that should be
> able to handle this, too...
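If per-request status really is unavailable, the wait_fn() route can be sketched the same way. As I understand MPICH's extension, wait_fn() is handed all pending requests of one generalized-request class at once (a count, an array of extra_state pointers, a timeout, and an `MPI_Status *`), which maps naturally onto a per-file blocking wait. This is again a simulation under stated assumptions: `fake_flush_wait()` is a hypothetical stand-in for a blocking call like deeper_cache_flush_wait(), its "one wait completes every flush on that file" semantics are assumed from the description above, the timeout is ignored, and setting `complete` marks where `MPI_Grequest_complete()` would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simulation only: all names are hypothetical, not the MPICH or DEEP-ER API. */
typedef struct {
    const char *path; /* file the flush belongs to */
    bool complete;    /* set where MPI_Grequest_complete() would be called */
} flush_state;

static int blocking_waits = 0; /* counts calls to the simulated blocking wait */

/* Hypothetical stand-in for a blocking per-file wait: when it returns,
 * every outstanding flush on that file is assumed to be done. */
static void fake_flush_wait(const char *path)
{
    (void)path;
    blocking_waits++;
}

/* Shape of a wait_fn (status argument omitted): block once per distinct
 * file, then complete every pending request on that file. */
static int wait_fn(int count, void **array_of_states, double timeout)
{
    (void)timeout; /* ignored in this sketch */
    for (int i = 0; i < count; i++) {
        flush_state *s = array_of_states[i];
        if (s->complete)
            continue;
        fake_flush_wait(s->path);
        /* The wait covered the whole file, so complete all its requests. */
        for (int j = i; j < count; j++) {
            flush_state *t = array_of_states[j];
            if (!t->complete && strcmp(t->path, s->path) == 0)
                t->complete = true;
        }
    }
    return 0; /* MPI_SUCCESS in real code */
}
```

This still blocks the progress engine for as long as the flush takes, so the caveat raised above applies: it only works if the wait eventually returns and nothing else needs MPI in the meantime.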
>
> When it gets stuck, what does the call stack look like?
>
> Is it stuck for good or just making progress really really slowly?
>
> ==rob
> _______________________________________________
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/devel




-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

