[mpich-devel] Deadlock when using MPIX_Grequest interface

Raffenetti, Kenneth J. raffenet at mcs.anl.gov
Wed Mar 8 19:02:28 CST 2017


Ticket numbers are maintained on GitHub: https://github.com/pmodels/mpich/issues/2201

Ken

On Mar 8, 2017 5:17 PM, Jeff Hammond <jeff.science at gmail.com> wrote:


On Wed, Mar 8, 2017 at 2:25 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
On Wed, 2017-03-08 at 09:25 +0000, Giuseppe Congiu wrote:
> Hello Rob,
>
> > I'm excited to see someone using the MPIX_Grequest interface.  We
> > used
> > the MPIX_Grequest interface to implement non-blocking collective
> > I/O,
> > and had some bad interactions between libc's aio and the grequest
> > callbacks.  I don't know if you are running into something similar.
>
> Maybe. Do you have a description of the problem somewhere?

The guy who did that work just left last Friday.  I'll have to dig up
the archives. Looks like it was a hard-to-debug segfault:
https://trac.mpich.org/projects/mpich/ticket/2201


That link is 404.  Not sure if temporary or permanent.

Jeff

>
> > Do you have any desire or plans to submit these changes into
> > upstream
> > ROMIO?
>
> The idea would be to push these changes to upstream ROMIO if this is
> relevant for the community.

I don't encounter many BeeGFS users, but ROMIO file system drivers are
fairly self-contained and it wouldn't be a burden to ship with them in
ROMIO.


> In principle here I have the same intent. The difference is that I
> cannot check on progress since
> BeeGFS does not provide a way for checking the status of a single
> request. Instead it only
> offers a blocking wait interface for all the requests submitted for a
> certain file (identified
> by the filename). Thus I need to invoke deeper_cache_flush_wait()
> from inside one of the
> callbacks.

Blocking the progress engine when it expects to repeatedly call non-
blocking functions could work as long as deeper_cache_flush_wait()
eventually finishes and nothing needs MPI.

Now, all I know about DEEP-ER is what I just read on
https://www.beegfs.com/wiki/cacheAPI, so I'm sure I don't know the
whole picture, but can you call deeper_cache_flush_is_finished() and
deeper_cache_flush() without the WAIT flag?  Stick those two routines
in the poll_fn().

The generalized request extensions provide a wait_fn() that should be
able to handle this, too...
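
To make the poll_fn() suggestion concrete, here is a rough, standalone
sketch of the pattern: the callback asks once whether the flush has
finished, records the answer, and returns immediately instead of
blocking. Everything in it is an assumption for illustration.
fake_flush_is_finished() is a hypothetical stand-in (not the DEEP-ER
API, whose real signatures I have not checked), flush_state and
drive_until_complete() are made-up names, and the callback drops the
MPI_Status argument that I believe the real MPIX poll callback also
receives, so the example builds without MPI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request state, passed to the callback as extra_state. */
typedef struct {
    int  polls_remaining;  /* simulated amount of flush work still pending */
    bool complete;         /* set once the flush is known to have finished */
} flush_state;

/* Hypothetical stand-in for a non-blocking "is the flush done?" query.
 * It never waits: each call retires one unit of simulated work and
 * reports whether anything is left. */
static bool fake_flush_is_finished(flush_state *st)
{
    if (st->polls_remaining > 0)
        st->polls_remaining--;
    return st->polls_remaining == 0;
}

/* Shape of a poll callback: check once, record the result, return.
 * In real code this is the point where MPI_Grequest_complete() would
 * be called on the generalized request. */
static int poll_fn(void *extra_state)
{
    flush_state *st = extra_state;
    if (!st->complete && fake_flush_is_finished(st))
        st->complete = true;
    return 0;  /* success; the caller is never blocked here */
}

/* Stand-in for the progress engine: keeps invoking the callback until
 * the request is marked complete, and returns how many polls it took. */
int drive_until_complete(flush_state *st)
{
    int polls = 0;
    while (!st->complete) {
        poll_fn(st);
        polls++;
    }
    return polls;
}
```

The point of the shape is that control returns to the progress engine
between polls, so other MPI work can advance while the flush is still
outstanding.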

When it gets stuck, what does the call stack look like?

Is it stuck for good or just making progress really really slowly?

==rob



--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
