[mpich-devel] Fw: romio questions
Rob Latham
robl at mcs.anl.gov
Fri Apr 4 16:34:39 CDT 2014
This is getting out of my depth. MPICH guys: Mark Allen is updating
Platform's MPI and has been asking me some questions about
multi-threaded ROMIO.
The independent ROMIO routines might make some local MPI calls to
process the datatype, but otherwise, yes, the only blocking call would
be the file system operation.
I haven't thought about what would happen if MPI_File_write_all was
called from two threads.
MPICH guys: there's no way the CS_ENTER/EXIT macros can be that clever,
right?
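
For concreteness, "that clever" would mean dropping the big lock while
a collective blocks and re-taking it afterward. A hypothetical sketch,
with every name below made up -- this is not the real MPICH code:

#include <pthread.h>
#include <stdbool.h>

extern pthread_mutex_t big_lock;     /* hypothetical ROMIO-wide lock  */
extern bool op_complete(void);       /* hypothetical completion check */
extern void make_progress(void);     /* hypothetical progress poke    */

/* to avoid deadlock, a blocking wait inside the critical section
 * would have to yield the lock on every iteration */
static void clever_blocking_wait(void)
{
    while (!op_complete()) {
        pthread_mutex_unlock(&big_lock);  /* let other threads in */
        make_progress();
        pthread_mutex_lock(&big_lock);    /* re-take before touching
                                             shared ADIOI state */
    }
}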
==rob
On 04/04/2014 04:28 PM, Mark Allen wrote:
> Thanks. I looked a bit at the CS_ENTER/EXIT code. Am I right that all
> the non-collective MPIO calls like MPI_File_read etc. only block in
> the sense of waiting on some local operation to complete? If that's
> the case, it would be okay for them to hold a lock from beginning to
> end of the call. But for the collective MPIO operations I don't think
> you could just hold the lock for the entire function call, since you
> might end up waiting for remote peers to arrive, and if two
> collectives were taking place at the same time they could mismatch
> and deadlock. But I'm not sure whether that's what the code is doing.
> Is there a release of the lock hidden somewhere when collectives
> block? If, inside the MPI_Allgather for example, MPI performed a
> release and regain of the MPIO lock, is the blocking MPIO collective
> holding any shared resource that would get corrupted by letting other
> threads in at that point?
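>
> Here's an untested sketch of the deadlock I'm worried about -- the
> mutex below is a stand-in I made up for whatever CS_ENTER/EXIT grabs:
>
> #include <mpi.h>
> #include <pthread.h>
>
> pthread_mutex_t mpio_big_lock = PTHREAD_MUTEX_INITIALIZER;
>
> void *thread_a(void *arg)   /* every rank runs both threads */
> {
>     MPI_File fh1 = *(MPI_File *) arg;
>     char buf[8] = {0};
>     pthread_mutex_lock(&mpio_big_lock);
>     /* blocks until the matching collective runs on every rank */
>     MPI_File_write_all(fh1, buf, 8, MPI_BYTE, MPI_STATUS_IGNORE);
>     pthread_mutex_unlock(&mpio_big_lock);
>     return NULL;
> }
>
> void *thread_b(void *arg)   /* same lock, different collective */
> {
>     MPI_File fh2 = *(MPI_File *) arg;
>     char buf[8] = {0};
>     pthread_mutex_lock(&mpio_big_lock);
>     /* if rank 0's thread_a got the lock first but rank 1's
>      * thread_b did, the ranks sit in mismatched collectives
>      * and neither write_all can ever complete */
>     MPI_File_write_all(fh2, buf, 8, MPI_BYTE, MPI_STATUS_IGNORE);
>     pthread_mutex_unlock(&mpio_big_lock);
>     return NULL;
> }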
>
> Fwiw I also noticed mpi-io/ioreq_f2c.c looks to be a case where an
> early return runs the risk of a CS_ENTER without a corresponding
> CS_EXIT. It looks like all the other files use goto fn_exit to ensure
> a match.
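>
> The shape of what I mean (macro names approximate, handle translation
> simplified to a cast):
>
> #include "mpio.h"
>
> MPIO_Request MPIO_Request_f2c(MPI_Fint request)
> {
>     MPIO_Request result = MPIO_REQUEST_NULL;
>
>     ROMIO_THREAD_CS_ENTER();
>     if (!request) {
>         /* the buggy form does "return MPIO_REQUEST_NULL;" right
>          * here, leaving the critical section held forever */
>         goto fn_exit;
>     }
>     result = (MPIO_Request) request;
> fn_exit:
>     ROMIO_THREAD_CS_EXIT();  /* every exit path funnels through here */
>     return result;
> }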
>
> Mark
>
>
>
> From: Rob Latham <robl at mcs.anl.gov>
> To: Paul Coffman/Rochester/IBM at IBMUS
> Cc: Mark Allen/Dallas/IBM at IBMUS
> Date: 04/04/2014 09:49 AM
> Subject: Re: Fw: romio questions
>
> ------------------------------------------------------------------------
>
> On 04/04/2014 01:22 AM, Paul Coffman wrote:
>
> > ----- Forwarded by Paul Coffman/Rochester/IBM on 04/04/2014 01:22 AM -----
> >
> > From: Mark Allen/Dallas/IBM
> > To: Paul Coffman/Rochester/IBM at IBMUS,
> > Date: 04/04/2014 01:07 AM
> > Subject: romio questions
> > ------------------------------------------------------------------------
> >
> >
> > I have two questions/topics for you:
> >
> > First, I wanted to ask: do you happen to know if romio is thread
> > safe? I see a fair number of critical-section begin/end macros and am
> > guessing it is, but thought I'd ask anyway.
>
> The internal romio routines (things in the ADIO and ADIOI namespace)
> rely on several bits of global state -- the flattened representations
> of the file and memory datatypes come to mind first, but there are
> probably others. The critical-section macros sit at the MPI-IO
> interface to romio and should provide a "big lock" around the rest.
>
> I haven't tried this though. I would be a little nervous about it
> working without a few patches.
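>
> From memory, the mpi-io/ glue looks roughly like this (macro names
> approximate, error checking omitted), with the big lock wrapping each
> binding:
>
> int MPI_File_read(MPI_File fh, void *buf, int count,
>                   MPI_Datatype datatype, MPI_Status *status)
> {
>     int error_code;
>
>     ROMIO_THREAD_CS_ENTER();    /* one big lock over all of romio */
>
>     /* everything below can touch shared ADIOI state (flattened
>      * datatypes etc.), so it all stays inside the lock */
>     error_code = MPIOI_File_read(fh, (MPI_Offset) 0, ADIO_INDIVIDUAL,
>                                  buf, count, datatype,
>                                  "MPI_FILE_READ", status);
>
>     ROMIO_THREAD_CS_EXIT();
>     return error_code;
> }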
>
> > Second, I noticed romio uses an extension of generalized requests
> > described here:
> >
> http://www.cs.illinois.edu/~wgropp/bib/papers/2007/latham_grequest-enhance-4.pdf
> >
> > but the code looks confused about whether the proposed wait_fn
> > callback is waiting for a single request or all the requests.
> >
> > In romio's definitions of ADIOI_PVFS2_aio_wait_fn and
> > ADIOI_GEN_aio_wait_fn, the wait_fn looks to me very much like it's
> > waiting for all the reqs, vs ADIOI_NTFS_aio_wait_fn, which looks
> > like it's waiting on just some of them.
>
> The intent of the extended wait_fn was indeed to wait for all
> outstanding generalized requests -- specifically, to call aio_suspend
> on more than one operation at a time.
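>
> A simplified sketch of that shape -- the state struct and names here
> are made up, and the real romio code differs in the details:
>
> #include <aio.h>
> #include <errno.h>
> #include <mpi.h>
>
> struct aio_state {            /* hypothetical per-op bookkeeping */
>     struct aiocb cb;
>     MPI_Request  greq;        /* generalized request to complete */
> };
>
> /* extended-grequest wait_fn: block until ALL 'count' ops finish,
>  * using one aio_suspend over the whole set instead of waiting on
>  * the operations one at a time */
> int example_aio_wait_fn(int count, void **array_of_states,
>                         double timeout, MPI_Status *status)
> {
>     const struct aiocb *cbs[count];
>     int i, ncomplete = 0;
>     (void) timeout; (void) status;      /* unused in this sketch */
>
>     for (i = 0; i < count; i++)
>         cbs[i] = &((struct aio_state *) array_of_states[i])->cb;
>
>     while (ncomplete < count) {
>         aio_suspend(cbs, count, NULL);  /* wakes when >= 1 is done */
>         for (i = 0; i < count; i++) {
>             if (cbs[i] && aio_error(cbs[i]) != EINPROGRESS) {
>                 aio_return((struct aiocb *) cbs[i]);
>                 MPI_Grequest_complete(
>                     ((struct aio_state *) array_of_states[i])->greq);
>                 cbs[i] = NULL;          /* suspend ignores NULLs */
>                 ncomplete++;
>             }
>         }
>     }
>     return MPI_SUCCESS;
> }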
>
> I would not read too much into NTFS_aio_wait_fn. That work was done
> quickly for the paper and then set aside.
>
> > And where these wait_fn callbacks get used: MPI_Waitsome uses
> > MPIR_Grequest_progress_poke, which conditionally calls wait_fn. It
> > seems to me this would make MPI_Waitsome erroneously block until all
> > its generalized requests finish (the code in
> > MPIR_Grequest_progress_poke looks like it believes the wait_fn is
> > supposed to complete just one request).
>
> It has been some years since I looked at the waitsome/extended-grequest
> interaction, but it does sound like I could have implemented it better...
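>
> From the user's side the consequence would look like this (untested
> sketch):
>
> #include <mpi.h>
>
> void waitsome_example(MPI_File fh, char *a, char *b, int n)
> {
>     MPI_Request req[2];
>     MPI_Status  st[2];
>     int outcount, idx[2];
>
>     MPI_File_iread_at(fh, 0, a, n, MPI_BYTE, &req[0]);
>     MPI_File_iread_at(fh, n, b, n, MPI_BYTE, &req[1]);
>
>     /* standard semantics: return once at least one read finishes.
>      * if progress_poke hands both grequests to a wait-for-all
>      * wait_fn, this instead blocks until both are done. */
>     MPI_Waitsome(2, req, &outcount, idx, st);
> }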
>
> > Is this an area you've worked with at all, or something we need to
> > worry about? The reason I was looking at it: in order to pull in the
> > new romio I figured I'd just add MPICH's concept of extended
> > generalized requests into our MPI, so I wanted to make sure I
> > understood how they were intended to work in mpich, and I think it's
> > pretty muddled there unless I'm reading it wrong.
>
> You would be the first developers besides me to look closely at our
> extended generalized request proposal. It's no surprise to me that
> it's muddled, since it hasn't had a lot of attention over the years.
>
> At various points in the last six years, MPICH developers have
> modified the implementation of generalized requests to keep the
> common code paths speedy. That might explain some murkiness, or
> things like 'waitsome' actually waiting for everything.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA