[mpich-devel] Fw: romio questions

Rob Latham robl at mcs.anl.gov
Fri Apr 4 16:34:39 CDT 2014


This is getting out of my depth.  MPICH guys: Mark Allen is updating 
Platform's MPI and has been asking me some questions about 
multi-threaded ROMIO.

The independent ROMIO routines might make some local MPI calls to 
process the data type, but otherwise, yes, the only blocking call would 
be the file system operation.

I haven't thought about what would happen if MPI_File_write_all was 
called from two threads.

MPICH guys: there's no way the CS_ENTER/EXIT macros can be that clever, 
right?

==rob

On 04/04/2014 04:28 PM, Mark Allen wrote:
> Thanks.  I looked a bit at the CS_ENTER/EXIT code.  Am I right that all
> the non-collective MPIO calls like MPI_File_read etc are only blocking
> in the sense of waiting on some local operation to complete?  If that's
> the case it would be okay for them to hold a lock from beginning to end
> of the call like that.  But for the collective MPIO operations I don't
> think you could just hold the lock for the entire function call since
> you might end up waiting for remote peers to arrive, and if two
> collectives were taking place at the same time they could mismatch and
> deadlock.  But I'm not sure if that's what the code is doing or not.  Is
> there a release of the lock hidden somewhere when collectives block?  If
> inside the MPI_Allgather for example MPI performed a release and regain
> of the MPIO lock is the blocking MPIO collective holding any shared
> resource that would get corrupted by letting other threads in at that point?
>
> Fwiw I also noticed mpi-io/ioreq_f2c.c looks to be a case where an early
> return risks a CS_ENTER without a corresponding CS_EXIT.  All the
> other files appear to use goto fn_exit to ensure a match.


>
> Mark
>
>
>
> From: Rob Latham <robl at mcs.anl.gov>
> To: Paul Coffman/Rochester/IBM at IBMUS
> Cc: Mark Allen/Dallas/IBM at IBMUS
> Date: 04/04/2014 09:49 AM
> Subject: Re: Fw: romio questions
>
> ------------------------------------------------------------------------
>
> On 04/04/2014 01:22 AM, Paul Coffman wrote:
>
>  > ----- Forwarded by Paul Coffman/Rochester/IBM on 04/04/2014 01:22 AM
> -----
>  >
>  > From: Mark Allen/Dallas/IBM
>  > To: Paul Coffman/Rochester/IBM at IBMUS,
>  > Date: 04/04/2014 01:07 AM
>  > Subject: romio questions
>  > ------------------------------------------------------------------------
>  >
>  >
>  > I have two questions/topics for you:
>  >
>  > First I wanted to ask do you happen to know if romio is thread safe?  I
>  > see a fair number of critical-section begin/end macros and am guessing
>  > it is, but thought I'd ask anyway.
>
> the internal romio routines (things in the ADIO and ADIOI namespaces)
> rely on several bits of global state -- the flattened representations of
> the file and memory datatypes come to mind first, but there are probably
> others.  The critical section macros sit at the MPI-IO interface to
> romio and should provide a "big lock" around the rest.
>
> I haven't tried this though.  I would be a little nervous about it
> working without a few patches.
>
>  > Second, I noticed romio uses an extension of generalized requests
>  > described here:
>  >
> http://www.cs.illinois.edu/~wgropp/bib/papers/2007/latham_grequest-enhance-4.pdf
>  >
>  > but the code looks confused about whether the proposed wait_fn callback
>  > is waiting for a single request or for all of the requests.
>  >
>  > In romio's definitions of
>  >      ADIOI_PVFS2_aio_wait_fn
>  >      ADIOI_GEN_aio_wait_fn
>  > the wait_fn looks to me very much like it's waiting for all the reqs, vs
>  > the NTFS function that looks like it's waiting on some
>  >      ADIOI_NTFS_aio_wait_fn
>
> the intent for the extended wait_fn was indeed to wait for all
> outstanding generalized requests.  Specifically, to call aio_suspend on
> more than one operation at a time.
>
> I would not read too much into NTFS_aio_wait_fn.  That work was done
> quickly for the paper and then set aside.
>
>  > And where these wait_fn callbacks get used, MPI_Waitsome uses
>  > MPIR_Grequest_progress_poke which conditionally calls wait_fn. It seems
>  > to me this would make MPI_Waitsome erroneously block until all its
>  > generalized requests finish (the code in MPIR_Grequest_progress_poke
>  > looks like it believes the wait_fn is supposed to just complete one
>  > request).
>
> It has been some years since I looked at the waitsome/extended-grequest
> interaction, but it does sound like I could have implemented it better...
>
>  > Is this an area you've worked with any or something we need to worry
>  > about?  The reason I was looking at it was in order to pull in the new
>  > romio I figured I'd just add MPICH's concept of extended generalized
>  > requests into our MPI, so I wanted to make sure I understood how they
>  > were intended to work in mpich and I think it's pretty muddled there
>  > unless I'm reading it wrong.
>
> You would be the first developers besides me to look closely at our
> extended generalized request proposal.  It's no surprise to me it's
> muddled, since it hasn't had a lot of attention over the years.
>
> At various points in the last six years, MPICH developers have modified
> the implementation of generalized requests to keep the common code paths
> fast.  That might explain some of the murkiness, like 'waitsome'
> actually waiting for everything.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

