[mpich-devel] Fatal error in PMPI_Barrier

Giuseppe Congiu giuseppe.congiu at seagate.com
Thu Dec 4 07:14:18 CST 2014

On 3 December 2014 at 21:19, Rob Latham <robl at mcs.anl.gov> wrote:

> Sorry not to have responded to you sooner. Between a big conference and US
> thanksgiving, a lot of us were out of the office for the last few weeks.

Hello Rob, I totally understand; I have also been busy with other stuff
in the last few weeks. Nevertheless, I cannot deny that I have been looking
forward to a reply : )

> Welcome.  You're digging into ROMIO and Generalized Requests, so you've
> picked two fairly obscure areas of MPI to work on!
> As it happens, I am the world's expert in both ROMIO and Generalized
> requests.   (the population size for that domain is exceedingly small...)

Then you are the blessing that I have been waiting for : )

> I think this approach can greatly benefit collective I/O -- I observe on
> some systems that the communication cost is not hurting two-phase
> collective I/O but instead it is the synchronization:  if one I/O
> aggregator is taking longer to do I/O than the others, all N processes pay
> a price waiting for the laggard.

Yes, this is exactly the point. Communication cost does not seem to be a
problem; the MPI_Allreduce() at the end of the collective write, on the
other hand, is a real pain. From what I observed, writing to local devices
greatly reduces the variation in I/O response time among aggregators.
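
To make the cost of the laggard concrete, here is a minimal sketch in C. It assumes a deliberately simplified model in which every aggregator leaves the final synchronization only when the slowest one has finished its write phase. The function name `total_wait_time` and the timings used below are made up for illustration; they are not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: io_time[i] is the time aggregator i spends in its
 * write phase. All aggregators leave the final synchronization together,
 * i.e. when the slowest one is done. Returns the summed idle time. */
double total_wait_time(const double *io_time, size_t n)
{
    double slowest = 0.0;
    double idle = 0.0;
    size_t i;

    assert(io_time != NULL || n == 0);

    /* the slowest aggregator decides when everybody may continue */
    for (i = 0; i < n; i++)
        if (io_time[i] > slowest)
            slowest = io_time[i];

    /* every other aggregator idles for the difference */
    for (i = 0; i < n; i++)
        idle += slowest - io_time[i];

    return idle;
}
```

With write times of 1, 1, 1 and 3 seconds, three aggregators idle for 2 seconds each, 6 seconds in total; with equal write times nothing is wasted, which is why reducing the variation matters more than reducing the mean.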

> before we get too far, we need to talk a bit about generalized requests
> and their "super-strict" progress model.   Can you share your query, free,
> and cancel functions?

The callback functions are quite simple; they don't do anything special:

    int ADIOI_GEN_Ifile_sync_query_fn( void *extra_state, MPI_Status *status )
    {
        ARGS *state = (ARGS*)extra_state;

        /* fill in the status handed back by MPI_Wait/MPI_Test */
        MPI_Status_set_cancelled( status, 0 );
        MPI_Status_set_elements( status, MPI_BYTE, state->bytes_xfered );
        status->MPI_SOURCE = MPI_UNDEFINED;
        status->MPI_TAG = MPI_UNDEFINED;

        return state->error_code;
    }

    int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )
    {
        return MPI_SUCCESS;
    }

    int ADIOI_GEN_Ifile_sync_cancel_fn( void *extra_state, int complete )
    {
        return MPI_SUCCESS;
    }

> Which part of this ends up in a generalized request?

The function that ends up in a generalized request is the synchronization
function, ADIOI_GEN_IfileSync(), which is also pretty simple: it just reads
the locally written file domains and writes them to the global file.
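
As a rough illustration of this flow, the sketch below models it with plain pthreads instead of MPI. Everything in it is invented for the example: `toy_request`, `toy_state`, `toy_complete()` (standing in for MPI_Grequest_complete()), `toy_wait()` (standing in for MPI_Wait()) and the 4096 byte transfer. It is not how MPICH implements generalized requests; it only shows that the waiter blocks until the worker thread marks the request complete, and that whatever the query callback returns is what the waiter gets back.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a generalized request; NOT an MPI data structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;                      /* set by toy_complete() */
    int           (*query_fn)(void *, int *);  /* user query callback */
    void           *extra_state;
} toy_request;

/* State shared between the waiter and the worker thread. */
typedef struct {
    toy_request *req;
    int          bytes_xfered;
    int          error_code;
} toy_state;

/* Plays the role of MPI_Grequest_complete(): called by the worker. */
static void toy_complete(toy_request *r)
{
    pthread_mutex_lock(&r->lock);
    r->done = 1;
    pthread_cond_signal(&r->done_cv);
    pthread_mutex_unlock(&r->lock);
}

/* Plays the role of MPI_Wait(): block until complete, then run the query
 * callback and hand its return value back to the caller. */
static int toy_wait(toy_request *r, int *bytes_out)
{
    assert(r->query_fn != NULL);
    pthread_mutex_lock(&r->lock);
    while (!r->done)
        pthread_cond_wait(&r->done_cv, &r->lock);
    pthread_mutex_unlock(&r->lock);
    return r->query_fn(r->extra_state, bytes_out);
}

static int toy_query_fn(void *extra_state, int *bytes_out)
{
    toy_state *s = (toy_state *)extra_state;
    *bytes_out = s->bytes_xfered;
    return s->error_code;
}

/* Worker: pretends to flush 4096 bytes, then completes the request. */
static void *toy_sync_thread(void *arg)
{
    toy_state *s = (toy_state *)arg;
    s->bytes_xfered = 4096;
    s->error_code = 0;          /* 0 plays the role of MPI_SUCCESS */
    toy_complete(s->req);
    return NULL;
}

/* Start the worker, wait for it, and return what the query reported. */
int run_toy_sync(int *bytes_out)
{
    toy_request req;
    toy_state   st;
    pthread_t   tid;
    int         rc = -1;

    pthread_mutex_init(&req.lock, NULL);
    pthread_cond_init(&req.done_cv, NULL);
    req.done = 0;
    req.query_fn = toy_query_fn;
    req.extra_state = &st;
    st.req = &req;
    st.bytes_xfered = 0;
    st.error_code = -1;         /* overwritten by the worker */

    if (pthread_create(&tid, NULL, toy_sync_thread, &st) == 0) {
        rc = toy_wait(&req, bytes_out);
        pthread_join(tid, NULL);
    }

    pthread_cond_destroy(&req.done_cv);
    pthread_mutex_destroy(&req.lock);
    return rc;
}
```

In this model a nonzero `error_code` left in the state would surface directly as the return value of `toy_wait()`, which mirrors how a query function returning `state->error_code` determines what the waiting call reports.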

> MPICH is going to set those hints to "automatic", but you have overridden
> the defaults?  (it's the right override in most cases, so good! unless you
> did not, in which case we should double check that you are not mixing
> vendor MPICH and your own MPICH)

I am setting the MPI hints using an IOR configuration file. A snippet of
the config file follows:


The collective buffer size is 16MB by default, so the hint in this case is
superfluous (the same goes for cb_nodes=1, since I am using only one node).

>  MPIR_Barrier_intra(169):
>> PSIlogger: Child with rank 5 exited with status 1.
>> mpid_irecv_done(101)...: read from socket failed - request
>> state:recv(pde)done
>> wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488
>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>> PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138)
>> failed
>> MPIR_Wait_impl(77)......:
>> MPIR_Grequest_query(447): user request query function returned error
>> code 1601
> I wonder if your query is doing more than it's supposed to do....

Maybe you can tell me. I basically used Chapter 12 - External Interfaces -
(from the MPI specification published September 21, 2012) as the reference
for the generalized request interface.
I just want to remind you that this problem only shows up when I am
writing more than 4 segments in IOR.

>  Does anyone have a clue of what is going wrong here?
> We're going to need to see some more code, I think...

Below is the function that starts the generalized request for the
synchronization:

    int ADIOI_GEN_IfileSync( ADIO_File fd, int count, MPI_Datatype datatype,
                             int file_ptr_type, ADIO_Offset offset,
                             ADIO_Request *request )
    {
        pthread_t      thread;
        pthread_attr_t attr;
        int            rc;
        ARGS          *args;

        /* copy args */
        args = (ARGS *)ADIOI_Malloc(sizeof(ARGS));
        args->fd = fd;
        args->count = count;
        args->datatype = datatype;
        args->file_ptr_type = file_ptr_type;
        args->offset = offset;
        args->request = request;

        /* start generalized request; args is passed as the extra_state */
        MPI_Grequest_start( &ADIOI_GEN_Ifile_sync_query_fn,
                            &ADIOI_GEN_Ifile_sync_free_fn,
                            &ADIOI_GEN_Ifile_sync_cancel_fn,
                            args, request );

        /* spawn a new thread to handle the request */
        pthread_attr_init( &attr );
        pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
        rc = pthread_create( &thread, &attr, ADIOI_GEN_Ifile_sync_thread,
                             args );
        pthread_attr_destroy( &attr );

        /* --Error handling starts here-- */
        //TODO: do something
        /* --End of Error handling-- */

        return  MPI_SUCCESS;
    }
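
One possible shape for the missing error handling is to check the return codes of the pthread calls instead of discarding `rc`. The following is only a sketch under assumptions: `spawn_detached`, `demo_worker` and `demo_spawn` are hypothetical names that do not exist in ROMIO, and a real implementation would still have to decide how to complete or fail the generalized request when the thread cannot be started.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: start fn(arg) in a detached thread.
 * Returns 0 on success or the pthread error number on failure. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_t      thread;
    pthread_attr_t attr;
    int            rc;

    assert(fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&thread, &attr, fn, arg);

    /* the attribute object is no longer needed, whatever happened above */
    pthread_attr_destroy(&attr);
    return rc;
}

/* Demo only: static state lets the caller observe that the worker ran
 * without joining it (a detached thread cannot be joined). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_cv   = PTHREAD_COND_INITIALIZER;
static int             demo_ran  = 0;

static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_ran = 1;
    pthread_cond_signal(&demo_cv);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Spawn the demo worker and wait until it has signalled. */
int demo_spawn(void)
{
    int rc = spawn_detached(demo_worker, NULL);
    if (rc != 0)
        return rc;      /* the caller decides how to report the failure */

    pthread_mutex_lock(&demo_lock);
    while (!demo_ran)
        pthread_cond_wait(&demo_cv, &demo_lock);
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

Returning the pthread error number keeps the decision with the caller, which could then translate it into an MPI error code rather than returning MPI_SUCCESS unconditionally.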

BTW, I just realized I forgot to free ARGS in the query function. I will
fix that but I don't think it is causing the problem. Am I wrong?

Thanks for your help,

Giuseppe Congiu · Research Engineer II
Seagate Technology, LLC
office: +44 (0)23 9249 6082 · mobile: