[mpich-devel] Fatal error in PMPI_Barrier

Giuseppe Congiu giuseppe.congiu at seagate.com
Thu Dec 4 08:16:31 CST 2014


Hello Rob,

I think I may have just found what the problem is:

In ADIOI_GEN_IfileSync() I pass args to MPI_Grequest_start() so that the query
function can later report how much data has been written to the global file
system and return the error_code. The problem is that I was freeing args in the
pthread routine (which is not shown in the previous email, by the way). This
makes MPI_Wait() blow up whenever the pthread has already completed and args
has already been freed, which is what happens when I write many segments. I
fixed the code and now it seems to work : )

I am sorry to have bothered you with a silly problem that came down to a lack
of attention on my side. Nevertheless, talking to you helped me find the
solution in a few minutes after going around in circles for days.

Best Regards,

Giuseppe

On 4 December 2014 at 13:14, Giuseppe Congiu <giuseppe.congiu at seagate.com>
wrote:

>
> On 3 December 2014 at 21:19, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
>> Sorry not to have responded to you sooner. Between a big conference and
>> US thanksgiving, a lot of us were out of the office for the last few weeks.
>>
>
> Hello Rob, I can totally understand, I have also been busy with other
> things in the last few weeks. Nevertheless, I cannot deny I have been
> looking forward to a reply : )
>
>
>> Welcome.  You're digging into ROMIO and Generalized Requests, so you've
>> picked two fairly obscure areas of MPI to work on!
>>
>> As it happens, I am the world's expert in both ROMIO and Generalized
>> requests.   (the population size for that domain is exceedingly small...)
>
>
> Then you are the blessing that I have been waiting for : )
>
>
>> I think this approach can greatly benefit collective I/O -- I observe on
>> some systems that the communication cost is not hurting two-phase
>> collective I/O but instead it is the synchronization:  if one I/O
>> aggregator is taking longer to do I/O than the others, all N processes pay
>> a price waiting for the laggard.
>>
>
> Yes, this is exactly the point. Communication cost does not seem to be a
> problem; the MPI_Allreduce() at the end of the collective write, on the
> other hand, is a real pain. From what I have observed, writing to local
> devices greatly reduces the variation in I/O response time among aggregators.
>
>
>> before we get too far, we need to talk a bit about generalized requests
>> and their "super-strict" progress model.   Can you share your query, free,
>> and cancel functions?
>
>
> The callback functions are quite simple; they don't do anything special:
>
>     int ADIOI_GEN_Ifile_sync_query_fn( void *extra_state, MPI_Status *status )
>     {
>         ARGS *state = (ARGS*)extra_state;
>
>         MPI_Status_set_cancelled( status, 0 );
>         MPI_Status_set_elements( status, MPI_BYTE, state->bytes_xfered );
>         status->MPI_SOURCE = MPI_UNDEFINED;
>         status->MPI_TAG = MPI_UNDEFINED;
>
>         return state->error_code;
>     }
>
>     int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )
>     {
>         return MPI_SUCCESS;
>     }
>
>     int ADIOI_GEN_Ifile_sync_cancel_fn( void *extra_state, int complete )
>     {
>         return MPI_SUCCESS;
>     }
>
>
>> Which part of this ends up in a generalized request?
>
>
> The function that ends up in a generalized request is the synchronization
> function, ADIOI_GEN_IfileSync(), which is also pretty simple. It just reads
> the locally written file domains and writes them to the global file.
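>
> In pseudo-C, the body of the sync boils down to something like this (a rough
> sketch with made-up helper names, not the actual code):
>
>     /* rough sketch: read_local_domain() and write_global_domain() are
>      * made-up helpers standing in for the real local-read / global-write
>      * calls; only the shape of the loop matters here */
>     for ( i = 0; i < ndomains; i++ ) {
>         len = read_local_domain( fd, i, buf, &glob_offset );
>         write_global_domain( fd, buf, len, glob_offset, &err );
>         args->bytes_xfered += len;
>         if ( err != MPI_SUCCESS ) { args->error_code = err; break; }
>     }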
>
>
>> MPICH is going to set those hints to "automatic", but you have overridden
>> the defaults?  (it's the right override in most cases, so good! unless you
>> did not, in which case we should double check that you are not mixing
>> vendor MPICH and your own MPICH)
>
>
> I am setting the MPI hints through an IOR configuration file; here is a
> snippet of the config file:
>
> IOR_HINT__MPI__cb_buffer_size=16777216
> IOR_HINT__MPI__cb_nodes=1
> IOR_HINT__MPI__romio_cb_read=enable
> IOR_HINT__MPI__romio_cb_write=enable
> IOR_HINT__MPI__local_cache=enable
> IOR_HINT__MPI__local_cache_path=/tmp/ior_tmp_file
> IOR_HINT__MPI__local_cache_flush_flag=flush_immediate
> IOR_HINT__MPI__local_cache_discard_flag=enable
> IOR_HINT__MPI__romio_no_indep_rw=true
>
> The collective buffer size is 16MB by default, so that hint is superfluous
> in this case (the same goes for cb_nodes=1, since I am using only one node).
>
>
>>  MPIR_Barrier_intra(169):
>>> PSIlogger: Child with rank 5 exited with status 1.
>>> mpid_irecv_done(101)...: read from socket failed - request
>>> state:recv(pde)done
>>> wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488
>>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>>> PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138)
>>> failed
>>> MPIR_Wait_impl(77)......:
>>> MPIR_Grequest_query(447): user request query function returned error
>>> code 1601
>>>
>>
>> I wonder if your query is doing more than it's supposed to do....
>
>
> Maybe you can tell me. I basically used Chapter 12 - External Interfaces -
> of the MPI specification (published September 21, 2012) as a reference for
> the generalized request interface.
> I just want to point out that this problem only shows up when I am writing
> more than 4 segments in IOR.
>
>
>>  Does anyone have a clue of what is going wrong here?
>>>
>>
>> We're going to need to see some more code, I think...
>
>
> Here is the function that starts the generalized request to synchronize
> the file:
>
>     int ADIOI_GEN_IfileSync( ADIO_File fd, int count, MPI_Datatype datatype,
>                              int file_ptr_type, ADIO_Offset offset,
>                              ADIO_Request *request )
>     {
>         pthread_t      thread;
>         pthread_attr_t attr;
>         int            rc;
>         ARGS          *args;
>
>         /* copy args */
>         args = (ARGS *)ADIOI_Malloc(sizeof(ARGS));
>         args->fd = fd;
>         args->count = count;
>         args->datatype = datatype;
>         args->file_ptr_type = file_ptr_type;
>         args->offset = offset;
>         args->request = request;
>
>         /* start generalized request */
>         MPI_Grequest_start( &ADIOI_GEN_Ifile_sync_query_fn,
>                             &ADIOI_GEN_Ifile_sync_free_fn,
>                             &ADIOI_GEN_Ifile_sync_cancel_fn,
>                             args,
>                             request );
>
>         /* spawn a new thread to handle the request */
>         pthread_attr_init( &attr );
>         pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
>         rc = pthread_create( &thread, &attr, ADIOI_GEN_Ifile_sync_thread, args );
>         pthread_attr_destroy( &attr );
>
>         /* --Error handling starts here-- */
>         //TODO: do something
>         /* --End of Error handling-- */
>
>         return  MPI_SUCCESS;
>     }
>
> BTW, I just realized I forgot to free ARGS in the query function. I will
> fix that but I don't think it is causing the problem. Am I wrong?
>
> Thanks for your help,
>



-- 
Giuseppe Congiu *·* Research Engineer II
Seagate Technology, LLC
office: +44 (0)23 9249 6082 *·* mobile:
www.seagate.com