[mpich-devel] Fatal error in PMPI_Barrier
Rob Latham
robl at mcs.anl.gov
Thu Dec 4 08:51:43 CST 2014
On 12/04/2014 08:16 AM, Giuseppe Congiu wrote:
> Hello Rob,
>
> I think I may have just found what the problem is:
>
> In ADIOI_GEN_IfileSync() I am passing args to MPI_Grequest_start() so
> that the query function can later report how much data has been written
> to the global file system and also return the error_code. The problem is
> that I am freeing args in the pthread routine (which was not shown in
> the previous email, btw). This causes MPI_Wait() to catch fire when the
> pthread has already completed and args has been freed (which happens
> when I have many segments). I fixed the code and now it seems to work : )
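>
> In other words, extra_state has to outlive the request: MPI_Wait()
> still calls the query function after the thread has completed, so the
> natural place to release args is the free callback. A minimal sketch of
> the corrected ownership (simplified, not the exact ROMIO code):
>
> int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )
> {
>     /* the free callback is the last consumer of extra_state, so
>      * release it here instead of in the worker thread */
>     ADIOI_Free( extra_state );
>     return MPI_SUCCESS;
> }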
>
> I am sorry to have bothered you with a silly problem that came down to
> a lack of attention on my side. Nevertheless, talking to you helped me
> find the solution in a few minutes after wandering around for days.
My old office mate had a little rubber duck he'd keep on his desk. When
faced with a programming problem, he'd talk to "debugging ducky" and
sort out the state.
I'm glad I could be your debugging ducky. I am always glad to come
across folks brave enough to hack on the ROMIO code. Please keep me
updated with your research.
==rob
>
> Best Regards,
>
> Giuseppe
>
> On 4 December 2014 at 13:14, Giuseppe Congiu
> <giuseppe.congiu at seagate.com <mailto:giuseppe.congiu at seagate.com>> wrote:
>
>
> On 3 December 2014 at 21:19, Rob Latham <robl at mcs.anl.gov
> <mailto:robl at mcs.anl.gov>> wrote:
>
> Sorry not to have responded to you sooner. Between a big
> conference and US thanksgiving, a lot of us were out of the
> office for the last few weeks.
>
>
> Hello Rob, I totally understand; I have also been busy with other
> things in the last few weeks. Nevertheless, I cannot deny I have
> been looking forward to a reply : )
>
> Welcome. You're digging into ROMIO and Generalized Requests, so
> you've picked two fairly obscure areas of MPI to work on!
>
> As it happens, I am the world's expert in both ROMIO and
> Generalized requests. (the population size for that domain is
> exceedingly small...)
>
>
> Then you are the blessing that I have been waiting for : )
>
> I think this approach can greatly benefit collective I/O -- I
> observe on some systems that the communication cost is not
> hurting two-phase collective I/O but instead it is the
> synchronization: if one I/O aggregator is taking longer to do
> I/O than the others, all N processes pay a price waiting for the
> laggard.
>
>
> Yes, this is exactly the point. Communication cost does not seem to be
> a problem; the MPI_Allreduce() at the end of the collective write, on
> the other hand, is a real pain. From what I have observed, writing to
> local devices greatly reduces the I/O response time variation among
> aggregators.
>
> before we get too far, we need to talk a bit about generalized
> requests and their "super-strict" progress model. Can you
> share your query, free, and cancel functions?
>
>
> The callback functions are quite simple; they don't do anything
> special:
>
> int ADIOI_GEN_Ifile_sync_query_fn( void *extra_state, MPI_Status *status )
> {
>     ARGS *state = (ARGS*)extra_state;
>
>     MPI_Status_set_cancelled( status, 0 );
>     MPI_Status_set_elements( status, MPI_BYTE, state->bytes_xfered );
>     status->MPI_SOURCE = MPI_UNDEFINED;
>     status->MPI_TAG = MPI_UNDEFINED;
>     return state->error_code;
> }
>
> int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )
> {
>     return MPI_SUCCESS;
> }
>
> int ADIOI_GEN_Ifile_sync_cancel_fn( void *extra_state, int complete )
> {
>     return MPI_SUCCESS;
> }
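>
> On the caller's side, the values set in the query function surface
> through the usual request-completion calls; for example (sketch, where
> sync_req is a hypothetical request returned by the sync routine):
>
> MPI_Status status;
> int        bytes;
>
> MPI_Wait( &sync_req, &status );                /* runs the query function */
> MPI_Get_elements( &status, MPI_BYTE, &bytes ); /* state->bytes_xfered */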
>
> Which part of this ends up in a generalized request?
>
>
> The function that ends up in a generalized request is the
> synchronization function, ADIOI_GEN_IfileSync(), which is also
> pretty simple. It just reads the locally written file domains and
> writes them to the global file.
>
> MPICH is going to set those hints to "automatic", but you have
> overridden the defaults? (it's the right override in most
> cases, so good! unless you did not, in which case we should
> double check that you are not mixing vendor MPICH and your own
> MPICH)
>
> I am setting the MPI hints using an IOR configuration file; here is a
> snippet of the config file:
>
> IOR_HINT__MPI__cb_buffer_size=16777216
> IOR_HINT__MPI__cb_nodes=1
> IOR_HINT__MPI__romio_cb_read=enable
> IOR_HINT__MPI__romio_cb_write=enable
> IOR_HINT__MPI__local_cache=enable
> IOR_HINT__MPI__local_cache_path=/tmp/ior_tmp_file
> IOR_HINT__MPI__local_cache_flush_flag=flush_immediate
> IOR_HINT__MPI__local_cache_discard_flag=enable
> IOR_HINT__MPI__romio_no_indep_rw=true
>
> The collective buffer size is 16MB by default, so that hint is
> superfluous in this case (the same goes for cb_nodes=1, since I am
> using only one node).
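>
> For what it's worth, the same hints can also be set directly on an
> MPI_Info object passed to MPI_File_open(); a rough equivalent of the
> snippet above (the local_cache* keys are the custom hints added by my
> patches, the others are standard ROMIO hints, and "datafile" is just a
> placeholder name):
>
> MPI_Info info;
> MPI_Info_create( &info );
> MPI_Info_set( info, "cb_buffer_size", "16777216" );
> MPI_Info_set( info, "cb_nodes", "1" );
> MPI_Info_set( info, "romio_cb_write", "enable" );
> MPI_Info_set( info, "local_cache", "enable" );
> MPI_Info_set( info, "local_cache_path", "/tmp/ior_tmp_file" );
>
> MPI_File fh;
> MPI_File_open( MPI_COMM_WORLD, "datafile",
>                MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh );
> MPI_Info_free( &info );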
>
> MPIR_Barrier_intra(169):
> PSIlogger: Child with rank 5 exited with status 1.
> mpid_irecv_done(101)...: read from socket failed - request state:recv(pde)done
> wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488
> Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138) failed
> MPIR_Wait_impl(77)......:
> MPIR_Grequest_query(447): user request query function returned error code 1601
>
>
> I wonder if your query is doing more than it's supposed to do....
>
>
> Maybe you can tell me. I basically used Chapter 12 - External
> Interfaces - of the MPI specification (published September 21, 2012)
> as my reference for the generalized request interface.
> I just want to point out that this problem only shows up when I am
> writing more than 4 segments in IOR.
>
> Does anyone have a clue of what is going wrong here?
>
>
> We're going to need to see some more code, I think...
>
>
> Here is the function that starts the generalized request to
> synchronize the file:
>
> int ADIOI_GEN_IfileSync( ADIO_File fd, int count, MPI_Datatype datatype,
>                          int file_ptr_type, ADIO_Offset offset,
>                          ADIO_Request *request )
> {
>     pthread_t thread;
>     pthread_attr_t attr;
>     int rc;
>     ARGS *args;
>
>     /* copy args */
>     args = (ARGS *)ADIOI_Malloc(sizeof(ARGS));
>     args->fd = fd;
>     args->count = count;
>     args->datatype = datatype;
>     args->file_ptr_type = file_ptr_type;
>     args->offset = offset;
>     args->request = request;
>
>     /* start generalized request */
>     MPI_Grequest_start( &ADIOI_GEN_Ifile_sync_query_fn,
>                         &ADIOI_GEN_Ifile_sync_free_fn,
>                         &ADIOI_GEN_Ifile_sync_cancel_fn,
>                         args,
>                         request );
>
>     /* spawn a new thread to handle the request */
>     pthread_attr_init( &attr );
>     pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
>     rc = pthread_create( &thread, &attr,
>                          ADIOI_GEN_Ifile_sync_thread, args );
>     pthread_attr_destroy( &attr );
>
>     /* --Error handling starts here-- */
>     //TODO: do something
>     /* --End of Error handling-- */
>     return MPI_SUCCESS;
> }
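>
> The thread routine it spawns is essentially the synchronous sync code
> plus a completion call: it reads back the locally written file domains,
> writes them to the global file, records the outcome in args, and then
> completes the generalized request. Roughly (sketch; error handling and
> the actual read/write loop are omitted):
>
> void *ADIOI_GEN_Ifile_sync_thread( void *ptr )
> {
>     ARGS *args = (ARGS*)ptr;
>
>     /* ... read the local file domains and write them to the global
>      *     file, accumulating args->bytes_xfered ... */
>
>     args->error_code = MPI_SUCCESS;
>
>     /* mark the generalized request complete; a later MPI_Wait() will
>      * then call the query function to fill in the status */
>     MPI_Grequest_complete( *args->request );
>
>     return NULL;
> }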
>
> BTW, I just realized I forgot to free ARGS in the query function. I
> will fix that but I don't think it is causing the problem. Am I wrong?
>
> Thanks for your help,
>
> --
> Giuseppe Congiu *·* Research Engineer II
> Seagate Technology, LLC
> office: +44 (0)23 9249 6082
> *·* mobile:
> www.seagate.com <http://www.seagate.com>
>
>
>
>
> --
> Giuseppe Congiu *·* Research Engineer II
> Seagate Technology, LLC
> office: +44 (0)23 9249 6082 *·* mobile:
> www.seagate.com <http://www.seagate.com>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA