[mpich-devel] Fatal error in PMPI_Barrier

Rob Latham robl at mcs.anl.gov
Wed Dec 3 15:19:46 CST 2014



On 11/18/2014 10:32 AM, Giuseppe Congiu wrote:
> Hello,
>
> I am new to the MPI forum and I hope you can help me with my problem.

Sorry not to have responded to you sooner. Between a big conference and 
US Thanksgiving, a lot of us were out of the office for the last few weeks.

Welcome.  You're digging into ROMIO and Generalized Requests, so you've 
picked two fairly obscure areas of MPI to work on!

As it happens, I am the world's expert in both ROMIO and Generalized 
Requests.  (The population size for that domain is exceedingly small...)

> I am currently working on ROMIO developing a modification that enables
> the usage of locally attached SSDs as persistent cache in a HPC cluster.
> This extension is supposed to improve collective write operations
> performance, making them complete faster on local storage devices
> (scaling linearly with the number of aggregator nodes) and afterwards
> asynchronously flushing the local data to global file system while the
> application can progress with computation.

I think this approach can greatly benefit collective I/O -- on some 
systems I observe that it is not the communication cost that hurts 
two-phase collective I/O but rather the synchronization: if one I/O 
aggregator takes longer to do its I/O than the others, all N processes 
pay a price waiting for the laggard.

> The extension uses the MPI Generalized request interface to provide
> non-blocking flushing of local data to global file system.

Before we get too far, we need to talk a bit about generalized requests 
and their "super-strict" progress model.  Can you share your query, 
free, and cancel functions?
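
For reference, here's the shape I'd expect (the names below are 
illustrative only, not taken from your patch): the request is created 
with MPI_Grequest_start, which hands MPI the three callbacks, and the 
worker doing the flush later calls MPI_Grequest_complete.

    #include <mpi.h>

    /* callback prototypes required by MPI_Grequest_start */
    static int query_fn(void *extra_state, MPI_Status *status);
    static int free_fn(void *extra_state);
    static int cancel_fn(void *extra_state, int complete);

    /* hypothetical bookkeeping for one local-cache flush */
    struct flush_state {
        MPI_Request greq;    /* generalized request handed back to the caller */
        MPI_Offset  nbytes;  /* bytes the flush moves to the global file system */
    };

    static int start_flush_request(struct flush_state *st)
    {
        /* the flush worker must eventually call MPI_Grequest_complete(st->greq) */
        return MPI_Grequest_start(query_fn, free_fn, cancel_fn, st, &st->greq);
    }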

> I have added
> an ADIO_WriteContigLocal() function to the "common" ADIO driver that can
> be triggered by a dedicated MPI hint (e.g. local_cache = enable) set by
> the application. The flushing of local data is started immediately after
> ADIOI_GEN_WriteStridedColl() returns to MPI_File_write_all(). The
> non-blocking syncing function starts a new pthread (I know this is not
> portable but the cluster I am using supports pthreads :-) ) and assigns
> it a file domain, which will be then read from the local file and
> written to the global file system.

Which part of this ends up in a generalized request?
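
(In other words, I'd expect the pthread to be the thing that eventually 
calls MPI_Grequest_complete.  A rough sketch of that shape -- 
flush_file_domain() is a made-up name standing in for your local-cache 
flush, and flush_state is the bookkeeping struct sketched above:)

    #include <pthread.h>
    #include <mpi.h>

    /* from the sketch above; flush_file_domain() is hypothetical */
    struct flush_state { MPI_Request greq; MPI_Offset nbytes; };
    void flush_file_domain(struct flush_state *st);

    /* worker thread: read the file domain back from the local SSD, write it
     * to the global file system, then mark the generalized request complete
     * so that an MPI_Wait on it can return */
    static void *flush_thread(void *arg)
    {
        struct flush_state *st = (struct flush_state *) arg;

        flush_file_domain(st);            /* local SSD -> global file system */

        MPI_Grequest_complete(st->greq);
        return NULL;
    }

Note that having the helper thread call MPI_Grequest_complete while the 
main thread sits in MPI_Wait assumes the library was initialized with 
MPI_THREAD_MULTIPLE.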

> When the file is finally closed, the implementation (still accordingly
> to the specific MPI hint) invokes an ADIO_CloseLocal() function which
> MPI_Wait(s) for all the pending generalized requests that have been
> started earlier. Finally, in ADIO_Close() I have a MPI_Barrier() to make
> sure that when MPI_File_close() returns all the file domains are
> consistent with the file in the global file system.
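
(For reference, assuming the outstanding flush requests are kept in an 
array, that close path boils down to something like this sketch:)

    #include <mpi.h>

    /* sketch: drain every pending flush, then synchronize so that
     * MPI_File_close returns only after all file domains are globally
     * visible.  nreqs and pending_reqs are whatever bookkeeping
     * ADIO_CloseLocal keeps. */
    static void drain_pending_flushes(int nreqs, MPI_Request pending_reqs[],
                                      MPI_Comm file_comm)
    {
        MPI_Waitall(nreqs, pending_reqs, MPI_STATUSES_IGNORE);
        MPI_Barrier(file_comm);
    }
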
>
> I am testing my code using IOR with the following configuration on a
> single node with 8 cores:
>
> $ mpiexec -np 8 ~/benchmarks/IOR/IOR -r -w -a MPIIO -b 1m -t 1m -c -s 5
> -U ~/romio-totalview-dbg/ior.hints -o /work/ior11/testFile -H -V
>
> Strangely, when I increase the number of segments written by each
> process (i.e. -s) above 4 or 5, IOR aborts with errors. Here is
> an example:
>
> Command line used: ~/benchmarks/IOR/IOR -r -w -a MPIIO -b 1m -t 1m -c -s
> 5 -U ~/romio-totalview-dbg/ior.hints -o /work/ior11/testFile -H -V
> Machine: Linux xxxx128
>
> Summary:
> api                = MPIIO (version=3, subversion=0)
> test filename      = /work/ior11/testFile
> access             = single-shared-file
> ordering in a file = sequential offsets
> ordering inter file= no tasks offsets
> clients            = 8 (8 per node)
> repetitions        = 1
> xfersize           = 1 MiB
> blocksize          = 1 MiB
> aggregate filesize = 40 MiB
>
>
> hints passed to MPI_File_open() {
> cb_buffer_size = 16777216
> cb_nodes = 1
> romio_cb_read = enable
> romio_cb_write = enable
> local_cache = enable
> local_cache_path = /tmp/ior_tmp_file
> local_cache_flush_flag = flush_immediate
> local_cache_discard_flag = enable
> romio_no_indep_rw = true
> }
>
> hints returned from opened file {
> cb_buffer_size = 16777216
> romio_cb_read = enable
> romio_cb_write = enable

MPICH would normally set those hints to "automatic", but you have 
overridden the defaults?  (It's the right override in most cases, so 
good! Unless you did not override them, in which case we should 
double-check that you are not mixing a vendor MPICH with your own 
MPICH build.)
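
(For anyone following along: if you were setting these yourself rather 
than through IOR's hints file, the override looks like the sketch below; 
local_cache is of course one of the new hints from the patch under 
discussion.)

    #include <mpi.h>

    /* sketch: open the test file with the hints from the dump above set explicitly */
    static int open_with_hints(MPI_Comm comm, const char *path, MPI_File *fh)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable");   /* override the "automatic" default */
        MPI_Info_set(info, "romio_cb_read",  "enable");
        MPI_Info_set(info, "cb_buffer_size", "16777216");
        MPI_Info_set(info, "local_cache",    "enable");   /* new hint added by the patch */

        int rc = MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_RDWR,
                               info, fh);
        MPI_Info_free(&info);
        return rc;
    }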


> MPIR_Barrier_intra(169):
> PSIlogger: Child with rank 5 exited with status 1.
> mpid_irecv_done(101)...: read from socket failed - request
> state:recv(pde)done
> wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488
> Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138)
> failed
> MPIR_Wait_impl(77)......:
> MPIR_Grequest_query(447): user request query function returned error
> code 1601

I wonder if your query function is doing more than it's supposed to do...
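
A query function is only supposed to fill in the status object and 
report success; something as minimal as this (building on the 
flush_state sketch above) is all that's expected:

    static int query_fn(void *extra_state, MPI_Status *status)
    {
        struct flush_state *st = (struct flush_state *) extra_state;

        /* report how much data this request moved and that it was not cancelled */
        MPI_Status_set_elements(status, MPI_BYTE, (int) st->nbytes);
        MPI_Status_set_cancelled(status, 0);
        status->MPI_SOURCE = MPI_UNDEFINED;
        status->MPI_TAG    = MPI_UNDEFINED;

        return MPI_SUCCESS;   /* anything else comes back as the MPIR_Grequest_query error above */
    }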



> Does anyone have a clue of what is going wrong here?

We're going to need to see some more code, I think...

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

