[mpich-devel] Fatal error in PMPI_Barrier

Giuseppe Congiu giuseppe.congiu at seagate.com
Tue Nov 18 10:32:02 CST 2014


Hello,

I am new to the MPI forum and I hope you can help me with my problem.

I am currently working on ROMIO, developing a modification that enables the
use of locally attached SSDs as a persistent cache in an HPC cluster. The
extension is meant to improve the performance of collective write
operations: they complete faster on the local storage devices (scaling
linearly with the number of aggregator nodes), and the local data is
afterwards flushed asynchronously to the global file system while the
application progresses with its computation.

The extension uses the MPI generalized request interface to provide
non-blocking flushing of the local data to the global file system. I have
added an ADIO_WriteContigLocal() function to the "common" ADIO driver that
is triggered by a dedicated MPI hint (e.g. local_cache = enable) set by the
application. The flushing of local data is started immediately after
ADIOI_GEN_WriteStridedColl() returns to MPI_File_write_all(). The
non-blocking syncing function starts a new pthread (I know this is not
portable, but the cluster I am using supports pthreads :-) ) and assigns it
a file domain, which is then read from the local file and written to the
global file system.
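Roughly, the mechanism looks like the following simplified sketch (not my
actual code; flush_arg_t, flush_thread() and start_local_flush() are just
illustrative names):

/*
 * Simplified sketch: a generalized request whose completion is driven by
 * a pthread that flushes one file domain from the local SSD copy to the
 * global file system.
 */
#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    MPI_Request greq;          /* generalized request completed by the thread */
    int         io_errno;      /* filled in by the flush thread               */
    int         bytes_flushed; /* filled in by the flush thread               */
    /* ... plus the file-domain description: local path, global fh,
     *     offset, length ... */
} flush_arg_t;

/* Called by MPI_Wait/MPI_Test to build a status for the request. */
static int query_fn(void *extra_state, MPI_Status *status)
{
    MPI_Status_set_elements(status, MPI_BYTE, 0);
    MPI_Status_set_cancelled(status, 0);
    status->MPI_SOURCE = MPI_UNDEFINED;
    status->MPI_TAG    = MPI_UNDEFINED;
    return MPI_SUCCESS;        /* must be an MPI error code, see below */
}

static int free_fn(void *extra_state)                 { free(extra_state); return MPI_SUCCESS; }
static int cancel_fn(void *extra_state, int complete) { return MPI_SUCCESS; }

static void *flush_thread(void *v)
{
    flush_arg_t *arg = v;
    /* ... read the file domain back from the local file and write it to
     *     the global file system, recording arg->io_errno ... */
    MPI_Grequest_complete(arg->greq);   /* request becomes waitable */
    return NULL;
}

/* Started right after ADIOI_GEN_WriteStridedColl() returns. */
static MPI_Request start_local_flush(flush_arg_t *arg)
{
    pthread_t tid;
    MPI_Grequest_start(query_fn, free_fn, cancel_fn, arg, &arg->greq);
    pthread_create(&tid, NULL, flush_thread, arg);
    pthread_detach(tid);
    return arg->greq;
}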

When the file is finally closed, the implementation (again according to
the specific MPI hint) invokes an ADIO_CloseLocal() function, which
MPI_Wait()s for all the pending generalized requests started earlier.
Finally, in ADIO_Close() I have an MPI_Barrier() to make sure that when
MPI_File_close() returns, all the file domains are consistent with the
file in the global file system.
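The close path then reduces to something like this (again a simplified
sketch; 'pending' and 'n_pending' are just illustrative bookkeeping):

/* Simplified sketch of the close path described above. */
static void local_cache_close(MPI_Request *pending, int n_pending,
                              MPI_Comm comm)
{
    /* Block until every background flush thread has completed its
     * generalized request. */
    for (int i = 0; i < n_pending; i++)
        MPI_Wait(&pending[i], MPI_STATUS_IGNORE);

    /* All ranks synchronize so that MPI_File_close() returns only once
     * every file domain is consistent with the global file system. */
    MPI_Barrier(comm);
}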

I am testing my code using IOR with the following configuration on a single
node with 8 cores:

$ mpiexec -np 8 ~/benchmarks/IOR/IOR -r -w -a MPIIO -b 1m -t 1m -c -s 5 -U
~/romio-totalview-dbg/ior.hints -o /work/ior11/testFile -H -V

Strangely, when I increase the number of segments written by each process
(i.e. -s) above 4 or 5, IOR aborts with errors. Here is an example:

Command line used: ~/benchmarks/IOR/IOR -r -w -a MPIIO -b 1m -t 1m -c -s 5
-U ~/romio-totalview-dbg/ior.hints -o /work/ior11/testFile -H -V
Machine: Linux xxxx128

Summary:
api                = MPIIO (version=3, subversion=0)
test filename      = /work/ior11/testFile
access             = single-shared-file
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients            = 8 (8 per node)
repetitions        = 1
xfersize           = 1 MiB
blocksize          = 1 MiB
aggregate filesize = 40 MiB


hints passed to MPI_File_open() {
cb_buffer_size = 16777216
cb_nodes = 1
romio_cb_read = enable
romio_cb_write = enable
local_cache = enable
local_cache_path = /tmp/ior_tmp_file
local_cache_flush_flag = flush_immediate
local_cache_discard_flag = enable
romio_no_indep_rw = true
}

hints returned from opened file {
cb_buffer_size = 16777216
romio_cb_read = enable
romio_cb_write = enable
cb_nodes = 1
romio_no_indep_rw = true
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
local_cache = enable
local_cache_flush_flag = flush_immediate
local_cache_discard_flag = enable
local_cache_path = /tmp/ior_tmp_file
cb_config_list = *:1
}
Assertion failed in file ~/mpich2/src/mpi/coll/helper_fns.c at line 491:
status->MPI_TAG == recvtag
PSIlogger: Child with rank 1 exited with status 1.
(null)
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000001) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000001) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
PSIlogger: Child with rank 4 exited with status 1.
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000001) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000001) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
PSIlogger: Child with rank 3 exited with status 1.
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000000) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
MPIR_Barrier_intra(169):
PSIlogger: Child with rank 5 exited with status 1.
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488
Fatal error in PMPI_Wait: Other MPI error, error stack:
PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138)
failed
MPIR_Wait_impl(77)......:
MPIR_Grequest_query(447): user request query function returned error code
1601
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(428)......: MPI_Barrier(comm=0x84000000) failed
MPIR_Barrier_impl(335).: Failure during collective
MPIR_Barrier_impl(328).:
MPIR_Barrier(292)......:
MPIR_Barrier_intra(169):
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
MPIR_Barrier_intra(169):
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done
MPIR_Barrier_intra(169):
PSIlogger: Child with rank 2 exited with status 1.
PSIlogger: Child with rank 7 exited with status 1.
PSIlogger: Child with rank 6 exited with status 1.
PSIlogger: Child with rank 0 exited with status 1.
mpid_irecv_done(101)...: read from socket failed - request
state:recv(pde)done

I am using MPICH version 3.1 with multi-threading support enabled.
Nevertheless, there seems to be some problem with MPI_Barrier() and
MPI_Wait(). The MPI_Wait() error in particular seems related to the bug
reported at http://trac.mpich.org/projects/mpich/ticket/1849,
which I patched accordingly, but without any luck I would say.
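The "user request query function returned error code 1601" message suggests
that the query callback of one of my generalized requests hands back a value
that MPICH does not recognize as a valid MPI error code. A defensive variant
of the query function from the sketch above might look like this (io_errno
and bytes_flushed being the assumed fields of the hypothetical flush_arg_t):

/* Sketch only: map whatever the flush thread recorded onto a real MPI
 * error code before handing it back to MPI_Wait().  Returning anything
 * else from here is one way to end up with "user request query function
 * returned error code ..." failures. */
static int query_fn(void *extra_state, MPI_Status *status)
{
    flush_arg_t *arg = extra_state;   /* hypothetical struct from the earlier sketch */

    MPI_Status_set_elements(status, MPI_BYTE, arg->bytes_flushed);
    MPI_Status_set_cancelled(status, 0);
    status->MPI_SOURCE = MPI_UNDEFINED;
    status->MPI_TAG    = MPI_UNDEFINED;

    return (arg->io_errno == 0) ? MPI_SUCCESS : MPI_ERR_IO;
}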

To describe further what is happening here: when I increase the number of
segments, IOR increases the number of calls to MPI_File_write_all() (made
one after the other). For every collective write, my extension keeps track
of the new file domain and starts a new thread to handle it. Eventually
there may be as many pthreads active at the same time as there are
segments, which should not be a problem for small numbers like 5 or 10.
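If a stand-alone reproducer helps, the access pattern essentially boils
down to the sketch below. It is only an approximation of the IOR run shown
above: it uses MPI_File_write_at_all() with explicit offsets instead of a
file view plus MPI_File_write_all(), and the hint keys and paths simply
mirror the ones from the earlier run.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define NSEG  5                 /* -s 5          */
#define XFER  (1 << 20)         /* -t 1m / -b 1m */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "local_cache", "enable");           /* custom hint */
    MPI_Info_set(info, "local_cache_path", "/tmp/ior_tmp_file");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/work/ior11/testFile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    char *buf = malloc(XFER);
    memset(buf, rank, XFER);

    for (int seg = 0; seg < NSEG; seg++) {
        /* segmented layout: seg * (nprocs * blocksize) + rank * blocksize */
        MPI_Offset off = (MPI_Offset)seg * nprocs * XFER
                       + (MPI_Offset)rank * XFER;
        /* each collective write starts one background flush thread */
        MPI_File_write_at_all(fh, off, buf, XFER, MPI_BYTE, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);        /* waits on the generalized requests */

    free(buf);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}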

Does anyone have a clue of what is going wrong here?

Many Thanks,

-- 
Giuseppe Congiu · Research Engineer II
Seagate Technology, LLC
office: +44 (0)23 9249 6082
www.seagate.com