[mpich-discuss] Parallel test hanging with mpich on rhel7
Orion Poplawski
orion at cora.nwra.com
Tue Feb 4 15:23:11 CST 2014
On 02/03/2014 07:29 PM, Orion Poplawski wrote:
> We're starting to do the Fedora EPEL builds for EPEL7. I'm building
> hdf5 1.8.12 with:
>
> mpich-3.0.4-4.el7.x86_64
> gcc-4.8.2-3.el7.x86_64
>
> The following test hangs here:
>
> $ mpirun -np 4 ./t_cache
> ===================================
> Parallel metadata cache tests
> mpi_size = 4
> express_test = 1
> ===================================
> *** Hint ***
> You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
> different directory or to add file type prefix. E.g.,
> HDF5_PARAPREFIX=pfs:/PFS/user/me
> export HDF5_PARAPREFIX
> *** End of Hint ***
> 0:setup_rand(): seed = 138071.
> 3:setup_rand(): seed = 149196.
> 2:setup_rand(): seed = 160135.
> 1:setup_rand(): seed = 180134.
> Testing server smoke check
> PASSED
> Testing smoke check #1 -- process 0 only md write strategy
This turned out to be triggered by running oversubscribed. Reducing the
number of MPI processes to less than or equal to the number of cores made
this particular hang go away.
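For illustration, a minimal sketch of capping the rank count at the core
count before launching the test. The helper name cap_ranks is hypothetical,
and using nproc (GNU coreutils) to get the core count is an assumption about
the build host. This is not what the hdf5 test harness itself does. The
script only prints the command it would run.

```shell
# Hypothetical helper: cap a requested MPI rank count at a given core count.
cap_ranks() {
    _req=$1
    _max=$2
    if [ "$_req" -gt "$_max" ]; then
        echo "$_max"
    else
        echo "$_req"
    fi
}

# Assumption: nproc is available and reports the cores usable by this
# process; fall back to 1 if it is not.
cores=$(nproc 2>/dev/null || echo 1)

# 4 is the rank count from the failing run above.
np=$(cap_ranks 4 "$cores")

# Print rather than execute, so the sketch runs without MPI installed.
echo "would run: mpirun -np $np ./t_cache"
```

On a builder with fewer than 4 cores this lowers -np accordingly. On larger
machines it leaves the requested count of 4 unchanged.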
However, I'm still seeing a hang on our Fedora builders in a different test:
make[4]: Entering directory `/builddir/build/BUILD/hdf5-1.8.12/mpich/testpar'
============================
Testing t_mpi
Full log:
http://koji.fedoraproject.org/koji/getfile?taskID=6492001&name=build.log
Unfortunately, I'm not able to reproduce this on my own machines, so I'm at
a loss here.
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion at nwra.com
Boulder, CO 80301 http://www.nwra.com