[mpich-discuss] Parallel test hanging with mpich on rhel7

Orion Poplawski orion at cora.nwra.com
Tue Feb 4 15:23:11 CST 2014


On 02/03/2014 07:29 PM, Orion Poplawski wrote:
> We're starting to do the Fedora EPEL builds for EPEL7.  I'm building
> hdf5 1.8.12 with:
> 
> mpich-3.0.4-4.el7.x86_64
> gcc-4.8.2-3.el7.x86_64
> 
> The following test hangs here:
> 
> $ mpirun -np 4 ./t_cache
> ===================================
> Parallel metadata cache tests
>         mpi_size     = 4
>         express_test = 1
> ===================================
> *** Hint ***
> You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
> different directory or to add file type prefix. E.g.,
>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>    export HDF5_PARAPREFIX
> *** End of Hint ***
> 0:setup_rand(): seed = 138071.
> 3:setup_rand(): seed = 149196.
> 2:setup_rand(): seed = 160135.
> 1:setup_rand(): seed = 180134.
> Testing server smoke check
> PASSED
> Testing smoke check #1 -- process 0 only md write strategy

This turned out to be triggered by running oversubscribed.  Reducing the
number of MPI processes to at most the number of cores made this
particular hang go away.
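
For anyone who wants to avoid the oversubscribed case in a build script, here
is a minimal sketch of capping the process count at the core count.  The
variable names (requested, cores, np) are made up for illustration, and it
assumes getconf reports the online core count on the build host; it is not
what the hdf5 test harness itself does.

```shell
#!/bin/sh
# Sketch: cap the MPI process count at the number of online cores
# so the test does not run oversubscribed.
requested=4                          # the -np value used in the hanging run
cores=$(getconf _NPROCESSORS_ONLN)   # online cores; assumed available on the host
if [ "$requested" -gt "$cores" ]; then
    np=$cores
else
    np=$requested
fi
echo "using $np of $requested requested processes ($cores cores online)"
# Uncomment where mpich and the test binary are available:
# mpirun -np "$np" ./t_cache
```

On a builder with fewer than 4 cores this runs the test with fewer ranks
rather than oversubscribing; whether the test is still meaningful with fewer
ranks is a separate question.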

However, I'm still seeing a hang on our Fedora builders in a different test:


make[4]: Entering directory `/builddir/build/BUILD/hdf5-1.8.12/mpich/testpar'
============================
Testing  t_mpi

Full log:
http://koji.fedoraproject.org/koji/getfile?taskID=6492001&name=build.log

Unfortunately, I'm not able to reproduce this on my own machines, so I'm at a
loss here.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com


