[mpich-discuss] Parallel test hanging with mpich on rhel7

Orion Poplawski orion at cora.nwra.com
Mon Feb 3 21:05:30 CST 2014


On 02/03/2014 07:29 PM, Orion Poplawski wrote:
> We're starting to do the Fedora EPEL builds for EPEL7.  I'm building
> hdf5 1.8.12 with:
> 
> mpich-3.0.4-4.el7.x86_64
> gcc-4.8.2-3.el7.x86_64
> 
> The following test hangs here:
> 
> $ mpirun -np 4 ./t_cache
> ===================================
> Parallel metadata cache tests
>         mpi_size     = 4
>         express_test = 1
> ===================================
> *** Hint ***
> You can use environment variable HDF5_PARAPREFIX to run parallel test files in a
> different directory or to add file type prefix. E.g.,
>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>    export HDF5_PARAPREFIX
> *** End of Hint ***
> 0:setup_rand(): seed = 138071.
> 3:setup_rand(): seed = 149196.
> 2:setup_rand(): seed = 160135.
> 1:setup_rand(): seed = 180134.
> Testing server smoke check
> PASSED
> Testing smoke check #1 -- process 0 only md write strategy
> 
> Runs fine with openmpi.  Not seeing problems either in Fedora, which has
> similar versions, so really not sure what is at issue, or how to debug
> further.

I reran the hanging test with these set in the environment:

MPICH_DBG_LEVEL=VERBOSE
MPICH_DBG=FILE
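
For anyone trying to reproduce this, here is a rough sketch of the invocation. It assumes the two variables are exported in the shell before mpirun is started and that the MPICH build has debug logging compiled in (otherwise they have no effect). The `timeout` wrapper and the 300 second limit are additions for convenience, not part of what was run above.

```shell
# Sketch only: assumes an MPICH build with debug logging enabled.
export MPICH_DBG_LEVEL=VERBOSE   # verbosity level, as given above
export MPICH_DBG=FILE            # send debug output to log files

# Run only when the launcher and the test binary are actually present.
# `timeout` (coreutils) is an addition so that a hang does not block the shell;
# the 300 second limit is an arbitrary choice.
if command -v mpirun >/dev/null 2>&1 && [ -x ./t_cache ]; then
    timeout 300 mpirun -np 4 ./t_cache || echo "t_cache exited with status $?"
fi
```

If the test hangs, `timeout` ends the run with status 124 and the debug logs written so far can be collected.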

The resulting log files are here:

http://www.cora.nwra.com/~orion/hdf5-mpich-debug.tar.gz

Hope that helps.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com


