[mpich-devel] Hydra fails to launch hello world on 1 proc

Jeff Hammond jhammond at alcf.anl.gov
Wed Apr 10 22:30:13 CDT 2013


"mpiexec -n 1 hostname" hangs with Hydra but runs fine with OpenMPI.

I'm having issues with MPI+Pthreads code with both MPICH and OpenMPI
that indicate that my system is not behaving as others do, but I'll
need to do a lot more work to figure out what the important
differences are.

Jeff

On Wed, Apr 10, 2013 at 9:24 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> Does it run non-MPI jobs OK?  ("mpiexec -n 1 hostname", for example)
>
> Is this Linux or a Mac?
>
> If you temporarily disable the firewall, does that make a difference?
>
> -Dave
>
> On Apr 10, 2013, at 6:34 PM CDT, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>
>> Hi,
>>
>> I'm using the latest Git trunk build of MPICH with GCC and am unable
>> to run a 'hello, world' program using mpiexec.
>>
>> Any clues what the problem is?  I have not seen this problem before,
>> but this is a newly refreshed laptop.  The firewall is active but I
>> would not have expected Hydra to need to go through the firewall to
>> launch a serial job.
>>
>> If there's something wrong with my setup, it would be nice if Hydra
>> would issue a warning/error instead of hanging.
>>
>> Thanks,
>>
>> Jeff
>>
>> I compiled MPICH like this:
>> ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran --enable-threads
>> --enable-f77 --enable-fc --enable-g --with-pm=hydra --enable-rpath
>> --disable-static --enable-shared --with-device=ch3:nemesis
>> --prefix=/home/jeff/eclipse/MPICH/git/install-gcc
>>
>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> mpicc -show
>> gcc -I/home/jeff/eclipse/MPICH/git/install-gcc/include
>> -L/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -Wl,-rpath
>> -Wl,/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -lmpich -lopa -lmpl
>> -lrt -lpthread
>>
>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> make
>> mpicc -g -O0 -Wall -std=gnu99 -DDEBUG -c hello.c -o hello.o
>> mpicc -g -O0 -Wall -std=gnu99 safemalloc.o hello.o -lm -o hello.x
>> rm hello.o
>>
>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt>
>> mpiexec -n 1 ./hello.x
>> ^C[mpiexec at goldstone.mcs.anl.gov] Sending Ctrl-C to processes as requested
>> [mpiexec at goldstone.mcs.anl.gov] Press Ctrl-C again to force abort
>> [mpiexec at goldstone.mcs.anl.gov] HYDU_sock_write
>> (../../../../src/pm/hydra/utils/sock/sock.c:291): write error (Bad
>> file descriptor)
>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmcd_pmiserv_send_signal
>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:170): unable to
>> write data to proxy
>> [mpiexec at goldstone.mcs.anl.gov] ui_cmd_cb
>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:79): unable to
>> send signal downstream
>> [mpiexec at goldstone.mcs.anl.gov] HYDT_dmxu_poll_wait_for_event
>> (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback
>> returned error status
>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmci_wait_for_completion
>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error
>> waiting for event
>> [mpiexec at goldstone.mcs.anl.gov] main
>> (../../../../src/pm/hydra/ui/mpich/mpiexec.c:331): process manager
>> error waiting for completion
>>
>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> ./hello.x
>> <no errors>
>>
>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> cat hello.c
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> #include <mpi.h>
>>
>> int main(int argc, char * argv[])
>> {
>>    int provided;
>>
>>    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>    if (provided!=MPI_THREAD_MULTIPLE)
>>        MPI_Abort(MPI_COMM_WORLD, 1);
>>
>>    int rank, size;
>>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>    MPI_Finalize();
>>
>>    return 0;
>> }
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

