[mpich-devel] Hydra fails to launch hello world on 1 proc
Dave Goodell
goodell at mcs.anl.gov
Wed Apr 10 22:33:47 CDT 2013
Linux or Mac? If it's Linux, an "strace -f -ff -o strace.out mpiexec -n 1 hostname" might shed some light on the situation.
-Dave
On Apr 10, 2013, at 10:30 PM CDT, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
> "mpiexec -n 1 hostname" hangs with Hydra but runs fine with OpenMPI.
>
> I'm having issues with MPI+Pthreads code under both MPICH and OpenMPI
> that indicate my system is not behaving as others do, but I'll
> need to do a lot more work to figure out what the important
> differences are.
>
> Jeff
>
> On Wed, Apr 10, 2013 at 9:24 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
>> Does it run non-MPI jobs OK? ("mpiexec -n 1 hostname", for example)
>>
>> Is this Linux or a Mac?
>>
>> If you temporarily disable the firewall, does that make a difference?
>>
>> -Dave
>>
>> On Apr 10, 2013, at 6:34 PM CDT, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>
>>> Hi,
>>>
>>> I'm using the latest Git trunk build of MPICH with GCC and am unable
>>> to run a 'hello, world' program using mpiexec.
>>>
>>> Any clues what the problem is? I have not seen this problem before,
>>> but this is a newly refreshed laptop. The firewall is active, but I
>>> would not have expected Hydra to need to go through the firewall to
>>> launch a serial job.
>>>
>>> If there's something wrong with my setup, it would be nice if Hydra
>>> would issue a warning or error instead of hanging.
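The assumption that a serial launch needs no network traffic may not hold. As far as we understand Hydra's design (this is an assumption, not something confirmed in this thread), mpiexec and its proxy communicate over a TCP socket even when a single process is started on the local host, so a firewall rule or a hostname that resolves to a stale address could plausibly produce a hang like the one below. The following C sketch tests that path in isolation. The function `tcp_self_connect_check` is hypothetical and is not part of Hydra or MPICH. It resolves a host name, listens on an ephemeral port at that address, and connects to it using a non-blocking `connect()` plus `poll()`, so a dropped connection attempt yields an error after a timeout rather than an indefinite hang. That is the fail-fast behavior asked for above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic helper, not part of Hydra or MPICH.
 * Resolves `host`, listens on an ephemeral TCP port at that address and
 * connects to it. Returns 0 if the connection completes within
 * `timeout_ms` milliseconds, -1 otherwise; it never blocks forever. */
int tcp_self_connect_check(const char *host, int timeout_ms)
{
    struct addrinfo hints, *ai = NULL;
    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;
    struct pollfd pfd;
    int lfd = -1, cfd = -1, flags, err = 0, rc = -1;
    socklen_t elen = sizeof err;

    assert(host != NULL && timeout_ms >= 0);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "0", &hints, &ai) != 0)
        return -1;                       /* name does not resolve */

    /* Listener on an ephemeral port; bind() fails (EADDRNOTAVAIL) if the
     * name resolves to an address not configured on this machine. */
    lfd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (lfd < 0 || bind(lfd, ai->ai_addr, ai->ai_addrlen) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
        goto out;

    /* Non-blocking connect, so a silently dropped SYN times out. */
    cfd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (cfd < 0 || (flags = fcntl(cfd, F_GETFL, 0)) < 0 ||
        fcntl(cfd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, len) != 0) {
        if (errno != EINPROGRESS)
            goto out;                    /* e.g. ECONNREFUSED */
        pfd.fd = cfd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        if (poll(&pfd, 1, timeout_ms) != 1)
            goto out;                    /* timeout or poll() error */
        if (getsockopt(cfd, SOL_SOCKET, SO_ERROR, &err, &elen) != 0 ||
            err != 0)
            goto out;                    /* connect() failed asynchronously */
    }
    rc = 0;                              /* handshake completed via backlog */
out:
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    freeaddrinfo(ai);
    return rc;
}
```

To probe the address a launcher would most likely use, pass the machine's own name (for example, the string filled in by `gethostname()`). If that returns -1 while `"127.0.0.1"` returns 0, the likely culprit is name resolution or firewall configuration rather than MPICH itself. Whether a given firewall rule is caught depends on how that rule treats traffic a host sends to its own addresses.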
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> I compiled MPICH like this:
>>> ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran --enable-threads
>>> --enable-f77 --enable-fc --enable-g --with-pm=hydra --enable-rpath
>>> --disable-static --enable-shared --with-device=ch3:nemesis
>>> --prefix=/home/jeff/eclipse/MPICH/git/install-gcc
>>>
>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> mpicc -show
>>> gcc -I/home/jeff/eclipse/MPICH/git/install-gcc/include
>>> -L/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -Wl,-rpath
>>> -Wl,/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -lmpich -lopa -lmpl
>>> -lrt -lpthread
>>>
>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> make
>>> mpicc -g -O0 -Wall -std=gnu99 -DDEBUG -c hello.c -o hello.o
>>> mpicc -g -O0 -Wall -std=gnu99 safemalloc.o hello.o -lm -o hello.x
>>> rm hello.o
>>>
>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> mpiexec -n 1 ./hello.x
>>> ^C[mpiexec at goldstone.mcs.anl.gov] Sending Ctrl-C to processes as requested
>>> [mpiexec at goldstone.mcs.anl.gov] Press Ctrl-C again to force abort
>>> [mpiexec at goldstone.mcs.anl.gov] HYDU_sock_write (../../../../src/pm/hydra/utils/sock/sock.c:291): write error (Bad file descriptor)
>>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmcd_pmiserv_send_signal (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
>>> [mpiexec at goldstone.mcs.anl.gov] ui_cmd_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
>>> [mpiexec at goldstone.mcs.anl.gov] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
>>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>>> [mpiexec at goldstone.mcs.anl.gov] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:331): process manager error waiting for completion
>>>
>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> ./hello.x
>>> <no errors>
>>>
>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> cat hello.c
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char * argv[])
>>> {
>>>     int provided;
>>>
>>>     /* Request full thread support; abort if the library cannot provide it. */
>>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>>     if (provided != MPI_THREAD_MULTIPLE)
>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>
>>>     /* Query rank and size; nothing is printed, so a successful run is silent. */
>>>     int rank, size;
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     MPI_Finalize();
>>>
>>>     return 0;
>>> }
>>>
>>>
>>> --
>>> Jeff Hammond
>>> Argonne Leadership Computing Facility
>>> University of Chicago Computation Institute
>>> jhammond at alcf.anl.gov / (630) 252-5381
>>> http://www.linkedin.com/in/jeffhammond
>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>
>