[mpich-devel] Hydra fails to launch hello world on 1 proc

Jeff Hammond jhammond at alcf.anl.gov
Wed Apr 10 22:45:38 CDT 2013


Attached is the strace output.  Not sure if it helps at all; it makes no
sense to me.
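
The tail of each attached file should show the syscall that process was
blocked in when the run was interrupted, e.g.:

  tail -n 20 strace.out.11831
  tail -n 20 strace.out.11832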

Jeff

On Wed, Apr 10, 2013 at 9:33 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> Linux or Mac?  If it's Linux, an "strace -f -ff -o strace.out mpiexec -n 1 hostname" might shed some light on the situation.
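> ("-f" follows child processes and, with "-o", "-ff" writes each process's
> trace to its own strace.out.<pid> file, so the proxy that mpiexec launches
> gets a separate trace.)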
>
> -Dave
>
> On Apr 10, 2013, at 10:30 PM CDT, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>
>> "mpiexec -n 1 hostname" hangs with Hydra but runs fine with OpenMPI.
>>
>> I'm having issues with MPI+Pthreads code under both MPICH and OpenMPI
>> that indicate my system is not behaving the way others' systems do, but
>> I'll need to do a lot more work to figure out what the important
>> differences are.
>>
>> Jeff
>>
>> On Wed, Apr 10, 2013 at 9:24 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
>>> Does it run non-MPI jobs OK?  ("mpiexec -n 1 hostname", for example)
>>>
>>> Is this Linux or a Mac?
>>>
>>> If you temporarily disable the firewall, does that make a difference?
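>>> (How to do that depends on the distribution; inspecting the active rules
>>> first is usually enough to see whether local traffic is being filtered,
>>> e.g.
>>>
>>>   sudo iptables -L -n
>>>
>>> and then the distribution's own tool, such as SuSEfirewall2, ufw, or
>>> firewalld, can switch it off temporarily.)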
>>>
>>> -Dave
>>>
>>> On Apr 10, 2013, at 6:34 PM CDT, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using the latest Git trunk build of MPICH with GCC and am unable
>>>> to run a 'hello, world' program using mpiexec.
>>>>
>>>> Any clues as to what the problem is?  I have not seen this problem
>>>> before, but this is a newly refreshed laptop.  The firewall is active,
>>>> but I would not have expected Hydra to need to go through the firewall
>>>> to launch a serial job.
>>>>
>>>> If there's something wrong with my setup, it would be nice if Hydra
>>>> issued a warning or error instead of hanging.
>>>>
>>>> Thanks,
>>>>
>>>> Jeff
>>>>
>>>> I compiled MPICH like this:
>>>> ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran --enable-threads
>>>> --enable-f77 --enable-fc --enable-g --with-pm=hydra --enable-rpath
>>>> --disable-static --enable-shared --with-device=ch3:nemesis
>>>> --prefix=/home/jeff/eclipse/MPICH/git/install-gcc
>>>>
>>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> mpicc -show
>>>> gcc -I/home/jeff/eclipse/MPICH/git/install-gcc/include
>>>> -L/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -Wl,-rpath
>>>> -Wl,/home/jeff/eclipse/MPICH/git/install-gcc/lib64 -lmpich -lopa -lmpl
>>>> -lrt -lpthread
>>>>
>>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> make
>>>> mpicc -g -O0 -Wall -std=gnu99 -DDEBUG -c hello.c -o hello.o
>>>> mpicc -g -O0 -Wall -std=gnu99 safemalloc.o hello.o -lm -o hello.x
>>>> rm hello.o
>>>>
>>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt>
>>>> mpiexec -n 1 ./hello.x
>>>> ^C[mpiexec at goldstone.mcs.anl.gov] Sending Ctrl-C to processes as requested
>>>> [mpiexec at goldstone.mcs.anl.gov] Press Ctrl-C again to force abort
>>>> [mpiexec at goldstone.mcs.anl.gov] HYDU_sock_write
>>>> (../../../../src/pm/hydra/utils/sock/sock.c:291): write error (Bad
>>>> file descriptor)
>>>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmcd_pmiserv_send_signal
>>>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:170): unable to
>>>> write data to proxy
>>>> [mpiexec at goldstone.mcs.anl.gov] ui_cmd_cb
>>>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:79): unable to
>>>> send signal downstream
>>>> [mpiexec at goldstone.mcs.anl.gov] HYDT_dmxu_poll_wait_for_event
>>>> (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback
>>>> returned error status
>>>> [mpiexec at goldstone.mcs.anl.gov] HYD_pmci_wait_for_completion
>>>> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error
>>>> waiting for event
>>>> [mpiexec at goldstone.mcs.anl.gov] main
>>>> (../../../../src/pm/hydra/ui/mpich/mpiexec.c:331): process manager
>>>> error waiting for completion
>>>>
>>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> ./hello.x
>>>> <no errors>
>>>>
>>>> jeff at goldstone:~/eclipse/OSPRI/mcs.svn/trunk/tests/devices/mpi-pt> cat hello.c
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char * argv[])
>>>> {
>>>>   int provided;
>>>>
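>>>>   /* request full thread support; abort if it is unavailable */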
>>>>   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>>>   if (provided != MPI_THREAD_MULTIPLE)
>>>>       MPI_Abort(MPI_COMM_WORLD, 1);
>>>>
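>>>>   /* rank and size are queried but otherwise unused; this test only
>>>>    * exercises MPI startup and teardown */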
>>>>   int rank, size;
>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>
>>>>   MPI_Finalize();
>>>>
>>>>   return 0;
>>>> }
>>>>
>>>>
>>>> --
>>>> Jeff Hammond
>>>> Argonne Leadership Computing Facility
>>>> University of Chicago Computation Institute
>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>> http://www.linkedin.com/in/jeffhammond
>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace.out.11831
Type: application/octet-stream
Size: 38312 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20130410/e54092e5/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace.out.11832
Type: application/octet-stream
Size: 38014 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20130410/e54092e5/attachment-0003.obj>

