[mpich-discuss] runtime error for mpich-master-v3.2-247-g1aec69b70951 with Solaris Sparc

Siegmar Gross siegmar.gross at informatik.hs-fulda.de
Thu Apr 21 08:21:16 CDT 2016


Hi,

I have built mpich-master-v3.2-247-g1aec69b70951 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64)
with gcc-5.1.0 and Sun C 5.13. With both compilers I get the following
errors when I run small programs that spawn processes on two Sparc
machines. Everything works fine if I use Linux and/or Solaris x86_64.
"mpiexec" is aliased to 'mpiexec -genvnone'. The failure is not
deterministic: running the same command several times produces
different errors, and sometimes it even works as expected, as you can
see below.
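
For reference, the spawn logic of my test program boils down to the
following minimal sketch (the full source is in the attached
spawn_master.c; the constant and the printf formats here are only
reconstructed from the output below):

/* spawn_master.c -- minimal sketch of the failing test; only the
 * calls visible in the output and in the error stack below are
 * reconstructed here. */
#include <stdio.h>
#include "mpi.h"

#define NUM_SLAVES 4

int main(int argc, char *argv[])
{
  int rank, world_size, local_size, remote_size, len;
  char host[MPI_MAX_PROCESSOR_NAME];
  MPI_Comm COMM_CHILD_PROCESSES;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(host, &len);
  printf("\nParent process %d running on %s\n"
         "   I create %d slave processes\n\n", rank, host, NUM_SLAVES);

  /* The call that fails intermittently on the two Sparc machines
   * (cmd="spawn_slave", argv=0, maxprocs=4, ... in the error stack). */
  MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                 MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                 &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);

  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  /* On an intercommunicator MPI_Comm_size returns the local group size. */
  MPI_Comm_size(COMM_CHILD_PROCESSES, &local_size);
  MPI_Comm_remote_size(COMM_CHILD_PROCESSES, &remote_size);
  printf("Parent process %d: tasks in MPI_COMM_WORLD:                    %d\n"
         "                   tasks in COMM_CHILD_PROCESSES local group:  %d\n"
         "                   tasks in COMM_CHILD_PROCESSES remote group: %d\n\n",
         rank, world_size, local_size, remote_size);

  MPI_Comm_free(&COMM_CHILD_PROCESSES);
  MPI_Finalize();
  return 0;
}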


tyr spawn 119 mpichversion
MPICH Version:          3.2
MPICH Release date:     Tue Apr 19 00:00:44 CDT 2016
MPICH Device:           ch3:nemesis
MPICH configure:        --prefix=/usr/local/mpich-3.2.1_64_gcc 
--libdir=/usr/local/mpich-3.2.1_64_gcc/lib64 
--includedir=/usr/local/mpich-3.2.1_64_gcc/include64 CC=gcc CXX=g++ 
F77=gfortran FC=gfortran CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 
LDFLAGS=-m64 -L/usr/lib/sparcv9 -Wl,-rpath -Wl,/usr/lib/sparcv9 
--enable-fortran=yes --enable-cxx --enable-romio --enable-debuginfo 
--enable-smpcoll --enable-threads=multiple --with-thread-package=posix 
--enable-shared
MPICH CC:       gcc -m64   -O2
MPICH CXX:      g++ -m64  -O2
MPICH F77:      gfortran -m64  -O2
MPICH FC:       gfortran -m64  -O2


tyr spawn 120 mpiexec -np 1 --host tyr,tyr,tyr,ruester,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(144)...................: MPI_Comm_spawn(cmd="spawn_slave", 
argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, 
intercomm=ffffffff7fffdf58, errors=0) failed
MPIDI_Comm_spawn_multiple(274)........:
MPID_Comm_accept(153).................:
MPIDI_Comm_accept(1039)...............:
MPIDU_Complete_posted_with_error(1137): Process failed

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3182 RUNNING AT tyr
=   EXIT CODE: 10
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
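
(Side note: to get at the raw error code instead of the fatal abort,
the spawn can be wrapped as in the following sketch; it uses only
standard MPI error-handling calls, but I have not verified whether it
avoids the abort in this situation:)

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  MPI_Comm intercomm;
  int err, eclass, len;
  char msg[MPI_MAX_ERROR_STRING];

  MPI_Init(&argc, &argv);
  /* Return error codes to the caller instead of aborting the job. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  err = MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
  if (err != MPI_SUCCESS) {
    /* Report the error class and message that MPICH attaches. */
    MPI_Error_class(err, &eclass);
    MPI_Error_string(err, msg, &len);
    fprintf(stderr, "MPI_Comm_spawn: class %d: %s\n", eclass, msg);
  } else {
    MPI_Comm_free(&intercomm);
  }
  MPI_Finalize();
  return 0;
}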



tyr spawn 121 mpiexec -np 1 --host tyr,tyr,tyr,ruester,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:                    1
                   tasks in COMM_CHILD_PROCESSES local group:  1
                   tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 3 of 4 running on ruester.informatik.hs-fulda.de
Slave process 2 of 4 running on ruester.informatik.hs-fulda.de
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
Slave process 0 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 0: argv[0]: spawn_slave
Slave process 1 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 1: argv[0]: spawn_slave
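
The slave program is equally simple; roughly (again only a sketch
reconstructed from the output above, not the real spawn_slave.c):

/* spawn_slave.c -- minimal sketch; each slave reports its rank, the
 * host it runs on, and its command line. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  int rank, size, len, i;
  char host[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Get_processor_name(host, &len);
  printf("Slave process %d of %d running on %s\n", rank, size, host);
  for (i = 0; i < argc; ++i)
    printf("spawn_slave %d: argv[%d]: %s\n", rank, i, argv[i]);
  MPI_Finalize();
  return 0;
}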



tyr spawn 122 mpiexec -np 1 --host tyr,tyr,tyr,ruester,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes



tyr spawn 123 mpiexec -np 1 --host tyr,tyr,tyr,ruester,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(144)...................: MPI_Comm_spawn(cmd="spawn_slave", 
argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, 
intercomm=ffffffff7fffdf58, errors=0) failed
MPIDI_Comm_spawn_multiple(274)........:
MPID_Comm_accept(153).................:
MPIDI_Comm_accept(1039)...............:
MPIDU_Complete_posted_with_error(1137): Process failed
tyr spawn 124 mpiexec -np 1 --host tyr,tyr,tyr,ruester,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(144)...................: MPI_Comm_spawn(cmd="spawn_slave", 
argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, 
intercomm=ffffffff7fffdf58, errors=0) failed
MPIDI_Comm_spawn_multiple(274)........:
MPID_Comm_accept(153).................:
MPIDI_Comm_accept(1039)...............:
MPIDU_Complete_posted_with_error(1137): Process failed

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3466 RUNNING AT tyr
=   EXIT CODE: 10
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@tyr.informatik.hs-fulda.de] HYD_pmcd_pmip_control_cmd_cb 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/pm/pmiserv/pmip_cb.c:886): 
assert (!closed) failed
[proxy:0:0@tyr.informatik.hs-fulda.de[proxy:1:1@ruester.informatik.hs-fulda.de] 
HYD_pmcd_pmip_control_cmd_cb 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/pm/pmiserv/pmip_cb.c] 
HYDT_dmxu_poll_wait_for_event 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/tools/demux/demux_poll.c:77): 
callback returned error status
[proxy:0:0@tyr.informatik.hs-fulda.de] main 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/pm/pmiserv/pmip.c:202): 
demux engine error waiting for event
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bscu_wait_for_completion 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:76): 
one of the processes terminated badly; aborting
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bsci_wait_for_completion 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/tools/bootstrap/src/bsci_wait.c:23): 
launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] HYD_pmci_wait_for_completion 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:218): 
launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] main 
(../../../../mpich-master-v3.2-247-g1aec69b70951/src/pm/hydra/ui/mpich/mpiexec.c:340): 
process manager error waiting for completion
tyr spawn 125


I would be grateful if somebody could fix the problem. Thank you very
much in advance for any help.


Kind regards

Siegmar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spawn_master.c
Type: text/x-csrc
Size: 6372 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20160421/ee83f788/attachment.bin>