[mpich-discuss] mpich-master-v3.2-331-g44fd9c5f39e5: runtime error spawning processes
Siegmar Gross
siegmar.gross at informatik.hs-fulda.de
Wed Jun 8 03:14:47 CDT 2016
Hi,
I have built mpich-master-v3.2-331-g44fd9c5f39e5 on my machines (Solaris
10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
gcc-5.1.0 and Sun C 5.13. Most of the time, spawning processes on a
Sparc machine fails, and the error message differs from run to run.
tyr spawn 107 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 4 slave processes
Parent process 0: tasks in MPI_COMM_WORLD: 1
tasks in COMM_CHILD_PROCESSES local group: 1
tasks in COMM_CHILD_PROCESSES remote group: 4
Slave process 0 of 4 running on tyr.informatik.hs-fulda.de
Slave process 1 of 4 running on tyr.informatik.hs-fulda.de
Slave process 2 of 4 running on tyr.informatik.hs-fulda.de
Slave process 3 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
tyr spawn 108 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 4 slave processes
Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(141)...................: MPI_Comm_spawn(cmd="spawn_slave",
argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD,
intercomm=ffffffff7fffdf58, errors=0) failed
MPIDI_Comm_spawn_multiple(274)........:
MPID_Comm_accept(153).................:
MPIDI_Comm_accept(1039)...............:
MPIDU_Complete_posted_with_error(1137): Process failed
tyr spawn 109
tyr spawn 111 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 4 slave processes
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 3322 RUNNING AT tyr
= EXIT CODE: 10
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@tyr.informatik.hs-fulda.de] HYD_pmcd_pmip_control_cmd_cb
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmip_cb.c:883):
assert (!closed) failed
[proxy:0:0@tyr.informatik.hs-fulda.de] HYDT_dmxu_poll_wait_for_event
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/demux/demux_poll.c:77):
callback returned error status
[proxy:0:0@tyr.informatik.hs-fulda.de] main
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmip.c:202):
demux engine error waiting for event
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bscu_wait_for_completion
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:76):
one of the processes terminated badly; aborting
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bsci_wait_for_completion
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/bootstrap/src/bsci_wait.c:23):
launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] HYD_pmci_wait_for_completion
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:218):
launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] main
(../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/ui/mpich/mpiexec.c:340):
process manager error waiting for completion
tyr spawn 112
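Since the failure is intermittent, it may help to rerun the same command
several times and count how often each exit code occurs. The following is
only a minimal sketch, assuming Python 3 is available on the machine; the
helper name tally_exit_codes is hypothetical and not part of MPICH.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs):
    """Run `cmd` (a list of arguments) `runs` times and count exit codes."""
    counts = Counter()
    for _ in range(runs):
        # Capture stdout and stderr together so failing runs do not
        # flood the terminal; only the exit code is recorded here.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        counts[result.returncode] += 1
    return counts
```

For example, tally_exit_codes(["mpiexec", "-np", "1", "--host",
"tyr,tyr,tyr,tyr,tyr", "spawn_master"], 20) would return a Counter that maps
each exit code to the number of runs that ended with it. The run count of 20
is an arbitrary choice.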
I would be grateful if somebody could fix the problem. Please let me
know if you need more information. Thank you very much in advance for
any help.
Kind regards
Siegmar