[mpich-discuss] mpich-master-v3.2-331-g44fd9c5f39e5: runtime error spawning processes

Siegmar Gross siegmar.gross at informatik.hs-fulda.de
Wed Jun 8 03:14:47 CDT 2016


Hi,

I have built mpich-master-v3.2-331-g44fd9c5f39e5 on my machines (Solaris
10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
gcc-5.1.0 and Sun C 5.13. On a Sparc machine, spawning processes fails
most of the time, and the error messages differ from run to run.


tyr spawn 107 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:                    1
                   tasks in COMM_CHILD_PROCESSES local group:  1
                   tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on tyr.informatik.hs-fulda.de
Slave process 1 of 4 running on tyr.informatik.hs-fulda.de
Slave process 2 of 4 running on tyr.informatik.hs-fulda.de
Slave process 3 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave



tyr spawn 108 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(141)...................: MPI_Comm_spawn(cmd="spawn_slave", argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, intercomm=ffffffff7fffdf58, errors=0) failed
MPIDI_Comm_spawn_multiple(274)........:
MPID_Comm_accept(153).................:
MPIDI_Comm_accept(1039)...............:
MPIDU_Complete_posted_with_error(1137): Process failed
tyr spawn 109



tyr spawn 111 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3322 RUNNING AT tyr
=   EXIT CODE: 10
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@tyr.informatik.hs-fulda.de] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:0@tyr.informatik.hs-fulda.de] HYDT_dmxu_poll_wait_for_event (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@tyr.informatik.hs-fulda.de] main (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmip.c:202): demux engine error waiting for event
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bscu_wait_for_completion (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@tyr.informatik.hs-fulda.de] HYDT_bsci_wait_for_completion (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] HYD_pmci_wait_for_completion (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@tyr.informatik.hs-fulda.de] main (../../../../mpich-master-v3.2-331-g44fd9c5f39e5/src/pm/hydra/ui/mpich/mpiexec.c:340): process manager error waiting for completion
tyr spawn 112
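
Because the failure is intermittent and takes different forms (a clean
run at prompt 107, an MPI_Comm_spawn error stack at prompt 108, and
EXIT CODE 10 at prompt 111), it may help to count how often each outcome
occurs over many runs. The following is only a minimal sketch and not
part of the original test programs: the function name tally_exit_codes
and its parameters runs and timeout are hypothetical names chosen for
illustration, and it assumes Python 3.5 or newer for subprocess.run.

```python
import subprocess
from collections import Counter


def tally_exit_codes(cmd, runs=20, timeout=60):
    """Run `cmd` (a list of arguments) `runs` times and count exit codes.

    Hypothetical helper for illustration only. A run that exceeds
    `timeout` seconds is killed and counted under the key "timeout".
    """
    codes = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,  # output is not needed for counting
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            codes[result.returncode] += 1
        except subprocess.TimeoutExpired:
            codes["timeout"] += 1
    return codes
```

For example, calling tally_exit_codes(["mpiexec", "-np", "1", "--host",
"tyr,tyr,tyr,tyr,tyr", "spawn_master"], runs=50) would rerun the command
from the transcripts above 50 times and return a mapping from exit code
to number of occurrences. Whether a clean run reports exit code 0 depends
on the test program and launcher, so that should be checked first.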




I would be grateful if somebody could fix the problem. Please let me
know if you need more information. Thank you very much in advance for
any help.


Kind regards

Siegmar

