A short update to my previous email:

doing MPI_Comm_spawn without the ssh launcher, i.e. using the native Torque TM interface, does not work either and returns the same error.

So in general, I cannot do MPI_Comm_spawn under the Torque environment, regardless of the launcher that I use. Without the Torque environment, it works fine.
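In case it helps, the spawn part of the test is essentially the following (a stripped-down sketch only: the self-spawn of argv[0], the maxprocs of 2, and the root-0 collective call over MPI_COMM_WORLD are reconstructed from the output quoted below, not copied verbatim from ./example):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char host[256];
        MPI_Comm parent, intercomm;

        /* Printed before MPI_Init, matching the "starting up" lines in the log. */
        gethostname(host, sizeof(host));
        printf("[pid %d] starting up on host %s!\n", (int)getpid(), host);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("%d completed MPI_Init\n", rank);

        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) {
            /* Parent side: all ranks take part in one collective spawn
               (root 0); the spawned children are the processes that come
               up on the second node in the output below. The child binary
               (argv[0]) and maxprocs=2 are placeholders. */
            printf("Parent [pid %d] about to spawn!\n", (int)getpid());
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                           MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

The two spawned children end up on the second allocated node (grsacc02 in the log), and that is where the my_node_id assertion fires.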
Best,
Suraj


On Feb 24, 2014, at 6:37 PM, Suraj Prabhakaran wrote:

> Hello,
>
> I am trying to do an MPI_Comm_spawn under the Torque environment. But I want to use the ssh launcher instead of the TM interface, since there seems to be a problem with launching a large number of processes through the TM interface. A plain mpiexec across nodes works fine, but when I use MPI_Comm_spawn to spawn remotely on another node, I get the following error.
>
> mpiexec -launcher ssh -np 8 ./example
>
> [pid 12026] starting up on host grsacc01!
> [pid 12027] starting up on host grsacc01!
> [pid 12028] starting up on host grsacc01!
> [pid 12021] starting up on host grsacc01!
> [pid 12023] starting up on host grsacc01!
> [pid 12025] starting up on host grsacc01!
> [pid 12022] starting up on host grsacc01!
> [pid 12024] starting up on host grsacc01!
> 0 completed MPI_Init
> 4 completed MPI_Init
> Parent [pid 12025] about to spawn!
> 5 completed MPI_Init
> 3 completed MPI_Init
> Parent [pid 12024] about to spawn!
> Parent [pid 12026] about to spawn!
> 2 completed MPI_Init
> Parent [pid 12023] about to spawn!
> 7 completed MPI_Init
> Parent [pid 12028] about to spawn!
> Parent [pid 12021] about to spawn!
> 6 completed MPI_Init
> Parent [pid 12027] about to spawn!
> 1 completed MPI_Init
> Parent [pid 12022] about to spawn!
> [pid 20535] starting up on host grsacc02!
> [pid 20536] starting up on host grsacc02!
> Assertion failed in file src/util/procmap/local_proc.c at line 112: my_node_id <= max_node_id
> internal ABORT - process 0
> Assertion failed in file src/util/procmap/local_proc.c at line 112: my_node_id <= max_node_id
> internal ABORT - process 1
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 20535 RUNNING AT grsacc02
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:0@grsacc01] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0@grsacc01] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:0@grsacc01] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec@grsacc01] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec@grsacc01] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@grsacc01] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
> [mpiexec@grsacc01] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
>
> Is there a way to get rid of this?
>
> Best,
> Suraj