<div dir="ltr">Hi, <br><br>Whenever I launch more than 400 processes I get the following error. With fewer than 400 processes everything works fine. <br><div><code> <br></code></div><div><code>I am running my code on two machines: hesh and Ubuntu (the machine from which I launch the jobs)<br>
</code></div><div><code><br></code></div><div><pre><code>root@ubuntu:/home# mpiexec -f hosts.cfg -n 500 ./hello<br><br>=====================================================================================<br>= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= EXIT CODE: 139<br>= CLEANING UP REMAINING PROCESSES<br>= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>=====================================================================================<br>[proxy:0:0@hesh] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed<br>
[proxy:0:0@hesh] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status<br>[proxy:0:0@hesh] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event<br>[mpiexec@ubuntu] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting<br>
[mpiexec@ubuntu] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion<br>[mpiexec@ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion<br>
[mpiexec@ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion<br><br><br></code></pre><pre><code>Sometimes I also get the following error:<br><br>root@ubuntu:/home# mpiexec -f hosts.cfg -n 400 ./hello<br>
[proxy:0:1@ubuntu] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed<br>[proxy:0:1@ubuntu] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response<br>[proxy:0:1@ubuntu] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error<br>
[proxy:0:1@ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status<br>[proxy:0:1@ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event<br>[mpiexec@ubuntu] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed<br>
[mpiexec@ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status<br>[mpiexec@ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event<br>
[mpiexec@ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion<br><br> <br></code></pre><pre><code>I also raised the </code>open file limit on both machines with <code>ulimit -n 2048, but the result is the same. </code></pre>
The code is a simple one:<br><br><pre><code>/* C Example */
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
    int rank, size;
    int buffer_length = MPI_MAX_PROCESSOR_NAME;
    char hostname[buffer_length];
    MPI_Init (&argc, &argv); /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
    MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */
    printf( "Hello world from process %d running on %s of %d\n", rank, hostname, size );
    MPI_Finalize();
    return 0;
}<br><br></code></pre><pre><code>With 10000 processes the program becomes unresponsive and I have to close the terminal to exit. <br></code></pre>How do I solve this issue? <br></div></div>