[mpich-discuss] MPI Hello world program fails for large no. of processes

heshsham basit hf.basit1 at gmail.com
Mon Oct 28 09:16:09 CDT 2013


Hi,

Whenever I launch more than 400 processes I get the following error, but
with fewer than 400 processes everything works perfectly fine.

I am running my code on two machines: hesh and ubuntu (the one from which I
am launching the jobs).

root at ubuntu:/home# mpiexec -f hosts.cfg -n 500 ./hello

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at hesh] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
[proxy:0:0 at hesh] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at hesh] main (./pm/pmiserv/pmip.c:226): demux engine error
waiting for event
[mpiexec at ubuntu] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:70): one of the processes
terminated badly; aborting
[mpiexec at ubuntu] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error
waiting for completion
[mpiexec at ubuntu] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for
completion
[mpiexec at ubuntu] main (./ui/mpich/mpiexec.c:405): process manager
error waiting for completion


Sometimes I also get the following error:

root at ubuntu:/home# mpiexec -f hosts.cfg -n 400 ./hello
[proxy:0:1 at ubuntu] send_cmd_downstream
(./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
[proxy:0:1 at ubuntu] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error
sending PMI response
[proxy:0:1 at ubuntu] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler
returned error
[proxy:0:1 at ubuntu] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error
waiting for event
[mpiexec at ubuntu] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert
(!closed) failed
[mpiexec at ubuntu] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at ubuntu] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at ubuntu] main (./ui/mpich/mpiexec.c:405): process manager
error waiting for completion



I also raised the open file limit on both PCs using ulimit -n 2048, but
the result is still the same.

The code is a simple hello-world program:

/* C Example */
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

int main (int argc, char* argv[])
{
  int rank, size;
  int buffer_length = MPI_MAX_PROCESSOR_NAME;
  char hostname[buffer_length];

  MPI_Init (&argc, &argv);      /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);        /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);        /* get number of processes */

  MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */

  printf( "Hello world from process %d running on %s of %d\n", rank,
          hostname, size );
  MPI_Finalize();
  return 0;
}

With 10000 processes the program becomes unresponsive and I have to close the
terminal to exit.

How do I solve this issue?