[mpich-discuss] MPI Hello world program fails for large no. of processes

Jeff Hammond jeff.science at gmail.com
Mon Oct 28 12:37:47 CDT 2013


Running anything that isn't system administration as root is a really
bad idea.  It's up there with smoking a cigarette while handling an
open container of gasoline.  You might not always get burned, but...

Jeff

On Mon, Oct 28, 2013 at 9:16 AM, heshsham basit <hf.basit1 at gmail.com> wrote:
> Hi,
>
> Whenever I launch more than 400 processes I get the following error.  But
> for fewer than 400 processes everything works perfectly fine.
>
> I am running my code on two machines: hesh and ubuntu (from which I am
> launching the jobs).
>
> root@ubuntu:/home# mpiexec -f hosts.cfg -n 500 ./hello
>
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:0@hesh] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
> [proxy:0:0@hesh] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@hesh] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec@ubuntu] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> [mpiexec@ubuntu] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> [mpiexec@ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>
>
> Sometimes I also get the following error:
>
> root@ubuntu:/home# mpiexec -f hosts.cfg -n 400 ./hello
>
> [proxy:0:1@ubuntu] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
> [proxy:0:1@ubuntu] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
> [proxy:0:1@ubuntu] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
> [proxy:0:1@ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec@ubuntu] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
> [mpiexec@ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> [mpiexec@ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>
>
>
> I also raised the open file limit on both PCs using ulimit -n 2048, but
> the result is still the same.
>
> The code is a simple one:
>
> /* C Example */
> #include <mpi.h>
> #include <stdio.h>
> #include <stddef.h>
> #include <stdlib.h>
>
> int main (int argc, char* argv[])
> {
>   int rank, size;
>   int buffer_length = MPI_MAX_PROCESSOR_NAME;
>   char hostname[buffer_length];
>
>   MPI_Init (&argc, &argv);                  /* starts MPI */
>   MPI_Comm_rank (MPI_COMM_WORLD, &rank);    /* get current process id */
>   MPI_Comm_size (MPI_COMM_WORLD, &size);    /* get number of processes */
>
>   MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */
>
>   printf( "Hello world from process %d running on %s of %d\n", rank,
>           hostname, size );
>   MPI_Finalize();
>   return 0;
> }
>
> For 10000 processes the program becomes unresponsive and I have to close
> the terminal to exit.
>
> How do I solve this issue?
>



-- 
Jeff Hammond
jeff.science at gmail.com


