[mpich-discuss] MPI Hello world Program fails for large no. of processes

Wesley Bland wbland at mcs.anl.gov
Mon Oct 28 10:20:16 CDT 2013


If you’re only using two machines, you might be running out of resources when you try to use that many processes. If you want to run hundreds of processes, you need hundreds of cores.

On Oct 28, 2013, at 9:16 AM, heshsham basit <hf.basit1 at gmail.com> wrote:

> Hi, 
> 
> Whenever I launch more than 400 processes, I get the following error. With fewer than 400 processes, everything works perfectly fine.
>  
> I am running my code on two machines: hesh and ubuntu (from which I launch the jobs).
> 
> root at ubuntu:/home# mpiexec -f hosts.cfg -n 500 ./hello
> 
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> 
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:0 at hesh] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
> 
> [proxy:0:0 at hesh] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at hesh] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec at ubuntu] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> 
> [mpiexec at ubuntu] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> 
> [mpiexec at ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
> 
> 
> Also, sometimes I get the following error:
> 
> root at ubuntu:/home# mpiexec -f hosts.cfg -n 400 ./hello
> 
> [proxy:0:1 at ubuntu] send_cmd_downstream (./pm/pmiserv/pmip_pmi_v1.c:80): assert (!closed) failed
> [proxy:0:1 at ubuntu] fn_get (./pm/pmiserv/pmip_pmi_v1.c:349): error sending PMI response
> [proxy:0:1 at ubuntu] pmi_cb (./pm/pmiserv/pmip_cb.c:327): PMI handler returned error
> 
> [proxy:0:1 at ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1 at ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec at ubuntu] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
> 
> [mpiexec at ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at ubuntu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> 
> [mpiexec at ubuntu] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
> 
>  
> I also raised the open file limit on both PCs using ulimit -n 2048, but the result is still the same.
> The code is a simple one:
> 
> /* C Example */
> #include <mpi.h>
> #include <stdio.h>
> #include <stddef.h>
> #include <stdlib.h>
> 
> int main (int argc, char* argv[])
> {
>   int rank, size;
>   int buffer_length = MPI_MAX_PROCESSOR_NAME;
>   char hostname[buffer_length];
> 
>   MPI_Init (&argc, &argv);      /* starts MPI */
>   MPI_Comm_rank (MPI_COMM_WORLD, &rank);        /* get current process id */
>   MPI_Comm_size (MPI_COMM_WORLD, &size);        /* get number of processes */
> 
>   MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */
> 
>   printf( "Hello world from process %d running on %s of %d\n", rank, hostname, size );
>   MPI_Finalize();
>   return 0;
> }
> 
> For 10000 processes the program becomes unresponsive and I have to close the terminal to exit.
> How do I solve this issue?
