[mpich-discuss] MPI_Comm_spawn_multiple segmentation fault

myself chcdlf at 126.com
Fri Sep 27 09:07:13 CDT 2013


Thanks. I found that on our 16-node cluster, when I run another application that has more initialization parameters, Hydra can only spawn 49 processes (about 3 per node), and the segmentation fault happens in PMI_Spawn_multiple. So I tried smpd, which seems to be able to spawn more; however, once the child processes reach 150+ (almost 10 per node), it appears to run out of resources. My question is whether there is an upper limit on the number of child processes one parent process can spawn, and how I can determine that limit. If I want one parent process to spawn more processes, is having its child processes spawn the descendants the only method (suppose we want to spawn 160 processes on 160 nodes)?
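
For that tree approach, here is a minimal sketch (not the code from this thread; the binary path /home/mpitest/treespawn and the FANOUT value are hypothetical placeholders) in which each level spawns the next level itself:

 /* Two-level (tree) spawning sketch: the mpirun-started parent spawns
  * FANOUT children, and each child spawns FANOUT grandchildren, so no
  * single process has to create every descendant itself. */
 #include "mpi.h"
 #include <stdio.h>
 #include <stdlib.h>

 #define FANOUT 4   /* children per level; tune to the process manager's limits */

 int main( int argc, char *argv[] ) {
     MPI_Comm parent, children;
     int level = 0;
     int errcodes[FANOUT];

     MPI_Init( &argc, &argv );
     MPI_Comm_get_parent( &parent );
     if (parent != MPI_COMM_NULL && argc == 2)
         level = atoi(argv[1]);          /* our depth, passed by our spawner */

     if (level < 2) {                    /* levels 0 and 1 keep spawning */
         char next[8];
         char *args[] = { next, NULL };
         snprintf(next, sizeof(next), "%d", level + 1);
         MPI_Comm_spawn( "/home/mpitest/treespawn", args, FANOUT,
                 MPI_INFO_NULL, 0, MPI_COMM_SELF, &children, errcodes );
     }
     MPI_Finalize();
     return 0;
 }

With FANOUT = 4 this creates 4 + 16 = 20 descendants; a deeper tree grows the total geometrically while each individual process only ever spawns FANOUT children.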


At 2013-09-27 21:43:43, "Wesley Bland" <wbland at mcs.anl.gov> wrote:
You're probably just running out of resources. 90 processes (not threads) on one node is a lot. A common use of MPI is to have one (or a few) processes per node; if you need more local parallelism, you investigate using additional threads (such as OpenMP).
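
For example, a minimal hybrid sketch (assuming an OpenMP-capable compiler; the -fopenmp flag varies by compiler):

 /* One MPI process per node plus OpenMP threads inside it.
  * Build with something like: mpicc -fopenmp hybrid.c -o hybrid */
 #include "mpi.h"
 #include <omp.h>
 #include <stdio.h>

 int main( int argc, char *argv[] ) {
     int provided, rank;
     /* FUNNELED: only the main thread will make MPI calls. */
     MPI_Init_thread( &argc, &argv, MPI_THREAD_FUNNELED, &provided );
     MPI_Comm_rank( MPI_COMM_WORLD, &rank );

     #pragma omp parallel
     printf("rank %d, thread %d of %d\n",
             rank, omp_get_thread_num(), omp_get_num_threads());

     MPI_Finalize();
     return 0;
 }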


Thanks,
Wesley


On Sep 26, 2013, at 9:45 PM, myself <chcdlf at 126.com> wrote:


I'm trying to spawn several processes on two nodes. When the number of child processes is small (say, fewer than 80), it works fine. However, when I tried to spawn 90+ processes, mpirun reported "Segmentation fault (core dumped)". Did I do something wrong, or is something missing?


My MPICH configure command:


$ ./configure --prefix=/opt/mpich3 --with-device=ch3:nemesis --with-pm=hydra --enable-fast=none --enable-g=dbg CFLAGS=-fPIC --disable-f77 --disable-fc


Here is my source code:


 #include "mpi.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>

 int main( int argc, char *argv[] ) {

     int HLEN=2;
     char *host[]={"node1","node2"};
     char name[65];

     int count;
     if(argc==2)
         count=atoi(argv[1]);
     else
         count=2;

     int root=0;
     char* array_of_commands[count];
     int array_of_maxprocs[count];
     int errcodes[count];
     MPI_Info array_of_info[count];
     MPI_Comm parentcomm, intercomm;

     int i;
     MPI_Init( &argc, &argv );
     for(i=0; i< count; i++){
         array_of_commands[i]="/home/mpitest/spawn";
         array_of_maxprocs[i]=1;
         MPI_Info_create(&array_of_info[i]);
         MPI_Info_set(array_of_info[i],"host",host[i % HLEN]);
     }

     MPI_Comm_get_parent( &parentcomm );
     if (parentcomm == MPI_COMM_NULL) {
         MPI_Comm_spawn_multiple( count, array_of_commands, MPI_ARGVS_NULL /*array_of_argv*/,
                 array_of_maxprocs, array_of_info, root, MPI_COMM_WORLD, &intercomm, errcodes );
         printf("I'm the parent.\n");
     } else {
         gethostname(name, sizeof(name));
         printf("I'm the spawned at %s.\n",name);
     }   
     fflush(stdout);
     MPI_Finalize();
     return 0;
 }   
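
To narrow down where the failure happens, one option (a sketch against the parent branch above, not verified on this cluster) is to install MPI_ERRORS_RETURN on MPI_COMM_WORLD so a failed spawn returns an error code instead of aborting, then inspect the errcodes array. This will not catch a crash inside the process manager itself, but it distinguishes resource-exhaustion errors from genuine segmentation faults:

 /* Replace the plain spawn call in the parent branch with: */
 MPI_Comm_set_errhandler( MPI_COMM_WORLD, MPI_ERRORS_RETURN );
 int rc = MPI_Comm_spawn_multiple( count, array_of_commands, MPI_ARGVS_NULL,
         array_of_maxprocs, array_of_info, root, MPI_COMM_WORLD,
         &intercomm, errcodes );
 if (rc != MPI_SUCCESS) {
     char msg[MPI_MAX_ERROR_STRING]; int len;
     MPI_Error_string(rc, msg, &len);
     fprintf(stderr, "spawn failed: %s\n", msg);
 } else {
     for (i = 0; i < count; i++)         /* one errcode per spawned process */
         if (errcodes[i] != MPI_SUCCESS)
             fprintf(stderr, "child %d failed with code %d\n", i, errcodes[i]);
 }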




_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss
