Thanks. On our 16-node cluster, when I run another application that has more initialization parameters, hydra can only spawn 49 processes (3 per node), and the segmentation fault happens in PMI_Spawn_multiple. So I tried smpd, which seems able to spawn more. However, when the number of child processes grows to 150+ (almost 10 per node), it seems to run out of resources. My question is whether there is an upper limit on the number of child processes one parent process can spawn, and how I can determine that limit. If I want one parent process to spawn more processes, is having its child processes spawn descendants the only method (suppose we want to spawn 160 processes on 160 nodes)?
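
A rough way to reason about the "children spawn descendants" idea is to count how many spawn generations would be needed. The sketch below is only arithmetic, not MPI code. It assumes that every process can spawn at most `fanout` children, and `spawn_generations` is a hypothetical helper name. The per-parent limit is not a constant defined by MPI; the 49 used in the usage note is just the number observed with hydra above.

```c
/* Smallest number of spawn generations needed so that the total number of
 * descendants reaches `target`, assuming every process (the parent and each
 * child) can spawn at most `fanout` children. Returns -1 for invalid input. */
int spawn_generations(long target, long fanout)
{
    if (target < 0 || fanout < 1)
        return -1;

    long total = 0;       /* descendants created so far */
    long level = 1;       /* size of the newest generation (starts at the parent) */
    int generations = 0;

    while (total < target) {
        generations++;
        /* If the next generation alone covers what is still missing, stop.
         * This check also keeps level * fanout from overflowing. */
        if (level > (target - total) / fanout)
            break;
        level *= fanout;  /* every process in the last generation spawns `fanout` */
        total += level;
    }
    return generations;
}
```

With the numbers from this post, `spawn_generations(160, 49)` gives 2: one generation of 49 children is not enough for 160 processes, but a second generation spawned by those children would be, provided the per-process limit really is the bottleneck.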

You're probably just running out of resources. 90 processes (not threads) on one node is a lot. A common use of MPI is to have one (or a few) processes per node and if you need more local instances, you investigate using additional threads (such as OpenMP).
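
To make the "few processes per node plus threads" suggestion concrete, here is a small sketch of the arithmetic only. The helper name `threads_per_process` and the idea of a fixed number of "workers" are assumptions for illustration; how the threads are actually created (OpenMP or otherwise) is left out.

```c
/* Threads each MPI process would need so that
 * nodes * procs_per_node * threads >= total_workers (ceiling division).
 * Returns -1 for invalid input. */
int threads_per_process(int total_workers, int nodes, int procs_per_node)
{
    if (total_workers < 0 || nodes < 1 || procs_per_node < 1)
        return -1;

    long procs = (long)nodes * procs_per_node;  /* total MPI processes */
    return (int)((total_workers + procs - 1) / procs);
}
```

For example, 90 workers on two nodes with one MPI process per node would mean `threads_per_process(90, 2, 1)`, i.e. 45 threads per process, instead of 90 separately spawned processes.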


I'm trying to spawn several processes on two nodes. With a small number of child processes, say fewer than 80, it works fine. However, when I tried to spawn 90+ processes, mpirun reported "Segmentation fault (core dumped)". Did I do something wrong, or is there something I forgot to do?

My MPICH configure command:

$ ./configure --prefix=/opt/mpich3 --with-device=ch3:nemesis --with-pm=hydra --enable-fast=none --enable-g=dbg CFLAGS=-fPIC --disable-f77 --disable-fc

Here is my source code:

 #include "mpi.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>

 int main( int argc, char *argv[] ) {

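     int HLEN=2;
     char *host[]={"node1","node2"};
     char name[65];

     /* Number of children to spawn. The value used in the original run is
        not shown, so it is read from the command line here (assumption). */
     int count = (argc > 1) ? atoi(argv[1]) : 2;
     if (count < 1) count = 1;

     int root=0;
     char* array_of_commands[count];
     int array_of_maxprocs[count];
     int errcodes[count];
     MPI_Info array_of_info[count];
     MPI_Comm parentcomm, intercomm;

     int i;
     MPI_Init( &argc, &argv );
     for(i=0; i< count; i++){
         array_of_commands[i] = argv[0];     /* assumption: children run this same binary */
         array_of_maxprocs[i] = 1;           /* one process per command */
         MPI_Info_create(&array_of_info[i]); /* an info object must exist before MPI_Info_set */
         MPI_Info_set(array_of_info[i],"host",host[i % HLEN]);
     }

     MPI_Comm_get_parent( &parentcomm );
     if (parentcomm == MPI_COMM_NULL) {
         MPI_Comm_spawn_multiple( count, array_of_commands, MPI_ARGVS_NULL /*array_of_argv*/,
                 array_of_maxprocs, array_of_info, root, MPI_COMM_WORLD, &intercomm, errcodes );
         printf("I'm the parent.\n");
     } else {
         gethostname(name, sizeof(name));
         printf("I'm the spawned at %s.\n",name);
     }

     for(i=0; i< count; i++)
         MPI_Info_free(&array_of_info[i]);
     MPI_Finalize();
     return 0;
 }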
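
For reference, the `host[i % HLEN]` rule in the loop above places children round-robin across the hosts. The sketch below mirrors that rule to show how many children end up on each host; `children_per_host` is a hypothetical helper, and it only counts placements without calling MPI.

```c
/* Mirror of the placement rule host[i % HLEN]: count how many of `count`
 * children land on each of `nhosts` hosts. `per_host` must have room for
 * `nhosts` entries. */
void children_per_host(int count, int nhosts, int *per_host)
{
    for (int h = 0; h < nhosts; h++)
        per_host[h] = 0;
    for (int i = 0; i < count; i++)
        per_host[i % nhosts]++;   /* same index expression as the spawn loop */
}
```

With 90 children and two hosts this gives 45 per host. With 150 children and 16 hosts, the first six hosts get 10 each and the remaining ten get 9.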

