[mpich-discuss] MPI_Comm_spawn_multiple Segment fault
myself
chcdlf at 126.com
Fri Sep 27 09:07:13 CDT 2013
Thanks. On our 16-node cluster, I found that when I run another application with more initialization parameters, hydra can only spawn 49 processes (about 3 per node), and the segmentation fault happens in PMI_Spawn_multiple. So I tried smpd instead, and it seems able to spawn more; however, when the number of child processes grows to 150+ (almost 10 per node), it appears to run out of resources. My question is whether there is an upper limit on how many child processes one parent process can spawn, and how I can determine that limit. If I want one parent process to reach more processes, is having its child processes spawn descendants the only method (suppose we want to spawn 160 processes on 160 nodes)?
At 2013-09-27 21:43:43,"Wesley Bland" <wbland at mcs.anl.gov> wrote:
You're probably just running out of resources. 90 processes (not threads) on one node is a lot. A common use of MPI is to have one (or a few) processes per node and if you need more local instances, you investigate using additional threads (such as OpenMP).
Thanks,
Wesley
On Sep 26, 2013, at 9:45 PM, myself <chcdlf at 126.com> wrote:
I'm trying to spawn several processes on two nodes. When the number of child processes is small (say, fewer than 80), it works fine. However, when I tried to spawn 90+ processes, mpirun reported "Segmentation fault (core dumped)". Did I do something wrong, or fail to do something?
My MPICH configure command:
$ ./configure --prefix=/opt/mpich3 --with-device=ch3:nemesis --with-pm=hydra --enable-fast=none --enable-g=dbg CFLAGS=-fPIC --disable-f77 --disable-fc
Here is my source code:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main( int argc, char *argv[] ) {
    int HLEN = 2;
    char *host[] = { "node1", "node2" };
    char name[65];
    int count;

    if (argc == 2)
        count = atoi(argv[1]);
    else
        count = 2;

    int root = 0;
    char *array_of_commands[count];
    int array_of_maxprocs[count];
    int errcodes[count];
    MPI_Info array_of_info[count];
    MPI_Comm parentcomm, intercomm;
    int i;

    MPI_Init( &argc, &argv );

    /* One command per spawned process, pinned round-robin to the two hosts. */
    for (i = 0; i < count; i++) {
        array_of_commands[i] = "/home/mpitest/spawn";
        array_of_maxprocs[i] = 1;
        MPI_Info_create(&array_of_info[i]);
        MPI_Info_set(array_of_info[i], "host", host[i % HLEN]);
    }

    MPI_Comm_get_parent( &parentcomm );
    if (parentcomm == MPI_COMM_NULL) {
        /* No parent communicator: this is the original process, so spawn. */
        MPI_Comm_spawn_multiple( count, array_of_commands, MPI_ARGVS_NULL /*array_of_argv*/,
                                 array_of_maxprocs, array_of_info, root, MPI_COMM_WORLD,
                                 &intercomm, errcodes );
        printf("I'm the parent.\n");
    } else {
        gethostname(name, sizeof(name));
        printf("I'm the spawned at %s.\n", name);
    }
    fflush(stdout);
    MPI_Finalize();
    return 0;
}
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss