[mpich-discuss] error spawning processes in mpich-3.2rc1

Siegmar Gross Siegmar.Gross at informatik.hs-fulda.de
Wed Oct 7 05:03:06 CDT 2015


Hi,

today I built mpich-3.2rc1 on my machines (Solaris 10 Sparc,
Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0
and Sun C 5.13. On my Sparc machine I still get the errors below,
which I had already reported on September 8th. "mpiexec" is aliased
to 'mpiexec -genvnone'. It still makes no difference whether I use
the cc- or the gcc-built version of MPICH.


tyr spawn 119 mpichversion
MPICH Version:          3.2rc1
MPICH Release date:     Wed Oct  7 00:00:33 CDT 2015
MPICH Device:           ch3:nemesis
MPICH configure:        --prefix=/usr/local/mpich-3.2_64_cc 
--libdir=/usr/local/mpich-3.2_64_cc/lib64 
--includedir=/usr/local/mpich-3.2_64_cc/include64 CC=cc CXX=CC F77=f77 
FC=f95 CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 LDFLAGS=-m64 
-L/usr/lib/sparcv9 -R/usr/lib/sparcv9 --enable-fortran=yes --enable-cxx 
--enable-romio --enable-debuginfo --enable-smpcoll 
--enable-threads=multiple --with-thread-package=posix --enable-shared
MPICH CC:       cc -m64   -O2
MPICH CXX:      CC -m64  -O2
MPICH F77:      f77 -m64
MPICH FC:       f95 -m64  -O2
tyr spawn 120



tyr spawn 111 mpiexec -np 1 spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 4 slave processes

Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_slave", 
argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, 
intercomm=ffffffff7fffde50, errors=0) failed
MPIDI_Comm_spawn_multiple(274):
MPID_Comm_accept(153).........:
MPIDI_Comm_accept(1057).......:
MPIR_Bcast_intra(1287)........:
MPIR_Bcast_binomial(310)......: Failure during collective




tyr spawn 112 mpiexec -np 1 spawn_multiple_master

Parent process 0 running on tyr.informatik.hs-fulda.de
   I create 3 slave processes.

Fatal error in MPI_Comm_spawn_multiple: Unknown error class, error stack:
MPI_Comm_spawn_multiple(162)..: MPI_Comm_spawn_multiple(count=2, 
cmds=ffffffff7fffde08, argvs=ffffffff7fffddf8, 
maxprocs=ffffffff7fffddf0, infos=ffffffff7fffdde8, root=0, 
MPI_COMM_WORLD, intercomm=ffffffff7fffdde4, errors=0) failed
MPIDI_Comm_spawn_multiple(274):
MPID_Comm_accept(153).........:
MPIDI_Comm_accept(1057).......:
MPIR_Bcast_intra(1287)........:
MPIR_Bcast_binomial(310)......: Failure during collective




tyr spawn 113 mpiexec -np 1 spawn_intra_comm
Parent process 0: I create 2 slave processes
Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_intra_comm", 
argv=0, maxprocs=2, MPI_INFO_NULL, root=0, MPI_COMM_WORLD, 
intercomm=ffffffff7fffded4, errors=0) failed
MPIDI_Comm_spawn_multiple(274):
MPID_Comm_accept(153).........:
MPIDI_Comm_accept(1057).......:
MPIR_Bcast_intra(1287)........:
MPIR_Bcast_binomial(310)......: Failure during collective
tyr spawn 114
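
For reference, the error is raised inside MPI_Comm_spawn itself, so a
stripped-down sketch along the lines of the attached spawn_master.c
(assuming the attached slave binary "spawn_slave" is in the current
directory) should already show the failure:

#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm intercomm;

  MPI_Init (&argc, &argv);
  /* Spawning four instances of the attached slave program is enough
   * to trigger the "Failure during collective" error shown above.
   */
  MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                  MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
  MPI_Comm_free (&intercomm);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}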


I would be grateful if somebody could fix the problem. Thank you very
much in advance for any help. I have attached my programs; please let
me know if you need anything else.


Kind regards

Siegmar
-------------- next part --------------

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	4		/* create NUM_SLAVES processes	*/
#define SLAVE_PROG	"spawn_slave"	/* slave program name		*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   mytid,			/* my task id			*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD.   */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
	       "process allowed).\n"
	       "Usage:\n"
	       "  mpiexec %s\n\n",
	       argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("\nParent process %d running on %s\n"
	  "  I create %d slave processes\n\n",
	  mytid,  processor_name, NUM_SLAVES);
  MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES,
		  MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		  &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
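  /* For an inter-communicator MPI_Comm_size returns the size of the
   * local group (here: the single parent process), while
   * MPI_Comm_remote_size returns the size of the remote group
   * (here: the spawned slave processes).
   */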
  MPI_Comm_size	(COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
	  "tasks in MPI_COMM_WORLD:                    %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES local "
	  "group:  %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES remote "
	  "group: %d\n\n",
	  mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
-------------- next part --------------

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_PROGS	2		/* # of programs		*/
#define NUM_SLAVES_1	1		/* # of slave processes, type 1	*/
#define NUM_SLAVES_2	2		/* # of slave processes, type 2	*/
#define SLAVE_PROG_1	"spawn_slave"	/* slave program name, type 1	*/
#define SLAVE_PROG_2	"spawn_slave"	/* slave program name, type 2	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;	/* inter-communicator		*/
  MPI_Info array_of_infos[NUM_PROGS];	/* startup hints for each cmd	*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   mytid,			/* my task id			*/
	   namelen,			/* length of processor name	*/
	   array_of_n_procs[NUM_PROGS],	/* number of processes		*/
	   count_slaves,		/* total number of slaves	*/
	   i;				/* loop variable		*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME],
	   *array_of_commands[NUM_PROGS],
	   **array_of_argvs[NUM_PROGS],
	   *p_argv_1[] = {"program type 1", NULL},
	   *p_argv_2[] = {"program type 2", "another parameter", NULL};

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  /* check that only the master process is running in MPI_COMM_WORLD.   */
  if (ntasks_world > 1)
  {
    if (mytid == 0)
    {
      fprintf (stderr, "\n\nError: Too many processes (only one "
	       "process allowed).\n"
	       "Usage:\n"
	       "  mpiexec %s\n\n",
	       argv[0]);
    }
    MPI_Finalize ();
    exit (EXIT_SUCCESS);
  }
  MPI_Get_processor_name (processor_name, &namelen);
  count_slaves = 0;
  for (i = 0; i < NUM_PROGS; ++i)
  {
    if ((i % 2) == 0)
    {
      array_of_commands[i] = SLAVE_PROG_1;
      array_of_argvs[i]	   = p_argv_1;
      array_of_n_procs[i]  = NUM_SLAVES_1;
      array_of_infos[i]	   = MPI_INFO_NULL;
      count_slaves	   += NUM_SLAVES_1;
    }
    else
    {
      array_of_commands[i] = SLAVE_PROG_2;
      array_of_argvs[i]	   = p_argv_2;
      array_of_n_procs[i]  = NUM_SLAVES_2;
      array_of_infos[i]	   = MPI_INFO_NULL;
      count_slaves	   += NUM_SLAVES_2;
    }
  }
  printf ("\nParent process %d running on %s\n"
	  "  I create %d slave processes.\n\n",
	  mytid,  processor_name, count_slaves);
  MPI_Comm_spawn_multiple (NUM_PROGS, array_of_commands,
			   array_of_argvs, array_of_n_procs,
			   array_of_infos, 0, MPI_COMM_WORLD,
			   &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  MPI_Comm_size	(COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: "
	  "tasks in MPI_COMM_WORLD:                    %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES local "
	  "group:  %d\n"
	  "                  tasks in COMM_CHILD_PROCESSES remote "
	  "group: %d\n\n",
	  mytid, ntasks_world, ntasks_local, ntasks_remote);
  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
-------------- next part --------------

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


int main (int argc, char *argv[])
{
  int  ntasks_world,			/* # of tasks in MPI_COMM_WORLD	*/
       mytid,				/* my task id			*/
       namelen,				/* length of processor name	*/
       i;				/* loop variable		*/
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code prints
   * one line to the display. The lines may get mixed up because the
   * display is a shared resource (a critical section). In general
   * only one process (usually the process with rank 0) prints to the
   * display and all other processes send their messages to that
   * process. Nevertheless, for debugging purposes (or to demonstrate
   * that it is possible) it can be useful if every process prints
   * for itself.
   */
  fprintf (stdout, "Slave process %d of %d running on %s\n",
	   mytid, ntasks_world, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);		/* wait for all other processes	*/
  for (i = 0; i < argc; ++i)
  {
    printf ("%s %d: argv[%d]: %s\n", argv[0], mytid, i, argv[i]);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
-------------- next part --------------

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	2		/* create NUM_SLAVES processes	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_ALL_PROCESSES,		/* intra-communicator		*/
	   COMM_CHILD_PROCESSES,	/* inter-communicator		*/
	   COMM_PARENT_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   ntasks_all,			/* tasks in COMM_ALL_PROCESSES	*/
	   mytid_world,			/* my task id in MPI_COMM_WORLD	*/
	   mytid_all,			/* id in COMM_ALL_PROCESSES	*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid_world);
  /* First we must decide whether this program is executed by a parent
   * or by a child process, because only a parent is allowed to spawn
   * child processes (otherwise the child process with rank 0 would in
   * turn spawn child processes, and so on). "MPI_Comm_get_parent ()"
   * returns the parent inter-communicator for a spawned MPI process
   * and MPI_COMM_NULL if the process wasn't spawned, i.e., it was
   * started statically via "mpiexec" on the command line.
   */
  MPI_Comm_get_parent (&COMM_PARENT_PROCESSES);
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* All parent processes must call "MPI_Comm_spawn ()" but only
     * the root process (in our case the process with rank 0) will
     * spawn child processes. All other processes of the
     * intra-communicator (in our case MPI_COMM_WORLD) will ignore
     * the values of all arguments before the "root" parameter.
     */
    if (mytid_world == 0)
    {
      printf ("Parent process 0: I create %d slave processes\n",
	      NUM_SLAVES);
    }
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, NUM_SLAVES,
		    MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		    &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  }
  /* Merge all processes into one intra-communicator. The "high" flag
   * determines the order of the processes in the intra-communicator.
   * If parent and child processes use the same flag the order is
   * arbitrary; otherwise the processes with "high == 0" get lower
   * ranks than the processes with "high == 1".
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* parent processes							*/
    MPI_Intercomm_merge (COMM_CHILD_PROCESSES, 0, &COMM_ALL_PROCESSES);
  }
  else
  {
    /* spawned child processes						*/
    MPI_Intercomm_merge (COMM_PARENT_PROCESSES, 1, &COMM_ALL_PROCESSES);
  }
  MPI_Comm_size	(MPI_COMM_WORLD, &ntasks_world);
  MPI_Comm_size (COMM_ALL_PROCESSES, &ntasks_all);
  MPI_Comm_rank (COMM_ALL_PROCESSES, &mytid_all);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the following printf statement every process executing this
   * code prints some lines to the display. The lines may get mixed up
   * because the display is a shared resource (a critical section). In
   * general only one process (usually the process with rank 0) prints
   * to the display and all other processes send their messages to that
   * process. Nevertheless, for debugging purposes (or to demonstrate
   * that it is possible) it can be useful if every process prints
   * for itself.
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    MPI_Comm_size	 (COMM_CHILD_PROCESSES, &ntasks_local);
    MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
    printf ("\nParent process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_local:  %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_remote: %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_local,
	    ntasks_remote, ntasks_all, mytid_all);
  }
  else
  {
    printf ("\nChild process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_all,
	    mytid_all);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}