[mpich-discuss] MPI_Comm_Spawn + UCX error

Zhou, Hui zhouh at anl.gov
Wed Apr 21 21:26:47 CDT 2021


Dynamic process management (MPI_Comm_spawn and related calls) is currently not supported with ch4:ucx.
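
To check for this configuration before attempting a spawn, the "MPICH Device:" line printed by mpichversion can be inspected. The following is only a minimal sketch in C, not part of MPICH: the function name device_is_ch4_ucx is hypothetical, and it assumes the line has the same "MPICH Device:     ch4:ucx" layout as the output quoted below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an MPICH API): returns true if one line of
 * `mpichversion` output reports the ch4:ucx device, for example
 * "MPICH Device:     ch4:ucx". */
bool device_is_ch4_ucx(const char *line)
{
    static const char key[] = "MPICH Device:";
    static const char device[] = "ch4:ucx";

    assert(line != NULL);

    /* Only the "MPICH Device:" line is relevant; ignore all others. */
    if (strncmp(line, key, sizeof key - 1) != 0)
        return false;

    /* Skip the padding between the key and its value. */
    const char *value = line + (sizeof key - 1);
    while (*value == ' ' || *value == '\t')
        value++;

    /* The value ends at the first whitespace or at the end of the line. */
    size_t len = strcspn(value, " \t\r\n");
    return len == sizeof device - 1 && strncmp(value, device, len) == 0;
}
```

A caller could feed each line of the mpichversion output to this function and skip or warn about MPI_Comm_spawn when it returns true. Matching the whole value rather than a substring keeps the "MPICH configure:" line, which also contains "ch4:ucx", from being mistaken for the device line.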

--
Hui Zhou


From: Iker Martín Álvarez via discuss <discuss at mpich.org>
Date: Wednesday, April 21, 2021 at 6:22 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Iker Martín Álvarez <martini at uji.es>
Subject: [mpich-discuss] MPI_Comm_Spawn + UCX error

Hello,

I have been working with the MPI_Comm_spawn function, which worked fine with a plain build of MPICH 3.4.1 that was configured with only the "--prefix" argument.

However, when this function was called with another build of MPICH 3.4.1 that uses InfiniBand, an error occurred. Am I missing some arguments in the configure step of MPICH when using UCX?

Here is the output of mpichversion:
$ mpichversion
MPICH Version:     3.4.1
MPICH Release date: Fri Jan 22 14:17:48 CST 2021
MPICH Device:     ch4:ucx
MPICH configure:  --prefix=/soft/gnu/mpich-3.4.1-ucx --with-device=ch4:ucx --with-ucx=/soft/gnu/ucx-1.11
MPICH CC:  gcc    -O2
MPICH CXX:  g++   -O2
MPICH F77:  gfortran   -O2
MPICH FC:  gfortran   -O2
MPICH Custom Information:


The following is the information about the minimal code that triggers the error:
Source: https://www.rookiehpc.com/mpi/docs/mpi_comm_spawn.php
Compiling: mpicc mpi_spawn.c
Running: mpirun -np 2 ./a.out
We are processes spawned directly by you, we now spawn a new instance of an MPI application.
We are processes spawned directly by you, we now spawn a new instance of an MPI application.
Assertion failed in file src/mpid/ch4/src/ch4_init.c at line 651: MPIR_Process.comm_parent != NULL
/soft/gnu/mpich-3.4.1-ucx/lib/libmpi.so.12(MPL_backtrace_show+0x39) [0x7fe15d506d41]
/soft/gnu/mpich-3.4.1-ucx/lib/libmpi.so.12(+0x32eaa8) [0x7fe15d4a6aa8]
/soft/gnu/mpich-3.4.1-ucx/lib/libmpi.so.12(+0x3602f8) [0x7fe15d4d82f8]
/soft/gnu/mpich-3.4.1-ucx/lib/libmpi.so.12(+0x225803) [0x7fe15d39d803]
/soft/gnu/mpich-3.4.1-ucx/lib/libmpi.so.12(PMPI_Init+0xa8) [0x7fe15d39d598]
./a.out(+0x123e) [0x55ece110a23e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fe15cfa10b3]
./a.out(+0x114e) [0x55ece110a14e]
Abort(1) on node 0: Internal error