[mpich-discuss] Occasional hang with MPI_Intercomm_merge and OFI + verbs provider

Iker Martín Álvarez martini at uji.es
Mon Oct 28 12:34:44 CDT 2024


Hi,

Lately I have been dealing with an unexpected problem when using
MPI_Comm_spawn + MPI_Intercomm_merge: on some occasions my application
hangs when two conditions are met.

Specifically, the hang occurs when the intracommunicator resulting from
MPI_Intercomm_merge is used in collective operations such as MPI_Bcast.
The conditions are:
- Oversubscription: the total number of processes is greater than the
number of available physical cores.
- Using CH4:ofi with FI_PROVIDER="verbs:ofi_rxd" (selected at run time as
shown below).
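
For completeness, the provider is selected at run time through the
environment before launching; this is simply how I set it on my machine:

export FI_PROVIDER="verbs:ofi_rxd"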

I tested a minimal reproducer with MPICH 4.2.0 and MPICH 4.2.3, both
configured as:
./configure --prefix=... --with-device=ch4:ofi --disable-psm3

The minimal code to reproduce the problem is available at
<https://lorca.act.uji.es/gitlab/martini/mpich_ofi_rxd_intracomm_hang/-/blob/main/BaseCode.c>
and is the following:
==========================

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
  int rank, numP, numO;
  int rootBcast, order;
  double test = 0;
  int solution = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm intercomm, intracomm;
  MPI_Comm_get_parent(&intercomm);
  if(intercomm == MPI_COMM_NULL) { /* initial (parent) processes */
    numO = atoi(argv[1]);
    MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, numO, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    order = 0;
  } else { order = 1; } /* spawned processes */
  MPI_Intercomm_merge(intercomm, order, &intracomm);
  printf("TEST 1 P%02d/%d\n", rank, numP);
  MPI_Bcast(&test, 1, MPI_DOUBLE, 0, intracomm); // Hangs here
  if(solution) { MPI_Barrier(intercomm); }
  printf("TEST 2 P%02d/%d\n", rank, numP);
  MPI_Finalize();
  return 0;
}

==========================
The code hangs at the MPI_Bcast operation for only some of the spawned
processes. All my executions have been on a single node with 20 cores,
starting 10 initial processes and spawning 20 more with MPI_Comm_spawn. If
I change the variable "solution" to 1, the application rarely hangs, but it
still does on some occasions.
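
For reference, with FI_PROVIDER set as above, a typical failing run on my
side is launched roughly like this (the binary name and the exact mpiexec
flags are just what I use locally):

mpiexec -n 10 ./BaseCode 20

This gives the 10 initial processes plus the 20 spawned ones, i.e. 30
processes on 20 physical cores, which is the oversubscription condition
described above.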

From my perspective, the code seems to follow the MPI standard. Is this the
case? I have been able to run the code with other OFI providers, but I am
confused as to why it does not work in this particular case.

Thank you for your time.
Best regards,
Iker

