[mpich-discuss] Occasional hang with MPI_Intercomm_merge and OFI+provider verbs
Iker Martín Álvarez
martini at uji.es
Mon Oct 28 12:34:44 CDT 2024
Hi,
Lately I have been dealing with an unexpected problem when using
MPI_Comm_spawn + MPI_Intercomm_merge: on some occasions my application
hangs when two conditions are met.
Specifically, the hang occurs when the intracommunicator returned by
MPI_Intercomm_merge is used in collective operations such as MPI_Bcast.
The conditions are:
- There is oversubscription, i.e. the number of processes is greater than
the number of available physical cores.
- The build uses ch4:ofi with FI_PROVIDER="verbs:ofi_rxd", set as shown
below.
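For reference, I select the provider through libfabric's environment
variable, exported in the shell before the run (passing it with mpiexec's
-genv option should be equivalent):
export FI_PROVIDER="verbs:ofi_rxd"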
I tested a minimal reproducer with MPICH 4.2.0 and MPICH 4.2.3, both configured as:
./configure --prefix=... --with-device=ch4:ofi --disable-psm3
The minimal code to reproduce the problem
<https://lorca.act.uji.es/gitlab/martini/mpich_ofi_rxd_intracomm_hang/-/blob/main/BaseCode.c>
is the following:
==========================
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, numP, numO;
    int rootBcast, order;
    double test = 0;
    int solution = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numP);

    MPI_Comm intercomm, intracomm;
    MPI_Comm_get_parent(&intercomm);
    if (intercomm == MPI_COMM_NULL) {
        /* Initial group: spawn numO children, given by argv[1] */
        numO = atoi(argv[1]);
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, numO, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        order = 0;
    } else {
        /* Spawned group */
        order = 1;
    }

    MPI_Intercomm_merge(intercomm, order, &intracomm);

    printf("TEST 1 P%02d/%d\n", rank, numP);
    MPI_Bcast(&test, 1, MPI_DOUBLE, 0, intracomm); // Hangs here
    if (solution) { MPI_Barrier(intercomm); }
    printf("TEST 2 P%02d/%d\n", rank, numP);

    MPI_Finalize();
    return 0;
}
==========================
The code only hangs at the MPI_Bcast operation, and only for some of the
spawned processes. All my executions have been on a single node with 20
cores, using 10 initial processes and spawning 20 more at the
MPI_Comm_spawn call. If I change the variable "solution" to 1, the hang
becomes rare, but it still happens on some occasions.
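For completeness, with the provider exported as above, a typical run on my
side looks roughly like this (the mpiexec options are illustrative;
BaseCode is the compiled reproducer and the argument is the number of
processes to spawn):
mpiexec -n 10 ./BaseCode 20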
From my perspective, the code seems to follow the standard. Is that the
case? I have been able to run the code successfully with other OFI
providers, but I am confused as to why it does not work in this particular
case.
Thank you for your time.
Best regards,
Iker