[mpich-discuss] Occasional hang with MPI_Intercomm_merge and OFI+provider verbs
Zhou, Hui
zhouh at anl.gov
Mon Oct 28 14:01:22 CDT 2024
Hi Iker,
Does it work with `FI_PROVIDER="verbs;ofi_rxm"`?
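If it is easier to test from inside the reproducer, here is a minimal sketch, assuming libfabric picks up FI_PROVIDER when the ch4:ofi device initializes during MPI_Init (exporting the variable in the mpiexec environment instead also reaches the spawned processes):
==========================
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    /* Sketch only: select the layered verbs provider before the OFI netmod
       comes up. This must run before MPI_Init and only affects processes
       that execute it, so setting FI_PROVIDER in the launch environment is
       the more general option. */
    setenv("FI_PROVIDER", "verbs;ofi_rxm", 1);
    MPI_Init(&argc, &argv);
    /* ... the rest of BaseCode.c unchanged ... */
    MPI_Finalize();
    return 0;
}
==========================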
Hui
________________________________
From: Iker Martín Álvarez via discuss <discuss at mpich.org>
Sent: Monday, October 28, 2024 12:34 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Iker Martín Álvarez <martini at uji.es>
Subject: [mpich-discuss] Occasional hang with MPI_Intercomm_merge and OFI+provider verbs
Hi,
Lately I have been dealing with an unexpected problem when using MPI_Comm_spawn + MPI_Intercomm_merge, where on some occasions my application hangs when two conditions are met.
Specifically, the hang occurs when the intracommunicator returned by MPI_Intercomm_merge is used in collective operations such as MPI_Bcast. The conditions are:
- The node is oversubscribed: the total number of processes is greater than the number of available physical cores (a quick run-time check is sketched after this list).
- CH4:ofi is used with FI_PROVIDER="verbs;ofi_rxd".
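As a rough check of the first condition, a helper along these lines can be called on the merged intracommunicator (a sketch only; sysconf reports online logical processors, which can differ from the physical core count):
==========================
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

/* Rough check: compares the size of the merged communicator against the
   number of online processors on the node. sysconf counts logical
   processors, so hyper-threading can make this larger than the number
   of physical cores. */
static void report_oversubscription(MPI_Comm comm) {
    int rank, size;
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    if (rank == 0)
        printf("processes=%d online_cpus=%ld oversubscribed=%s\n",
               size, cpus, (cpus > 0 && size > cpus) ? "yes" : "no");
}
==========================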
I tested a minimal code with MPICH 4.2.0 and MPICH 4.2.3 configured as:
./configure --prefix=... --with-device=ch4:ofi --disable-psm3
The minimal code to reproduce the problem (https://lorca.act.uji.es/gitlab/martini/mpich_ofi_rxd_intracomm_hang/-/blob/main/BaseCode.c) is the following:
==========================
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, numP, numO;
    int order;
    double test = 0;
    int solution = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numP);

    MPI_Comm intercomm, intracomm;
    MPI_Comm_get_parent(&intercomm);
    if (intercomm == MPI_COMM_NULL) {
        /* Initial group: spawn numO additional copies of this binary. */
        numO = atoi(argv[1]);
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, numO, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        order = 0;
    } else {
        /* Spawned group: ordered after the initial group in the merge. */
        order = 1;
    }

    MPI_Intercomm_merge(intercomm, order, &intracomm);

    printf("TEST 1 P%02d/%d\n", rank, numP);
    MPI_Bcast(&test, 1, MPI_DOUBLE, 0, intracomm); // Hangs here
    if (solution) { MPI_Barrier(intercomm); }      // Setting solution=1 makes the hang rare
    printf("TEST 2 P%02d/%d\n", rank, numP);

    MPI_Finalize();
    return 0;
}
==========================
The code hangs at the MPI_Bcast operation, but only for some of the spawned processes. All my executions have been on a single node with 20 cores, starting 10 initial processes and spawning 20 more at MPI_Comm_spawn. If I change the variable "solution" to 1, the application rarely hangs, but it still does on some occasions.
From my perspective, the code seems to follow the standard. Is that the case? I have been able to run the code with other OFI providers, so I am confused as to why it does not work with this one.
Thank you for your time.
Best regards,
Iker