[mpich-discuss] MPI_Reduce on an inter-communicator hangs

Jenke, Joachim jenke at itc.rwth-aachen.de
Sun Apr 21 17:22:30 CDT 2024


Hi Kurt,

The workers need to pass root=0. MPI_PROC_NULL indicates that the local process is not involved in the communication (as in point-to-point). It would be the appropriate root argument for non-root processes in the manager's group, i.e., only if you start the manager side with more than one process.
> all other processes in the same group as the root use MPI_PROC_NULL

https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node91.htm

That section also explains your observation: the workers fall straight through the call without waiting because, with root=MPI_PROC_NULL, they do not participate in the reduction at all.
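For illustration, a minimal sketch of the corrected worker-side call, assuming the rest of your worker stays as posted (the manager keeps MPI_ROOT; only additional manager-group processes, if any, would pass MPI_PROC_NULL):

    /* Worker side: this process is in the group that does NOT contain the
     * root, so "root" is the rank of the root within the manager's group.
     * The single manager is rank 0 there, so pass 0.
     * The receive buffer is not significant on this side. */
    MPI_Reduce(array, NULL, 100, MPI_UNSIGNED, MPI_SUM, /* root = */ 0,
               manager_intercom);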

Best Joachim
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Sunday, April 21, 2024 11:52:48 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPI_Reduce on an inter-communicator hangs


I am calling MPI_Reduce on a set of inter-communicators created by MPI_Comm_spawn, each with one process in the local group (the single manager) and two processes in the remote group (the workers). The inter-communicators are visited one at a time in the manager.

All workers enter and exit MPI_Reduce without blocking, but the manager enters the first MPI_Reduce for the first inter-communicator and never returns. What am I doing wrong? I am using MPICH 4.1.2.

Here is my manager code:

#define N_PROC 4
#define N_IN_GROUP 2

int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[N_PROC];
    MPI_Comm intercoms[N_PROC];
    char hostname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    gethostname(hostname, sizeof(hostname));
    char *p = strstr(hostname, ".");
    *p = '\0';

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", hostname);
    MPI_Info_set(info, "bind_to", "core");

    for (int i = 0; i < N_PROC; ++i)
    {
        MPI_Comm_spawn("test_reduce_work", argv, N_IN_GROUP, info,
            0, MPI_COMM_SELF, &intercoms[i], &error_codes[i]);
    }

    sleep(10);

    unsigned array[100]{0};

    for (int i = 0; i < N_PROC; ++i)
    {
        cout << "MANAGER: starting reduction " << i << "\n";

        MPI_Reduce(NULL, array, 100, MPI_UNSIGNED, MPI_SUM, MPI_ROOT,
            intercoms[i]);

        cout << "MANAGER: finished reduction " << i << "\n";   // we never reach this point
    }

    for (int i = 0; i < 100; ++i) cout << array[i] << " ";
    cout << endl;

    MPI_Finalize();
}

And here is my worker code:

int main(int argc, char *argv[])
{
    int rank, world_size;
    MPI_Comm manager_intercom;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Comm_get_parent(&manager_intercom);

    unsigned array[100]{1};

    cout << "WORKER: starting reduction\n";

    MPI_Reduce(array, NULL, 100, MPI_UNSIGNED, MPI_SUM, MPI_PROC_NULL,
        manager_intercom);

    cout << "WORKER: finishing reduction\n";

    sleep(10);

    MPI_Finalize();
}

Finally, here is the invocation:

$ mpiexec -launcher ssh -print-all-exitcodes -wdir /home/kmccall/test_dir -np 1 -ppn 1 test_reduce_man



Thanks,

Kurt

