[mpich-discuss] MPI_Comm_spawn crosses node boundaries

Zhou, Hui zhouh at anl.gov
Mon Feb 7 16:54:35 CST 2022


Hi Kurt,

As Ken mentioned, there are several distinct setup combinations involved, which easily leads to confusion. A mailing list is especially bad at resolving such issues, since we try different suggestions and the feedback gets mixed up. Could you check https://github.com/pmodels/mpich/issues/5835 and provide (or re-provide) the relevant information there? Particularly useful are: how MPICH is configured, how the job is invoked, the failure symptom, and the debug output from mpiexec -verbose. As Ken listed, there are three different scenarios, and each should work. It will probably help if we focus on one scenario at a time.
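If it helps to pin down one scenario, a minimal self-contained reproducer for the spawn path could look like the sketch below (the file name spawn_test.c, the target hostname n002, and the single spawned worker are placeholders, not taken from your setup). Build it with mpicc and launch it under each setup:

/* spawn_test.c -- minimal MPI_Comm_spawn sketch (names are placeholders) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: ask for one child on another node via the "host" info key. */
        MPI_Info info;
        MPI_Comm intercomm;

        MPI_Info_create(&info);
        MPI_Info_set(info, "host", "n002");   /* placeholder hostname */

        MPI_Comm_spawn("./spawn_test", MPI_ARGV_NULL, 1, info, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        MPI_Barrier(intercomm);               /* meet the child */
        printf("spawn + barrier completed\n");

        MPI_Info_free(&info);
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child: meet the parent at the barrier and exit. */
        MPI_Barrier(parent);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}

Whether the spawn succeeds at all, and whether the child lands on the requested host, is exactly what differs between the setups listed in the issue.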

--
Hui Zhou

________________________________
From: Raffenetti, Ken via discuss <discuss at mpich.org>
Sent: Monday, February 7, 2022 4:28 PM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: Raffenetti, Ken <raffenet at anl.gov>
Subject: Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

Darn. I'm creating an issue to track this since it will likely take some time and effort to investigate each configuration.

https://github.com/pmodels/mpich/issues/5835

Ken

On 2/7/22, 12:41 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

    Ken,

    To review, I configured as follows:

    $ configure   CFLAGS=-DUSE_PMI2_API   LIBS=-lpmi2   --with-pm=none   --with-pmi=slurm   --with-slurm=/opt/slurm     < ...>

    and ran srun with the argument --mpi=pmi2.

    The job is still segfaulting in MPI_Comm_spawn in one process and returning an error from MPI_Barrier in the other. Error messages are below:


    The MPI_Comm_spawn error:


    backtrace for error: backtrace after receiving signal SIGSEGV:
        /home/kmccall/Needles2/./NeedlesMpiMM() [0x45ab36]
        /lib64/libpthread.so.0(+0x12c20) [0x7f4833387c20]
        /lib64/libc.so.6(+0x15d6b7) [0x7f483310d6b7]
        /lib64/libc.so.6(__strdup+0x12) [0x7f4833039802]
        /lib64/libpmi2.so.0(+0x171c) [0x7f483294371c]
        /lib64/libpmi2.so.0(+0x185e) [0x7f483294385e]
        /lib64/libpmi2.so.0(PMI2_Job_Spawn+0x1a7) [0x7f48329453d8]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(+0x23a7db) [0x7f4834e0a7db]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(+0x1fc805) [0x7f4834dcc805]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(MPI_Comm_spawn+0x507) [0x7f4834cea9f7]


    The MPI_Barrier error:

    MPI_Barrier returned the error MPI runtime error: Unknown error class, error stack:
    internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
    MPIR_Barrier_impl(91)......................:
    MPIR_Barrier_allcomm_auto(45)..............:
    MPIR_Barrier_intra_dissemination(39).......:
    MPIDI_CH3U_Complete_posted_with_error(1090): Process failed




    Thanks,
    Kurt

    -----Original Message-----
    From: Raffenetti, Ken <raffenet at anl.gov>
    Sent: Friday, February 4, 2022 4:08 PM
    To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
    Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

    :(. We need to link -lpmi2 instead of -lpmi. This really needs a patch in our configure script, but adding this to your configure is worth a shot:

      LIBS=-lpmi2

    Ken

    On 2/4/22, 1:56 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

        I added the CFLAGS argument and the configuration completed, but make ended with a link error.

        lib/.libs/libmpi.so: undefined reference to `PMI2_Abort'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_GetJobAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Job_Spawn'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_publish'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Finalize'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Put'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_GetNodeAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Get'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Fence'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_unpublish'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_PutNodeAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Job_GetId'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Init'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_lookup'
        collect2: error: ld returned 1 exit status

        -----Original Message-----
        From: Raffenetti, Ken <raffenet at anl.gov>
        Sent: Friday, February 4, 2022 1:23 PM
        To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
        Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

        I think I see a new issue. The Slurm website documentation says that their PMI library doesn't support PMI_Spawn_multiple from the PMI 1 API. We can try to force PMI 2 and see what happens. Try adding this to your configure line.

          CFLAGS=-DUSE_PMI2_API

        Ken

        On 2/4/22, 11:58 AM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

            Did that, and launched the job with "srun --mpi=none", and one of the processes failed when MPI_Comm_spawn was called. Note the error output:

            internal_Comm_spawn(101)......: MPI_Comm_spawn(command=NeedlesMpiMM, argv=0x226b030, maxprocs=1, info=0x9c000000, 0, MPI_COMM_SELF, intercomm=0x7ffffda9448c, array_of_errcodes=0x7ffffda94378) failed
            MPIDI_Comm_spawn_multiple(225): PMI_Spawn_multiple returned -1



            The other process failed when MPI_Barrier was called:


            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI manager 1 threw exception: MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed

            -----Original Message-----
            From: Raffenetti, Ken <raffenet at anl.gov>
            Sent: Friday, February 4, 2022 11:42 AM
            To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
            Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

            Yes, you should also use --with-pm=none. If using mpicc to build your application, you should not have to add -lpmi. The script will handle it for you.

            If using another method, you might have to add it. These days with shared libraries, linkers are often able to manage "inter-library" dependencies just fine. Static builds are a different story.

            Ken

            On 2/4/22, 11:35 AM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                Ken,

                >> configure --with-slurm=/opt/slurm --with-pmi=slurm

                That is similar to your first suggestion below.   With the above, do I have to include --with-pm=none?   I guess I also have to link my application with -lpmi, right?

                Thanks,
                Kurt

                -----Original Message-----
                From: Raffenetti, Ken <raffenet at anl.gov>
                Sent: Friday, February 4, 2022 11:02 AM
                To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
                Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

                When running with srun you need to use the Slurm PMI library, not the embedded Simple PMI2 library. Simple PMI2 is API compatible, but uses a different wire protocol than the Slurm implementation. Try this instead:

                  configure --with-slurm=/opt/slurm --with-pmi=slurm

                This will link the Slurm PMI library to MPICH. I do acknowledge how confusing this must be to users :). Probably a good FAQ topic for our Github discussions page.

                Ken

                On 2/3/22, 7:00 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                    Ken,

                    I'm trying to build MPICH 4.0 in several ways, one of which will be what you suggested below.   For this particular attempt suggested by the Slurm MPI guide, I built it with

                    configure --with-slurm=/opt/slurm --with-pmi=pmi2/simple <etc>

                    and invoked it with

                    srun --mpi=pmi2 <etc>

                    The job is crashing with this message.   Any idea what is wrong?

                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key <99>è­þ^? in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key ´2¾ÿ^? in req
                    slurmstepd: error: mpi/pmi2: no value for key ; in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: *** STEP 52227.0 ON n001 CANCELLED AT 2022-02-03T18:48:02 ***

                    -----Original Message-----
                    From: Raffenetti, Ken <raffenet at anl.gov>
                    Sent: Friday, January 28, 2022 3:15 PM
                    To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
                    Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

                    On 1/28/22, 2:22 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                        Ken,

                        I confirmed that MPI_Comm_spawn fails completely if I build MPICH without the PMI2 option.

                    Dang, I thought that would work :(.

                        Looking at the Slurm documentation https://slurm.schedmd.com/mpi_guide.html#intel_mpiexec_hydra
                        it states  "All MPI_comm_spawn work fine now going through hydra's PMI 1.1 interface."   The full quote is below for reference.

                        1) how do I build MPICH to support hydra's PMI 1.1 interface?

                    That is the default, so no extra configuration should be needed. One thing I notice in your log output is that the Slurm envvars seem to have changed names from what we have in our source, e.g. SLURM_JOB_NODELIST vs. SLURM_NODELIST. Do your initial processes launch on the right nodes?
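                    A quick way to check is to print each initial rank's node before any spawning, e.g. with a small standalone program along these lines (just a sketch; the file name placement_check.c is a placeholder):

                    /* placement_check.c -- print the node each initial rank runs on */
                    #include <mpi.h>
                    #include <stdio.h>

                    int main(int argc, char **argv)
                    {
                        char name[MPI_MAX_PROCESSOR_NAME];
                        int rank, len;

                        MPI_Init(&argc, &argv);
                        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                        MPI_Get_processor_name(name, &len);
                        printf("rank %d is on %s\n", rank, name);
                        MPI_Finalize();
                        return 0;
                    }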

                        2) Can you offer any guesses on how to build Slurm to do the same?  (I realize this isn't a Slurm forum  😊)

                    Hopefully you don't need to rebuild Slurm for that. What you could try is building MPICH against the Slurm PMI library. Add "--with-pm=none --with-pmi=slurm --with-slurm=<path/to/install>". Then use srun instead of mpiexec and see how it goes.

                    Ken





