[mpich-discuss] MPI_Comm_spawn crosses node boundaries

Raffenetti, Ken raffenet at anl.gov
Mon Feb 7 16:28:47 CST 2022


Darn. I'm creating an issue to track this since it will likely take some time and effort to investigate each configuration.

https://github.com/pmodels/mpich/issues/5835

Ken

On 2/7/22, 12:41 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

    Ken,

    To review, I configured as follows:

    $ configure   CFLAGS=-DUSE_PMI2_API   LIBS=-lpmi2   --with-pm=none   --with-pmi=slurm   --with-slurm=/opt/slurm     < ...>

    and ran srun with the argument --mpi=pmi2.

    The job is still segfaulting in MPI_Comm_spawn in one process and returning an error from MPI_Barrier in the other.   Error messages below:


    The MPI_Comm_spawn error:


    backtrace for error: backtrace after receiving signal SIGSEGV:
        /home/kmccall/Needles2/./NeedlesMpiMM() [0x45ab36]
        /lib64/libpthread.so.0(+0x12c20) [0x7f4833387c20]
        /lib64/libc.so.6(+0x15d6b7) [0x7f483310d6b7]
        /lib64/libc.so.6(__strdup+0x12) [0x7f4833039802]
        /lib64/libpmi2.so.0(+0x171c) [0x7f483294371c]
        /lib64/libpmi2.so.0(+0x185e) [0x7f483294385e]
        /lib64/libpmi2.so.0(PMI2_Job_Spawn+0x1a7) [0x7f48329453d8]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(+0x23a7db) [0x7f4834e0a7db]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(+0x1fc805) [0x7f4834dcc805]
        /home/kmccall/mpich-slurm-install-4.0_2/lib/libmpi.so.12(MPI_Comm_spawn+0x507) [0x7f4834cea9f7]


    The MPI_Barrier error:

    MPI_Barrier returned the error MPI runtime error: Unknown error class, error stack:
    internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
    MPIR_Barrier_impl(91)......................:
    MPIR_Barrier_allcomm_auto(45)..............:
    MPIR_Barrier_intra_dissemination(39).......:
    MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
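
    For reference, the two failing calls follow this minimal pattern: rank 0 spawns one child over MPI_COMM_SELF (matching the parameters reported when the spawn failed earlier) while the other ranks wait in MPI_Barrier on MPI_COMM_WORLD. This is only an illustrative sketch; the actual NeedlesMpiMM source isn't included in this thread.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, errcode = MPI_SUCCESS;
        MPI_Comm parent, intercomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_get_parent(&parent);

        /* Only the originally launched job spawns; a child (detected via its
           parent communicator) skips this so the same binary does not recurse. */
        if (parent == MPI_COMM_NULL && rank == 0) {
            MPI_Comm_spawn("./NeedlesMpiMM", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &intercomm, &errcode);
            printf("spawn error code: %d\n", errcode);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }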




    Thanks,
    Kurt

    -----Original Message-----
    From: Raffenetti, Ken <raffenet at anl.gov> 
    Sent: Friday, February 4, 2022 4:08 PM
    To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
    Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

    :(. We need to link -lpmi2 instead of -lpmi. This really needs a patch in our configure script, but adding this to your configure line is worth a shot:

      LIBS=-lpmi2

    Ken

    On 2/4/22, 1:56 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

        I added the CFLAGS argument and the configuration completed, but make ended with a link error.

        lib/.libs/libmpi.so: undefined reference to `PMI2_Abort'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_GetJobAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Job_Spawn'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_publish'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Finalize'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Put'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_GetNodeAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Get'
        lib/.libs/libmpi.so: undefined reference to `PMI2_KVS_Fence'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_unpublish'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Info_PutNodeAttr'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Job_GetId'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Init'
        lib/.libs/libmpi.so: undefined reference to `PMI2_Nameserv_lookup'
        collect2: error: ld returned 1 exit status

        -----Original Message-----
        From: Raffenetti, Ken <raffenet at anl.gov> 
        Sent: Friday, February 4, 2022 1:23 PM
        To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
        Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

        I think I see a new issue. The Slurm website documentation says that their PMI library doesn't support PMI_Spawn_multiple from the PMI 1 API. We can try to force PMI 2 and see what happens. Try adding this to your configure line.

          CFLAGS=-DUSE_PMI2_API

        Ken

        On 2/4/22, 11:58 AM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

            Did that, and launched the job with "srun --mpi=none" and one of the processes failed when MPI_Comm_spawn was called.   Note the errors below:

            internal_Comm_spawn(101)......: MPI_Comm_spawn(command=NeedlesMpiMM, argv=0x226b030, maxprocs=1, info=0x9c000000, 0, MPI_COMM_SELF, intercomm=0x7ffffda9448c, array_of_errcodes=0x7ffffda94378) failed
            MPIDI_Comm_spawn_multiple(225): PMI_Spawn_multiple returned -1



            The other process failed when MPI_Barrier was called:


            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
            MPI manager 1 threw exception: MPI runtime error: Unknown error class, error stack:
            internal_Barrier(84).......................: MPI_Barrier(MPI_COMM_WORLD) failed
            MPIR_Barrier_impl(91)......................:
            MPIR_Barrier_allcomm_auto(45)..............:
            MPIR_Barrier_intra_dissemination(39).......:
            MPIDI_CH3U_Complete_posted_with_error(1090): Process failed
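
            (The "threw exception" line comes from the application's own error handling; roughly, the barrier call is wrapped so errors are returned and reported rather than aborting. A simplified sketch of that pattern, not the exact application code:)

            #include <mpi.h>
            #include <stdio.h>

            static void checked_barrier(MPI_Comm comm)
            {
                char msg[MPI_MAX_ERROR_STRING];
                int len;

                /* Return errors to the caller instead of aborting
                   (the default handler is MPI_ERRORS_ARE_FATAL). */
                MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

                int rc = MPI_Barrier(comm);
                if (rc != MPI_SUCCESS) {
                    MPI_Error_string(rc, msg, &len);
                    fprintf(stderr, "MPI_Barrier failed: %s\n", msg);
                }
            }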

            -----Original Message-----
            From: Raffenetti, Ken <raffenet at anl.gov> 
            Sent: Friday, February 4, 2022 11:42 AM
            To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
            Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

            Yes, you should also use --with-pm=none. If using mpicc to build your application, you should not have to add -lpmi. The script will handle it for you.

            If using another method, you might have to add it. These days with shared libraries, linkers are often able to manage "inter-library" dependencies just fine. Static builds are a different story.

            Ken

            On 2/4/22, 11:35 AM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                Ken,

                >> configure --with-slurm=/opt/slurm --with-pmi=slurm

                That is similar to your first suggestion below.   With the above, do I have to include --with-pm=none?   I guess I also have to link my application with -lpmi, right?

                Thanks,
                Kurt

                -----Original Message-----
                From: Raffenetti, Ken <raffenet at anl.gov> 
                Sent: Friday, February 4, 2022 11:02 AM
                To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
                Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

                When running with srun you need to use the Slurm PMI library, not the embedded Simple PMI2 library. Simple PMI2 is API compatible, but uses a different wire protocol than the Slurm implementation. Try this instead:

                  configure --with-slurm=/opt/slurm --with-pmi=slurm

                This will link the Slurm PMI library into MPICH. I do acknowledge how confusing this must be to users :). Probably a good FAQ topic for our GitHub Discussions page.

                Ken

                On 2/3/22, 7:00 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                    Ken,

                    I'm trying to build MPICH 4.0 in several ways, one of which will be what you suggested below.   For this particular attempt, following the Slurm MPI guide, I built it with

                    configure --with-slurm=/opt/slurm --with-pmi=pmi2/simple <etc>

                    and invoked it with

                    srun --mpi=pmi2 <etc>

                    The job is crashing with this message.   Any idea what is wrong?

                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key <99>è­þ^? in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: mpi/pmi2: no value for key ´2¾ÿ^? in req
                    slurmstepd: error: mpi/pmi2: no value for key ; in req
                    slurmstepd: error: mpi/pmi2: no value for key  in req
                    slurmstepd: error: *** STEP 52227.0 ON n001 CANCELLED AT 2022-02-03T18:48:02 ***

                    -----Original Message-----
                    From: Raffenetti, Ken <raffenet at anl.gov> 
                    Sent: Friday, January 28, 2022 3:15 PM
                    To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
                    Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries

                    On 1/28/22, 2:22 PM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:

                        Ken,

                        I confirmed that MPI_Comm_spawn fails completely if I build MPICH without the PMI2 option.

                    Dang, I thought that would work :(.

                        Looking at the Slurm documentation https://slurm.schedmd.com/mpi_guide.html#intel_mpiexec_hydra
                        it states  "All MPI_comm_spawn work fine now going through hydra's PMI 1.1 interface."   The full quote is below for reference.

                        1) how do I build MPICH to support hydra's PMI 1.1 interface?

                    That is the default, so no extra configuration should be needed. One thing I notice in your log output is that the Slurm envvars seem to have changed names from what we have in our source. E.g. SLURM_JOB_NODELIST vs. SLURM_NODELIST. Do your initial processes launch on the right nodes?
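
                    A quick way to check is to print the node each initial rank lands on before any spawning happens. This is just an illustrative sketch, not code from the application:

                    #include <mpi.h>
                    #include <stdio.h>

                    int main(int argc, char **argv)
                    {
                        int rank, len;
                        char node[MPI_MAX_PROCESSOR_NAME];

                        MPI_Init(&argc, &argv);
                        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                        MPI_Get_processor_name(node, &len);
                        printf("rank %d launched on %s\n", rank, node);
                        MPI_Finalize();
                        return 0;
                    }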

                        2) Can you offer any guesses on how to build Slurm to do the same?  (I realize this isn't a Slurm forum  😊)

                    Hopefully you don't need to rebuild Slurm to do it. What you could try is configuring the Slurm PMI library when building MPICH. Add "--with-pm=none --with-pmi=slurm --with-slurm=<path/to/install>". Then use srun instead of mpiexec and see how it goes.

                    Ken







