[mpich-discuss] MPICH 5.0.1 performance on HPE SS11 plus more - a slurm problem

Howard Pritchard hppritcha at gmail.com
Wed Jul 30 13:42:41 CDT 2025


Hi Hui,

That didn’t help.  I am not surprised though, as our cluster is an NVIDIA-free
zone.  What did help was switching to the mpich 4.3.x branch: latency results
are nominal with that build and the slurm problem went away too, so we will
stick with that branch.
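
For the record, roughly what was run here (the paths, node counts, and exact
branch name below are from memory, so treat them as illustrative rather than
copied from our scripts):

  # rerun the pingpong with GPU support disabled at runtime - no change in latency
  MPIR_CVAR_ENABLE_GPU=0 srun -N 2 --ntasks-per-node=1 ./osu_latency

  # rebuild from the 4.3.x maintenance branch, reusing the configure options
  # shown in the mpichversion output further down the thread
  git clone -b v4.3.x https://github.com/pmodels/mpich.git
  cd mpich && ./autogen.sh
  ./configure --prefix=$HOME/mpich-4.3.x/install --with-device=ch4:ofi \
              --with-libfabric=/opt/cray/libfabric/1.22.0 [remaining flags as below]
  make -j install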

Howard

On Mon, Jul 28, 2025 at 4:15 PM Zhou, Hui <zhouh at anl.gov> wrote:

> Hi Howard,
>
>  I wonder whether it is due to the overhead of querying pointer
> attributes. Could you try disabling GPU support with
> `MPIR_CVAR_ENABLE_GPU=0` and see if the latency improves?
>
> Hui
> ------------------------------
> *From:* Howard Pritchard via discuss <discuss at mpich.org>
> *Sent:* Monday, July 28, 2025 9:41 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Howard Pritchard <hppritcha at gmail.com>
> *Subject:* [mpich-discuss] MPICH 5.0.1 performance on HPE SS11 plus more
> - a slurm problem
>
> Hi Folks,
>
> We are seeing a strange performance issue on our HPE SS11 system when
> testing osu_latency inter-node with MPICH.
>
> First, the system info:
> libfabric - 1.22.0
> slurm - 24.11.5
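>
> For reference, the test is launched with something along these lines (the
> exact srun options here may differ slightly from our batch script):
>
>   srun --mpi=pmix -N 2 --ntasks-per-node=1 ./osu_latency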
>
> Here's my mpichversion output:
>
> MPICH Version:      5.0.0a1
> MPICH Release date: unreleased development copy
> MPICH ABI:          0:0:0
> MPICH Device:       ch4:ofi
> MPICH configure:    --prefix=/XXXX/mpich_again/install --enable-g=no
> --enable-error-checking=no --with-device=ch4:ofi --enable-threads=multiple
> --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci
> --with-libfabric=/opt/cray/libfabric/1.22.0
> --with-xpmem=/opt/cray/xpmem/default --with-pmix=/opt/pmix/gcc4x/5.0.8
> --enable-fast=O3
> MPICH CC:           gcc     -O3
> MPICH CXX:          g++   -O3
> MPICH F77:          gfortran   -O3
> MPICH FC:           gfortran   -O3
> MPICH features:     threadcomm
>
> And here's the OSU latency results:
>
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_belong_chk: nid001439 [1]: pmixp_coll.c:280: No process controlled by this slurmstepd is involved in this collective.
> slurmstepd: error:  mpi/pmix_v4: _process_server_request: nid001439 [1]: pmixp_server.c:923: Unable to pmixp_state_coll_get()
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_check: nid001438 [0]: pmixp_coll_ring.c:614: 0x15005c005dc0: unexpected contrib from nid001439:1, expected is 0
> slurmstepd: error:  mpi/pmix_v4: _process_server_request: nid001438 [0]: pmixp_server.c:937: 0x15005c005dc0: unexpected contrib from nid001439:1, coll->seq=0, seq=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_reset_if_to: nid001438 [0]: pmixp_coll_ring.c:738: 0x1500580532f0: collective timeout seq=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_log: nid001438 [0]: pmixp_coll.c:286: Dumping collective state
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:756: 0x1500580532f0: COLL_FENCE_RING state seq=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:758: my peerid: 0:nid001438
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:765: neighbor id: next 1:nid001439, prev 1:nid001439
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:775: Context ptr=0x150058053368, #0, in-use=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:775: Context ptr=0x1500580533a0, #1, in-use=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:775: Context ptr=0x1500580533d8, #2, in-use=1
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:786:  seq=0 contribs: loc=1/prev=0/fwd=1
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:788:  neighbor contribs [2]:
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:821:  done contrib: -
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:823:  wait contrib: nid001439
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:825:  status=PMIXP_COLL_RING_PROGRESS
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001438 [0]: pmixp_coll_ring.c:829:  buf (offset/size): 36/16384
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_reset_if_to: nid001439 [1]: pmixp_coll_ring.c:738: 0x151d0c053400: collective timeout seq=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_log: nid001439 [1]: pmixp_coll.c:286: Dumping collective state
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:756: 0x151d0c053400: COLL_FENCE_RING state seq=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:758: my peerid: 1:nid001439
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:765: neighbor id: next 0:nid001438, prev 0:nid001438
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:775: Context ptr=0x151d0c053478, #0, in-use=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:775: Context ptr=0x151d0c0534b0, #1, in-use=0
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:775: Context ptr=0x151d0c0534e8, #2, in-use=1
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:786:  seq=0 contribs: loc=1/prev=0/fwd=1
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:788:  neighbor contribs [2]:
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:821:  done contrib: -
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:823:  wait contrib: nid001438
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:825:  status=PMIXP_COLL_RING_PROGRESS
> slurmstepd: error:  mpi/pmix_v4: pmixp_coll_ring_log: nid001439 [1]: pmixp_coll_ring.c:829:  buf (offset/size): 36/16384
>
> # OSU MPI Latency Test v5.8
> # Size          Latency (us)
> 0                       1.66
> 1                       9.29
> 2                       9.57
> 4                       9.69
> 8                       9.76
> 16                      9.77
> 32                      9.76
> 64                      9.77
> 128                    10.32
> 256                     7.54
> 512                     7.45
> 1024                    7.38
> 2048                    7.37
> 4096                    7.45
> 8192                    9.21
> 16384                   9.70
> 32768                  10.63
> 65536                  13.15
> 131072                 16.96
> 262144                 23.84
> 524288                 36.16
> 1048576                60.36
> 2097152               108.43
> 4194304               228.31
>
> Note the slurm behavior: I launch the job, go get coffee, do some Duolingo,
> read some emails, and then after about 10 minutes the osu_latency test
> finally runs.
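>
> One thing still on my to-try list is to launch the same binary through
> hydra instead of srun, to take the slurm pmix plugin out of the picture,
> something like:
>
>   mpiexec -n 2 -ppn 1 -hosts nid001438,nid001439 ./osu_latency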
>
>
> I did not get the slurm problems using an older mpich 4.3.1, but I did get
> the same performance issue.  9 usec doesn't seem right for an 8-byte
> pingpong over libfabric on SS11; I was expecting more like 1.6 usec.
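>
> To rule out falling back to a slow provider, I still plan to double-check
> what libfabric reports on these nodes and to try pinning the cxi provider
> explicitly, along these lines (assuming FI_PROVIDER is honored by this
> build):
>
>   fi_info -p cxi
>   FI_PROVIDER=cxi FI_LOG_LEVEL=warn \
>       srun --mpi=pmix -N 2 --ntasks-per-node=1 ./osu_latency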
>
>
> I am confident the slurm issue is unrelated to the latency issue.
>
> Thanks for any suggestions on how to address either issue, though.
>
>
>
>