[mpich-discuss] libfabric+psm2 performance

Bland, Wesley wesley.bland at intel.com
Wed Jun 16 08:46:37 CDT 2021


I see. My guess (and I’ll let Hui confirm) is that you’re not actually using the PSM2 capabilities when you compile with --with-device=ch4:ofi. Instead, you’re getting a backup capability set that is supposed to work with any provider. When you switch to --with-device=ch4:ofi:psm2, you’re forcing MPICH to use the PSM2 capabilities (as we currently think of them for OFI 1.11 or whatever we expect right now) and your version of PSM2 doesn’t support that, so it crashes during initialization because it can’t find a provider that meets its requirements. If you set the environment variable MPIR_CVAR_CH4_OFI_CAPABILITY_SETS_DEBUG, it will print out the capability set (and provider) that MPICH is using so you can confirm.
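For example, a minimal sketch (osu_bw here is just a stand-in for whatever binary you run; the CVAR makes MPICH print its provider and capability-set selection at init):

  MPIR_CVAR_CH4_OFI_CAPABILITY_SETS_DEBUG=1 mpiexec -n 2 ./osu_bw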

Assuming that’s the case, there’s probably some manual set of CVARs that will get you the right set of capabilities, but I’m not sure what it would be off the top of my head. OFI 1.5 is pretty old at this point so it’s disappeared from my brain. :)
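One untested guess: you could keep the generic ch4:ofi build and pin the provider at runtime with libfabric’s standard FI_PROVIDER variable, e.g.:

  FI_PROVIDER=psm2 mpiexec -n 2 ./osu_bw

No promises that it changes which capability set MPICH selects, though.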

I’m not that surprised that MVAPICH might be winning with an older version of OFI. My understanding is that it’s still on CH3 (I might be wrong here) and isn’t using the CH4 capability set code. I think the capability sets probably improve things in MPICH when matching our expected versions, but could cause these sorts of legacy issues. Just a guess though.

Good luck!
Wes

On Jun 16, 2021, at 2:48 AM, Antonio J. Peña <antonio.pena at bsc.es> wrote:


Hi Wesley,

Happy to hear from you. With that setting I can't get past this runtime error at init:

Abort(69832847) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
PMPI_Init(98)............: MPI_Init(argc=0x7ffd4776159c, argv=0x7ffd47761590) failed
MPII_Init_thread(196)....:
MPID_Init(472)...........:
MPID_Init_local(585).....:
MPIDI_OFI_init_local(633):
open_fabric(1360)........: OFI fi_getinfo() failed (ofi_init.c:1360:open_fabric:Function not implemented)

There must be something in MPICH, at least since v3.4.2, or I'm doing something wrong on the MPICH side. My libfabric is working fine: I get good performance with fi_pingpong. I've also just tried MVAPICH (with its native psm2 netmod, not going through libfabric) and it gave good performance out of the box.
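In case it helps to reproduce, the fi_pingpong runs were roughly along these lines (node name is a placeholder, and this assumes the 1.5-era utility takes -p for the provider):

  ./fi_pingpong -p psm2          # on the server node
  ./fi_pingpong -p psm2 node1    # on the client node, pointing at the server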

Although I'd prefer to tweak MPICH, because I'm far more comfortable with that code, I'm okay with moving ahead with MVAPICH. So unless this is interesting from your side (I guess you don't care that much about psm2 these days), we can close this thread.

Thanks a lot for your help.

  Toni



On 15/6/21 at 15:03, Wesley Bland via discuss wrote:
Hey Toni,

I’d be surprised that the performance drops that much, but you can try --with-device=ch4:ofi:psm2 to convert at least some of the branches to be compile-time instead of runtime. After that, I don’t remember enough about OFI 1.5. There might have been some changes in MPICH over the last year or two that make that version not perform as well…
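That is, your configure line with the provider pinned, something like (keeping your placeholder paths):

  ./configure --prefix=... --with-device=ch4:ofi:psm2 --with-libfabric=...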

Thanks,
Wes

On Jun 15, 2021, at 4:46 AM, Antonio Peña via discuss <discuss at mpich.org> wrote:


Hi folks,

I'm setting up MPICH over libfabric over psm2 on MareNostrum (Omni-Path) to try out some ideas.

I've compiled libfabric 1.5 (the last version that compiles on this machine) over opa-psm2-11.2.185, plus mpich-3.4.2 and mpich-4.0a1 with both ch3 and ch4 (yes, 4 MPICH variants). My libfabric build only includes psm2 support, so there's no danger of falling back to other providers. ldd confirms my libfabric is the one linked.

./fi_info
    provider: psm2
    fabric: psm2
    domain: psm2
    version: 1.5
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX2

I'm comparing 2-node pt2pt performance against impi/2017.4 using the OSU microbenchmarks.

While both fi_pingpong and impi give me a max BW of ~10 MB/s, all MPICH versions stick at ~3 MB/s.
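The runs are essentially like this (hostnames are placeholders; osu_bw is from the OSU suite):

  mpiexec -n 2 -hosts node1,node2 ./osu_bw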

Is this expected? I mean, is there that much secret sauce in impi? Or am I likely doing something wrong?

I'm doing fairly plain configures, nothing fancy, e.g.:
  ./configure --prefix=... --with-device=ch4:ofi --with-libfabric=...

I'd appreciate some guidance - my MPICH tweaking is a little rusty :)

Best,
  Toni
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

--
Antonio J. Peña (PhD)
Team Lead, Accelerators and Communications for HPC | Teaching and Research Staff
Sr. Researcher, Computer Sciences Department       | Computer Architecture Department
Barcelona Supercomputing Center (BSC)              | Universitat Politècnica de Catalunya (UPC)
http://www.bsc.es/pena-antonio
===============================
Looking for job opportunities? Open positions in my team. Please contact me.


