[mpich-discuss] libfabric+psm2 performance

Antonio J. Peña antonio.pena at bsc.es
Wed Jun 16 02:48:21 CDT 2021


Hi Wesley,

Happy to hear from you. With that setting I can't get past this runtime
error at init:

Abort(69832847) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: 
Other MPI error, error stack:
PMPI_Init(98)............: MPI_Init(argc=0x7ffd4776159c, 
argv=0x7ffd47761590) failed
MPII_Init_thread(196)....:
MPID_Init(472)...........:
MPID_Init_local(585).....:
MPIDI_OFI_init_local(633):
open_fabric(1360)........: OFI fi_getinfo() failed 
(ofi_init.c:1360:open_fabric:Function not implemented)

There must be something going on in MPICH, at least since v3.4.2, or I'm
doing something wrong on the MPICH side. My libfabric itself is working
fine: I get good performance with fi_pingpong, and I've also just tried
MVAPICH (with its psm2 netmod, not over libfabric) and it gave good
performance out of the box.
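
In case it helps, something along these lines is what I'd use to
cross-check on the libfabric side (only a sketch; binary names are
placeholders, -p/-t are fi_info's provider and endpoint-type filters,
and FI_LOG_LEVEL is libfabric's standard logging knob):

   # confirm the psm2 provider answers for RDM endpoints
   ./fi_info -p psm2 -t FI_EP_RDM
   # rerun the failing init with verbose libfabric logging for more
   # detail around fi_getinfo
   FI_LOG_LEVEL=debug mpiexec -n 1 ./init_only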

Although I'd prefer to tweak MPICH, since I'm far more comfortable with 
that code, I'm okay moving ahead with MVAPICH. So unless this is of 
interest on your side (I guess you don't care that much about psm2 these 
days), we can close this thread.

Thanks a lot for your help.

   Toni



On 15/6/21 at 15:03, Wesley Bland via discuss wrote:
> Hey Toni,
>
> I’d be surprised that the performance drops that much, but you can try --with-device=ch4:ofi:psm2 to convert at least some of the branches to compile-time instead of runtime. After that, I don’t remember enough about OFI 1.5. There might have been some changes in MPICH over the last year or two that make that version not perform as well…
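> In configure terms, that would be something along these lines (prefix and libfabric path are placeholders):
>
>    ./configure --prefix=... --with-device=ch4:ofi:psm2 --with-libfabric=...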
>
> Thanks,
> Wes
>
>> On Jun 15, 2021, at 4:46 AM, Antonio Peña via discuss <discuss at mpich.org> wrote:
>>
>>
>> Hi folks,
>>
>> I'm setting up MPICH over libfabric over psm2 on MareNostrum (Omni-Path) to try out some ideas.
>>
>> I've compiled libfabric 1.5 (the last one that compiles on this machine) over opa-psm2-11.2.185, and mpich-3.4.2 + mpich-4.0a1 in both ch3 and ch4 (yes, 4 MPICH variants). Only psm2 support is built into libfabric, so there's no danger of falling back to other providers. ldd confirms my libfabric is the one linked.
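>> (E.g., ldd ./osu_bw | grep libfabric points at the 1.5 build I compiled; the binary name here is just an example.)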
>>
>> ./fi_info
>>      provider: psm2
>>      fabric: psm2
>>      domain: psm2
>>      version: 1.5
>>      type: FI_EP_RDM
>>      protocol: FI_PROTO_PSMX2
>>
>> I'm comparing 2-node pt2pt performance against impi/2017.4 using the OSU microbenchmarks.
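>> (Roughly: mpiexec -n 2 -ppn 1 -hosts nodeA,nodeB ./osu_bw, with the host names as placeholders.)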
>>
>> While both fi_pingpong and impi give me a max. BW of ~10 MB/s, all MPICH versions stick at ~3 MB/s.
>>
>> Is this expected? I mean, is there so much secret sauce in impi? Or am I likely doing something wrong?
>>
>> I'm doing fairly plain configures, nothing fancy, e.g.:
>>    ./configure --prefix=... --with-device=ch4:ofi --with-libfabric=...
>>
>> I'd appreciate some guidance - my MPICH tweaking is a little rusty :)
>>
>> Best,
>>    Toni
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss

-- 
Antonio J. Peña (PhD)
Team Lead, Accelerators and Communications for HPC | Teaching and Research Staff
Sr. Researcher, Computer Sciences Department       | Computer Architecture Department
Barcelona Supercomputing Center (BSC)              | Universitat Politècnica de Catalunya (UPC)
http://www.bsc.es/pena-antonio
===============================
Looking for job opportunities? There are open positions in my team. Please contact me.



