[mpich-discuss] Issue with OrangeFS 2.9.7 direct interface and MPICH 3.3.1 using CH4 device

Si, Min msi at anl.gov
Sun Oct 6 10:25:42 CDT 2019


Hi Kun,

Can you please try to reproduce the issue with a simple MPI program that does not use OrangeFS? It is hard for the MPICH community to help when MPI and OrangeFS are mixed together, because we are not OrangeFS experts.
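
One way to separate the two, sketched with the paths from the original report: build the same hello.c with and without -lorangefsposix and compare the runs. The binary names hello_plain and hello_ofs and the process count of 2 are illustrative assumptions, not taken from the report.

```shell
# Build the same source twice: without and with the OrangeFS POSIX library.
# hello_plain / hello_ofs are hypothetical output names.
mpicc -o ~/hello_plain ~/hello.c
mpicc -o ~/hello_ofs ~/hello.c -L/home/kfeng/install/lib -lorangefsposix

# Confirm which build actually pulls in the OrangeFS library.
ldd ~/hello_ofs | grep -i orangefs

# Run both and print the exit status, since the failure is reported as silent.
mpiexec -n 2 ~/hello_plain; echo "plain exit status: $?"
mpiexec -n 2 ~/hello_ofs;   echo "orangefs exit status: $?"
```

If only the second run stays silent, the problem is tied to linking the OrangeFS library rather than to MPICH alone.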

Besides, for InfiniBand networks, you might want to use `ch4:ucx` instead of `ch4:ofi`. However, I do not think the choice of netmod causes the failure in your use case.
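
A rebuild against UCX might look like the following sketch. It reuses the options from the configure line in the original report, drops the OFI-specific --with-libfabric, and assumes UCX is already installed under a placeholder prefix /path/to/ucx, which is not from the report.

```shell
# Sketch only: same options as the reported build, with the netmod switched to UCX.
# /path/to/ucx is a hypothetical install prefix; replace it with the real one.
./configure --with-device=ch4:ucx \
            --with-ucx=/path/to/ucx \
            --with-pvfs2=/home/kfeng/install \
            --enable-shared --enable-romio \
            --with-file-system=ufs+pvfs2+zoidfs \
            --enable-fortran=no
```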

Best regards,
Min

On 2019/10/04 12:21, Kun Feng via discuss wrote:
To Whom It May Concern,

Recently, I switched to the CH4 device in MPICH 3.3.1 for better network performance over the RoCE network we are using.
I found that my code fails to run when I use the direct interface of OrangeFS 2.9.7. It exits without any error, and even a simple hello-world program prints nothing. This happens only when I enable the direct interface of OrangeFS by linking with -lorangefsposix.
Could you please help me with this issue?
Here is some information that might be useful:
Output of ibv_devinfo of 40Gbps Mellanox ConnectX-4 Lx adapter:
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         14.20.1030
        node_guid:                      248a:0703:0015:a800
        sys_image_guid:                 248a:0703:0015:a800
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       LNV2430110027
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: i40iw0
        transport:                      iWARP (1)
        fw_ver:                         0.2
        node_guid:                      7cd3:0aef:3da0:0000
        sys_image_guid:                 7cd3:0aef:3da0:0000
        vendor_id:                      0x8086
        vendor_part_id:                 14289
        hw_ver:                         0x0
        board_id:                       I40IW Board ID
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet
MPICH 3.3.1 configuration command: ./configure --with-device=ch4:ofi --with-pvfs2=/home/kfeng/install --enable-shared --enable-romio --with-file-system=ufs+pvfs2+zoidfs --enable-fortran=no --with-libfabric=/home/kfeng/install
OrangeFS 2.9.7 configuration command: ./configure --prefix=/home/kfeng/install --enable-shared --enable-jni --with-jdk=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64 --with-kernel=/usr/src/kernels/3.10.0-862.el7.x86_64
Make command: mpicc -o ~/hello ~/hello.c -L/home/kfeng/install/lib -lorangefsposix
The verbose output of mpiexec is attached.

Thanks
Kun


