[mpich-discuss] Issue with OrangeFS 2.9.7 direct interface and MPICH 3.3.1 using CH4 device

Kun Feng kfeng1 at hawk.iit.edu
Tue Oct 8 18:53:42 CDT 2019


Hi Rob,

The liborangefsposix library works when I use a standard file path like
/mnt/pfs/data instead of pvfs2://mnt/pfs/data.
I want to use it that way because it gives better metadata performance,
which I need for my project.
I'm not using MPI I/O to access the files on OrangeFS; it is just
standard C interface calls (fopen and fclose).
The combination of CH4 and liborangefsposix is the only one that produces this error.
Here is the result matrix of what I have tried:

  MPICH device   OrangeFS linking       Result
  CH3            non-direct interface   success
  CH3            direct interface       success
  CH4            non-direct interface   success
  CH4            direct interface       no output

Thanks
Kun


On Tue, Oct 8, 2019 at 10:10 AM Latham, Robert J. <robl at mcs.anl.gov> wrote:

> On Sun, 2019-10-06 at 12:20 -0500, Kun Feng via discuss wrote:
> > Hi Min,
> >
> > If that is the case, please ignore this email. Nothing is wrong
> > without OrangeFS direct interface. I will try "ch4:ucx". Thank you
> > for the info.
>
> Does the 'pvfs2' driver still work?  The liborangefsposix library might
> be intercepting system calls MPICH expects to use natively.
>
> The liborangefsposix library is intended more for non-MPI applications
> -- Hadoop workflows, for example.  MPICH's pvfs2 driver (pvfs2 being
> the old name for OrangeFS) speaks directly to the OrangeFS servers.  It
> also uses a few optimizations that are not available when MPICH treats
> OrangeFS like a traditional UNIX-like file system.
>
> ==rob
>
> >
> > On Sun, Oct 6, 2019 at 10:25 AM Si, Min via discuss <
> > discuss at mpich.org> wrote:
> > > Hi Kun,
> > >
> > > Can you please try to reproduce the issue in a simple MPI program
> > > which does not use OrangeFS? It is hard for the MPICH community to
> > > help when mixing MPI and OrangeFS together, because we are not
> > > OrangeFS experts.
> > >
> > > Besides, for InfiniBand networks, you might want to use `ch4:ucx`
> > > instead of `ch4:ofi`. But I do not think it causes the failure in
> > > your use case.
> > >
> > > Best regards,
> > > Min
> > >
> > > On 2019/10/04 12:21, Kun Feng via discuss wrote:
> > > > To Whom It May Concern,
> > > >
> > > > Recently, I switched to CH4 device in MPICH 3.3.1 for better
> > > > network performance over the RoCE network we are using.
> > > > I realized that my code fails to run when I use the direct
> > > > interface of OrangeFS 2.9.7. It exits without any error, but
> > > > even a simple helloworld cannot print anything. It happens only
> > > > when I enable the direct interface of OrangeFS by linking with
> > > > -lorangefsposix.
> > > > Could you please help me with this issue?
> > > > Here is some information that might be useful:
> > > > Output of ibv_devinfo of 40Gbps Mellanox ConnectX-4 Lx adapter:
> > > > hca_id: mlx5_0
> > > >         transport:                      InfiniBand (0)
> > > >         fw_ver:                         14.20.1030
> > > >         node_guid:                      248a:0703:0015:a800
> > > >         sys_image_guid:                 248a:0703:0015:a800
> > > >         vendor_id:                      0x02c9
> > > >         vendor_part_id:                 4117
> > > >         hw_ver:                         0x0
> > > >         board_id:                       LNV2430110027
> > > >         phys_port_cnt:                  1
> > > >                 port:   1
> > > >                         state:                  PORT_ACTIVE (4)
> > > >                         max_mtu:                4096 (5)
> > > >                         active_mtu:             1024 (3)
> > > >                         sm_lid:                 0
> > > >                         port_lid:               0
> > > >                         port_lmc:               0x00
> > > >                         link_layer:             Ethernet
> > > >
> > > > hca_id: i40iw0
> > > >         transport:                      iWARP (1)
> > > >         fw_ver:                         0.2
> > > >         node_guid:                      7cd3:0aef:3da0:0000
> > > >         sys_image_guid:                 7cd3:0aef:3da0:0000
> > > >         vendor_id:                      0x8086
> > > >         vendor_part_id:                 14289
> > > >         hw_ver:                         0x0
> > > >         board_id:                       I40IW Board ID
> > > >         phys_port_cnt:                  1
> > > >                 port:   1
> > > >                         state:                  PORT_ACTIVE (4)
> > > >                         max_mtu:                4096 (5)
> > > >                         active_mtu:             1024 (3)
> > > >                         sm_lid:                 0
> > > >                         port_lid:               1
> > > >                         port_lmc:               0x00
> > > >                         link_layer:             Ethernet
> > > > MPICH 3.3.1 configuration command:
> > > > ./configure --with-device=ch4:ofi --with-pvfs2=/home/kfeng/install
> > > > --enable-shared --enable-romio --with-file-system=ufs+pvfs2+zoidfs
> > > > --enable-fortran=no --with-libfabric=/home/kfeng/install
> > > > OrangeFS 2.9.7 configuration command:
> > > > ./configure --prefix=/home/kfeng/install --enable-shared --enable-jni
> > > > --with-jdk=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64
> > > > --with-kernel=/usr/src/kernels/3.10.0-862.el7.x86_64
> > > > Compile command: mpicc -o ~/hello ~/hello.c
> > > > -L/home/kfeng/install/lib -lorangefsposix
> > > > The verbose outputs of mpiexec are attached.
> > > >
> > > > Thanks
> > > > Kun
> > > >
> > > >
> > > > _______________________________________________
> > > > discuss mailing list     discuss at mpich.org
> > > > To manage subscription options or unsubscribe:
> > > > https://lists.mpich.org/mailman/listinfo/discuss
> > >
> >
>
>

