Starting with OneFS 9.2.0.0, NFSv3 over RDMA is introduced for better performance. Refer to Chapter 6 of the OneFS NFS white paper for the technical details. This article provides guidance on using the NFSv3 over RDMA feature with your OneFS clusters. Note that OneFS NFSv3 over RDMA requires RoCEv2-capable clients, so client-side configuration is also needed.
OneFS Cluster configuration
To use NFSv3 over RDMA, your OneFS cluster hardware must meet the following requirements:
- Node type: all Gen6 nodes (F800/F810/H600/H500/H400/A200/A2000), F200, F600, F900
- Front-end network: Mellanox ConnectX-3 Pro, ConnectX-4, and ConnectX-5 network adapters that deliver 25/40/100 GigE speed.
1. Check that your cluster network interfaces have RoCEv2 capability by running the following command and noting the interfaces that report 'SUPPORTS_RDMA_RRoCE'. This check is only available from the CLI.
# isi network interfaces list -v
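The verbose output is long; if you only want to spot the RoCEv2-capable interfaces, you can filter it. The grep context length below is only an illustrative value, adjust it so that the interface names remain visible next to the flag.
# isi network interfaces list -v | grep -B 5 'SUPPORTS_RDMA_RRoCE'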
2. Create an IP pool that contains RoCEv2-capable network interfaces.
(CLI)
# isi network pools create --id=groupnet0.40g.40gpool1 --ifaces=1:40gige-1,1:40gige-2,2:40gige-1,2:40gige-2,3:40gige-1,3:40gige-2,4:40gige-1,4:40gige-2 --ranges=172.16.200.129-172.16.200.136 --access-zone=System --nfsv3-rroce-only=true
(WebUI) Cluster management –> Network configuration
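To double-check that the new pool only allows RoCEv2-capable members, you can view it afterwards; the pool ID below matches the one created above, and output field names may vary slightly by OneFS release.
# isi network pools view groupnet0.40g.40gpool1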
3. Enable the NFSv3 over RDMA feature by running the following command.
(CLI)
# isi nfs settings global modify --nfsv3-enabled=true --nfsv3-rdma-enabled=true
(WebUI) Protocols –> UNIX sharing (NFS) –> Global settings
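You can verify the change by viewing the global NFS settings and confirming that both NFSv3 and NFSv3 RDMA show as enabled.
# isi nfs settings global view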
4. Enable the OneFS cluster NFS service by running the following command.
(CLI)
# isi services nfs enable
(WebUI) See step 3
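To confirm that the NFS service is now running, list the cluster services; the grep filter is only for readability.
# isi services -l | grep nfs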
5. Create an NFS export by running the following command. The --map-root-enabled=false option disables NFS export root squash for testing purposes, which allows the root user to access OneFS cluster data via NFS.
(CLI)
# isi nfs exports create --paths=/ifs/export_rdma --map-root-enabled=false
(WebUI) Protocols –> UNIX sharing (NFS) –> NFS exports
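As a final check on the cluster side, list the NFS exports and confirm that the new export path appears.
# isi nfs exports list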
NFSv3 over RDMA client configuration
Note: As the client OS and Mellanox NICs may vary in your environment, refer to your client OS documentation and the Mellanox documentation for accurate and detailed configuration steps. This section only demonstrates an example configuration using our in-house lab equipment.
To use the NFSv3 over RDMA service of a OneFS cluster, your NFSv3 client hardware must meet the following requirements:
- RoCEv2-capable NICs: Mellanox ConnectX-3 Pro, ConnectX-4, ConnectX-5, and ConnectX-6
- NFS over RDMA drivers: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) or the OS distribution inbox driver. It is recommended to install the Mellanox OFED driver to gain the best performance.
If you just want to run a functional test of the NFSv3 over RDMA feature, you can set up Soft-RoCE on your client.
Set up an RDMA-capable client on a physical machine
In the following steps, we are using a Dell PowerEdge R630 physical server with CentOS 7.9 and a Mellanox ConnectX-3 Pro adapter installed.
1. Check the OS version by running the following command:
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
2. Check the network adapter model and information. From the output, we can see that the ConnectX-3 Pro is installed and its network interfaces are named 40gig1 and 40gig2.
# lspci | egrep -i --color 'network|ethernet'
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
# lshw -class network -short
H/W path            Device     Class      Description
==========================================================
/0/102/2/0 40gig1 network MT27520 Family [ConnectX-3 Pro]
/0/102/3/0 network 82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/3/0.1 network 82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/1c.4/0 1gig1 network I350 Gigabit Network Connection
/0/102/1c.4/0.1 1gig2 network I350 Gigabit Network Connection
/3 40gig2 network Ethernet interface
3. Find the suitable Mellanox OFED driver version on the Mellanox website. As of MLNX_OFED v5.1, the ConnectX-3 Pro is no longer supported and can only be used with the MLNX_OFED LTS version. See Figure 3. If you are using ConnectX-4 or above, you can use the latest Mellanox OFED version.
An important note: the NFSoRDMA module was removed from Mellanox OFED 4.0-2.0.0.1 and added back in Mellanox OFED 4.7-3.2.9.0. Refer to the Release Notes Change Log History for details.
4. Download the MLNX_OFED 4.9-2.2.4.0 driver for ConnectX-3 Pro to your client.
5. Extract the driver package and run the "mlnxofedinstall" script to install the driver. As of MLNX_OFED v4.7, the NFSoRDMA driver is no longer installed by default. To install it over a supported kernel, add the "--with-nfsrdma" installation option to the "mlnxofedinstall" script. The firmware update is skipped in this example; update it as needed.
# ./mlnxofedinstall --with-nfsrdma --without-fw-update
Logs dir: /tmp/MLNX_OFED_LINUX.19761.logs
General log file: /tmp/MLNX_OFED_LINUX.19761.logs/general.log
Verifying KMP rpms compatibility with target kernel...
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.
Do you want to continue?[y/N]:y
Uninstalling the previous version of MLNX_OFED_LINUX
rpm --nosignature -e --allmatches --nodeps mft
Starting MLNX_OFED_LINUX-4.9-2.2.4.0 installation ...
Installing mlnx-ofa_kernel RPM
Preparing... ########################################
Updating / installing...
mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.r########################################
Installing kmod-mlnx-ofa_kernel 4.9 RPM
...
Preparing... ########################################
mpitests_openmpi-3.2.20-e1a0676.49224 ########################################
Device (03:00.0):
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
Link Width: x8
PCI Link Speed: 8GT/s
Installation finished successfully.
Preparing... ################################# [100%]
Updating / installing...
1:mlnx-fw-updater-4.9-2.2.4.0 ################################# [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Skipping FW update.
To load the new driver, run:
# /etc/init.d/openibd restart
6. Load the new driver by running the following command. If the command reports that modules are in use, unload them as prompted.
# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
7. Check the driver version to ensure that the installation was successful.
# ethtool -i 40gig1
driver: mlx4_en
version: 4.9-2.2.4
firmware-version: 2.36.5080
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
8. Check that the NFSoRDMA module is also installed. If you are using a driver downloaded from a server vendor website (such as Dell PowerEdge) rather than from the Mellanox website, the NFSoRDMA module may not be included in the driver package. In that case, obtain the NFSoRDMA module from the Mellanox driver package and install it.
# yum list installed | grep nfsrdma
kmod-mlnx-nfsrdma.x86_64 5.0-OFED.5.0.2.1.8.1.g5f67178.rhel7u8
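If you use the OS distribution inbox driver instead of MLNX_OFED, the NFS RDMA transport is typically provided by the kernel rpcrdma module rather than the kmod-mlnx-nfsrdma package shown above. In that case, a rough equivalent check is:
# modprobe rpcrdma
# lsmod | grep rpcrdma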
9. Mount the NFS export with the RDMA protocol.
# mount -t nfs -vo nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma
mount.nfs: timeout set for Tue Feb 16 21:47:16 2021
mount.nfs: trying text-based options 'nfsvers=3,proto=rdma,port=20049,addr=172.16.200.29'
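Once the mount returns, you can confirm that the RDMA transport is actually in use; both commands below are standard Linux tools and should report proto=rdma and port=20049 for this mount.
# nfsstat -m
# grep export_rdma /proc/mounts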
For more information, see the Mellanox OFED documentation.
Set up a Soft-RoCE client for functional testing only
Soft-RoCE (also known as RXE) is a software implementation of RoCE that allows RoCE to run on any Ethernet network adapter, whether or not it offers hardware acceleration. Soft-RoCE is released as part of the upstream kernel 4.8 (or above). It is intended for users who wish to test RDMA in software over any third-party adapter.
In the following example configuration, we are using a CentOS 7.9 virtual machine to configure Soft-RoCE. As of Red Hat Enterprise Linux 7.4, the Soft-RoCE driver is merged into the kernel.
1. Install required software packages.
# yum install -y nfs-utils rdma-core libibverbs-utils
2. Start Soft-RoCE.
# rxe_cfg start
3. Get the status, which displays the Ethernet interfaces.
# rxe_cfg status
rdma_rxe module not loaded
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
ens33 yes e1000 1500 192.168.198.129
4. Verify that the RXE kernel module is loaded by running the following command, and ensure that rdma_rxe appears in the list of modules.
# lsmod | grep rdma_rxe
rdma_rxe 114188 0
ip6_udp_tunnel 12755 1 rdma_rxe
udp_tunnel 14423 1 rdma_rxe
ib_core 255603 13 rdma_cm,ib_cm,iw_cm,rpcrdma,ib_srp,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_rxe,rdma_ucm,ib_ipoib,ib_isert
5. Create a new RXE device/interface by running rxe_cfg add <interface from rxe_cfg status>.
# rxe_cfg add ens33
6. Check the status again and make sure that rxe0 was added under RDEV (rxe device).
# rxe_cfg status
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
ens33 yes e1000 1500 192.168.198.129 rxe0 1024 (3)
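Before mounting, you can optionally confirm that the rxe0 device is visible to the RDMA stack using the libibverbs utilities installed in step 1.
# ibv_devices
# ibv_devinfo -d rxe0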
7. Mount the NFS export with the RDMA protocol.
# mount -t nfs -o nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma
Refer to the Red Hat Enterprise Linux documentation on configuring Soft-RoCE for more details.