The final article in this series focuses on the Linux client-side configuration required to connect to a PowerScale cluster via the NFS over RDMA protocol.
Note that there are certain client hardware prerequisites which must be met in order to use the NFSv3 over RDMA service on a PowerScale cluster. These include:
| Prerequisite | Details |
| --- | --- |
| RoCEv2-capable NICs | NVIDIA Mellanox ConnectX-3 Pro, ConnectX-4, ConnectX-5, and ConnectX-6 |
| NFS over RDMA drivers | NVIDIA Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) or the OS-distributed inbox driver. For best performance, the recommendation is to install the OFED driver. |
Alternatively, if these hardware requirements cannot be met, basic NFS over RDMA functionality can be verified using a Soft-RoCE configuration on the client. However, Soft-RoCE should not be used in a production environment.
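For reference, on a reasonably current kernel with the 'rdma_rxe' module and the iproute2 'rdma' utility available, a Soft-RoCE device can typically be layered on top of a standard Ethernet interface along the following lines. This is a minimal sketch for functional testing only, and the interface name 'eth0' is a placeholder for the client's actual NIC:
# modprobe rdma_rxe
# rdma link add rxe0 type rxe netdev eth0
# rdma link show
Once the rxe0 link reports an active state, RDMA mounts can be attempted for verification purposes, albeit without the performance characteristics of a true RoCEv2-capable NIC.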
The following procedure can be used to configure a Linux client for NFS over RDMA:
The example below uses a Dell PowerEdge R630 server running CentOS 7.9 with an NVIDIA Mellanox ConnectX-3 Pro NIC as the NFS over RDMA client system.
- First, verify the OS version by running the following command:
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
- Next, check the network adapter model and spec. The following example involves a ConnectX-3 Pro NIC with two interfaces: 40gig1 and 40gig2:
# lspci | egrep -i 'network|ethernet'
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

# lshw -class network -short
H/W path           Device   Class     Description
=================================================
/0/100/15/0        ens160   network   MT27710 Family [ConnectX-4 Lx Virtual Function]
/0/102/2/0         40gig1   network   MT27520 Family [ConnectX-3 Pro]
/0/102/3/0                  network   82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/3/0.1                network   82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/1c.4/0      1gig1    network   I350 Gigabit Network Connection
/0/102/1c.4/0.1    1gig2    network   I350 Gigabit Network Connection
/3                 40gig2   network   Ethernet interface
- Add the prerequisite RDMA packages (‘rdma-core’ and ‘libibverbs-utils’) for the Linux version using the appropriate package manager for the distribution:
| Linux Distribution | Package Format | Package Utility |
| --- | --- | --- |
| OpenSUSE | RPM | zypper |
| RHEL | RPM | yum |
| Ubuntu | DEB | apt-get / dpkg |
For example, to install both the above packages on a CentOS/RHEL client:
# sudo yum install rdma-core libibverbs-utils
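Once these packages are installed, the client's RDMA stack can be sanity-checked with the libibverbs utilities. For example, 'ibv_devices' lists the available RDMA devices and 'ibv_devinfo' reports port state and link layer (the device name 'mlx4_0' below is illustrative for a ConnectX-3 Pro and will vary by adapter):
# ibv_devices
# ibv_devinfo -d mlx4_0
For a healthy RoCE configuration, the port should report a state of PORT_ACTIVE with a link layer of Ethernet.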
- Locate and download the appropriate OFED driver version from the NVIDIA website. Be aware that, as of MLNX_OFED v5.1, ConnectX-3 Pro NICs are no longer supported. For ConnectX-4 and above, the latest OFED version will work.
Note that the NFSoRDMA module was removed in MLNX_OFED version 4.0-2.0.0.1 and re-added in version 4.7-3.2.9.0. Refer to the MLNX_OFED release notes change log history for details.
- Extract the driver package and use the ‘mlnxofedinstall’ script to install the driver. As of MLNX_OFED v4.7, the NFSoRDMA driver is no longer installed by default. To install it on a Linux client with a supported kernel, include the ‘--with-nfsrdma’ option for the ‘mlnxofedinstall’ script. For example:
# ./mlnxofedinstall --with-nfsrdma --without-fw-update
Logs dir: /tmp/MLNX_OFED_LINUX.19761.logs
General log file: /tmp/MLNX_OFED_LINUX.19761.logs/general.log
Verifying KMP rpms compatibility with target kernel...
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.
Do you want to continue?[y/N]:y
Uninstalling the previous version of MLNX_OFED_LINUX
rpm --nosignature -e --allmatches --nodeps mft
Starting MLNX_OFED_LINUX-4.9-2.2.4.0 installation ...
Installing mlnx-ofa_kernel RPM
Preparing...                          ########################################
Updating / installing...
mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.r########################################
Installing kmod-mlnx-ofa_kernel 4.9 RPM
...
...
...
Preparing...                          ########################################
mpitests_openmpi-3.2.20-e1a0676.49224 ########################################
Device (03:00.0):
        03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
        Link Width: x8
        PCI Link Speed: 8GT/s
Installation finished successfully.
Preparing...                          ################################# [100%]
Updating / installing...
   :mlnx-fw-updater-4.9-2.2.4.0       ################################# [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Skipping FW update.
- Load the new driver by restarting the ‘openibd’ service.
# /etc/init.d/openibd restart
Unloading HCA driver:
Loading HCA driver and Access Layer:
- Check the driver version to ensure that the installation was successful.
# ethtool -i 40gig1
driver: mlx4_en
version: 4.9-2.2.4
firmware-version: 2.36.5080
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
- Verify that the NFSoRDMA module is also installed.
# yum list installed | grep nfsrdma
kmod-mlnx-nfsrdma.x86_64        5.0-OFED.5.0.2.1.8.1.g5f67178.rhel7u8
Note that if using a vendor-supplied driver for the Linux client system (e.g. Dell PowerEdge), the NFSoRDMA module may not be included in the driver package. If this is the case, download and install the NFSoRDMA module directly from the NVIDIA driver package, per the OFED installation instructions above.
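Whichever driver package is used, it is also worth confirming that the kernel's NFS RDMA transport module loads cleanly before attempting a mount. On most distributions the client-side module is named 'rpcrdma' (historically 'xprtrdma'), so a quick check along these lines should suffice:
# modprobe rpcrdma
# lsmod | grep rpcrdma
If the module fails to load, revisit the driver installation and confirm that the NFSoRDMA component was actually included.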
- Finally, mount the desired NFS export(s) from the cluster with the appropriate version and RDMA options.
For example, for NFSv3 over RDMA:
# mount -t nfs -vo vers=3,proto=rdma,port=20049 myserver:/ifs/data /mnt/myserver
Similarly, to mount with NFSv4.0 over RDMA:
# mount -t nfs -o vers=4,minorversion=0,proto=rdma myserver:/ifs/data /mnt/myserver
And for NFSv4.1 over RDMA:
# mount -t nfs -o vers=4,minorversion=1,proto=rdma myserver:/ifs/data /mnt/myserver
For NFSv4.2 over RDMA:
# mount -t nfs -o vers=4,minorversion=2,proto=rdma myserver:/ifs/data /mnt/myserver
And finally for NFSv4.1 over RDMA across an IPv6 network:
# mount -t nfs -o vers=4,minorversion=1,proto=rdma6 myserver:/ifs/data /mnt/myserver
Note that RDMA is a non-assumable mount option, safeguarding any existing NFSv3 clients. For example:
# mount -t nfs -o vers=3,proto=rdma myserver:/ifs/data /mnt/myserver
The above mount cannot automatically ‘upgrade’ itself to NFSv4, nor can an NFSv4 connection upgrade itself from TCP to RDMA.
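To make an RDMA mount persistent across reboots, the equivalent options can be placed in /etc/fstab. A minimal sketch, reusing the example server and paths from the mounts above:
myserver:/ifs/data   /mnt/myserver   nfs   vers=3,proto=rdma,port=20049   0 0
After mounting, running ‘nfsstat -m’ (or checking the output of ‘mount’) should show proto=rdma among the mount’s options, confirming that the RDMA transport is actually in use.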
Performance-wise, NFS over RDMA can deliver impressive results. That said, RDMA is not a fit for every workload. For highly concurrent workloads with high thread and/or connection counts, other cluster resource bottlenecks tend to be encountered first, so RDMA often won’t provide much benefit over TCP. However, for workloads such as high-bandwidth streams, NFS over RDMA can often provide significant benefits.
For example, in media content creation and post-production, RDMA can enable workflows that TCP-based NFS is unable to sustain. Specifically, Dell’s M&E solutions architects determined that:
- With FileStream on PowerScale F600 nodes, RDMA doubled performance compared to TCP.
- Using Autodesk Flame 2022 with 59.94 frames per second 4K DCI video, the number of dropped frames from the broadcast output was reduced from 6000 with TCP to 11 with RDMA.
- Using DaVinci Resolve 16, RDMA-enabled workstations were able to play uncompressed 8K DCI, PIZ-compressed 6K, and 60 frames per second 4K DCI content. None of this media would play using NFS over TCP.
In such cases, the reduction in the NFS client’s CPU load that RDMA provides is often equally important. Even when the PowerScale cluster can easily support a workload, freeing up the workstation’s compute resources is vital to sustaining smooth playback.