OneFS Front-end Infiniband Configuration

In the previous article in this series, we examined the ‘what’ and ‘why’ of front-end Infiniband on a PowerScale cluster. Now we turn our attention to the ‘how’ – i.e. the configuration and management of front-end IB.

The networking portion of the OneFS WebUI has seen some changes in 9.10, and cluster admins now have the flexibility to create either Ethernet or Infiniband (IB) subnets. Depending on the choice, the interface list, pool, and rule details automatically adjust to match the selected link layer type. This means that if IB is selected, the interface list and pool details update to reflect IB-specific settings, including a new ‘green pill’ icon in the external network table to indicate the presence of IB subnets and pools. For example:
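The external network configuration can also be enumerated from the CLI, where the configured subnets and their pools are listed with:

# isi network subnets list
# isi network pools list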

Similarly, the subnet1 view from the CLI, with the ‘interconnect’ field indicating ‘Infiniband’:

# isi network subnets view subnet1
              ID: groupnet0.subnet1
            Name: subnet1
        Groupnet: groupnet0
           Pools: pool-infiniband
     Addr Family: ipv4
       Base Addr: 10.205.228.0
            CIDR: 10.205.228.0/23
     Description: Initial subnet
         Gateway: 10.205.228.1
Gateway Priority: 10
    Interconnect: Infiniband
             MTU: 2044
       Prefixlen: 23
         Netmask: 255.255.254.0
SC Service Addrs: 1.2.3.4
 SC Service Name: cluster.tme.isilon.com
    VLAN Enabled: False
         VLAN ID: -

Alternatively, if Ethernet is chosen, the relevant subnet, pool, and rule options for that topology are displayed.

This dynamic adjustment ensures that only the relevant options and settings for the configured network type are displayed, making the configuration process more intuitive and streamlined.

For example, to create an IB subnet under Cluster management > Network configuration > External network > Create subnet:

Or from the CLI:

# isi network subnets create groupnet0.subnet1 ipv4 255.255.254.0 --gateway 10.205.228.1 --gateway-priority 10 --linklayer infiniband
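Once created, the subnet’s link layer type can be verified by checking the ‘Interconnect’ field. For example:

# isi network subnets view groupnet0.subnet1 | grep -i interconnect
    Interconnect: Infiniband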

Similarly, editing an Infiniband subnet:

Note that an MTU configuration option is not available when configuring an Infiniband subnet. Also, the WebUI displays a banner warning that NFS over Infiniband will operate at a reduced speed if NFS over RDMA has not already been enabled.
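If RDMA is not already active, the global NFS RDMA setting can be checked, and enabled if desired, from the CLI. A quick sketch follows – note that the exact option name can vary by release, and earlier versions of OneFS used ‘--nfsv3-rdma-enabled’:

# isi nfs settings global view | grep -i rdma
# isi nfs settings global modify --nfs-rdma-enabled=true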

In contrast, editing an Ethernet subnet provides the familiar MTU frame-size configuration options:
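For instance, a jumbo-frame MTU can be configured on an Ethernet subnet directly from the CLI, where ‘subnet0’ below is an assumed Ethernet subnet name:

# isi network subnets modify groupnet0.subnet0 --mtu=9000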

A front-end network IP pool can be easily created under a subnet. For example, from the CLI, using the ‘<groupnet>.<subnet>.<pool>’ notation:

# isi network pools create groupnet0.infiniband1.ibpool1

Or via the WebUI:

Adding an Infiniband subnet is permitted on any cluster, regardless of its network configuration. However, a warning message will be displayed when attempting to create a pool under an Infiniband subnet on a cluster or node without any configured front-end IB interfaces.
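For nodes that do have front-end IB interfaces, the pool’s IP range and SmartConnect zone can also be specified at creation time, with member interfaces added subsequently. For example, a minimal sketch in which the address range, DNS zone, and interface placeholder are purely illustrative:

# isi network pools create groupnet0.infiniband1.ibpool1 --ranges=10.205.228.50-10.205.228.60 --sc-dns-zone=ib.cluster.tme.isilon.com
# isi network pools modify groupnet0.infiniband1.ibpool1 --add-ifaces=1-4:<interface>

The interface placeholder above should be replaced with the node’s actual front-end IB interface name, as reported by ‘isi network interfaces list’.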

From the CLI, the ‘isi_hw_status’ utility can be used to easily verify a node’s front and back-end networking link layer types. For example, take the following F710 configuration:

The ‘isi_hw_status’ output also reports the front-end network type via the ‘FEType’ parameter, in this case ‘Infiniband’:

# isi_hw_status
  SerNo: FD7LRY3
 Config: PowerScale F710
ChsSerN: FD7LRY3
ChsSlot: n/a
FamCode: F
ChsCode: 1U
GenCode: 10
PrfCode: 7
   Tier: 16
  Class: storage
 Series: n/a
Product: F710-1U-Dual-512GB-2x1GE-2x100GE QSFP28-2x200GE QSFP56-38TB SSD
  HWGen: PSI
Chassis: POWEREDGE (Dell PowerEdge)
    CPU: GenuineIntel (2.60GHz, stepping 0x000806f8)
   PROC: Dual-proc, 24-HT-core
    RAM: 549739036672 Bytes
   Mobo: 071PXR (PowerScale F710)
  NVRam: NVDIMM (SDPM VOSS Module) (8192MB card) (size 8589934592B)
 DskCtl: NONE (No disk controller) (0 ports)
 DskExp: None (No disk expander)
PwrSupl: PS1 (type=AC, fw=00.1D.9C)
PwrSupl: PS2 (type=AC, fw=00.1D.9C)
  NetIF: bge0,bge1,lagg0,mce0,mce1,mce2,mce3,mce4
 BEType: 200GigE
 FEType: Infiniband
 LCDver: IsiVFD3 (Isilon VFD V3)
 Midpln: NONE (No FCB Support)
Power Supplies OK

In contrast, the back-end network on this F710 is 200Gb Ethernet, as reported by the ‘BEType’ parameter.
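To check these values across an entire cluster, the output can be gathered from all nodes at once. For example:

# isi_for_array -s 'isi_hw_status | egrep "FEType|BEType"'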

From the node cabling perspective, the interface assignments on the rear of the F710 are as follows:

Additionally, the ‘mlxfwmanager’ CLI utility can be helpful for gleaning considerably more detail on a node’s NICs, including firmware versions, MAC addresses, GUIDs, and part numbers. For example:

# mlxfwmanager

Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX6
  Part Number:      ORRM24_Ax
  Description:      Nvidia ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE; dual-port QSFP56; PCIe4.0 x16
  PSID:             DEL0000000052
  PCI Device Name:  pci0:13:0:0
  Base MAC:         59a2e18dfdac
  Versions:         Current        Available
     FW             20.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX6DX
  Part Number:      OF6FXM_08P2T2_Ax
  Description:      Mellanox ConnectX-6 Dual Port 100 GbE QSFP56 Network Adapter
  PSID:             DEL0000000027
  PCI Device Name:  pci0:139:0:0
  Base GUID:        e8ebd30300060684
  Base MAC:         e8ebd3060684
  Versions:         Current        Available
     FW             22.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A
  Status:           No matching image found

Device #3:
----------
  Device Type:      ConnectX6
  Part Number:      ORRM24_Ax
  Description:      Nvidia ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE; dual-port QSFP56; PCIe4.0 x16
  PSID:             DEL0000000052
  PCI Device Name:  pci0:181:0:0
  Base MAC:         a088c2ec499e
  Base GUID:        a088c20300ec499a
  Versions:         Current        Available
     FW             20.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

In the example above, ‘Device #1’ is the back-end ConnectX-6 VPI NIC (running in 200Gb Ethernet mode), ‘Device #2’ is the 100Gb Ethernet ConnectX-6 Dx NIC in the PCIe4 slot, and ‘Device #3’ is the front-end Infiniband ConnectX-6 VPI NIC in the primary PCIe5 slot.
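Since OneFS runs on a FreeBSD-derived kernel, the PCI device names reported above can also be cross-referenced against the ‘mce’ interfaces listed in the NetIF field of the ‘isi_hw_status’ output using standard tooling. For example, a rough sketch (output will vary by node and driver):

# pciconf -lv | grep -B1 -i mellanox
# ifconfig mce0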

There are a few caveats to be aware of when using front-end Infiniband on F710 and F910 node pools:

  • Upon upgrade to OneFS 9.10, any front-end Infiniband interfaces will only be enabled once the new release is committed.
  • Network pools created within Infiniband subnets will have their default ‘aggregation mode’ set to ‘unset’. Furthermore, this parameter will not be modifiable.
  • Since VLANs are not supported on Infiniband, OneFS includes validation logic to prevent VLAN tagging from being enabled on IB subnets, as illustrated in the example below.
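For example, attempting to enable VLAN tagging on an Infiniband subnet from the CLI will be rejected by this validation check (a sketch; the exact error text may vary):

# isi network subnets modify groupnet0.subnet1 --vlan-enabled=true --vlan-id=100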