OneFS QoS and DSCP Tagging – Configuration and Management

As we saw in the previous article in this series, OneFS 9.9 introduces support for DSCP marking, with a cluster-wide configuration based on the class of network traffic. Tagging is performed by the OneFS firewall, which inspects outgoing network traffic on the front-end ports and assigns it to the appropriate QoS class based on a set of DSCP tagging rules.

Configuration-wise, DSCP requires OneFS 9.9 or later, and is disabled by default – both for new installations and legacy cluster upgrades. The QoS feature can be configured through the CLI, WebUI, and pAPI endpoints. And for clusters that are upgrading to OneFS 9.9, the release must be committed before DSCP configuration can proceed.

Before enabling DSCP tagging, verify the current firewall and DSCP settings:

# isi network firewall settings view

Enabled: True

DSCP Enabled: False

Update these as required, remembering that both the firewall and DSCP must be enabled for QoS tagging to work. DSCP is off by default, but can be enabled with the following CLI syntax:

# isi network firewall settings modify dscp-enabled true
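This pre-flight check can also be scripted. The following is a minimal Python sketch, assuming the 'Key: Value' output layout shown above; the function names are illustrative and not part of OneFS.

```python
def parse_settings(output: str) -> dict:
    """Parse 'Key: Value' lines, as printed by 'isi network firewall settings view'."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            settings[key.strip()] = value.strip()
    return settings


def qos_tagging_active(settings: dict) -> bool:
    """QoS tagging only works when both the firewall and DSCP are enabled."""
    return settings.get("Enabled") == "True" and settings.get("DSCP Enabled") == "True"


# Sample text copied from the command output above.
sample = """
Enabled: True

DSCP Enabled: False
"""
print(qos_tagging_active(parse_settings(sample)))  # False
```

On a cluster, the text would come from running the command itself rather than from a hard-coded sample.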

The OneFS DSCP implementation includes four default tagging rules:

Class: Transactional
  • Traffic: File access and sharing protocols (NFS, FTP, HTTPS data, HDFS, S3, RoCE); security and authentication protocols (Kerberos, LDAP, LSASS, DCE/RPC); RPC and inter-process communication protocols (rpc.bind, mountd, statd, lockd, quotd, mgmntd); naming services protocols (NetBIOS, Microsoft-DS)
  • Default DSCP value: 18
  • Source ports: 20, 21, 80, 88, 111, 135, 137, 138, 139, 300, 302, 304, 305, 306, 389, 443, 445, 585, 636, 989, 990, 2049, 3268, 3269, 8020, 8082, 8440, 8441, 8443, 9020, 9021
  • Destination ports: Not defined by default, but administrator may configure.

Class: Network Management
  • Traffic: WebUI, SSH, SMTP, syslog, DNS, NTP, SNMP, Perf collector, CEE, alerts
  • Default DSCP value: 16
  • Source ports: 22, 25, 53, 123, 161, 162, 514, 6514, 6567, 8080, 9443, 12228
  • Destination ports: Not defined by default, but administrator may configure.

Class: Bulk Data
  • Traffic: SmartSync, SyncIQ, NDMP
  • Default DSCP value: 10
  • Source ports: 2097, 2098, 3148, 3149, 5667, 5668, 7722, 8470, 10000
  • Destination ports: Not defined by default, but administrator may configure.

Class: Catch-All
  • Traffic: All other traffic that does not match any of the above
  • Default DSCP value: 0
  • Source ports: all
  • Destination ports: Not defined by default, but administrator may configure.

The ‘isi network firewall dscp list’ command can be used to view all of a cluster’s DSCP firewall rules. For example:

# isi network firewall dscp list
DSCP Rules in Priority Order From High To Low:
ID                      Description                      DSCP Value  Src Ports  Dst Ports
------------------------------------------------------------------------------------------
rule_transactional_data DSCP Rule for transactional data 18          20         -
                                                                     21
                                                                     80
                                                                     88
                                                                    111
                                                                    135
                                                                    137
                                                                    138
                                                                    139
                                                                    300
                                                                    302
                                                                    304
                                                                    305
                                                                    306
                                                                    389
                                                                    443
                                                                    445
                                                                    585
                                                                    636
                                                                    989
                                                                    990
                                                                   2049
                                                                   3268
                                                                   3269
                                                                   8020
                                                                   8082
                                                                   8440
                                                                   8441
                                                                   8443
                                                                   9020
                                                                   9021
                                                                  20049

rule_network_management DSCP Rule for network management 16          22         -
                                                                     25
                                                                     53
                                                                    123
                                                                    161
                                                                    162
                                                                    514
                                                                   6514
                                                                   6567
                                                                   8080
                                                                   9443
                                                                  12228

rule_bulk_data          DSCP Rule for bulk data          10          2097       -
                                                                   2098
                                                                   3148
                                                                   3149
                                                                   5667
                                                                   5668
                                                                   7722
                                                                   8470
                                                                  10000

rule_best_effort        DSCP Rule for best effort        0           all        all
------------------------------------------------------------------------------------------
Total: 4
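For scripting against this listing, the column layout can be parsed into a simple structure. The following Python sketch assumes the layout shown above, where the first line of each rule carries the ID, description, DSCP value, and first ports, and each continuation line carries one additional source port. The function name is illustrative, and how the CLI wraps multiple destination ports is not shown here, so that case is not handled.

```python
import re

# First line of a rule: ID, description, DSCP value, first src port, first dst port.
RULE_LINE = re.compile(r"^(rule_\w+)\s+(.+?)\s+(\d+)\s+(\S+)\s+(\S+)\s*$")
# Continuation line: a single indented port number.
PORT_LINE = re.compile(r"^\s+(\d+)\s*$")


def parse_dscp_rules(output: str) -> dict:
    """Return {rule_id: {"dscp": int, "src_ports": [...], "dst_ports": [...]}}."""
    rules, current = {}, None
    for line in output.splitlines():
        first = RULE_LINE.match(line)
        if first:
            rule_id, _description, dscp, src, dst = first.groups()
            current = {"dscp": int(dscp), "src_ports": [src], "dst_ports": [dst]}
            rules[rule_id] = current
            continue
        more = PORT_LINE.match(line)
        if more and current is not None:
            # Assumption: continuation lines list further source ports only.
            current["src_ports"].append(more.group(1))
    return rules


sample = """\
ID                      Description                      DSCP Value  Src Ports  Dst Ports
------------------------------------------------------------------------------------------
rule_bulk_data          DSCP Rule for bulk data          10          2097       -
                                                                   2098
rule_best_effort        DSCP Rule for best effort        0           all        all
------------------------------------------------------------------------------------------
"""
for rule_id, rule in parse_dscp_rules(sample).items():
    print(rule_id, rule["dscp"], rule["src_ports"])
```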

If desired, the ‘isi network firewall dscp modify’ command, followed by the appropriate rule name, can be used to modify a rule’s associated DSCP value, source ports, or destination ports. For example:

# isi network firewall dscp modify rule_transactional_data --src-port 123 --dst-ports 456 --dscp-value 10

Note that a ‘--live’ option is also available to effect the changes immediately on active rules. If the --live option is used when DSCP is inactive, the command is automatically rejected.

If needed, all of the DSCP configuration can be easily reset to its OneFS defaults and DSCP disabled as follows:

# isi network firewall reset-dscp-setting

This command will reset the global firewall DSCP setting to the original system defaults. Are you sure you want to continue? (yes/[no]): yes

GUI-wise, DSCP has a new ‘settings’ tab under the WebUI’s firewall section for managing its operation and configuration, and editing the rules:

Again, although the DSCP feature can be configured and enabled with the firewall itself still disabled, DSCP will only activate once the firewall is up and running too.

The WebUI allows modification of a rule’s associated DSCP value, source ports, or destination ports. For example:

Like the CLI, the WebUI also has a ‘Reset Default Settings’ option which clears all the current DSCP configuration parameters and resets them to the OneFS defaults:

Also, there’s a comprehensive set of RESTful platform API endpoints, which include:

  • GET/PUT platform/network/firewall/settings
  • POST platform/network/firewall/reset-dscp-setting?live=true
  • GET platform/network/firewall/dscp
  • PUT platform/network/firewall/dscp/<rule_name>?live=true
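When scripting against these endpoints, a small helper can keep the paths and the ‘live’ argument consistent. The following Python sketch only builds relative paths, copied verbatim from the list above. The helper names are hypothetical, and anything a real cluster needs in front of these paths (host, port, any further path segments) and in the request body is not covered here.

```python
from urllib.parse import quote, urlencode

BASE = "platform/network/firewall"


def firewall_endpoint(suffix: str, live: bool = False) -> str:
    """Build a relative endpoint path, optionally adding the ?live=true argument."""
    path = f"{BASE}/{suffix}"
    return f"{path}?{urlencode({'live': 'true'})}" if live else path


def dscp_rule_endpoint(rule_name: str, live: bool = False) -> tuple:
    """Return the (method, path) pair for updating a single DSCP rule."""
    return ("PUT", firewall_endpoint(f"dscp/{quote(rule_name, safe='')}", live))


print(firewall_endpoint("settings"))
print(firewall_endpoint("reset-dscp-setting", live=True))
print(dscp_rule_endpoint("rule_bulk_data", live=True))
```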

All of the DSCP configuration data is stored in gconfig at the cluster level, and the firewall daemon instances across the nodes all work as peers. If it becomes necessary to troubleshoot QoS and tagging, the following logs and utilities are a great place to start:

  • /var/log/isi_firewall_d.log, which includes information from the Firewall daemon.
  • /var/log/isi_papi_d.log, which covers all the command handlers, including the firewall and DSCP related ones.
  • ‘isi_gconfig -t firewall’ utility, which returns all the firewall’s configuration info.
  • ‘ipfw show’ command, which dumps the kernel’s ipfw table.

Also note that all these logs and command outputs are included in a standard isi_gather_info log collection.

OneFS QoS and DSCP Tagging

As more applications contend for shared network links with finite bandwidth, ensuring Quality of Service (QoS) becomes more critical. Each application or workload can have varying QoS requirements to deliver not only service availability, but also an optimal client experience. Associating each app with an appropriate QoS marking helps provide some traffic policing, by allowing certain packets to be prioritized across a shared network, all while meeting SLAs.

QoS can be implemented using a variety of methods, but the most common is through a Differentiated Services Code Point, or DSCP, which specifies a value in the packet header that maps to a traffic effort level.

OneFS 9.9 introduces support for DSCP marking, with a cluster-wide configuration based on the class of network traffic. Once configured, OneFS inserts the DSCP marking in the Type of Service (IPv4) or Traffic Class (IPv6) field of the IP packet header, and away you go.
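To make the header layout concrete: the DSCP value occupies the upper six bits of that one-byte field, with the lower two bits used for ECN. The following Python sketch shows the arithmetic for the DSCP values 18, 16, 10, and 0, which are the defaults of the four OneFS classes. The function names are illustrative only.

```python
def dscp_to_field_byte(dscp: int, ecn: int = 0) -> int:
    """Pack a 6-bit DSCP value and a 2-bit ECN value into the 8-bit header field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in two bits (0-3)")
    return (dscp << 2) | ecn


def field_byte_to_dscp(value: int) -> int:
    """Recover the DSCP value from the 8-bit header field."""
    return value >> 2


for dscp in (18, 16, 10, 0):
    print(dscp, format(dscp, "06b"), hex(dscp_to_field_byte(dscp)))
```

This is useful when reading packet captures, since many tools display the whole byte (for example 0x48) rather than the DSCP value (18).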

The pertinent part of each IPv4 and IPv6 packet header is as follows:

OneFS QoS tagging separates network traffic into four default classes, each with an associated DSCP value, plus configurable source and destination ports. The four classes OneFS provides are ‘transactional’, ‘network management’, ‘bulk data’, and ‘catch all’:

Class: Transactional
  • Traffic: File access and sharing protocols (NFS, FTP, HTTPS data, HDFS, S3, RoCE); security and authentication protocols (Kerberos, LDAP, LSASS, DCE/RPC); RPC and inter-process communication protocols (rpc.bind, mountd, statd, lockd, quotd, mgmntd); naming services protocols (NetBIOS, Microsoft-DS)
  • Default DSCP value: 18
  • Source ports: 20, 21, 80, 88, 111, 135, 137, 138, 139, 300, 302, 304, 305, 306, 389, 443, 445, 585, 636, 989, 990, 2049, 3268, 3269, 8020, 8082, 8440, 8441, 8443, 9020, 9021
  • Destination ports: Not defined by default, but administrator may configure.

Class: Network Management
  • Traffic: WebUI, SSH, SMTP, syslog, DNS, NTP, SNMP, Perf collector, CEE, alerts
  • Default DSCP value: 16
  • Source ports: 22, 25, 53, 123, 161, 162, 514, 6514, 6567, 8080, 9443, 12228
  • Destination ports: Not defined by default, but administrator may configure.

Class: Bulk Data
  • Traffic: SmartSync, SyncIQ, NDMP
  • Default DSCP value: 10
  • Source ports: 2097, 2098, 3148, 3149, 5667, 5668, 7722, 8470, 10000
  • Destination ports: Not defined by default, but administrator may configure.

Class: Catch-All
  • Traffic: All other traffic that does not match any of the above
  • Default DSCP value: 0
  • Source ports: all
  • Destination ports: Not defined by default, but administrator may configure.

The default DSCP feature values for each were specifically chosen to meet US government requirements and satisfy the Fed APL needs. While destination ports are undefined in the classes by default, cluster admins can customize the DSCP values, source ports, and destination ports per site requirements.

Under the hood, QoS tagging is built upon the OneFS firewall (ipfw):

As such, QoS tagging is only functional when both the firewall and the DSCP features are enabled.

The firewall inspects outgoing network traffic on the front-end ports and assigns it to the appropriate QoS class. The outbound IP packets are matched to the cluster’s four DSCP rules, one by one, from top to bottom, using the source ports, and destination ports too, if configured.

When a match is found, the firewall engine marks the packets’ DSCP bits as specified by that rule. If no match is found, the final ‘Best Effort’ rule catches all outgoing IP packets that are unmatched by the other three DSCP rules.
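The top-to-bottom, first-match behavior can be illustrated with a short Python sketch. This is a model of the logic as described here, not the firewall's actual implementation. The class and function names are made up for illustration, and it assumes that when a rule has both source and destination ports configured, both must match.

```python
from typing import FrozenSet, NamedTuple, Optional


class DscpRule(NamedTuple):
    name: str
    dscp: int
    src_ports: Optional[FrozenSet[int]]         # None means "all"
    dst_ports: Optional[FrozenSet[int]] = None  # None means "not configured"


# Default classes and source ports from the table above, in match order.
DEFAULT_RULES = [
    DscpRule("transactional", 18, frozenset({
        20, 21, 80, 88, 111, 135, 137, 138, 139, 300, 302, 304, 305, 306,
        389, 443, 445, 585, 636, 989, 990, 2049, 3268, 3269, 8020, 8082,
        8440, 8441, 8443, 9020, 9021})),
    DscpRule("network management", 16, frozenset({
        22, 25, 53, 123, 161, 162, 514, 6514, 6567, 8080, 9443, 12228})),
    DscpRule("bulk data", 10, frozenset({
        2097, 2098, 3148, 3149, 5667, 5668, 7722, 8470, 10000})),
    DscpRule("catch all", 0, None),
]


def classify(src_port: int, dst_port: int, rules=DEFAULT_RULES) -> DscpRule:
    """Return the first rule, from top to bottom, that matches the packet's ports."""
    for rule in rules:
        src_ok = rule.src_ports is None or src_port in rule.src_ports
        dst_ok = rule.dst_ports is None or dst_port in rule.dst_ports
        if src_ok and dst_ok:
            return rule
    raise LookupError("no rule matched; a catch-all rule should be last")


print(classify(2049, 51234).dscp)   # 18: NFS reply leaving the cluster
print(classify(12345, 51234).dscp)  # 0: falls through to the catch-all rule
```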

The firewall assigns the DSCP value based on the QoS class, and the DSCP configuration and values are cluster-wide and preserved across upgrades.

Note, though, that the DSCP feature does not currently allow the creation of additional or custom DSCP rules. Additionally, DSCP tagging is disabled by default in both STIG hardening and compliance modes.

Also, consider that in order to provide QoS, the firewall has to inspect and filter the outgoing packets, which obviously comes with a performance cost. Although this overhead should be fairly minimal, the recommendation is to test DSCP tagging in a lab environment first, to confirm workloads are not significantly impacted, before letting it loose on a production cluster.

In the next article in this series, we’ll look at the DSCP configuration and management, plus some basic troubleshooting tools.

OneFS Namespace API (RAN) – Advanced Requests and Troubleshooting

A cluster’s files and directories can be accessed programmatically, and controlled by filesystem permissions, through the OneFS RESTful Access to Namespace (RAN) API, similarly to the way they’re accessed through the core NAS protocols such as NFS and SMB.

Within the RAN namespace, the following system attributes are common to directories and files:

Attribute Description Type
name Specifies the name of the object. String
size Specifies the size of the object in bytes. Integer
block_size Specifies the block size of the object. Integer
blocks Specifies the number of blocks that compose the object. Integer
last_modified Specifies the time when the object data was last modified in HTTP date/time format. HTTP date
create_time Specifies the date when the object data was created in HTTP date/time format. HTTP date
access_time Specifies the date when the object was last accessed in HTTP date/time format. HTTP date
change_time Specifies the date when the object was last changed (including data and metadata changes) in HTTP date/time format. String
type Specifies the object type, which can be one of the following values: container, object, pipe, character_device, block_device, symbolic_link, socket, or whiteout_file. String
mtime_val Specifies the time when the object data was last modified in UNIX Epoch format. Integer
btime_val Specifies the time when the object data was created in UNIX Epoch format. Integer
atime_val Specifies the time when the object was last accessed in UNIX Epoch format. Integer
ctime_val Specifies the time when the object was last changed (including data and metadata changes) in UNIX Epoch format. Integer
owner Specifies the user name for the owner of the object. String
group Specifies the group name for the owner of the object. String
uid Specifies the UID for the owner. Integer
gid Specifies the GID for the owner. Integer
mode Specifies the UNIX mode octal number. String
id Specifies the object ID, which is also the INODE number. Integer
nlink Specifies the number of hard links to the object. Integer
is_hidden Specifies whether the file is hidden or not. Boolean

The following response headers may be returned when a request is sent to RAN:

Attribute Description Type
Content-length Provides the length of the body message in the response. Integer
Connection Provides the state of connection to the server. String
Date Provides the date when the object store last responded. HTTP-date
Server Provides platform and version information about the server that responded to the request. String
x-isi-ifs-target-type Provides the resource type. This value can be a container or an object. String

For diagnostic and troubleshooting purposes, failed requests to the namespace can often be resolved via common error codes and viewing activity logs. Activity logs capture server and object activity and can help identify problems. The following table shows the location of different types of activity logs.

Log Location
Server logs /var/log/<server>/webui_httpd_error.log, /var/log/<server>/webui_httpd_access.log
Object daemon log /var/log/isi_object_d.log
Generic log /var/log/messages

For <server> above, the path to the server directory should be used. For example: /apache2.

The common JSON error is returned in the following format:

{
"errors":[
{
"code":"<Error code>",
"message":"<some detailed error msg>"
}
]
}

The following table includes the common error codes, plus their status and description:

Error Code Description HTTP status
AEC_TRANSIENT The specified request returned a transient error code that is treated as OK. 200 OK
AEC_BAD_REQUEST The specified request returned a bad request error. 400 Bad Request
AEC_ARG_REQUIRED The specified request requires an argument for the operation. 400 Bad Request
AEC_ARG_SINGLE_ONLY The specified request requires only a single argument for the operation. 400 Bad Request
AEC_UNAUTHORIZED The specified request requires user authentication. 401 Unauthorized
AEC_FORBIDDEN The specified request was denied by the server. Typically, this response includes permission errors on OneFS. 403 Forbidden
AEC_NOT_FOUND The specified request has a target object that was not found. 404 Not Found
AEC_METHOD_NOT_ALLOWED The specified request sent a method that is not allowed for the target object. 405 Method Not Allowed
AEC_NOT_ACCEPTABLE The specified request is unacceptable. 406 Not Acceptable
AEC_CONFLICT The specified request has a conflict that prevents the operation from completing. 409 Conflict
AEC_PRE_CONDITION_FAILED The specified request has failed a precondition. 412 Precondition failed
AEC_INVALID_REQUEST_RANGE The specified request has requested a range that cannot be satisfied. 416 Requested Range not Satisfiable
AEC_NOT_MODIFIED The specified request was not modified. 304 Not Modified
AEC_LIMIT_EXCEEDED The specified request exceeded the limit set on the server side. 403 Forbidden
AEC_INVALID_LICENSE The specified request has an invalid license. 403 Forbidden
AEC_NAMETOOLONG The specified request has an object name size that is too long. 403 Forbidden
AEC_SYSTEM_INTERNAL_ERROR The specified request has failed because the server encountered an unexpected condition. 500 Internal Server Error

For example, an invalid copy source path yields the ‘AEC_BAD_REQUEST’ code:

# curl -X PUT --insecure --basic --user <name>:<passwd> --header "clone=true" --header "x-isi-ifs-copy-source:/namespace/ifs/data-other/testfile1/" https://10.1.10.20:8080/namespace/ifs/data/testfile1/
{
"errors" :
[
{
"code" : "AEC_BAD_REQUEST",
"message" : "Unable to open object '/data-other/testfile1/' in store 'ifs' -- a component of the path is not a directory."
}
]
}
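In a script, the same error body can be decoded with a JSON parser rather than read by eye. The following Python sketch assumes only the "errors" / "code" / "message" structure shown above; the function name is illustrative.

```python
import json


def extract_errors(body: str) -> list:
    """Return (code, message) pairs from a RAN JSON error body."""
    payload = json.loads(body)
    return [(err.get("code"), err.get("message")) for err in payload.get("errors", [])]


# Body copied from the failed copy request above.
body = """
{
"errors" :
[
{
"code" : "AEC_BAD_REQUEST",
"message" : "Unable to open object '/data-other/testfile1/' in store 'ifs' -- a component of the path is not a directory."
}
]
}
"""
for code, message in extract_errors(body):
    print(f"{code}: {message}")
```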

When crafting straightforward HTTP requests to RAN, such as creating a file (object), the ‘curl’ CLI utility can be a useful asset:

# curl -X PUT --insecure --basic --user <username>:<passwd> -H "x-isi-ifs-target-type:object" https://<cluster_ip>:8080/namespace/<path>/<file>/

For example, to create ‘file1’ under ‘/ifs/data’:

# curl -X PUT --insecure --basic --user <username>:<passwd> -H "x-isi-ifs-target-type:object" https://10.1.10.20:8080/namespace/ifs/data/file1/

# ls -lsia /ifs/data/file1

6668484639 64 -rw-------     1 root  wheel  0 Aug 28 00:58 /ifs/data/file1

And to read the contents of the file via RAN:

# echo "This is file1" > /ifs/data/file1

# curl -X GET --insecure --basic --user <username>:<passwd> https://10.1.10.20:8080/namespace/ifs/data/file1

This is file1

However, ‘curl’ and its ‘-H’ header option can quickly get unwieldy for more complex HTTP requests, such as setting ACLs and configuring SmartLock immutability via RAN. As such, more versatile dev tools and/or scripting languages may be a better alternative in these cases. Plus, familiarity with HTTP/1.1 and experience writing HTTP-based client utilities is of considerable help when implementing RAN endpoints in production environments.
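As an illustration of the scripting route, here is a minimal Python sketch that builds the same file-creation request as the earlier curl example, using only the standard library. The function name is hypothetical, the request is constructed but not sent, and port 8080 and the header value are taken from the curl examples above.

```python
import base64
import urllib.request


def create_object_request(cluster: str, path: str, user: str, password: str):
    """Build (but do not send) a PUT request that creates an empty file via RAN."""
    url = f"https://{cluster}:8080/namespace/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",  # HTTP Basic Authentication
            "x-isi-ifs-target-type": "object",
        },
    )


req = create_object_request("10.1.10.20", "ifs/data/file1", "<username>", "<passwd>")
print(req.get_method(), req.full_url)
```

The request can then be passed to urllib.request.urlopen. A cluster with a self-signed certificate would also need a suitable SSL context, which is the counterpart of curl's --insecure option.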

Next are a couple of examples of more complex HTTP requests to RAN.

Set the ACL on a file

In the first instance, the following request syntax can be used to configure the access control list (ACL) of a file:

PUT /namespace/<access_point>/<container_path>/<file_name>?acl HTTP/1.1
Host: <hostname>[:<port>]
Content-Length: <length>
Date: <date>
Authorization: <signature>
x-isi-ifs-target-type: object
Content-Type: application/json

{
"owner":{
"id":"<owner id>",
"name":"<owner name>",
"type":"<type>"
},
"group":{
"id":"<group id>",
"name":"<group name>",
"type":"<type>"
},
"authoritative":"acl"|"mode",
"mode":"<POSIX mode>",
"action":"<action_value>",
"acl":[
{
"trustee":{
"id":"<trustee id>",
"name":"<trustee name>",
"type":"<trustee type>"
},
"accesstype":"allow"|"deny",
"accessrights":"<accessrights_list>",
"op":"<operation_value>"
}
]
}

The ACL endpoint parameters for RAN include:

Parameter Description
acl The acl argument must be placed at the first position of the argument list in the URI.
owner Specifies the JSON object for the owner persona. You should only specify the owner or group persona if you want to change the owner or group of the target.
group Specifies the JSON object for the group persona of the owner. You should only specify the owner or group persona if you want to change the owner or group of the target.
authoritative The authoritative field is mandatory and can take the value of either acl or mode.

acl: You can modify the owner, group personas, or access rights for the file by setting the authoritative field to acl and by setting <action_value> to update. When the authoritative field is set to acl, access rights are set for the file from the acl structure. Any value that is specified for the mode parameter is ignored.

Note: When the authoritative field is set to acl, the default value for the <action_value> field is replace. If the <action_value> field is set to replace, the system replaces the existing access rights of the file with the access rights that are specified in the acl structure. If the acl structure is empty, the existing access rights are deleted and default access rights are provided by the system. The default access rights for files are read access control list (‘std_read_dac’) and write access control list (‘std_write_dac’) for the owner.

mode: You can modify the owner and group personas by setting the authoritative field to mode. When the authoritative field is set to mode, POSIX permissions are set on the file. The <action_value> field and acl structure are ignored. If mode is set on a file that already has access rights or if access rights are set on a file that already has POSIX permissions set, the result of the operation varies based on the Global ACL Policy.

mode Specifies the POSIX mode as an octal number string. By default, these are 0700 for directories and 0600 for files.
action The <action_value> field is applied when the authoritative field is set to acl. You can set the <action_value> field to either update or replace. The default value is replace.

When set to update, the existing access control list of the file is modified with the access control entries that are specified in the acl structure of the JSON body.

When set to replace, the entire access control list is deleted and replaced with the access control entries that are specified in the acl structure of the JSON body.

Also, when set to replace, the acl structure is optional. If the acl structure is left empty, the entire access control list is deleted and replaced with the system set default access rights. The default access rights for files are read access control list (‘std_read_dac’) and write access control list (‘std_write_dac’) for the owner.

acl Specifies the JSON array of access rights.
accesstype Can be set to allow or deny.

allow: Allows access to the file based on the access rights set for the trustee.

deny: Denies access to the file based on the access rights set for the trustee.

accessrights Specifies the access right values that are defined for the file.
inherit_flags Specifies the inherit flag values for the file.
op The <operation_value> field is applied when the <action_value> field is set to update. You can set the <operation_value> field to add, replace, or delete. If no <operation_value> field is specified, the default value is add.

add: Creates an access control entry (ACE) if an ACE is not already present for a trustee and trustee access type. If an entry is already present for that trustee and trustee access type, this operation appends the access rights list to the current ACE for that trustee and trustee access type.

delete: Removes the access rights list provided from the existing ACE for a trustee and trustee access type. If the input access rights list is empty, the entire ACE that corresponds to the trustee and trustee access type is deleted.

replace: Replaces the entire ACE for the trustee and trustee access type with the input access rights list.
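Since the JSON body has several interdependent fields, generating it programmatically helps avoid malformed requests. The following Python sketch builds a body for the ‘authoritative: acl’ case using the allowed values from the parameter descriptions above. The helper names are hypothetical, and the owner and group personas are left out, on the basis that they should only be specified when changing them.

```python
import json

ACCESS_TYPES = ("allow", "deny")
ACTIONS = ("update", "replace")
OPERATIONS = ("add", "replace", "delete")


def make_ace(trustee_id, name, trustee_type, accesstype, accessrights, op="add"):
    """Build one access control entry; 'op' only applies when the action is update."""
    if accesstype not in ACCESS_TYPES:
        raise ValueError(f"accesstype must be one of {ACCESS_TYPES}")
    if op not in OPERATIONS:
        raise ValueError(f"op must be one of {OPERATIONS}")
    return {
        "trustee": {"id": trustee_id, "name": name, "type": trustee_type},
        "accesstype": accesstype,
        "accessrights": list(accessrights),
        "op": op,
    }


def make_acl_body(entries, action="replace"):
    """Serialize the JSON body for a '?acl' PUT with the authoritative field set to acl."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}")
    body = {"authoritative": "acl", "action": action, "acl": list(entries)}
    return json.dumps(body, indent=2)


print(make_acl_body(
    [make_ace("UID:0", "root", "user", "allow", ["file_read", "file_write"])],
    action="update",
))
```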

The following HTTP PUT request can be used to set the ACL of a file, in this case ‘file1’.

PUT /namespace/ifs/dir1/dir2/ns/file1?acl HTTP/1.1
Host: my_cluster:8080
Content-Length: <length>
Date: Wed, 22 May 2024 12:00:00 GMT
Authorization: <signature>
Content-Type: application/json

{
"owner":{
"id":"UID:0",
"name":"root",
"type":"user"
},
"group":{
"id":"GID:0",
"name":"wheel",
"type":"group"
},
"authoritative":"acl",
"action":"update",
"acl": [
{
"trustee":{
"id":"UID:0",
"name":"root",
"type":"user"
},
"accesstype":"allow",
"accessrights":[
"file_read",
"file_write"
],
"op":"add"
},
{
"trustee":{
"id":"GID:1201",
"name":"group12",
"type":"group"
},
"accesstype":"allow",
"accessrights":[
"std_write_dac"
],
"op":"replace"
}
]
}

And the corresponding successful response from RAN is along the following lines:

HTTP/1.1 200 OK
Date: Wed, 22 May 2024 12:00:00 GMT
Content-Length: <length>
Connection: close
Server: Apache2/2.2.19

Set the retention period and commit a file in a SmartLock directory

Similarly, the following request syntax can be used to set the retention period and commit a file in a SmartLock directory.

PUT /namespace/<access_point>/<WORM_directory>/<file_name>?worm HTTP/1.1
Host: <hostname>[:<port>]
Date: <date>
Authorization: <signature>

{
"worm_retention_date":<"YYYY-MM-DD hh:mm:ss GMT">,
"commit_to_worm":<Boolean>
}

Note that if a file is not explicitly committed when an autocommit time period is configured for the SmartLock directory where the file resides, the file is automatically committed when the autocommit period elapses.

If the file is committed without setting a retention expiration date, the default retention period that is specified for the SmartLock directory where the file resides is applied. The retention date on the file can also be limited by the maximum retention period set on the SmartLock directory.
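The request body for this call is small, but the date format is easy to get wrong. Here is a minimal Python sketch that builds it from a timezone-aware datetime, using the ‘YYYY-MM-DD hh:mm:ss GMT’ layout from the syntax above. The function name is hypothetical. Leaving out the retention date when none is supplied is an assumption, based on the directory default applying in that case.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def worm_body(retention: Optional[datetime], commit: bool) -> str:
    """Build the JSON body for a '?worm' PUT request."""
    body = {"commit_to_worm": commit}
    if retention is not None:
        if retention.tzinfo is None:
            raise ValueError("retention must be timezone-aware")
        # Convert to UTC and format as 'YYYY-MM-DD hh:mm:ss GMT'.
        utc = retention.astimezone(timezone.utc)
        body["worm_retention_date"] = utc.strftime("%Y-%m-%d %H:%M:%S GMT")
    return json.dumps(body)


print(worm_body(datetime(2024, 12, 25, 12, 0, 0, tzinfo=timezone.utc), True))
print(worm_body(None, True))
```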

The pertinent WORM endpoint parameters in RAN include:

Parameter Description
worm The worm argument must be placed at the first position of the argument list in the URI.
worm_committed Indicates whether the file was committed to the WORM state.
worm_retention_date Provides the retention expiration date in Coordinated Universal Time (such as UTC/GMT). If a value is not specified, the field has a null value.
worm_retention_date_val Provides the retention expiration date in seconds from UNIX Epoch or UTC.
worm_override_retention_date Provides the override retention date that is set on the SmartLock directory where the file resides. If the date is not set or is earlier than or equal to the existing file retention date, this field has a null value. Otherwise, the date is expressed in UTC/GMT, and is the retention expiration date for the file if the worm_committed parameter is also set to true.
worm_override_retention_date_val Provides the override retention date that is set on the SmartLock directory where the file resides. If the date is not set or if the date is set to earlier than or equal to the file retention date, this field has a null value. Otherwise, the date is expressed in seconds from UNIX Epoch and UTC, and is the retention expiration date set for the file if the worm_committed parameter is set to true. This parameter is the same as worm_override_retention_date, but is expressed in seconds from the Epoch or UTC.

For example, the following request will set the retention date for ‘file1’ in the SmartLock directory ‘dir1’ to 25th December 2024:

PUT /namespace/ifs/dir1/file1?worm HTTP/1.1
Host: my_cluster:8080
Date: Wed, 25 Dec 2024 12:00:00 GMT
Authorization: <signature>

{
"worm_retention_date":"2024-12-25 12:00:00 GMT",
"commit_to_worm":true
}

And the corresponding successful response:

HTTP/1.1 200 OK
Date: Wed, 25 Dec 2024 12:00:00 GMT
Content-Length: 0
Connection: close
Server: Apache2/2.2.19

OneFS Namespace API (RAN) – Part 2

As we saw in the previous article in this series, a cluster’s files and directories can be accessed programmatically through the OneFS RESTful Access to Namespace (RAN) API, similarly to the way they’re accessed through SMB or NFS protocols – as well as controlled by filesystem permissions.

Under the hood, the general architecture and workflow of the OneFS RAN namespace API is as follows:

Upon receiving an HTTP request sent through the OneFS API, the cluster’s web server (Apache) verifies the username and password credentials – either through HTTP Basic Authentication for single requests or via an established session to a single node for multiple requests.

Once the user has been successfully authenticated, OneFS role-based access control (RBAC) then verifies the privileges associated with the account and, if sufficient, enables access to either the /ifs file system, or to the cluster configuration, as specified in the request URL.

The request URL that calls the API is composed of a base URL and an endpoint, with the ‘namespace’ argument denoting the RAN API. For example:

And the GET request response to a <path><object> endpoint typically yields the object’s payload. For example, the ASCII contents of the ‘file1’, in this case:

Or from the CLI with ‘curl’:

# curl -X GET https://10.1.10.20:8080/namespace/ifs/data/dir1/file2 --insecure --basic --user <user>:<passwd>

Test file for RAN access...

If the object is unavailable, a response similar to the following is displayed:

As we saw in the previous article in this series, RAN supports the following types of file system operations:

Operation Action Description
Access points CREATE, DELETE Identify and configure access points (shares) and obtain protocol information.
Directory CREATE, GET, PUT, LIST, DELETE List directory content; get and set directory attributes; delete directories from the file system.
File CREATE, GET, PUT, LIST, DELETE View, move, copy, and delete files from the file system.
Access control GET/SET ACLs Manage user rights; set ACL or POSIX permissions for files and directories. Set access list on access points (RAN Share Permissions).
Query QUERY Search system metadata or extended attributes, and tag files.
SmartLock GET, SET, Commit Allow retention dates to be set on files; commit files to a WORM state.

In support of these, RAN allows pre-defined keywords to be appended to the URL when sending a namespace request. These keywords must be placed first in the argument list and must not contain any value. If these keywords are placed in any other position in the argument list, the keywords are ignored. Pre-defined keywords include: ‘acl’, ‘metadata’, ‘worm’, and ‘query’.

For example:

https://<cluster_ip>:8080/namespace/ifs/data/dir1?acl
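The keyword placement rule above can be sketched as a small URL builder. This is a minimal illustration rather than an official client: the ‘ran_url’ helper and the ‘limit’ argument in the usage note are hypothetical names chosen for this example.

```python
from urllib.parse import quote, urlencode


def ran_url(cluster, path, keyword=None, args=None, port=8080):
    """Build a RAN URL, placing the valueless keyword first in the query string."""
    url = "https://%s:%d/namespace%s" % (cluster, port, quote(path))
    parts = []
    if keyword:
        # Pre-defined keywords carry no value and must lead the argument list.
        parts.append(keyword)
    if args:
        # Any remaining arguments follow the keyword as ordinary key=value pairs.
        parts.append(urlencode(args))
    if parts:
        url += "?" + "&".join(parts)
    return url


print(ran_url("10.1.10.20", "/ifs/data/dir1", keyword="acl"))
```

Passing additional arguments, for instance ran_url("10.1.10.20", "/ifs/data/dir1", keyword="query", args={"limit": 10}), keeps the keyword ahead of them, so it is not ignored.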

When using the ‘curl’ CLI utility, the following syntax options can be useful for crafting PUT or POST requests to RAN:

  1. When sending form data:
# curl -X PUT -H "Content-Type: multipart/form-data;" -F "key1=val1" "YOUR_URI"
  2. If sending raw data as JSON:
# curl -X PUT -H "Content-Type: application/json" -d '{"key1":"value"}' "YOUR_URI"
  3. When sending a file with a POST request:
# curl -X POST "YOUR_URI" -F 'file=@/file-path.csv'

Where:

-X – specifies the request method (command).

-d – supplies the data to send to the remote URL.

-H – sets a request header, such as the content type.

-v – enables verbose output, which is handy for debugging.

When sending a request to RAN, data can be accessed through customized headers, in addition to the standard HTTP headers. The common RAN HTTP request headers include:

Name Description Type Required
Authorization Specifies the authentication signature. String Yes
Content-length Specifies the length of the message body. Integer Conditional
Date Specifies the current date according to the requestor. HTTP-date No. A client should only send a Date header in a request that includes an entity-body, such as in PUT and POST requests. A client without a clock must not send a Date header in a request.
x-isi-ifs-spec-version Specifies the protocol specification version. The client specifies the protocol version, and the server determines if the protocol version is supported. You can test backwards compatibility with this header. String Conditional
x-isi-ifs-target-type Specifies the resource type. For PUT operations, this value can be container or object. For GET operations, this value can be container, object, or any, or this parameter can be omitted. String Yes, for PUT operations. Conditional, for GET operations.

The following curl syntax can be used to instruct RAN to create a file, or ‘object’:

# curl -X PUT --insecure --basic --user <username>:<passwd> -H "x-isi-ifs-target-type:object" https://<cluster_ip>:8080/namespace/<path>/<file>/

For example, to create ‘testfile1’ under ‘/ifs/data’:

# ls -lsia /ifs/data/testfile1

ls: /ifs/data/testfile1: No such file or directory

# curl -X PUT --insecure --basic --user <username>:<passwd> -H "x-isi-ifs-target-type:object" https://10.1.10.20:8080/namespace/ifs/data/testfile1/

# ls -lsia /ifs/data/testfile1

6668484639 64 -rw-------     1 root  wheel  0 Aug 28 00:58 /ifs/data/testfile1
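For scripted use, the same create request can be composed with Python’s standard library. The following is a sketch only: ‘build_create_request’ is a hypothetical helper, authentication is omitted for brevity, and the request is constructed but not sent.

```python
import urllib.request


def build_create_request(cluster, path, target_type="object", port=8080):
    """Compose (but do not send) a RAN PUT request that creates a file or directory."""
    url = "https://%s:%d/namespace%s" % (cluster, port, path)
    req = urllib.request.Request(url, method="PUT")
    # 'object' creates a file, 'container' creates a directory.
    req.add_header("x-isi-ifs-target-type", target_type)
    return req


req = build_create_request("10.1.10.20", "/ifs/data/testfile1")
print(req.get_method(), req.full_url)
```

To actually issue the request, an Authorization header would need to be added and the object passed to urllib.request.urlopen(), along with an SSL context appropriate to the cluster’s certificate.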

And to read the contents of the file via RAN:

# echo "This is testfile1" > /ifs/data-other/testfile1

# curl -X GET --insecure --basic --user <username>:<passwd> https://10.1.10.20:8080/namespace/ifs/data-other/testfile1

This is testfile1

Or using the ‘POST’ option to move the file, say from the /ifs/data/ directory to /ifs/data-other/:

# curl -X POST --insecure --basic --user <username>:<passwd> --header "x-isi-ifs-target-type:object" --header "x-isi-ifs-set-location:/namespace/ifs/data-other/testfile1" https://10.1.10.20:8080/namespace/ifs/data/testfile1/

Then using ‘PUT’ in conjunction with the ‘clone’ option and the ‘x-isi-ifs-copy-source’ header to create a clone of ‘/ifs/data-other/testfile1’ under /ifs/data:

# curl -X PUT --insecure --basic --user <username>:<passwd> --header "x-isi-ifs-copy-source:/namespace/ifs/data-other/testfile1" "https://10.1.10.20:8080/namespace/ifs/data/testfile1?clone=true"

Note that, if the response body contains a JSON message, the operation has partially failed. If the server fails to initiate a copy due to an issue, such as an invalid copy source, an error is returned. If the server initiates the copy, and then fails, ‘copy_errors’ are returned in structured JSON format. Because the copy operation is synchronous, the client cannot stop an ongoing copy operation or check the status of a copy operation asynchronously.
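A client script can check for this partial-failure case by inspecting the response body. The sketch below assumes only that the errors arrive under a top-level ‘copy_errors’ key, as described above; the fields inside each error entry in the sample are illustrative placeholders, not the documented schema.

```python
import json


def copy_errors(body):
    """Return the list of copy errors in a RAN response body, or [] if none."""
    if not body or not body.strip():
        return []  # empty body: nothing was reported
    try:
        payload = json.loads(body)
    except ValueError:
        return []  # not JSON, so no structured copy errors to report
    if not isinstance(payload, dict):
        return []
    return payload.get("copy_errors", [])


# Illustrative sample only; real error entries may carry different fields.
sample = '{"copy_errors": [{"source": "/ifs/data-other/testfile1", "message": "example failure"}]}'
for err in copy_errors(sample):
    print(err)
```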

To remove a file, the ‘DELETE’ option can be used in the request. For example, to delete ‘testfile1’:

# curl -X DELETE --insecure --basic --user <username>:<passwd> -H "x-isi-ifs-target-type:object" https://10.1.10.20:8080/namespace/ifs/data/testfile1/

The following curl ‘PUT’ syntax can be used to create a directory, or ‘container’:

# curl -X PUT --insecure --basic --user <username>:<passwd> --header "x-isi-ifs-target-type:container" https://10.1.10.20:8080/namespace/ifs/data/testdir1/

The ‘HEAD’ option can also be used to view the attributes of the directory, including its ACL (x-isi-ifs-access-control). For example:

# curl --head --insecure --basic --user <username>:<passwd> https://10.1.10.20:8080/namespace/ifs/data/testdir1

HTTP/1.1 200 Ok

Date: Wed, 28 Aug 2024 01:29:16 GMT

Server: Apache

Allow: GET, PUT, POST, DELETE, HEAD

Etag: "6668484641-18446744073709551615-1"

Last-Modified: Wed, 28 Aug 2024 01:16:28 GMT

x-isi-ifs-access-control: 0700

x-isi-ifs-spec-version: 1.0

x-isi-ifs-target-type: container

X-Frame-Options: sameorigin

X-Content-Type-Options: nosniff

X-XSS-Protection: 1; mode=block

Strict-Transport-Security: max-age=31536000;

Content-Security-Policy: default-src 'none'

Content-Type: application/json
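When the ‘x-isi-ifs-access-control’ header carries an octal POSIX mode, as in the output above, it can be rendered in the familiar ‘ls -l’ style. This is a minimal sketch: ‘describe_mode’ is a hypothetical helper, and it assumes the header value is a plain octal mode string.

```python
import stat


def describe_mode(access_control, target_type):
    """Render an octal mode string such as '0700' in 'ls -l' style."""
    mode = int(access_control, 8)
    # Containers are directories; anything else is treated as a regular file.
    kind = stat.S_IFDIR if target_type == "container" else stat.S_IFREG
    return stat.filemode(kind | mode)


print(describe_mode("0700", "container"))  # drwx------
```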

The curl verbose option (-v) provides step by step insight into the HTTP client/server interaction, which can be valuable for debugging. For example, the output from a request to create the file /ifs/data/testfile2:

# curl -v -X PUT --insecure --basic --user <name>:<passwd> --header "x-isi-ifs-target-type:object" https://10.1.10.20:8080/namespace/ifs/data/testfile2/
*   Trying 10.1.10.20:8080...
* Connected to 10.1.10.20 (10.1.10.20) port 8080
* ALPN: curl offers http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 / [blank] / UNDEF
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: C=US; ST=Washington; L=Seattle; O=Isilon Systems, Inc.; OU=Isilon Systems; CN=Isilon Systems; emailAddress=support@isilon.com
*  start date: Aug  4 17:39:14 2024 GMT
*  expire date: Nov  6 17:39:14 2026 GMT
*  issuer: C=US; ST=Washington; L=Seattle; O=Isilon Systems, Inc.; OU=Isilon Systems; CN=Isilon Systems; emailAddress=support@isilon.com
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* using HTTP/1.x
* Server auth using Basic with user 'root'
> PUT /namespace/ifs/data/testfile2/ HTTP/1.1
> Host: 10.1.10.20:8080
> Authorization: Basic cm9vdDph
> User-Agent: curl/8.7.1
> Accept: */*
> x-isi-ifs-target-type:object
> 
* Request completely sent off
< HTTP/1.1 200 Ok
< Date: Wed, 28 Aug 2024 00:46:36 GMT
< Server: Apache
< Allow: GET, PUT, POST, DELETE, HEAD
< x-isi-ifs-spec-version: 1.0
< X-Frame-Options: sameorigin
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000;
< Content-Security-Policy: default-src 'none'
< Transfer-Encoding: chunked
< Content-Type: text/plain
< 
* Connection #0 to host 10.1.10.20 left intact
#
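Note the ‘Authorization: Basic’ line in the verbose output. With HTTP Basic Authentication the credentials are only base64-encoded, not encrypted, so they are protected solely by the TLS session. The following sketch reproduces the header value shown above; the helper names are illustrative.

```python
import base64


def basic_auth_header(user, passwd):
    """Return the Authorization header value that curl's --basic option sends."""
    token = base64.b64encode(("%s:%s" % (user, passwd)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(value):
    """Recover 'user:password' from a Basic Authorization header value."""
    return base64.b64decode(value.split(" ", 1)[1]).decode("utf-8")


print(basic_auth_header("root", "a"))       # Basic cm9vdDph
print(decode_basic_auth("Basic cm9vdDph"))  # root:a
```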

Beyond this, crafting more complex HTTP requests with the curl utility can become unwieldy, and more powerful development tools can be beneficial instead. A solid understanding of HTTP/1.1, and experience writing HTTP-based client software, is also advisable before getting too heavily involved with implementing the RAN API in production environments.

OneFS Namespace API (RAN)

 Among the array of HTTP services, and in addition to the platform API, OneFS also provides a namespace API.

This RESTful Access to Namespace, or RAN, provides HTTP access to any of the files and directories within a cluster’s /ifs filesystem hierarchy.

RAN can be accessed by making an HTTP call that references the /namespace/ API, rather than the /platform/ API that we’ve seen in the previous articles in this series. For example:

namespace == “http share”

This provides HTTPS access to the files and directories on the filesystem, but as a data management API rather than an object protocol.

As you would expect, by default, the root of a cluster’s RAN namespace is the top level /ifs directory:

Namespace resources are accessed through a URL, such as:

Where:

Attribute Description
Access point Access Point is the root path of the file system endpoint (RAN share), i.e. /ifs.
Authority Authority is the IP address or FQDN for the cluster.
Container Container is a directory or folder.
Data object Data object contains content data, such as a file on the system.
Endpoint Endpoint is the targetable URL.
File File is the data object you wish to query/modify.
Namespace Namespace is the file system structure on the cluster.
Object Object is either a container or data object.
Path Path is the file path to the object you want to access.
Port Port is the number of the port; the default port number is 8080.
Scheme Scheme is the access protocol, such as HTTPS.
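To illustrate how these components fit together, the following sketch assembles a namespace URL from its parts. The ‘namespace_url’ function and its parameter names are hypothetical, chosen to mirror the attributes in the table above.

```python
def namespace_url(authority, access_point, path="", scheme="https", port=8080):
    """Assemble a RAN URL from scheme, authority, port, access point, and path."""
    # Normalize slashes so '/ifs', 'ifs', and 'ifs/' all behave the same way.
    segments = [access_point.strip("/")] + [p for p in path.split("/") if p]
    return "%s://%s:%d/namespace/%s" % (scheme, authority, port, "/".join(segments))


print(namespace_url("10.1.10.20", "/ifs", "data/dir1/file1"))
```

Here the access point is ‘/ifs’, the path to the container is ‘data/dir1’, and the data object is ‘file1’.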

For example, the RAN API can be used to access the file ‘file1’, which resides in the ‘dir1’ directory under the access point and path ‘/ifs/data/dir1/’.

From the OneFS CLI, the ‘curl’ utility can be used to craft a ‘GET’ request for this file:

# curl -u <user:password> -k https://10.1.10.20:8080/namespace/ifs/data/dir1/file1

Or from a browser:

Also, using ‘curl’ via the CLI to view the RAN access point:

# curl -X GET https://10.1.10.20:8080/namespace --insecure --basic --user root:a

{"namespaces":[{

   "name" : "ifs",

   "path" : "/ifs"

}

]}#
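Since the response is JSON, a script can turn it into a simple lookup of access point names and paths. A minimal sketch, using the response shown above as sample input (‘access_points’ is an illustrative helper name):

```python
import json


def access_points(body):
    """Map each RAN access point name to its file system path."""
    return {ns["name"]: ns["path"] for ns in json.loads(body)["namespaces"]}


sample = '{"namespaces":[{"name" : "ifs", "path" : "/ifs"}]}'
print(access_points(sample))  # {'ifs': '/ifs'}
```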

Additionally, you can append a pre-defined keyword to the end of the URL when you send a request to the namespace. These keywords must be placed first in the argument list and must not contain any value. If these keywords are placed in any other position in the argument list, the keywords are ignored. Pre-defined keywords include: ‘acl’, ‘metadata’, ‘worm’, and ‘query’.

For example:

https://10.1.10.20:8080/namespace/ifs/data/dir1?acl

Or, for metadata:

https://10.1.10.20:8080/namespace/ifs/data/dir1/file1?metadata

A cluster’s files and directories can be accessed programmatically through RAN, much as they are through the SMB or NFS protocols, and access is likewise limited by filesystem permissions. As such, RAN enables the following types of file system operations to be performed:

Operation Action Description
Access points CREATE, DELETE Identify and configure access points (shares) and obtain protocol information.
Directory CREATE, GET, PUT, LIST, DELETE List directory content; get and set directory attributes; delete directories from the file system.
File CREATE, GET, PUT, LIST, DELETE View, move, copy, and delete files from the file system.
Access control GET/SET ACLs Manage user rights; set ACL or POSIX permissions for files and directories. Set access list on access points (RAN Share Permissions).
Query QUERY Search system metadata or extended attributes, and tag files.
SmartLock GET, SET, Commit Allow retention dates to be set on files; commit files to a WORM state.

Additionally, applications or external clients can be built to access RAN in any major programming or scripting language, such as C++, Java, .NET, Python, etc.

Note, though, that RAN access in general is disabled for clusters running in ‘hardened’ mode. In such cases a warning will be displayed notifying that HTTP browse is disabled, similar to the following:

OneFS Platform API Configuration, Management, and Monitoring

In addition to the platform API (pAPI) and RESTful access to a cluster’s namespace (RAN), OneFS makes extensive use of HTTP for a variety of services and client protocols.

As such, OneFS also supports the following HTTP-based services:

Service Description Ports
PlatformAPI OneFS platform API service, for remote cluster management. TCP 8080
PowerScaleUI OneFS WebUI configuration and management console. TCP 8080
RAN RESTful Access to Namespace, allowing cluster data access via HTTP. TCP 8080
RemoteService Remote-Service API  handlers under the /remote-service/ namespace, managed by isi_rsapi_d. TCP 8080
S3 AWS S3 object protocol. TCP 9020 (http) TCP 9021 (https)
SWIFT SWIFT object protocol (deprecated in favor of S3). TCP 8083
WebHDFS WebHDFS over HTTP. TCP 8082

In OneFS 9.4 and later, the above HTTP services may be enabled or disabled independently via the CLI or platform API, by a user account with the ‘ISI_PRIV_HTTP’ RBAC privilege.

The ‘isi http services’ CLI command set can be used to view and modify the HTTP services. For example, remote HTTP access to the platform API can easily be disabled as follows:

 # isi http services modify Platform-API-External --enabled=0

You are about to modify the service Platform-API-External. If you disable Platform-API-External then PowerScaleUI will also be disabled. Are you sure? (yes/[no]):

Similarly, a subset of the HTTP configuration settings, including WebDAV, can also be viewed and modified via the WebUI by navigating to Protocols > HTTP settings:

The current state of the OneFS web services can also be listed from the CLI via the ‘isi http services’ command set. For example:

# isi http services list

ID                    Enabled

------------------------------

Platform-API-External Yes

PowerScaleUI          Yes

RAN                   Yes

RemoteService         Yes

------------------------------

Total: 4
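For automation, the tabular CLI output can be reduced to a dictionary of service states. The sketch below parses text in the format shown above; it assumes that a disabled service is reported as ‘No’, and ‘parse_services’ is a hypothetical helper name.

```python
def parse_services(output):
    """Convert 'isi http services list' output into {service_id: enabled}."""
    services = {}
    for line in output.splitlines():
        fields = line.split()
        # Keep only rows of the form '<ID> Yes|No'; this skips the header,
        # the separator rules, blank lines, and the 'Total:' footer.
        if len(fields) == 2 and fields[1] in ("Yes", "No"):
            services[fields[0]] = fields[1] == "Yes"
    return services


sample = """ID                    Enabled
------------------------------
Platform-API-External Yes
PowerScaleUI          Yes
RAN                   Yes
RemoteService         Yes
------------------------------
Total: 4"""
print(parse_services(sample))
```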

The astute will have observed that both S3 and Swift are notably absent from the above list of OneFS HTTP services.

Since S3 has become the de facto object protocol, after a period of gradual deprecation the OpenStack Swift protocol & API has finally been completely removed in OneFS 9.9. That said, Swift will remain available and supported in OneFS 9.8 and earlier releases, until their respective end of support dates.

Also, while the S3 service uses HTTP as its transport, it is considered a tier-1 protocol, and as such is managed via its own ‘isi s3’ CLI command set, corresponding WebUI area, and platform API endpoints:

In the example above, the ‘?describe&list’ suffix provides all of the S3 pAPI endpoints.

Another useful facet is that the OneFS command line syntax provides a ‘--debug’ option, which displays the associated pAPI endpoint information for each CLI command entered. For example, when querying OneFS for a cluster’s storage pool info, the ‘GET ['9', 'storagepool', 'storagepools']’ endpoint is being used by the CLI command:

# isi --debug storagepool list

2024-08-14 07:33:01,652 DEBUG rest.py:72: >>>GET ['9', 'storagepool', 'storagepools']

2024-08-14 07:33:01,652 DEBUG rest.py:74:    args={}

   body={}

2024-08-14 07:33:01,752 DEBUG rest.py:96: <<<(200, {'status': '200 Ok', 'content-type': 'application/json', 'allow': 'GET, HEAD'}, '\n{\n"storagepools" : \n[\n\n{\n"can_disable_l3" : true,\n"can_enable_l3" : false,\n"health_flags" : [],\n"id" : 1,\n"l3" : false,\n"l3_status" : "storage",\n"lnns" : [ 1, 2, 3 ],\n"manual" 

<snip>

So the corresponding pAPI URL to the ‘isi storagepool list’ CLI command is:

Or via curl:

# curl --insecure --basic --user <uname:passwd> https://10.1.10.20:8080/platform/9/storagepool/storagepools

{

"storagepools" :

[

{

"can_disable_l3" : true,

"can_enable_l3" : false,

"health_flags" : [],

"id" : 1,

"l3" : false,

"l3_status" : "storage",

"lnns" : [ 1, 2, 3 ],

"manual" : false,

<snip>

In addition to curl, the OneFS API endpoints can also be incorporated into script languages such as bash, perl, powershell, python, etc. This provides a powerful option for automating routine cluster management tasks.

For example, a python script along the lines of the following can be used to view a cluster’s critical events:

#!/usr/bin/python
import json
import sys

import requests
import urllib3

# Suppress the self-signed certificate warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

CLUSTERIP = '10.1.10.20'
PORT = 8080
USER = 'root'
PASSWD = '$1cyL@wn'

# URI of the cluster, also used in the referer header
uri = "https://%s:%s" % (CLUSTERIP, PORT)

# URL of pAPI, used for all further calls to pAPI
papi = uri + '/platform'

# Set header, as content will be provided in JSON format
headers = {'Content-Type': 'application/json'}

# Create JSON dictionary for auth
data = json.dumps({'username': USER, 'password': PASSWD, 'services': ['platform']})

# Create a session object to hold cookies
session = requests.Session()

# Establish session using auth credentials
response = session.post(uri + "/session/1/session", data=data, headers=headers, verify=False)

if 200 <= response.status_code < 300:
    # Set headers for CSRF protection. Without these two headers all further calls will be "auth denied"
    session.headers['referer'] = uri
    session.headers['X-CSRF-Token'] = session.cookies.get('isicsrf')
    print("Authorization Successful")
else:
    print("Authorization Failed")
    print(response.content)
    # Stop here, since every subsequent call would be rejected
    sys.exit(1)

endpoint = '/7/event/eventlists'
response = session.get(papi + endpoint, verify=False)
result = json.loads(response.content)

# Iterate through each event in each eventlist and output only critical events
for eventlist in result['eventlists']:
    for event in eventlist['events']:
        if event['severity'] == 'critical':
            print("Event ID: %s -- Event Severity: %s -- Description: %s " % (event['event'], event['severity'], event['message']))

Note that the ‘CLUSTERIP’, ‘USER’, and ‘PASSWD’ fields in the above python script will need to be edited appropriately, to reflect a cluster’s settings.

There is also an extensive OneFS API portal and developer community:

https://www.delltechnologies.com/en-us/storage/storage-automation-and-developer-resources/index.htm

This portal provides a central location for all the Dell ecosystem integrations (plugins), including CSI drivers, VMware, Containers, DevOps, Infrastructure as Code (IaC), OpenStack, etc. It also provides community forums to collaborate, post questions, discuss ideas, share tips & tricks, etc. – in addition to code samples and ready to use integrations for developers.

PowerScale OneFS 9.9

Dell PowerScale is already powering up summer with the launch of the innovative OneFS 9.9 release, which shipped today (13th August 2024). This new 9.9 release is an all-rounder, introducing PowerScale innovations in performance, security, serviceability, protocols, and ease of use.

OneFS 9.9 delivers the next version of PowerScale’s common software platform for on-prem and cloud deployments. This can make it a solid fit for traditional file shares and home directories, vertical workloads like M&E, healthcare, life sciences, financial services, and next-gen AI, ML and analytics applications.

PowerScale’s scale-out architecture can be deployed on-site, in co-lo facilities, or as customer managed Amazon AWS and Microsoft Azure deployments, providing core to edge to cloud flexibility, plus the scale and performance needed to run a variety of unstructured workflows on-prem or in the public cloud.

With data security, detection, and monitoring being top of mind in this era of unprecedented cyber threats, OneFS 9.9 brings an array of new features and functionality to keep your unstructured data and workloads more available, manageable, and secure than ever.

Monitoring and Alerting

On the monitoring and reporting front, OneFS 9.9 sees the debut of an automatic maintenance mode, allowing a maintenance window to be automatically triggered during an upgrade, node reboot, shutdown, or similarly disruptive event. In concert with this are the ‘noise reduction’ benefits provided by a new superfluous alerting and call-home suppression feature. Additionally, OneFS 9.9 also adds IPv6 networking support to its SupportAssist ‘phone home’ service.

Security

OneFS 9.9 now provides class of service (CoS) and quality of service (QoS) tagging via the introduction of Differentiated Service Code Point (DSCP) support within the OneFS firewall. This allows storage and network administrators to classify and separate network traffic, such as transactional data, management, bulk data, etc, in line with an organization’s security mandate and operational needs.

Upgrading

Non-disruptive cluster upgrades (NDU) get a boost in OneFS 9.9 from both enhanced SMB connection management and a new pre-upgrade health check system. The latter identifies potential issues before an upgrade can commence, improving the reliability and efficiency of system updates, and ensuring minimal disruption and optimal system performance post-upgrade.

Hardware Innovation

On the platform hardware front, OneFS 9.9 also unlocks dramatic performance and capacity enhancements – particularly for the all-flash F710, which sees the introduction of support for 61TB QLC SSDs, plus a 200Gb Ethernet backend network.

Performance

Performance is further boosted in OneFS 9.9 with the release of an NFS multipath client driver. For successful large-scale AI model customization and training, GPUs need data served to them quickly and efficiently. As such, compute and storage must be sized and deployed accordingly to eliminate potential bottlenecks in the infrastructure.

To meet this demand, the new multipath client driver aggregates the performance of multiple PowerScale nodes through a single NFS mount point, to one or many NFS clients. This dramatically increases the I/O available through that mount point, for higher single-client throughput.

This can directly benefit generative AI and machine learning environments, plus other workloads involving highly concurrent streaming reads and writes of different files from individual, high throughput capable Linux servers. As such, the multipath client driver enables PowerScale to deliver the first Ethernet storage solution validated on the NVIDIA DGX SuperPOD.

In summary, OneFS 9.9 brings the following new features and functionality to the Dell PowerScale ecosystem:

Feature Info
Events and Alerts · Automatic maintenance mode. · Superfluous alerting suppression.
Upgrade · NDU improved pre-upgrade healthchecks. · SMB disruption reduction.
Protocols · Support for a multipath NFS driver for Linux client systems. · Complete removal of deprecated SWIFT object protocol.
Networking · Nvidia Spectrum-4 switch support for F710 200GbE. · Support for in-field Ethernet adapter (NIC) swaps.
CoS / QoS Tagging · Class of service and quality of service tagging enabled via the introduction of Differentiated Service Code Point (DSCP) support.
Platform · Infrastructure support for 61TB QLC drives and a 200Gb/s back-end Ethernet network on the all-flash F710 platform.
Support · IPv6 networking added for SupportAssist phone home service.

Additionally, since S3 has become the de facto object protocol due to its extensive features, robust performance, and wide industry adoption, while the OpenStack community’s focus has shifted significantly, OneFS 9.9 sees the complete removal of the OpenStack Swift protocol & API. Swift will remain available and supported in OneFS 9.8 and earlier, until their respective end of support dates.

We’ll be taking a deeper look at OneFS 9.9’s new features and functionality in future blog articles over the course of the next few weeks.

Meanwhile, the new OneFS 9.9 code is available on the Dell Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.

For existing clusters running a prior OneFS release, the recommendation is to open a Service Request to schedule an upgrade. To provide a consistent and positive upgrade experience, Dell EMC is offering assisted upgrades to OneFS 9.9 at no cost to customers with a valid support contract. Please refer to Knowledge Base article KB544296 for additional information on how to initiate the upgrade process.

OneFS Platform API Architecture and Operation

Upon receiving an HTTP request sent through the OneFS API, the cluster’s web server (Apache) verifies the username and password credentials – either through HTTP Basic Authentication for single requests or via an established session to a single node for multiple requests.

Once the user has been successfully authenticated, OneFS role-based access control (RBAC) then verifies the privileges associated with the account and, if sufficient, enables access to either the /ifs file system, or to the cluster configuration, as specified in the request URL.

The request URL that calls the API comprises a base URL and endpoint:

And the response is usually along the lines of the following:

Or from the CLI using the ‘curl’ utility:

# curl -X GET https://10.224.127.8:8080/platform/21/audit/settings/global --insecure --basic --user <username:password>

{
"settings" :
{
"audited_zones" : [],
"auto_purging_enabled" : false,
"cee_server_uris" : [],
"config_auditing_enabled" : false,
"config_syslog_certificate_id" : "",
"config_syslog_enabled" : false,
"config_syslog_servers" : [],
"config_syslog_tls_enabled" : false,
"hostname" : "",
"protocol_auditing_enabled" : false,
"protocol_syslog_certificate_id" : "",
"protocol_syslog_servers" : [],
"protocol_syslog_tls_enabled" : false,
"retention_period" : 180,
"system_auditing_enabled" : false,
"system_syslog_certificate_id" : "",
"system_syslog_enabled" : false,
"system_syslog_servers" : [],
"system_syslog_tls_enabled" : false
}
}

From the CLI, the ‘curl’ output can be parsed using standard shell commands and utilities. For example, to find the cluster’s OneFS version:

# curl -X GET https://10.224.127.8:8080/platform/21/cluster/config --insecure --basic --user root:a | grep -i release

"release" : "9.8.0.0",
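As an alternative to grep, a script can parse the JSON and look up the key directly. The sketch below searches nested JSON for the first ‘release’ key, so it does not depend on exactly where that key sits in the response. The sample document is a simplified, hypothetical stand-in for the real ‘cluster/config’ output.

```python
import json


def find_key(obj, key):
    """Depth-first search for the first occurrence of key in nested JSON data."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical, heavily trimmed sample; the real response has many more fields.
sample = '{"name": "cluster1", "onefs_version": {"release": "9.8.0.0"}}'
print(find_key(json.loads(sample), "release"))  # 9.8.0.0
```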

In the event that authentication fails for some reason, a response similar to the following notification will be returned:

When it comes to finding the appropriate platform API endpoint, the following ‘describe’ suffix can be used at the base level (platform) within the URL to list all the available options:

https://<cluster_IP>:8080/platform?describe

For example:

As can be seen above, the endpoints are appropriately named to aid navigation of the API.

When used on an endpoint, rather than a path, the ‘describe’ option returns a JSON collection of methods and fields. In addition to the supported methods (GET, POST, DELETE), the output also includes all supported fields and types. Given the breadth of information, it is best viewed from a web browser using a JSON viewer add-on/extension, such as the popular ‘JSONview’ utility. For example:

In addition to the ‘describe’ syntax, the platform API also recognizes the ‘list’ suffix. This can be used at any point in the API hierarchy, in conjunction with ‘describe’, to report the available endpoint(s) for a particular OneFS feature or function. For example, to show the pAPI options for the S3 protocol:

https://<cluster_IP>:8080/platform/21/protocols/S3?describe&list

For example, we can see that OneFS currently provides seven API endpoints for the S3 protocol:

Note that the numerical ‘protocol version’ must be included in the URL – in this case version 21 (the most current).

Typically, new features and API primitives are added to new releases by incrementing the pAPI version number. As such, endpoint functionality is consistent within each version. For example, the /1/cluster/config endpoint is not changed to add new functionality, and any new calls and features are uprev’d and placed into the next version. That said, the version number is not guaranteed to be a whole number. For example, an incremental version number (v5.1) was introduced back in OneFS 8.1.0.4 to accommodate the NDU rolling reboot endpoint. Querying a particular version will only report the API endpoints that were available at that point in time. For instance, since the S3 protocol was only added in pAPI v10, querying earlier versions will not return any of the seven current endpoints:
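Because version numbers such as 5.1 and 10 are not all whole numbers, scripts that compare them should avoid plain string comparison (where "5.1" sorts after "10"). A minimal sketch of one approach, comparing the dotted components numerically; the function names are illustrative:

```python
def papi_version(text):
    """Split a pAPI version such as '5.1' into a tuple of integers."""
    return tuple(int(part) for part in str(text).split("."))


def endpoint_available(introduced_in, queried_version):
    """True if an endpoint introduced in one version is visible in another."""
    return papi_version(queried_version) >= papi_version(introduced_in)


# S3 endpoints arrived in v10, so they appear under v21 but not under v9.
print(endpoint_available("10", "21"))  # True
print(endpoint_available("10", "9"))   # False
```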

Additionally, if you already know the CLI command for something you can get the REST endpoint etc. by simply running:

# isi --debug <command>

The output from the debug flag contains the HTTP REST traffic in both directions, including both endpoint and payloads. For example, take the following ‘isi dedupe settings view’ CLI command output, both with and without the ‘--debug’ flag:

# isi dedupe settings view

Dedupe Schedule: -

          Paths: /ifs/data

   Assess Paths: -

And:

# isi --debug dedupe settings view

2024-07-21 21:59:52,488 DEBUG rest.py:80: >>>GET ['1', 'dedupe', 'settings']

2024-07-21 21:59:52,488 DEBUG rest.py:81:    args={}

   body={}

2024-07-21 21:59:52,503 DEBUG rest.py:106: <<<(200, {'content-type': 'application/json', 'allow': 'GET, PUT, HEAD', 'status': '200 Ok'}, b'\n{\n"settings" : \n{\n"assess_paths" : [],\n"dedupe_schedule" : null,\n"paths" : [ "/ifs/data" ]\n}\n}\n')

Dedupe Schedule: -

          Paths: /ifs/data

   Assess Paths: -

The first line of output above shows that the equivalent endpoint for this CLI command is:

/1/dedupe/settings/

From this, the API URL can be inferred:

# curl -X GET https://10.1.10.20:8080/platform/1/dedupe/settings/ --insecure --basic --user <username>:<password>

{

"settings" :

{

"assess_paths" : [],

"dedupe_schedule" : null,

"paths" : [ "/ifs/data" ]

}

}
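The step of inferring the URL from the debug output can itself be scripted. The following sketch extracts the method and path segments from a ‘>>>’ request line in the format shown above and joins them under /platform. The ‘debug_to_url’ helper is hypothetical, and the regular expression assumes the debug format stays as shown.

```python
import ast
import re

# Matches the request line emitted by 'isi --debug', e.g. >>>GET ['1', 'dedupe', 'settings']
DEBUG_REQUEST = re.compile(r">>>(GET|PUT|POST|DELETE|HEAD) (\[.*\])")


def debug_to_url(line, cluster, port=8080):
    """Return (method, URL) for a debug request line, or None if it does not match."""
    match = DEBUG_REQUEST.search(line)
    if match is None:
        return None
    method, raw = match.groups()
    # The path is printed as a Python-style list literal, so parse it safely.
    segments = ast.literal_eval(raw)
    return method, "https://%s:%d/platform/%s" % (cluster, port, "/".join(segments))


line = "2024-07-21 21:59:52,488 DEBUG rest.py:80: >>>GET ['1', 'dedupe', 'settings']"
print(debug_to_url(line, "10.1.10.20"))
```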

In addition to ‘curl’, OneFS also offers an ‘isi_papi_tool’ CLI utility for querying a node’s platform API endpoints. For example, the following syntax can be used to view the status of a node’s SMB sessions via the ‘16/protocols/smb/sessions’ endpoint:

# isi_papi_tool GET 16/protocols/smb/sessions | egrep "computer|client_type|encryption"

Or to see all the SMB sessions across the entire cluster with ‘isi_papi_tool’:

# for node in $(isi_nodes %{lnn}); do echo Sessions on Node $node:; isi_papi_tool GET 16/protocols/smb/sessions\?lnn=$node | egrep "computer|client_type|encryption";done

Sessions on Node 1:

Sessions on Node 2:

"client_type" : "SMB 3.1.1",

"computer" : "10.1.10.22",

"encryption" : false,

Sessions on Node 3:

Sessions on Node 4:

Sessions on Node 5:

"client_type" : "SMB 3.1.1",

"computer" : "10.1.10.25",

"encryption" : true,

Sessions on Node 9:
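Rather than filtering with egrep, the JSON returned by the sessions endpoint can be parsed directly. This sketch assumes the response wraps its entries in a top-level ‘sessions’ list, which is not shown in the output above; the per-session field names are taken from that output, and the sample payload is illustrative.

```python
import json


def summarize_sessions(body):
    """Return (computer, client_type, encryption) for each SMB session."""
    # Assumption: entries are held in a top-level 'sessions' list.
    sessions = json.loads(body).get("sessions", [])
    return [(s.get("computer"), s.get("client_type"), s.get("encryption")) for s in sessions]


sample = json.dumps({"sessions": [
    {"client_type": "SMB 3.1.1", "computer": "10.1.10.22", "encryption": False},
    {"client_type": "SMB 3.1.1", "computer": "10.1.10.25", "encryption": True},
]})
for computer, client_type, encrypted in summarize_sessions(sample):
    print(computer, client_type, "encrypted" if encrypted else "unencrypted")
```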

In the next article in this series, we’ll turn our attention to OneFS RAN, the RESTful namespace access API.

OneFS Web APIs

In addition to the OneFS WebUI and CLI administrative management interfaces, a PowerScale cluster can also be accessed, queried and configured via a representational state transfer (RESTful) API. This API includes a superset of the Web and CLI interfaces and provides the additional benefit of being easily programmable. As such, it allows most of the cluster’s administrative tasks to be scripted and automated.

RESTful APIs are web-based (HTTP or HTTPS) interfaces that use the HTTP methods, combined with the URL (uniform resource locator), to undertake a predefined action. The URL can describe either a collection of objects (e.g. ‘https://papi.isln.com:8080/<resources>/’) or an individual object from a collection (e.g. ‘https://papi.isln.com:8080/<resources>/<object>’).

There are typically six principal HTTP operations, or ‘methods’:

Method | Object | Collection
GET | Retrieve a representation of the addressed member of the collection. | List the URIs and (optionally) additional details of a collection’s members.
PUT | Replace or create the addressed member of a collection. | Replace the entire collection with another collection.
POST | Infrequently used to promote an element to a collection in its own right, creating a new object within it. | Create a new entry in the collection. The new entry’s URI is typically automatically assigned and usually returned by the operation.
PATCH | Update the addressed member of a collection. | Rarely used.
DELETE | Delete the addressed member of a collection. | Delete an entire collection.
HEAD | Returns response header metadata without the response body content. | Returns response header metadata without the response body content.

For a given application programming interface (API), the path component typically conveys specific meaning, or ‘representational state’, in keeping with the RESTful spec. The ‘human readability’ of a RESTful endpoint can be seen, for example, by looking at a request for a cluster’s SMB shares information:

As shown above, the URL is composed of distinct parts:

Component | Description
Scheme | The protocol used for the request (HTTP or HTTPS).
Authority | IP address (<cluster_ip>) and TCP port (<port>) of the cluster.
Path | HTTP path to the endpoint.
Query | The specific endpoint and data requested.
Fragment | Occasionally the query is subdivided, such as ‘query#fragment’.
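The same decomposition can be reproduced with Python's standard library. This is a minimal sketch: the URL below is hypothetical (the address, API version, query parameter, and fragment are illustrative), and urlsplit() follows the generic URI syntax, in which the query is strictly the part after the '?', so its notion of 'query' is narrower than the description in the table.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: the address, API version, query and fragment are illustrative.
url = "https://10.1.10.20:8080/platform/16/protocols/smb/sessions?lnn=2#fragment"

parts = urlsplit(url)
components = {
    "scheme": parts.scheme,          # protocol used for the request
    "authority": parts.netloc,       # host and TCP port
    "path": parts.path,              # path to the endpoint
    "query": parse_qs(parts.query),  # parameters after the '?'
    "fragment": parts.fragment,      # optional part after the '#'
}
for name, value in components.items():
    print(f"{name:10} {value}")
```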

Additionally, OneFS also uses the following API definitions, which are worth understanding:

Item | Description
Access point | Root path of the URL to the file system. An access point can be defined for any directory in the file system.
Collection | Group of objects of a similar type. For example, all the user-defined quotas on a cluster make up a collection of quotas.
Data object | An object that contains content data, such as a file on the system.
Endpoint | Point of access to a resource, comprising a path, query, and sometimes fragment(s).
Namespace | The file system structure on the cluster.
Object | Containers or data objects. Also known as system configuration data that a user creates, or a global setting on the system. User-created objects include snapshots, quotas, shares, exports, replication policies, etc. Global settings include default share settings, HTTP settings, snapshot settings, etc.
Platform | Indicates pAPI and the OneFS configuration hierarchy.
Resource | An object, collection, or function that you can access by a URI.
Version | The version of the OneFS API. It is an optional component, as OneFS automatically uses the latest API.

At a high level, the overall OneFS API is divided into two distinct sections:

Section | API | Description
Namespace | RAN | Enables operations on files and directories on the cluster.
Platform | pAPI | Provides endpoints for cluster configuration, management, and monitoring functionality.

As such, the general topology is as follows:

The Platform API (pAPI) provides a variety of endpoints for managing the administrative aspects of a PowerScale cluster. Indeed, the OneFS CLI and WebUI both use these pAPI handlers to facilitate their cluster config and management functionality, so pAPI represents a superset of both user interfaces.

For file system configuration API requests, the resource URI is composed of the following components:

 https://<cluster_ip>:<port>/<api>/<version>/<path>/<query>

For example, a GET request sent to the following platform URI will return all the SMB shares on a cluster. Here, ‘platform’ indicates pAPI, ‘17’ is the API version, ‘protocols’ is the configuration area, ‘smb’ is the collection name, and ‘shares’ is the object ID:

GET https://10.1.10.20:8080/platform/17/protocols/smb/shares

By way of contrast, file system access API requests are served by the RESTful Access to Namespace (RAN) API. RAN uses resource URIs, which are composed of the following components:

https://<cluster_ip>:<port>/<access_point>/<resource_path>
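The two URI layouts can be sketched as a pair of small Python helpers. This is an illustration under stated assumptions rather than an official client: the function names are hypothetical, the platform helper assumes the 'platform' prefix and version number seen in the SMB shares request, and the namespace helper assumes that RAN URIs begin with 'namespace' followed by an access point such as 'ifs'. No request is actually sent.

```python
from urllib.parse import quote

def papi_uri(cluster_ip, port, version, path):
    """Build a platform API (pAPI) URI: /platform/<version>/<path>."""
    return f"https://{cluster_ip}:{port}/platform/{version}/{path.strip('/')}"

def ran_uri(cluster_ip, port, access_point, resource_path):
    """Build a namespace (RAN) URI: /namespace/<access_point>/<resource_path>."""
    # Percent-encode characters such as spaces, but keep the '/' separators.
    resource = quote(resource_path.strip("/"))
    return f"https://{cluster_ip}:{port}/namespace/{access_point}/{resource}"

print(papi_uri("10.1.10.20", 8080, 17, "protocols/smb/shares"))
print(ran_uri("10.1.10.20", 8080, "ifs", "data/dir1"))
```

Keeping the version as an explicit argument makes it obvious which API level a script was written against, even though the version component is described as optional.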

For example, a GET request to the following RAN URI will return the files that are stored within the namespace under /ifs/data/dir1:

GET https://10.1.10.20:8080/namespace/ifs/data/dir1

The response will look something like the following:

In the next couple of articles in this series we’ll dig into the architecture and details of the platform (pAPI) and namespace (RAN) APIs in more depth.

OneFS IceAge and Automated Core File Analysis

The curious and observant may have noticed the appearance of a new service in OneFS 9.8, namely isi_iceage_d.

For example:

# isi services -a | grep -i iceage

isi_iceage_d         Ice Age Monitor Daemon                   Enabled

So what exactly is this new IceAge process and what does it do, you may ask?

Well, OneFS IceAge is a Python tool based on lldb, which automatically extracts, optimizes, compresses, and disseminates information from OneFS core files. The goal of this is to streamline the detection and diagnosis of issues and bugs and improve time to resolution.

The IceAge service (IceAge monitor) performs the following core functions:

Function | Description
Detection | Monitoring the /var/crash directory for fresh core files.
Extraction | Extraction (and subsequent removal) of IceAge reports and headers from cores.
Upload | Uploading reports to Dell Backend Services.

The IceAge service runs on a cluster, immediately extracting IceAge reports from any core dumps as they are generated, and writing them to a JSON report file, which is suitable for further processing. Reports also include a stack trace to show the potential crash cause. Information can be extracted without the presence of debug symbols and can also be retroactively annotated with further useful information (such as source code line numbers) once symbols are available. Additional information can also be extracted from debug symbols in order to help debug application-specific data structures from a core.

Once a core has been detected, optimized, and processed, IceAge then uses two principal methods of transmission for the report and header:

Uploader | Description
isi_gather_info | In addition to OneFS logsets, the isi_gather_info utility in OneFS 9.8 and later collects and transmits JSON IceAge reports and headers by default, and retains the ability to send core files on request via command line options.
SupportAssist | Secure Remote Services (SRS) is used to send alerts, log gathers, usage intelligence, and managed device status to the backend. OneFS uses SRS to communicate with Dell Support’s backend systems. OneFS 9.8 introduces the ability to collect and send JSON IceAge reports, and retains the ability to send core files on request via a specific command.

The isi_gather_info command on the cluster gathers various files, including dumps and the output of various commands, and uploads them to Dell Support. The /usr/bin/remotesupport directory contains a set of gather and remote support scripts which are designed to collate specific log information about the cluster. Under this directory is the ‘get_data_iceage’ script which, in conjunction with ‘GetData.sh’, gathers and uploads data about IceAge reports and headers. These scripts are typically called from the Remote Support Shell, which is a simple, limited shell, solely for running these support scripts.

To aid identification, the header files are generated with the following nomenclature:

YYYYMMDD_HHMMSS_$(SWID)_$(RANDOM_GUID)_IceAgeHeader.tgz

For example:

20240712_173427_ELMISL0121YLVD_4793e5ec-3605-41a6-b72c-d3c404059988_IceAgeHeader.tgz
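Because the name follows a fixed pattern, it can be split back into its parts. The following Python is a minimal sketch of that idea: the regular expression and function name are illustrative, and it assumes that the SWID contains no underscores and that the GUID is a lowercase hyphenated UUID, as in the example above.

```python
import re
from datetime import datetime

# Pattern: YYYYMMDD_HHMMSS_<SWID>_<GUID>_IceAgeHeader.tgz
HEADER_RE = re.compile(
    r"^(?P<date>\d{8})_(?P<time>\d{6})_(?P<swid>[^_]+)_"
    r"(?P<guid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})_IceAgeHeader\.tgz$"
)

def parse_header_name(name):
    """Return the timestamp, SWID and GUID of a header file name, or None."""
    m = HEADER_RE.match(name)
    if m is None:
        return None
    return {
        "created": datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S"),
        "swid": m["swid"],
        "guid": m["guid"],
    }

info = parse_header_name(
    "20240712_173427_ELMISL0121YLVD_4793e5ec-3605-41a6-b72c-d3c404059988_IceAgeHeader.tgz"
)
print(info)
```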

The header also includes backtrace information and several important sections from the IceAge JSON report.

When IceAge headers have been created and written out to a temporary file, the temporary file is renamed to match the ESRS backend requirements and is uploaded to Dell (i.e. CloudIQ). If the upload succeeds, the file is removed. However, if the upload fails for any reason, the file is placed into a ‘retry’ state, and a subsequent upload is attempted at the beginning of the next interval. Upload retry files are stored in the ‘/ifs/.ifsvar/iceage-reports/headers/retries’ directory.
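The success and retry handling described above can be sketched as follows. This is an illustration only and not the IceAge implementation: the function names and the 'upload' callable are hypothetical stand-ins for the real transport, and the demo uses a temporary directory rather than the retries directory on /ifs.

```python
import tempfile
from pathlib import Path

def dispatch_header(header: Path, retry_dir: Path, upload) -> bool:
    """Upload one header; delete it on success, park it for retry on failure."""
    try:
        ok = bool(upload(header))
    except OSError:
        ok = False
    if ok:
        header.unlink()
        return True
    retry_dir.mkdir(parents=True, exist_ok=True)
    target = retry_dir / header.name
    if header != target:  # files already in the retry directory stay put
        header.replace(target)
    return False

def retry_pending(retry_dir: Path, upload) -> int:
    """At the start of the next interval, re-attempt everything in retry_dir."""
    sent = 0
    for header in sorted(retry_dir.glob("*_IceAgeHeader.tgz")):
        if dispatch_header(header, retry_dir, upload):
            sent += 1
    return sent

# Demo: the first upload attempt fails, the retry in the next interval succeeds.
with tempfile.TemporaryDirectory() as tmp:
    retries = Path(tmp) / "retries"
    header = Path(tmp) / "20240712_173427_SWID_guid_IceAgeHeader.tgz"
    header.write_bytes(b"placeholder")
    first_attempt = dispatch_header(header, retries, lambda path: False)
    parked = (retries / header.name).exists()
    resent = retry_pending(retries, lambda path: True)
    print(first_attempt, parked, resent)
```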

Architecturally, IceAge looks and operates as follows:

The core isi_iceage_d daemon spawns several additional processes, which run on each node in the cluster. These include:

  • IceAge monitor upload
  • Cluster queue watcher
  • Local core watcher
  • Local core timer

For example:

# ps -auxw | grep -i iceage

root    4668    0.0  0.0  99976  50480  -  S    Sat12        1:34.52 /usr/libexec/isilon/isi_iceage_d /usr/local/lib/python3.8/site

root    4688    0.0  0.0 126200  51996  -  I    Sat12        0:06.87 iceage_monitor_upload (isi_iceage_d)

root   63440    0.0  0.0  99976  50480  -  S    18:33        0:00.00 iceage_monitor: cluster queue watcher (isi_iceage_d)

root   63459    0.0  0.0 102384  50656  -  S    18:33        0:00.00 iceage_monitor: local core watcher (isi_iceage_d)

root   63462    0.0  0.0  99976  50480  -  S    18:33        0:00.00 iceage_monitor: local core timer (isi_iceage_d)

When a OneFS component or service fails and a core file is written to /var/crash, IceAge enters it into a queue under /ifs/.ifsvar/iceage-cores/, in which cores awaiting processing are held. To facilitate this, OneFS creates a temporary crash space on the cluster’s existing drives and provisions an ephemeral UFS file system for IceAge to use. IceAge plug-ins are also provided for several OneFS protocols and data services, such as NFS, SMB, etc, in order to generate more detailed reports from the often large and complex cores derived from issues with these processes.

Additionally, the IceAge cluster monitor service watches for cores in the queue and processes them one by one. This generates a report with a summary of information from the core. These reports can then be transmitted to Dell Support by the isi_gather_info process, or via SupportAssist (ESE).

Enabled by default in OneFS 9.8 and later, the IceAge service is managed by MCP, and can be enabled and disabled via the ‘isi services’ CLI command.

# isi services -a isi_iceage_d

isi: Service 'isi_iceage_d' is enabled.

# isi services -a isi_iceage_d disable

The service 'isi_iceage_d' has been disabled.

# isi services -a isi_iceage_d enable

The service 'isi_iceage_d' has been enabled.

Integration with SupportAssist/ESE and isi_gather_info allows IceAge to automatically and securely send the generated report text files back.

Configuration-wise, the IceAge monitor uses a gconfig file in which parameters such as log level can be specified. For example:

# isi_gconfig -t iceage_monitor

[root] {version:1}

iceage_monitor.queue_max_size_gb (int) = 20

iceage_monitor.retention_period_min (int) = 43800

iceage_monitor.log_level (char*) = INFO

iceage_monitor.header_dispatch (bool) = true

iceage_monitor.min_core_create_time_supported (int) = 1715245735

The above configuration is also exposed via the OneFS PlatformAPI, and any modifications are recorded in the /ifs/.ifsvar/iceage_monitor_config_changes.log file.
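For scripting purposes, the gconfig output shown above can be turned into typed values. The Python below is a minimal sketch that parses a pasted copy of that output; the function name is illustrative, and reading 'retention_period_min' as a value in minutes is an assumption based on the parameter name alone.

```python
import re

SAMPLE = """\
[root] {version:1}
iceage_monitor.queue_max_size_gb (int) = 20
iceage_monitor.retention_period_min (int) = 43800
iceage_monitor.log_level (char*) = INFO
iceage_monitor.header_dispatch (bool) = true
iceage_monitor.min_core_create_time_supported (int) = 1715245735
"""

LINE_RE = re.compile(r"^iceage_monitor\.(\w+) \((int|bool|char\*)\) = (.*)$")

def parse_gconfig(text):
    """Convert 'name (type) = value' lines into a dict of typed settings."""
    settings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the '[root] {version:1}' header and blank lines
        key, kind, value = m.groups()
        if kind == "int":
            settings[key] = int(value)
        elif kind == "bool":
            settings[key] = value == "true"
        else:
            settings[key] = value
    return settings

cfg = parse_gconfig(SAMPLE)
# Assumption: the '_min' suffix means minutes, so divide by minutes per day.
retention_days = cfg["retention_period_min"] / (60 * 24)
print(cfg)
print(round(retention_days, 1), "days")
```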

The basic flow of the IceAge service and SupportAssist transport is as follows:

  1. First, ensure that SupportAssist is configured and running on the cluster:
# isi supportassist settings view | grep -i enabled

Service enabled:  Yes

If not, SupportAssist can be activated as follows:

# isi supportassist settings modify --connection-mode gateway --gateway-host <host_FQDN> --gateway-port 9443 --backup-gateway-host <backup_FQDN> --backup-gateway-port 9443 --network-pools="subnet0.pool0"

Note that the changes made to SupportAssist settings may take some time to take effect.

  2. Next, generate one or more cores. This can be done with the following CLI syntax:
# isi_noatime isi_kcore <PID> /var/crash/<PID>.<service>.core.gz

For example, creating two NFS core files for the processes with PIDs ‘22120’ and ‘22121’ in the following output:

# ps -aux | grep nfs

root   22109   0.0  0.5  54840  30356  -  Ss   17:21     0:00.01 /usr/sbin/isi_netgroup_d -P isi_netgroup_d_nfs

root   22120   0.0  0.4  55000  26652  -  Ss   17:21     0:00.04 /usr/libexec/isilon/nfs proxy nfs /var/run/nfs.pid

root   22121   0.0  0.7 111340  42812  -  S<   17:21     0:00.13 lw-container nfs (nfs)

root   22175   0.0  0.0  14208   2896  0  S+   17:21     0:00.00 grep nfs

# isi_noatime isi_kcore 22120 /var/crash/22120.nfs.core.gz

# isi_noatime isi_kcore 22121 /var/crash/22121.nfs.core.gz

# ls -ltr /var/crash | grep -i core

-rw-------      1 root  daemon     716005 Jul  9 17:22 22120.nfs.core.gz

-rw-------      1 root  daemon    1211863 Jul  9 17:22 22121.nfs.core.gz

  3. Next, the monitor log shows the location of the report file for each core:
# cat /var/log/isi_iceage_monitor.log

For example:

# cat /var/log/isi_iceage_monitor.log

tme2: 2024-07-09T17:23:30.541904+00:00 <3.6> tme-2(id2) isi_iceage_d[4327]: INFO:cluster.py:176 -- Run ClusterProcess with cores: ['/ifs/.ifsvar/iceage-cores/tme-1-1707499378.08631-22121.nfs.core.gz']

tme2: INFO:__main__.py:569 -- IceAge started

tme2: INFO:__main__.py:320 -- Detected information for /ifs/.ifsvar/iceage-cores/tme-1-1707499378.08631-22121.nfs.core.gz:

tme-2: INFO:__main__.py:360 --              build : b.main.4102r

tme-2: INFO:__main__.py:360 --              domain : user

tme-2: INFO:__main__.py:360 --              executable : /usr/likewise/sbin/lwsmd

tme-2: INFO:__main__.py:360 --              handler : lldb

tme-2: INFO:__main__.py:232 -- Calculating space needed...

tme-2: INFO:__main__.py:250 -- 379992064 bytes.

tme-2: INFO:__main__.py:254 -- Setting up scratch space...

tme-2: INFO:__main__.py:259 -- Ready.

tme-2: INFO:__main__.py:385 -- Set vmem limit to 2147483648 for pid 15640

tme-2: INFO:__main__.py:389 -- Loading core...

tme-2: INFO:__main__.py:391 -- Core /ifs/.ifsvar/iceage-cores/tme-1-1707499378.08631-22121.nfs.core.gz loaded.

tme-2: INFO:__main__.py:394 -- Extracting...

<snip>

isi_iceage_d[15637]: INFO:makedigest.py:124 -- Written tgz file: '/ifs/.ifsvar/iceage-reports/headers/20240209_172334_DEFAULTSWID_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz'

tme-2: 2024-07-09T17:23:34.318304+00:00 <3.6> tme-2(id2) isi_iceage_d[15637]: INFO:makedigest.py:124 -- Written tgz file: '/ifs/.ifsvar/iceage-reports/20240709_172334_DEFAULTSWID_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz'

  4. The IceAge JSON files are located under /ifs/.ifsvar/iceage-cores, and contain a wealth of information, including OneFS versions and paths, etc. For example:
# cat tme-2-1720811519.5973-59660.nfs.core.json | grep -i core

  "core-file": "/ifs/.ifsvar/iceage-cores/tme-2-1720811519.5973-59660.nfs.core.gz",

        "set_core_hook": 18446744071587293992,

    "corefile_build": "B_9_8_0_0_003(RELEASE)",

    "corefile_version": "Isilon OneFS 9.8.0.0 (Release, Build B_9_8_0_0_003(RELEASE), 2024-03-11 09:27:38, 0x909005000000003)",
  5. Finally, if SupportAssist is configured on the cluster, the ESE logs can be used to verify that the reports have been successfully transmitted back to Dell Support with the following CLI command:
# cat /usr/local/ese/var/log/ESE.log | grep -i iceage

For example:

"path": "/ifs/.ifsvar/iceage-reports/headers/20240709_172303_ELMISL0224SM54_0740a853-517c-4fc5-b162-64991d9494b9_IceAgeHeader.tgz",
20067 2024-07-09 17:26:41,235 CP Server Thread-7 INFO     DellESE.ese.threads.web.cherrypydata LN:  61 /ifs/.ifsvar/iceage-reports/headers/20240709_172303_ELMISL0224SM54_0740a853-517c-4fc5-b162-64991d9494b9_IceAgeHeader.tgz is a file

20067 2024-07-09 17:26:43,696 Web Dispatcher DEBUG    urllib3.connectionpool LN: 474 https://eng-sea-v4scg-01.west.isilon.com:9443 "PUT /esrs/v1/devices/ISILON-GW/ELMISL0224SM54/mft/BINARY-ELMISL0224SM54-20240709T172642Z-33MJ9WiT5Swt4mcLdEwSkMA-20240709_172303_ELMISL0224SM54_0740a853-517c-4fc5-b162-64991d9494b9_IceAgeHeader.tgz HTTP/1.1" 200 0
20067 2024-07-09 17:26:43,699 Web Dispatcher DEBUG    DellESE.ese.srs.srswebapi LN:  89 Sending ESE binary file [20240709_172303_ELMISL0224SM54_0740a853-517c-4fc5-b162-64991d9494b9_IceAgeHeader.tgz], Workitem [33MJ9WiT5Swt4mcLdEwSkMA], sent to url https://eng-sea-v4scg-01.west.isilon.com:9443/esrs/v1/devices/ISILON-GW/ELMISL0224SM54/mft/BINARY-ELMISL0224SM54-20240709T172642Z-33MJ9WiT5Swt4mcLdEwSkMA-20240209_172303_ELMISL0224SM54_0740a853-517c-4fc5-b162-64991d9494b9_IceAgeHeader.tgz.  Date: 2024-02-09T17:26:43.282+0000.   Status: 200

  "path": "/ifs/.ifsvar/iceage-reports/headers/20240209_172334_ELMISL0224SM54_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz",
20067 2024-07-09 17:26:47,235 CP Server Thread-8 INFO     DellESE.ese.threads.web.cherrypydata LN:  61 /ifs/.ifsvar/iceage-reports/headers/20240709_172334_ELMISL0224SM54_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz is a file

20067 2024-07-09 17:26:58,632 Web Dispatcher DEBUG    urllib3.connectionpool LN: 474 https://eng-sea-v4scg-01.west.isilon.com:9443 "PUT /esrs/v1/devices/ISILON-GW/ELMISL0224SM54/mft/BINARY-ELMISL0224SM54-20240709T172658Z-3hJcHU9hEomZYyWLCkqh5Jj-20240709_172334_ELMISL0224SM54_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz HTTP/1.1" 200 0
20067 2024-07-09 17:26:58,636 Web Dispatcher DEBUG    DellESE.ese.srs.srswebapi LN:  89 Sending ESE binary file [20240709_172334_ELMISL0224SM54_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz], Workitem [3hJcHU9hEomZYyWLCkqh5Jj], sent to url https://eng-sea-v4scg-01.west.isilon.com:9443/esrs/v1/devices/ISILON-GW/ELMISL0224SM54/mft/BINARY-ELMISL0224SM54-20240709T172658Z-3hJcHU9hEomZYyWLCkqh5Jj-20240709_172334_ELMISL0224SM54_db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz.  Date: 2024-07-09T17:26:58.362+0000.   Status: 200
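Scanning the ESE log by eye becomes tedious when many headers are in flight, so the verification in the final step can also be scripted. The Python below is a minimal sketch that assumes the 'Sending ESE binary file' line format shown above; the function name is illustrative, and the sample line is a trimmed copy of the log with a placeholder gateway host.

```python
import re

SENT_RE = re.compile(
    r"Sending ESE binary file \[(?P<file>[^\]]+_IceAgeHeader\.tgz)\], "
    r"Workitem \[(?P<workitem>[^\]]+)\].*Status: (?P<status>\d{3})"
)

def uploaded_headers(log_text):
    """Return (file name, work item, HTTP status) for each header sent by ESE."""
    results = []
    for line in log_text.splitlines():
        m = SENT_RE.search(line)
        if m:
            results.append((m["file"], m["workitem"], int(m["status"])))
    return results

# Trimmed sample line; 'gateway.example.com' is a placeholder host.
SAMPLE = (
    "20067 2024-07-09 17:26:58,636 Web Dispatcher DEBUG    DellESE.ese.srs.srswebapi LN:  89 "
    "Sending ESE binary file [20240709_172334_ELMISL0224SM54_"
    "db3bb260-88ce-4619-9f48-b9828eddccd5_IceAgeHeader.tgz], "
    "Workitem [3hJcHU9hEomZYyWLCkqh5Jj], sent to url "
    "https://gateway.example.com:9443/esrs/v1/devices/.  "
    "Date: 2024-07-09T17:26:58.362+0000.   Status: 200"
)

sent = uploaded_headers(SAMPLE)
failed = [entry for entry in sent if entry[2] != 200]
print(sent)
print("failed uploads:", failed)
```

In practice, the input would be the contents of /usr/local/ese/var/log/ESE.log rather than the embedded sample.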

There are some caveats to be aware of with IceAge, and it may not be able to process every core in all situations. As such, it is considered ‘best effort’ relative to security and performance constraints.

Specifically, the scenarios under which IceAge monitor will not automatically process cores include:

Component | Condition Details
Filesystem | During unavailability of /ifs.
On-disk encryption | On SED nodes, because IceAge uses the band on SEDs that is not encrypted for scratch.
Drive maintenance | During drive distmirror rebalancing and drive firmware upgrade.
Capacity | If OneFS is unable to find sufficient free space on drives.
Memory | If processing the core would require so much memory that it could cause instability. The vmem limit is determined by the amount of scratch space needed as well as system memory.
Version | For any cores generated on OneFS versions older than the running build, IceAge may struggle to interpret them accurately using the debug symbols from the current build.