OneFS’ non-disruptive upgrade (NDU) functionality allows administrators to upgrade a cluster while their end-user community continues to access data without error or interruption. During the OneFS rolling upgrade process, one node at a time is updated to the new code, and the active clients attached to it are automatically migrated to other nodes in the cluster. Partial upgrade is also supported, allowing a subset of cluster nodes to be upgraded, and that subset may be grown during the upgrade. OneFS also permits an upgrade to be paused and resumed, enabling customers to spread cluster upgrades across multiple smaller maintenance windows.
OneFS CloudPools v1.0 debuted in the OneFS 8.0 release. The next major update, CloudPools v2.0, was delivered in the OneFS 8.2 release and introduced significant enhancements, including:
- Support for AWS signature authentication version 4.
- Network statistics per CloudPools account or file pool policy.
- Support for Alibaba Cloud and Amazon C2S public cloud providers.
- Full integration of CloudPools and data services like Snapshot, Sparse file handling, Quota, AVScan and WORM.
- NDMP and SyncIQ support.
- Non-Disruptive Upgrade (NDU) support.
CloudPools, like its SmartPools counterpart, uses the OneFS file pool policy engine to designate which data on a cluster should reside on which tier, or be archived to a cloud storage target. If files match the criteria specified in a file pool policy, the content of those files is moved to cloud storage when the job runs. Under the hood, CloudPools uses ‘SmartLink’ files within the /ifs namespace, each of which contains information about where to retrieve the data blocks of a file that has been tiered to the cloud. In CloudPools 1.0, these SmartLink v1 files, often referred to as ‘stubs’, do not behave like normal files. By contrast, the SmartLink v2 files in CloudPools 2.0 are more like traditional files, each containing pointers to the CloudPools target where the data resides.
When a CloudPools 1.0 cluster is upgraded to OneFS 8.2 or later, a ‘changeover’ process is automatically initiated upon upgrade commit. This process is responsible for converting the v1 SmartLink files to v2, ensuring a seamless transition from CloudPools 1.0 to 2.0.
The following table outlines the upgrade paths available when transitioning from CloudPools 1.0 to 2.0:
Current OneFS Version | Upgrade to OneFS 8.2 | Upgrade to OneFS 8.2.1 with 5/2020 RUPs | Upgrade to OneFS 8.2.2 with 5/2020 RUPs | Upgrade to OneFS 9.x |
OneFS 8.0 | Discouraged | Viable | Recommended | Highly recommended |
OneFS 8.1 | Discouraged | Viable | Recommended | Highly recommended |
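The table above can also be captured as a small lookup structure, for instance in a pre-upgrade checklist script. The following is a minimal Python sketch: the names `UPGRADE_GUIDANCE`, `PREFERENCE`, `guidance`, and `preferred_target` are hypothetical, and the ordering of the labels from least to most preferred is an assumption rather than something the table states explicitly.

```python
# Guidance transcribed from the upgrade-path table above.
# Outer key: current OneFS version. Inner key: upgrade target.
_ROW = {
    "8.2": "Discouraged",
    "8.2.1 with 5/2020 RUPs": "Viable",
    "8.2.2 with 5/2020 RUPs": "Recommended",
    "9.x": "Highly recommended",
}
UPGRADE_GUIDANCE = {"8.0": dict(_ROW), "8.1": dict(_ROW)}

# Assumption: labels are ordered from least to most preferred.
PREFERENCE = ["Discouraged", "Viable", "Recommended", "Highly recommended"]


def guidance(current, target):
    """Return the table's guidance for upgrading `current` to `target`."""
    return UPGRADE_GUIDANCE[current][target]


def preferred_target(current):
    """Return the upgrade target carrying the strongest recommendation."""
    options = UPGRADE_GUIDANCE[current]
    return max(options, key=lambda target: PREFERENCE.index(options[target]))


print(guidance("8.0", "8.2"))    # Discouraged
print(preferred_target("8.1"))   # 9.x
```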
In a SyncIQ environment with unidirectional replication, the SyncIQ target cluster should be upgraded before the source cluster. Conversely, for bi-directional replication, the recommendation is to disable SyncIQ on both the source and target, and upgrade both clusters simultaneously.
The following CLI commands can be run on both the source and target clusters to verify and capture their storage account, CloudPools, file pool policy, and SyncIQ configurations:
# isi cloud accounts list -v
# isi cloud pools list -v
# isi filepool policies list -v
# isi sync policies list -v
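To keep a record of these configurations before the upgrade, the output of the four commands could be saved to files. The Python sketch below is one possible approach, not a supported tool: the output file names, the `capture_configuration` function, and the injectable `runner` parameter are all hypothetical, and it assumes Python is available wherever the `isi` commands are run.

```python
import subprocess
from pathlib import Path

# The four verification commands from the text, keyed by a
# hypothetical output file name.
CAPTURE_COMMANDS = {
    "cloud_accounts.txt": ["isi", "cloud", "accounts", "list", "-v"],
    "cloud_pools.txt": ["isi", "cloud", "pools", "list", "-v"],
    "filepool_policies.txt": ["isi", "filepool", "policies", "list", "-v"],
    "sync_policies.txt": ["isi", "sync", "policies", "list", "-v"],
}


def run_cli(argv):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def capture_configuration(out_dir, runner=run_cli):
    """Save the output of each command to a file under `out_dir`.

    `runner` is injectable so the function can be exercised without a cluster.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in CAPTURE_COMMANDS.items():
        target = out_path / filename
        target.write_text(runner(argv))
        written.append(target)
    return written
```

Running `capture_configuration` on both the source and target clusters before and after the upgrade would give two sets of files that can be compared with any diff tool.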
SyncIQ can be re-enabled on both source and target once the OneFS upgrades have been committed on both clusters. Be aware that the SmartLink conversion process can take considerable time, depending on the number of SmartLink files and the processing power of the target cluster.
Note that there is no need to stop the SyncIQ and/or SnapshotIQ services during the upgrade in a SyncIQ environment with unidirectional replication. However, since SyncIQ must resynchronize all converted stub files, it might take some time to process all the changes.
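The ordering guidance for the two SyncIQ topologies can be summarized as a short decision helper. This is only a sketch of the recommendations described above: the function name `synciq_upgrade_plan` and the wording of each step are illustrative, not an official procedure.

```python
def synciq_upgrade_plan(bidirectional):
    """Return the ordered upgrade steps for a SyncIQ source/target pair."""
    if bidirectional:
        return [
            "Disable SyncIQ on both the source and target clusters",
            "Upgrade both clusters simultaneously",
            "Commit the OneFS upgrade on both clusters",
            "Re-enable SyncIQ on both clusters",
        ]
    # Unidirectional replication: SyncIQ and SnapshotIQ can keep running.
    return [
        "Upgrade the SyncIQ target cluster",
        "Upgrade the SyncIQ source cluster",
        "Allow time for SyncIQ to resynchronize the converted SmartLink files",
    ]


for number, step in enumerate(synciq_upgrade_plan(bidirectional=False), 1):
    print(f"{number}. {step}")
```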
The ‘isi cloud job view <job ID>’ CLI command can be used to check the status of a SmartLink upgrade process. For example, to view job ID 6:
# isi cloud job view 6
ID: 6
Description: Update SmartLink file formats
Effective State: running
Type: smartlink-upgrade
Operation State: running
Job State: running
Create Time: 2022-05-23T14:20:26
State Change Time: 2022-05-17T09:56:08
Completion Time: -
Job Engine Job: -
Job Engine State: -
Total Files: 21907433
Total Canceled: 0
Total Failed: 61
Total Pending: 318672
Total Staged: 0
Total Processing: 48
Total Succeeded: 21588652
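Since the conversion can run for a long time, it can help to turn the file counters in this output into a progress figure. The Python sketch below parses the ‘Key: Value’ fields from the example above and derives a completion percentage. The function names are hypothetical, the field layout is assumed to match the example, and counting pending, staged, and processing files together as ‘remaining’ is an assumption.

```python
# Counter fields copied from the example job output above.
SAMPLE_JOB_VIEW = """\
ID: 6
Type: smartlink-upgrade
Job State: running
Create Time: 2022-05-23T14:20:26
Total Files: 21907433
Total Canceled: 0
Total Failed: 61
Total Pending: 318672
Total Staged: 0
Total Processing: 48
Total Succeeded: 21588652
"""


def parse_job_view(text):
    """Split each 'Key: Value' line at the first colon into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def smartlink_progress(fields):
    """Summarize conversion progress from the job's file counters."""
    total = int(fields["Total Files"])
    succeeded = int(fields["Total Succeeded"])
    # Assumption: pending, staged, and processing files are all still to do.
    remaining = sum(
        int(fields[name])
        for name in ("Total Pending", "Total Staged", "Total Processing")
    )
    percent = 100.0 * succeeded / total if total else 0.0
    return {
        "percent_succeeded": percent,
        "remaining": remaining,
        "failed": int(fields["Total Failed"]),
    }


progress = smartlink_progress(parse_job_view(SAMPLE_JOB_VIEW))
print(f"{progress['percent_succeeded']:.1f}% converted, "
      f"{progress['remaining']} remaining, {progress['failed']} failed")
```

For the example job, roughly 98.5% of the SmartLink files have been converted, with 61 failures that would need to be investigated once the job completes.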
Note that the CloudPools recall jobs will not run during an active SmartLink upgrade or conversion.
CloudPools 2.0 supports AWS signature version 4 (v4), in addition to version 2 (v2). Version 4 is generally preferred, since it provides an additional level of security. However, be aware that legacy cloud storage accounts using v2 authentication cannot use v4 in the ‘upgraded’ state if the OneFS version prior to the 8.2.0 upgrade did not support v4. A patch is available for OneFS 8.1.2 to support v4 authentication as a workaround for this issue.
While CloudPools 2.0 supports write-back in a snapshot, it does not support archiving and recalling files in the snapshot directory. If a cluster running OneFS 8.1.2 or earlier holds legacy file data in a snapshot, that data continues to consume storage space after the upgrade to OneFS 8.2. This snapshot storage space cannot be released, since CloudPools 2.0 does not support archiving files in snapshots to the cloud.
OneFS non-disruptive upgrades can be easily managed from the WebUI by navigating to Cluster Management > Upgrade and selecting the desired ‘Upgrade type’ from the drop-down menu.
Rolling upgrades can be initiated from the OneFS CLI with the following syntax:
# isi upgrade cluster start <upgrade_image>
Since OneFS supports the ability to roll back to the previous version, an upgrade must be committed in order to complete it:
# isi upgrade cluster commit
Up until the time an upgrade is committed, it can be rolled back to the prior version as follows:
# isi upgrade cluster rollback
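The relationship between start, commit, and rollback can be pictured as a small state model. The following Python sketch is a toy illustration only: the class name `UpgradeLifecycle`, the state labels, and the version strings in the usage lines are hypothetical and do not reflect how OneFS tracks upgrade state internally.

```python
class UpgradeLifecycle:
    """Toy model: an uncommitted upgrade can be rolled back, a committed one cannot."""

    def __init__(self, current_version):
        self.version = current_version
        self.previous_version = None
        self.state = "committed"  # hypothetical label for 'no upgrade pending'

    def start(self, new_version):
        if self.state != "committed":
            raise RuntimeError("an upgrade is already awaiting commit")
        self.previous_version = self.version
        self.version = new_version
        self.state = "uncommitted"

    def commit(self):
        if self.state != "uncommitted":
            raise RuntimeError("there is no upgrade to commit")
        self.previous_version = None  # the prior version is no longer reachable
        self.state = "committed"

    def rollback(self):
        if self.state != "uncommitted":
            raise RuntimeError("rollback is only possible before commit")
        self.version = self.previous_version
        self.previous_version = None
        self.state = "committed"


cluster = UpgradeLifecycle("9.1.0.0")   # example version strings only
cluster.start("9.2.0.0")
cluster.rollback()
print(cluster.version)                   # 9.1.0.0
```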
The isi upgrade view CLI command can be used to monitor how the upgrade is progressing:
# isi upgrade view -i/--interactive
The following command will provide more detailed/verbose output:
# isi_upgrade_status
A faster, simpler version of isi_upgrade_status is also available in OneFS 8.2.2 and later:
isi_upgrade_node_state
    -a (aggregate the latest hook update for each node)
    -devid=<X,Y,E-F> (filter and display by devid)
    -lnn=<X-Y,A,C> (filter and display by LNN)
    -ts (time sort entries)
If the end of a maintenance window is reached but the cluster is not fully upgraded, the upgrade process can be quiesced and then restarted using the following CLI commands:
# isi upgrade pause
# isi upgrade resume
For example:
# isi upgrade pause
You are about to pause the running process, are you sure? (yes/[no]): yes
The process will be paused once the current step completes.
The current operation can be resumed with the command: `isi upgrade resume`
Note that pausing is not immediate: the upgrade will remain in a “Pausing” state until the currently upgrading node is completed. Additional nodes will not be upgraded until the upgrade process is resumed.
The ‘pausing’ state can be viewed with the following commands: ‘isi upgrade view’ and ‘isi_upgrade_status’. Note that a rollback can be initiated either during ‘Pausing’ or ‘Paused’ states. Also, be aware that the ‘isi upgrade pause’ command has no effect when performing a simultaneous OneFS upgrade.
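The pause timing described above can be illustrated with a small simulation. This is a toy model under stated assumptions, not a description of the upgrade framework itself: the class `RollingUpgradeSim`, its method names, and the ‘Running’ and ‘Done’ labels are hypothetical, and resuming while still in the ‘Pausing’ state is deliberately not modeled.

```python
class RollingUpgradeSim:
    """Toy model of pause semantics during a rolling (node-by-node) upgrade."""

    def __init__(self, lnns):
        self.pending = list(lnns)   # nodes still to be upgraded, in order
        self.upgraded = []
        self.state = "Running"

    def pause(self):
        # A pause request does not interrupt the node currently upgrading.
        if self.state == "Running":
            self.state = "Pausing"

    def resume(self):
        if self.state == "Paused":
            self.state = "Running"

    def finish_current_node(self):
        """Complete the in-flight node; return its LNN, or None if idle."""
        if self.state == "Paused" or not self.pending:
            return None
        lnn = self.pending.pop(0)
        self.upgraded.append(lnn)
        if not self.pending:
            self.state = "Done"
        elif self.state == "Pausing":
            self.state = "Paused"   # the pause takes effect only now
        return lnn


sim = RollingUpgradeSim([1, 2, 3])
sim.pause()
print(sim.state)                    # Pausing
sim.finish_current_node()
print(sim.state, sim.upgraded)      # Paused [1]
```

After `resume()`, the remaining nodes are processed in order, mirroring the way additional nodes are only upgraded once the process has been resumed.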
A rolling reboot can be initiated from the CLI on a subset of cluster nodes using the ‘isi upgrade rolling-reboot’ syntax, with the ‘--nodes’ flag specifying the desired LNNs:
# isi upgrade rolling-reboot --help
Description:
    Perform a Rolling Reboot of cluster.
Required Privileges:
    ISI_PRIV_SYS_UPGRADE
Usage:
    isi upgrade cluster rolling-reboot
        [--nodes <integer_range_list>]
        [--force]
        [{--help | -h}]
Options:
    --nodes <integer_range_list>
        List of comma (1,3,7) or dash (1-7) specified node LNNs to select. "all" can also be used to select all the cluster nodes at any given time.
Display Options:
    --force
        Do not ask confirmation.
    --help | -h
        Display help for this command.
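Before passing a node list to ‘--nodes’, it can be useful to check which LNNs the list actually selects. The Python sketch below expands the comma and dash notation described in the help text. The function name `expand_lnn_list` is hypothetical, and accepting a mix of commas and dashes in one list (for example ‘1-3,5’) is an assumption about the CLI, not something the help output confirms.

```python
def expand_lnn_list(spec, cluster_lnns):
    """Expand a list such as '1,3,7', '1-7', or 'all' into sorted LNNs."""
    known = set(cluster_lnns)
    if spec.strip().lower() == "all":
        return sorted(known)

    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in node list: {spec!r}")
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"descending range: {part!r}")
            selected.update(range(low, high + 1))
        else:
            selected.add(int(part))

    unknown = selected - known
    if unknown:
        raise ValueError(f"LNNs not in cluster: {sorted(unknown)}")
    return sorted(selected)


print(expand_lnn_list("1,3,7", range(1, 9)))   # [1, 3, 7]
print(expand_lnn_list("1-3,5", range(1, 9)))   # [1, 2, 3, 5]
```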
The ‘isi upgrade view’ command provides visibility into the status and progress of the rolling reboot process. For example:
# isi upgrade view
Upgrade Status:
    Current Upgrade Activity: RollingReboot
    Cluster Upgrade State: committed
    Upgrade Process State: Not started
    Current OS Version: 9.2.0.0
    Upgrade OS Version: N/A
    Percent Complete: 0%
Nodes Progress:
    Total Cluster Nodes: 3
    Nodes On Older OS: 3
    Nodes Upgraded: 0
    Nodes Transitioning/Down: 0
LNN  Progress   Version  Status
---------------------------------
1    100%       9.2.0.0  committed
2    rebooting  Unknown  non-responsive
3    0%         9.2.0.0  committed
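When monitoring a rolling reboot from a script, the per-node table at the end of this output is the part of most interest. The Python sketch below extracts those rows and lists the nodes that are not currently in the ‘committed’ state. The function names are hypothetical, and the parser assumes the four-column layout shown in the example above, which may differ between OneFS releases.

```python
# Per-node table copied from the example output above.
SAMPLE_UPGRADE_VIEW = """\
Nodes Progress:
   Total Cluster Nodes: 3
   Nodes Upgraded: 0
LNN  Progress   Version  Status
---------------------------------
1    100%       9.2.0.0  committed
2    rebooting  Unknown  non-responsive
3    0%         9.2.0.0  committed
"""


def parse_node_rows(text):
    """Return one dict per node row found after the 'LNN' header line."""
    rows = []
    in_table = False
    for line in text.splitlines():
        parts = line.split()
        if parts[:1] == ["LNN"]:
            in_table = True
            continue
        # Skip the dashed separator and anything that is not a node row.
        if not in_table or len(parts) != 4 or not parts[0].isdigit():
            continue
        lnn, progress, version, status = parts
        rows.append({
            "lnn": int(lnn),
            "progress": progress,
            "version": version,
            "status": status,
        })
    return rows


def nodes_in_transition(rows):
    """List the LNNs whose status is anything other than 'committed'."""
    return [row["lnn"] for row in rows if row["status"] != "committed"]


rows = parse_node_rows(SAMPLE_UPGRADE_VIEW)
print(nodes_in_transition(rows))   # [2]
```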