As the name suggests, PowerScale OneFS healthchecks enable a storage administrator to quickly and easily evaluate the status of specific software and hardware components of the cluster and its environment.
The OneFS 9.10 release includes several healthcheck enhancements that help cluster administrators quickly understand the health of the system and that offer resolution guidance in the event of a failure. In a nutshell, these include:
Function | Details |
Dashboard | Displays the current healthcheck results on the landing page to indicate the real-time health of the system. |
Export | The ability to export in CSV or JSON formats. |
Grouping | Grouping of healthchecks based on category and frequency. |
History | Historical healthchecks presented as a separate category. |
Links | Links provided to relevant knowledge base (KB) articles instead of plain text. |
Troubleshooting | Detailed information on the failure and troubleshooting guidance. |
The healthcheck landing page in OneFS 9.10, accessible under Cluster Management > Healthchecks, displays navigation tabs for three pages:
Of these, the ‘evaluation’ and ‘healthcheck’ views are enhanced in the new release, with ‘Evaluations’ being the default landing page.
In earlier versions, the ‘healthcheck’ page, under Cluster Management > Healthchecks > Healthchecks, displayed two separate tables: one for the checklists themselves and another for their contents, the checklist items. There was also no clear relationship shown between a checklist and its items.
To address this, OneFS 9.10 condenses these into a single table view, where each checklist row can be expanded to make its associated items visible. For example, the expanded CELOG checklist and its contents:
Moving to a single table format has also enabled the addition of keyword search functionality. As the desired search string is entered into the search box, the WebUI automatically expands and collapses rows to make the matching content visible. This allows the admin to quickly drill down into their checks of interest, and then easily run the full checklist – or just individual items themselves. For example, searching for ‘quota’ reveals the following related items within the ‘auth’ and ‘basic’ checklists:
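The expand-on-match behavior can be pictured with a small sketch. This is not the WebUI's implementation: the data layout, the function name, and most of the item names below are hypothetical, chosen only to mirror the ‘quota’ example above.

```python
def search_checklists(checklists, keyword):
    """Return {checklist: [matching items]} for the rows that would be expanded.

    `checklists` maps a checklist name to a list of item names. Checklists
    with no matching item are left out, i.e. they stay collapsed.
    """
    needle = keyword.strip().lower()
    if not needle:
        return {}
    expanded = {}
    for name, items in checklists.items():
        hits = [item for item in items if needle in item.lower()]
        if hits:
            expanded[name] = hits
    return expanded


# Hypothetical checklist contents, for illustration only.
sample_checklists = {
    "auth": ["quota_user_mapping", "ldap_bind"],
    "basic": ["quota_notifications", "port_flapping"],
    "celog": ["event_channels"],
}

matches = search_checklists(sample_checklists, "quota")
```

Here only the ‘auth’ and ‘basic’ entries are returned, each with its matching item, while ‘celog’ stays collapsed.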
Additionally, the email settings button for each healthcheck is now more apparent, intuitive, and accessible, offering either default or custom distribution list options:
For ‘evaluations’, the enhanced Healthcheck dashboard in OneFS 9.10 clearly displays the current healthcheck status and results on the landing page. As such, navigating to Cluster Management > Healthchecks now provides a single screen synopsis of the real-time health of the cluster. For example:
In addition to a keyword search option, this view can also be filtered by the ‘latest’ evaluation, or ‘all’ evaluations.
Under the ‘Actions’ field, the ‘More’ dropdown allows logs to be easily gathered and/or downloaded:
If a log gather is selected, its progress is reported in the ‘status’ field for the associated check. For example:
Clicking the ‘view details’ button for a particular failed checklist opens up a pane with both ‘passed’ and ‘failed’ items:
The ‘passed items’ tab provides details on the specific check(s) that were successfully completed (or unsupported) in the evaluation run.
Similarly, the ‘failed items’ tab displays the unsuccessful check(s) with their error description. For example, the following job engine healthcheck, notifying of LIN-based jobs and suggesting remediation steps:
In this case, even though 260 of the checklist items have passed and only 1 has failed, the overall status for the ‘basic’ checklist is ‘failed’.
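The roll-up implied by this example, where a single failing item fails the whole checklist, can be sketched as follows. The rule is inferred from the example above rather than taken from OneFS itself, and the function name and status strings are assumptions made for illustration.

```python
def checklist_status(item_results):
    """Roll item results up into one checklist status.

    Assumed rule (inferred from the example, not from OneFS source):
    any failed item marks the whole checklist as 'failed'.
    """
    return "failed" if any(result == "failed" for result in item_results) else "passed"


# 260 passing items and a single failure, as in the 'basic' example.
basic_results = ["passed"] * 260 + ["failed"]
overall = checklist_status(basic_results)
```

With these inputs `overall` is ‘failed’, matching the status shown for the ‘basic’ checklist.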
The ‘export’ drop-down allows the healthcheck error details to be exported for further analysis as either a CSV or JSON file. For example:
Similarly, the OneFS 9.10 CLI also has a ‘format’ option for exporting healthcheck evaluations. However, unlike the WebUI, the command line options include a list and table format, in addition to CSV and JSON. As such, the 9.10 Healthcheck export options can be summarized as follows:
Export Format | CLI | WebUI |
CSV | x | x |
JSON | x | x |
List | x | |
Table | x | |
The CLI syntax for specifying the export format is as follows:
# isi healthcheck evaluations view <id> --format <csv | json | list | table>
For example, to limit the view to one basic evaluation, in table format, and without the header and footer:
# isi healthcheck evaluations view basic20250304T1105 --format table --limit 1 --no-header --no-footer
basic20250304T1105 basic - Completed Fail WARNING 75 [NODE 5] port_flapping * Network port flapping has been detected at some point in the last 24 hours on the following ports mce0, mce1. This can cause issues such as memory leaks if not addressed. Contact Dell Technologies Support if you are experiencing network issues.
Note that the default output contains only the failing items for that evaluation. However, the ‘--verbose’ flag can be included to display all the passing and failing items for that evaluation.
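For scripted use, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this article; the helper name and its defaults are assumptions, and it builds the argument list without running it, since `isi` is only available on a cluster node.

```python
import shlex

# Export formats from the table above.
CLI_FORMATS = {"csv", "json", "list", "table"}
WEBUI_FORMATS = {"csv", "json"}


def build_view_command(evaluation_id, fmt="table", limit=None,
                       no_header=False, no_footer=False, verbose=False):
    """Build the argv for 'isi healthcheck evaluations view' (hypothetical helper)."""
    if fmt not in CLI_FORMATS:
        raise ValueError(f"unsupported CLI format: {fmt!r}")
    argv = ["isi", "healthcheck", "evaluations", "view", evaluation_id,
            "--format", fmt]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if no_header:
        argv.append("--no-header")
    if no_footer:
        argv.append("--no-footer")
    if verbose:
        # Include passing as well as failing items.
        argv.append("--verbose")
    return argv


command = build_view_command("basic20250304T1105", fmt="table", limit=1,
                             no_header=True, no_footer=True)
command_line = shlex.join(command)
```

On a cluster node, the resulting list could be passed to `subprocess.run`. Keeping the formats in a set makes it easy to reject a value, such as ‘list’, that the CLI accepts but the WebUI export does not.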
On the platform API (pAPI) front, the following new v21 endpoints have been added in OneFS 9.10:
/21/healthcheck/evaluations
This now includes the ‘format_for_csv_download’ option, which enables CSV download of a healthcheck evaluation.
There’s also a new endpoint to track the status of a log gather in progress:
/21/cluster/diagnostics/gather/status
For example:
# curl -k https://<name>:<Passwd>@localhost:8080/platform/21/cluster/diagnostics/gather/status
{
  "gather" : {
    "item" : null,
    "path" : "/ifs/data/Isilon_Support/pkg",
    "status" : {
      "Active_Status" : "RUNNING",
      "Last_Status" : " NOT_RUNNING "
    }
  }
}
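A script that waits for a log gather to finish could poll this endpoint and inspect the response. The sketch below assumes the response shape shown above and treats any ‘Active_Status’ other than RUNNING as finished, which is an assumption since only RUNNING and NOT_RUNNING appear here. The function names are hypothetical, and the HTTP call is left to a caller-supplied function so that the logic can be exercised without a cluster.

```python
import json
import time


def gather_is_running(payload):
    """True if the gather status payload reports an active log gather."""
    status = payload["gather"]["status"]
    # strip() tolerates padded values such as " NOT_RUNNING ".
    return status["Active_Status"].strip() == "RUNNING"


def wait_for_gather(fetch_status, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the gather is no longer RUNNING, then return the last payload.

    `fetch_status` is any callable returning the parsed JSON from
    /platform/21/cluster/diagnostics/gather/status.
    """
    for _ in range(max_polls):
        payload = fetch_status()
        if not gather_is_running(payload):
            return payload
        sleep(interval)
    raise TimeoutError(f"log gather still running after {max_polls} polls")


# The response from the example above, used as sample input.
sample_response = json.loads("""
{ "gather" : { "item" : null, "path" : "/ifs/data/Isilon_Support/pkg",
  "status" : { "Active_Status" : "RUNNING", "Last_Status" : " NOT_RUNNING " } } }
""")

still_running = gather_is_running(sample_response)
```

In practice, `fetch_status` would wrap an authenticated HTTPS request to the cluster. Injecting it, along with `sleep`, keeps the polling loop independent of any particular HTTP client.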