OneFS Healthcheck Enhancements

As the name suggests, PowerScale OneFS healthchecks enable a storage administrator to quickly and easily evaluate the status of specific software and hardware components of the cluster and its environment.

The OneFS 9.10 release includes several healthcheck enhancements that help cluster administrators quickly understand the health of the system and that offer resolution guidance in the event of a failure. In a nutshell, these include:

Function           Details
Dashboard          Displays current healthcheck results on the landing page to indicate the real-time health of the system.
Export             Ability to export results in CSV or JSON format.
Grouping           Grouping of healthchecks by category and frequency.
History            Historical healthchecks presented as a separate category.
Links              Links to relevant knowledge base (KB) articles instead of plain text.
Troubleshooting    Detailed information on the failure and troubleshooting guidance.

The healthcheck landing page in OneFS 9.10, accessible under Cluster Management > Healthchecks, displays navigation tabs for three pages:

Of these, the ‘evaluation’ and ‘healthcheck’ views are enhanced in the new release, with ‘Evaluations’ being the default landing page.

In earlier versions, the ‘healthcheck’ page, under Cluster Management > Healthchecks > Healthchecks, displayed two separate tables: one for the checklists themselves and another for their contents, the checklist items. In addition, the page showed no clear relationship between the checklists and their items.

To address this, OneFS 9.10 condenses these into a single table view, where each checklist row can be expanded to make its associated items visible. For example, the following shows the expanded CELOG checklist and its contents:

Moving to a single table format has also enabled the addition of keyword search functionality. As the desired search string is entered into the search box, the WebUI automatically expands and collapses rows to make the matching content visible. This allows the admin to quickly drill down into their checks of interest, and then easily run the full checklist – or just individual items themselves. For example, searching for ‘quota’ reveals the following related items within the ‘auth’ and ‘basic’ checklists:

Additionally, the email settings button for each healthcheck is now more apparent, intuitive, and accessible, offering either default or custom distribution list options:

For ‘evaluations’, the enhanced Healthcheck dashboard in OneFS 9.10 clearly displays the current healthcheck status and results on the landing page. As such, navigating to Cluster Management > Healthchecks now provides a single screen synopsis of the real-time health of the cluster. For example:

In addition to a keyword search option, this view can also be filtered by the ‘latest’ evaluation, or ‘all’ evaluations.
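As a rough illustration of the ‘latest’ filter, the sketch below keeps only the newest evaluation per checklist. It assumes that evaluation IDs follow the pattern seen later in this article (checklist name followed by a YYYYMMDDTHHMM timestamp, as in basic20250304T1105) and that ‘latest’ means the most recent run of each checklist. Both are assumptions, and the function names and the extra sample IDs are hypothetical rather than part of OneFS.

```python
import re
from datetime import datetime

# Assumed ID layout: <checklist name><YYYYMMDD>T<HHMM>, e.g. basic20250304T1105
ID_PATTERN = re.compile(r"^(?P<checklist>.+?)(?P<stamp>\d{8}T\d{4})$")


def parse_evaluation_id(evaluation_id):
    """Split an evaluation ID into (checklist name, run time)."""
    match = ID_PATTERN.match(evaluation_id)
    if match is None:
        raise ValueError(f"unrecognized evaluation ID: {evaluation_id!r}")
    when = datetime.strptime(match.group("stamp"), "%Y%m%dT%H%M")
    return match.group("checklist"), when


def latest_evaluations(evaluation_ids):
    """Return the most recent evaluation ID for each checklist, sorted by ID."""
    latest = {}
    for evaluation_id in evaluation_ids:
        checklist, when = parse_evaluation_id(evaluation_id)
        if checklist not in latest or when > latest[checklist][0]:
            latest[checklist] = (when, evaluation_id)
    return sorted(evaluation_id for _, evaluation_id in latest.values())


# Only the second ID comes from this article; the others are made up.
ids = ["basic20250303T0900", "basic20250304T1105", "auth20250301T0800"]
print(latest_evaluations(ids))
```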

Under the ‘Actions’ field, the ‘More’ dropdown allows logs to be easily gathered and/or downloaded:

If a log gather is selected, its progress is reported in the ‘status’ field for the associated check. For example:

Clicking the ‘view details’ button for a particular failed checklist opens up a pane with both ‘passed’ and ‘failed’ items:

The ‘passed items’ tab provides details on the specific check(s) that were successfully completed (or unsupported) in the evaluation run.

Similarly, the ‘failed items’ tab displays the unsuccessful check(s) with their error description. For example, the following job engine healthcheck notifies of LIN-based jobs and suggests remediation steps:

In this case, even though 260 of the checklist items have passed and only 1 has failed, the overall status for the ‘basic’ checklist is ‘failed’.
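The roll-up behavior described here can be pictured with a minimal sketch: a checklist is reported as failed as soon as any one of its items fails. The rule is inferred from the example above rather than taken from the OneFS implementation, and the class, status labels, and item names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    name: str    # checklist item name (hypothetical values below)
    status: str  # assumed labels: "pass", "fail", or "unsupported"


def checklist_status(items: list[ItemResult]) -> str:
    """Return "fail" if any item failed, otherwise "pass"."""
    return "fail" if any(item.status == "fail" for item in items) else "pass"


# 260 passing items plus a single failure still yield an overall failure.
items = [ItemResult(f"item_{n}", "pass") for n in range(260)]
items.append(ItemResult("job_engine_item", "fail"))
overall = checklist_status(items)
print(overall)
```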

The ‘export’ drop-down allows the healthcheck error details to be exported for further analysis as either a CSV or JSON file. For example:

Similarly, the OneFS 9.10 CLI also has a ‘format’ option for exporting healthcheck evaluations. However, unlike the WebUI, the command line options include a list and table format, in addition to CSV and JSON. As such, the 9.10 Healthcheck export options can be summarized as follows:

Export Format    CLI    WebUI
CSV              x      x
JSON             x      x
List             x
Table            x

The CLI syntax for specifying the export format is as follows:

# isi healthcheck evaluations view <id> --format <csv | json | list | table>

For example, to limit the view to one basic evaluation, in table format, and without the header and footer:

# isi healthcheck evaluations view basic20250304T1105 --format table --limit 1 --no-header --no-footer

basic20250304T1105 basic -    Completed Fail
WARNING    75  [NODE   5] port_flapping
 * Network port flapping has been detected at some point in the
   last 24 hours on the following ports mce0, mce1. This can cause
   issues such as memory leaks if not addressed. Contact Dell
   Technologies Support if you are experiencing network issues.

Note that the default output contains failing items for that evaluation only. However, the ‘--verbose’ flag can be included to display all the pass and fail items for that evaluation.
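For scripted exports, the CLI syntax above could be wrapped in a small helper. The following is a sketch only: it uses just the ‘--format’ and ‘--verbose’ options described in this article, the helper names are hypothetical, and run_view() assumes it is executed on a cluster node where the isi command is available.

```python
import subprocess

# Export formats listed for the CLI in the table above.
CLI_FORMATS = {"csv", "json", "list", "table"}


def build_view_command(evaluation_id, fmt="table", verbose=False):
    """Build the argument list for 'isi healthcheck evaluations view'."""
    if fmt not in CLI_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["isi", "healthcheck", "evaluations", "view", evaluation_id,
           "--format", fmt]
    if verbose:
        cmd.append("--verbose")  # include passing items as well as failures
    return cmd


def run_view(evaluation_id, fmt="json", verbose=False):
    """Run the command (assumes a OneFS node) and return its stdout as text."""
    result = subprocess.run(build_view_command(evaluation_id, fmt, verbose),
                            capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_view_command("basic20250304T1105", fmt="csv", verbose=True)
print(" ".join(cmd))
```

Building the command as an argument list, rather than a single shell string, avoids quoting problems if an evaluation ID ever contains unexpected characters.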

On the platform API (pAPI) front, the following new v21 endpoints have been added in OneFS 9.10:

/21/healthcheck/evaluations

This now includes the ‘format_for_csv_download’ option, and is used to enable CSV download of a healthcheck evaluation.

There’s also a new endpoint to track the status of a log gather in progress:

/21/cluster/diagnostics/gather/status

For example:

# curl -k https://<name>:<Passwd>@localhost:8080/platform/21/cluster/diagnostics/gather/status
{
  "gather" :
  {
    "item" : null,
    "path" : "/ifs/data/Isilon_Support/pkg",
    "status" :
    {
      "Active_Status" : "RUNNING",
      "Last_Status" : " NOT_RUNNING "
    }
  }
}
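A script that waits for a log gather to finish might parse this response along the following lines. This is a minimal sketch that relies only on the fields shown in the sample output. It assumes any ‘Active_Status’ value other than RUNNING means the gather is no longer in progress, and the function names and the injected fetch callable (which would issue the HTTP request) are hypothetical.

```python
import json
import time


def gather_is_running(payload: str) -> bool:
    """Return True if the status response reports an active log gather."""
    status = json.loads(payload)["gather"]["status"]
    # strip() tolerates padded values such as " NOT_RUNNING "
    return status["Active_Status"].strip() == "RUNNING"


def wait_for_gather(fetch, interval=10.0, max_polls=60, sleep=time.sleep):
    """Poll fetch() until the gather stops; return True if it stopped in time."""
    for _ in range(max_polls):
        if not gather_is_running(fetch()):
            return True
        sleep(interval)
    return False


sample = (
    '{"gather": {"item": null, "path": "/ifs/data/Isilon_Support/pkg", '
    '"status": {"Active_Status": "RUNNING", "Last_Status": " NOT_RUNNING "}}}'
)
print(gather_is_running(sample))
```

Passing the fetch function in as a parameter keeps the polling logic independent of any particular HTTP client or authentication method.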
