When it comes to managing replication bandwidth in OneFS, SyncIQ allows cluster admins to configure reservations on a per-policy basis, thereby permitting fine-grained bandwidth control.
SyncIQ attempts to satisfy these reservation requirements based on what is already running and on the existing bandwidth rules and schedules. If a policy doesn’t have a specified reservation, its bandwidth is allocated from the reserve specified in the global configuration. If there is insufficient bandwidth available, SyncIQ will evenly divide the resources across all running policies until they reach the requested reservation. The salient goal here is to prevent starvation of policies.
Under the hood, each PowerScale node has a SyncIQ scheduler process running, which is responsible for launching replication jobs, creating the initial job directory, and updating jobs in response to any configuration changes. The scheduler also launches a coordinator process, which manages bandwidth throttling, in addition to overseeing the replication worker processes, snapshot management, report generation, target monitoring, and work allocation.
Component | Process | Description |
Scheduler | isi_migr_sched | The SyncIQ scheduler processes (isi_migr_sched) are responsible for initializing data replication jobs. They monitor the SyncIQ configuration and source record files for updates, and reload them whenever changes are detected in order to determine if and when a new job should be started. Once a job has started, one of the schedulers creates a coordinator process (isi_migrate), which in turn creates and manages the worker processes that handle the actual data replication. The scheduler processes also create the initial job directory when a new job starts. In addition, they monitor the coordinator process and restart it if it crashes or becomes unresponsive during a job. The scheduler processes are limited to one per node. |
Coordinator | isi_migrate | The coordinator process (isi_migrate) is responsible for the creation and management of worker processes during a data replication job. In addition, the coordinator is responsible for: (1) Snapshot management: takes the file system snapshots used by SyncIQ, keeps them locked while in use, and deletes them once they are no longer needed. (2) Writing reports: aggregates the job data reported from the workers and writes it to /ifs/.ifsvar/modules/tsm/sched/reports/. (3) Bandwidth throttling. (4) Managing the target monitor (tmonitor) process. |
Bandwidth Throttler | isi_migr_bandwidth | The bandwidth host (isi_migr_bandwidth) provides rationing information to the coordinator in order to regulate the job’s bandwidth usage. |
Pworker | isi_migr_pworker | Primary worker processes on the source cluster, responsible for handling and transferring cluster data while a replication job runs. |
Sworker | isi_migr_sworker | Secondary worker processes on the target cluster, responsible for handling and transferring cluster data while a replication job runs. |
Tmonitor | The coordinator process contacts the sworker daemon on the target cluster, which then forks off a new process to become the tmonitor. The tmonitor process acts as a target-side coordinator, providing a list of target node IP addresses to the coordinator, communicating target cluster changes (such as the loss or addition of a node), and taking target-side snapshots when necessary. Unlike a normal sworker process, the tmonitor process does not directly participate in any data transfer duties during a job. |
These running processes can be viewed from the CLI as follows:
# ps -auxw | grep -i migr
root    493  0.0  0.0  62764  39604  -  Ss  Mon06  0:01.25 /usr/bin/isi_migr_pworker
root    496  0.0  0.0  63204  40080  -  Ss  Mon06  0:03.99 /usr/bin/isi_migr_sworker
root    499  0.0  0.0  39612  22148  -  Is  Mon06  2:41.30 /usr/bin/isi_migr_sched
root    523  0.0  0.0  44692  26396  -  Ss  Mon06  0:24.47 /usr/bin/isi_migr_bandwidth
root  49726  0.0  0.0  63944  41224  -  D   Thu06  0:42.04 isi_migr_sworker: onefs1.zone1-zone2 (isi_migr_sworker)
root  49801  0.0  0.0  63564  40992  -  S   Thu06  1:21.84 isi_migr_pworker: zone1-zone2 (isi_migr_pworker)
Global bandwidth reservation can be configured from the OneFS WebUI by browsing to Data Protection > SyncIQ > Performance Rules, or from the CLI using the ‘isi sync rules’ command. A bandwidth limit is typically paired with a schedule, so that the rule caps the combined bandwidth of all policies during the scheduled window. For example:
The newly created rule is displayed as follows:
The global bandwidth limit applies to all policies combined, and each policy can then be given its own reservation within that limit. The recommended practice is to set a bandwidth reservation for each policy.
Per-policy bandwidth reservation can be configured via the OneFS CLI as follows:
- Configure one or more bandwidth rules:
# isi sync rules
- For each policy, configure the desired amount of bandwidth to reserve:
# isi sync policies <create | modify> --bandwidth-reservation=#
- Optionally, specify global configuration defaults:
# isi sync settings modify --bandwidth-reservation-reserve-percentage=#
# isi sync settings modify --bandwidth-reservation-reserve-absolute=#
# isi sync settings modify --clear-bandwidth-reservation-reserve
These settings control how much bandwidth is allocated to policies that do not have a reservation.
By default, there is a 1% percentage reserve. Bandwidth calculations are based on the bandwidth rule that is set, not on actual network conditions. If a policy does not have a specified reservation, resources are allocated from the reserve defined in the global configuration settings.
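As a rough illustration of the reserve arithmetic, the following Python sketch derives the reserve from a bandwidth rule's limit. The function name, the kbps unit for the absolute value, and the way an absolute reserve takes precedence over the percentage are all assumptions made for illustration; they are not taken from the SyncIQ implementation.

```python
def reserve_kbps(rule_limit_kbps, percentage=1.0, absolute_kbps=None):
    """Bandwidth set aside for policies that have no reservation of their own.

    Hypothetical helper: the calculation is based on the configured rule
    limit, not on measured network throughput.
    """
    if absolute_kbps is not None:
        # Assumption: an absolute reserve replaces the percentage and
        # cannot exceed the rule limit itself.
        return min(float(absolute_kbps), float(rule_limit_kbps))
    return rule_limit_kbps * percentage / 100.0


# Default 1% reserve against a 50000 kbps rule.
print(reserve_kbps(50000))  # 500.0
```

Under these assumptions, policies without a reservation would share 500 kbps of a 50000 kbps rule when the default 1% reserve is in effect.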
If there is insufficient bandwidth available for all policies to get their requested amounts, the bandwidth is evenly split across all running policies until they reach their requested reservation. This effectively ensures that the policies with the lowest requirements will reach their reservation before policies with larger reservations, helping to prevent bandwidth starvation.
For example, take the following three policies:
Total of 15 Mb/s bandwidth:
Policy | Requested | Allocated |
Policy 1 | 10 Mb/s | 5 Mb/s |
Policy 2 | 20 Mb/s | 5 Mb/s |
Policy 3 | 30 Mb/s | 5 Mb/s |
All three policies share the available 15 Mb/s of bandwidth equally (5 Mb/s each), and none reaches its requested reservation.
Say that the total bandwidth allocation in the scenario above is increased from 15 Mb/s to 40 Mb/s:
Total of 40 Mb/s bandwidth:
Policy | Requested | Allocated |
Policy 1 | 10 Mb/s | 10 Mb/s |
Policy 2 | 20 Mb/s | 15 Mb/s |
Policy 3 | 30 Mb/s | 15 Mb/s |
The policy with the lowest reservation, policy 1, now receives its full allocation of 10 Mb/s, and the other two policies split the remaining 30 Mb/s evenly (15 Mb/s each).
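The even-split behavior in the two scenarios above can be sketched in Python as follows. This is a minimal model of the described behavior (repeatedly dividing the remaining bandwidth evenly among policies that have not yet reached their reservation), not SyncIQ's actual code; the `allocate` function and the policy names are hypothetical.

```python
def allocate(total, requests):
    """Split `total` evenly across policies, capping each at its request.

    `requests` maps a policy name to its requested reservation, in the
    same unit as `total` (Mb/s in the examples).
    """
    allocated = {name: 0.0 for name in requests}
    remaining = float(total)
    unsatisfied = set(requests)
    while unsatisfied and remaining > 1e-9:
        # Offer every unsatisfied policy an equal share of what is left.
        share = remaining / len(unsatisfied)
        for name in sorted(unsatisfied):
            grant = min(share, requests[name] - allocated[name])
            allocated[name] += grant
            remaining -= grant
        # Policies that reached their reservation drop out of the next round.
        unsatisfied = {n for n in unsatisfied
                       if requests[n] - allocated[n] > 1e-9}
    return allocated


requests = {"policy1": 10, "policy2": 20, "policy3": 30}
for total in (15, 40):
    result = allocate(total, requests)
    print(total, {name: round(value, 2) for name, value in result.items()})
# 15 {'policy1': 5.0, 'policy2': 5.0, 'policy3': 5.0}
# 40 {'policy1': 10.0, 'policy2': 15.0, 'policy3': 15.0}
```

Because each round offers the same share to every unsatisfied policy, the smallest reservations are always met first, which is the property that guards against starvation.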
There are several tools to aid in understanding and troubleshooting SyncIQ’s bandwidth allocation. For example, the following commands display the SyncIQ policy configuration:
# isi sync policies list
Name     Path             Action  Enabled  Target
------------------------------------------------------
policy1  /ifs/data/zone1  copy    Yes      onefs-trgt1
policy2  /ifs/data/zone3  copy    Yes      onefs-trgt2
------------------------------------------------------
# isi sync policies view <name>
# isi sync policies view zone1-zone2
ID: ce0cbbba832e60d7ce7713206f7367bb
Name: policy1
Path: /ifs/data/zone1
Action: copy
Enabled: Yes
Target: onefs-trgt1
Description:
Check Integrity: Yes
Source Include Directories: -
Source Exclude Directories: /ifs/data/zone1/zone4
Source Subnet: -
Source Pool: -
Source Match Criteria: -
Target Path: /ifs/data/zone2/zone1_sync
Target Snapshot Archive: No
Target Snapshot Pattern: SIQ-%{SrcCluster}-%{PolicyName}-%Y-%m-%d_%H-%M-%S
Target Snapshot Expiration: Never
Target Snapshot Alias: SIQ-%{SrcCluster}-%{PolicyName}-latest
Sync Existing Target Snapshot Pattern: %{SnapName}-%{SnapCreateTime}
Sync Existing Snapshot Expiration: No
Target Detect Modifications: Yes
Source Snapshot Archive: No
Source Snapshot Pattern:
Source Snapshot Expiration: Never
Snapshot Sync Pattern: *
Snapshot Sync Existing: No
Schedule: when-source-modified
Job Delay: 10m
Skip When Source Unmodified: No
RPO Alert: -
Log Level: trace
Log Removed Files: No
Workers Per Node: 3
Report Max Age: 1Y
Report Max Count: 2000
Force Interface: No
Restrict Target Network: No
Target Compare Initial Sync: No
Disable Stf: No
Expected Dataloss: No
Disable Fofb: No
Disable File Split: No
Changelist creation enabled: No
Accelerated Failback: No
Database Mirrored: False
Source Domain Marked: False
Priority: high
Cloud Deep Copy: deny
Bandwidth Reservation: -
Last Job State: running
Last Started: 2022-03-15T11:35:39
Last Success: 2022-03-15T11:35:39
Password Set: No
Conflicted: No
Has Sync State: Yes
Source Certificate ID:
Target Certificate ID:
OCSP Issuer Certificate ID:
OCSP Address:
Encryption Cipher List:
Encrypted: No
Linked Service Policies: -
Delete Quotas: Yes
Disable Quota Tmp Dir: No
Ignore Recursive Quota: No
Allow Copy Fb: No
Bandwidth rules can be viewed via the CLI as follows:
# isi sync rules list
ID    Enabled  Type       Limit       Days     Begin  End
-------------------------------------------------------
bw-0  Yes      bandwidth  50000 kbps  Mon-Fri  08:00  18:00
-------------------------------------------------------
Total: 1

# isi sync rules view bw-0
ID: bw-0
Enabled: Yes
Type: bandwidth
Limit: 50000 kbps
Days: Mon-Fri
Schedule
    Begin: 08:00
    End: 18:00
Description:
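To reason about when a scheduled rule such as bw-0 is in force, the Python sketch below models a rule as a limit plus a weekday and time window. The `BandwidthRule` class and `active_limit_kbps` function are hypothetical. The sketch also assumes that the end time is exclusive and that the most restrictive limit wins when several rules overlap; neither behavior is confirmed by the output above.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class BandwidthRule:
    """Hypothetical model of one row of `isi sync rules list`."""
    rule_id: str
    limit_kbps: int
    days: frozenset  # weekday numbers, Monday=0 ... Sunday=6
    begin: time
    end: time
    enabled: bool = True

    def applies_at(self, when: datetime) -> bool:
        # Assumption: the window includes `begin` and excludes `end`.
        return (self.enabled
                and when.weekday() in self.days
                and self.begin <= when.time() < self.end)


def active_limit_kbps(rules, when):
    """Return the limit in force at `when`, or None if no rule applies."""
    limits = [rule.limit_kbps for rule in rules if rule.applies_at(when)]
    # Assumption: with overlapping rules, the lowest limit is used.
    return min(limits) if limits else None


# The bw-0 rule from the listing: 50000 kbps, Mon-Fri, 08:00 to 18:00.
bw0 = BandwidthRule("bw-0", 50000, frozenset(range(5)), time(8, 0), time(18, 0))

print(active_limit_kbps([bw0], datetime(2022, 3, 15, 11, 35, 39)))  # 50000
print(active_limit_kbps([bw0], datetime(2022, 3, 19, 11, 0)))       # None
```

The first timestamp is a Tuesday morning inside the window, so the 50000 kbps limit applies; the second falls on a Saturday, when the rule is not scheduled.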
Additionally, the following CLI command shows the global SyncIQ settings, including the reserve for policies without a reservation:
# isi sync settings view
Service: on
Source Subnet: -
Source Pool: -
Force Interface: No
Restrict Target Network: No
Tw Chkpt Interval: -
Report Max Age: 1Y
Report Max Count: 2000
RPO Alerts: Yes
Max Concurrent Jobs: 50
Bandwidth Reservation Reserve Percentage: 1
Bandwidth Reservation Reserve Absolute: -
Encryption Required: Yes
Cluster Certificate ID:
OCSP Issuer Certificate ID:
OCSP Address:
Encryption Cipher List:
Renegotiation Period: 8H
Service History Max Age: 1Y
Service History Max Count: 2000
Use Workers Per Node: No