xiRAID is a lightweight yet high-performance and reliable software RAID engine. However, like any software product, its performance and reliability depend not only on the engine configuration but also on the overall system environment and its health. To maximize the potential of xiRAID storage, data center administrators must continuously monitor a wide range of parameters.
The xiRAID engine exposes several properties that enable the assessment of both its internal health and that of the storage devices it manages. By analyzing these metrics, the storage support team can identify and address subtle issues before they escalate.
xiRAID Engine Health
The xiRAID engine provides several parameters that should be monitored:
- RAID Autostart
  - The RAID autostart parameter remains unchanged unless an administrator intervenes, so monitoring it in isolation is not particularly useful. However, when raid_autostart is set to 0, it indicates that the server is operating in cluster mode, which alters the interpretation of the active parameter for xiRAID storage devices.
  - You can inspect this parameter using the following command: xicli settings cluster show
- Faulty Count
  - The faulty count parameter increases when the xiRAID engine experiences delays in accessing its underlying block devices. Any change in this parameter should be carefully reviewed.
  - You can inspect this parameter using the following command: xicli drive faulty-count show
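As a minimal sketch, the two inspection commands above could be collected from a script as follows. The helper names (`run`, `collect`, `ENGINE_COMMANDS`) are hypothetical, and the sketch deliberately returns the raw command output without assuming anything about its format.

```python
from __future__ import annotations

import shlex
import subprocess
from typing import Callable

# Commands taken from the text above; the dictionary keys are illustrative labels.
ENGINE_COMMANDS = {
    "raid_autostart": "xicli settings cluster show",
    "faulty_count": "xicli drive faulty-count show",
}


def run(command: str) -> str:
    """Run one command and return its standard output as text."""
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=True
    )
    return result.stdout


def collect(
    commands: dict[str, str] = ENGINE_COMMANDS,
    runner: Callable[[str], str] = run,
) -> dict[str, str]:
    """Return the raw output of every command, keyed by its label."""
    return {label: runner(command) for label, command in commands.items()}
```

Passing a custom `runner` makes it possible to exercise the collection logic on a host where xiRAID is not installed.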
xiRAID Storage Device Health
xiRAID storage devices expose several runtime parameters that require monitoring:
- active
  - A value of false indicates that no physical device is present; only configuration data exists in the system. In a cluster configuration (when the RAID autostart engine parameter is set to 0), a false value may simply denote a passive component in a failover setup. However, if no cluster is configured, a false value warrants further investigation. Additionally, any change in this parameter's value should be closely monitored.
  - Inspection command: xicli raid show
- config
  - The xiRAID engine requires that RAID storage devices be created with corresponding configuration data saved to a file. If the configuration file is missing, the device exists only within the kernel module and will be lost once the module is unloaded. This scenario demands immediate attention from the storage support team.
  - Inspection command: xicli raid show
- state
  - This parameter may list several status items (for details, see "Showing RAID state" in the xiRAID documentation). The RAID storage device is considered to be operating normally only if the state includes online and, for non-RAID0 configurations, initialized. The presence of any other state values should prompt further examination by the storage support team.
  - Inspection command: xicli raid show
- wear
  - For each drive, this parameter reflects its wear level, similar to the information provided by SMART data. A value approaching 100% indicates that the drive is nearing failure. If the wear value exceeds the 90% threshold, the drive should be considered for replacement.
  - Inspection command: xicli raid show -e
- memory_prealloc_conf (implemented in xiRAID 4.2)
  - The presence of this parameter indicates that the configured value for the memory preallocation feature could not be applied to the RAID storage device. This discrepancy may negatively impact the device's performance and should be investigated.
  - Inspection command: xicli raid show -e
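The rules above can be sketched as a single check over one RAID. This is an illustration only: `RaidInfo` is a hypothetical, already-parsed record (the sketch does not assume any particular xicli output format), and skipping the remaining checks for an inactive RAID is an assumption made here, not documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

WEAR_REPLACE_THRESHOLD = 90  # percent, the replacement threshold named in the text


@dataclass
class RaidInfo:
    """Hypothetical parsed view of one RAID; field names mirror the parameters above."""
    name: str
    level: str                      # RAID level as a string, e.g. "0", "5", "6"
    active: bool
    config: bool
    state: list[str]
    wear: dict[str, int] = field(default_factory=dict)  # drive name -> wear percent
    memory_prealloc_conf: Optional[str] = None           # present only on mismatch


def check_raid(raid: RaidInfo, raid_autostart: int) -> list[str]:
    """Return human-readable findings; an empty list means nothing to review."""
    findings = []
    if not raid.config:
        findings.append("configuration file is missing")
    if not raid.active:
        # With raid_autostart = 0 (cluster mode) an inactive RAID may be a passive node.
        if raid_autostart != 0:
            findings.append("RAID is not active and no cluster is configured")
        return findings  # assumption: runtime checks do not apply to an inactive RAID
    states = set(raid.state)
    if "online" not in states:
        findings.append("state does not include 'online'")
    if raid.level != "0" and "initialized" not in states:
        findings.append("state does not include 'initialized'")
    extra = sorted(states - {"online", "initialized"})
    if extra:
        findings.append("unexpected state values: " + ", ".join(extra))
    for drive, wear in raid.wear.items():
        if wear > WEAR_REPLACE_THRESHOLD:
            findings.append(f"drive {drive} wear is {wear}%")
    if raid.memory_prealloc_conf is not None:
        findings.append("memory preallocation value could not be applied")
    return findings
```

Detecting a change in the active value still requires comparing against the previous sample, which a monitoring system such as Zabbix handles on its own side.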
Reference Implementation of xiRAID Health Monitoring Model for Zabbix with Zabbix Agent 2
The xiRAID engine is monitored through a comprehensive set of items and triggers:
Name | Values | Triggers | Description |
---|---|---|---|
Autostart enabled | numeric; 1 = autostart, 0 = manual start | | This item shows if xiRAID engine activates the defined RAIDs on load. In case of 0, there may be xiRAID pacemaker/corosync cluster activated. |
Faulty drives count | numeric | item != 0: severity: high | This item shows if there are drives with faulty counter set to value greater than zero. |
License status | text; valid/trial/expired | item = trial: severity: information; item = expired: severity: high | This item shows current license status. |
Module state | numeric; 0 = Not installed, 1 = Not loaded, 2 = Loaded | item = 0: severity: information; item = 1: severity: warning | This item shows if xiRAID module is installed and loaded. |
Module version | character | item has changed: severity: warning | The item shows the version of the xiRAID engine installed on the host. |
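To make the trigger conditions in the table concrete, here is a small sketch that evaluates them in plain Python. It is not the Zabbix template itself; the function name and argument names are hypothetical, and only the conditions and severities come from the table.

```python
from __future__ import annotations

from typing import Optional


def engine_triggers(
    faulty_drives: int,
    license_status: str,
    module_state: int,
    module_version: Optional[str] = None,
    previous_version: Optional[str] = None,
) -> list[tuple[str, str]]:
    """Return (severity, reason) pairs for every engine-level condition that fires."""
    fired = []
    if faulty_drives != 0:
        fired.append(("high", "faulty drives count is non-zero"))
    if license_status == "trial":
        fired.append(("information", "license is a trial"))
    elif license_status == "expired":
        fired.append(("high", "license has expired"))
    if module_state == 0:
        fired.append(("information", "xiRAID module is not installed"))
    elif module_state == 1:
        fired.append(("warning", "xiRAID module is not loaded"))
    # A version change can only be detected when a previous sample is known.
    if None not in (module_version, previous_version) and module_version != previous_version:
        fired.append(("warning", "xiRAID module version has changed"))
    return fired
```

The Autostart enabled item has no trigger of its own in the table, so it is not evaluated here; it only changes how the per-RAID active item is interpreted.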
Additionally, xiRAID storage devices are managed using Zabbix’s low-level discovery feature, which automatically creates a corresponding set of items and triggers for each defined xiRAID storage device:
Name | Values | Triggers | Description |
---|---|---|---|
RAID {#XI_NAME} config presence | character; True/False | item = False: severity: disaster | This item shows if the RAID’s configuration file is present in the file system. |
RAID {#XI_NAME} state | text | | This item is used to create a set of dependent items. |
active | numeric | item = 0 and Autostart enabled = 1: severity: warning; item has changed and Autostart enabled = 0 | |
initing, reconstructing, restriping | numeric | item = 1: severity: warning | |
degraded, needs initialization, needs reconstruction, needs resize, needs restripe | numeric | item = 1: severity: high | |
unrecovered, offline, read only | numeric | item = 1: severity: disaster | |
RAID {#XI_NAME} memory preallocation mismatch | numeric; 0, 1 | item = 1: severity: warning | This item shows if the preallocated memory size differs from the configured value. |
RAID {#XI_NAME} max wear | numeric; 0-100, 255 | item is N/A for all drives: severity: high; item > {#MAX_WEAR_WARN}: severity: high | This item shows the maximum wear value, in percent, across the drives that make up the RAID. When wear data is unavailable, the item value is 255. |
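The severity mapping for the dependent state items and the max wear item can be sketched as follows. The item names and severities are taken from the table; the function names are hypothetical, and returning 255 only when no drive reports wear is an assumption about how the unavailable case is encoded.

```python
from __future__ import annotations

from typing import Optional

# Dependent state items and their trigger severities, as listed in the table.
STATE_SEVERITY = {
    "initing": "warning", "reconstructing": "warning", "restriping": "warning",
    "degraded": "high", "needs initialization": "high",
    "needs reconstruction": "high", "needs resize": "high", "needs restripe": "high",
    "unrecovered": "disaster", "offline": "disaster", "read only": "disaster",
}
SEVERITY_ORDER = ["warning", "high", "disaster"]  # lowest to highest
WEAR_UNAVAILABLE = 255  # item value used when wear data is unavailable


def worst_state_severity(flags: dict[str, int]) -> Optional[str]:
    """Return the highest severity among state items set to 1, or None."""
    fired = [STATE_SEVERITY[name] for name, value in flags.items()
             if value == 1 and name in STATE_SEVERITY]
    return max(fired, key=SEVERITY_ORDER.index, default=None)


def max_wear(per_drive: list[Optional[int]]) -> int:
    """Return the highest known wear percentage, or 255 if no drive reports one."""
    known = [wear for wear in per_drive if wear is not None]
    return max(known) if known else WEAR_UNAVAILABLE


def wear_severity(item: int, max_wear_warn: int) -> Optional[str]:
    """Apply both max wear triggers; max_wear_warn stands in for {#MAX_WEAR_WARN}."""
    if item == WEAR_UNAVAILABLE or item > max_wear_warn:
        return "high"
    return None
```

For example, a RAID reporting both reconstructing and degraded would resolve to high, since the sketch keeps only the most severe condition.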
We have developed and tested a template for xiRAID using Zabbix version 7, and we use it together with xiRAID in our internal infrastructure.
The Xinnor team is available to provide examples of the Zabbix configuration and template files upon request. The following file examples have already been developed:
- Data Gathering Module
- Zabbix Agent Configuration File
- Zabbix Template