How System Admins Can Monitor xiRAID Classic Health with Zabbix

February 7, 2025


xiRAID is a lightweight yet high-performance and reliable software RAID engine. However, like any software product, its performance and reliability depend not only on the engine configuration but also on the overall system environment and its health. To get the most out of xiRAID storage, data center administrators must continuously monitor a wide range of parameters.

The xiRAID engine exposes several properties that enable the assessment of both its internal health and that of the storage devices it manages. By analyzing these metrics, the storage support team can identify and address subtle issues before they escalate.

xiRAID Engine Health

The xiRAID engine provides several parameters that should be monitored:

  • RAID Autostart
    • The RAID autostart parameter remains unchanged unless an administrator intervenes, so monitoring it in isolation is not particularly useful. However, a raid_autostart value of 0 usually indicates that the server is operating in cluster mode, which alters the interpretation of the active parameter for xiRAID storage devices.
    • You can inspect this parameter using the following command: xicli settings cluster show
  • Faulty Count
    • The faulty count parameter increases when the xiRAID engine experiences delays in accessing its underlying block devices. Any change in this parameter should be carefully reviewed.
    • You can inspect this parameter using the following command: xicli drive faulty-count show
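
Because any change in the faulty count deserves a review, a monitoring script can compare the latest reading with the previous one. The sketch below assumes the counts have already been parsed from the output of xicli drive faulty-count show into a dictionary keyed by drive path. The parsing step, the function name, and the drive paths are illustrative assumptions, not part of xiRAID.

```python
from __future__ import annotations


def faulty_count_changes(previous: dict[str, int],
                         current: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {drive: (old_count, new_count)} for every drive whose count changed.

    Both arguments are hypothetical, already-parsed readings; a drive that was
    not seen before is treated as having had a count of 0.
    """
    changes = {}
    for drive, count in current.items():
        old = previous.get(drive, 0)
        if count != old:
            changes[drive] = (old, count)
    return changes


if __name__ == "__main__":
    before = {"/dev/nvme0n1": 0, "/dev/nvme1n1": 2}
    after = {"/dev/nvme0n1": 0, "/dev/nvme1n1": 3}
    print(faulty_count_changes(before, after))
```

An empty result means nothing changed between the two readings; any other result lists the drives to review.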

xiRAID Storage Device Health

xiRAID storage devices expose several runtime parameters that require monitoring:

  • active
    • A value of false indicates that no physical device is present—only configuration data exists in the system. In a cluster configuration (when the RAID autostart engine parameter is set to 0), a false value may simply denote a passive component in a failover setup. However, if no cluster is configured, a false value warrants further investigation. Additionally, any change in this parameter's value should be closely monitored.
    • Inspection command: xicli raid show
  • config
    • The xiRAID engine requires that RAID storage devices be created with corresponding configuration data saved to a file. If the configuration file is missing, the device exists only within the kernel module and will be lost once the module is unloaded. This scenario demands immediate attention from the storage support team.
    • Inspection command: xicli raid show
  • state
    • This parameter may list several status items (for details, see "Showing RAID state" in the xiRAID documentation). The RAID storage device is considered to be operating normally only if the state includes online and, for non-RAID0 configurations, initialized. Any other state value should prompt further examination by the storage support team.
    • Inspection command: xicli raid show
  • wear
    • For each drive, this parameter reflects its wear level, similar to the information provided by SMART data. A value approaching 100% indicates that the drive is nearing failure. If the wear value exceeds the 90% threshold, the drive should be considered for replacement.
    • Inspection command: xicli raid show -e
  • memory_prealloc_conf (implemented in xiRAID 4.2)
    • The presence of this parameter indicates that the configured value for the memory preallocation feature could not be applied to the RAID storage device. This discrepancy may negatively impact the device's performance and should be investigated.
    • Inspection command: xicli raid show -e
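
The rules above can be combined into a single per-RAID check. The following is a minimal sketch, assuming the output of xicli raid show -e has already been parsed into a dictionary. The key names, the "level" encoding, and the decision to skip runtime checks for an inactive RAID are assumptions made for illustration; only the rules themselves come from the text above.

```python
from __future__ import annotations

WEAR_THRESHOLD = 90  # percent; replacement threshold named in the text
NORMAL_STATES = {"online", "initialized"}


def raid_findings(raid: dict, autostart: int) -> list[str]:
    """Return findings for one RAID; an empty list means no issue was found.

    `raid` is a hypothetical, already-parsed record, for example:
    {"level": "5", "active": True, "config": True,
     "state": ["online", "initialized"], "wear": {"/dev/nvme0n1": 12}}
    """
    findings = []

    if not raid["config"]:
        findings.append("configuration file is missing")

    if not raid["active"]:
        # active = false is expected only on a passive cluster node (autostart 0).
        if autostart != 0:
            findings.append("RAID is inactive and no cluster is configured")
        # Assumption: runtime state and wear are not meaningful for an inactive RAID.
        return findings

    states = set(raid["state"])
    required = {"online"} if raid["level"] == "0" else NORMAL_STATES
    for name in sorted(required - states):
        findings.append(f"state is missing '{name}'")
    for name in sorted(states - NORMAL_STATES):
        findings.append(f"unexpected state '{name}'")

    for drive, wear in sorted(raid.get("wear", {}).items()):
        if wear is not None and wear > WEAR_THRESHOLD:
            findings.append(f"{drive} wear is {wear}%")

    # The key is present only when the preallocation setting could not be applied.
    if "memory_prealloc_conf" in raid:
        findings.append("memory preallocation setting was not applied")

    return findings
```

Returning a list of findings rather than a single status keeps every rule visible to the operator, which mirrors how separate triggers would fire in a monitoring system.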

Reference Implementation of xiRAID Health Monitoring Model for Zabbix with Zabbix Agent 2

The xiRAID engine is monitored through a comprehensive set of items and triggers:

| Name | Values | Triggers | Description |
|---|---|---|---|
| Autostart enabled | numeric: 1 = autostart, 0 = manual start | | Shows whether the xiRAID engine activates the defined RAIDs on load. A value of 0 may mean that a xiRAID Pacemaker/Corosync cluster is active. |
| Faulty drives count | numeric | item != 0 (severity: high) | Shows whether any drive has a faulty counter greater than zero. |
| License status | text: valid/trial/expired | item = trial (severity: information); item = expired (severity: high) | Shows the current license status. |
| Module state | numeric: 0 = Not installed, 1 = Not loaded, 2 = Loaded | item = 0 (severity: information); item = 1 (severity: warning) | Shows whether the xiRAID module is installed and loaded. |
| Module version | character | item has changed (severity: warning) | Shows the version of the xiRAID engine installed on the host. |
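
To make the trigger rules in the table above easier to check, they can be expressed as a small function. This is an illustrative sketch only: it is not the Zabbix template itself, and the dictionary keys and version strings are hypothetical names chosen for the example.

```python
from __future__ import annotations


def engine_problems(items: dict,
                    previous_version: str | None = None) -> list[tuple[str, str]]:
    """Map engine-level item values to (item name, severity) pairs per the table.

    `items` uses hypothetical keys: faulty_drives_count, license_status,
    module_state, module_version.
    """
    problems = []

    if items["faulty_drives_count"] != 0:
        problems.append(("Faulty drives count", "high"))

    license_severity = {"trial": "information", "expired": "high"}
    if items["license_status"] in license_severity:
        problems.append(("License status", license_severity[items["license_status"]]))

    module_severity = {0: "information", 1: "warning"}  # 2 = Loaded, no trigger
    if items["module_state"] in module_severity:
        problems.append(("Module state", module_severity[items["module_state"]]))

    # "item has changed" needs the previously collected value for comparison.
    if previous_version is not None and items["module_version"] != previous_version:
        problems.append(("Module version", "warning"))

    return problems


if __name__ == "__main__":
    sample = {"faulty_drives_count": 3, "license_status": "expired",
              "module_state": 1, "module_version": "4.2"}
    for name, severity in engine_problems(sample, previous_version="4.1"):
        print(f"{severity:12} {name}")
```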

Additionally, xiRAID storage devices are managed using Zabbix’s low-level discovery feature, which automatically creates a corresponding set of items and triggers for each defined xiRAID storage device:

| Name | Values | Triggers | Description |
|---|---|---|---|
| RAID {#XI_NAME} config presence | character: True/False | item = False (severity: disaster) | Shows whether the RAID's configuration file is present in the file system. |
| RAID {#XI_NAME} state | text | | Used to create a set of dependent items. |
| Dependent item: active | numeric | item = 0 and Autostart enabled = 1 (severity: warning); item has changed and Autostart enabled = 0 | |
| Dependent items: initing, reconstructing, restriping | numeric | item = 1 (severity: warning) | |
| Dependent items: degraded, needs initialization, needs reconstruction, needs resize, needs restripe | numeric | item = 1 (severity: high) | |
| Dependent items: unrecovered, offline, read only | numeric | item = 1 (severity: disaster) | |
| RAID {#XI_NAME} memory preallocation mismatch | numeric: 0, 1 | item = 1 (severity: warning) | Shows whether the preallocated memory size differs from the configured value. |
| RAID {#XI_NAME} max wear | numeric: 0-100, 255 | item is N/A for all drives (severity: high); item > {#MAX_WEAR_WARN} (severity: high) | Shows the maximum wear value, in percent, across the drives in the RAID. If wear data is unavailable, the item value is 255. |
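
Zabbix low-level discovery expects a JSON array of objects whose keys are the discovery macros. The sketch below shows one way a data gathering script could produce that array, the max wear value, and the per-state 0/1 flags described above. The function names, the RAID names, and the way {#MAX_WEAR_WARN} is populated (here a default of 90, taken from the replacement threshold mentioned earlier) are assumptions for illustration, not the contents of the Xinnor module.

```python
import json

WEAR_UNAVAILABLE = 255  # value used when no drive reports wear data

# State names as listed in the table above.
STATE_FLAGS = (
    "initing", "reconstructing", "restriping",
    "degraded", "needs initialization", "needs reconstruction",
    "needs resize", "needs restripe",
    "unrecovered", "offline", "read only",
)


def discovery_payload(raid_names, max_wear_warn=90):
    """Build the LLD JSON: one object per RAID, keyed by discovery macros."""
    return json.dumps([
        {"{#XI_NAME}": name, "{#MAX_WEAR_WARN}": max_wear_warn}
        for name in raid_names
    ])


def max_wear(wear_values):
    """Return the highest known wear value, or 255 if none is available."""
    known = [w for w in wear_values if w is not None]
    return max(known) if known else WEAR_UNAVAILABLE


def state_flags(states):
    """Turn a list of state names into 0/1 values for the dependent items."""
    present = set(states)
    return {flag: int(flag in present) for flag in STATE_FLAGS}


if __name__ == "__main__":
    print(discovery_payload(["media", "scratch"]))
    print(max_wear([3, None, 17]))
    print(state_flags(["online", "initialized", "degraded"]))
```

Keeping the state as one text item and deriving the numeric flags from it means the RAID is queried once per collection cycle, which matches the dependent-item layout in the table.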

We have developed and tested a xiRAID template with Zabbix version 7, and we use it alongside xiRAID in our internal infrastructure.

The Xinnor team is available to provide examples of the Zabbix configuration and template files upon request. The following file examples have already been developed:

  • Data Gathering Module
  • Zabbix Agent Configuration File
  • Zabbix Template