Drive I/O Error Counter

You can keep track of drives where I/O errors (faults) have started to appear so that you can replace such drives with healthy ones in a timely manner.

Tip:

We recommend setting up email notifications to track drives with I/O errors (to learn more, see the Setting up Email Notifications chapter).

The fault threshold is a single limit on the number of faults that applies to every drive. A drive that exceeds it is removed from the RAID (marked as 'missing') or replaced with a suitable drive from the spare pool. You can set the fault threshold to a value from 1 to 1000 using the command xicli settings faulty-count modify -t. Changing the fault threshold resets the current number of faults on all drives.

When a drive is removed from a RAID because the fault threshold is exceeded:

if the RAID has a SparePool with a suitable drive, the removed drive is replaced and RAID reconstruction starts;
if the removed drive has not been replaced in the RAID (automatically or manually), it returns to the RAID once its current number of faults is reset;
the drive clean command, when applied to the removed drive, resets its current number of faults but does not remove metadata from the drive.
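
The rules above can be summarized as a small decision function. This is only an illustrative model: xiRAID makes this decision internally, and the function name fault_outcome and its outcome labels are hypothetical.

```shell
# Illustrative model only; not part of xicli.
# Usage: fault_outcome FAULTS THRESHOLD SPARE_AVAILABLE
#   FAULTS           current number of faults on the drive
#   THRESHOLD        configured fault threshold (1 to 1000)
#   SPARE_AVAILABLE  "yes" if the spare pool holds a suitable drive
fault_outcome() {
    if [ "$1" -le "$2" ]; then
        # The threshold is not exceeded, so the drive stays in the RAID.
        echo "keep"
    elif [ "$3" = "yes" ]; then
        # A suitable spare exists: the drive is replaced and
        # RAID reconstruction starts.
        echo "replace-and-reconstruct"
    else
        # No spare: the drive is removed and marked as 'missing'.
        echo "missing"
    fi
}
```

For example, with a threshold of 3, fault_outcome 4 3 no models a drive that is removed from the RAID without a replacement.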

To manage the threshold value of I/O errors for all drives, run

# xicli settings faulty-count modify <arg>

Attention:

When you change any parameter of the xicli settings faulty-count modify command, the xiraid-scanner.service restarts.

Table 1. Argument for the faulty-count modify subcommand
Required argument
-t	--threshold	The threshold value for all drives. If you set a new fault threshold value, the current numbers of faults are reset for all the drives. Possible values: integers from `1` to `1000`. The default: `3`.

Example: Set the drive fault threshold value to 10:

# xicli settings faulty-count modify -t 10
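
If the threshold is set from a script, it may help to check the value against the documented range before calling xicli. The following is a minimal sketch; the function name set_fault_threshold is hypothetical, and only the xicli invocation itself is taken from this section.

```shell
# Hypothetical wrapper: validate the threshold, then call xicli.
# Usage: set_fault_threshold VALUE
set_fault_threshold() {
    # Reject empty input, leading zeros, non-digits,
    # and anything longer than four digits.
    case $1 in
        ''|0*|*[!0-9]*|?????*)
            echo "threshold must be an integer from 1 to 1000" >&2
            return 1
            ;;
    esac
    # Enforce the documented upper bound.
    if [ "$1" -gt 1000 ]; then
        echo "threshold must be an integer from 1 to 1000" >&2
        return 1
    fi
    # Note: this resets the current numbers of faults for all drives
    # and restarts xiraid-scanner.service.
    xicli settings faulty-count modify -t "$1"
}
```

Calling set_fault_threshold 10 runs the same command as the example above, while set_fault_threshold 0 or set_fault_threshold 1001 fails without touching the system.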

To show the threshold value of I/O errors, run

# xicli settings faulty-count show

Table 2. Argument for the faulty-count show subcommand
Optional argument
-f	--format	Output format: `table`; `json`; `prettyjson` – human-readable json. The default: `table`.
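
As a sketch of how the -f argument might be used from a script, the hypothetical helper below accepts only the three formats listed in Table 2 and falls back to table when no format is given. The structure of the JSON output is not described in this section, so the helper does not attempt to parse it.

```shell
# Hypothetical wrapper around the documented show subcommand.
# Usage: show_fault_threshold [table|json|prettyjson]
show_fault_threshold() {
    # Default to the documented default format.
    format=${1:-table}
    case $format in
        table|json|prettyjson) ;;
        *)
            echo "format must be table, json, or prettyjson" >&2
            return 1
            ;;
    esac
    xicli settings faulty-count show -f "$format"
}
```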

To reset the current numbers of faults for drives, run

# xicli drive faulty-count reset <arg>

Warning:

The RAID that contains the drive must be loaded.

When you change any parameter of the xicli drive faulty-count reset command, the xiraid-scanner.service restarts.

Table 3. Arguments for the faulty-count reset subcommand
Required argument
-d	--drives	The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*) separated by a space to reset their current numbers of faults.

Example: Reset the current numbers of faults for the drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count reset -d /dev/sd[a-b] /dev/sdd
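
In the example above, /dev/sd[a-b] is a pattern expanded by the shell, not by xicli: it becomes /dev/sda /dev/sdb only if those device nodes exist, and many shells pass the pattern through literally otherwise. A script might therefore check the final device list before resetting. The sketch below is one way to do that; the function name reset_fault_counts is hypothetical, and the accepted path prefixes are the ones listed in Table 3.

```shell
# Hypothetical wrapper: check device paths, then call xicli.
# Usage: reset_fault_counts DEVICE...
reset_fault_counts() {
    if [ "$#" -eq 0 ]; then
        echo "usage: reset_fault_counts DEVICE..." >&2
        return 1
    fi
    for dev in "$@"; do
        case $dev in
            # A leftover glob character means the shell found no match.
            *'['*|*'*'*|*'?'*)
                echo "unexpanded pattern: $dev" >&2
                return 1
                ;;
            # Device path prefixes listed in Table 3.
            /dev/sd*|/dev/mapper/mpath*|/dev/nvme*|/dev/dm-*) ;;
            *)
                echo "unsupported device path: $dev" >&2
                return 1
                ;;
        esac
    done
    xicli drive faulty-count reset -d "$@"
}
```

The check is purely textual: it does not verify that the devices exist or that their RAID is loaded.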

To show the current numbers of faults for drives, run

# xicli drive faulty-count show [optional_args]

Table 4. Arguments for the faulty-count show subcommand
Mutually exclusive optional arguments
-n	--name	The name of the RAID whose drives' current numbers of faults will be shown. If neither of the two arguments is specified, the values for all drives are shown.
-d	--drives	The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*) separated by a space to show their current numbers of faults. If neither of the two arguments is specified, the values for all drives are shown.
Optional argument
-f	--format	Output format: `table`; `json`; `prettyjson` – human-readable json. The default: `table`.

Example: Show the current numbers of faults for the drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count show -d /dev/sd[a-b] /dev/sdd
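
Because -n and -d are mutually exclusive, a script that accepts either kind of selector may want to route them explicitly. The following minimal sketch does so; the function name show_fault_counts and its mode keywords (all, raid, drives) are hypothetical, and the RAID name media5 in the usage note is only a placeholder.

```shell
# Hypothetical wrapper that never passes -n and -d together.
# Usage: show_fault_counts [all | raid NAME | drives DEVICE...]
show_fault_counts() {
    mode=${1:-all}
    if [ "$#" -gt 0 ]; then
        shift
    fi
    case $mode in
        all)
            # Neither -n nor -d: values for all drives are shown.
            xicli drive faulty-count show
            ;;
        raid)
            if [ "$#" -ne 1 ]; then
                echo "raid mode needs exactly one RAID name" >&2
                return 1
            fi
            xicli drive faulty-count show -n "$1"
            ;;
        drives)
            if [ "$#" -lt 1 ]; then
                echo "drives mode needs at least one device" >&2
                return 1
            fi
            xicli drive faulty-count show -d "$@"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For instance, show_fault_counts raid media5 shows the values for the drives of one RAID, and show_fault_counts drives /dev/sda /dev/sdd shows them for the listed devices.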