Drive I/O Error Counter

You can keep track of drives where I/O errors (faults) have started to appear so that you can replace such drives with healthy ones in a timely manner.

Tip:

We recommend setting up email notifications to trace drives with I/O errors (to learn more, see the Setting up Email Notifications chapter).

The fault threshold is the number of faults, common to all drives, above which a drive is removed from the RAID (marked as 'missing') or replaced with a suitable drive from the spare pool. You can set the fault threshold to a value from 1 to 1000 using the command xicli settings faulty-count modify -t. Changing the fault threshold resets the current number of faults on all drives.

When a drive is removed from a RAID because the fault threshold is exceeded:

  • if the RAID has a SparePool with a suitable drive, the removed drive is replaced and RAID reconstruction starts;
  • if the removed drive has not been replaced in the RAID (automatically or manually), the drive returns to the RAID after the current number of faults on that drive is reset;
  • the drive clean command applied to the removed drive resets the current number of faults and does not remove metadata from the drive.

To manage the threshold value of I/O errors for all drives, run

# xicli settings faulty-count modify <arg>

Attention:

When you change any parameter of the xicli settings faulty-count modify command, the xiraid-scanner.service restarts.

Table 1. Argument for the faulty-count modify subcommand

Required argument

-t

--threshold

The threshold value for all drives.

If you set a new fault threshold value, the current numbers of faults are reset for all the drives.

Possible values: integers from 1 to 1000.

The default: 3.

Example: Set the drive fault threshold value to 10:

# xicli settings faulty-count modify -t 10
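
Because changing the threshold resets the fault counters on all drives and restarts xiraid-scanner.service, it may be worth validating the value before xicli is called. The following is a minimal sketch under stated assumptions: the function name set_fault_threshold and the XICLI variable (used to substitute another command for a dry run) are hypothetical helpers and not part of xiRAID. Only the documented range (integers from 1 to 1000) is checked.

```shell
# Hypothetical helper (not part of xicli): validate the value, then run
# "xicli settings faulty-count modify -t <value>".
# XICLI is an assumed override; XICLI=echo prints the arguments instead.
set_fault_threshold() {
    local value="$1"
    case "$value" in
        # Reject empty input, leading zeros (including 0), and non-digits.
        ''|0*|*[!0-9]*)
            echo "threshold must be an integer from 1 to 1000" >&2
            return 1
            ;;
    esac
    # At most four digits, and not greater than the documented maximum.
    if [ "${#value}" -gt 4 ] || [ "$value" -gt 1000 ]; then
        echo "threshold must be an integer from 1 to 1000" >&2
        return 1
    fi
    "${XICLI:-xicli}" settings faulty-count modify -t "$value"
}
```

For a dry run, XICLI=echo set_fault_threshold 10 prints the arguments that would be passed to xicli without changing anything.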

To show the threshold value of I/O errors, run

# xicli settings faulty-count show

Table 2. Argument for the faulty-count show subcommand

Optional argument

-f

--format

Output format:

  • table;
  • json;
  • prettyjson – human-readable json.

The default: table.

To reset the current numbers of faults for drives, run

# xicli drive faulty-count reset <arg>

Warning:

The RAID that contains the drive must be loaded.

When you change any parameter of the xicli drive faulty-count reset command, the xiraid-scanner.service restarts.

Table 3. Arguments for the faulty-count reset subcommand

Required argument

-d

--drives

The space-separated list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*) whose current numbers of faults will be reset.

Example: Reset the current numbers of faults for drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count reset -d /dev/sd[a-b] /dev/sdd
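
Note that /dev/sd[a-b] in this example is a shell pathname pattern: the shell, not xicli, expands it to the matching device nodes before the command starts, and only for nodes that actually exist. The sketch below illustrates that expansion with empty stand-in files in a temporary directory, so it is safe to run on any machine. The directory and file names are made up for the demonstration.

```shell
# Demonstration only: empty files in a temporary directory stand in for /dev nodes.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sda" "$demo_dir/sdb" "$demo_dir/sdc" "$demo_dir/sdd"

# The shell expands sd[a-b] to the matching names before echo runs,
# so echo receives three separate arguments. sdc is not matched.
expanded="$(cd "$demo_dir" && echo sd[a-b] sdd)"
echo "$expanded"    # prints: sda sdb sdd

rm -rf "$demo_dir"
```

If a pattern matches nothing, most shells pass it through unchanged, so listing the drives explicitly is the safer choice in scripts.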

To show the current numbers of faults for drives, run

# xicli drive faulty-count show [optional_args]

Table 4. Arguments for the faulty-count show subcommand

Mutually exclusive optional arguments

-n

--name

The name of the RAID whose drives' current numbers of faults will be shown.

If neither of the two arguments is specified, show the values for all drives.

-d

--drives

The space-separated list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*) whose current numbers of faults will be shown.

If neither of the two arguments is specified, show the values for all drives.

Optional argument

-f

--format

Output format:

  • table;
  • json;
  • prettyjson – human-readable json.

The default: table.

Example: Show the current numbers of faults for drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count show -d /dev/sd[a-b] /dev/sdd
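
Since both a threshold change and a reset clear the counters, it can be useful to save the current values first. The following is a minimal sketch under stated assumptions: the function name snapshot_fault_counts, the output file naming, and the XICLI override variable are hypothetical and not part of xiRAID. It relies only on the documented behaviour that show reports all drives when neither -n nor -d is given, and it makes no assumption about the structure of the JSON output.

```shell
# Hypothetical helper (not part of xicli): save the current fault counts of
# all drives as JSON into a timestamped file and print the file name.
# XICLI is an assumed override; XICLI=echo writes the arguments instead.
snapshot_fault_counts() {
    local out_dir="${1:-.}"
    local file
    file="$out_dir/faulty-count-$(date +%Y%m%d-%H%M%S).json"
    # Without -n or -d, the show subcommand reports all drives.
    if "${XICLI:-xicli}" drive faulty-count show -f json > "$file"; then
        echo "$file"
    else
        # Do not leave an empty or partial snapshot behind.
        rm -f "$file"
        return 1
    fi
}
```

A call such as snapshot_fault_counts /var/tmp (the directory is only an example) could be run before xicli settings faulty-count modify or xicli drive faulty-count reset.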