Drives

Manual Drive Replacement or Exclusion

To exclude or replace a drive in a RAID, run:

# xicli raid replace <args>
Tip:

If you manually replace a drive that is part of a spare pool, the drive is excluded from the spare pool.

Table 1. Arguments for the replace subcommand

Required arguments

-n, --name
    The name of the RAID.

-no, --number
    The number of the drive.
    To find out the number of the drive, run:
    # xicli raid show

-d, --drive
    The new block device.
    To remove the drive (mark it as missing), set the value to null.

Example: Replacing the drive “0” with the drive “nvme4n1” in the RAID “media5”:

  1. Mark the drive “0” as “missing”:

    # xicli raid replace -n media5 -no 0 -d null
  2. Replace the drive “0” with the drive “nvme4n1”:

    # xicli raid replace -n media5 -no 0 -d /dev/nvme4n1
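
The two steps above can be wrapped in a small dry-run helper. This is only a sketch and not part of xicli: the function name print_replace_steps is hypothetical, and it prints the two documented commands for review instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not an xicli feature): prints the two
# documented "xicli raid replace" commands without executing them.
print_replace_steps() {
    raid=$1      # RAID name, e.g. media5
    number=$2    # drive number, as reported by "xicli raid show"
    new_dev=$3   # new block device, e.g. /dev/nvme4n1

    # Step 1: mark the drive as missing.
    printf 'xicli raid replace -n %s -no %s -d null\n' "$raid" "$number"
    # Step 2: put the new block device in its place.
    printf 'xicli raid replace -n %s -no %s -d %s\n' "$raid" "$number" "$new_dev"
}

print_replace_steps media5 0 /dev/nvme4n1
```

Review the printed lines, then run them as root in the order shown.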

Automatic Drive Replacement

A drive can be replaced automatically after it:

  • was physically removed from a RAID;
  • exceeded the wear threshold (90%);
  • exceeded the I/O error threshold (3 by default).
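
The three conditions can be read as a simple decision rule. The sketch below only illustrates that reading and is not how xiRAID is implemented: the function name, its inputs, and the order of the checks are assumptions, and "exceeded" is taken to mean strictly greater than the threshold.

```shell
#!/bin/sh
# Illustration only: thresholds are taken from the list above; the inputs
# are supplied by the caller and are not read from xicli.
WEAR_THRESHOLD=90     # percent
FAULT_THRESHOLD=3     # default I/O error threshold

# Prints the reason a drive would qualify for automatic replacement,
# or "none" if no condition is met.
replacement_reason() {
    present=$1      # "yes" if the drive is physically present
    wear=$2         # wear level in percent
    io_errors=$3    # current number of I/O errors

    if [ "$present" != "yes" ]; then
        echo "removed"
    elif [ "$wear" -gt "$WEAR_THRESHOLD" ]; then
        echo "wear"
    elif [ "$io_errors" -gt "$FAULT_THRESHOLD" ]; then
        echo "io-errors"
    else
        echo "none"
    fi
}

replacement_reason yes 95 0
replacement_reason yes 40 3
```

Under these assumptions the first call reports wear and the second reports none, because 3 errors equal the threshold without exceeding it.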

To automatically replace drives on a RAID, create a spare pool, then assign the created spare pool to the RAID. You can assign only one spare pool to each RAID. We recommend creating a spare pool with storage devices of the same type.

If the system has a spare pool, you can assign it to an existing RAID or to a new RAID when creating it.

Commands for managing spare pools

To add drive(s) to the spare pool, run:

# xicli pool add <args>

Table 2. Arguments for the add subcommand

Required arguments

-n, --name
    The name of the spare pool.

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces.

To create the spare pool, run:

# xicli pool create <args>

Table 3. Arguments for the create subcommand

Required arguments

-n, --name
    The name for the spare pool.

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces.

To delete the spare pool, run:

# xicli pool delete <arg>

Table 4. Argument for the delete subcommand

Required argument

-n, --name
    The name of the spare pool.

To remove drive(s) from the spare pool, run:

# xicli pool remove <args>

Table 5. Arguments for the remove subcommand

Required arguments

-n, --name
    The name of the spare pool.

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces.

To show info on the spare pool, run:

# xicli pool show [optional_args]

Table 6. Arguments for the show subcommand

Optional arguments

-n, --name
    The name of the spare pool.
    Without the argument, info on all spare pools is shown.

-f, --format
    Output format:
      • table;
      • json;
      • prettyjson – human-readable json.
    The default: table.

-u, --units
    Size units:
      • s – sectors (1 sector = 512 bytes);
      • k – kilobytes;
      • m – megabytes;
      • g – gigabytes.
    The default: g.

xicli pool show output example

Possible drive states:

  • ready – the drive is available for use as a replacement;
  • absent – the drive is missing from the system;
  • failed – an attempt to use this drive from the spare pool as a replacement failed; the drive will not be used for replacement.

To set the delay timer for drive replacement from the spare pools, run:

# xicli settings pool modify <arg>

Table 7. Argument for the pool modify subcommand

Required argument

-rd, --replace_delay
    Delay time (in seconds) for drive replacement from the spare pools.
    A single delay time applies to all spare pools.
    Possible values: integers from 1 to 3600.
    The default: 180.
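
Because the delay must be an integer from 1 to 3600, a script can check the value before calling xicli. The following is a minimal sketch with a hypothetical function name; it prints the documented command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not an xicli feature): validates the delay against
# the documented range and prints the resulting command.
replace_delay_cmd() {
    delay=$1
    # Reject empty input and anything that is not a plain non-negative integer.
    case $delay in
        ''|*[!0-9]*)
            echo "error: delay must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    # Documented range: 1 to 3600 seconds.
    if [ "$delay" -lt 1 ] || [ "$delay" -gt 3600 ]; then
        echo "error: delay must be between 1 and 3600 seconds" >&2
        return 1
    fi
    printf 'xicli settings pool modify -rd %s\n' "$delay"
}

replace_delay_cmd 60
```

A value such as 0 or 3601 makes the function return a non-zero status and print nothing to standard output.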

To show the delay time used for drive replacement from the spare pools, run:

# xicli settings pool show

Table 8. Argument for the pool show subcommand

Optional argument

-f, --format
    Output format:
      • table;
      • json;
      • prettyjson – human-readable json.
    The default: table.

Example: Creating a spare pool “pool1” and assigning it to the RAID “media5”:

  1. Create a spare pool:

    # xicli pool create -n pool1 -d /dev/sda /dev/sdb
  2. Assign the created spare pool to the RAID:

    # xicli raid modify -n media5 -sp pool1

Example: Setting the replacement timer for the spare pools to 60 seconds:

# xicli settings pool modify -rd 60

Drive I/O Error Counter

You can keep track of drives where I/O errors (faults) have started to appear so that you can replace such drives with healthy ones in a timely manner.

Tip:

We recommend setting up email notifications to track drives with I/O errors (to learn more, see the Setting up Email Notifications chapter).

The fault threshold is a number of faults, common to all drives, above which a drive is removed from the RAID or replaced with a suitable drive from the spare pool. You can set the fault threshold to a value from 1 to 1000. Changing the fault threshold resets the current number of faults on the drives.

When a drive is removed from a RAID because the fault threshold is exceeded:

  • if the RAID has a spare pool with a suitable drive, the removed drive is replaced and the RAID reconstruction starts;
  • if the removed drive has not been replaced in the RAID (automatically or manually), the drive returns to the RAID after the current number of faults on that drive is reset;
  • the drive clean command applied to the removed drive resets the current number of faults and does not remove metadata from the drive.

To set the threshold value of I/O errors for all drives, run:

# xicli settings faulty-count modify <arg>

Table 9. Argument for the faulty-count modify subcommand

Required argument

-t, --threshold
    The threshold value for all drives.
    If you set a new fault threshold value, the current numbers of faults are reset for all drives.
    Possible values: integers from 1 to 1000.
    The default: 3.

Example: Set the drive fault threshold value to 10:

# xicli settings faulty-count modify -t 10

To show the threshold value of I/O errors, run:

# xicli settings faulty-count show

Table 10. Argument for the faulty-count show subcommand

Optional argument

-f, --format
    Output format:
      • table;
      • json;
      • prettyjson – human-readable json.
    The default: table.

To reset the current numbers of faults for drives, run:

# xicli drive faulty-count reset <arg>

Table 11. Argument for the faulty-count reset subcommand

Required argument

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces, whose current numbers of faults will be reset.

Example: Resetting the current fault counts for the drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count reset -d /dev/sd[a-b] /dev/sdd
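
In the command above, the pattern /dev/sd[a-b] is expanded by the shell before xicli starts, so xicli receives separate device paths. The sketch below shows the same expansion on empty stand-in files in a temporary directory; the function name and the file names are only for the demonstration and do not touch real devices.

```shell
#!/bin/sh
# Demonstration only: shows how the shell expands a bracket pattern into
# the matching existing names. Runs in a subshell so the caller's working
# directory is unchanged.
demo_glob() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # Empty files standing in for device nodes.
    touch sda sdb sdc sdd
    # sd[a-b] matches sda and sdb; sdc is skipped; sdd is named explicitly.
    printf '%s\n' sd[a-b] sdd
    cd / && rm -rf "$dir"
)

demo_glob
```

If no existing path matches the pattern, a POSIX shell passes the pattern through unchanged, so check that the devices exist before relying on it.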

To show the current numbers of faults for drives, run:

# xicli drive faulty-count show [optional_args]

Table 12. Arguments for the faulty-count show subcommand

Mutually exclusive optional arguments

If neither of the two arguments is specified, the values for all drives are shown.

-n, --name
    The name of the RAID whose drives' current numbers of faults will be shown.

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces, whose current numbers of faults will be shown.

Optional argument

-f, --format
    Output format:
      • table;
      • json;
      • prettyjson – human-readable json.
    The default: table.

Example: Showing the current fault counts for the drives /dev/sda, /dev/sdb, and /dev/sdd:

# xicli drive faulty-count show -d /dev/sd[a-b] /dev/sdd

Removing Drive Metadata and Resetting Current Error Count

Warning:

The result of the command is irreversible. Read the description carefully.

Metadata is Xinnor xiRAID device configuration information (to learn more, see Configuration Files and Metadata).

The drive clean command resets the current error counter value and/or removes metadata from selected disks depending on the status and state of those disks.

The drive clean command resets the current error counter but does not delete the metadata:

  • on a disk that was removed from a RAID due to exceeding the I/O error threshold.

    To remove the metadata from such disks, add a new disk to the RAID to replace the removed one.

  • on a disk included in a RAID that is present in the current configuration file.

To remove metadata from the drives and/or reset their current error count, run:

# xicli drive clean <arg>

Table 13. Argument for the clean subcommand

Required argument

-d, --drives
    The list of block devices (/dev/sd*, /dev/mapper/mpath*, /dev/nvme*, /dev/dm-*), separated by spaces, on which to reset the current fault counter and/or delete the metadata.

Example: Deleting metadata from drives “/dev/nvme5n1” and “/dev/nvme1n1”:

# xicli drive clean -d /dev/nvme1n1 /dev/nvme5n1