Drive Failure and Degraded Mode

A RAID continues to support user IO even if one or more drives have failed or been disconnected. As soon as a drive can no longer service IO, its state changes to ‘Offline’ and the RAID state changes to ‘Degraded’. If too many drives fail and the RAID can no longer service IO, the RAID state changes to ‘Offline’. When the RAID goes ‘Offline’, the engine immediately interrupts user IO to minimize potential data damage. The RAID engine ensures data consistency except for IO packets that were being processed when the RAID went ‘Offline’. These IO requests are reported to the client as failed, indicating that the data was not written. The corresponding data areas on the disks may contain “old” data, “new” data, or a mix of both if the IO packet is larger than a single RAID chunk and spans multiple disks.
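
To see why a single failed IO can leave a mix of old and new data, consider the striping arithmetic below. This is a generic shell sketch with assumed values (64 KiB chunks, four data drives, a 256 KiB write); it does not reflect xiRAID internals:

    # Generic striping arithmetic (illustrative only, not xiRAID internals):
    # which chunks does a 256 KiB write at offset 32 KiB touch?
    chunk=$(( 64 * 1024 ))                    # assumed chunk size: 64 KiB
    disks=4                                   # assumed data drives per stripe
    offset=$(( 32 * 1024 )); length=$(( 256 * 1024 ))
    first=$(( offset / chunk ))
    last=$(( (offset + length - 1) / chunk ))
    for c in $(seq $first $last); do
        echo "chunk $c -> data disk $(( c % disks ))"
    done
    # Five chunks across four disks: after a failed write, each chunk may
    # independently hold old or new data.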

RAID stripes affected by failed IO packets may be left with inconsistent parity, because some stripe chunks were updated while others, including the parity chunk, were not. This scenario is referred to as a “write hole”.
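
The sketch below illustrates the effect with simple XOR parity, as used in RAID 5. The values are arbitrary shell variables and the snippet is purely illustrative, not xiRAID logic:

    # Write hole with XOR parity (illustrative only): parity = d0 XOR d1.
    d0=10; d1=6
    parity=$(( d0 ^ d1 ))               # consistent stripe: parity = 12
    d0_new=15                           # a write updates d0 on disk...
    stale_parity=$parity                # ...but fails before parity is updated
    # Rebuilding d1 from the surviving chunk and the stale parity silently
    # yields wrong data:
    echo $(( d0_new ^ stale_parity ))   # prints 3, not the original 6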

If the RAID is marked as ‘Degraded’, follow these steps (a combined example session is shown after the list):

  1. Reinsert the physical device hosting the drive. If the disconnected device is not damaged, it is strongly recommended to reinsert the original device rather than replace it with a new one. If the device is damaged, replace it with a new device hosting a drive of the same or larger size. If too many logical drives are replaced with new ones, the RAID loses too many data drives and switches to the ‘Unrecoverable’ state, resulting in the loss of user data.
  2. Attach all reinserted devices using the Device Manager. The device state can be shown using the xnr_cli drive-manager show command; a correctly attached device reports the SPDK status. Once a device is attached, its drives are visible to the xiRAID engine as BDEVs and can be listed using the xnr_cli bdev show command. However, the corresponding logical RAID drives may still be ‘Offline’, meaning the BDEVs are not yet logically connected to the RAID.
  3. If a drive is replaced with a new one, clean it using the command xnr_cli bdev zero --bdev <bdev_name>; otherwise the new drive may be treated as a foreign drive (one used by another RAID or application) and its insertion into the RAID will be declined. If the original RAID drive is reinserted, do NOT zero it: it contains valid data, so partial reconstruction can be applied instead of the full reconstruction algorithm.
  4. Replace the ‘Offline’ logical drive in the RAID with the reinserted and re-attached drive using the command xnr_cli raid replace --name xnraid --position 1 --bdev 0000:06:0b.0n1. Repeat the operation for all re-attached drives. The drive state reported by the xnr_cli raid show command should change to ‘Online’.
  5. If the RAID is ‘Offline’ (too many drives failed and have now been reinserted), unload the RAID and restore it using the commands:
    xnr_cli raid unload --name xnraid
    xnr_cli raid restore --name xnraid
  6. Start the RAID reconstruction to recover data on the temporarily disconnected disks and repair any “write hole” stripes. The reconstruction process also fills new disks with up-to-date data. Use the command:

    xnr_cli raid recon start --name xnraid
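
Putting the steps together, a recovery session for a single reinserted drive might look as follows. The RAID name, drive position, and BDEV address are the example values used in the steps above; substitute your own:

    xnr_cli drive-manager show      # check the device is attached (SPDK status)
    xnr_cli bdev show               # confirm the drive is visible as a BDEV
    # for a brand-new drive only: xnr_cli bdev zero --bdev 0000:06:0b.0n1
    xnr_cli raid replace --name xnraid --position 1 --bdev 0000:06:0b.0n1
    xnr_cli raid recon start --name xnraid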

When replacing drives with new ones, insert at most one new drive for RAID 5 (or per group of RAID 50), at most two for RAID 6 (or per group of RAID 60), and at most three for RAID 7 (or per group of RAID 70). Do not replace additional drives while the reconstruction process is in progress, as this can lead to data loss.

If disks are disconnected while no IO is in progress and the same disks are then reinserted, the RAID can switch from the ‘Degraded’ (or ‘Degraded’ and ‘Offline’) state back to ‘Online’ without reconstruction.