Reconstruction

If a RAID backend drive fails and is replaced by a new disk, or if a drive is temporarily disconnected and later reconnected, the data on that drive may become invalid and must be reconstructed from the data on the other drives using RAID recovery algorithms.

A RAID data area is logically divided into slices. If a disk fails or is removed, the xiRAID Opus logic remembers which slices are modified by user IO. If the same disk returns to the RAID, this information identifies which slices of the temporarily disconnected disk are impacted and must be reconstructed, and which slices hold unchanged data. If only some slices are impacted, the partial recovery algorithm is used for reconstruction. Partial reconstruction significantly reduces reconstruction time, which is why you should always try to reinsert a temporarily disconnected disk rather than replace it with a new one. When a disk is replaced with a new one, all slices are marked as impacted and full reconstruction is used.
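The slice bookkeeping described above can be pictured with a small Python model. This is only an illustrative sketch, not xiRAID Opus code: the class `SliceTracker`, its method names, and the use of a single logical offset to locate slices are all assumptions made for the example.

```python
class SliceTracker:
    """Toy model of impacted-slice tracking (hypothetical, not xiRAID Opus code)."""

    def __init__(self, num_slices, slice_size):
        self.num_slices = num_slices
        self.slice_size = slice_size
        self.impacted = {}  # absent disk id -> set of impacted slice indices

    def disk_removed(self, disk):
        # Start recording the writes that this absent disk misses.
        self.impacted.setdefault(disk, set())

    def user_write(self, offset, length):
        # Mark every slice touched by the write as impacted on each absent disk.
        if length <= 0:
            return
        first = offset // self.slice_size
        last = min((offset + length - 1) // self.slice_size, self.num_slices - 1)
        for slices in self.impacted.values():
            slices.update(range(first, last + 1))

    def disk_replaced(self, disk):
        # A new disk holds no valid data, so every slice is impacted.
        self.impacted[disk] = set(range(self.num_slices))

    def slices_to_reconstruct(self, disk):
        # Slices that need recovery when the disk is back in the RAID.
        return sorted(self.impacted.get(disk, ()))


tracker = SliceTracker(num_slices=8, slice_size=100)
tracker.disk_removed("d1")
tracker.user_write(offset=150, length=100)   # touches slices 1 and 2
partial = tracker.slices_to_reconstruct("d1")
tracker.disk_replaced("d2")
full = tracker.slices_to_reconstruct("d2")
```

In this model the reinserted disk `d1` needs only two of eight slices recovered, while the new disk `d2` needs all eight, which mirrors the difference between partial and full reconstruction.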

Many RAID levels (such as 6 and 7, or group levels 50, 60, and 70) continue to operate when more than one backend disk is disconnected. If some of the disconnected disks return (or are replaced by new disks), reconstruction can start and recover those disks while the other disks are still disconnected.

At RAID levels 6 and 7, if several disks are removed and reinserted one by one, some slices may be impacted on two or three reconnected disks while other slices are impacted on one disk only. In that case the reconstruction logic recovers the most impacted slices first and the less impacted slices next, to minimize the risk of losing user data.
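The ordering rule above can be sketched in a few lines of Python. This is an assumed, simplified model rather than the actual reconstruction scheduler: the function name `reconstruction_order` and the input layout (a mapping from disk id to the set of impacted slice indices) are hypothetical.

```python
def reconstruction_order(impacted_by_disk):
    """Return slice indices ordered from most impacted to least impacted.

    impacted_by_disk maps a disk id to the set of slice indices that are
    impacted on that disk (hypothetical layout for this sketch).
    """
    counts = {}
    for slices in impacted_by_disk.values():
        for index in slices:
            counts[index] = counts.get(index, 0) + 1
    # Slices impacted on more disks have less remaining redundancy,
    # so they come first; ties are broken by slice index.
    return sorted(counts, key=lambda index: (-counts[index], index))


order = reconstruction_order({
    "d1": {0, 1, 2},
    "d2": {1, 2},
    "d3": {2},
})
```

Here slice 2 is impacted on three disks, slice 1 on two, and slice 0 on one, so the resulting order is 2, 1, 0.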

RAID reconstruction should be initiated manually as soon as the "raid show" command reports the need-for-reconstruction state (Need recon) for the RAID. This state indicates that some disks have been logically inserted into the RAID and require reconstruction.

To start reconstruction, use the command:

xnr_cli raid recon start --name xnraid

The reconstruction process cannot run in parallel with the initialization (resync) service. The initialization process is stopped automatically if the RAID is Degraded. If the RAID is Degraded and also requires initialization, either because the initialization process was not completed or because a resync was requested, follow these steps for the correct recovery sequence:

  1. Insert and attach the absent devices using the device manager.
  2. Re-insert (replace) all Offline logical drives with the same or new drives.
  3. Run the reconstruction process and wait until it is completed.
  4. Once the reconstruction is finished, check that the RAID no longer reports the Degraded state.
  5. Run the initialization (resync) service.
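The ordering constraint behind these steps can be illustrated with a toy state model in Python. It is a sketch under stated assumptions, not a description of xiRAID Opus internals: the class `RaidModel`, its methods, and the rule that the RAID counts as Degraded while any drive is Offline or awaiting reconstruction are all hypothetical simplifications.

```python
class RaidModel:
    """Toy model of the recovery sequence (hypothetical, not xiRAID Opus code)."""

    def __init__(self, offline_drives, init_done):
        self.offline = set(offline_drives)   # drives not yet re-inserted
        self.need_recon = set()              # re-inserted, not yet reconstructed
        self.init_done = init_done

    @property
    def degraded(self):
        # Assumption: Degraded while any drive is Offline or needs reconstruction.
        return bool(self.offline or self.need_recon)

    def reinsert(self, drive):
        # Step 2: an Offline drive is re-inserted (same or new disk).
        if drive not in self.offline:
            raise ValueError(f"{drive} is not Offline")
        self.offline.remove(drive)
        self.need_recon.add(drive)

    def reconstruct(self):
        # Step 3: recover every drive that has been logically inserted.
        self.need_recon.clear()

    def initialize(self):
        # Step 5: initialization (resync) is refused while Degraded.
        if self.degraded:
            raise RuntimeError("initialization cannot run while the RAID is Degraded")
        self.init_done = True


raid = RaidModel(offline_drives={"d1", "d2"}, init_done=False)
try:
    raid.initialize()
    refused = False
except RuntimeError:
    refused = True

for drive in ("d1", "d2"):
    raid.reinsert(drive)
raid.reconstruct()
raid.initialize()
```

In the model, initialization is refused while drives are still Offline and succeeds only after every drive has been re-inserted and reconstructed, matching the order of the steps above.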

The reconstruction process can be temporarily stopped, which can reduce the impact of the reconstruction IO flow on user IO performance and latency. However, this is not recommended: it increases the risk of data loss, and user IO performance and latency are not optimal anyway while the RAID is degraded.

To stop the reconstruction process, use the command:

xnr_cli raid recon stop --name xnraid

For TEST PURPOSES ONLY, it is possible to force-finish a RAID reconstruction using the command xnr_cli raid recon finish --force --name xnraid. This operation marks the RAID as reconstructed (clearing the Degraded and Need recon states) but does not actually recover the data, so it results in data loss and corruption. Force-finishing a RAID reconstruction is strongly NOT RECOMMENDED.