Resync

A RAID consists of a set of stripes. Each stripe has several data chunks and one or more parity chunks. When writing data to a stripe, the RAID engine updates some data chunks and recalculates the parity chunks, which results in several IOs to the RAID backing devices. A write operation is therefore not atomic. If a write operation is interrupted by an xiRAID engine restart, a driver failure, a server OS or hardware restart, or a power loss, the stripe can be left logically broken: the parity chunks may no longer correspond to the data chunks. This scenario is known as a “write hole”.

When multiple threads are running IO operations with some IO depth, many stripes can be affected by such an interruption. An interrupted IO operation may be completed to the client with an error or may remain incomplete. The RAID logic has no mechanism to recover data to the “old” or the “new” state, so the client should perform data recovery if an IO operation completes with an error or times out.

The RAID logic is designed to recover the parity of broken stripes. This restores each stripe to an operational state and prepares the RAID for potential disk failures and for reconstruction of a failed and replaced disk.

The xiRAID Opus engine detects unexpected RAID down events and marks such a RAID as dirty at the next restart. The “resync” logic is implemented to locate and repair all broken stripes by recovering their parity. If a RAID is dirty, the initialization restarts automatically during the next RAID restart, unless the RAID is reconstructing or Degraded. The “resync” logic is designed as an additional initialization cycle, so the RAID state is reported as Initializing while the “resync” is in progress.
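As a rough illustration of what a resync pass does, the following Python sketch scans every stripe and rewrites the parity wherever it does not match the data. It is a simplified model that assumes a single XOR parity chunk and in-memory stripes; the function names resync and xor_parity and the stripe layout are hypothetical and do not describe the actual xiRAID Opus implementation.

```python
from functools import reduce

def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def resync(stripes):
    """Recompute parity for every stripe; return indexes of repaired stripes."""
    repaired = []
    for index, stripe in enumerate(stripes):
        expected = xor_parity(stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected  # parity is rebuilt from data, not vice versa
            repaired.append(index)
    return repaired

# Hypothetical RAID contents after an unexpected shutdown.
stripes = [
    {"data": [b"\x01", b"\x02"], "parity": b"\x03"},  # consistent stripe
    {"data": [b"\x04", b"\x08"], "parity": b"\x00"},  # stale parity (write hole)
]

repaired = resync(stripes)
second_pass = resync(stripes)
print(repaired, second_pass)
```

Note that the pass only makes the parity agree with whatever data is on disk. It does not roll the data back or forward, which matches the statement above that data recovery is left to the client.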

Similar to the regular initialization process, the resync initialization can be stopped and restarted using the commands:

xnr_cli raid init stop --name xnraid
xnr_cli raid init start --name xnraid

The resync logic can be disabled or re-enabled by setting the resync_enabled configuration parameter to False or True respectively:

xnr_cli config set --name resync_enabled --value False

The current resync mode can be displayed using the command:

xnr_cli config get --name resync_enabled

If resync is disabled, any dirty RAID event is lost. In this case the RAID can be left with broken stripes that cannot be detected later.