In today's computing environments, storage performance can become a critical bottleneck that constrains overall system effectiveness. The combination of SanDisk DC SN861 PCIe Gen5 drives with xiRAID Opus software creates an exceptionally powerful storage solution that delivers remarkable performance with minimal resource consumption. This approach allows organizations to dedicate more computing power to working with their data rather than simply accessing it.
This solution is particularly valuable for high-performance computing workloads, real-time analytics, AI/ML data pipelines, media production, and enterprise database applications where storage throughput and IOPS can directly impact business outcomes.
About xiRAID Opus
xiRAID Opus is a next-generation storage solution that integrates local and network-attached drives into a unified, high-performance system. It goes beyond traditional software RAID by offering disaggregated storage capabilities, virtualization integration, and native NVMe-over-Fabrics (NVMe-oF) support—both at the NVMe device level and volume level—via an integrated volume manager.
The system operates through a Linux user-space data path engine that bypasses the traditional kernel I/O stack, thereby eliminating operating system dependencies and reducing latency.
This architecture enables a fault-tolerant storage infrastructure using both local and remote media, delivers optimized performance in virtualized environments via VHOST, and provides network-accessible storage with comprehensive Quality of Service (QoS) controls. By disaggregating storage from compute resources, xiRAID Opus enhances resource utilization and scalability for enterprise, high-performance computing (HPC), and AI/ML workloads.
About SanDisk DC SN861
Engineered for the future of mission-critical workloads, the SanDisk DC SN861 SSD is a cutting-edge PCIe Gen5 NVMe SSD that delivers exceptional performance tailored for enterprise applications. With capacity options of up to 7.68TB, the drive is optimized for compute-intensive AI and machine learning environments, offering high-speed random read capabilities and extremely low latency while maintaining minimal power consumption to maximize IOPS per watt. The DC SN861 SSD is also enriched with a robust set of enterprise features such as Flexible Data Placement (FDP), support for OCP 2.0, and integrated safeguards like Power Loss Protection and End-to-End Data Path Protection. This comprehensive feature set makes the DC SN861 ideally suited for hyperscale, cloud, and enterprise data centers that demand both high performance and operational efficiency.
Test Objectives
In modern computing environments, resource efficiency is a critical consideration. Every CPU cycle dedicated to storage management is a cycle unavailable for processing the actual data workloads. Organizations invest in high-performance storage to accelerate their applications, not to consume their computing resources with storage overhead. The xiRAID Opus engine, with its user-space architecture that bypasses the traditional OS storage stack, represents a significant advancement in addressing this challenge.
This solution brief evaluates how effectively the xiRAID Opus engine can maximize the performance potential of SanDisk DC SN861 PCIe Gen5 drives in a RAID 6 configuration while minimizing system resource utilization.
Our primary objective is to demonstrate maximum performance from highly protected (RAID 6) storage built on xiRAID Opus and SanDisk drives while minimizing resource consumption.
Test Environment
System configuration
Platform: Supermicro AS-2125HS-TNR
CPU: 2 * AMD EPYC 9454 48-Core Processor
RAM: 24 * SAMSUNG M321R4GA3BB6-CQKET
Drives: 8 * SanDisk SDS6BA138PSP9X3
OS: Rocky Linux 9.5
RAID Engine: xiRAID Opus
RAID Configuration
Level: RAID 6
Drives used: 8
Chunk size: 128K
Stripe size: 768K (128K chunk × 6 data drives)
Performance testing and results
Testing Methodology
To evaluate the performance capabilities of the SanDisk DC SN861 PCIe Gen5 drives with xiRAID Opus, we conducted comprehensive benchmark testing using fio. Complete fio configurations are provided in the Appendix.
In our testing, the term Jobs refers to both the number of fio testing threads and the number of CPU cores dedicated to the xiRAID Opus engine. We deliberately synchronized these values to measure resource efficiency accurately. For example, when testing with numjobs=4, xiRAID Opus was bound to those same 4 CPU cores with no access to additional system resources. This one-to-one mapping allows us to precisely determine the minimum resources required to achieve optimal performance.
When considering the results presented below, it is important to note that xiRAID Opus runs on the same CPU threads as the FIO plugin. Therefore, the actual resource consumption of xiRAID Opus is significantly lower than the reported values.
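To make this one-to-one mapping concrete, the sketch below shows how a 4-job fio run of this kind could be pinned to four cores. The numjobs, cpus_allowed, and cpus_allowed_policy options are standard fio parameters; the specific core list, workload pattern, and the assumption that the xiRAID Opus engine (loaded through the SPDK bdev plugin, as in the Appendix job files) shares the same cores because it runs inside the fio threads are illustrative and not taken from the actual test job files.

; Illustrative sketch only: pin 4 fio jobs to cores 0-3 so that fio and the
; xiRAID Opus engine, running in the same user-space threads via the SPDK
; bdev plugin, share exactly those 4 cores (assumed core list and pattern).
[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk.json
thread=1
numjobs=4
cpus_allowed=0-3
cpus_allowed_policy=split
group_reporting=1
direct=1
iodepth=32
rw=randread
blocksize=4K

[job1]
filename=xnraid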
For sustained performance testing, we preconditioned the drives with multiple full overwrites by setting fio loops=4; for individual drive tests, we preconditioned each drive's block device.
The preconditioning process involved overwriting the entire device using block sizes matched to the planned workload:
- Random I/O testing: 4K blocks
- Sequential RAW testing: chunk-size (128K) blocks
With preconditioning eliminating performance variability, sustained performance tests were run for 120 seconds.
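For reference, a minimal preconditioning job of the kind described above might look like the following sketch. It overwrites the whole device four times with 4K blocks through the kernel block device; the device path, libaio engine, and queue depth are illustrative assumptions, while loops=4 and the block-size choice follow the methodology above (the actual job files are summarized in the Appendix).

; Illustrative preconditioning sketch (assumed device path and ioengine):
; overwrite the entire drive 4 times using the block size of the planned
; workload -- 4K before random I/O tests, the 128K chunk size before
; sequential RAW tests.
[global]
ioengine=libaio
direct=1
rw=write
bs=4k
iodepth=32
loops=4

[precondition]
filename=/dev/nvme0n1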
The number of Jobs presented in the tables below is the optimal one, meaning it provides the best scaling.
It's worth emphasizing that achieving absolute, world-record performance is not always necessary. In many practical scenarios, particularly in production environments, what truly matters is attaining performance that is close to the theoretical maximum, but with efficient use of available resources. Striving for the last few percentage points of performance often comes at a disproportionate cost in terms of complexity, energy consumption, and hardware utilization.
Instead, optimizing for a balanced and sustainable approach—where computational resources are used judiciously and performance remains within an acceptable margin of peak theoretical limits—tends to offer better long-term value. This is especially relevant in constrained environments or large-scale systems, where maximizing throughput per watt, per dollar, or per unit of hardware is far more impactful than achieving record-breaking benchmarks that may not translate into practical benefits. In essence, the goal should often be smart efficiency, not brute-force speed.
RAW Drives Performance Testing
Performance tests of raw drives were conducted in user space.
Single drive
Load | Workload pattern | Results | Avg. Latency (usec) | SanDisk DC SN861 specification baseline |
---|---|---|---|---|
Sequential Write | job=1; iodepth=32 | 7.2 GB/s | 586.07 | 7.2 GB/s
Sequential Read | job=1; iodepth=32 | 13.6 GB/s | 307.80 | 13.7 GB/s
Random Write | job=1; iodepth=32 | 362k IOPS | 31.09 | 330k IOPS
Random Read | job=1; iodepth=64 | 3002k IOPS | 148.78 | 3300k IOPS
8 drives
Load | Workload pattern | Results | Avg. Latency (usec) |
---|---|---|---|
Sequential Write | job=8 (1 per drive); iodepth=32 | 57.5 GB/s | 582.76
Sequential Read | job=8 (1 per drive); iodepth=32 | 110 GB/s | 305.65
Random Write | job=32 (2 per drive); iodepth=32 | 2.8M IOPS | 136.09
Random Read | job=56 (7 per drive); iodepth=64 | 24M IOPS | 149.34
The raw drive measurements are close to the published specification values, confirming that the system is configured correctly and that there are no bottlenecks.
xiRAID Opus Performance Tests Results
Sequential tests
All sequential tests were run with a queue depth of 32. The number of jobs/CPU threads was varied for each test.
Theoretical maximum RAID6 sequential write/read performance:
- Sequential Read: 110 GB/s (100% of raw performance)
- Sequential Write: 43 GB/s, accounting for the RAID6 parity penalty: with 2 parity chunks per 8-drive stripe, only 6/8 = 75% of raw throughput carries data (57.5 GB/s × 75% = 43.125 GB/s)
Load | Jobs/CPU threads | Results | Avg. Latency (usec) |
---|---|---|---|
Sequential Write | 1 | 15.9 GB/s | 1434.85
Sequential Write | 2 | 21.9 GB/s | 2114.36
Sequential Write | 4 | 39.0 GB/s | 2553.06
Sequential Write | 8 | 40.5 GB/s | 4944.03
Sequential Read | 1 | 99.1 GB/s | 251.45
Sequential Read | 2 | 110 GB/s | 913.41
Sequential Read | 4 | 110 GB/s | 1238.06
Sequential Read | 8 | 110 GB/s | 1654.11
Here is a graph comparing sequential write and read performance with different numbers of CPU threads:

Key observations
As shown in the sequential test results, 90%* of the sequential read performance on RAID (99.1 GB/s), relative to raw drives, is achieved using just a single CPU thread.
*Proportion of theoretical maximum RAID6 sequential read performance:
(99.1GB/s ÷ 110GB/s) × 100% ≈ 90.09%
In contrast, 90%* of the sequential write performance on RAID, relative to raw drives, is achieved with 4 CPU threads.
*Proportion of theoretical maximum RAID6 sequential write performance:
(39 GB/s ÷ 43 GB/s) × 100% ≈ 90.7%
Moreover, increasing the number of threads for sequential reads and writes may not significantly improve performance, but it can increase operation latency.
Random tests
In addition to varying the number of jobs for the random tests, we also varied the queue depth to show the trade-off between peak performance and operation latency.
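For orientation, a random write job against the RAID volume could look like the sketch below. It mirrors the sequential RAID job file from the Appendix, with the block size, access pattern, queue depth, and job count substituted to match this section; treat it as an illustrative assumption rather than the exact job file used in testing.

; Illustrative sketch of a RAID random write job, derived from the
; sequential RAID configuration in the Appendix (parameters assumed).
[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk.json
thread=1
; numjobs varied across runs: 1, 2, 4, 8, 16, 32
numjobs=4
group_reporting=1
direct=1
verify=0
; iodepth varied: 32 and 64 for writes, 64 and 128 for reads
iodepth=32
; rw=randread for the read tests
rw=randwrite
blocksize=4K
time_based=1
runtime=120s

[job1]
filename=xnraid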
Theoretical maximum RAID6 random write/read performance:
- Random Read: 24M IOPS (100% of raw performance)
- Random Write: 955k IOPS (33%* of raw performance due to Read-Modify-Write operations: 2896k IOPS × 33% ≈ 955k IOPS).
*In RAID 6, each small random write typically involves 3 write operations (1 for data and 2 for parity) and 3 read operations (to fetch old data and parity). Since read operations are generally faster and less performance-limiting than writes, the impact on write performance is mostly attributed to the 3 writes. As a result, the effective write performance can be approximated as one-third of raw performance, or about 33% (100% ÷ 3).
Random writes results
Load | Jobs/CPU threads | Results (kIOPS) | Avg. Latency (usec) |
---|---|---|---|
Random Write, iodepth=32 | | |
Random Write | 1 | 251k | 124.75 |
Random Write | 2 | 378k | 167.23 |
Random Write | 4 | 587k | 216.25 |
Random Write | 8 | 659k | 387.06 |
Random Write | 16 | 693k | 736.69 |
Random Write | 32 | 706k | 1447.78 |
Random Write, iodepth=64 | | |
Random Write | 1 | 302k | 183.21 |
Random Write | 2 | 533k | 236.70 |
Random Write | 4 | 712k | 365.37 |
Random Write | 8 | 700k | 730.13 |
Random Write | 16 | 721k | 1306.96 |
Random Write | 32 | 716k | 2852.31 |
Here's the graph comparing Random Write performance across different CPU threads for two I/O depths (32 and 64):

Key Observations:
Optimal random write performance is achieved with 4 CPU threads, ranging from 61%* of the theoretical RAID6 maximum random write performance at a queue depth of 32 to 74%* at a queue depth of 64, while maintaining extremely low latency.
At the same time, when more than 8 CPU threads are used (16 and 32), write throughput does not change significantly, but operation latency increases noticeably.
*Proportion of theoretical maximum RAID6 random write performance:
(587k IOPS ÷ 955k IOPS) × 100% ≈ 61.5% (iodepth=32)
(712k IOPS ÷ 955k IOPS) × 100% ≈ 74.6% (iodepth=64)
Random read results
Load | Jobs/CPU threads | Results | Avg. Latency (usec) |
---|---|---|---|
Random Read, iodepth=64 | | |
Random Read | 1 | 1090k | 57.40 |
Random Read | 2 | 2153k | 58.17 |
Random Read | 4 | 4156k | 60.30 |
Random Read | 8 | 7444k | 66.99 |
Random Read | 16 | 13000k | 75.32 |
Random Read | 24 | 17900k | 83.83 |
Random Read, iodepth=128 | | |
Random Read | 1 | 1298k | 96.33 |
Random Read | 2 | 2602k | 96.10 |
Random Read | 4 | 5064k | 98.73 |
Random Read | 8 | 8932k | 112.03 |
Random Read | 16 | 15000k | 128.47 |
Random Read | 24 | 22000k | 136.61 |
Here's the graph comparing Random Read performance across different CPU threads for two I/O depths (64 and 128):

Key Observations:
- Throughput increases nearly linearly with the number of threads for both iodepth=64 and iodepth=128.
- Latency remains low for iodepth=64.
- Over 1090k IOPS is achieved with only 1 core. At the same time, 90% of the theoretical maximum performance is achieved with 24 cores, which is only a quarter of the cores available in the system.
Conclusions
In summary, our measurements on the SanDisk DC SN861 demonstrate that the xiRAID Opus engine delivers performance close to the theoretical maximum the drives can achieve, with minimal resource consumption, even when running in the same CPU context as the fio measurement program.
xiRAID Opus is engineered to minimize CPU and memory usage while maximizing I/O efficiency, ensuring that you get every last drop of compute and I/O potential out of your hardware.
Running in Linux User Space, xiRAID Opus maintains a low resource footprint, enabling seamless adaptability to future hardware innovations without adding significant overhead. This balance of high performance and low resource consumption makes xiRAID Opus an optimal choice for demanding storage environments.
The SanDisk DC SN861, known for its enterprise-grade reliability, high endurance, and consistent low latency, demonstrates stable performance that does not degrade under high workload conditions. This makes it an ideal device for xiRAID Opus in demanding storage environments.
Appendix
RAW
Preconditioning:
[job1]
filename=nvme0n1
[job2]
...
Sequential tests:
[job1]
filename=nvme0n1
[job2]
...
Random test:
[job1]
filename=nvme0n1
[job2]
...
RAID
Sequential tests:
[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk.json
thread=1
numjobs=[1,2,4,8...]
group_reporting=1
direct=1
verify=0
iodepth=32
rw=[write/read]
blocksize=768K
time_based=1
runtime=120s

[job1]
filename=xnraid
Random test:
[job1]
filename=xnraid
exitall=1