High performance with minimal resource consumption: xiRAID Opus and SanDisk PCIe Gen5 drives

June 5, 2025

In today's computing environments, storage performance can become a critical bottleneck that constrains overall system effectiveness. The combination of SanDisk DC SN861 PCIe Gen5 drives with xiRAID Opus software creates an exceptionally powerful storage solution that delivers remarkable performance with minimal resource consumption. This approach allows organizations to dedicate more computing power to working with their data rather than simply accessing it.

This solution is particularly valuable for high-performance computing workloads, real-time analytics, AI/ML data pipelines, media production, and enterprise database applications where storage throughput and IOPS can directly impact business outcomes.

About xiRAID Opus

xiRAID Opus is a next-generation storage solution that integrates local and network-attached drives into a unified, high-performance system. It goes beyond traditional software RAID by offering disaggregated storage capabilities, virtualization integration, and native NVMe-over-Fabrics (NVMe-oF) support—both at the NVMe device level and volume level—via an integrated volume manager.

The system operates through a Linux user-space data path engine that bypasses the traditional kernel I/O stack, thereby eliminating operating system dependencies and reducing latency.

This architecture enables a fault-tolerant storage infrastructure using both local and remote media, delivers optimized performance in virtualized environments via VHOST, and provides network-accessible storage with comprehensive Quality of Service (QoS) controls. By disaggregating storage from compute resources, xiRAID Opus enhances resource utilization and scalability for enterprise, high-performance computing (HPC), and AI/ML workloads.

About SanDisk DC SN861

Engineered for the future of mission-critical workloads, the SanDisk DC SN861 SSD is a cutting-edge PCIe Gen5 NVMe SSD that delivers exceptional performance tailored for enterprise applications. With capacity options of up to 7.68TB, the drive is optimized for compute-intensive AI and machine learning environments by offering high-speed random read capabilities and extremely low latency, all while maintaining minimal power consumption to maximize IOPS per watt. The DC SN861 SSD is also enriched with a robust set of enterprise features such as Flexible Data Placement (FDP), support for OCP 2.0, and integrated safeguards like Power Loss Protection and End-to-End Data Path Protection. This comprehensive feature set makes the DC SN861 ideally suited for hyperscale, cloud, and enterprise data centers that demand both high performance and operational efficiency.

Test Objectives

In modern computing environments, resource efficiency is a critical consideration. Every CPU cycle dedicated to storage management is a cycle unavailable for processing the actual data workloads. Organizations invest in high-performance storage to accelerate their applications, not to consume their computing resources with storage overhead. xiRAID Opus engine, with its user-space architecture that bypasses traditional OS storage stacks, represents a significant advancement in addressing this challenge.

This solution brief evaluates how effectively xiRAID Opus engine can maximize the performance potential of SanDisk DC SN861 PCIe Gen5 drives in a RAID6 configuration, while minimizing system resource utilization.

Our primary objective is to demonstrate the maximum performance achievable from highly protected (RAID 6) storage built on xiRAID Opus and SanDisk drives, while minimizing resource consumption.

Test Environment

System configuration

Platform: Supermicro AS -2125HS-TNR
CPU: 2 * AMD EPYC 9454 48-Core Processor
RAM: 24 * SAMSUNG M321R4GA3BB6-CQKET
Drives: 8 * SanDisk SDS6BA138PSP9X3
OS: Rocky Linux 9.5
RAID Engine: xiRAID Opus

RAID Configuration

Level: RAID 6
Drives used: 8
Chunk size: 128K
Stripe size: 768K
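
For reference, the 768K stripe size follows directly from the chunk size and the number of data drives in the array. Below is a minimal calculation sketch; the helper function is ours, purely for illustration, and not part of any xiRAID Opus tooling.

# Full-stripe size of a RAID 6 array: parity occupies two drives' worth of
# capacity per stripe, so only (drives - 2) chunks of each stripe hold data.
def full_stripe_kib(chunk_kib, drives, parity_drives=2):
    return chunk_kib * (drives - parity_drives)

print(full_stripe_kib(128, 8))  # 768 -> matches the 768K blocksize used in the sequential RAID tests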

Performance testing and results

Testing Methodology

To evaluate the performance capabilities of the SanDisk DC SN861 PCIe Gen5 drives with xiRAID Opus, we conducted comprehensive benchmark testing using fio. Complete fio configurations are provided in the Appendix.

In our testing, the term Jobs refers to both the number of fio testing threads and the number of CPU cores dedicated to the xiRAID Opus engine. We deliberately synchronized these values to accurately measure resource efficiency. For example, when testing with numjobs=4, xiRAID Opus was bound to those same 4 CPU cores with no access to additional system resources. This one-to-one mapping allows us to precisely determine the minimum resources required to achieve optimal performance.

When considering the results presented below, it is important to note that xiRAID Opus runs on the same CPU threads as the FIO plugin. Therefore, the actual resource consumption of xiRAID Opus is significantly lower than the reported values.

For sustained performance testing, we preconditioned the drives with multiple full overwrites by setting fio loops=4; for the single-drive tests, preconditioning was applied to each drive's block device.

The preconditioning process involved overwriting the entire device using block sizes matched to the planned workload:

  • Random I/O testing: 4K blocks
  • Sequential RAW testing: chunk-size (128K) blocks

With preconditioning eliminating performance variables, sustained performance tests were run for 120 seconds.

The number of Jobs presented in the tables below is the optimal one, meaning it provides the best scaling.

It's worth emphasizing that achieving absolute, world-record performance is not always necessary. In many practical scenarios, particularly in production environments, what truly matters is attaining performance that is close to the theoretical maximum, but with efficient use of available resources. Striving for the last few percentage points of performance often comes at a disproportionate cost in terms of complexity, energy consumption, and hardware utilization.

Instead, optimizing for a balanced and sustainable approach—where computational resources are used judiciously and performance remains within an acceptable margin of peak theoretical limits—tends to offer better long-term value. This is especially relevant in constrained environments or large-scale systems, where maximizing throughput per watt, per dollar, or per unit of hardware is far more impactful than achieving record-breaking benchmarks that may not translate into practical benefits. In essence, the goal should often be smart efficiency, not brute-force speed.

RAW Drives Performance Testing

Performance tests of raw drives were conducted in user space.

Single drive

Load             | Workload pattern  | Result     | Avg. latency (usec) | SanDisk DC SN861 specification baseline
Sequential Write | job=1; iodepth=32 | 7.2 GB/s   | 586.07              | 7.2 GB/s
Sequential Read  | job=1; iodepth=32 | 13.6 GB/s  | 307.80              | 13.7 GB/s
Random Write     | job=1; iodepth=32 | 362k IOPS  | 31.09               | 330k IOPS
Random Read      | job=1; iodepth=64 | 3002k IOPS | 148.78              | 3300k IOPS

8 drives

Load             | Workload pattern                 | Result    | Avg. latency (usec)
Sequential Write | job=8 (1 per drive); iodepth=32  | 57.5 GB/s | 582.76
Sequential Read  | job=8 (1 per drive); iodepth=32  | 110 GB/s  | 305.65
Random Write     | job=32 (2 per drive); iodepth=32 | 2.8M IOPS | 136.09
Random Read      | job=56 (7 per drive); iodepth=64 | 24M IOPS  | 149.34

As the raw drive measurements show, the results are close to the values stated in the drive specification, so we can conclude that the system is configured correctly and has no bottlenecks.

xiRAID Opus Performance Tests Results

Sequential tests

All sequential tests were run with a queue depth of 32. The number of jobs/CPU threads was varied between tests.

Theoretical maximum RAID6 sequential write/read performance:

  • Sequential Read: 110 GB/s (100% of raw performance)
  • Sequential Write: 43 GB/s, calculated by applying the RAID6 penalty for 2 parity drives out of 8 (75% of raw performance: 57.5 GB/s × 75% = 43.125 GB/s); a short calculation sketch follows below.
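
The same arithmetic, written out as a small sketch (the values are taken from the 8-drive raw results above; this only restates the calculation and is not output from any tool):

# RAID 6 sequential ceilings for an 8-drive array with 2 parity drives.
raw_read_gbs = 110.0            # measured raw sequential read, GB/s
raw_write_gbs = 57.5            # measured raw sequential write, GB/s
data_fraction = (8 - 2) / 8     # 6 of the 8 drives carry data in each full stripe

print(raw_read_gbs)                   # read ceiling:  110 GB/s (reads carry no parity penalty)
print(raw_write_gbs * data_fraction)  # write ceiling: 43.125 GB/s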

Load             | Jobs/CPU threads | Result    | Avg. latency (usec)
Sequential Write | 1                | 15.9 GB/s | 1434.85
Sequential Write | 2                | 21.9 GB/s | 2114.36
Sequential Write | 4                | 39.0 GB/s | 2553.06
Sequential Write | 8                | 40.5 GB/s | 4944.03
Sequential Read  | 1                | 99.1 GB/s | 251.45
Sequential Read  | 2                | 110 GB/s  | 913.41
Sequential Read  | 4                | 110 GB/s  | 1238.06
Sequential Read  | 8                | 110 GB/s  | 1654.11

Here is a graph comparing sequential write and read performance with different numbers of CPU threads:

Key observations

As shown in the sequential test results, 90%* of the sequential read performance on RAID (99.1 GB/s), relative to raw drives, is achieved using just a single CPU thread.

*Proportion of theoretical maximum RAID6 sequential read performance:
(99.1 GB/s ÷ 110 GB/s) × 100% ≈ 90.09%

In contrast, 90%* of the sequential write performance on RAID, relative to raw drives, is achieved with 4 CPU threads.

*Proportion of theoretical maximum RAID6 sequential write performance:
(39 GB/s ÷ 43 GB/s) × 100% ≈ 90.7%

Moreover, increasing the number of threads beyond this point does not significantly improve sequential read or write performance, but it does increase operation latency.

Random tests

In addition to varying the number of jobs for random tests, we also varied the queue depth to demonstrate the correlation between maximum performance values and operation execution time.

Theoretical maximum RAID6 random write/read performance:

  • Random Read: 24M IOPS (100% of raw performance)
  • Random Write: 955k IOPS (33%* of raw performance due to read-modify-writes: 2896k IOPS × 33% ≈ 955k IOPS).

*In RAID 6, each small random write typically involves 3 write operations (1 for data and 2 for parity) and 3 read operations (to fetch old data and parity). Since read operations are generally faster and less performance-limiting than writes, the impact on write performance is mostly attributed to the 3 writes. As a result, the effective write performance can be approximated as one-third of raw performance, or about 33% (100% ÷ 3).
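
Restated as a quick calculation (the raw random-write figure is taken from the 8-drive results above, and the rounded 33% factor is the one used in this brief):

# RAID 6 small-write penalty: each host write turns into roughly three drive
# writes (new data plus two parity updates), so usable random-write IOPS is
# about one third of the raw random-write IOPS of the array.
raw_random_write_kiops = 2896           # ~2.8M IOPS measured on the 8 raw drives
print(raw_random_write_kiops * 0.33)    # ~955k IOPS, the theoretical ceiling used below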

Random writes results

Random Write, iodepth=32

Load         | Jobs/CPU threads | Result (IOPS) | Avg. latency (usec)
Random Write | 1                | 251k          | 124.75
Random Write | 2                | 378k          | 167.23
Random Write | 4                | 587k          | 216.25
Random Write | 8                | 659k          | 387.06
Random Write | 16               | 693k          | 736.69
Random Write | 32               | 706k          | 1447.78

Random Write, iodepth=64

Load         | Jobs/CPU threads | Result (IOPS) | Avg. latency (usec)
Random Write | 1                | 302k          | 183.21
Random Write | 2                | 533k          | 236.70
Random Write | 4                | 712k          | 365.37
Random Write | 8                | 700k          | 730.13
Random Write | 16               | 721k          | 1306.96
Random Write | 32               | 716k          | 2852.31

Here's the graph comparing Random Write performance across different CPU threads for two I/O depths (32 and 64):

Key Observations:

Optimal random write performance is achieved with 4 CPU threads and ranges from 61%* of the theoretical RAID6 maximum at a queue depth of 32 to 74%* at a queue depth of 64, while maintaining extremely low latency.

At the same time, with more than 8 CPU threads (16 and 32), write throughput does not change significantly, but operation latency grows considerably.

*Proportion of theoretical maximum RAID6 random write performance:
(587k IOPS ÷ 955k IOPS) × 100% ≈ 61.14%
(712k IOPS ÷ 955k IOPS) × 100% ≈ 74.26%

Random read results

Random Read, iodepth=64

Load        | Jobs/CPU threads | Result (IOPS) | Avg. latency (usec)
Random Read | 1                | 1090k         | 57.40
Random Read | 2                | 2153k         | 58.17
Random Read | 4                | 4156k         | 60.30
Random Read | 8                | 7444k         | 66.99
Random Read | 16               | 13000k        | 75.32
Random Read | 24               | 17900k        | 83.83

Random Read, iodepth=128

Load        | Jobs/CPU threads | Result (IOPS) | Avg. latency (usec)
Random Read | 1                | 1298k         | 96.33
Random Read | 2                | 2602k         | 96.10
Random Read | 4                | 5064k         | 98.73
Random Read | 8                | 8932k         | 112.03
Random Read | 16               | 15000k        | 128.47
Random Read | 24               | 22000k        | 136.61

Here's the graph comparing Random Read performance across different CPU threads for two I/O depths (64 and 128):

Key Observations:

  • Throughput increases almost linearly with the number of threads for both iodepth=64 and iodepth=128.
  • Latency remains low for iodepth=64.
  • More than 1090k IOPS is achieved with only 1 core, while roughly 90% of the theoretical maximum performance is achieved with 24 cores, i.e., one quarter of the cores available in the system.

Conclusions

In summary, our measurements on the SanDisk DC SN861 show that the xiRAID Opus engine delivers performance close to the theoretical maximum the drives can achieve, with minimal resource consumption, even when running in the same CPU context as the fio measurement process.

xiRAID Opus is engineered to minimize CPU and memory usage while maximizing I/O efficiency, ensuring that you get every last drop of compute and I/O potential out of your hardware.

Running in Linux User Space, xiRAID Opus maintains a low resource footprint, enabling seamless adaptability to future hardware innovations without adding significant overhead. This balance of high performance and low resource consumption makes xiRAID Opus an optimal choice for demanding storage environments.

The SanDisk DC SN861, known for its enterprise-grade reliability, high endurance, and consistent low latency, demonstrates stable performance that does not degrade under high workload conditions. This makes it an ideal device for xiRAID Opus in demanding storage environments.

Appendix
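
Note: in the job files below, values shown in square brackets (for example, blocksize=[4k/128k] or numjobs=[1,2,4,8...]) denote the set of values substituted across individual test runs; each run used a single concrete value.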

RAW

Preconditioning:

[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk_raw.json
thread=1
numjobs=1
group_reporting=1
direct=1
iodepth=256
rw=write
blocksize=[4k/128k]
loops=4

[job1]
filename=nvme0n1
[job2]
...

Sequential tests:

[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk_raw.json
thread=1
numjobs=[1,2,4,8...]
group_reporting=1
direct=1
iodepth=32
rw=[write/read]
blocksize=128k
time_based=1
runtime=120
exitall=1

[job1]
filename=nvme0n1
[job2]
...

Random test:

[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk_raw.json
thread=1
numjobs=[1,2,4,8...]
group_reporting=1
direct=1
iodepth=[32/64]
rw=rand[write/read]
blocksize=4k
time_based=1
runtime=120
randrepeat=0
norandommap=1
random_generator=tausworthe64

[job1]
filename=nvme0n1
[job2]
...

RAID

Sequential tests:

[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk.json
thread=1
numjobs=[1,2,4,8...]
group_reporting=1
direct=1
verify=0
iodepth=32
rw=[write/read]
blocksize=768K
time_based=1
runtime=120s

[job1]
filename=xnraid

Random test:

[global]
ioengine=./spdk_bdev
spdk_json_conf=./spdk.json
thread=1
numjobs=[1,2,4,8...]
group_reporting=1
direct=1
verify=0
iodepth=[32/64]
rw=rand[write/read]
blocksize=4K
time_based=1
runtime=120s
randrepeat=0
norandommap=1
random_generator=tausworthe64
cpus_allowed=0-23
cpus_allowed_policy=split

[job1]
filename=xnraid
exitall=1