SPDK (the Storage Performance Development Kit) is a framework of drivers and utilities for building high-performance storage systems in operating system user space. It is used by cloud providers, SDS developers, and DPU vendors as part of their SDKs.
SPDK includes a block device layer on top of which storage services can be developed and integrated.
The SPDK block device layer, often simply called bdev, is a C library intended to be equivalent to the operating system block storage layer that often sits immediately above the device drivers in a traditional kernel storage stack. Specifically, this library provides the following functionality:
- A pluggable module API for implementing block devices that interface with different types of block storage devices.
- Driver modules for NVMe, malloc (ramdisk), Linux AIO, virtio-scsi, Ceph RBD, Pmem and Vhost-SCSI Initiator and more.
- An application API for enumerating and claiming SPDK block devices and then performing operations (read, write, unmap, etc.) on those devices.
- Facilities to stack block devices to create complex I/O pipelines, including logical volume management (lvol) and partition support (GPT).
- Configuration of block devices via JSON-RPC (see the example after this list).
- Request queueing, timeout, and reset handling.
- Multiple, lockless queues for sending I/O to block devices.
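As a small illustration of the JSON-RPC configuration path mentioned above, the following hypothetical session (assuming a running spdk_tgt and the bundled rpc.py script) creates a 64 MiB RAM-backed bdev with a 512-byte block size and then lists the registered bdevs:
# create a 64 MiB malloc (RAM-backed) bdev named Malloc0 with 512-byte blocks
./scripts/rpc.py bdev_malloc_create -b Malloc0 64 512
# list all registered block devices
./scripts/rpc.py bdev_get_bdevs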
For a long time, SPDK lacked an important piece of functionality: an out-of-the-box fault tolerance module. But a year ago Intel announced the implementation of a RAID feature, then called Write Shaping RAID (WSR), which is now available as RAID5f.
A limitation of this solution is that it does not support read-modify-write operations; only full-stripe writes are supported. In this article we will test raid5f, explain its features, and compare it with the Xinnor RAID implementation for SPDK.
First let us set our key expectations from RAID:
- Performance. We expect array performance to be higher than a single drive and ideally be close to the sum of the drives’ total performance, reduced by the RAID penalty factor.
- Fault tolerance. RAID must withstand the number of failures it is designed to tolerate.
- Rebuild. When replacing drives or using a spare, the array must restore its structure.
- Availability. The array must be able to continue operation after a system reboot and component failures and must keep data consistency during unexpected power outages.
The second most important set of expectations is the ability to (re)configure and tune the array:
- Change the number of drives and array level;
- Adjust various settings such as stripe sizes, internal operation priorities, caching, merging, resource limits, direction of checksum allocation, metadata handling, optimal drive selection for reads;
- Spare pool configuration;
- Notification settings.
Testing SPDK
We will look at the available SPDK RAID implementations specifically from the perspective of these expectations.
For this testing we will use an AMD EPYC 7702P server with 16 Western Digital SN840 drives installed. The server is running Ubuntu 22.10 with the 5.19.0-26-generic kernel.
First, we will make sure that iommu is enabled in passthrough mode and download SPDK:
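A minimal sketch of these steps (AMD platform and GRUB boot loader assumed; the kernel parameters below are the usual way to enable passthrough):
# check that the kernel was booted with the IOMMU in passthrough mode
grep -o "amd_iommu=on iommu=pt" /proc/cmdline
# if the parameters are missing, add them to GRUB_CMDLINE_LINUX in /etc/default/grub, run update-grub and reboot
git clone https://github.com/spdk/spdk
cd spdk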
In the process of familiarization, we will perform load testing. This can be done using FIO with the SPDK plugin and the perf utility available at https://github.com/spdk/
We will use the first one in further tests.
Let us download and compile fio:
git clone https://github.com/axboe/fio
cd fio
./configure
make
Now we will build SPDK. First, we need to run two commands to initialize the submodules and install the build dependencies:
git submodule update --init
sudo scripts/pkgdep.sh
Do the configuration before we build:
./configure
make -j
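If you plan to use fio with the SPDK bdev plugin, SPDK also needs to be pointed at the fio sources at configure time (the path below is a placeholder):
./configure --with-fio=/path/to/fio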
We can now run the SPDK application and set up the array:
./spdk/build/bin/spdk_tgt -m 0xf
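Note that before starting spdk_tgt, hugepages must be allocated and the NVMe drives unbound from the kernel driver; SPDK ships a helper script for this. The -m 0xf mask pins the target to CPU cores 0-3.
# allocate hugepages and rebind the NVMe devices to a userspace-friendly driver
sudo scripts/setup.sh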
We can create the array via the command line, or we can use a configuration file.
To work with raid5f we have the following methods:
bdev_raid_get_bdevs    This is used to list all the raid bdev details based on the input category requested. Category should be one of 'all', 'online', 'configuring' or 'offline'. 'all' means all the raid bdevs whether they are online, configuring or offline. 'online' is the raid bdev which is registered with the bdev layer. 'configuring' is the raid bdev which does not have full configuration discovered yet. 'offline' is the raid bdev which is not registered with the bdev layer as of now, either because it has encountered an error or because the user has requested to offline the raid bdev.
bdev_raid_create       Create new raid bdev
bdev_raid_delete       Delete existing raid bdev
It is important to consider the features of raid5f. It does not support read-modify-write operations and therefore must in most cases be used together with FTL. We will perform tests on a raw device.
Therefore, we will use a few patterns:
- Random read and write accesses with a small (4k) block, which on a conventional RAID 5 would require read-modify-write;
- Sequential accesses with a block size equal to the full stripe.
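With the 64k strip size used below, a full stripe carries (N − 1) × 64k of data: 7 × 64k = 448k for 8 drives and 15 × 64k = 960k for 16 drives, which is where the 448k/960k block sizes in the sequential tests come from.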
The first step is to connect the drives (sixteen controllers, nvme0 through nvme15):
rpc.py bdev_nvme_attach_controller -b nvme0 -t PCIe -a 0000:02:00.0
rpc.py bdev_nvme_attach_controller -b nvme1 -t PCIe -a 0000:45:00.0
rpc.py bdev_nvme_attach_controller -b nvme2 -t PCIe -a 0000:03:00.0
rpc.py bdev_nvme_attach_controller -b nvme3 -t PCIe -a 0000:81:00.0
rpc.py bdev_nvme_attach_controller -b nvme4 -t PCIe -a 0000:84:00.0
rpc.py bdev_nvme_attach_controller -b nvme5 -t PCIe -a 0000:41:00.0
rpc.py bdev_nvme_attach_controller -b nvme6 -t PCIe -a 0000:46:00.0
rpc.py bdev_nvme_attach_controller -b nvme7 -t PCIe -a 0000:44:00.0
rpc.py bdev_nvme_attach_controller -b nvme8 -t PCIe -a 0000:43:00.0
rpc.py bdev_nvme_attach_controller -b nvme9 -t PCIe -a 0000:82:00.0
rpc.py bdev_nvme_attach_controller -b nvme10 -t PCIe -a 0000:48:00.0
rpc.py bdev_nvme_attach_controller -b nvme11 -t PCIe -a 0000:47:00.0
rpc.py bdev_nvme_attach_controller -b nvme12 -t PCIe -a 0000:83:00.0
rpc.py bdev_nvme_attach_controller -b nvme13 -t PCIe -a 0000:42:00.0
rpc.py bdev_nvme_attach_controller -b nvme14 -t PCIe -a 0000:01:00.0
rpc.py bdev_nvme_attach_controller -b nvme15 -t PCIe -a 0000:04:00.0
After that we can create an array. When creating it, you can specify the array name, RAID level, strip size and the list of base devices.
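A sketch of the creation call through rpc.py, using the name, level and strip size from this article (flag names follow the standard bdev_raid_create options):
rpc.py bdev_raid_create -n raid5 -r raid5f -z 64 -b "nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 nvme6n1 nvme7n1 nvme8n1 nvme9n1 nvme10n1 nvme11n1 nvme12n1 nvme13n1 nvme14n1 nvme15n1"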
You can check the created array:
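For example, with the 'all' category, which returns every raid bdev regardless of its state:
rpc.py bdev_raid_get_bdevs all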
[
  {
    "name": "raid5",
    "strip_size_kb": 64,
    "state": "online",
    "raid_level": "raid5f",
    "num_base_bdevs": 16,
    "num_base_bdevs_discovered": 16,
    "base_bdevs_list": [
      "nvme0n1",
      "nvme1n1",
      "nvme2n1",
      "nvme3n1",
      "nvme4n1",
      "nvme5n1",
      "nvme6n1",
      "nvme7n1",
      "nvme8n1",
      "nvme9n1",
      "nvme10n1",
      "nvme11n1",
      "nvme12n1",
      "nvme13n1",
      "nvme14n1",
      "nvme15n1"
    ]
  }
]
The array can now be used with FTL, exported as a target over NVMf or iSCSI or tested with FIO or Perf.
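As an illustration, exporting it over NVMe-oF/TCP with rpc.py could look roughly like this (the NQN, serial number, IP address and port are placeholders):
# create the TCP transport and a subsystem that accepts any host
rpc.py nvmf_create_transport -t TCP
rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001
# add the raid bdev as a namespace and open a listener
rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 raid5
rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 192.168.1.10 -s 4420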
SPDK performance
First, we will test the raw drives to have a baseline for comparison.
The following workloads were chosen for the tests:
Random read IOPS with an 8k block size, and sequential read and write throughput.
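All runs use the fio bdev plugin built earlier; an invocation looks roughly like this (the plugin path and job file name are assumptions, and the actual job files are listed at the end of the article):
LD_PRELOAD=./spdk/build/fio/spdk_bdev fio raid_test.fio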
Random reads

SPDK, no RAID, raw drives

| | 8 drives, IOps | 16 drives, IOps | 8 drives, avg lat, us | 16 drives, avg lat, us | 8 drives, 99.9% lat, us | 16 drives, 99.9% lat, us |
|---|---|---|---|---|---|---|
| 8k Rand Read, 1 Job QD1 | 174k | 352k | 45.01 | 44.82 | 140 | 139 |
| 8k Rand Read, 1 Job QD32 | 2554k | 5177k | 99.45 | 98.16 | 799 | 832 |
| 8k Rand Read, 2 Jobs QD32 | 2945k | 5682k | 173.23 | 179.57 | 906 | 930 |
| 8k Rand Read, 3 Jobs QD32 | 2908k | 5649k | 263.50 | 271.29 | 1012 | 1045 |
| 8k Rand Read, 4 Jobs QD32 | 2905k | 5645k | 351.89 | 362.17 | 1106 | 1139 |
Sequential reads

SPDK, no RAID, raw drives

| | 8 drives, GBps | 16 drives, GBps |
|---|---|---|
| 64k Seq Read, 1 Job QD16 | 26.9 | 54.0 |
| 64k Seq Read, 2 Jobs QD16 | 28.6 | 57.3 |
| 64k Seq Read, 3 Jobs QD16 | 28.6 | 57.3 |
| 64k Seq Read, 4 Jobs QD16 | 28.6 | 57.3 |
After testing the raw devices, let us look at SPDK performance and behavior with raid5f.
As above, we will run the tests on arrays of 8 and 16 drives:
8k random reads, full-stripe reads and writes, and random and sequential reads in degraded mode with one failed drive.
Random reads

SPDK, raid5f

| | 8 drives, IOps | 16 drives, IOps | 8 drives, avg lat, us | 16 drives, avg lat, us | 8 drives, 99.9% lat, us | 16 drives, 99.9% lat, us |
|---|---|---|---|---|---|---|
| 8k Rand Read, 1 Job QD1 | 14.3k | 15.9k | 69.43 | 62.46 | 139 | 139 |
| 8k Rand Read, 16 Jobs QD32 | 2821k | 5618k | 180.92 | 90.10 | 1352 | 359 |
| 8k Rand Read, 32 Jobs QD32 | 2824k | 5319k | 362.03 | 191.89 | 2900 | 2573 |
| 8k Rand Read, 64 Jobs QD32 | 2833k | 5333k | 722.44 | 383.46 | 5866 | 5800 |
Full stripe reads

SPDK, raid5f

| | 8 drives, GBps | 16 drives, GBps |
|---|---|---|
| 448k/960k Seq Read, 1 Job QD16 | 22.7 | 47.3 |
| 448k/960k Seq Read, 2 Jobs QD16 | 28.6 | 57.0 |
| 448k/960k Seq Read, 3 Jobs QD16 | 28.6 | 57.3 |
| 448k/960k Seq Read, 4 Jobs QD16 | 28.6 | 57.3 |
Full stripe writes

SPDK, raid5f

| | 8 drives, GBps | 16 drives, GBps |
|---|---|---|
| 448k/960k Seq Write, 1 Job QD16 | 12.8 | 2.8 |
| 448k/960k Seq Write, 2 Jobs QD16 | 14.5 | 5.4 |
| 448k/960k Seq Write, 3 Jobs QD16 | 19.1 | 8.4 |
| 448k/960k Seq Write, 4 Jobs QD16 | 20.8 | 11.0 |
Degraded operation
When attempting to disconnect one drive with the following RPC call
{
  "method": "bdev_nvme_detach_controller",
  "params": {
    "name": "nvme2"
  }
}
we lose the ability to perform operations on the array.
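The same detach can be issued directly from the command line (controller name as in the attach step above):
rpc.py bdev_nvme_detach_controller nvme2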
Conclusion
Due to its simplicity, raid5f delivers a very good level of performance, close to raw device performance. But today degraded reads and array rebuild do not work, which reduces raid5f's fault tolerance to that of RAID 0. Random writes, and writes with a block size not equal to the full stripe, are not supported either.
To work normally with raid5f you need to use the FTL module (https://spdk.io/doc/ftl.html), which uses a high-performance DIX-enabled NVMe drive as a caching layer. This will change the overall performance of the solution, as the FTL cache drive becomes the bottleneck.
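As a rough sketch (assuming the -b/-d/-c options of bdev_ftl_create stand for the FTL bdev name, the base bdev and the cache bdev; the exact flags may differ between SPDK versions), layering FTL on top of the array would look like:
# nvmecache0n1 is a hypothetical name for the DIX-formatted cache drive
rpc.py bdev_ftl_create -b ftl0 -d raid5 -c nvmecache0n1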
We formatted one of the drives to enable DIX using the nvme format command, but got an error when connecting the cache to raid5f.
Xinnor SPDK RAID
We see that today raid5f capabilities are very limited to say the least. That's why we ported the technologies used in our xiRAID product to SPDK.
Who might need this product:
- Customers that intend to use SPDK in their infrastructure and need a cost-effective approach to protecting data. The main driver is the proliferation of EBOFs, which provide super-fast shared storage at a modest acquisition cost.
- Storage appliance manufacturers and SDS developers who have adopted SPDK as their framework.
In developing the solution, we adhered to the following principles:
- Functionality consistent with xiRAID.
- Performance level reaching 90-95% of theoretical maximum.
- Usability with an intelligent and user-friendly interface.
- Universal CLI to uniformly manage xiRAID and SPDK RAID instances.
Moreover, in the case of SPDK we can also manage the NVMf subsystem through our CLI, making it convenient to use in disaggregated environments. Xinnor SPDK RAID also has a much wider range of RAID levels because disaggregated environments require failure protection for the whole EBOF/JBOF.
Here is an example of creating an array:
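A sketch based on the bdev_rdx_raid_create method that appears in the Xinnor config file at the end of this article, shown here as a bare JSON-RPC request for the 8-drive case (the CLI wrapper itself is not shown):
{
  "method": "bdev_rdx_raid_create",
  "params": {
    "name": "raid5",
    "strip_size_kb": 64,
    "raid_level": 5,
    "base_bdevs": [ "nvme0n1", "nvme1n1", "nvme2n1", "nvme3n1", "nvme4n1", "nvme5n1", "nvme6n1", "nvme7n1" ]
  }
}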
As with xiRAID, we can modify the RAID operating parameters, and the configuration can be imported when restarting the system or when transferring the array to another node.
Xinnor SPDK RAID testing
We benchmarked our RAID in the same way as in the raid5f tests, adding 8k random writes and a mixed workload.
Random reads

8 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Read, 1 Job QD1 | 14.3k | 15.9k | 69.43 | 62.46 | 139 | 139 |
| 8k Rand Read, 16 Jobs QD32 | 2821k | 2756k | 180.92 | 184.93 | 1352 | 873 |
| 8k Rand Read, 32 Jobs QD32 | 2824k | 2784k | 362.03 | 367.15 | 2900 | 2606 |
| 8k Rand Read, 64 Jobs QD32 | 2833k | 2784k | 722.44 | 493.73 | 5866 | 3687 |
Full stripe reads

8 drives, Xinnor SPDK RAID5 and RAID5f

| | RAID5f, GBps | Xinnor SPDK RAID5, GBps |
|---|---|---|
| 448k Seq Read, 1 Job QD16 | 22.7 | 23.7 |
| 448k Seq Read, 2 Jobs QD16 | 28.6 | 28.6 |
| 448k Seq Read, 3 Jobs QD16 | 28.6 | 28.6 |
| 448k Seq Read, 4 Jobs QD16 | 28.6 | 28.6 |
Random writes

8 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Write, 16 Jobs QD32 | - | 488k | - | 1049.47 | - | 5538 |
| 8k Rand Write, 32 Jobs QD32 | - | 487k | - | 2102.87 | - | 10814 |
| 8k Rand Write, 64 Jobs QD32 | - | 460k | - | 3058.34 | - | 14222 |
Full stripe writes

8 drives, Xinnor SPDK RAID5 and RAID5f

| | RAID5f, GBps | Xinnor SPDK RAID5, GBps |
|---|---|---|
| 448k Seq Write, 1 Job QD16 | 14.2 | 15.1 |
| 448k Seq Write, 2 Jobs QD16 | 19.3 | 21.9 |
| 448k Seq Write, 3 Jobs QD16 | 19.3 | 21.0 |
| 448k Seq Write, 4 Jobs QD16 | 21.5 | 22.9 |
Mixed 70% reads, 30% writes

8 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Read/Write, 16 Jobs QD32 | - | 805k/345k | - | 330.88/708.97 | - | 1811/3621 |
| 8k Rand Read/Write, 32 Jobs QD32 | - | 820k/351k | - | 580.14/1557.69 | - | 3818/7898 |
| 8k Rand Read/Write, 64 Jobs QD32 | - | 787k/337k | - | 751.32/2230.39 | - | 5407/11076 |
Random reads

16 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Read, 16 Jobs QD32 | 5618k | 4228k | 90.10 | 119.56 | 359 | 490 |
| 8k Rand Read, 32 Jobs QD32 | 5319k | 5095k | 191.89 | 200.15 | 2573 | 2180 |
| 8k Rand Read, 64 Jobs QD32 | 5333k | 5099k | 383.46 | 243.99 | 5800 | 2966 |
Full stripe reads

16 drives, Xinnor SPDK RAID5 and RAID5f

| | RAID5f, GBps | Xinnor SPDK RAID5, GBps |
|---|---|---|
| 960k Seq Read, 1 Job QD16 | 47.3 | 45.9 |
| 960k Seq Read, 2 Jobs QD16 | 57.0 | 56.5 |
| 960k Seq Read, 3 Jobs QD16 | 57.3 | 57.2 |
| 960k Seq Read, 4 Jobs QD16 | 57.3 | 57.3 |
Random writes

16 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Write, 16 Jobs QD32 | - | 1031k | - | 495.88 | - | 3359 |
| 8k Rand Write, 32 Jobs QD32 | - | 860k | - | 1189.50 | - | 13435 |
| 8k Rand Write, 64 Jobs QD32 | - | 779k | - | 1600.79 | - | 19792 |
Full stripe writes

16 drives, Xinnor SPDK RAID5 and RAID5f

| | RAID5f, GBps | Xinnor SPDK RAID5, GBps |
|---|---|---|
| 960k Seq Write, 1 Job QD16 | 2.7 | 22.9 |
| 960k Seq Write, 2 Jobs QD16 | 5.4 | 29.2 |
| 960k Seq Write, 3 Jobs QD16 | 8.3 | 33.0 |
| 960k Seq Write, 4 Jobs QD16 | 11.0 | 37.8 |
Mixed 70% reads, 30% writes

16 drives, Xinnor SPDK RAID5 and RAID5f

| | IOps, RAID5f | IOps, Xinnor SPDK RAID5 | avg lat, RAID5f, us | avg lat, Xinnor SPDK RAID5, us | 99.9% lat, RAID5f, us | 99.9% lat, Xinnor SPDK RAID5, us |
|---|---|---|---|---|---|---|
| 8k Rand Read/Write, 16 Jobs QD32 | - | 1493k/640k | - | 202.75/322.94 | - | 1123/2089 |
| 8k Rand Read/Write, 32 Jobs QD32 | - | 1731k/742k | - | 300.64/676.08 | - | 2835/5932 |
| 8k Rand Read/Write, 64 Jobs QD32 | - | 1625k/696k | - | 356.67/911.83 | - | 4490/9896 |
Xinnor SPDK RAID and SPDK RAID5f feature comparison

| | RAID5f | Xinnor SPDK RAID5 |
|---|---|---|
| Random read performance | ~ full drive performance | ~ full drive performance |
| Random write performance | Needs FTL | ~ theoretical maximum |
| Sequential read performance | ~ full drive performance | ~ full drive performance |
| Sequential write performance | ~ full drive performance (needs IO size = stripe size, or FTL) | ~ full drive performance |
| Mixed workload performance | Needs FTL | ~ theoretical maximum |
| Degraded mode performance | Non-functional | ~ theoretical maximum |
| Array rebuild and recovery | Non-functional | High performance with prioritization |
| Initialization priority | Not needed | Yes |
| Supported RAID levels | 5f | 0, 1, 10, 5, 6, 7.3, 50, 60, 70 |
| Changing settings | None | Yes |
| Array recovery and migration | None | Yes |
| Restriping | None | In development |
| Management | rpc.py | Unified CLI |
{ "subsystems": [ { "subsystem": "bdev", "config": [ { "method": "bdev_set_options", "params": { "bdev_io_pool_size": 65535, "bdev_io_cache_size": 256, "bdev_auto_examine": true } }, { "method": "bdev_nvme_set_options", "params": { "action_on_timeout": "none", "timeout_us": 0, "timeout_admin_us": 0, "keep_alive_timeout_ms": 10000, "transport_retry_count": 4, "arbitration_burst": 0, "low_priority_weight": 0, "medium_priority_weight": 0, "high_priority_weight": 0, "nvme_adminq_poll_period_us": 10000, "nvme_ioq_poll_period_us": 0, "io_queue_requests": 512, "delay_cmd_submit": true, "bdev_retry_count": 3, "transport_ack_timeout": 0, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0, "generate_uuids": false } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme0", "trtype": "PCIe", "traddr": "0000:02:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme1", "trtype": "PCIe", "traddr": "0000:45:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme2", "trtype": "PCIe", "traddr": "0000:03:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme3", "trtype": "PCIe", "traddr": "0000:81:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme4", "trtype": "PCIe", "traddr": "0000:84:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme5", "trtype": "PCIe", "traddr": "0000:41:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme6", "trtype": "PCIe", "traddr": "0000:46:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme7", "trtype": "PCIe", "traddr": "0000:44:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme8", "trtype": "PCIe", "traddr": "0000:43:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme9", "trtype": "PCIe", "traddr": "0000:82:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme10", "trtype": "PCIe", "traddr": "0000:48:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": 
"bdev_nvme_attach_controller", "params": { "name": "nvme11", "trtype": "PCIe", "traddr": "0000:47:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme12", "trtype": "PCIe", "traddr": "0000:83:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme13", "trtype": "PCIe", "traddr": "0000:42:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme14", "trtype": "PCIe", "traddr": "0000:01:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme15", "trtype": "PCIe", "traddr": "0000:04:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_set_hotplug", "params": { "period_us": 100000, "enable": false } }, { "method": "bdev_raid_create", "params": { "name": "raid5", "strip_size_kb": 64, "raid_level": "raid5f", "base_bdevs": [ "nvme0n1", "nvme1n1", "nvme2n1", "nvme3n1", "nvme4n1", "nvme5n1", "nvme6n1", "nvme7n1", "nvme8n1", "nvme9n1", "nvme10n1", "nvme11n1", "nvme12n1", "nvme13n1", "nvme14n1", "nvme15n1" ] } }, { "method": "bdev_wait_for_examine" } ] } ] } Xinnor SPDK RAID config file { "subsystems": [ { "subsystem": "scheduler", "config": [ { "method": "framework_set_scheduler", "params": { "name": "static" } } ] }, { "subsystem": "accel", "config": [] }, { "subsystem": "vmd", "config": [] }, { "subsystem": "sock", "config": [ { "method": "sock_impl_set_options", "params": { "impl_name": "posix", "recv_buf_size": 2097152, "send_buf_size": 2097152, "enable_recv_pipe": true, "enable_quickack": false, "enable_placement_id": 0, "enable_zerocopy_send_server": true, "enable_zerocopy_send_client": false, "zerocopy_threshold": 0, "tls_version": 0, "enable_ktls": false } }, { "method": "sock_impl_set_options", "params": { "impl_name": "ssl", "recv_buf_size": 2097152, "send_buf_size": 2097152, "enable_recv_pipe": true, "enable_quickack": false, "enable_placement_id": 0, "enable_zerocopy_send_server": true, "enable_zerocopy_send_client": false, "zerocopy_threshold": 0, "tls_version": 0, "enable_ktls": false } } ] }, { "subsystem": "bdev", "config": [ { "method": "bdev_set_options", "params": { "bdev_io_pool_size": 65535, "bdev_io_cache_size": 256, "bdev_auto_examine": true } }, { "method": "bdev_nvme_set_options", "params": { "action_on_timeout": "none", "timeout_us": 0, "timeout_admin_us": 0, "keep_alive_timeout_ms": 10000, "transport_retry_count": 4, "arbitration_burst": 0, "low_priority_weight": 0, "medium_priority_weight": 0, "high_priority_weight": 0, "nvme_adminq_poll_period_us": 10000, "nvme_ioq_poll_period_us": 0, "io_queue_requests": 512, "delay_cmd_submit": true, "bdev_retry_count": 3, "transport_ack_timeout": 0, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme0", "trtype": "PCIe", "traddr": "0000:02:00.0", "prchk_reftag": false, "prchk_guard": false, 
"ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme1", "trtype": "PCIe", "traddr": "0000:45:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme2", "trtype": "PCIe", "traddr": "0000:03:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme3", "trtype": "PCIe", "traddr": "0000:81:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme4", "trtype": "PCIe", "traddr": "0000:84:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme5", "trtype": "PCIe", "traddr": "0000:41:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme6", "trtype": "PCIe", "traddr": "0000:46:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme7", "trtype": "PCIe", "traddr": "0000:44:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme8", "trtype": "PCIe", "traddr": "0000:43:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme9", "trtype": "PCIe", "traddr": "0000:82:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme10", "trtype": "PCIe", "traddr": "0000:48:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme11", "trtype": "PCIe", "traddr": "0000:47:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme12", "trtype": "PCIe", "traddr": "0000:83:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme13", "trtype": "PCIe", "traddr": "0000:42:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme14", "trtype": "PCIe", "traddr": "0000:01:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, 
"reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_attach_controller", "params": { "name": "nvme15", "trtype": "PCIe", "traddr": "0000:04:00.0", "prchk_reftag": false, "prchk_guard": false, "ctrlr_loss_timeout_sec": 0, "reconnect_delay_sec": 0, "fast_io_fail_timeout_sec": 0 } }, { "method": "bdev_nvme_set_hotplug", "params": { "period_us": 100000, "enable": false } }, { "method": "bdev_rdx_raid_create", "params": { "name": "raid5", "strip_size_kb": 64, "raid_level": 5, "base_bdevs": [ "nvme0n1", "nvme1n1", "nvme2n1", "nvme3n1", "nvme4n1", "nvme5n1", "nvme6n1", "nvme7n1", "nvme8n1", "nvme9n1", "nvme10n1", "nvme11n1", "nvme12n1", "nvme13n1", "nvme14n1", "nvme15n1" ] } }, { "method": "bdev_rdx_raid_init", "params": { "name": "raid5", "command": "force" } }, { "method": "bdev_wait_for_examine" } ] } ] } Small IO fio config file [global] ioengine=spdk_bdev spdk_json_conf=/home/xinnor/spdk_config_raid5_xi_8.json thread=1 group_reporting=1 direct=1 verify=0 time_based=1 ramp_time=0 runtime=900 iodepth=32 rw=randrw rwmixread=[0, 70, 100] bs=8k norandommap random_generator=tausworthe64 numjobs=[1, 16, 32, 64] [test0] filename=raid5 Full stripe IO fio config file [global] ioengine=spdk_bdev spdk_json_conf=/home/xinnor/spdk_config_raid5_8.json thread=1 group_reporting=1 direct=1 verify=0 time_based=1 ramp_time=0 runtime=60 iodepth=16 rw=[read, write] bs=[448k, 960k] offset_increment=15% norandommap random_generator=tausworthe64 numjobs=[1, 2, 3, 4] [test0] filename=raid5