About SCI Institute
The University of Utah Scientific Computing and Imaging (SCI) Institute is a global leader in applied, multidisciplinary computing research with real‑world impact. Founded in 1994, SCI brings together experts and students in computing, engineering, math, medicine, education, urban planning, environmental science, and more to develop computational methods and tools that drive discovery and address pressing societal challenges. Led by Manish Parashar, the university’s chief artificial intelligence officer, the institute oversees the One‑U Responsible AI Initiative and the Center for High Performance Computing, serving as the collaborative hub of the university’s computing research ecosystem.
The SCI Institute uses Starfish to catalog and manage research storage comprising more than 3 billion files and several petabytes of data. Starfish keeps its metadata catalog in PostgreSQL, and the storage backing that database, a single NVMe drive formatted with ext4, had become the limiting factor on SCI's Starfish server. Queries were sluggish, scans overran their allotted time windows and ended up running concurrently while waiting on database writes, and the database had outgrown its disk. SCI added more drives and migrated tables, but needed a refresh that solved both problems, performance and capacity, at once and with minimal sysadmin overhead.
Design Goals
Todd Green, Director of IT at SCI, shares their design goals:
We had a short, focused list:
- A vendor we trust. The storage market has burned us before; one hardware RAID supplier we considered went out of business during our evaluation. We wanted a solid company with a proven track record.
- A single, growable RAID volume. No LVM gymnastics with long extension times, no moving tables around or trimming retention periods. Just one device the database can sit on, supporting quick growth.
- Operational simplicity. A single, well-documented CLI.
- Documentation and support. Easy access to documentation and a vendor that responds quickly to support queries. Xinnor has been very responsive and available to us outside of normal business hours.
- Cost-effective, but it had to outperform the free alternatives. If mdadm or ZFS could have matched what we needed, that's where we would have ended up. xiRAID is affordable and the support and maintenance plans are attractive.
Why xiRAID Over mdadm and ZFS
Both mdadm and ZFS are battle-tested and free. Neither met SCI's bar.
mdadm works well until something goes wrong. On modern multi-TB NVMe, rebuild times stretch into days, not hours, and rebuild contention badly degrades foreground performance. A recent joint study from Xinnor and Solidigm found xiRAID rebuilt a 61.44 TB array 10x faster than mdadm with no host workload, and 30x faster under active workload, while cutting write amplification from 1.20 to 1.02. For a database under continuous load, that gap is the difference between users not noticing a drive replacement and filing support tickets about performance.
ZFS has a richer feature set, but Xinnor's published benchmarks against RAIDZ show xiRAID delivering roughly 2x the sequential throughput, with ZFS running at 80%+ core utilization to compete. Starfish itself is not a quiet neighbor; its scanners and rule engine need CPU cycles, so spending most of the cores on the storage layer was a non-starter. SCI's earlier testing had also shown ZFS to be considerably slower for their use case, and Starfish recommends ext4.
xiRAID's pitch is built around exactly the things SCI cared about:
- Lockless datapath, with a patented RAID engine. Xinnor claims up to 97% of raw device performance, sub-0.5 ms latency, and less than 5% CPU utilization under maximum load. That last figure is the one that mattered most to SCI.
- Initialization and rebuild measured in hours, not days.
- Single management CLI (xicli): create, destroy, show, replace, resize, all one verb away.
- Production references. Xinnor powers MEGWARE's storage at the Erlangen National HPC Center and ranks #3 globally in the IO500 Production list.
What We Tested
The new server has 10 enterprise NVMe drives (3.84 TB each) dedicated to the data array, plus two smaller drives in mdadm RAID1 for the OS. We ran fio against four configurations:
- Phase 1. 10 raw NVMe drives in parallel. Each drive gets its own job section. This is the hardware ceiling.
- Phase 2. xiRAID RAID10, raw block device. No filesystem.
- Phase 3. xiRAID RAID10 + ext4. The production configuration.
- Phase 4. Old server: single NVMe + ext4. What we are replacing.
Workload: 16 KiB random reads and writes, iodepth=64 per job, libaio, direct=1, 600 seconds per data point, with the number of fio jobs swept through 1, 2, 4, 8, 16, 32, 64 to show how each system scales.
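The sweep described above can be sketched as a small command generator. This is a minimal sketch, not SCI's actual harness: the device path `/dev/xi_example` and the job names are hypothetical, and the fio options are taken from the workload description (16 KiB blocks, iodepth 64, libaio, direct I/O, 600 seconds per data point).

```python
import shlex

# Number of fio jobs per data point, as listed in the workload description.
JOB_COUNTS = [1, 2, 4, 8, 16, 32, 64]


def fio_command(device, rw, numjobs):
    """Build the fio command line for one data point of the sweep."""
    args = [
        "fio",
        f"--name={rw}-j{numjobs}",   # hypothetical job name
        f"--filename={device}",      # hypothetical target device
        f"--rw={rw}",
        "--bs=16k",
        "--iodepth=64",
        "--ioengine=libaio",
        "--direct=1",
        "--time_based",
        "--runtime=600",
        f"--numjobs={numjobs}",
        "--group_reporting",
        "--output-format=json",
    ]
    return shlex.join(args)


def sweep(device):
    """Return every command of the sweep: reads first, then writes."""
    return [
        fio_command(device, rw, n)
        for rw in ("randread", "randwrite")
        for n in JOB_COUNTS
    ]


if __name__ == "__main__":
    for cmd in sweep("/dev/xi_example"):
        print(cmd)
```

The script only prints the commands, so they can be reviewed before anything is run. A random write test against a raw block device destroys the data on it.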
Fio configuration for random read tests:
    [global]
    rw=randread
    bs=4K
    iodepth=64
    direct=1
    ioengine=libaio
    runtime=100
    random_generator=tausworthe64
    group_reporting
    cpus_allowed=0-63
    cpus_allowed_policy=split
Fio configuration for random write tests:
    [global]
    rw=randwrite
    bs=4K
    iodepth=64
    direct=1
    ioengine=libaio
    runtime=100
    random_generator=tausworthe64
    group_reporting
    cpus_allowed=0-63
    cpus_allowed_policy=split
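For Phase 1, each drive gets its own job section under the shared global section. The sketch below shows one way such a job file could be generated. The section names (`drive0`, `drive1`, ...) and the `/dev/nvmeXn1` paths are assumptions for illustration. The global options are copied as listed above; since the workload description states 16 KiB blocks and 600 seconds per data point, `bs` and `runtime` would need adjusting to reproduce those runs.

```python
# Global options copied from the configuration listed above.
GLOBAL_OPTIONS = [
    "rw=randread",
    "bs=4K",
    "iodepth=64",
    "direct=1",
    "ioengine=libaio",
    "runtime=100",
    "random_generator=tausworthe64",
    "group_reporting",
    "cpus_allowed=0-63",
    "cpus_allowed_policy=split",
]


def build_job_file(devices, rw="randread", numjobs=1):
    """Return fio job-file text with one job section per device."""
    lines = ["[global]"]
    for opt in GLOBAL_OPTIONS:
        # Swap in the requested access pattern; keep everything else as listed.
        lines.append(f"rw={rw}" if opt.startswith("rw=") else opt)
    lines.append(f"numjobs={numjobs}")
    for i, dev in enumerate(devices):
        lines += ["", f"[drive{i}]", f"filename={dev}"]  # hypothetical names
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device names for the 10 data drives.
    drives = [f"/dev/nvme{i}n1" for i in range(10)]
    print(build_job_file(drives, rw="randread", numjobs=4))
```

Because `numjobs` sits in the global section, it applies to every drive section, so the total number of fio jobs is `numjobs` times the number of drives.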
Results
| Configuration | Peak read | Peak write | Notes |
|---|---|---|---|
| 10 raw NVMe, parallel | 8.03 M IOPS, 125 GB/s | 1.31 M IOPS, 20.5 GB/s | hardware ceiling |
| xiRAID RAID10 (raw) | 8.24 M IOPS, 129 GB/s | 1.20 M IOPS, 18.8 GB/s | matches bare drives |
| xiRAID RAID10 + ext4 (prod) | 7.71 M IOPS, 120 GB/s | 1.02 M IOPS, 16.0 GB/s | only ~6% read overhead vs raw |
| Old server (1× NVMe + ext4) | 200 K IOPS, 3.1 GB/s | 211 K IOPS, 3.3 GB/s | what was replaced |
Two things stood out. For reads, xiRAID RAID10 matches the bare 10-drive parallel result (8.24 M versus 8.03 M IOPS), which is consistent with the lockless read path benefiting from striping across mirrors; in practice, RAID10 read performance is on par with the raw drives at this concurrency. The ext4 layer then adds only about 6% read overhead versus the raw RAID device (7.71 M versus 8.24 M IOPS), which is exceptionally low for a journaled filesystem on a fast block device.
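The read comparisons follow directly from the peak figures in the results table. A short check, using only those published numbers (the variable and function names are illustrative):

```python
# Peak read results from the table above, in millions of IOPS.
PEAK_READ_MIOPS = {
    "raw_drives": 8.03,
    "xiraid_raw": 8.24,
    "xiraid_ext4": 7.71,
}


def overhead_pct(baseline, measured):
    """Percentage of the baseline lost when moving to the measured setup."""
    return (1 - measured / baseline) * 100


ext4_overhead = overhead_pct(
    PEAK_READ_MIOPS["xiraid_raw"], PEAK_READ_MIOPS["xiraid_ext4"]
)
raid_vs_raw = PEAK_READ_MIOPS["xiraid_raw"] / PEAK_READ_MIOPS["raw_drives"]

print(f"ext4 read overhead vs raw RAID device: {ext4_overhead:.1f}%")
print(f"xiRAID RAID10 read IOPS relative to raw drives: {raid_vs_raw:.2f}x")
```

This gives roughly 6.4% for the ext4 overhead and a ratio of about 1.03 between the RAID10 device and the raw drives, so the two are within a few percent of each other.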
For writes, the picture is more nuanced. Peak IOPS on this system appear to be limited by the CPU and RAM configuration, which is a known quirk of the environment. As a result, the gap between raw drives and RAID10 is smaller than the theoretical maximum would suggest. In theory, 10 raw drives can deliver roughly twice the write throughput of RAID10 on the same number of drives, because RAID10 duplicates each write by design. In this setup, however, the measured write performance was constrained before that theoretical gap could fully appear. The read results, by contrast, are representative and do not require the same reservations.
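The write argument can be made concrete with the peak figures from the results table. This is a back-of-the-envelope sketch that assumes each RAID10 logical write becomes exactly two device writes and ignores any other overhead:

```python
MIRROR_COPIES = 2  # RAID10 writes every block to two drives

raw_write_miops = 1.31     # Phase 1: 10 raw drives, peak write (M IOPS)
raid10_write_miops = 1.20  # Phase 2: xiRAID RAID10, peak write (M IOPS)

# If the raw result were a true drive limit, RAID10 could reach at most half.
naive_raid10_ceiling = raw_write_miops / MIRROR_COPIES

# Device-level writes the drives actually absorbed during the RAID10 run.
device_writes_miops = raid10_write_miops * MIRROR_COPIES

print(f"RAID10 ceiling if raw result were drive-limited: {naive_raid10_ceiling:.3f} M IOPS")
print(f"Device writes during RAID10 run: {device_writes_miops:.2f} M IOPS")
print(f"RAID10 relative to raw: {raid10_write_miops / raw_write_miops:.0%}")
```

Under that assumption, the drives handled about 2.4 M device writes per second during the RAID10 run, well above the 1.31 M measured on the raw drives. That is consistent with the raw write result being capped by the host rather than by the drives, which is why the expected 2x gap does not show up.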
Compared to the old server, the new system delivers roughly 39x more read IOPS and 5x more write IOPS at peak. Latencies on the old server collapsed under concurrency (N=64 read mean = 24 ms, 4096 outstanding I/Os against a single drive); on the new system, the same workload stays at sub-millisecond mean latency.
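The latency figures can be cross-checked with Little's law, which relates outstanding I/Os, throughput, and mean latency. The sketch below uses only numbers from this section; applying the peak read IOPS of the new system at N=64 is an assumption, since the peak may occur at a different job count.

```python
def mean_latency_ms(outstanding_ios, iops):
    """Little's law: mean latency = outstanding I/Os / throughput."""
    return outstanding_ios / iops * 1000


def implied_iops(outstanding_ios, latency_ms):
    """Little's law rearranged: throughput = outstanding I/Os / mean latency."""
    return outstanding_ios / (latency_ms / 1000)


jobs, iodepth = 64, 64
outstanding = jobs * iodepth  # 4096 I/Os in flight at N=64

# Old server: 24 ms mean read latency at N=64 implies this throughput.
old_iops_at_n64 = implied_iops(outstanding, 24)

# New server: assume the peak 7.71 M read IOPS is sustained at N=64.
new_latency_ms = mean_latency_ms(outstanding, 7.71e6)

read_speedup = 7.71e6 / 200e3
write_speedup = 1.02e6 / 211e3

print(f"Outstanding I/Os: {outstanding}")
print(f"Old server implied IOPS at N=64: {old_iops_at_n64:,.0f}")
print(f"New server mean latency at N=64: {new_latency_ms:.2f} ms")
print(f"Read speedup: {read_speedup:.1f}x, write speedup: {write_speedup:.1f}x")
```

The old server's 24 ms corresponds to roughly 171 K IOPS with 4096 I/Os in flight, slightly below its 200 K peak, while the same queue depth at 7.71 M IOPS works out to about 0.53 ms, in line with the sub-millisecond figure.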
Production Outcome
Before going to production, SCI ran Starfish's standard pgbench benchmark (TPC-B-like, 4 clients, 4 threads, 60 s) on both servers as a sanity check:
| Metric | Old server (Postgres 13, 1x NVMe) | New server (Postgres 17, xiRAID + ext4) |
|---|---|---|
| TPS | 5,690 | 15,069 |
| Mean latency | 0.703 ms | 0.265 ms |
| Transactions in 60 s | 341,273 | 904,082 |
That is 2.65x more transactions per second, with mean latency reduced by the same factor. Postgres 17 contributes some of that uplift, but the storage layer is doing the heavy lifting, which is exactly what SCI needed.
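The TPS and latency rows are two views of the same measurement: with 4 clients kept busy, mean latency is approximately the client count divided by TPS. A quick check using the table values (the function name is illustrative):

```python
CLIENTS = 4  # pgbench client count used in the benchmark


def latency_ms(clients, tps):
    """Mean per-transaction latency when all clients are kept busy."""
    return clients / tps * 1000


old_tps, new_tps = 5690, 15069

old_latency = latency_ms(CLIENTS, old_tps)
new_latency = latency_ms(CLIENTS, new_tps)
tps_ratio = new_tps / old_tps

print(f"Old server: {old_latency:.3f} ms, new server: {new_latency:.3f} ms")
print(f"TPS ratio: {tps_ratio:.2f}x, latency ratio: {old_latency / new_latency:.2f}x")
```

Both derived latencies match the table (0.703 ms and 0.265 ms), and the two ratios coincide at about 2.65 because the client count is the same in both runs.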
After a week in production, Starfish Storage reported aggregate scan throughput of approximately 76.6K TPS across concurrent diff scans (spanning multiple namespaces and billions of files) on the new server. This metric highlights the system's ability to sustain high operational throughput under real workload conditions.
Todd Green says: “Operationally, xiRAID has been a non-event in the best possible way: initialization completed in minutes, not days like mdadm; every lifecycle operation we have needed is one xicli command; and Xinnor's support was responsive and technically sharp during evaluation. We would choose xiRAID again.”