Why You Should Use GPU to Protect Your Data in the AI Era
Record-class parity RAID performance for massively parallel AI I/O with SupremeRAID™ Ultra Ada 2.0 and InnoGrit N3X
Date: [placeholder: publication date]
Authors: Graid Technology and InnoGrit Corporation
Test media: 24x InnoGrit N3X NVMe SSDs (SLC, ultra low latency)
Acceleration hardware: SupremeRAID™ Ultra Ada Card (NVIDIA RTX 2000E Ada)
Executive Summary
AI workloads are massive and highly parallel. They generate intense bursts of small-block and mixed I/O patterns across many threads, queues, and datasets. At scale, this turns storage into a first-order limiter for training, inference, and data preparation.
This joint whitepaper shows how SupremeRAID™ 2.0 redefines parity RAID performance for the AI era by pairing 24x InnoGrit N3X SLC NVMe drives with a compact GPU-based RAID engine, the SupremeRAID™ Ultra Ada Card powered by a 50W NVIDIA RTX 2000E Ada.
The results focus on what matters most to AI infrastructure. SupremeRAID™ delivers multi-million IOPS parity RAID random writes in optimal mode and sustains record-class performance in degraded mode, where traditional parity RAID often collapses. The result is faster data ingestion, higher metadata performance, and more stable throughput when failures happen at scale.
The Hardware Advantage: High Density Without a High Power Footprint
This test platform reflects modern AI servers where every watt and every PCIe slot matters.
The RAID engine is the SupremeRAID™ Ultra Ada Card (NVIDIA RTX 2000E Ada), a single-slot, low-profile accelerator operating in a 50W envelope. It enables GPU-accelerated parity RAID without requiring a large GPU footprint or a high power budget.
The storage media are 24x InnoGrit N3X drives, SLC-based NVMe SSDs designed for extremely low latency and consistent performance. That low-latency foundation is essential for sustaining the high parallel I/O rates typical of AI storage pipelines.
[placeholder: one publishable paragraph from InnoGrit describing N3X positioning and why SLC matters for AI storage]
What AI Workloads Demand From Storage
AI data platforms stress storage in ways that traditional enterprise workloads do not. They combine high parallel reads during training and dataset shuffling, write-heavy patterns during checkpointing and logging, and continuous metadata and small-block activity from distributed data services.
Parity RAID is attractive because it provides capacity efficiency at scale. The challenge is keeping parity RAID fast under heavy random writes, where each small write traditionally incurs a read-modify-write penalty (read old data and old parity, write new data and new parity), and maintaining strong performance in degraded mode when parity reconstruction enters the data path. SupremeRAID™ 2.0 is engineered for these two moments.
Performance Comparisons
Test Description
The results below compare Linux MD (mdadm) and the SupremeRAID™ 2.0 Linux driver on the same 24-drive NVMe configuration. Each subsection presents the numeric outcome first, followed by a short observation that explains why the result matters for AI workloads.
Testing Environment
- Hardware
- CPU: AMD EPYC 9755 128-Core Processor × 2
- Memory: 32 GB DDR5-6400 RDIMM × 24
- GPU RAID Accelerator: SupremeRAID™ Ultra Ada Card (NVIDIA RTX 2000E Ada), single slot, low profile, 50W
- NVMe Drives: InnoGrit N3X SLC NVMe × 24
- Software
- OS: Ubuntu 24.04.2 LTS
- Kernel: 6.8.0-62-generic
- RAID Implementations
- Linux MD (mdadm) v4.3
- SupremeRAID™ 2.0 (2.0.0-uad-76-71)
- Benchmark Tool: fio-3.40
- RAID Configuration
- One RAID group with 24 physical drives (configured as RAID5 or RAID6, depending on the test)
- Test Conditions
- Optimal: All drives healthy
- Degraded: One drive failed
- Workload naming: randread/randwrite with 4K or 1M block size (see the mapping example below)
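Each workload name maps directly to a job section in the fio file listed under Fio Parameters in the appendix. For example, the 4K random write workload corresponds to this excerpt (reproduced from the appendix job file):

[4k_random_write]
rw=randwrite
bs=4k
numjobs=256
iodepth=16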
RAID5/6 4K Random Write (Optimal)
Result (IOPS):
- RAID5: Linux MD 0.223M vs SupremeRAID™ 2.0 6.477M
- RAID6: Linux MD 0.149M vs SupremeRAID™ 2.0 5.687M
Observation and analysis
Parity RAID 4K random write is a decisive workload for AI pipelines because it maps to checkpoint writes, metadata updates, and high-frequency small writes from distributed services. SupremeRAID™ 2.0 moves parity random write into the multi-million IOPS class for both RAID5 and RAID6, enabling parity protection without forcing a write-performance compromise.
RAID5/6 4K Random Read (Degraded)
Result (IOPS):
- RAID5: Linux MD 0.163M vs SupremeRAID™ 2.0 12.6M
- RAID6: Linux MD 0.186M vs SupremeRAID™ 2.0 12.6M
Observation and analysis
Degraded mode is where parity RAID must prove it can protect data without a performance cliff. The results show SupremeRAID™ 2.0 sustaining 12.6M IOPS in degraded 4K reads, keeping the storage layer responsive under failure conditions that commonly disrupt AI training and data services.
RAID5/6 4K Random Write (Degraded)
Result (IOPS):
- RAID5: Linux MD 0.246M vs SupremeRAID™ 2.0 6.466M
- RAID6: Linux MD 0.147M vs SupremeRAID™ 2.0 5.499M
Observation and analysis
Degraded write behavior is one of the most punishing cases for parity RAID. SupremeRAID™ 2.0 maintains multi-million IOPS even during failure, supporting stable checkpoint cadence and preventing storage stalls from propagating into GPU underutilization.
RAID5/6 1M Random Read (Degraded)
Result (Throughput, GiB/s):
- RAID5: Linux MD 12.0 vs SupremeRAID™ 2.0 195.0
- RAID6: Linux MD 11.8 vs SupremeRAID™ 2.0 194.0
Observation and analysis
Large-block degraded reads map to AI-era data motion such as dataset staging and shuffle phases under failure. SupremeRAID™ 2.0 sustains near 200 GiB/s in degraded mode, supporting predictable throughput when the system is already under fault pressure.
RAID5/6 1M Random Write (Degraded)
Result (Throughput, GiB/s):
- RAID5: Linux MD 13.3 vs SupremeRAID™ 2.0 203.0
- RAID6: Linux MD 14.7 vs SupremeRAID™ 2.0 197.0
Observation and analysis
Large-block degraded writes represent ingestion and checkpoint flows under failure. SupremeRAID™ 2.0 keeps the pipe full in degraded mode, preventing a single failure from turning into a throughput collapse.
Numerical Comparison
| Scenario | Linux MD RAID5 | SupremeRAID™ 2.0 RAID5 | Linux MD RAID6 | SupremeRAID™ 2.0 RAID6 | Unit | Improvement |
|---|---|---|---|---|---|---|
| 4K Random Write (Optimal) | 0.223 | 6.477 | 0.149 | 5.687 | M IOPS | up to 38x |
| 4K Random Read (Degraded) | 0.163 | 12.6 | 0.186 | 12.6 | M IOPS | up to 77x |
| 4K Random Write (Degraded) | 0.246 | 6.466 | 0.147 | 5.499 | M IOPS | up to 37x |
| 1M Random Read (Degraded) | 12.0 | 195 | 11.8 | 194 | GiB/s | about 16x |
| 1M Random Write (Degraded) | 13.3 | 203 | 14.7 | 197 | GiB/s | up to 15x |
Conclusion
AI workloads are massive and highly parallel, and storage must keep up without collapsing under parity write pressure or degraded reconstruction. The results in this joint evaluation show that SupremeRAID™ 2.0 delivers parity RAID performance aligned with AI-era requirements: multi-million IOPS random writes in optimal mode and record-class degraded performance when failures occur.
With the SupremeRAID™ Ultra Ada Card powered by a compact 50W NVIDIA RTX 2000E Ada, this performance is achieved with a footprint that fits real servers, enabling dense, efficient, AI-ready storage nodes.
Appendix
Benchmarking Instructions
SupremeRAID™ RAID5
- Create physical drives:
  sudo graidctl create pd /dev/nvme0-23
- Create a RAID5 drive group:
  sudo graidctl create dg raid5 0-23
- Create a virtual drive using all available space in the drive group:
  sudo graidctl create vd 0
- Run fio with the parameters defined in the Fio Parameters section to measure optimal performance (an example invocation follows these steps).
- Mark the first physical drive offline to force the drive group into a degraded state:
  sudo graidctl edit pd 0 marker offline
- Run fio again with the same parameters to measure degraded performance.
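As a concrete illustration of the "Run fio" steps above (the same invocation applies to the RAID6 and MD procedures, with the filename line in the job file adjusted), the workloads can be driven one section at a time from the job file under Fio Parameters. The job file name ai_raid.fio is an assumption used for illustration; the section names come from the appendix:

# Precondition the virtual drive first, then measure one workload at a time
sudo fio ai_raid.fio --section=precondition
sudo fio ai_raid.fio --section=4k_random_write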
SupremeRAID™ RAID6
- Create physical drives:
  sudo graidctl create pd /dev/nvme0-23
- Create a RAID6 drive group:
  sudo graidctl create dg raid6 0-23
- Create a virtual drive using all available space in the drive group:
  sudo graidctl create vd 0
- Run fio with the parameters defined in the Fio Parameters section to measure optimal performance.
- Mark the first physical drive offline to force the drive group into a degraded state:
  sudo graidctl edit pd 0 marker offline
- Run fio again with the same parameters to measure degraded performance.
MD RAID5
- Create the MD RAID5 array (/dev/md5) using 24 NVMe drives:
  sudo bash -c '
  NVME_LIST=($(nvme list | grep INNOGRIT | awk "{print \$1}"))
  mdadm --create /dev/md5 \
    --verbose \
    --level=5 \
    --raid-devices=24 \
    --chunk=16K \
    --consistency-policy=resync \
    --force \
    "${NVME_LIST[@]:0:24}"
  '
- Increase the MD parity processing thread count:
  echo 32 | sudo tee /sys/block/md5/md/group_thread_cnt
- Run fio with the parameters defined in the Fio Parameters section to measure optimal performance.
- Fail one member device to force the array into a degraded state (ensure the selected device is an active member of the array; a quick verification is sketched after these steps):
  sudo mdadm --manage /dev/md5 --fail /dev/nvme0n1
- Run fio again with the same parameters to measure degraded performance.
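Before running the degraded measurement, it can be worth confirming that the array really is degraded. A minimal check using standard mdadm tooling (not part of the original procedure):

# /proc/mdstat marks a missing member with '_' in the device status map
cat /proc/mdstat
# mdadm --detail reports the array state (e.g. "clean, degraded") and each member's status
sudo mdadm --detail /dev/md5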
MD RAID6
- Create the MD RAID6 array (/dev/md6) using 24 NVMe drives:
  sudo bash -c '
  NVME_LIST=($(nvme list | grep INNOGRIT | awk "{print \$1}"))
  mdadm --create /dev/md6 \
    --verbose \
    --level=6 \
    --raid-devices=24 \
    --chunk=16K \
    --consistency-policy=resync \
    --force \
    "${NVME_LIST[@]:0:24}"
  '
- Increase the MD parity processing thread count:
  echo 32 | sudo tee /sys/block/md6/md/group_thread_cnt
- Run fio with the parameters defined in the Fio Parameters section to measure optimal performance.
- Fail one member device to force the array into a degraded state (ensure the selected device is an active member of the array):
  sudo mdadm --manage /dev/md6 --fail /dev/nvme0n1
- Run fio again with the same parameters to measure degraded performance (a cleanup sketch for reusing the drives follows these steps).
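When switching between MD configurations, or back to the SupremeRAID™ setup, the MD array needs to be torn down so the drives can be reused. A possible cleanup sequence, not part of the original procedure (it assumes the array is /dev/md6 and reuses the same INNOGRIT device list):

# Stop the array and wipe MD metadata from each member drive
sudo mdadm --stop /dev/md6
sudo bash -c '
NVME_LIST=($(nvme list | grep INNOGRIT | awk "{print \$1}"))
mdadm --zero-superblock "${NVME_LIST[@]:0:24}"
'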
Fio Parameters
[global]
filename=/dev/gdg0n1
# /dev/gdg0n1 for SupremeRAID™ RAID5 and RAID6
# /dev/md5 for MD RAID5, /dev/md6 for MD RAID6
randrepeat=0
ioengine=libaio
direct=1
random_generator=tausworthe64
cpus_allowed_policy=split
group_reporting=1
norandommap=1
[precondition]
rw=write
bs=1m
numjobs=100
iodepth=32
size=1%
offset_increment=1%
[4k_random_read]
rw=randread
bs=4k
numjobs=512
iodepth=16
cpus_allowed=0-511
[1m_random_read]
rw=randread
bs=1m
numjobs=512
iodepth=8
cpus_allowed=0-511
[1m_random_write]
rw=randwrite
bs=1m
numjobs=256
iodepth=16
cpus_allowed=0-255
[4k_random_write]
rw=randwrite
bs=4k
numjobs=256
iodepth=16
cpus_allowed=0-255
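The job file above prints fio's standard human-readable summary for each job. One possible way to capture machine-readable results per workload, assuming the job file is saved as ai_raid.fio (the file name and JSON output are assumptions and were not necessarily used for the published numbers):

# Aggregate results for one workload section into a JSON file for later analysis
sudo fio ai_raid.fio --section=4k_random_write --output-format=json --output=4k_randwrite.json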
Detailed Benchmark Results
RAID5 Optimal (SupremeRAID™ vs Linux MD)
| Workload | SupremeRAID™ RAID5 | CPU % (user/system/idle) | Linux MD RAID5 | CPU % (user/system/idle) |
|---|---|---|---|---|
| 4K Random Read | 36.2M IOPS | 4.79 / 22.08 / 73.12 | 28.6M IOPS | 8.87 / 91.13 / 0.00 |
| 4K Random Write | 6.477M IOPS | 1.35 / 10.73 / 87.91 | 223k IOPS | 0.11 / 17.13 / 82.77 |
| 1M Random Read | 301GiB/s (323GB/s) | 0.11 / 2.93 / 96.96 | 301GiB/s (323GB/s) | 0.12 / 75.94 / 23.94 |
| 1M Random Write | 225GiB/s (242GB/s) | 1.07 / 6.53 / 92.40 | 13.8GiB/s (14.8GB/s) | 0.05 / 49.15 / 50.80 |
RAID6 Optimal (SupremeRAID™ vs Linux MD)
| Workload | SupremeRAID™ RAID6 | CPU % (user/system/idle) | Linux MD RAID6 | CPU % (user/system/idle) |
|---|---|---|---|---|
| 4K Random Read | 36.2M IOPS | 4.79 / 22.08 / 73.12 | 27.9M IOPS | 7.09 / 92.91 / 0.00 |
| 4K Random Write | 5.687M IOPS | 1.86 / 16.89 / 81.26 | 149k IOPS | 0.03 / 50.31 / 49.66 |
| 1M Random Read | 300GiB/s (322GB/s) | 0.10 / 2.80 / 97.10 | 302GiB/s (324GB/s) | 0.13 / 78.46 / 21.42 |
| 1M Random Write | 217GiB/s (233GB/s) | 0.84 / 7.11 / 92.05 | 14.4GiB/s (15.4GB/s) | 0.05 / 52.67 / 47.28 |
RAID5 Degraded (SupremeRAID™ vs Linux MD)
| Workload | SupremeRAID™ RAID5 | CPU % (user/system/idle) | Linux MD RAID5 | CPU % (user/system/idle) |
|---|---|---|---|---|
| 4K Random Read | 12.6M IOPS | 3.04 / 21.03 / 75.93 | 163k IOPS | 0.06 / 89.37 / 10.56 |
| 4K Random Write | 6.466M IOPS | 0.74 / 5.59 / 93.67 | 246k IOPS | 0.11 / 14.16 / 85.73 |
| 1M Random Read | 195GiB/s (210GB/s) | 0.11 / 7.30 / 92.59 | 12.0GiB/s (12.9GB/s) | 0.01 / 98.47 / 1.51 |
| 1M Random Write | 203GiB/s (218GB/s) | 0.75 / 5.07 / 94.18 | 13.3GiB/s (14.2GB/s) | 0.05 / 48.89 / 51.06 |
RAID6 Degraded (SupremeRAID™ vs Linux MD)
| Workload | SupremeRAID™ RAID6 | CPU % (user/system/idle) | Linux MD RAID6 | CPU % (user/system/idle) |
|---|---|---|---|---|
| 4K Random Read | 12.6M IOPS | 2.96 / 20.74 / 76.31 | 186k IOPS | 0.06 / 91.28 / 8.66 |
| 4K Random Write | 5.499M IOPS | 0.77 / 8.35 / 90.88 | 147k IOPS | 0.03 / 51.03 / 48.94 |
| 1M Random Read | 194GiB/s (208GB/s) | 0.10 / 6.97 / 92.93 | 11.8GiB/s (12.7GB/s) | 0.01 / 98.42 / 1.57 |
| 1M Random Write | 197GiB/s (212GB/s) | 0.66 / 6.19 / 93.15 | 14.7GiB/s (15.8GB/s) | 0.05 / 51.69 / 48.26 |