Make every GPU earn its keep
ZK-DPU WS5000 is an all-flash accelerated storage appliance for AI. A disaggregated architecture and an end-to-end high-speed data path free your GPU cluster from waiting on data — lifting utilization and cutting total cost, with no changes to your framework.
Independently validated by Beijing Information Science and Technology University · 90.9% median latency reduction across 7 metrics
You bought top-tier GPUs — and they wait on data
Stacking more GPUs yields diminishing returns. The real bottleneck is data supply: model loading, checkpoint I/O and KV-cache scheduling.
Compute throttled by storage
Average utilization at China’s AI data centers is below 60%; in I/O-bound workloads, effective GPU utilization is often just 30–50% [S11].
Storage is the hidden ceiling
Conventional NFS and other centralized storage cap bandwidth, leaving GPUs idle while they wait for data. The larger the model, the higher the toll.
Turn storage into an amplifier
ZK-DPU disaggregates storage and turns it from a supporting role into a compute amplifier, lifting GPU utilization by 2–3× [S4].
The WS5000 all-flash storage appliance
A high-performance appliance for AI training and inference. Disaggregated storage plus an end-to-end fast data path raise effective utilization and slash total cost — without touching your framework.
- 300 GB/s aggregate bandwidth, 50M random IOPS, 20 µs latency
- Compatible with 90%+ of mainstream GPUs; deeply tuned for Huawei Ascend and other Chinese domestic accelerators
- Turnkey deployment in 48–72 hours; ~40% lower total cost
- Four core technologies: NVMe-oF/RDMA, GPUDirect, all-flash EBOF, KV-cache scheduling
Reproducible third-party benchmarks
Beijing Information Science and Technology University independently tested WS5000 on the Huawei Ascend Atlas 910B platform against an NFS baseline. WS5000 led on all 7 metrics.
Four scenarios, one disaggregated platform
From greenfield clusters to brownfield retrofits, from training to inference — across the full lifecycle of AI infrastructure.
Training clusters
Accelerate model loading and checkpoint I/O to shorten training iterations and cut idle GPU time.
Inference serving
Serve long contexts and frequent multi-model switching with markedly higher effective GPU utilization.
AI centers / domestic stack
Disaggregation plus deep Ascend tuning for sovereign, self-controlled infrastructure.
Brownfield retrofit
No GPU swap, no downtime — revive idle compute assets in place.
Ecosystem & certainty
Validated · manufacturable · ecosystem-ready
Transparency commitment
We keep what is delivered separate from what is in progress. Delivered: third-party validation and a mass-production foundry. In progress: adaptation for AMD and xFusion platforms, currently in testing and subject to final reports.
Benchmark it on your own workload
Two live demo units are ready for an immediate proof of concept (PoC). Let the data do the talking.