Technology
Disaggregation: storage decoupled into an independently scalable all-flash pool, linked to compute over a lossless fabric.
Disaggregated architecture
Compute pool ⟷ lossless fabric ⟷ all-flash pool — each scaling independently.
Compute pool · NVMe-oF over RDMA / RoCE · All-flash pool
Data moves directly between storage and GPU memory; compute and capacity scale independently.
Four core technologies
Each maps directly to a shortened data path.
NVMe-oF over RDMA / RoCE
Carries NVMe commands over remote direct memory access (RDMA), avoiding redundant memory copies so that remote flash approaches local-disk performance.
GPUDirect
Data moves directly between storage and GPU memory, shortening the path and cutting CPU and latency overhead.
All-flash EBOF
Controller-less, high-density flash pool; bandwidth and IOPS scale near-linearly with capacity, at lower power.
KV-cache scheduling
Offloads and reuses KV cache for long-context inference and frequent model switching, lifting effective GPU utilization.
Why KV cache is the key to cheaper inference
Long contexts and frequent model switching force the KV cache to be rebuilt repeatedly, consuming both memory and time. Offloading it to fast storage and reusing it there cuts online-workload cost by up to ~73.7% in industry and internal tests [S5].
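The offload-and-reuse idea can be illustrated with a toy two-tier cache. This is a minimal sketch, not the product's scheduler: `TieredKVCache`, `prefix_key`, the slot count, and the placeholder `_prefill` are all hypothetical names chosen for illustration, and a plain dictionary stands in for the all-flash pool. The point it shows is that an entry evicted from the small GPU tier is kept in the offload tier, so a later request restores it instead of recomputing it.

```python
from __future__ import annotations

import hashlib
from collections import OrderedDict


def prefix_key(model: str, tokens: tuple) -> str:
    """Identify a KV-cache entry by model name and exact token prefix."""
    raw = model + ":" + ",".join(map(str, tokens))
    return hashlib.sha256(raw.encode()).hexdigest()


class TieredKVCache:
    """Toy two-tier cache: a small 'GPU' tier backed by a large offload tier."""

    def __init__(self, gpu_slots: int) -> None:
        self.gpu_slots = gpu_slots
        self.gpu = OrderedDict()  # key -> KV blob, kept in LRU order
        self.offload = {}         # stands in for the fast storage pool (assumption)
        self.recomputed = 0       # how often the expensive prefill ran
        self.restored = 0         # how often an offloaded entry was reused

    def _admit(self, key: str, kv: bytes) -> None:
        """Place an entry in the GPU tier, offloading the least recently used."""
        self.gpu[key] = kv
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_slots:
            old_key, old_kv = self.gpu.popitem(last=False)
            self.offload[old_key] = old_kv  # offload instead of discarding

    def get(self, model: str, tokens: tuple) -> bytes:
        key = prefix_key(model, tokens)
        if key in self.gpu:               # hit in the GPU tier
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.offload:           # hit in the offload tier: restore
            kv = self.offload.pop(key)
            self.restored += 1
        else:                             # miss everywhere: rebuild
            kv = self._prefill(model, tokens)
            self.recomputed += 1
        self._admit(key, kv)
        return kv

    @staticmethod
    def _prefill(model: str, tokens: tuple) -> bytes:
        # Placeholder for the expensive prefill pass that builds the KV cache.
        return f"{model}:{len(tokens)}".encode()


# Two models alternate on a GPU tier that holds only one entry.
cache = TieredKVCache(gpu_slots=1)
a = ("model-a", (1, 2, 3))
b = ("model-b", (4, 5))
for model, tokens in [a, b, a, b]:
    cache.get(model, tokens)
print(cache.recomputed, cache.restored)  # prints "2 2"
```

Without the offload tier, all four requests in this toy run would trigger a rebuild; with it, the two repeat requests are served by restoring the saved entry. How much that saves in practice depends on the relative cost of restoring versus recomputing, which this sketch does not model.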
Versus the NFS baseline
Third-party results on the same Ascend platform and workload (excerpt).
| Metric | NFS baseline | ZK-DPU WS5000 | Gain |
|---|---|---|---|
| DeepSeek-32B model load | 563.85 s | 6.62 s | 85.17× |
| Training checkpoint load | 131.37 s | 10.55 s | 12.45× |
| Token throughput (40 switches/day) | 21.7% | 99.1% | +356.9% |
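For readers who want to check the Gain column, the short sketch below recomputes it from the table's own figures. It assumes the usual definitions (speedup as baseline time divided by new time, and relative gain as the percentage change from the baseline); the helper names `speedup` and `relative_gain_pct` are illustrative only.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster the new load time is than the baseline."""
    return baseline_s / new_s


def relative_gain_pct(baseline: float, new: float) -> float:
    """Percentage change of a higher-is-better metric relative to the baseline."""
    return (new - baseline) / baseline * 100.0


# Figures copied from the table above.
model_load = speedup(563.85, 6.62)
checkpoint_load = speedup(131.37, 10.55)
throughput_gain = relative_gain_pct(21.7, 99.1)

print(f"Model load:       {model_load:.2f}x")        # 85.17x
print(f"Checkpoint load:  {checkpoint_load:.2f}x")   # 12.45x
print(f"Token throughput: +{throughput_gain:.1f}%")  # +356.7%
```

The two load-time speedups reproduce the published values exactly. The throughput gain recomputed from the rounded percentages comes out at about +356.7%, slightly below the published +356.9%; one plausible explanation, offered here as an assumption, is that the published figure was derived from unrounded measurements.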
Self-controlled, domestic-ready
Deeply optimized for Huawei Ascend and other domestic accelerators, with 90%+ coverage; adaptation for AMD and xFusion is in testing (subject to final reports). Meets the sovereignty needs of enterprises and AI centers.
Benchmark it on your own workload
Two live demo units are ready for an immediate PoC. Let the data do the talking.