Technology

Disaggregation: storage decoupled into an independently scalable all-flash pool, linked to compute over a lossless fabric.

DISAGGREGATION

Disaggregated architecture

Compute pool ⟷ lossless fabric ⟷ all-flash pool — each scaling independently.

Compute pool

GPU / NPU nodes
Ascend Atlas 910B
Training / inference frameworks (storage is transparent to them)

All-flash pool

EBOF flash array
CPFS parallel file system
KV-cache acceleration layer

Data moves directly between storage and GPU memory; compute and capacity scale independently.

FOUR PILLARS

Four core technologies

Each maps directly to a shortened data path.

NVMe-oF over RDMA / RoCE

Carries NVMe commands over remote direct memory access (RDMA), avoiding redundant memory copies so that remote flash approaches local-disk performance.

GPUDirect

Data moves directly between storage and GPU memory, shortening the path and cutting CPU and latency overhead.

All-flash EBOF

Controller-less, high-density flash pool; bandwidth and IOPS scale near-linearly with capacity, at lower power.

KV-cache scheduling

Offloads and reuses KV cache for long-context inference and frequent model switching, raising effective GPU utilization.

Why KV cache is the key to cheaper inference

Long contexts and frequent model switching force the KV cache to be rebuilt repeatedly, which costs both memory and time. Offloading the cache to fast storage and reusing it cuts online-workload cost by up to 73.7% in industry and internal tests [S5].
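The offload-and-reuse idea can be illustrated with a toy two-tier cache. This is only a sketch under stated assumptions, not the product's implementation: `TieredKVCache`, `prefix_key`, and the `prefill` callback are hypothetical names, a small in-memory LRU stands in for GPU memory, and a temporary directory stands in for the all-flash pool. Real KV tensors would live in accelerator memory rather than in pickled Python objects.

```python
import hashlib
import pickle
import tempfile
from collections import OrderedDict
from pathlib import Path


def prefix_key(model, tokens):
    """Stable key for one (model, token prefix) pair (hypothetical scheme)."""
    raw = model.encode() + b"|" + ",".join(map(str, tokens)).encode()
    return hashlib.sha256(raw).hexdigest()


class TieredKVCache:
    """Toy cache: a small LRU 'hot' tier backed by files on fast storage."""

    def __init__(self, storage_dir, hot_capacity=2):
        self.storage_dir = Path(storage_dir)
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # stands in for GPU memory
        self.stats = {"hot_hit": 0, "storage_hit": 0, "recompute": 0}

    def _offload(self, key, value):
        # Write an evicted entry to storage instead of discarding it.
        (self.storage_dir / key).write_bytes(pickle.dumps(value))

    def _admit(self, key, value):
        # Insert into the hot tier and offload least-recently-used overflow.
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self._offload(old_key, old_value)

    def get(self, model, tokens, prefill):
        """Return the KV entry, recomputing only if neither tier holds it."""
        key = prefix_key(model, tokens)
        if key in self.hot:
            self.stats["hot_hit"] += 1
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self.storage_dir / key
        if path.exists():
            self.stats["storage_hit"] += 1
            value = pickle.loads(path.read_bytes())  # trusted local data only
        else:
            self.stats["recompute"] += 1
            value = prefill(tokens)  # the expensive step we want to avoid
        self._admit(key, value)
        return value


def demo():
    """Switch between two models with room for only one hot entry."""
    def fake_prefill(tokens):
        return [t * 2 for t in tokens]  # placeholder for real KV tensors

    with tempfile.TemporaryDirectory() as d:
        cache = TieredKVCache(d, hot_capacity=1)
        cache.get("model-a", [1, 2, 3], fake_prefill)  # first use: recompute
        cache.get("model-b", [4, 5], fake_prefill)     # recompute, evicts model-a
        cache.get("model-a", [1, 2, 3], fake_prefill)  # reloaded from storage
        return dict(cache.stats)


if __name__ == "__main__":
    print(demo())
```

In this toy run the third request is served from storage rather than recomputed, which is the effect the paragraph above describes. How much this saves in practice depends on how the storage read time compares with the prefill time.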

VS. NFS

Versus the NFS baseline

Third-party results on the same Ascend platform and workload (excerpt).

Metric                               NFS baseline   ZK-DPU WS5000   Gain
DeepSeek-32B model load              563.85 s       6.62 s          85.17×
Training checkpoint load             131.37 s       10.55 s         12.45×
Token throughput (40 switches/day)   21.7%          99.1%           +356.9%
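The Gain column can be sanity-checked from the two measurement columns. The sketch below assumes the time rows report a plain ratio (baseline divided by new) and the throughput row reports a relative increase. The function names are illustrative and not part of any product tooling.

```python
def speedup(baseline_s, new_s):
    """Ratio of baseline time to new time (higher is better)."""
    return baseline_s / new_s


def relative_gain_pct(baseline, new):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Values copied from the table above.
model_load = speedup(563.85, 6.62)             # rounds to 85.17
checkpoint_load = speedup(131.37, 10.55)       # rounds to 12.45
throughput = relative_gain_pct(21.7, 99.1)     # rounds to 356.7

print(f"Model load speedup:      {model_load:.2f}x")
print(f"Checkpoint load speedup: {checkpoint_load:.2f}x")
print(f"Throughput gain:         +{throughput:.1f}%")
```

The two load-time ratios match the table. With the rounded percentages shown here, the throughput row works out to about +356.7%, marginally below the published +356.9%. That gap would be consistent with the published gain having been computed from unrounded measurements, though the source does not say so.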

Self-controlled, domestic-ready

Deeply optimized for Huawei Ascend and domestic accelerators with 90%+ coverage; AMD and xFusion adaptation in testing (subject to final reports). Meets sovereignty needs of enterprises and AI centers.

Benchmark it on your own workload

Two live demo units are available for an immediate PoC. Let the data do the talking.