
Independent validation

Beijing Information Science and Technology University · Huawei Ascend Atlas 910B · leading on all 7 metrics.

SETUP

A reproducible test setup

Objective and checkable: an independent third party, a stated platform, a stated baseline.

| Item | Detail |
| --- | --- |
| Tester | Beijing Information Science and Technology University (independent third party) |
| Platform | Huawei Ascend Atlas 910B |
| Baseline | NFS network storage (NFS over TCP, 10GbE, ~1.25 GB/s) |
| ZK-DPU link | NVMe-oF over RDMA / RoCE (2×200GbE, ~50 GB/s line rate) |
| Metrics | Inference load/service, training I/O, token efficiency (7 in total) |
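
The line-rate figures in the setup table follow from dividing the link speed in Gbit/s by 8. A minimal Python sketch of that arithmetic is shown here. The function names and the 64 GB payload are hypothetical illustrations, not part of the test report, and protocol overhead is ignored.

```python
def line_rate_gb_per_s(gbit_per_s: float, links: int = 1) -> float:
    """Aggregate raw line rate in GB/s: Gbit/s per link times link count, 8 bits per byte."""
    return gbit_per_s * links / 8.0


def min_transfer_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Lower bound on transfer time at full line rate (ignores protocol overhead)."""
    return size_gb / rate_gb_per_s


nfs_rate = line_rate_gb_per_s(10)            # baseline: one 10GbE link
zk_rate = line_rate_gb_per_s(200, links=2)   # ZK-DPU link: 2x200GbE
payload_gb = 64.0                            # hypothetical payload size, not from the report

print(f"NFS line rate:    {nfs_rate:.2f} GB/s")
print(f"ZK-DPU line rate: {zk_rate:.2f} GB/s")
print(f"Best case for {payload_gb:.0f} GB over NFS:    {min_transfer_seconds(payload_gb, nfs_rate):.2f} s")
print(f"Best case for {payload_gb:.0f} GB over ZK-DPU: {min_transfer_seconds(payload_gb, zk_rate):.2f} s")
```

These are raw line rates, so real transfers on either link will be slower than the bounds printed here.
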
INFERENCE

Inference: load and service speedup

Bring-up and switching go from minutes to seconds.

| Model | ZK-DPU load | NFS load | Load speedup | Latency cut | Service speedup |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-32B | 6.62 s | 563.85 s | 85.17× | 98.83% | 6.17× |
| DeepSeek-70B | 35.38 s | 1284.66 s | 36.31× | 97.25% | 9.33× |
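
The load speedup and latency cut columns can be rechecked from the two load times alone. The sketch below assumes the usual definitions (speedup = baseline time / accelerated time, latency cut = 1 − accelerated time / baseline time); the page does not spell these out, and the function names are illustrative. The service speedup column is not recomputed because its raw timings are not published here.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated path is than the baseline."""
    return baseline_s / accelerated_s


def latency_cut_pct(baseline_s: float, accelerated_s: float) -> float:
    """Percentage of the baseline time that was eliminated."""
    return (1.0 - accelerated_s / baseline_s) * 100.0


# (zk_dpu_seconds, nfs_seconds) copied from the inference table.
LOAD_TIMES_S = {
    "DeepSeek-32B": (6.62, 563.85),
    "DeepSeek-70B": (35.38, 1284.66),
}

for model, (zk_s, nfs_s) in LOAD_TIMES_S.items():
    print(f"{model}: {speedup(nfs_s, zk_s):.2f}x speedup, "
          f"{latency_cut_pct(nfs_s, zk_s):.2f}% latency cut")
```
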
TRAINING

Training: weights and checkpoint I/O

The bigger the model and the more frequent the checkpoints, the more idle time you save.

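| Test | ZK-DPU | NFS baseline | Speedup | Latency cut |
| --- | --- | --- | --- | --- |
| Model load | 12.72 s | 140.23 s | 11.02× | 90.93% |
| Model save | 31.16 s | 165.87 s | 5.32× | 81.21% |
| Checkpoint load | 10.55 s | 131.37 s | 12.45× | 91.97% |
| Checkpoint save | 81.94 s | 451.14 s | 5.51× | 81.84% |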
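
To make the point about checkpoint frequency concrete, the following sketch multiplies the per-operation time difference by a repetition count. It assumes each I/O operation fully stalls training (synchronous checkpointing), which the report does not state. The dictionary keys, function name, and the 24-saves-per-day schedule are hypothetical.

```python
# (zk_dpu_seconds, nfs_seconds) per operation, copied from the training table.
TRAINING_IO_S = {
    "model_load": (12.72, 140.23),
    "model_save": (31.16, 165.87),
    "checkpoint_load": (10.55, 131.37),
    "checkpoint_save": (81.94, 451.14),
}


def seconds_saved(operation: str, count: int = 1) -> float:
    """Wall-clock seconds saved over `count` repetitions, assuming each one stalls training."""
    zk_s, nfs_s = TRAINING_IO_S[operation]
    return (nfs_s - zk_s) * count


checkpoints_per_day = 24  # hypothetical schedule, not from the report
saved = seconds_saved("checkpoint_save", checkpoints_per_day)
print(f"Saved per checkpoint save: {seconds_saved('checkpoint_save'):.2f} s")
print(f"Saved per day at {checkpoints_per_day} saves: {saved / 3600:.2f} h")
```

Under this assumption the saving scales linearly with the number of checkpoints, so doubling the checkpoint rate doubles the idle time recovered.
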
THROUGHPUT

Token throughput (= effective GPU utilization)

The more frequent the switching, the wider the gap.

| Switch frequency | ZK-DPU util. | NFS util. | Relative gain |
| --- | --- | --- | --- |
| 10/day | 99.8% | 80.4% | +24.1% |
| 20/day | 99.5% | 60.8% | +63.6% |
| 40/day | 99.1% | 21.7% | +356.9% |
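
The relative gain column is the ratio of the two utilization figures minus one. The sketch below recomputes it and, as a loudly labeled assumption, backs out the downtime per switch implied by a simple linear model (utilization = 1 − switches × seconds per switch / 86,400). That model is an illustration and is not stated in the report. Because the published utilizations are rounded, recomputed gains may differ from the table in the last digit.

```python
SECONDS_PER_DAY = 86_400


def relative_gain_pct(util_new_pct: float, util_base_pct: float) -> float:
    """Relative improvement of one utilization figure over another, in percent."""
    return (util_new_pct / util_base_pct - 1.0) * 100.0


def implied_switch_seconds(util_pct: float, switches_per_day: int) -> float:
    """Downtime per switch implied by the ASSUMED linear model (not from the report)."""
    return (1.0 - util_pct / 100.0) * SECONDS_PER_DAY / switches_per_day


# (switches_per_day, zk_dpu_util_pct, nfs_util_pct) copied from the throughput table.
ROWS = [(10, 99.8, 80.4), (20, 99.5, 60.8), (40, 99.1, 21.7)]

for switches, zk_util, nfs_util in ROWS:
    print(f"{switches}/day: gain {relative_gain_pct(zk_util, nfs_util):+.1f}%, "
          f"implied NFS cost {implied_switch_seconds(nfs_util, switches):.0f} s/switch, "
          f"implied ZK-DPU cost {implied_switch_seconds(zk_util, switches):.0f} s/switch")
```

Under this assumed model, all three NFS rows imply roughly the same cost of about 1,690 s per switch, which is why utilization falls so steeply as switching becomes more frequent.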

Conclusion

In the independent test by Beijing Information Science and Technology University, the ZK-DPU WS5000 reached a peak model-load speedup of ~85×, a training I/O speedup of 5–12×, and up to +357% token efficiency. The median latency reduction across the 7 metrics was 90.9%. The results are reproducible and verifiable.

Benchmark it on your own workload

Two live demo units are ready for an immediate PoC. Let the data do the talking.