Independent validation

Beijing Information Science and Technology University · Huawei Ascend Atlas 910B · leading on all 7 metrics.

SETUP

A reproducible test setup

Objective and checkable: an independent third party, a stated platform, a stated baseline.

Item	Detail
Tester	Beijing Information Science and Technology University (independent third party)
Platform	Huawei Ascend Atlas 910B
Baseline	NFS network storage (NFS over TCP, 10GbE, ~1.25 GB/s)
ZK-DPU link	NVMe-oF over RDMA / RoCE (2×200GbE, ~50 GB/s line rate)
Metrics	Inference load/service, training I/O, token efficiency — 7 in total
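
The approximate line rates quoted for the two links follow from dividing the nominal link speed in Gb/s by 8. A minimal sketch of that arithmetic, ignoring protocol and encoding overhead (the function name is illustrative and not part of any vendor tool):

```python
def line_rate_gbytes(link_gbits: float, links: int = 1) -> float:
    """Nominal line rate in GB/s for `links` links of `link_gbits` Gb/s each.

    Ignores protocol and encoding overhead, so real throughput will be lower.
    """
    return link_gbits * links / 8


nfs_rate = line_rate_gbytes(10)            # 10GbE NFS baseline
dpu_rate = line_rate_gbytes(200, links=2)  # 2x200GbE RoCE link

print(f"NFS baseline: {nfs_rate:.2f} GB/s")
print(f"ZK-DPU link:  {dpu_rate:.2f} GB/s")
print(f"Nominal bandwidth ratio: {dpu_rate / nfs_rate:.0f}x")
```

These are nominal ceilings only; measured load times also depend on the protocol stack and the storage behind each link.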

INFERENCE

Inference: load and service speedup

Bring-up and switching go from minutes to seconds.

Model	ZK-DPU load	NFS load	Load speedup	Latency cut	Service speedup
DeepSeek-32B	6.62 s	563.85 s	85.17×	98.83%	6.17×
DeepSeek-70B	35.38 s	1284.66 s	36.31×	97.25%	9.33×
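
The load speedup and latency-cut columns can be recomputed from the two load times: speedup is the NFS time divided by the ZK-DPU time, and latency cut is one minus the inverse of that ratio. A short sketch for checking the table, with illustrative helper names:

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def latency_cut_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of the baseline time that the candidate eliminates."""
    return (1 - candidate_s / baseline_s) * 100


# (ZK-DPU load time, NFS load time) in seconds, taken from the table
loads = {
    "DeepSeek-32B": (6.62, 563.85),
    "DeepSeek-70B": (35.38, 1284.66),
}

for model, (zk_s, nfs_s) in loads.items():
    print(
        f"{model}: {speedup(nfs_s, zk_s):.2f}x faster, "
        f"{latency_cut_pct(nfs_s, zk_s):.2f}% less load time"
    )
```

The service speedup column is not derivable from the load times and is left out of this check.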

TRAINING

Training: weights and checkpoint I/O

The bigger the model and the more frequent the checkpoints, the more idle time you save.

Test	ZK-DPU	NFS baseline	Speedup	Latency cut
Model load	12.72 s	140.23 s	11.02×	90.93%
Model save	31.16 s	165.87 s	5.32×	81.21%
Checkpoint load	10.55 s	131.37 s	12.45×	91.97%
Checkpoint save	81.94 s	451.14 s	5.51×	81.84%
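
To illustrate the point about checkpoint frequency, the sketch below estimates the stall time avoided over a training run. It assumes that training stalls for the full duration of each checkpoint save (synchronous saves). The checkpoint counts are hypothetical inputs, not figures from the test.

```python
ZK_SAVE_S = 81.94    # checkpoint save on ZK-DPU, from the table
NFS_SAVE_S = 451.14  # checkpoint save on the NFS baseline, from the table


def idle_hours_saved(num_checkpoints: int) -> float:
    """Hours of stall avoided, assuming each save fully blocks training."""
    return num_checkpoints * (NFS_SAVE_S - ZK_SAVE_S) / 3600


for n in (10, 50, 100):  # hypothetical checkpoint counts per run
    print(f"{n:>3} checkpoints: ~{idle_hours_saved(n):.1f} h of stall avoided")
```

With asynchronous or overlapped checkpointing the saving would be smaller, so treat the result as an upper bound under the stated assumption.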

THROUGHPUT

Token throughput (= effective accelerator utilization)

The more frequent the switching, the wider the gap.

Switch frequency	ZK-DPU util.	NFS util.	Relative gain
10/day	99.8%	80.4%	+24.1%
20/day	99.5%	60.8%	+63.6%
40/day	99.1%	21.7%	+356.9%
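
The relative gain column is the ratio of the two utilization figures minus one. A small sketch of that calculation follows; because it starts from the rounded utilization values in the table, the recomputed gains can differ from the published column by a few tenths of a percentage point.

```python
# (switches per day, ZK-DPU utilization %, NFS utilization %), from the table
rows = [
    (10, 99.8, 80.4),
    (20, 99.5, 60.8),
    (40, 99.1, 21.7),
]


def relative_gain_pct(zk_util: float, nfs_util: float) -> float:
    """Relative increase in utilization of ZK-DPU over the NFS baseline."""
    return (zk_util / nfs_util - 1) * 100


for per_day, zk, nfs in rows:
    print(f"{per_day}/day: +{relative_gain_pct(zk, nfs):.1f}%")
```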

Conclusion

In Beijing Information Science and Technology University’s independent test, the ZK-DPU WS5000 reached a peak model-load speedup of ~85×, a 5–12× training I/O speedup, and up to +357% token efficiency. The median latency reduction across the 7 metrics was 90.9%, and the results are reproducible and verifiable.

Benchmark it on your own workload

Two live demo units are ready for an immediate PoC. Let the data do the talking.

Request a PoC → Contact us