CANN/mat-chem-sim-pred IPDT Batch Rollout Score Benchmark

2026/7/4 8:06:53
PidIpdtBatchRolloutScore Benchmark Report

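mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry driven by both "mechanism + data", promoting the deep application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred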

This document records the measured CPU/NPU behavior of PidIpdtBatchRolloutScore.

Environment

  • NPU host: node202
  • Device: Ascend910B3, device id 0
  • CANN: /usr/local/Ascend/ascend-toolkit/latest
  • CPU baseline: benchmark program in multi-thread mode
  • Build: -DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3 -DRUN_MODE=npu

Method

The benchmark_pid_ipdt_batch_rollout_score_aclnn program builds an in-process multi-thread CPU reference (ComputeRange, using the same integrator recurrence y[k+1] = y[k] + b*u[k-delay]), runs the NPU operator on the same inputs, and reports max_abs_err, max_quality_rel_err and best_idx_diff_count. There are three pass conditions: npu_zero_score_count == 0; per-candidate scores match the CPU reference to float32 precision; and any best_idx differences are near-ties (the chosen candidate's metric rel-err stays small). This matches the behavior of the verified FOPDT operator.
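As a minimal illustration of the recurrence quoted above, the following Python sketch integrates y[k+1] = y[k] + b*u[k-delay] for a single loop. The function name ipdt_rollout, the zero input before the dead time has elapsed, and the step-input example are assumptions made for illustration; they are not taken from ComputeRange or the NPU kernel.

```python
def ipdt_rollout(b, delay, u, y0=0.0):
    """Integrate y[k+1] = y[k] + b*u[k-delay] over the input sequence u.

    Assumption: inputs before the dead time has elapsed (k < delay) are zero.
    """
    y = [y0]
    for k in range(len(u)):
        delayed_u = u[k - delay] if k >= delay else 0.0
        y.append(y[-1] + b * delayed_u)
    return y


# Unit step input, gain 0.5, two samples of dead time.
trajectory = ipdt_rollout(b=0.5, delay=2, u=[1.0] * 5)
print(trajectory)  # [0.0, 0.0, 0.0, 0.5, 1.0, 1.5]
```

The output stays flat for the first two steps (dead time) and then ramps by b per sample, which is the pure-integrator behavior with no decay term.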

Correctness

The IPDT kernel differs from the verified FOPDT kernel only in the state recurrence (the a*y decay term is dropped). The candidate-axis SIMD width does not change the numerics (each tile is independent), so the accuracy profile matches FOPDT: NPU output equals the CPU reference within float32 rounding.
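The tile-independence argument can be sketched in plain Python: when each candidate only reads its own parameters, splitting the candidate axis into tiles of any width yields the same per-candidate results. Everything here is hypothetical (the integrated-squared-error score, the candidate being a gain b, the function names); it only illustrates the reasoning, not the kernel's actual scoring or tiling code.

```python
def score_candidate(b, delay, u, setpoint):
    """Hypothetical score: sum of squared error of an IPDT rollout."""
    y, score = 0.0, 0.0
    for k in range(len(u)):
        delayed_u = u[k - delay] if k >= delay else 0.0
        y = y + b * delayed_u  # integrator recurrence, no decay term
        score += (setpoint - y) ** 2
    return score


def score_in_tiles(gains, tile, delay, u, setpoint):
    """Evaluate candidates tile by tile; no lane reads another lane's state."""
    scores = []
    for start in range(0, len(gains), tile):
        chunk = gains[start:start + tile]
        scores.extend(score_candidate(b, delay, u, setpoint) for b in chunk)
    return scores


gains = [0.01 * (i + 1) for i in range(20)]
u = [1.0] * 32
narrow = score_in_tiles(gains, tile=4, delay=3, u=u, setpoint=1.0)
wide = score_in_tiles(gains, tile=768, delay=3, u=u, setpoint=1.0)
print(narrow == wide)
```

Because each candidate performs the same operations in the same order regardless of which tile it lands in, the two runs agree exactly rather than just approximately.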

Measured on node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, npu_zero_score_count=0:

| candidates | max_abs_err | max_quality_rel_err | best_idx_diff_count |
|---|---|---|---|
| 1024 | 2.4e-4 | 1.5e-6 | 0 |
| 4096 | 1.0 | 1.69e-3 | 1 |
| 16384 | 1.5e-3 | 3.3e-5 | 1 |

The max_abs_err=1 at 4096 candidates comes from the discrete settling-time metric crossing the settle band one sample later on the NPU than on the CPU for a single near-tie loop (dt=1 -> abs diff 1); the corresponding metric rel-err stays < 2e-3. The reference FOPDT operator shows the same behavior at this candidate count (max_abs_err=1, max_quality_rel_err=4.5e-3, best_idx_diff_count=1), so IPDT is within the accepted baseline.
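A small Python sketch shows how a rounding-level difference in one sample can move a discrete settling time by a whole step. The settling definition used here (one past the last sample outside the band), the band width, and the trajectory values are assumptions chosen for illustration; the benchmark's actual metric code is not shown in this report.

```python
def settling_index(y, setpoint, band):
    """Return the first index from which |y - setpoint| <= band holds for good.

    Assumed definition: one past the last sample that lies outside the band.
    """
    last_outside = -1
    for k, value in enumerate(y):
        if abs(value - setpoint) > band:
            last_outside = k
    return last_outside + 1


# Two trajectories that differ only by a tiny perturbation at sample 3,
# which sits right at the edge of the +/-0.02 settle band.
cpu_y = [0.0, 0.5, 0.9, 0.9800001, 0.995, 1.0]
npu_y = [0.0, 0.5, 0.9, 0.9799999, 0.995, 1.0]

cpu_settle = settling_index(cpu_y, setpoint=1.0, band=0.02)
npu_settle = settling_index(npu_y, setpoint=1.0, band=0.02)
print(cpu_settle, npu_settle)  # 3 4
```

With dt=1 the settling times differ by exactly 1 even though the trajectories agree to about 2e-7, which is why an absolute error of 1 on this metric is compatible with float32-level agreement of the underlying rollout.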

Measured timing

node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, CPU = 64-thread parallel reference.

| candidates | CPU parallel ms | NPU kernel ms | NPU kernel vs CPU |
|---|---|---|---|
| 1024 | 32.5 | 7.45 | 4.36x |
| 4096 | 122.1 | 24.7 | 4.95x |
| 16384 | 426.6 | 93.8 | 4.55x |

Against a 192-thread CPU reference the speedup is 3.8-4.0x (the wider CPU pool narrows the gap).
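The speedup column in the 64-thread timing table is the ratio of CPU time to NPU kernel time. The sketch below recomputes it from the rounded millisecond values in that table, so the last digit can differ slightly from the reported figure (which was presumably derived from unrounded timings; that is an assumption).

```python
# (candidates, CPU parallel ms, NPU kernel ms, reported speedup) from the table.
rows = [
    (1024, 32.5, 7.45, 4.36),
    (4096, 122.1, 24.7, 4.95),
    (16384, 426.6, 93.8, 4.55),
]

# Speedup = CPU time / NPU kernel time for the same workload.
ratios = [cpu_ms / npu_ms for _, cpu_ms, npu_ms, _ in rows]

for (candidates, _, _, reported), ratio in zip(rows, ratios):
    print(f"{candidates:>6}: {ratio:.2f}x recomputed, {reported:.2f}x reported")
```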

Notes

The kernel reuses the FOPDT wide-lane (kLane=768) and fused inner-loop optimizations unchanged; the only algorithmic difference is the integrator recurrence, which removes one vector multiply per timestep.

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.