Document Catalog - Technical White Paper 53 - Model Card Template v1.0

Chapter 8 Evaluation Benchmarks and Comparative Scoring (Bench/Score)


I. Purpose & Scope


II. Prerequisites & Inputs


III. Bench Tasks & Comparability


IV. Leakage Prevention & Consistency


V. Metrics & Intervals

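  1. Primary metrics (examples): AUC, ACC, MAE, RMSE, r_phi, ε_flux, Q_res, and Latency_P95/Throughput (if performance constraints apply).
  2. Interval rules:
    • k coverage: U = k·u_c;
    • alpha: t_{ν,1−α/2} or the normal approximation;
    • quantile: e.g. [0.025, 0.975]; pick one convention for the whole volume and apply it consistently.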
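
The three interval rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the alpha mode uses the normal approximation only (the Student t quantile is not implemented here), and the quantile mode assumes linear interpolation between order statistics, which the text does not specify.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def k_coverage(u_c, k=2.0):
    """k mode: expanded uncertainty U = k * u_c."""
    return k * u_c


def alpha_interval(samples, alpha=0.05):
    """alpha mode: two-sided interval for the mean (normal approximation)."""
    m = mean(samples)
    se = stdev(samples) / sqrt(len(samples))  # standard error of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    return m - z * se, m + z * se


def quantile_interval(samples, p=(0.025, 0.975)):
    """quantile mode: empirical quantiles (assumed linear interpolation)."""
    xs = sorted(samples)

    def q(prob):
        pos = prob * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return q(p[0]), q(p[1])
```

Whichever mode is chosen, recording it next to each reported interval keeps results comparable across tasks.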

VI. Scoring Mapping


VII. Gates & Decisions

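  1. Align with the thresholds of the Error Budget Card:
    • |ΔT_arr| + U(T_arr) ≤ τ_T;
    • LB(r_phi) ≥ r_phi_min;
    • P95(ε_flux) ≤ ε_flux_guard;
    • p_dim = 1.0, Σ PD.
  2. Release decision: if the core gates pass and Q ≥ Q_base + δQ_min → Pass; otherwise Fail / [Restricted] (publish only qualitative charts and diagnostics).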
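
The gate logic above could be expressed as a small Python function. This is a sketch under stated assumptions: the dictionary keys and every threshold value are hypothetical placeholders (the real thresholds come from the Error Budget Card), the "Σ PD" condition is not modeled because it is not defined in this chapter, and Fail and [Restricted] are returned as one combined outcome because the text does not say how to tell them apart.

```python
def release_decision(m, th):
    """Return "pass" if all core gates and the Q margin hold."""
    core_gates = [
        abs(m["dT_arr"]) + m["U_T_arr"] <= th["tau_T"],  # |ΔT_arr| + U(T_arr) ≤ τ_T
        m["r_phi_lb"] >= th["r_phi_min"],                # LB(r_phi) ≥ r_phi_min
        m["eps_flux_p95"] <= th["eps_flux_guard"],       # P95(ε_flux) ≤ ε_flux_guard
        m["p_dim"] == 1.0,                               # p_dim = 1.0
    ]
    q_margin_ok = m["Q"] >= th["Q_base"] + th["dQ_min"]  # Q ≥ Q_base + δQ_min
    return "pass" if all(core_gates) and q_margin_ok else "fail/restricted"


# Metric values taken from the scorecard.json example in this chapter.
EXAMPLE_METRICS = {
    "dT_arr": -2.3e-9, "U_T_arr": 1.5e-9,
    "r_phi_lb": 0.61, "eps_flux_p95": 0.011,
    "p_dim": 1.0, "Q": 0.78,
}

# Hypothetical thresholds, chosen only for illustration.
EXAMPLE_THRESHOLDS = {
    "tau_T": 5e-9, "r_phi_min": 0.60, "eps_flux_guard": 0.02,
    "Q_base": 0.62, "dQ_min": 0.05,
}

print(release_decision(EXAMPLE_METRICS, EXAMPLE_THRESHOLDS))
```

Keeping the core gates as a list makes it straightforward to report which gate failed if a diagnostic message is needed.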

VIII. Normative Path Forms

In the body text, state gamma(ell) and d ell explicitly; on the data side, record delta_form; parenthesize all expressions.


IX. Machine-Readable Configuration and Manifests
A. bench_plan.yaml

version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s", "Q_res", "p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi", "epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025, 0.975] }
baseline: { id: "base-001", version: "1.2.3" }
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
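
A plan like this can be sanity-checked before a benchmark run. The sketch below mirrors the plan as a Python dictionary instead of parsing YAML, so it needs only the standard library. The checks themselves are assumptions rather than rules stated in this chapter: that the weights sum to 1, that every listed metric has a weight and vice versa, and that each coverage mode is one of the three named in Section V. The name validate_plan is hypothetical.

```python
import math

# Dictionary mirror of bench_plan.yaml (values copied from the listing above).
PLAN = {
    "version": "1.0.0",
    "tasks": [
        {"id": "bench-arrival", "split": "test",
         "metrics": ["DeltaT_arr_s", "Q_res", "p_dim"],
         "coverage": {"mode": "k", "k": 2}},
        {"id": "bench-phase", "split": "test",
         "metrics": ["r_phi", "epsilon_flux"],
         "coverage": {"mode": "quantile", "p": [0.025, 0.975]}},
    ],
    "baseline": {"id": "base-001", "version": "1.2.3"},
    "weights": {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15,
                "p_dim": 0.15, "Q_res": 0.10},
}

KNOWN_MODES = {"k", "alpha", "quantile"}


def validate_plan(plan):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    weights = plan["weights"]
    # Assumed rule: weights form a convex combination.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        problems.append("weights do not sum to 1")
    listed = {name for task in plan["tasks"] for name in task["metrics"]}
    for name in sorted(listed - set(weights)):
        problems.append(f"metric without weight: {name}")
    for name in sorted(set(weights) - listed):
        problems.append(f"weight without metric: {name}")
    for task in plan["tasks"]:
        mode = task["coverage"]["mode"]
        if mode not in KNOWN_MODES:
            problems.append(f"{task['id']}: unknown coverage mode {mode!r}")
    return problems


print(validate_plan(PLAN))
```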

B. scorecard.json (example)

{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "mdl-core", "Q": 0.78 },
  "weights": { "DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.10 },
  "metrics": {
    "DeltaT_arr_s": { "mean": -2.3e-9, "Uk2": 1.5e-9 },
    "r_phi": { "value": 0.72, "lb95": 0.61, "ub95": 0.80 },
    "epsilon_flux": { "median": 0.004, "p95": 0.011 },
    "p_dim": 1.0,
    "Q_res": 0.13
  },
  "decision": "pass",
  "see": ["EFT.WP.Core.Equations v1.1:S20-1", "Error Budget Card v1.0:Ch.8"]
}
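
A reader of such a scorecard may want to confirm that it is internally consistent before trusting the decision field. The sketch below parses a trimmed copy of the example (the weights and see fields are omitted for brevity) and reports the Q improvement over baseline plus a few ordering checks. The function name check_scorecard and the lowercase decision values "fail" and "restricted" are assumptions; the example above only shows "pass".

```python
import json

# Trimmed copy of the scorecard.json example.
SCORECARD = """
{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "mdl-core", "Q": 0.78 },
  "metrics": {
    "DeltaT_arr_s": { "mean": -2.3e-9, "Uk2": 1.5e-9 },
    "r_phi": { "value": 0.72, "lb95": 0.61, "ub95": 0.80 },
    "epsilon_flux": { "median": 0.004, "p95": 0.011 },
    "p_dim": 1.0,
    "Q_res": 0.13
  },
  "decision": "pass"
}
"""


def check_scorecard(text):
    """Parse a scorecard and return simple consistency indicators."""
    card = json.loads(text)
    r_phi = card["metrics"]["r_phi"]
    eps = card["metrics"]["epsilon_flux"]
    return {
        # Improvement of the method over the baseline.
        "delta_Q": card["method"]["Q"] - card["baseline"]["Q"],
        # The point estimate should lie inside its own 95% bounds.
        "r_phi_ordered": r_phi["lb95"] <= r_phi["value"] <= r_phi["ub95"],
        # The median cannot exceed the 95th percentile.
        "eps_flux_ordered": eps["median"] <= eps["p95"],
        # Assumed vocabulary for the decision field.
        "decision_known": card["decision"] in {"pass", "fail", "restricted"},
    }


print(check_scorecard(SCORECARD))
```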


C. eval_report.md (outline)

# Evaluation Report

- Tasks, splits, seeds

- Metrics with intervals & convergence

- Score mapping, weights, final Q

- Gate comparison & decision


X. Anti-Patterns & Fixes


XI. Cross-References


XII. Execution Checklist


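Copyright and License (CC BY 4.0)

Copyright notice: Unless otherwise stated, the copyright in Energy Filament Theory (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author (Mr. "屠广林").
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, copying, reposting, excerpting, adaptation, and redistribution are permitted for commercial or non-commercial purposes.
Attribution format (suggested): Author: "屠广林"; Work: Energy Filament Theory (《能量丝理论》); Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/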