Catalog Document - Technical White Paper 46 - EFT.WP.Data.Benchmarks v1.0

Chapter 6 Metric System and Units


I. Purpose and Scope

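This chapter fixes the volume's specification for the metrics system and units: metric definitions for classification, regression, ranking, retrieval, detection, generation, multimodal, ASR, and similar tasks; aggregation and windows; thresholds and gates; links to calibration and uncertainty; and performance and resource measurement. It keeps these consistent with the data cards, model cards, and pipelines, with the metrology chapter, and with the cross-reference anchors.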

II. Terms and Dependencies

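  1. Terms: higher_is_better, agg(macro|micro|weighted|quant|max|min|mean|sum), window, thresholds, target_ci, calibration(ECE|Brier), perf(QPS|T_inf|ρ|net_mbps|size_bytes|power_w).
  2. Dependencies: metrology and dimensional checks ("Core.Metrology v1.0:check_dim"); evaluation protocol and aggregation ("ModelCards v1.0", Chapter 11); monitoring metrology ("Pipeline v1.0", Chapter 12).
  3. Mathematics and symbols: inline symbols are always written in backticks; expressions containing division, integrals, or compound operators must be parenthesized; the path quantity T_arr takes one of the forms
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ), with gamma(ell) and d ell declared; Chinese is not allowed in formulas, symbols, or definitions.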

III. Metric Fields and Structure (Normative)

metrics:
  - name: "<metric_name>"
    family: "classification|regression|ranking|retrieval|detection|nlp|asr|generation|multimodal|calibration|perf"
    unit: "—|ms|1/s|dB|W|bytes|%"   # SI or dimensionless (—)
    higher_is_better: true|false
    agg: "macro|micro|weighted|mean|quant|max|min|sum"
    window: "N/A|1m|5m"             # streaming/online scenarios only
    thresholds:
      warn: "<expr>"                # e.g. p99<=200
      block: "<expr>"               # e.g. ECE<=0.05
    weighting:
      scheme: "uniform|sample_share|expert"
      w_i: null                     # fill in when weights are given explicitly
    target_ci:
      method: "bootstrap|t|bayes"
      level: 0.95
    see:
      - "EFT.WP.Core.Metrology v1.0:check_dim"
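
The target_ci field names a method and a level but not a procedure. As an illustration only, the following is a minimal sketch of a percentile bootstrap for method "bootstrap" at level 0.95. The function name bootstrap_ci, the resample count, the fixed seed, and the sample data are all assumptions made for this example and are not part of the specification.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for stat(values) at the given level."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * (n_resamples - 1))
    hi_idx = int((1.0 - alpha) * (n_resamples - 1))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-sample correctness flags (1 = correct), for illustration only.
correct = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
acc = mean(correct)
ci_low, ci_high = bootstrap_ci(correct, level=0.95)
print(f"Acc={acc:.2f}, CI=({ci_low:.2f}, {ci_high:.2f})")
```

The percentile variant is used here because it is the simplest to state; methods "t" and "bayes" would need their own procedures.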


IV. Common Metric Definitions and Conventions


V. Task Family to Metric Mapping (Normative)

families:
  classification: ["Acc","F1_macro","F1_micro","ROC_AUC","PR_AUC","ECE","Brier"]
  regression: ["RMSE","MAE","MAPE","R2"]
  ranking: ["NDCG@k","MRR","precision@k","recall@k"]
  retrieval: ["mAP","mAR","MRR","recall@k","latency_ms.p99","QPS"]
  detection: ["mAP@0.50:0.95","AR@k"]
  nlp: ["BLEU","ROUGE-L","chrF","BERTScore"]
  asr: ["WER","CER","latency_ms.p95"]
  generation: ["BLEU","ROUGE-L","NLL","ECE"]
  perf: ["QPS","latency_ms.p50","latency_ms.p95","latency_ms.p99","ρ","net_mbps","size_bytes","power_w"]
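
One way to use the mapping above is to flag metrics reported for a family that the mapping does not list. The sketch below is illustrative: the helper names canonical and unknown_metrics are hypothetical, and it assumes that "@k" in a listed name stands for any integer cutoff (so "NDCG@10" counts as "NDCG@k"), which the source does not state explicitly.

```python
import re

# Copied from the families mapping in this section.
FAMILIES = {
    "classification": ["Acc", "F1_macro", "F1_micro", "ROC_AUC", "PR_AUC", "ECE", "Brier"],
    "regression": ["RMSE", "MAE", "MAPE", "R2"],
    "ranking": ["NDCG@k", "MRR", "precision@k", "recall@k"],
    "retrieval": ["mAP", "mAR", "MRR", "recall@k", "latency_ms.p99", "QPS"],
    "detection": ["mAP@0.50:0.95", "AR@k"],
    "nlp": ["BLEU", "ROUGE-L", "chrF", "BERTScore"],
    "asr": ["WER", "CER", "latency_ms.p95"],
    "generation": ["BLEU", "ROUGE-L", "NLL", "ECE"],
    "perf": ["QPS", "latency_ms.p50", "latency_ms.p95", "latency_ms.p99",
             "ρ", "net_mbps", "size_bytes", "power_w"],
}


def canonical(name):
    """Map a concrete integer cutoff such as 'NDCG@10' to the listed form 'NDCG@k'."""
    return re.sub(r"@\d+$", "@k", name)


def unknown_metrics(family, names):
    """Return the names that the mapping does not list for the given family."""
    allowed = set(FAMILIES[family])
    return [n for n in names if n not in allowed and canonical(n) not in allowed]


print(unknown_metrics("ranking", ["NDCG@10", "MRR", "BLEU"]))
print(unknown_metrics("detection", ["mAP@0.50:0.95", "AR@100"]))
```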


VI. Aggregation, Weighting, and Normalization


VII. Expressions and Thresholds


VIII. Metrology and Units (SI)

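  1. Mandatory: metrology:{units:"SI", check_dim:true}; metric units are expressed in SI or as dimensionless (—); normalize units before combining composite quantities.
  2. Path quantities: if a metric depends on T_arr, register delta_form, path="gamma(ell)", and measure="d ell"; use
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and verify the result with check_dim.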
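
The two forms of T_arr above can be compared numerically. The sketch below is a minimal illustration under stated assumptions: c_ref is taken as a constant in m/s (the numeric value is illustrative only), ell is sampled in metres, n_eff is dimensionless, and the integral is approximated with the trapezoid rule. The sample profile of n_eff and all function names are hypothetical; this is not the check_dim routine of Core.Metrology.

```python
import math

C_REF = 299_792_458.0  # m/s; illustrative constant, assumed for this sketch


def trapezoid(ys, xs):
    """Trapezoid-rule approximation of the integral of ys over xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_factored(n_eff, ell, c_ref):
    # T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_inside(n_eff, ell, c_ref):
    # T_arr = ( ∫ ( n_eff / c_ref ) d ell )
    return trapezoid([n / c_ref for n in n_eff], ell)


# Hypothetical path gamma(ell): 1000 m sampled every 10 m, with a mild n_eff ripple.
ell = [i * 10.0 for i in range(101)]
n_eff = [1.0 + 0.001 * math.sin(x / 100.0) for x in ell]

t1 = t_arr_factored(n_eff, ell, C_REF)
t2 = t_arr_inside(n_eff, ell, C_REF)
# Units: (s/m) * m = s in both forms, so T_arr is a time.
print(t1, t2)
```

With a constant c_ref the two forms agree up to floating-point rounding; they would differ only if c_ref were allowed to vary along the path.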

IX. Machine-Readable Snippet (Ready to Embed)

metrics:
  - name: "F1_macro"
    family: "classification"
    unit: "—"
    higher_is_better: true
    agg: "macro"
    window: "N/A"
    thresholds: {warn: "F1_macro>=0.75", block: "F1_macro>=0.80"}
    target_ci: {method: "bootstrap", level: 0.95}
  - name: "ECE"
    family: "calibration"
    unit: "—"
    higher_is_better: false
    agg: "mean"
    window: "N/A"
    thresholds: {block: "ECE<=0.05"}
  - name: "latency_ms.p99"
    family: "perf"
    unit: "ms"
    higher_is_better: false
    agg: "quant"
    window: "1m"
    thresholds: {warn: "latency_ms.p99<=200", block: "latency_ms.p99<=150"}
  - name: "QPS"
    family: "perf"
    unit: "1/s"
    higher_is_better: true
    agg: "sum"
    window: "1m"
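
The threshold expressions in the snippet above are plain strings. The following sketch shows one way they might be evaluated against measured values. It assumes a deliberately small grammar (one metric name, one comparison operator, one number) and assumes that each expression states the condition that must hold at that level; neither assumption is stated in the source, and the helper names holds and failed_levels are hypothetical.

```python
import operator
import re

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}
# <name> <op> <number>, e.g. "latency_ms.p99<=200" or "F1_macro>=0.75".
_EXPR = re.compile(
    r"^\s*([A-Za-z0-9_.@]+)\s*(<=|>=|==|<|>)\s*([-+]?\d+(?:\.\d+)?)\s*$"
)


def holds(expr, measured):
    """Evaluate one threshold expression against a dict of measured values."""
    m = _EXPR.match(expr)
    if m is None:
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    name, op, bound = m.groups()
    return _OPS[op](measured[name], float(bound))


def failed_levels(thresholds, measured):
    """Return the levels ('warn', 'block') whose expression does not hold."""
    return [
        level
        for level in ("warn", "block")
        if level in thresholds and not holds(thresholds[level], measured)
    ]


latency = {"warn": "latency_ms.p99<=200", "block": "latency_ms.p99<=150"}
print(failed_levels(latency, {"latency_ms.p99": 120.0}))
print(failed_levels(latency, {"latency_ms.p99": 180.0}))
print(failed_levels({"block": "ECE<=0.05"}, {"ECE": 0.03}))
```

Parsing with a fixed pattern rather than eval keeps the expression language closed, which seems appropriate for strings that arrive from configuration files.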


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: METRIC.NAME_FORMAT
    when: "$.metrics[*].name"
    assert: "matches('^[A-Za-z0-9_.@]+$')"
    level: error
  - id: METRIC.FAMILY_ALLOWED
    when: "$.metrics[*].family"
    assert: "value in ['classification','regression','ranking','retrieval','detection','nlp','asr','generation','multimodal','calibration','perf']"
    level: error
  - id: METRIC.UNIT_SI_OR_DIMLESS
    when: "$.metrics[*].unit"
    assert: "all_units_in_SI(value) or value in ['—','%']"
    level: error
  - id: METRIC.AGG_ALLOWED
    when: "$.metrics[*].agg"
    assert: "value in ['macro','micro','weighted','mean','quant','max','min','sum']"
    level: error
  - id: METRIC.WINDOW_FORMAT
    when: "$.metrics[*].window"
    assert: "value in ['N/A','1m','5m','15m']"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
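
As a rough illustration of how the rules above could be applied, the sketch below checks an already-parsed document (a plain dict) against them. It is not the specification's lint engine. The source does not define all_units_in_SI, so this sketch substitutes a fixed set of the unit strings listed in the field schema; that substitution, the function name lint, and the sample documents are assumptions.

```python
import re

FAMILIES = {"classification", "regression", "ranking", "retrieval", "detection",
            "nlp", "asr", "generation", "multimodal", "calibration", "perf"}
AGGS = {"macro", "micro", "weighted", "mean", "quant", "max", "min", "sum"}
WINDOWS = {"N/A", "1m", "5m", "15m"}
# Stand-in for all_units_in_SI(value): accepts the unit strings from the schema.
UNITS = {"—", "%", "ms", "1/s", "dB", "W", "bytes"}
NAME_RE = re.compile(r"^[A-Za-z0-9_.@]+$")


def _in(allowed):
    """Build a predicate that accepts only strings found in the allowed set."""
    return lambda v: isinstance(v, str) and v in allowed


# (rule id, metric field, predicate), in the order the rules are listed.
METRIC_RULES = [
    ("METRIC.NAME_FORMAT", "name",
     lambda v: isinstance(v, str) and bool(NAME_RE.match(v))),
    ("METRIC.FAMILY_ALLOWED", "family", _in(FAMILIES)),
    ("METRIC.UNIT_SI_OR_DIMLESS", "unit", _in(UNITS)),
    ("METRIC.AGG_ALLOWED", "agg", _in(AGGS)),
    ("METRIC.WINDOW_FORMAT", "window", _in(WINDOWS)),
]


def lint(doc):
    """Return (rule_id, path) pairs for every rule the document violates."""
    findings = []
    for i, metric in enumerate(doc.get("metrics", [])):
        for rule_id, field, ok in METRIC_RULES:
            if field in metric and not ok(metric[field]):
                findings.append((rule_id, f"$.metrics[{i}].{field}"))
    metrology = doc.get("metrology")
    if metrology is not None and not (
        metrology.get("units") == "SI" and metrology.get("check_dim") is True
    ):
        findings.append(("METROLOGY.SI_AND_CHECKDIM", "$.metrology"))
    return findings


good = {
    "metrology": {"units": "SI", "check_dim": True},
    "metrics": [{"name": "latency_ms.p99", "family": "perf", "unit": "ms",
                 "agg": "quant", "window": "1m"}],
}
bad = {
    "metrology": {"units": "SI", "check_dim": False},
    "metrics": [{"name": "ROUGE-L", "family": "nlp", "unit": "—",
                 "agg": "median", "window": "N/A"}],
}
print(lint(good))
print(lint(bad))
```

Under the NAME_FORMAT pattern as written, names containing "-" or ":" (for example "ROUGE-L" or "mAP@0.50:0.95" from the family mapping) do not match, which the second sample document demonstrates.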


XI. Cross-Reference Anchors


XII. Chapter Compliance Self-Check


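Copyright and License (CC BY 4.0)

Copyright notice: unless otherwise stated, the copyright in "Energy Filament Theory" (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author (Mr. 屠广林).
License: this work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0); copying, reposting, excerpting, adaptation, and redistribution are permitted for commercial or non-commercial purposes, provided the author and source are credited.
Attribution format (suggested): Author: 屠广林; Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.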

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/