Technical White Paper 46 - EFT.WP.Data.Benchmarks v1.0

Chapter 11 Baselines and Upper Bounds


I. Purpose and Scope

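Establish the normative definition, construction, and reporting rules for baselines and upper bounds (upper bound / oracle) in benchmarks. This covers weak and strong baselines, random and heuristic baselines, oracle/upper-bound estimation, reproduction scripts and environment locking, linkage with scoring and significance testing, and governance and public-disclosure requirements. All of these must remain consistent with the task definitions, metric system, evaluation protocols, metrology, and cross-reference anchors.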

II. Terms and Dependencies

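  1. Terms: weak_baseline, strong_baseline, random_baseline, oracle/upper_bound, expected_scores, attestation (formal issued statement), repro_script, env.lock, anchors (normalization anchors).
  2. Dependencies: metrics and units (Chapter 6 of this volume), evaluation protocols (ModelCards v1.0, Chapter 11), runtime environment (Chapter 10 of this volume), scoring and ranking (Chapter 8), and unit and dimension verification (Core.Metrology v1.0: check_dim).
  3. Math and notation: inline symbols are always set in backticks; any expression containing division, integrals, or composite operators must be parenthesized. If the path quantity `T_arr` is involved, write it as
    • `T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )` or
    • `T_arr = ( ∫ ( n_eff / c_ref ) d ell )`, and declare the path `gamma(ell)` and the measure `d ell`. Formulas, symbols, and definitions must not contain Chinese text.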
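As a hedged numeric illustration of the two `T_arr` forms, the sketch below samples the path `gamma(ell)` at a few arc-length points and integrates with the trapezoidal rule. The sample values, the use of the vacuum speed of light as `c_ref`, and the helper names (`trapezoid`, `t_arr_form1`, `t_arr_form2`) are assumptions for illustration only. With a constant `c_ref` the two forms agree up to floating-point rounding.

```python
from typing import Sequence

# Assumed constant reference speed in m/s (vacuum speed of light, illustration only).
C_REF = 299_792_458.0


def trapezoid(ys: Sequence[float], xs: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of ys over the sample points xs."""
    if len(ys) != len(xs) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_form1(n_eff, ells, c_ref=C_REF):
    # T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    return (1.0 / c_ref) * trapezoid(n_eff, ells)


def t_arr_form2(n_eff, ells, c_ref=C_REF):
    # T_arr = ( ∫ ( n_eff / c_ref ) d ell )
    return trapezoid([n / c_ref for n in n_eff], ells)


if __name__ == "__main__":
    ells = [0.0, 1000.0, 2000.0, 3000.0]  # hypothetical samples along gamma(ell), in metres
    n_eff = [1.0, 1.2, 1.1, 1.0]          # hypothetical effective-index samples (dimensionless)
    print(t_arr_form1(n_eff, ells), t_arr_form2(n_eff, ells))
```

Here the integral of `n_eff` over the path is 3300 m, so both forms give 3300 / `c_ref` seconds.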

III. Fields and Structure (Normative)

```yaml
baselines:
  - id: "<baseline.id>"
    title: "<Human-readable name>"
    class: "weak|strong|random|oracle"
    evaluatee: "model|system|pipeline"
    impl: "I15-<id>|container@digest"
    params:
      # key model/system parameters (explicit)
      lr: 3.0e-4
      batch_size: 256
      seed: 1701
    data:
      dataset_ref: "datasets/<name>@vX.Y"
      splits:
        train: {frozen: true, index: "splits/train.index", sha256: "<hex>"}
        val: {frozen: true, index: "splits/val.index", sha256: "<hex>"}
        test: {frozen: true, index: "splits/test.index", sha256: "<hex>"}
      leakage_guard: ["per-object", "per-timewindow"]
    protocol_ref: "protocols/<task>@vX.Y"  # consistent with the Chapter 7 protocol
    env:
      containers: ["ghcr.io/eift/runner@sha256:<hex>"]
      deps_lock: "env.lock"
    expected_scores:
      metrics: {F1_macro: 0.75, ECE: 0.06, "latency_ms.p99": 180}
      target_ci: {method: "bootstrap", level: 0.95}
    artifacts:
      repro_script: "scripts/repro_<baseline>.sh"
      logs: ["logs/<run>.jsonl"]
      model_files: ["weights/<file>.bin?"]
    attestation:
      author: "<name or org>"
      date: "<YYYY-MM-DD>"
      statement: "follows frozen splits; no external data/tools unless declared"
    see:
      - "EFT.WP.Core.Metrology v1.0:check_dim"
      - "EFT.WP.Data.ModelCards v1.0:Ch.11"
      - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
```
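The `frozen` and `sha256` fields under `data.splits` imply that each split index can be checked against its declared digest. A minimal sketch, assuming index paths resolve relative to a repository root and that the digest is the lowercase hex SHA-256 of the raw file bytes (neither is stated in the schema); `sha256_of`, `check_frozen_splits`, and `demo` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return its lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_frozen_splits(splits, root="."):
    """Check a baselines[*].data.splits mapping; return problems (empty list = pass)."""
    problems = []
    for name, spec in splits.items():
        if spec.get("frozen") is not True:
            problems.append(f"{name}: not frozen")
            continue
        index_path = Path(root) / spec["index"]
        if not index_path.is_file():
            problems.append(f"{name}: missing index file {index_path}")
            continue
        actual = sha256_of(index_path)
        if actual != str(spec.get("sha256", "")).lower():
            problems.append(f"{name}: sha256 mismatch (got {actual})")
    return problems


def demo():
    """Build a throwaway split index, then check it with a correct and a tampered digest."""
    with tempfile.TemporaryDirectory() as root:
        index = Path(root) / "splits" / "train.index"
        index.parent.mkdir(parents=True)
        index.write_bytes(b"item_0001\nitem_0002\n")
        ok = {"train": {"frozen": True, "index": "splits/train.index",
                        "sha256": sha256_of(index)}}
        tampered = {"train": {**ok["train"], "sha256": "0" * 64}}
        return check_frozen_splits(ok, root), check_frozen_splits(tampered, root)


if __name__ == "__main__":
    print(demo())
```

Streaming in chunks keeps memory flat for large index files; a mismatch is reported rather than raised so that all splits are checked in one pass.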


IV. Definitions and Reporting Conventions for Baselines and Upper Bounds


V. Reproduction Experiments and Environment Locking
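The schema in Section III pins containers by digest (`ghcr.io/eift/runner@sha256:<hex>`) and names a dependency lock (`deps_lock: "env.lock"`). A minimal sketch of a stricter environment-lock check, assuming a full 64-character lowercase hex SHA-256 digest is required (the lint rule `BASE.ENV_LOCKED` only asks for at least one container); `is_digest_pinned` and `env_is_locked` are hypothetical names. The elided digest `abcdef...` in Section IX would not pass this check as written.

```python
import re

# Assumed pinning format: <repository>@sha256:<64 lowercase hex characters>.
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_digest_pinned(ref):
    """True if a container reference is pinned to a full SHA-256 digest."""
    return DIGEST_REF.fullmatch(ref) is not None


def env_is_locked(env):
    """Stricter than BASE.ENV_LOCKED: every container pinned and a deps lock named."""
    containers = env.get("containers", [])
    return (
        len(containers) >= 1
        and all(is_digest_pinned(c) for c in containers)
        and bool(env.get("deps_lock"))
    )


if __name__ == "__main__":
    pinned = "ghcr.io/eift/runner@sha256:" + "a" * 64  # hypothetical digest
    print(env_is_locked({"containers": [pinned], "deps_lock": "env.lock"}))
```

Mutable tags such as `:latest` are rejected because they can resolve to different images over time, which would break reproduction.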


VI. Linkage with Scoring, Normalization, and Ranking
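The term `anchors` (normalization anchors) in Section II suggests that baseline and upper-bound scores act as reference points for normalization. One common choice, assumed here rather than taken from the normative scoring rules of Chapter 8, is a linear map that sends the baseline anchor to 0 and the upper-bound anchor to 1, mirroring lower-is-better metrics such as `ECE`. The function name `normalize` and the upper-bound value below are hypothetical.

```python
def normalize(score, baseline, upper, higher_is_better=True, clip=True):
    """Linearly map `score` so the baseline anchor -> 0.0 and the upper-bound anchor -> 1.0."""
    if not higher_is_better:
        # Mirror lower-is-better metrics (e.g. ECE) so improvement still increases the result.
        score, baseline, upper = -score, -baseline, -upper
    span = upper - baseline
    if span <= 0:
        raise ValueError("upper-bound anchor must be strictly better than the baseline anchor")
    value = (score - baseline) / span
    # Optionally clamp to [0, 1] so scores outside the anchors do not dominate a ranking.
    return min(max(value, 0.0), 1.0) if clip else value


if __name__ == "__main__":
    # Hypothetical anchors: weak-baseline F1_macro 0.75, assumed upper bound 0.95.
    print(normalize(0.85, baseline=0.75, upper=0.95))
```

With the weak-baseline `F1_macro` of 0.75 from Section III and a hypothetical upper bound of 0.95, a score of 0.85 maps to approximately 0.5.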


VII. Constraints on Random and Oracle Baselines


VIII. Metrology and Units (SI)
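In the spirit of `check_dim` from Core.Metrology v1.0 (whose actual interface is not given in this chapter), a dimension can be represented as a map from SI base dimensions to integer exponents. The hypothetical sketch below confirms that `T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )` has the dimension of time, and converts the `latency_ms.p99` expected score to seconds. All names here are illustrative assumptions.

```python
# Dimensions as {base_dimension: exponent}; only the SI bases needed here.
DIMENSIONLESS = {}
METRE = {"m": 1}
SECOND = {"s": 1}
METRE_PER_SECOND = {"m": 1, "s": -1}


def mul(*dims):
    """Product of quantities: exponents of each base dimension add."""
    out = {}
    for d in dims:
        for base, exp in d.items():
            out[base] = out.get(base, 0) + exp
    return {base: exp for base, exp in out.items() if exp != 0}


def inv(dim):
    """Reciprocal of a quantity: every exponent changes sign."""
    return {base: -exp for base, exp in dim.items()}


def check_dim(actual, expected):
    """True if two dimension maps agree after dropping zero exponents."""
    return mul(actual) == mul(expected)


def ms_to_s(value_ms):
    """Convert milliseconds to the SI unit second."""
    return value_ms / 1000.0


# T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ): the integral has dim(n_eff) * dim(d ell).
integral_dim = mul(DIMENSIONLESS, METRE)
t_arr_dim = mul(inv(METRE_PER_SECOND), integral_dim)

if __name__ == "__main__":
    print(check_dim(t_arr_dim, SECOND), ms_to_s(180))
```

Dropping zero exponents in `mul` is what lets `{m: 0, s: 1}` compare equal to plain seconds.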


IX. Machine-Readable Snippet (Directly Embeddable)

```yaml
baselines:
  - id: "baseline.logreg"
    title: "Logistic Regression (BoW)"
    class: "weak"
    evaluatee: "model"
    impl: "I15-1.logreg"
    params: {lr: 3.0e-4, batch_size: 256, seed: 1701}
    data:
      dataset_ref: "datasets/core_cls@v1.0"
      splits:
        train: {frozen: true, index: "splits/train.index", sha256: "..."}
        val: {frozen: true, index: "splits/val.index", sha256: "..."}
        test: {frozen: true, index: "splits/test.index", sha256: "..."}
      leakage_guard: ["per-object"]
    protocol_ref: "protocols/cls_offline@v1.0"
    env: {containers: ["ghcr.io/eift/runner@sha256:abcdef..."], deps_lock: "env.lock"}
    expected_scores:
      metrics: {Acc: 0.84, F1_macro: 0.75, ECE: 0.06}
      target_ci: {method: "bootstrap", level: 0.95}
    artifacts:
      repro_script: "scripts/repro_logreg.sh"
      logs: ["logs/logreg_run1.jsonl"]
    attestation:
      author: "EIFT Core"
      date: "2025-09-21"
      statement: "frozen splits; no external tools/data"
```
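The setting `target_ci: {method: "bootstrap", level: 0.95}` can be read as a percentile bootstrap interval around the reported score. The sketch below assumes per-item correctness values (so the mean is `Acc`), 2000 resamples, and reuses the example seed 1701; these are illustrative choices, and `bootstrap_ci` / `expected_score_within_ci` are hypothetical helpers. For `F1_macro`, which is not a per-item mean, pass a `metric` callable that recomputes macro-F1 on each resample of (label, prediction) pairs.

```python
import random
from statistics import mean


def bootstrap_ci(items, metric=mean, level=0.95, n_resamples=2000, seed=1701):
    """Percentile bootstrap CI for `metric` evaluated on resamples of `items`."""
    if not items:
        raise ValueError("items must be non-empty")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie in (0, 1)")
    rng = random.Random(seed)  # fixed seed so the reported interval is reproducible
    n = len(items)
    # Resample with replacement, evaluate the metric, and sort the replicates.
    stats = sorted(metric(rng.choices(items, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lo = stats[int(alpha * (n_resamples - 1))]
    hi = stats[int((1.0 - alpha) * (n_resamples - 1))]
    return lo, hi


def expected_score_within_ci(items, expected, **kwargs):
    """Return (inside, (lo, hi)): whether the declared score lies in the bootstrap CI."""
    lo, hi = bootstrap_ci(items, **kwargs)
    return lo <= expected <= hi, (lo, hi)


if __name__ == "__main__":
    # Hypothetical per-item correctness for 100 test items (84 correct -> Acc = 0.84).
    correct = [1] * 84 + [0] * 16
    print(expected_score_within_ci(correct, expected=0.84))
```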


X. Lint Rules (Excerpt, Normative)

```yaml
lint_rules:
  - id: BASE.ID_FORMAT
    when: "$.baselines[*].id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: BASE.CLASS_ALLOWED
    when: "$.baselines[*].class"
    assert: "value in ['weak','strong','random','oracle']"
    level: error
  - id: BASE.SPLITS_FROZEN
    when: "$.baselines[*].data.splits"
    assert: "splits.train.frozen and splits.val.frozen and splits.test.frozen"
    level: error
  - id: BASE.PROTOCOL_REF
    when: "$.baselines[*].protocol_ref"
    assert: "value != null"
    level: error
  - id: BASE.ENV_LOCKED
    when: "$.baselines[*].env"
    assert: "len($.baselines[*].env.containers) >= 1 and has_key($.baselines[*].env.deps_lock)"
    level: error
  - id: BASE.EXPECTED_SCORES_CI
    when: "$.baselines[*].expected_scores.target_ci"
    assert: "has_keys(method, level)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
```
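To make the intent of these rules concrete, here is a plain-Python evaluation of the same checks over a parsed `baselines` document (for example, the output of a YAML loader). It is a sketch, not the reference linter: the semantics of the `when`/`assert` DSL are not specified here, so this version assumes a missing key counts as a violation, and the function name `lint` is hypothetical.

```python
import re

ID_PATTERN = re.compile(r"[a-z0-9_.\-]+")
ALLOWED_CLASSES = {"weak", "strong", "random", "oracle"}
FROZEN_SPLITS = ("train", "val", "test")


def lint(doc):
    """Return (rule_id, baseline_id, message) tuples; an empty list means no errors."""
    findings = []
    for b in doc.get("baselines", []):
        bid = b.get("id")
        if not isinstance(bid, str) or not ID_PATTERN.fullmatch(bid):
            findings.append(("BASE.ID_FORMAT", bid, "id must match ^[a-z0-9_.\\-]+$"))
        if b.get("class") not in ALLOWED_CLASSES:
            findings.append(("BASE.CLASS_ALLOWED", bid, f"class {b.get('class')!r} not allowed"))
        splits = b.get("data", {}).get("splits", {})
        if not all(splits.get(s, {}).get("frozen") is True for s in FROZEN_SPLITS):
            findings.append(("BASE.SPLITS_FROZEN", bid, "train/val/test must all be frozen"))
        # Assumption: an absent protocol_ref is treated as a violation.
        if b.get("protocol_ref") is None:
            findings.append(("BASE.PROTOCOL_REF", bid, "protocol_ref is required"))
        env = b.get("env", {})
        if len(env.get("containers", [])) < 1 or "deps_lock" not in env:
            findings.append(("BASE.ENV_LOCKED", bid, "need >= 1 container and deps_lock"))
        target_ci = b.get("expected_scores", {}).get("target_ci", {})
        if not {"method", "level"} <= set(target_ci):
            findings.append(("BASE.EXPECTED_SCORES_CI", bid, "target_ci needs method and level"))
    metrology = doc.get("metrology")
    if metrology is not None and not (
        metrology.get("units") == "SI" and metrology.get("check_dim") is True
    ):
        findings.append(("METROLOGY.SI_AND_CHECKDIM", None, "units must be SI and check_dim true"))
    return findings
```

Collecting findings instead of stopping at the first error mirrors a linter that reports every `level: error` violation in one run.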


XI. Cross-Reference Anchors


XII. Chapter Compliance Self-Check


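Copyright and License (CC BY 4.0)

Copyright notice: Unless otherwise stated, the copyright of Energy Filament Theory (including its text, charts, illustrations, symbols, and formulas) belongs to the author, Mr. 屠广林 (Tu Guanglin).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Attribution format (suggested): Author: 屠广林; Work: Energy Filament Theory; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/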