Contents / Documents / Technical White Paper 46: EFT.WP.Data.Benchmarks v1.0

Chapter 17 Submission, Reproduction, and Leaderboard Governance


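I. Purpose and Scope

This chapter fixes the rules for submission, reproducibility, and leaderboard governance: submission structure and validation flow, environment locking and replay-based reproduction, the linkage between significance and gating thresholds, stability lines and version governance, appeal and withdrawal mechanisms, and audit and public disclosure. It ensures consistency with the task definitions, the evaluation protocol, scoring and significance, privacy compliance, and the metrology and citation anchors.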

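II. Terms and Dependencies

  1. Terms: submission.payload, attestation (declaration), run_id, env.lock, container@digest, stability_line, gating, tombstone (retraction marker), appeal, cooldown (cooling-off period).
  2. Dependencies: evaluation protocol (ModelCards v1.0, Chapter 11), scoring/normalization/gating (Chapter 8 of this volume), significance and uncertainty (Chapter 9 of this volume), runtime environment and measured workload (Chapter 10 of this volume), privacy and compliance (Chapter 14 of this volume), units and dimensions (Core.Metrology v1.0:check_dim).
  3. Math and notation: inline symbols are always written in backticks; any expression containing a division, an integral, or a compound operator must be parenthesized; where the path quantity T_arr is involved, use either
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and declare gamma(ell) and d ell; Chinese characters are not permitted in formulas, symbols, or definitions.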
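The two permitted forms of T_arr can be compared numerically. The sketch below is illustrative only: it assumes c_ref is a constant (so it may be moved across the integral sign), takes gamma(ell) to be a straight segment parameterised by arc length, and uses a hypothetical n_eff profile and a hypothetical value for c_ref; none of these choices come from this chapter.

```python
import math

def trapezoid(ys, xs):
    """Composite trapezoid rule for samples ys taken at points xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Assumption: c_ref is a constant reference speed in m/s (value is illustrative).
C_REF = 299_792_458.0

def n_eff(ell):
    # Hypothetical dimensionless effective index along the path.
    return 1.0 + 1.0e-3 * math.sin(ell / 100.0)

# Path gamma(ell): straight segment, ell is arc length in metres, 0 m to 1000 m.
ells = [float(i) for i in range(1001)]
n_vals = [n_eff(ell) for ell in ells]

# Form 1: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
t_arr_outside = (1.0 / C_REF) * trapezoid(n_vals, ells)

# Form 2: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
t_arr_inside = trapezoid([n / C_REF for n in n_vals], ells)

print(t_arr_outside, t_arr_inside)  # both in seconds
```

With a constant c_ref the two forms agree up to floating-point rounding; if c_ref varied along the path, only the second form would remain meaningful.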

III. Fields and Structure (Normative)

submission:
  submitter:
    org: "<org-name>"
    contact: "<email>"
  payload:
    artifacts:
      - {path: "reports/summary.json", sha256: "<hex>"}
      - {path: "reports/scores.json", sha256: "<hex>"}
      - {path: "logs/run.jsonl", sha256: "<hex>"}
      - {path: "env.lock", sha256: "<hex>"}
      - {path: "protocol.yaml", sha256: "<hex>"}
      - {path: "metrics.yaml", sha256: "<hex>"}
    checksum: "sha256"
  provenance:
    run_id: "<RUN-UUID>"
    suite_version: "vX.Y"
    task_id: "<suite.task>"
    splits_ref: {train: "splits/train.index", val: "splits/val.index", test: "splits/test.index"}
  env:
    containers: ["ghcr.io/eift/runner@sha256:<hex>"]
    deps_lock: "env.lock"
  scores:
    score_raw: {F1_macro: 0.81, ECE: 0.045}
    score_norm: {suite_z: 1.23}
    ci: {F1_macro: {lo: 0.80, hi: 0.82}, ECE: {lo: 0.042, hi: 0.048}}
  significance:
    vs_baseline: {baseline_id: "baseline.rf", method: "bootstrap", B: 10000, alpha: 0.05, p: 0.012, correction: "Holm-Bonferroni"}
  attestation:
    author: "<name>"
    date: "<YYYY-MM-DD>"
    statement: "frozen splits used; no external data/tools unless declared; units in SI; check_dim=true"
see:
  - "EFT.WP.Core.Metrology v1.0:check_dim"
  - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
  - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
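Each entry in payload.artifacts pairs a path with a sha256 digest, so a receiver can recompute the digests before accepting a submission. The following is a minimal sketch of such a check, not the suite's actual validator: the function names verify_artifacts and sha256_of, and the idea that artifact paths are relative to a single submission root directory, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(artifacts, root):
    """Check every {path, sha256} entry against the files under root.

    Returns a list of (path, problem) pairs; an empty list means that
    every listed file exists and matches its declared digest.
    Assumption: paths are relative to the submission root directory.
    """
    problems = []
    for item in artifacts:
        target = Path(root) / item["path"]
        if not target.is_file():
            problems.append((item["path"], "missing"))
        elif sha256_of(target) != item["sha256"].lower():
            problems.append((item["path"], "sha256 mismatch"))
    return problems
```

A caller would load the submission document, pass submission.payload.artifacts together with the unpacked submission directory, and reject the submission if the returned list is non-empty.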


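IV. Submission Flow and Validation


V. Reproduction and Replay


VI. Leaderboard Governance


VII. Withdrawal, Correction, and Appeal


VIII. Audit and Public Disclosure


IX. Metrology and Units (SI)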


X. Machine-Readable Snippet (Directly Embeddable)

submission:
  submitter: {org: "eift", contact: "bench@eift.org"}
  payload:
    artifacts:
      - {path: "reports/scores.json", sha256: "..."}
      - {path: "reports/summary.json", sha256: "..."}
      - {path: "env.lock", sha256: "..."}
      - {path: "protocol.yaml", sha256: "..."}
      - {path: "metrics.yaml", sha256: "..."}
  provenance:
    run_id: "RUN-2025-09-21-001"
    suite_version: "v1.0"
    task_id: "cls.binary"
    splits_ref: {train: "splits/train.index", val: "splits/val.index", test: "splits/test.index"}
  env: {containers: ["ghcr.io/eift/runner@sha256:abcdef..."], deps_lock: "env.lock"}
  scores:
    score_raw: {F1_macro: 0.81, ECE: 0.045}
    score_norm: {suite_z: 1.23}
    ci: {F1_macro: {lo: 0.80, hi: 0.82}}
  significance: {vs_baseline: {baseline_id: "baseline.rf", method: "bootstrap", B: 10000, alpha: 0.05, p: 0.012}}
  attestation: {author: "teamA", date: "2025-09-21", statement: "frozen splits; SI units; check_dim=true"}
governance:
  stability_line: "v1.*"
  cooldown: "P1D"
  gating: {require_ci: true, min_runs: 3}
metrology: {units: "SI", check_dim: true}
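The governance block carries a cooldown duration and two gating switches. The sketch below shows one way a leaderboard service might read them. The semantics are assumptions, since this chapter does not spell them out: cooldown is taken to be the minimum interval between successive submissions from the same submitter, min_runs the minimum number of independent runs behind a submission, and require_ci a demand that scores.ci be non-empty. The names parse_cooldown, may_submit, n_runs, and last_submitted_at are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_DAYS_ONLY = re.compile(r"^P(\d+)D$")

def parse_cooldown(text):
    """Parse the day-only ISO 8601 duration form, e.g. 'P1D'."""
    match = _DAYS_ONLY.match(text)
    if match is None:
        raise ValueError(f"unsupported cooldown duration: {text!r}")
    return timedelta(days=int(match.group(1)))

def may_submit(governance, scores, n_runs, last_submitted_at, now):
    """Return (ok, reasons) under the assumed gating and cooldown semantics."""
    reasons = []
    gating = governance.get("gating", {})
    if gating.get("require_ci") and not scores.get("ci"):
        reasons.append("confidence intervals required")
    if n_runs < gating.get("min_runs", 1):
        reasons.append("too few runs")
    if last_submitted_at is not None:
        if now - last_submitted_at < parse_cooldown(governance["cooldown"]):
            reasons.append("cooldown not elapsed")
    return (not reasons, reasons)

# Usage with the values from the snippet above.
governance = {"stability_line": "v1.*", "cooldown": "P1D",
              "gating": {"require_ci": True, "min_runs": 3}}
scores = {"ci": {"F1_macro": {"lo": 0.80, "hi": 0.82}}}
now = datetime(2025, 9, 22, 12, 0, tzinfo=timezone.utc)
print(may_submit(governance, scores, 3, None, now))
```

Only the day-only duration form is handled here; a production parser would need the full ISO 8601 duration grammar.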


XI. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SUBM.ARTIFACTS_REQUIRED
    when: "$.submission.payload.artifacts"
    assert: "contains_files(['reports/scores.json','env.lock','protocol.yaml','metrics.yaml'])"
    level: error
  - id: SUBM.HASH_REQUIRED
    when: "$.submission.payload.artifacts[*].sha256"
    assert: "len(value) > 0"
    level: error
  - id: SUBM.SPLITS_MATCH_FROZEN
    when: "$.submission.provenance.splits_ref"
    assert: "files_exist(value) and all_frozen(value)"
    level: error
  - id: SUBM.CI_PRESENT
    when: "$.submission.scores.ci"
    assert: "has_any_ci(value)"
    level: warn
  - id: GOV.COOLDOWN_FORMAT
    when: "$.governance.cooldown"
    assert: "matches('^P\\d+[D]$') or duration_valid(value)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
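The rules above follow a "when a path resolves, assert a predicate" pattern. The sketch below hard-codes five of them against an already parsed document (a plain dict) to illustrate that pattern. It is not the suite's lint engine, and it simplifies in ways that should be read as assumptions: paths are resolved with a hypothetical dotted lookup instead of a JSONPath evaluator, a missing sha256 key is treated as a SUBM.HASH_REQUIRED failure, only the regex branch of GOV.COOLDOWN_FORMAT is checked, and SUBM.SPLITS_MATCH_FROZEN is omitted because it needs access to the frozen split registry.

```python
import re

_MISSING = object()

def lookup(doc, dotted):
    """Resolve a dotted path such as 'submission.payload.artifacts'."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node

REQUIRED_FILES = ("reports/scores.json", "env.lock", "protocol.yaml", "metrics.yaml")

def _artifacts_required(artifacts):
    present = {a.get("path") for a in artifacts}
    return all(name in present for name in REQUIRED_FILES)

def _hashes_present(artifacts):
    return all(len(a.get("sha256", "")) > 0 for a in artifacts)

# (rule id, dotted path, predicate, level)
RULES = [
    ("SUBM.ARTIFACTS_REQUIRED", "submission.payload.artifacts", _artifacts_required, "error"),
    ("SUBM.HASH_REQUIRED", "submission.payload.artifacts", _hashes_present, "error"),
    ("SUBM.CI_PRESENT", "submission.scores.ci", lambda ci: bool(ci), "warn"),
    ("GOV.COOLDOWN_FORMAT", "governance.cooldown",
     lambda v: re.fullmatch(r"P\d+D", str(v)) is not None, "error"),
    ("METROLOGY.SI_AND_CHECKDIM", "metrology",
     lambda m: m.get("units") == "SI" and m.get("check_dim") is True, "error"),
]

def run_lint(doc):
    """Return (rule_id, level) for each rule whose path resolves and whose predicate fails."""
    failures = []
    for rule_id, path, check, level in RULES:
        value = lookup(doc, path)
        if value is not _MISSING and not check(value):
            failures.append((rule_id, level))
    return failures
```

A document whose artifact list omits metrics.yaml would be reported as ("SUBM.ARTIFACTS_REQUIRED", "error"), while a rule whose target path is absent is skipped, mirroring the role of the when field.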


XII. Cross-Reference Anchors


XIII. Chapter Compliance Self-Check


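Copyright and License (CC BY 4.0)

Copyright notice: unless otherwise stated, the copyright in Energy Filament Theory (including text, charts, illustrations, symbols, and formulas) belongs to the author (Mr. "屠广林").
License: this work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0); with attribution to the author and source, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Suggested attribution format: Author: "屠广林"; Work: Energy Filament Theory; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/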