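I. Chapter Purpose and Scope
This chapter provides the evaluation API and its normative implementation bindings: interface prototypes, request/response envelopes, error codes, authentication and idempotency, rate limiting, and version negotiation. It covers benchmark loading, task execution, score normalization, significance and uncertainty computation, and leaderboard publication and revocation, and it is aligned with the data contract, the metrology conventions, cross-volume reference anchors, and the export manifest.
II. Service Surface (Normative)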
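services:
  benchmarks.v1:
    # Suites and tasks
    - POST /api/v1/benchmarks/load_suite        # load/validate a suite (blocking)
    - POST /api/v1/benchmarks/list_tasks        # list tasks/subtasks
    - POST /api/v1/benchmarks/get_task          # fetch a task specification
    # Evaluation execution
    - POST /api/v1/benchmarks/evaluate          # run an evaluation (offline/online/streaming/interactive)
    - POST /api/v1/benchmarks/score             # scoring and aggregation (including normalization)
    - POST /api/v1/benchmarks/significance      # significance and multiple-comparison correction
    - POST /api/v1/benchmarks/uncertainty       # uncertainty combination and coverage intervals
    - POST /api/v1/benchmarks/robustness        # robustness/shift/adversarial evaluation
    - POST /api/v1/benchmarks/fairness_ethics   # fairness/ethics/safety stress evaluation
    # Runtime and environment
    - POST /api/v1/benchmarks/runtime/metrics   # runtime metrics (QPS/p99/ρ/energy)
    - POST /api/v1/benchmarks/runtime/lineage   # generate/query the lineage graph
    - POST /api/v1/benchmarks/runtime/replay    # replay from inputs_lock for reproduction
    # Submission and publication
    - POST /api/v1/benchmarks/submit            # submit evaluation results
    - POST /api/v1/benchmarks/publish           # publish to the leaderboard
    - POST /api/v1/benchmarks/revoke            # revoke or correct
    - POST /api/v1/benchmarks/hash_artifact     # artifact hashing
    - POST /api/v1/benchmarks/sign_artifact     # artifact signing/verification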
III. Common Request/Response Envelopes and Authentication
request_envelope:
  headers:
    Authorization: "Bearer <oidc-token> | HMAC <key>:<sig>"
    x-eift-idempotency: "<uuid>"   # idempotency key (retained ≥ 24 h)
    content-type: "application/json"
  body:
    suite?: { ... }                # benchmark.yaml/json
    task_id?: "<suite.task>"
    spec?: { ... }                 # evaluation config (protocol/env/metrics)
    payload?: {artifacts: [{path, bytes_b64?, sha256?}]}
    options?: {dry_run?: true, strict?: true}
    filters?: {run_id?: "<id>", since?: "<ISO8601>", until?: "<ISO8601>"}
response_envelope:
  status: "ok" | "warn" | "error"
  errors: [{code, message, path?, see?}]
  warnings: [{code, message, path?, see?}]
  metrics: { ... }                 # evaluation/metrology/cost statistics
  data?: { ... }                   # structured results (scores/ci/graphs)
  version: "benchmarks.v1"
security:
  auth: "OIDC bearer | HMAC"
  tls: "TLS1.2+"
  scope: ["load","evaluate","metrics","lineage","submit","publish","admin"]
rate_limits:
  per_key_per_minute: 120
  burst: 60
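The rate_limits block gives two numbers but does not say how they interact. One common reading is a token bucket whose capacity equals burst and whose refill rate equals per_key_per_minute / 60. The sketch below is a client-side throttle written under that assumption; the class name, method name, and injectable clock are hypothetical and not part of the specification.

```python
import time


class TokenBucket:
    """Client-side throttle: sustained `rate_per_minute`, capacity `burst` (assumed semantics)."""

    def __init__(self, rate_per_minute=120, burst=60, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens refilled per second
        self.capacity = float(burst)
        self.tokens = float(burst)          # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Consume one token and return True if a request may be sent now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check try_acquire() before each request and back off briefly when it returns False; the server-side limiter remains authoritative.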
IV. Normative OpenAPI Excerpt
openapi: 3.0.3
info: {title: "EFT Benchmarks API", version: "v1"}
paths:
  /api/v1/benchmarks/load_suite:
    post:
      summary: Validate and load a benchmark suite
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/SuiteEnvelope"}}}}
      responses:
        "200": {description: "Result", content: {"application/json": {schema: {$ref: "#/components/schemas/Result"}}}}
  /api/v1/benchmarks/evaluate:
    post:
      summary: Execute evaluation (offline/online/stream/interactive)
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/EvalRequest"}}}}
      responses:
        "200": {description: "Run accepted", content: {"application/json": {schema: {$ref: "#/components/schemas/EvalResult"}}}}
components:
  schemas:
    SuiteEnvelope: {type: object, properties: {suite: {}, options: {type: object}}}
    EvalRequest:
      type: object
      properties:
        task_id: {type: string}
        spec: {type: object}   # protocol/env/metrics
        options: {type: object, properties: {mode: {type: string, enum: ["sync", "async"]}}}
    EvalResult:
      type: object
      properties:
        run_id: {type: string}
        state: {type: string, enum: ["queued", "running", "succeeded", "failed"]}
        scores: {type: object}
        ci: {type: object}
        artifacts: {type: array, items: {type: object}}
    Result:
      type: object
      properties:
        status: {type: string, enum: [ok, warn, error]}
        errors: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        warnings: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        metrics: {type: object}
        data: {type: object}
    Issue:
      type: object
      properties:
        code: {type: string}
        message: {type: string}
        path: {type: string}
        see: {type: array, items: {type: string}}
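To make the Result and Issue shapes concrete, here is a minimal hand-written shape check. It mirrors only the property types and the status enum listed in the excerpt, which declares no required fields, so absent keys are accepted. The function names and the wording of the problem strings are hypothetical; a production client would more likely run a full JSON Schema validator against the published OpenAPI document.

```python
ALLOWED_STATUS = ("ok", "warn", "error")  # enum from the Result schema


def _check_issue(issue, where):
    """Type-check one Issue object and return a list of problem strings."""
    if not isinstance(issue, dict):
        return [f"{where}: expected object"]
    problems = []
    for key in ("code", "message", "path"):
        if key in issue and not isinstance(issue[key], str):
            problems.append(f"{where}.{key}: expected string")
    see = issue.get("see", [])
    if not (isinstance(see, list) and all(isinstance(s, str) for s in see)):
        problems.append(f"{where}.see: expected array of strings")
    return problems


def validate_result(result):
    """Return schema problems for a Result payload; an empty list means the shape is valid."""
    if not isinstance(result, dict):
        return ["$: expected object"]
    problems = []
    if "status" in result and result["status"] not in ALLOWED_STATUS:
        problems.append("$.status: not in enum")
    for key in ("errors", "warnings"):
        items = result.get(key, [])
        if not isinstance(items, list):
            problems.append(f"$.{key}: expected array")
            continue
        for i, issue in enumerate(items):
            problems.extend(_check_issue(issue, f"$.{key}[{i}]"))
    for key in ("metrics", "data"):
        if key in result and not isinstance(result[key], dict):
            problems.append(f"$.{key}: expected object")
    return problems
```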
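V. Endpoint Semantics (Key Points)
- /benchmarks/load_suite (blocking): structure/type/regex checks; cross-volume anchors; metrology.units="SI" and check_dim=true; frozen splits and leakage guards; minimum checks for scoring, significance, and compliance.
- /benchmarks/evaluate: executes per the Chapter 7 protocol (offline/online/streaming/interactive) and returns run_id plus intermediate and final artifacts; online mode supports shadow/canary runs with guardrails.
- /benchmarks/score: aggregates, normalizes, and tiers scores per Chapter 8; metrics are combined only after their directions are unified; outputs score_raw/score_norm and tie_break details.
- /benchmarks/significance: significance, power, and multiple-comparison correction; outputs Δ/CI_95/p; linked to the publication thresholds.
- /benchmarks/uncertainty: combines uncertainty per Chapter 9 (GUM|linear|montecarlo|bayes) and reports coverage intervals, with dimensions unified.
- /benchmarks/robustness, /fairness_ethics: execute the Chapter 12/13 items and issue block/warn verdicts against the thresholds.
- /benchmarks/runtime/*: query runtime performance/energy, lineage, and replay-consistency reports.
- /benchmarks/submit|publish|revoke: submit, publish, or revoke leaderboard entries; follow the stability line and the announcement policy; a revocation generates a tombstone and synchronizes mirrors/indexes.
- /benchmarks/hash_artifact|sign_artifact: sha256 hashing and signing/verification; reconciled against export_manifest.artifacts[].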
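For the uncertainty endpoint, the simplest of the listed models can be sketched as follows. Chapter 9 is not reproduced here, so this shows only the textbook GUM linear case for uncorrelated components: the combined standard uncertainty is the root sum of squares of c_i * u_i, and the expanded uncertainty is U = k * u_c. The dictionary keys u and c, the function names, and the default coverage factor k = 2 (roughly 95 % coverage under a normal assumption) are illustrative assumptions, not the chapter's schema.

```python
import math


def combine_linear(components, k=2.0):
    """GUM linear combination for uncorrelated components.

    Each component is a dict with a standard uncertainty "u" and an optional
    sensitivity coefficient "c" (default 1.0). Both key names are assumptions.
    """
    u_c = math.sqrt(sum((comp.get("c", 1.0) * comp["u"]) ** 2 for comp in components))
    return {"u_c": u_c, "k": k, "U": k * u_c}


def coverage_interval(estimate, components, k=2.0):
    """Symmetric coverage interval estimate ± U."""
    expanded = combine_linear(components, k)["U"]
    return (estimate - expanded, estimate + expanded)
```

All components must be expressed in the same unit before they are combined, which is what the "dimensions unified" requirement above guards.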
VI. Error Codes (Normative)
errors:
  - {code: "ESCHEMA001", message: "suite schema violation", path: "$.suite"}
  - {code: "EREF001", message: "invalid reference format", path: "$.export_manifest.references[*]"}
  - {code: "EDIM001", message: "units must be SI and check_dim", path: "$.metrology"}
  - {code: "ESPLIT001", message: "splits must be frozen and frozen indices enabled", path: "$.tasks[*].splits"}
  - {code: "ELEAK000", message: "cross-split leakage detected", path: "$.tasks[*].leakage_guard"}
  - {code: "EPROTO001", message: "protocol mode invalid", path: "$.tasks[*].protocol.mode"}
  - {code: "EMETRIC001", message: "metric missing family/unit/higher_is_better", path: "$.tasks[*].metrics[*]"}
  - {code: "ESIG001", message: "significance params incomplete", path: "$.tasks[*].significance"}
  - {code: "EPUB001", message: "publish gate not met", path: "$.scoring.stability"}
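As an illustration of how a blocking check could turn these codes into a response envelope, the sketch below covers three of them (EDIM001, EPROTO001, EMETRIC001). The suite layout is inferred from the path values in the table; the function name and the exact spelling of the allowed protocol modes are assumptions.

```python
PROTOCOL_MODES = {"offline", "online", "stream", "interactive"}  # spelling assumed
METRIC_KEYS = ("family", "unit", "higher_is_better")


def check_suite(suite):
    """Run a few blocking checks and return a response envelope."""
    errors = []
    metrology = suite.get("metrology", {})
    if metrology.get("units") != "SI" or metrology.get("check_dim") is not True:
        errors.append({"code": "EDIM001", "message": "units must be SI and check_dim",
                       "path": "$.metrology"})
    for i, task in enumerate(suite.get("tasks", [])):
        if task.get("protocol", {}).get("mode") not in PROTOCOL_MODES:
            errors.append({"code": "EPROTO001", "message": "protocol mode invalid",
                           "path": f"$.tasks[{i}].protocol.mode"})
        for j, metric in enumerate(task.get("metrics", [])):
            if not all(key in metric for key in METRIC_KEYS):
                errors.append({"code": "EMETRIC001",
                               "message": "metric missing family/unit/higher_is_better",
                               "path": f"$.tasks[{i}].metrics[{j}]"})
    return {"status": "error" if errors else "ok", "errors": errors,
            "warnings": [], "version": "benchmarks.v1"}
```

Reporting a concrete index in path (for example $.tasks[0].metrics[0]) rather than the wildcard form is a design choice of this sketch; the table only fixes the wildcard pattern.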
VII. Idempotency, Version Negotiation, and Compatibility
idempotency:
  header: "x-eift-idempotency"
  window_hours: 24
versioning:
  api: "benchmarks.v1"   # breaking change → bump MAJOR
  minor: "backward-compatible additions"
compatibility:
  request_backward: "minor+patch"
  response_fields: "additive only; fields are never removed"
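The idempotency window can be pictured with a small in-memory store: a repeated key inside the window returns the stored response instead of re-running the handler. This is a sketch under the assumption that replaying the original response is the intended behavior; the class and method names are hypothetical, and a real service would need shared, persistent storage and eviction of expired keys.

```python
import time


class IdempotencyStore:
    """Remember responses by idempotency key for `window_hours` (in-memory sketch)."""

    def __init__(self, window_hours=24, clock=time.time):
        self.window = window_hours * 3600  # window length in seconds
        self.clock = clock
        self._seen = {}                    # key -> (stored_at, response)

    def run_once(self, key, handler):
        """Call `handler` at most once per key within the window."""
        now = self.clock()
        hit = self._seen.get(key)
        if hit is not None and now - hit[0] < self.window:
            return hit[1]                  # replay the stored response
        response = handler()
        self._seen[key] = (now, response)
        return response
```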
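VIII. Security, Audit, and Compliance
- Authentication: OIDC/HMAC; transport: TLS1.2+; least privilege: authorization is granted per scope.
- Audit: record the request_id, idempotency_key, caller, timestamp, and digest; logs are handed to the compliance module and registered in the export manifest.
- Compliance: regional restrictions, data-subject rights, and the submission workflow are tied to Chapter 14; publication and revocation follow the stability line and the announcement policy.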
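A minimal sketch of an audit record with the five listed fields follows. The field names other than request_id and idempotency_key, the canonical-JSON serialization used for the digest, and the sha256: prefix are assumptions made for illustration; the chapter does not define the digest input.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def audit_record(caller, idempotency_key, body):
    """Build one audit entry; the digest covers a canonical JSON form of the body."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return {
        "request_id": str(uuid.uuid4()),
        "idempotency_key": idempotency_key,
        "caller": caller,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "digest": "sha256:" + hashlib.sha256(canonical).hexdigest(),
    }
```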
IX. Machine-Readable Implementation Snippets (Ixx-? Prototypes)
def load_suite(suite: dict) -> dict: ...
def list_tasks(suite_id: str) -> dict: ...
def get_task(suite_id: str, task_id: str) -> dict: ...
def evaluate(task_id: str, spec: dict, mode: str = "async") -> dict: ...
def score(results: dict, aggregation: dict, normalization: dict) -> dict: ...
def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000) -> dict: ...
def uncertainty(model: str, components: list[dict], policy: dict) -> dict: ...
def robustness(spec: dict) -> dict: ...
def fairness_ethics(spec: dict) -> dict: ...
def runtime_metrics(run_id: str, since: str|None=None, until: str|None=None) -> dict: ...
def lineage(spec: dict|None=None, run_id: str|None=None) -> dict: ...
def replay(run_id: str, policy: str="strict") -> dict: ...
def hash_artifact(path: str|bytes) -> dict: ...
def sign_artifact(path: str|bytes, key_id: str) -> dict: ...
def submit(payload: dict) -> dict: ...
def publish(entry: dict) -> dict: ...
def revoke(tag: str, reason: str) -> dict: ...
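The significance prototype defaults to method="bootstrap" with B = 10000 resamples. One way such a routine could work is a paired percentile bootstrap over per-example score differences, sketched below. The helper name, its list-based inputs, the seed parameter, and the two-sided p-value convention (twice the smaller tail proportion, capped at 1) are assumptions; multiple-comparison correction and power analysis are out of scope here.

```python
import random
from statistics import fmean


def paired_bootstrap(scores_a, scores_b, B=10000, seed=0):
    """Percentile bootstrap for the mean paired difference (a - b)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("paired scores must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    # Resample the differences with replacement B times and sort the means.
    boot = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(B))
    lo = boot[int(0.025 * (B - 1))]
    hi = boot[int(0.975 * (B - 1))]
    # Two-sided p-value: twice the smaller tail proportion around zero.
    frac_le = sum(d <= 0 for d in boot) / B
    frac_ge = sum(d >= 0 for d in boot) / B
    p = min(1.0, 2 * min(frac_le, frac_ge))
    return {"delta": fmean(diffs), "ci_95": (lo, hi), "p": p}
```

The returned keys correspond to the Δ/CI_95/p outputs named for /benchmarks/significance, although the wire-level field names are not specified in this chapter.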
X. Example Calls (ready to use once placeholders are filled in)
# Load and validate a suite
curl -s -X POST https://api.eift.org/api/v1/benchmarks/load_suite \
  -H "Authorization: Bearer <token>" \
  -H "x-eift-idempotency: 7b7a0b1e-0a21-4f3f-9d0b-3b1e9b1f3c22" \
  -H "Content-Type: application/json" \
  -d @benchmark.json
# Run an evaluation (asynchronous)
curl -s -X POST https://api.eift.org/api/v1/benchmarks/evaluate \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"task_id":"cls.binary","spec":{...},"options":{"mode":"async"}}'
# Scoring and significance
curl -s -X POST https://api.eift.org/api/v1/benchmarks/score \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @scores.json
curl -s -X POST https://api.eift.org/api/v1/benchmarks/significance \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @pair.json
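The same calls can be issued from Python with only the standard library. The sketch below separates building the request from sending it, so the headers can be inspected without network access. The helper names are hypothetical, and generating a fresh UUID when no idempotency key is supplied is an assumption about client behavior.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.eift.org/api/v1/benchmarks"


def build_request(endpoint, body, token, idempotency_key=None):
    """Build a POST request carrying the envelope headers used in the curl examples."""
    headers = {
        "Authorization": f"Bearer {token}",
        "x-eift-idempotency": idempotency_key or str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/{endpoint}", data=data,
                                  headers=headers, method="POST")


def call(endpoint, body, token, idempotency_key=None):
    """Send the request and decode the JSON response envelope."""
    request = build_request(endpoint, body, token, idempotency_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A retry after a timeout should reuse the same idempotency_key so that the server can recognize the duplicate.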
XI. Coupling with the Export Manifest (Normative)
export_manifest:
  artifacts:
    - {path: "api/openapi.yaml", sha256: "..."}
    - {path: "api/clients/python.tar.gz", sha256: "..."}
    - {path: "runs/RUN-123/scores.json", sha256: "..."}
    - {path: "runs/RUN-123/ci.json", sha256: "..."}
    - {path: "runs/RUN-123/leaderboard.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
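Checking the manifest against files on disk is a straightforward use of sha256. The sketch below assumes the manifest has already been parsed into a Python dict with the layout shown above and that artifact paths are relative to a local root directory; the function names and the shape of the mismatch records are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, root="."):
    """Return a list of artifacts that are missing or whose hash differs."""
    mismatches = []
    for artifact in manifest["export_manifest"]["artifacts"]:
        target = Path(root) / artifact["path"]
        if not target.is_file():
            mismatches.append({"path": artifact["path"], "reason": "missing"})
        elif sha256_file(target) != artifact["sha256"].lower():
            mismatches.append({"path": artifact["path"], "reason": "sha256 mismatch"})
    return mismatches
```

An empty result means every listed artifact exists and matches its recorded hash.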
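XII. Chapter Compliance Self-Check
- The blocking interfaces (load_suite/evaluate/score/significance/uncertainty) are implemented with authentication, idempotency, and rate limiting enabled.
- Reference anchors use the form "VolumeName vX.Y:anchor" and appear in export_manifest.references[]; there are no short codes and no unversioned references.
- Metrology checks are in effect (units="SI", check_dim=true); the minimum checks for frozen splits, leakage guards, score normalization, and significance pass.
- Publication and revocation follow the stability line and the governance policy; the OpenAPI/SDK files and the score/CI/leaderboard artifacts are listed in the export manifest and can be verified.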
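The anchor rule in the second item lends itself to a mechanical check. The sketch below encodes one plausible grammar for "VolumeName vX.Y:anchor" as a regular expression and reports violations using the EREF001 code. The exact grammar (allowed characters in the volume name and the anchor) is an assumption derived from the five example references in the export manifest.

```python
import re

# Assumed grammar: dotted volume name, a space, "v<major>.<minor>", ":", non-empty anchor.
REFERENCE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]* v\d+\.\d+:\S+")


def check_references(references):
    """Return one EREF001 issue per reference that does not match the anchor form."""
    return [
        {"code": "EREF001", "message": "invalid reference format",
         "path": f"$.export_manifest.references[{i}]"}
        for i, ref in enumerate(references)
        if not REFERENCE_RE.fullmatch(ref)
    ]
```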
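Copyright and License (CC BY 4.0)
Copyright notice: unless otherwise stated, the copyright in "Energy Filament Theory" (including text, charts, illustrations, symbols, and formulas) belongs to the author (Mr. 屠广林).
License: this work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0); copying, reposting, excerpting, adapting, and redistributing are permitted for commercial or non-commercial purposes, provided the author and source are credited.
Attribution format (suggested): Author: "屠广林"; Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/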