46-EFT.WP.Data.Benchmarks v1.0 | Chapter 16: Implementation Bindings and Evaluation API


Chapter 16: Implementation Bindings and Evaluation API

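I. Purpose and Scope

This chapter provides the normative implementation bindings and the evaluation API: interface prototypes, request/response envelopes, error codes, authentication and idempotency, rate limiting, and version negotiation. It covers benchmark loading, task execution, score normalization, significance and uncertainty computation, and leaderboard publication and revocation, and it is aligned with the data contract, the metrology conventions, the cross-volume reference anchors, and the export manifest.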

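II. Service Surface (Normative)

services:
  benchmarks.v1:
    # Suites and tasks
    - POST /api/v1/benchmarks/load_suite        # load/validate a suite (blocking)
    - POST /api/v1/benchmarks/list_tasks        # list tasks/subtasks
    - POST /api/v1/benchmarks/get_task          # fetch a task specification
    # Evaluation execution
    - POST /api/v1/benchmarks/evaluate          # run an evaluation (offline/online/streaming/interactive)
    - POST /api/v1/benchmarks/score             # scoring and aggregation (including normalization)
    - POST /api/v1/benchmarks/significance      # significance and multiple-comparison correction
    - POST /api/v1/benchmarks/uncertainty       # uncertainty combination and coverage intervals
    - POST /api/v1/benchmarks/robustness        # robustness/shift/adversarial evaluation
    - POST /api/v1/benchmarks/fairness_ethics   # fairness/ethics/safety stress evaluation
    # Runtime and environment
    - POST /api/v1/benchmarks/runtime/metrics   # runtime metrics (QPS/p99/ρ/energy)
    - POST /api/v1/benchmarks/runtime/lineage   # generate/query the lineage graph
    - POST /api/v1/benchmarks/runtime/replay    # inputs_lock replay for reproduction
    # Submission and publication
    - POST /api/v1/benchmarks/submit            # submit evaluation results
    - POST /api/v1/benchmarks/publish           # publish to the leaderboard
    - POST /api/v1/benchmarks/revoke            # revoke or correct
    - POST /api/v1/benchmarks/hash_artifact     # artifact hashing
    - POST /api/v1/benchmarks/sign_artifact     # artifact signing/verification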

III. Common Request/Response Envelopes and Authentication

request_envelope:
  headers:
    Authorization: "Bearer <oidc-token> | HMAC <key>:<sig>"
    x-eift-idempotency: "<uuid>"   # idempotency key (valid for ≥ 24 h)
    content-type: "application/json"
  body:
    suite?: { ... }                # benchmark.yaml/json
    task_id?: "<suite.task>"
    spec?: { ... }                 # evaluation config (protocol/env/metrics)
    payload?: {artifacts: [{path, bytes_b64?, sha256?}]}
    options?: {dry_run?: true, strict?: true}
    filters?: {run_id?: "<id>", since?: "<ISO8601>", until?: "<ISO8601>"}
response_envelope:
  status: "ok" | "warn" | "error"
  errors: [{code, message, path?, see?}]
  warnings: [{code, message, path?, see?}]
  metrics: { ... }                 # evaluation/metrology/cost statistics
  data?: { ... }                   # structured results (scores/ci/graphs)
  version: "benchmarks.v1"
security:
  auth: "OIDC bearer | HMAC"
  tls: "TLS1.2+"
  scope: ["load","evaluate","metrics","lineage","submit","publish","admin"]
rate_limits:
  per_key_per_minute: 120
  burst: 60
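
The rate_limits block above can be mirrored on the client side to avoid tripping the server limit. The following is a minimal sketch, assuming the limit behaves like a token bucket with a capacity of `burst` and a refill rate of `per_key_per_minute`; the source does not state the server's actual algorithm, and the `TokenBucket` class and its `clock` parameter are hypothetical names introduced for illustration.

```python
import time


class TokenBucket:
    """Client-side throttle (assumed token-bucket semantics, not confirmed by the spec)."""

    def __init__(self, rate_per_minute: int = 120, burst: int = 60, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens refilled per second
        self.capacity = float(burst)        # maximum tokens held at once
        self.tokens = float(burst)          # start with a full bucket
        self.clock = clock                  # injectable clock, useful for tests
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `try_acquire()` before each request and sleep briefly when it returns False.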

IV. Normative OpenAPI Excerpt

openapi: 3.0.3
info: {title: "EFT Benchmarks API", version: "v1"}
paths:
  /api/v1/benchmarks/load_suite:
    post:
      summary: Validate and load a benchmark suite
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/SuiteEnvelope"}}}}
      responses:
        "200": {description: "Result", content: {"application/json": {schema: {$ref: "#/components/schemas/Result"}}}}
  /api/v1/benchmarks/evaluate:
    post:
      summary: Execute evaluation (offline/online/stream/interactive)
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/EvalRequest"}}}}
      responses:
        "200": {description: "Run accepted", content: {"application/json": {schema: {$ref: "#/components/schemas/EvalResult"}}}}
components:
  schemas:
    SuiteEnvelope: {type: object, properties: {suite: {}, options: {type: object}}}
    EvalRequest:
      type: object
      properties:
        task_id: {type: string}
        spec: {type: object}   # protocol/env/metrics
        options: {type: object, properties: {mode: {type: string, enum: ["sync","async"]}}}
    EvalResult:
      type: object
      properties:
        run_id: {type: string}
        state: {type: string, enum: ["queued","running","succeeded","failed"]}
        scores: {type: object}
        ci: {type: object}
        artifacts: {type: array, items: {type: object}}
    Result:
      type: object
      properties:
        status: {type: string, enum: [ok, warn, error]}
        errors: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        warnings: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        metrics: {type: object}
        data: {type: object}
    Issue:
      type: object
      properties:
        code: {type: string}
        message: {type: string}
        path: {type: string}
        see: {type: array, items: {type: string}}
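
On the client side, the Result and Issue schemas above map naturally onto small typed records. The sketch below is illustrative only: the dataclass layout and the `parse_result` helper are assumptions, not part of the normative API. Unknown fields are ignored on purpose, so that responses gaining new fields still parse.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional

ALLOWED_STATUS = {"ok", "warn", "error"}  # enum from the Result schema


@dataclass
class Issue:
    code: str
    message: str
    path: Optional[str] = None
    see: list[str] = field(default_factory=list)


@dataclass
class Result:
    status: str
    errors: list[Issue] = field(default_factory=list)
    warnings: list[Issue] = field(default_factory=list)
    metrics: dict[str, Any] = field(default_factory=dict)
    data: dict[str, Any] = field(default_factory=dict)


def _issue(raw: dict) -> Issue:
    # Read only the fields named in the schema; extra keys are tolerated.
    return Issue(raw["code"], raw["message"], raw.get("path"), list(raw.get("see", [])))


def parse_result(raw: dict) -> Result:
    """Hypothetical helper: turn a decoded JSON response into a Result."""
    status = raw.get("status")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unexpected status: {status!r}")
    return Result(
        status=status,
        errors=[_issue(e) for e in raw.get("errors", [])],
        warnings=[_issue(w) for w in raw.get("warnings", [])],
        metrics=raw.get("metrics", {}),
        data=raw.get("data", {}),
    )
```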

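V. Endpoint Semantics (Key Points)

/benchmarks/load_suite (blocking): checks structure, types, and regex patterns; cross-volume anchors; metrology.units="SI" and check_dim=true; frozen splits and leakage guards; and the minimum scoring, significance, and compliance checks.
/benchmarks/evaluate: executes per the Chapter 7 protocol (offline/online/streaming/interactive) and returns the run_id together with intermediate and final artifacts; online mode supports shadow/canary runs with guardrails.
/benchmarks/score: aggregates, normalizes, and tiers scores per Chapter 8; metrics are combined only after their directions are unified; outputs score_raw/score_norm and tie_break details.
/benchmarks/significance: significance, power, and multiple-comparison correction; outputs Δ/CI_95/p and is linked to the thresholds.
/benchmarks/uncertainty: combines uncertainty per Chapter 9 using GUM|linear|montecarlo|bayes and reports coverage intervals, with dimensions unified.
/benchmarks/robustness, /fairness_ethics: run the Chapter 12/13 items and issue blocking or warning verdicts against the thresholds.
/benchmarks/runtime/*: queries runtime performance and energy metrics, lineage, and replay-consistency reports.
/benchmarks/submit|publish|revoke: submit results and publish or revoke leaderboard entries, following the stability line and the announcement policy; a revocation creates a tombstone and is synchronized to mirrors and indexes.
/benchmarks/hash_artifact|sign_artifact: sha256 hashing and signing/verification, cross-checked against export_manifest.artifacts[].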
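
The "combine after unifying directions" step of /benchmarks/score can be illustrated as follows. This is a sketch under stated assumptions: the actual aggregation and normalization rules are defined in Chapter 8 and are not reproduced here, so min-max scaling, a weighted mean, and the `lo`/`hi`/`weight` fields are all hypothetical choices made for the example.

```python
from __future__ import annotations


def normalize(value: float, lo: float, hi: float, higher_is_better: bool = True) -> float:
    """Min-max scale `value` into [0, 1]; flip it when lower is better."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    x = min(1.0, max(0.0, (value - lo) / (hi - lo)))  # clamp out-of-range values
    return x if higher_is_better else 1.0 - x


def score(metrics: list[dict]) -> dict:
    """Weighted mean of direction-unified, normalized metrics (assumed scheme)."""
    total_weight = sum(m.get("weight", 1.0) for m in metrics)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    norm = {
        m["name"]: normalize(m["value"], m["lo"], m["hi"], m["higher_is_better"])
        for m in metrics
    }
    score_norm = sum(m.get("weight", 1.0) * norm[m["name"]] for m in metrics) / total_weight
    return {
        "score_raw": {m["name"]: m["value"] for m in metrics},
        "per_metric_norm": norm,
        "score_norm": score_norm,
    }
```

After unification every normalized metric reads "higher is better", so a lower-is-better latency and a higher-is-better accuracy can be averaged without cancelling each other out.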

VI. Error Codes (Normative)

errors:
  - {code: "ESCHEMA001", message: "suite schema violation", path: "$.suite"}
  - {code: "EREF001", message: "invalid reference format", path: "$.export_manifest.references[*]"}
  - {code: "EDIM001", message: "units must be SI and check_dim", path: "$.metrology"}
  - {code: "ESPLIT001", message: "splits must be frozen and frozen indices enabled", path: "$.tasks[*].splits"}
  - {code: "ELEAK000", message: "cross-split leakage detected", path: "$.tasks[*].leakage_guard"}
  - {code: "EPROTO001", message: "protocol mode invalid", path: "$.tasks[*].protocol.mode"}
  - {code: "EMETRIC001", message: "metric missing family/unit/higher_is_better", path: "$.tasks[*].metrics[*]"}
  - {code: "ESIG001", message: "significance params incomplete", path: "$.tasks[*].significance"}
  - {code: "EPUB001", message: "publish gate not met", path: "$.scoring.stability"}
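
Three of the codes above (EDIM001, ESPLIT001, EMETRIC001) can be produced by straightforward structural checks on a suite document. The sketch below is illustrative: the key names `frozen` and `frozen_indices` inside `splits` are assumptions, since this chapter gives only the path `$.tasks[*].splits`, and `check_suite` is a hypothetical helper rather than the reference validator.

```python
from __future__ import annotations

REQUIRED_METRIC_KEYS = ("family", "unit", "higher_is_better")


def check_suite(suite: dict) -> list[dict]:
    """Return Issue-shaped dicts for a few of the blocking load_suite checks."""
    issues: list[dict] = []

    # EDIM001: metrology must declare SI units with dimension checking enabled.
    metrology = suite.get("metrology", {})
    if metrology.get("units") != "SI" or metrology.get("check_dim") is not True:
        issues.append({"code": "EDIM001", "message": "units must be SI and check_dim",
                       "path": "$.metrology"})

    for i, task in enumerate(suite.get("tasks", [])):
        # ESPLIT001: both flags are assumed field names (see lead-in).
        splits = task.get("splits", {})
        if not (splits.get("frozen") is True and splits.get("frozen_indices") is True):
            issues.append({"code": "ESPLIT001",
                           "message": "splits must be frozen and frozen indices enabled",
                           "path": f"$.tasks[{i}].splits"})

        # EMETRIC001: every metric needs family, unit, and higher_is_better.
        for j, metric in enumerate(task.get("metrics", [])):
            if any(key not in metric for key in REQUIRED_METRIC_KEYS):
                issues.append({"code": "EMETRIC001",
                               "message": "metric missing family/unit/higher_is_better",
                               "path": f"$.tasks[{i}].metrics[{j}]"})
    return issues
```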

VII. Idempotency, Version Negotiation, and Compatibility

idempotency:
  header: "x-eift-idempotency"
  window_hours: 24
versioning:
  api: "benchmarks.v1"   # breaking change → bump MAJOR
  minor: "backward-compatible additions"
compatibility:
  request_backward: "minor+patch"
  response_fields: "additions are append-only; no removals"
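
The idempotency window can be pictured as a keyed response cache: a repeated key within 24 hours returns the stored response instead of re-running the handler. The following is a minimal in-memory sketch; `IdempotencyStore` is a hypothetical name, and a real service would presumably need a shared, persistent store, which this chapter does not specify.

```python
import time


class IdempotencyStore:
    """Replay stored responses for keys seen within the window (in-memory sketch)."""

    def __init__(self, window_hours: int = 24, clock=time.time):
        self.window = window_hours * 3600  # window length in seconds
        self.clock = clock                 # injectable clock, useful for tests
        self._seen = {}                    # key -> (stored_at, response)

    def run(self, key: str, handler):
        """Call `handler()` once per key per window; otherwise return the cached response."""
        now = self.clock()
        hit = self._seen.get(key)
        if hit is not None and now - hit[0] < self.window:
            return hit[1]
        response = handler()
        self._seen[key] = (now, response)
        return response
```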

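VIII. Security, Audit, and Compliance

Authentication: OIDC/HMAC; transport: TLS1.2+; least privilege: authorization by scope.
Audit: record the request_id, idempotency_key, caller, timestamp, and digest; logs feed into the compliance module and are registered in the export manifest.
Compliance: regional restrictions, data-subject rights, and the submission process tie into Chapter 14; publication and revocation follow the stability line and the announcement policy.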
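
An audit entry with the five fields listed above could be assembled as shown below. This is only a sketch: the chapter does not define how the digest is computed, so a SHA-256 over the canonically serialized JSON body is assumed here, and `audit_record` is a hypothetical helper.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def audit_record(caller, idempotency_key, body, request_id=None, now=None):
    """Build one audit log entry (digest scheme is an assumption, see lead-in)."""
    # Sort keys and drop whitespace so equal bodies always hash identically.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return {
        "request_id": request_id or str(uuid.uuid4()),
        "idempotency_key": idempotency_key,
        "caller": caller,
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "digest": "sha256:" + hashlib.sha256(canonical).hexdigest(),
    }
```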

IX. Machine-Readable Implementation Snippets (Ixx-? Prototypes)

from __future__ import annotations  # lets the `str|None` hints load on Python < 3.10

# Suites and tasks
def load_suite(suite: dict) -> dict: ...
def list_tasks(suite_id: str) -> dict: ...
def get_task(suite_id: str, task_id: str) -> dict: ...

# Evaluation execution
def evaluate(task_id: str, spec: dict, mode: str = "async") -> dict: ...
def score(results: dict, aggregation: dict, normalization: dict) -> dict: ...
def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000) -> dict: ...
def uncertainty(model: str, components: list[dict], policy: dict) -> dict: ...
def robustness(spec: dict) -> dict: ...
def fairness_ethics(spec: dict) -> dict: ...

# Runtime and environment
def runtime_metrics(run_id: str, since: str|None=None, until: str|None=None) -> dict: ...
def lineage(spec: dict|None=None, run_id: str|None=None) -> dict: ...
def replay(run_id: str, policy: str="strict") -> dict: ...

# Artifacts, submission, and publication
def hash_artifact(path: str|bytes) -> dict: ...
def sign_artifact(path: str|bytes, key_id: str) -> dict: ...
def submit(payload: dict) -> dict: ...
def publish(entry: dict) -> dict: ...
def revoke(tag: str, reason: str) -> dict: ...
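
To make the `significance` prototype concrete, the sketch below fills it in with a paired percentile bootstrap that returns Δ, CI_95, and p. Several points are assumptions rather than the normative method: the inputs are taken to carry aligned per-example values under a hypothetical `"scores"` key, the p-value is a simple two-sided sign-based approximation, the `seed` parameter is added for reproducibility, and power analysis and multiple-comparison correction are omitted.

```python
import random


def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000, seed: int = 0) -> dict:
    """Paired percentile bootstrap on per-example differences (illustrative only)."""
    if method != "bootstrap":
        raise NotImplementedError(f"method not covered by this sketch: {method}")
    xs, ys = a["scores"], b["scores"]  # assumed layout, see lead-in
    if not xs or len(xs) != len(ys):
        raise ValueError("a and b need equally sized, non-empty paired scores")

    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample the paired differences with replacement B times; keep each mean.
    boot = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(B))

    # Percentile interval at 2.5% and 97.5%.
    ci_lo = boot[int(0.025 * (B - 1))]
    ci_hi = boot[int(0.975 * (B - 1))]

    # Two-sided p-value from the share of resampled means on either side of zero.
    frac_le = sum(d <= 0 for d in boot) / B
    frac_ge = sum(d >= 0 for d in boot) / B
    p = min(1.0, 2 * min(frac_le, frac_ge))

    return {"delta": sum(diffs) / n, "ci_95": [ci_lo, ci_hi], "p": p}
```

The pairing matters: resampling differences rather than each system independently keeps per-example difficulty matched between the two systems.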

X. Example Invocations (ready to use once <token> and the elided spec are filled in)

# Load and validate a suite
curl -s -X POST https://api.eift.org/api/v1/benchmarks/load_suite \
  -H "Authorization: Bearer <token>" \
  -H "x-eift-idempotency: 7b7a0b1e-0a21-4f3f-9d0b-3b1e9b1f3c22" \
  -H "Content-Type: application/json" \
  -d @benchmark.json

# Run an evaluation (asynchronous)
curl -s -X POST https://api.eift.org/api/v1/benchmarks/evaluate \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"task_id":"cls.binary","spec":{...},"options":{"mode":"async"}}'

# Scoring and significance
curl -s -X POST https://api.eift.org/api/v1/benchmarks/score \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @scores.json
curl -s -X POST https://api.eift.org/api/v1/benchmarks/significance \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @pair.json
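
The same calls can be prepared from Python with the standard library alone. The sketch below only builds the request object and does not send it; `build_request` is a hypothetical helper, and the base URL is copied from the curl examples above.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.eift.org/api/v1/benchmarks"  # host taken from the curl examples


def build_request(endpoint: str, body: dict, token: str, idempotency_key=None):
    """Prepare a POST carrying the envelope headers from Section III."""
    headers = {
        "Authorization": f"Bearer {token}",
        "x-eift-idempotency": idempotency_key or str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; reusing the same idempotency key on a retry lets the server deduplicate it.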

XI. Coupling with the Export Manifest (Normative)

export_manifest:
  artifacts:
    - {path: "api/openapi.yaml", sha256: "..."}
    - {path: "api/clients/python.tar.gz", sha256: "..."}
    - {path: "runs/RUN-123/scores.json", sha256: "..."}
    - {path: "runs/RUN-123/ci.json", sha256: "..."}
    - {path: "runs/RUN-123/leaderboard.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
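
A manifest of this shape can be checked mechanically: each artifact's sha256 is recomputed from disk and each reference is matched against the "Volume vX.Y:anchor" form. The sketch below is one possible check, not the reference verifier; `verify_manifest` and the exact regular expression are assumptions, and only EREF001 is reported with a code because this chapter defines no code for hash mismatches.

```python
import hashlib
import re
from pathlib import Path

# Assumed shape of a valid reference: "<Volume.Name> v<MAJOR>.<MINOR>:<anchor>"
REF_PATTERN = re.compile(r"^[\w.\-]+ v\d+\.\d+:\S+$")


def verify_manifest(manifest: dict, root: str) -> list:
    """Return human-readable problems; an empty list means the manifest checks out."""
    problems = []
    for artifact in manifest.get("artifacts", []):
        file = Path(root) / artifact["path"]
        if not file.is_file():
            problems.append(f"missing artifact: {artifact['path']}")
        elif hashlib.sha256(file.read_bytes()).hexdigest() != artifact["sha256"]:
            problems.append(f"sha256 mismatch: {artifact['path']}")
    for i, ref in enumerate(manifest.get("references", [])):
        if not REF_PATTERN.match(ref):
            problems.append(f"EREF001 at $.export_manifest.references[{i}]: {ref}")
    return problems
```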

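XII. Chapter Compliance Self-Check

The blocking endpoints (load_suite/evaluate/score/significance/uncertainty) are implemented with authentication, idempotency, and rate limiting enabled.
Reference anchors use the form "Volume vX.Y:anchor" and appear in export_manifest.references[]; short codes and unversioned references are not used.
Metrology checks are in force (units="SI", check_dim=true); the frozen-split, leakage-guard, score-normalization, and minimum significance checks pass.
Publication and revocation follow the stability line and the governance policy; the OpenAPI/SDK and the score, CI, and leaderboard artifacts are listed in the export manifest and are verifiable.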

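Copyright and license: unless otherwise stated, the copyright in Energy Filament Theory (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author (屠广林).
License (CC BY 4.0): copying, reposting, excerpting, adaptation, and redistribution are permitted provided the author and source are credited.
Suggested attribution: Author: 屠广林 | Work: Energy Filament Theory (《能量丝理论》) | Source: energyfilament.org | License: CC BY 4.0
Call for verification: the author works independently and is self-funded, with no employer and no sponsor. The next phase will prioritize environments that are most willing to discuss, reproduce, and challenge the work in public, regardless of country. Media and peers worldwide are welcome to use this window to organize verification and to get in touch.
Version information: first published 2025-11-11 | current version: v6.0+5.05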