Technical White Paper 46 - EFT.WP.Data.Benchmarks v1.0

Chapter 16: Implementation Bindings and the Evaluation API


I. Purpose and Scope

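This chapter provides normative implementation bindings for the evaluation API: interface prototypes, request/response envelopes, error codes, authentication and idempotency, rate limiting, and version negotiation. It covers suite loading, task execution, score normalization, significance and uncertainty computation, and leaderboard publication and revocation, and it aligns with the data contracts, metrology conventions, cross-volume reference anchors, and the export manifest.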

II. Service Surface (Normative)

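services:
  benchmarks.v1:
    # Suites and tasks
    - POST /api/v1/benchmarks/load_suite          # load and validate a suite (blocking)
    - POST /api/v1/benchmarks/list_tasks          # list tasks/subtasks
    - POST /api/v1/benchmarks/get_task            # fetch a task specification
    # Evaluation
    - POST /api/v1/benchmarks/evaluate            # run an evaluation (offline/online/stream/interactive)
    - POST /api/v1/benchmarks/score               # scoring and aggregation (incl. normalization)
    - POST /api/v1/benchmarks/significance        # significance testing and multiple-comparison correction
    - POST /api/v1/benchmarks/uncertainty         # uncertainty combination and coverage intervals
    - POST /api/v1/benchmarks/robustness          # robustness/distribution-shift/adversarial evaluation
    - POST /api/v1/benchmarks/fairness_ethics     # fairness/ethics/safety stress evaluation
    # Runtime and environment
    - POST /api/v1/benchmarks/runtime/metrics     # runtime metrics (QPS/p99/ρ/energy)
    - POST /api/v1/benchmarks/runtime/lineage     # generate or query the lineage graph
    - POST /api/v1/benchmarks/runtime/replay      # reproduce a run by replaying its inputs_lock
    # Submission and publication
    - POST /api/v1/benchmarks/submit              # submit evaluation results
    - POST /api/v1/benchmarks/publish             # publish to the leaderboard
    - POST /api/v1/benchmarks/revoke              # revoke or correct an entry
    - POST /api/v1/benchmarks/hash_artifact       # hash an artifact
    - POST /api/v1/benchmarks/sign_artifact       # sign or verify an artifact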
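A client or gateway can mirror the service surface above as a route table so that typos in operation names fail locally instead of producing a remote 404. The sketch below is illustrative only: the `ROUTES` dict, its group names and the `endpoint_url` helper are hypothetical; only the paths are taken from the list.

```python
# Hypothetical route table mirroring the benchmarks.v1 service surface.
ROUTES: dict[str, list[str]] = {
    "suites_and_tasks": ["load_suite", "list_tasks", "get_task"],
    "evaluation": ["evaluate", "score", "significance", "uncertainty",
                   "robustness", "fairness_ethics"],
    "runtime": ["runtime/metrics", "runtime/lineage", "runtime/replay"],
    "submission": ["submit", "publish", "revoke", "hash_artifact", "sign_artifact"],
}

PREFIX = "/api/v1/benchmarks/"


def endpoint_url(base: str, operation: str) -> str:
    """Return the full POST URL for a known operation; reject unknown names."""
    known = {op for ops in ROUTES.values() for op in ops}
    if operation not in known:
        raise ValueError(f"unknown benchmarks.v1 operation: {operation!r}")
    return base.rstrip("/") + PREFIX + operation
```

For example, `endpoint_url("https://api.eift.org", "runtime/replay")` yields `https://api.eift.org/api/v1/benchmarks/runtime/replay`.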


III. Common Request/Response Envelopes and Authentication

request_envelope:
  headers:
    Authorization: "Bearer <oidc-token> | HMAC <key>:<sig>"
    x-eift-idempotency: "<uuid>"          # idempotency key (honored for ≥ 24 h)
    content-type: "application/json"
  body:
    suite?: { ... }                       # benchmark.yaml/json
    task_id?: "<suite.task>"
    spec?: { ... }                        # evaluation configuration (protocol/env/metrics)
    payload?: {artifacts: [{path, bytes_b64?, sha256?}]}
    options?: {dry_run?: true, strict?: true}
    filters?: {run_id?: "<id>", since?: "<ISO8601>", until?: "<ISO8601>"}

response_envelope:
  status: "ok" | "warn" | "error"
  errors:   [{code, message, path?, see?}]
  warnings: [{code, message, path?, see?}]
  metrics: { ... }                        # evaluation/metrology/cost statistics
  data?: { ... }                          # structured results (scores/ci/graphs)
  version: "benchmarks.v1"

security:
  auth: "OIDC bearer | HMAC"
  tls: "TLS1.2+"
  scope: ["load", "evaluate", "metrics", "lineage", "submit", "publish", "admin"]

rate_limits:
  per_key_per_minute: 120
  burst: 60
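The rate limits combine a sustained rate of 120 requests per key per minute with a burst allowance of 60. One common way to enforce both is a token bucket per API key. The sketch below assumes that `burst` means the bucket capacity and that tokens refill continuously at 120/60 = 2 per second; the `TokenBucket` class is hypothetical and not part of the normative API.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-key token bucket: refills at `rate` tokens/s up to `capacity`."""
    rate: float = 120 / 60.0   # per_key_per_minute: 120 -> 2 tokens per second
    capacity: float = 60.0     # burst: 60 (assumed to be the bucket size)
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity      # start full so an initial burst is admitted
        self.last = time.monotonic()

    def allow(self, now: float | None = None) -> bool:
        """Take one token if available; False means reject the request (e.g. HTTP 429)."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)          # ignore clock steps backwards
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keeping one `TokenBucket` per API key (for instance in a dict keyed by the key ID) gives the per-key semantics; a fleet of servers would need the counters in shared storage instead.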


IV. Normative OpenAPI Excerpt

openapi: 3.0.3
info: {title: "EFT Benchmarks API", version: "v1"}
paths:
  /api/v1/benchmarks/load_suite:
    post:
      summary: Validate and load a benchmark suite
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/SuiteEnvelope"}}}}
      responses:
        "200": {description: "Result", content: {"application/json": {schema: {$ref: "#/components/schemas/Result"}}}}
  /api/v1/benchmarks/evaluate:
    post:
      summary: Execute evaluation (offline/online/stream/interactive)
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/EvalRequest"}}}}
      responses:
        "200": {description: "Run accepted", content: {"application/json": {schema: {$ref: "#/components/schemas/EvalResult"}}}}

components:
  schemas:
    SuiteEnvelope: {type: object, properties: {suite: {}, options: {type: object}}}
    EvalRequest:
      type: object
      properties:
        task_id: {type: string}
        spec: {type: object}              # protocol/env/metrics
        options: {type: object, properties: {mode: {type: string, enum: ["sync", "async"]}}}
    EvalResult:
      type: object
      properties:
        run_id: {type: string}
        state: {type: string, enum: ["queued", "running", "succeeded", "failed"]}
        scores: {type: object}
        ci: {type: object}
        artifacts: {type: array, items: {type: object}}
    Result:
      type: object
      properties:
        status: {type: string, enum: [ok, warn, error]}
        errors: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        warnings: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        metrics: {type: object}
        data: {type: object}
    Issue:
      type: object
      properties:
        code: {type: string}
        message: {type: string}
        path: {type: string}
        see: {type: array, items: {type: string}}
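A client can pre-check an `EvalRequest` against the excerpt before sending it. The sketch below hand-codes only the constraints visible above (`task_id` is a string, `spec` is an object, `options.mode` is `sync` or `async`) and reports failures as `Issue`-shaped dicts inside a `Result`-shaped dict. Treating `task_id` as required is an assumption, since the excerpt declares no `required` list, and reusing `ESCHEMA001` for these failures is likewise an assumption. A production service would more likely run a full JSON Schema validator against the published `api/openapi.yaml`.

```python
from typing import Any

MODES = ("sync", "async")  # EvalRequest.options.mode enum


def check_eval_request(req: dict[str, Any]) -> dict[str, Any]:
    """Return a Result-shaped dict (status/errors/warnings) for an EvalRequest."""
    errors: list[dict[str, Any]] = []

    def issue(message: str, path: str) -> None:
        errors.append({"code": "ESCHEMA001", "message": message, "path": path})

    if not isinstance(req.get("task_id"), str):      # assumed required
        issue("task_id must be a string", "$.task_id")
    if "spec" in req and not isinstance(req["spec"], dict):
        issue("spec must be an object", "$.spec")
    options = req.get("options", {})
    if not isinstance(options, dict):
        issue("options must be an object", "$.options")
    elif "mode" in options and options["mode"] not in MODES:
        issue(f"mode must be one of {list(MODES)}", "$.options.mode")
    return {"status": "error" if errors else "ok", "errors": errors, "warnings": []}
```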


V. Endpoint Semantics (Key Points)


VI. Error Codes (Normative)

errors:
  - {code: "ESCHEMA001", message: "suite schema violation", path: "$.suite"}
  - {code: "EREF001", message: "invalid reference format", path: "$.export_manifest.references[*]"}
  - {code: "EDIM001", message: "units must be SI and pass check_dim", path: "$.metrology"}
  - {code: "ESPLIT001", message: "splits must be frozen with frozen indices enabled", path: "$.tasks[*].splits"}
  - {code: "ELEAK000", message: "cross-split leakage detected", path: "$.tasks[*].leakage_guard"}
  - {code: "EPROTO001", message: "protocol mode invalid", path: "$.tasks[*].protocol.mode"}
  - {code: "EMETRIC001", message: "metric missing family/unit/higher_is_better", path: "$.tasks[*].metrics[*]"}
  - {code: "ESIG001", message: "significance params incomplete", path: "$.tasks[*].significance"}
  - {code: "EPUB001", message: "publish gate not met", path: "$.scoring.stability"}
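`ELEAK000` signals that the same example appears in more than one split. A minimal way to detect this, assuming each example is identified by the SHA-256 of its canonical JSON encoding, is a pairwise set intersection across the frozen splits. The names `content_key` and `leakage_guard` and the input shape (split name mapped to a list of examples) are hypothetical; only the code, message and path pattern come from the error-code list above.

```python
import hashlib
import json
from itertools import combinations
from typing import Any


def content_key(example: Any) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    blob = json.dumps(example, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def leakage_guard(splits: dict[str, list[Any]], task_index: int = 0) -> list[dict[str, Any]]:
    """Return one ELEAK000 issue for every pair of splits that shares an example."""
    keys = {name: {content_key(x) for x in items} for name, items in splits.items()}
    issues = []
    for a, b in combinations(sorted(keys), 2):
        shared = keys[a] & keys[b]
        if shared:
            issues.append({
                "code": "ELEAK000",
                "message": f"cross-split leakage detected: {len(shared)} example(s) "
                           f"shared between {a} and {b}",
                "path": f"$.tasks[{task_index}].leakage_guard",
            })
    return issues
```

Exact hashing only catches verbatim duplicates; near-duplicate detection (normalized text, fuzzy hashing) would need a different key function.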


VII. Idempotency, Version Negotiation, and Compatibility

idempotency:
  header: "x-eift-idempotency"
  window_hours: 24

versioning:
  api: "benchmarks.v1"                    # breaking change → bump MAJOR
  minor: "backward-compatible additions"

compatibility:
  request_backward: "minor+patch"
  response_fields: "new fields are append-only; existing fields are never removed"
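On the server side, the 24-hour idempotency window can be honored by caching the first response per idempotency key and replaying it for retries. The in-memory `IdempotencyStore` below is a hypothetical sketch: scoping keys by caller and endpoint, and rejecting a reused key whose request body differs, are design assumptions rather than requirements stated in this chapter. A real deployment would keep the entries in shared, persistent storage.

```python
from __future__ import annotations

import hashlib
import json
import time
from typing import Any, Callable

WINDOW_SECONDS = 24 * 3600  # idempotency.window_hours: 24


class IdempotencyStore:
    """Hypothetical in-memory cache keyed by (caller, endpoint, x-eift-idempotency)."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        # slot -> (stored_at, request-body digest, original response)
        self._entries: dict[tuple[str, str, str], tuple[float, str, dict[str, Any]]] = {}

    def handle(self, caller: str, endpoint: str, key: str, body: dict[str, Any],
               compute: Callable[[], dict[str, Any]], now: float | None = None) -> dict[str, Any]:
        """Run `compute` once per key within the window; replay its response on retries."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        slot = (caller, endpoint, key)
        entry = self._entries.get(slot)
        if entry is not None and now - entry[0] < self.window:
            _, stored_digest, response = entry
            if stored_digest != digest:
                # Same key, different body: reject instead of guessing which one is meant.
                raise ValueError("idempotency key reused with a different request body")
            return response
        response = compute()  # first use of the key, or the previous entry has expired
        self._entries[slot] = (now, digest, response)
        return response
```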


VIII. Security, Audit, and Compliance


IX. Machine-Readable Implementation Fragments (Ixx-? Prototypes)

from __future__ import annotations  # lets `str | None` and `list[dict]` hints work on older Python 3

# Stubs only: each body is a `...` placeholder.

# Suites and tasks
def load_suite(suite: dict) -> dict: ...
def list_tasks(suite_id: str) -> dict: ...
def get_task(suite_id: str, task_id: str) -> dict: ...

# Evaluation, scoring and statistics
def evaluate(task_id: str, spec: dict, mode: str = "async") -> dict: ...
def score(results: dict, aggregation: dict, normalization: dict) -> dict: ...
def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000) -> dict: ...
def uncertainty(model: str, components: list[dict], policy: dict) -> dict: ...
def robustness(spec: dict) -> dict: ...
def fairness_ethics(spec: dict) -> dict: ...

# Runtime and environment
def runtime_metrics(run_id: str, since: str | None = None, until: str | None = None) -> dict: ...
def lineage(spec: dict | None = None, run_id: str | None = None) -> dict: ...
def replay(run_id: str, policy: str = "strict") -> dict: ...

# Artifacts, submission and publication
def hash_artifact(path: str | bytes) -> dict: ...
def sign_artifact(path: str | bytes, key_id: str) -> dict: ...
def submit(payload: dict) -> dict: ...
def publish(entry: dict) -> dict: ...
def revoke(tag: str, reason: str) -> dict: ...
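To illustrate the `significance(a, b, method="bootstrap", B=10000)` prototype, the sketch below runs a paired bootstrap over per-example scores of two systems evaluated on the same items. It reports the mean difference, a percentile confidence interval and a two-sided p-value computed by shifting the bootstrap distribution to the null of zero difference. The input shape (`{"per_example": [...]}`), the extra `seed`/`alpha` parameters and the output keys are assumptions; multiple-comparison correction is not covered.

```python
import random
from statistics import fmean


def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000,
                 seed: int = 0, alpha: float = 0.05) -> dict:
    """Paired bootstrap test of mean(a) - mean(b) over the same examples (sketch)."""
    if method != "bootstrap":
        raise NotImplementedError(f"only 'bootstrap' is sketched here, got {method!r}")
    xa, xb = a["per_example"], b["per_example"]
    if len(xa) != len(xb) or not xa:
        raise ValueError("paired bootstrap needs equal-length, non-empty score lists")
    diffs = [p - q for p, q in zip(xa, xb)]      # pair scores item by item
    n, observed = len(diffs), fmean(diffs)
    rng = random.Random(seed)                     # fixed seed for reproducible runs
    boot = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(B))
    lo = boot[int((alpha / 2) * B)]               # percentile interval
    hi = boot[min(B - 1, int((1 - alpha / 2) * B))]
    # Shift-to-null: count resamples that deviate from the observed difference
    # by at least |observed|, as if the true difference were zero.
    extreme = sum(abs(d - observed) >= abs(observed) for d in boot)
    p_value = (extreme + 1) / (B + 1)
    return {"method": method, "B": B, "delta": observed,
            "ci": [lo, hi], "p_value": p_value, "significant": p_value < alpha}
```

Pairing matters here: resampling the per-item differences keeps the correlation between the two systems on each item, which is usually much tighter than comparing two independently resampled means.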


X. Example Calls (Templates: Replace Placeholders Before Use)

# Load and validate a suite
curl -s -X POST https://api.eift.org/api/v1/benchmarks/load_suite \
  -H "Authorization: Bearer <token>" \
  -H "x-eift-idempotency: 7b7a0b1e-0a21-4f3f-9d0b-3b1e9b1f3c22" \
  -H "Content-Type: application/json" \
  -d @benchmark.json

# Run an evaluation (asynchronous); replace {...} with a concrete spec
curl -s -X POST https://api.eift.org/api/v1/benchmarks/evaluate \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"task_id":"cls.binary","spec":{...},"options":{"mode":"async"}}'

# Scoring and significance
curl -s -X POST https://api.eift.org/api/v1/benchmarks/score \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @scores.json

curl -s -X POST https://api.eift.org/api/v1/benchmarks/significance \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @pair.json
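For clients that prefer Python over curl, the same calls can be assembled with the standard library. The sketch only builds the request object; sending it is the commented `urlopen` call. The `build_request` helper is hypothetical, and generating a fresh UUID when no idempotency key is supplied (while reusing the same key on retries) is an assumption about intended client behavior.

```python
from __future__ import annotations

import json
import urllib.request
import uuid

BASE = "https://api.eift.org/api/v1/benchmarks/"


def build_request(operation: str, body: dict, token: str,
                  idempotency_key: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the common envelope headers."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Reuse the same key when retrying the same logical request.
        "x-eift-idempotency": idempotency_key or str(uuid.uuid4()),
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE + operation, data=data, headers=headers, method="POST")


# Usage (commented out: needs a real token and network access):
# with open("benchmark.json", encoding="utf-8") as f:
#     req = build_request("load_suite", json.load(f), token="REPLACE_ME")
# with urllib.request.urlopen(req) as resp:
#     envelope = json.load(resp)   # status / errors / warnings / metrics / data
```

Note that `urllib.request` normalizes header names with `str.capitalize()`, so the key is stored as `X-eift-idempotency`; HTTP header names are case-insensitive, so the server sees the same header.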


XI. Coupling with the Export Manifest (Normative)

export_manifest:
  artifacts:
    - {path: "api/openapi.yaml", sha256: "..."}
    - {path: "api/clients/python.tar.gz", sha256: "..."}
    - {path: "runs/RUN-123/scores.json", sha256: "..."}
    - {path: "runs/RUN-123/ci.json", sha256: "..."}
    - {path: "runs/RUN-123/leaderboard.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
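Each `sha256` in the manifest is naturally read as the hex SHA-256 digest of the artifact's bytes, which is one plausible output of `hash_artifact`. The sketch below fills placeholder digests and then re-verifies them against the files on disk. The manifest layout matches the excerpt above; the function names, treating `"..."` as "not yet computed", and the 1 MiB read size are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fill_manifest(manifest: dict, root: Path) -> dict:
    """Replace placeholder digests ("...") with computed ones, relative to `root`."""
    for art in manifest["export_manifest"]["artifacts"]:
        if art.get("sha256") in (None, "", "..."):
            art["sha256"] = sha256_file(root / art["path"])
    return manifest


def verify_manifest(manifest: dict, root: Path) -> list[str]:
    """Return the paths whose on-disk digest differs from the recorded one."""
    return [art["path"] for art in manifest["export_manifest"]["artifacts"]
            if sha256_file(root / art["path"]) != art["sha256"]]
```

Streaming in chunks keeps memory flat for large artifacts such as `api/clients/python.tar.gz`.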


XII. Chapter Compliance Self-Check


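Copyright and License (CC BY 4.0)

Copyright notice: Unless otherwise stated, the copyright in "Energy Filament Theory" (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, is held by the author, Mr. Tu Guanglin (屠广林).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Suggested attribution: Author: 屠广林 (Tu Guanglin); Work: 《能量丝理论》 (Energy Filament Theory); Source: energyfilament.org; License: CC BY 4.0.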

First published: 2025-11-11 | Current version: v5.1
License URL: https://creativecommons.org/licenses/by/4.0/