Technical White Paper 45: EFT.WP.Data.Pipeline v1.0

Chapter 11: Versioning, Provenance, and Lineage


I. Purpose and Scope

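This chapter fixes the pipeline's specifications for versioning, provenance, and lineage: version locking of objects and artifacts, hashing and traceability, the lineage graph and replay, change notices and compatibility policy, and the audit trail and export manifest. It keeps these consistent with the data contracts, the dataset cards and model cards, the metrology chapter, and the reference anchors.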

II. Terms and Dependencies


III. Fields and Structure (Normative)

versioning:
  scheme: "semver"                              # vMAJOR.MINOR.PATCH
  stability_line: "v1.*"
  compat_mode: "forward|backward|both|break"
  notice:
    type: "release|correction|withdrawal"
    summary: "<text>"
    date: "<YYYY-MM-DD>"
provenance:
  sources: ["<uri-or-ref>", "..."]              # data sources / upstream references (reference only)
  transforms: ["<stage-name>@vX.Y", "..."]
  environment:
    containers: ["<image@digest>", "..."]
    deps_lock: "locks/deps.lock.yaml"
  seeds: {global: 1701}
lineage:
  graph:
    nodes:
      - {id: "src.s3.pull", kind: "stage", version: "v1.0"}
      - {id: "schema.check", kind: "stage", version: "v1.2"}
      - {id: "feat.map", kind: "stage", version: "v1.1"}
      - {id: "train_pkg", kind: "artifact", digest: "sha256:..."}
    edges:
      - {from: "src.s3.pull", to: "schema.check"}
      - {from: "schema.check", to: "feat.map"}
      - {from: "feat.map", to: "train_pkg"}
  replay:
    enabled: true
    inputs_lock: "locks/inputs.manifest.json"   # source manifest + offsets/watermarks
    policy: "strict|lenient"
artifacts:
  - {path: "pipeline.yaml", sha256: "<hex>"}
  - {path: "locks/inputs.manifest.json", sha256: "<hex>"}
  - {path: "locks/deps.lock.yaml", sha256: "<hex>"}
  - {path: "outputs/train_pkg.tgz", sha256: "<hex>"}
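As a rough illustration of how the lineage.graph fields could be consumed, the sketch below derives a stage order for replay using a topological sort (Kahn's algorithm). The function name replay_order and the in-memory dict layout are assumptions made for this example; the chapter itself does not prescribe a replay algorithm.

```python
from collections import deque

def replay_order(nodes, edges):
    """Return node ids so that every edge points from an earlier id to a later one."""
    ids = [n["id"] for n in nodes]
    indegree = {i: 0 for i in ids}
    children = {i: [] for i in ids}
    for e in edges:
        if e["from"] not in indegree or e["to"] not in indegree:
            raise ValueError(f"edge references an undeclared node: {e}")
        children[e["from"]].append(e["to"])
        indegree[e["to"]] += 1

    # Start from nodes with no upstream dependency (the sources).
    queue = deque(i for i in ids if indegree[i] == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in children[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(ids):
        raise ValueError("lineage graph contains a cycle; replay order is undefined")
    return order

# Node and edge values copied from the lineage.graph example in this section.
NODES = [
    {"id": "src.s3.pull", "kind": "stage", "version": "v1.0"},
    {"id": "schema.check", "kind": "stage", "version": "v1.2"},
    {"id": "feat.map", "kind": "stage", "version": "v1.1"},
    {"id": "train_pkg", "kind": "artifact", "digest": "sha256:..."},
]
EDGES = [
    {"from": "src.s3.pull", "to": "schema.check"},
    {"from": "schema.check", "to": "feat.map"},
    {"from": "feat.map", "to": "train_pkg"},
]

if __name__ == "__main__":
    print(replay_order(NODES, EDGES))
```

Raising on cycles and on undeclared endpoints is a design choice of this sketch: a replay that silently skips part of the graph would be harder to audit than one that fails early.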


IV. Versioning Policy and Stability Line


V. Provenance Information and Reproducibility


VI. Lineage Graph and Replay


VII. Artifact Hashes and Integrity


VIII. Metrology and Units (SI)

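  1. Performance and resources: QPS (1/s), T_inf (ms; {p50, p95, p99}), ρ (dimensionless), net_mbps, size_bytes.
  2. Mandatory: metrology: {units: "SI", check_dim: true}; normalize units before any aggregation or conversion.
  3. Path quantities: if the lineage involves arrival-time or correction chains, record delta_form, path="gamma(ell)", and measure="d ell"; use either:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and pass check_dim.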
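The two arrival-time forms in item 3 can be compared numerically. The sketch below integrates sampled n_eff values along the path with the trapezoid rule and evaluates both forms; they agree whenever c_ref is constant along the path. The sample values, the helper names, and the choice of the vacuum speed of light for c_ref are assumptions for illustration only. With ell in metres, n_eff dimensionless, and c_ref in m/s, the result is in seconds.

```python
import math

C_REF = 299_792_458.0  # m/s; assumption: c_ref is taken as the vacuum speed of light

def trapezoid(ys, xs):
    """Trapezoid-rule integral of samples ys over sample positions xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))

def t_arr_factored(n_eff, ell, c_ref=C_REF):
    """T_arr = (1 / c_ref) * integral of n_eff d ell."""
    return trapezoid(n_eff, ell) / c_ref

def t_arr_inside(n_eff, ell, c_ref=C_REF):
    """T_arr = integral of (n_eff / c_ref) d ell."""
    return trapezoid([n / c_ref for n in n_eff], ell)

# Hypothetical path samples: positions in metres, n_eff dimensionless.
ELL = [0.0, 1000.0, 2000.0]
N_EFF = [1.0, 1.5, 1.0]

if __name__ == "__main__":
    a = t_arr_factored(N_EFF, ELL)
    b = t_arr_inside(N_EFF, ELL)
    print(a, b, math.isclose(a, b, rel_tol=1e-12))
```

If c_ref were allowed to vary along the path, only the second form would remain meaningful, which is one reason to record which form was used alongside the lineage entry.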

IX. Machine-Readable Snippet (Directly Embeddable)

versioning:
  scheme: "semver"
  stability_line: "v1.*"
  compat_mode: "both"
  notice: {type: "release", summary: "initial stable", date: "2025-09-21"}
provenance:
  sources: ["s3://eift-data/raw/2025/09/", "contracts/raw_rows@v1.2"]
  transforms: ["schema.check@v1.2", "feat.map@v1.1"]
  environment:
    containers: ["ghcr.io/eift/pipeline@sha256:abcdef..."]
    deps_lock: "locks/deps.lock.yaml"
  seeds: {global: 1701}
lineage:
  graph:
    nodes:
      - {id: "src.s3.pull", kind: "stage", version: "v1.0"}
      - {id: "schema.check", kind: "stage", version: "v1.2"}
      - {id: "feat.map", kind: "stage", version: "v1.1"}
      - {id: "train_pkg", kind: "artifact", digest: "sha256:1234..."}
    edges:
      - {from: "src.s3.pull", to: "schema.check"}
      - {from: "schema.check", to: "feat.map"}
      - {from: "feat.map", to: "train_pkg"}
  replay: {enabled: true, inputs_lock: "locks/inputs.manifest.json", policy: "strict"}
artifacts:
  - {path: "pipeline.yaml", sha256: "..."}
  - {path: "locks/inputs.manifest.json", sha256: "..."}
  - {path: "locks/deps.lock.yaml", sha256: "..."}
  - {path: "outputs/train_pkg.tgz", sha256: "..."}
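One way the artifacts list might be checked against files on disk is sketched below: each entry's recorded sha256 is compared with a freshly computed digest. The function names sha256_file and verify_artifacts, and the (path, reason) failure format, are assumptions of this example rather than part of the specification.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(artifacts, root="."):
    """Return a list of (path, reason) tuples for entries that fail verification."""
    failures = []
    for entry in artifacts:
        target = Path(root) / entry["path"]
        if not target.is_file():
            failures.append((entry["path"], "missing"))
        elif sha256_file(target) != entry["sha256"].lower():
            failures.append((entry["path"], "digest mismatch"))
    return failures

if __name__ == "__main__":
    # Placeholder digest: this entry is reported as a failure until a real value is recorded.
    manifest = [{"path": "pipeline.yaml", "sha256": "<hex>"}]
    for path, reason in verify_artifacts(manifest):
        print(f"{path}: {reason}")
```

An empty result means every listed artifact exists and matches its recorded digest; anything else would block a strict replay under the assumptions of this sketch.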


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: VER.SEMVER
    when: "$.versioning.scheme"
    assert: "value == 'semver' and matches($.pipeline.version, '^v\\d+\\.\\d+(\\.\\d+)?$')"
    level: error
  - id: VER.COMPAT_ALLOWED
    when: "$.versioning.compat_mode"
    assert: "value in ['forward','backward','both','break']"
    level: error
  - id: LIN.GRAPH_CONNECTED
    when: "$.lineage.graph"
    assert: "graph_is_connected(value) and no_dangling_nodes(value)"
    level: error
  - id: LIN.REPLAY_INPUTS_LOCK
    when: "$.lineage.replay.enabled"
    assert: "value == false or has_key($.lineage.replay.inputs_lock)"
    level: error
  - id: ART.SHA256_REQUIRED
    when: "$.artifacts[*]"
    assert: "has_key('sha256') and len(value.sha256) > 0"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
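The rules above are written in a small assertion language whose evaluator is not defined in this chapter. As a hedged sketch of what five of them might mean in practice, the Python below re-implements them by hand against a parsed document. The interpretation of graph_is_connected as weak connectivity with no undeclared edge endpoints, the lint function itself, and the pipeline.version field in the example are all assumptions of this sketch.

```python
import re

SEMVER_RE = re.compile(r"^v\d+\.\d+(\.\d+)?$")  # same pattern as rule VER.SEMVER
COMPAT_MODES = {"forward", "backward", "both", "break"}

def is_weakly_connected(graph):
    """True if all declared nodes are reachable from one another, ignoring edge direction."""
    ids = {n["id"] for n in graph["nodes"]}
    if not ids:
        return False
    neighbours = {i: set() for i in ids}
    for e in graph["edges"]:
        if e["from"] not in ids or e["to"] not in ids:
            return False  # edge endpoint that was never declared as a node
        neighbours[e["from"]].add(e["to"])
        neighbours[e["to"]].add(e["from"])
    start = next(iter(ids))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == ids

def lint(doc):
    """Return the ids of violated rules; a rule is skipped when its 'when' path is absent."""
    errors = []
    ver = doc.get("versioning", {})
    if "scheme" in ver:
        version = str(doc.get("pipeline", {}).get("version", ""))
        if ver["scheme"] != "semver" or not SEMVER_RE.match(version):
            errors.append("VER.SEMVER")
    if "compat_mode" in ver and ver["compat_mode"] not in COMPAT_MODES:
        errors.append("VER.COMPAT_ALLOWED")
    lineage = doc.get("lineage", {})
    if "graph" in lineage and not is_weakly_connected(lineage["graph"]):
        errors.append("LIN.GRAPH_CONNECTED")
    replay = lineage.get("replay", {})
    if replay.get("enabled") and "inputs_lock" not in replay:
        errors.append("LIN.REPLAY_INPUTS_LOCK")
    if any(not art.get("sha256") for art in doc.get("artifacts", [])):
        errors.append("ART.SHA256_REQUIRED")
    return errors

# Hypothetical parsed document; pipeline.version is assumed because VER.SEMVER reads it.
EXAMPLE = {
    "pipeline": {"version": "v1.0"},
    "versioning": {"scheme": "semver", "compat_mode": "both"},
    "lineage": {
        "graph": {
            "nodes": [{"id": "src.s3.pull"}, {"id": "schema.check"}],
            "edges": [{"from": "src.s3.pull", "to": "schema.check"}],
        },
        "replay": {"enabled": True, "inputs_lock": "locks/inputs.manifest.json"},
    },
    "artifacts": [{"path": "pipeline.yaml", "sha256": "<hex>"}],
}

if __name__ == "__main__":
    print(lint(EXAMPLE))
```

Note that a template value such as "forward|backward|both|break" would fail VER.COMPAT_ALLOWED under this reading, so the field must be set to exactly one of the four modes before linting.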


XI. Export Manifest and Audit Trail

export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "pipeline.yaml", sha256: "..."}
    - {path: "locks/inputs.manifest.json", sha256: "..."}
    - {path: "locks/deps.lock.yaml", sha256: "..."}
    - {path: "lineage/graph.json", sha256: "..."}
    - {path: "reports/replay.result.json", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.DatasetCards v1.0:Ch.11"
    - "EFT.WP.Data.ModelCards v1.0:Ch.11"
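An export manifest of this shape could be generated rather than written by hand. The sketch below hashes each listed file and assembles the export_manifest structure as a Python dict; the function name build_export_manifest and its parameters are hypothetical, and the output layout simply mirrors the fields shown above.

```python
import hashlib
import json
from pathlib import Path

def build_export_manifest(root, paths, references, version="v1.0"):
    """Hash each artifact under root and return a dict shaped like export_manifest."""
    root = Path(root)
    artifacts = []
    for rel in paths:
        # read_bytes raises FileNotFoundError if a listed artifact is missing,
        # so an incomplete export fails instead of producing a partial manifest.
        data = (root / rel).read_bytes()
        artifacts.append({"path": rel, "sha256": hashlib.sha256(data).hexdigest()})
    return {
        "export_manifest": {
            "version": version,
            "artifacts": artifacts,
            "references": list(references),
        }
    }

if __name__ == "__main__":
    manifest = build_export_manifest(
        root=".",
        paths=[],  # e.g. the artifact paths listed in this section
        references=["EFT.WP.Core.DataSpec v1.0:EXPORT"],
    )
    print(json.dumps(manifest, indent=2))
```

Reading each file fully into memory keeps the sketch short; for large artifacts a chunked read would be the more careful choice.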


XII. Chapter Compliance Self-Check


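Copyright and License (CC BY 4.0)

Copyright notice: Unless otherwise stated, the copyright in "Energy Filament Theory" (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author (Mr. 屠广林).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Copying, reposting, excerpting, adaptation, and redistribution are permitted for commercial or non-commercial purposes, provided the author and source are credited.
Suggested attribution: Author: 屠广林; Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/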