Catalog Documents - Technical Whitepaper 53 - Model Card Template v1.0

Chapter 5 Training Data & Lineage


I. Purpose & Scope


II. Inputs & Dependencies


III. Training Data Sources & Licenses


IV. Schema & Split Alignment


V. Sampling & Cleaning


VI. Lineage & Traceability


VII. Normative Path Forms

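The main text writes gamma(ell) and d ell explicitly, and the data side records delta_form. The path and phase conventions used in training and evaluation must be identical to those declared in the Dataset Card.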
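The consistency requirement can be checked mechanically. The sketch below assumes that both the Dataset Card and each training or evaluation run expose their path conventions as flat key-value pairs. Only `delta_form` is named in this template; the keys `path_form` and `phase_convention`, the function name, and the example values are hypothetical placeholders.

```python
# Hypothetical keys: only "delta_form" appears in the template; the others are placeholders.
PATH_KEYS = ("delta_form", "path_form", "phase_convention")


def path_convention_mismatches(dataset_card: dict, run_config: dict) -> dict:
    """Return {key: (dataset_value, run_value)} for every path/phase key that differs.

    A key missing on either side counts as a mismatch, so an undeclared
    convention cannot pass the check silently.
    """
    mismatches = {}
    for key in PATH_KEYS:
        ds_value = dataset_card.get(key)
        run_value = run_config.get(key)
        if ds_value is None or run_value is None or ds_value != run_value:
            mismatches[key] = (ds_value, run_value)
    return mismatches


if __name__ == "__main__":
    card = {"delta_form": "general", "path_form": "gamma(ell) d ell", "phase_convention": "hypothetical-A"}
    train_cfg = dict(card)
    eval_cfg = {**card, "delta_form": "simplified"}  # deliberately inconsistent (hypothetical value)
    print(path_convention_mismatches(card, train_cfg))  # {}
    print(path_convention_mismatches(card, eval_cfg))   # {'delta_form': ('general', 'simplified')}
```

Treating a missing key as a mismatch is a design choice: it forces both sides to declare their convention instead of relying on defaults.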


VIII. Quality Gate Mapping


IX. Machine-Readable Artifacts
A. data_refs.yaml

version: "1.0.0"
datasets:
  - id: "ds-core"
    see:
      - "Dataset Card v1.0:Ch.3"
      - "Dataset Card v1.0:Ch.4"
      - "Dataset Card v1.0:Ch.6"
    manifest: "DS_EXPORT/manifests/report_manifest.yaml"
    splits: "DS_EXPORT/splits/split_manifest.json"
    license: "CC-BY-4.0"
    checksum: "sha256:..."
sampling:
  seed: 20250924
  strategy: { stratified: ["device","region","quality.flags"] }
preprocess_spec: "configs/preprocess_spec.yaml"
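The `sampling` block fixes a seed and three stratification keys. The following is a minimal sketch of seeded stratified sampling under stated assumptions: the dotted key `quality.flags` is read as a nested field (`record["quality"]["flags"]`), and the per-stratum `fraction` with an at-least-one rule is a hypothetical policy that the template does not specify.

```python
import random
from collections import defaultdict

# Values copied from data_refs.yaml (sampling.seed, sampling.strategy.stratified).
SEED = 20250924
STRATA = ["device", "region", "quality.flags"]


def lookup(record, dotted):
    """Resolve a dotted key such as "quality.flags" through nested dicts (assumed layout)."""
    value = record
    for part in dotted.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    # Lists are unhashable; convert so the value can be part of a stratum key.
    return tuple(value) if isinstance(value, list) else value


def stratified_sample(records, fraction, seed=SEED, strata=STRATA):
    """Draw about `fraction` of each stratum, keeping at least one record per stratum.

    `fraction` and the at-least-one rule are hypothetical policy choices.
    The result is reproducible for a fixed seed and a fixed input order.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(lookup(rec, k) for k in strata)
        groups[key].append(rec)

    rng = random.Random(seed)  # private generator: global random state is untouched
    sample = []
    for members in groups.values():  # dict preserves first-seen stratum order
        k = min(len(members), max(1, round(fraction * len(members))))
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    rows = [
        {"id": i, "device": dev, "region": "eu", "quality": {"flags": []}}
        for i, dev in enumerate("aabbbb")
    ]
    picked = stratified_sample(rows, fraction=0.5)
    print(len(picked), sorted(r["id"] for r in picked))
```

Using a dedicated `random.Random(seed)` instance keeps the draw reproducible even if other code consumes the global generator; reordering the input records, however, changes the sample.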

B. preprocess_spec.yaml

version: "1.0.0"
missing: { numeric: "null", route_to: "quality.flags" }
normalize: { mean: "μ_train", std: "σ_train" }
path_align: { require: true, delta_form: "general", enforce_delta_ell: true }
filters:
  - name: "window_guard"
    rule: "drop if ts ∉ [ts_start, ts_end]"
audits: { write_to: "reports/audit.jsonl" }
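A sketch of how a pipeline might apply this spec, under several assumptions: records are dicts with a `ts` field, a single numeric column is processed, `quality.flags` is a list, and the steps run in the order window_guard, missing-value routing, normalization (the spec lists them but does not fix an order). The flag text `missing:<field>`, the function names, and the use of the population standard deviation for σ_train are all hypothetical choices.

```python
import json
import math
import os
from statistics import mean, pstdev


def fit_train_stats(train_values):
    """μ_train, σ_train from the training split only; missing values are ignored.

    Population std (pstdev) is an assumption; the spec does not name an estimator.
    Raises statistics.StatisticsError if no non-missing value exists.
    """
    vals = [v for v in train_values if v is not None]
    return mean(vals), pstdev(vals)


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess(records, field, ts_start, ts_end, mu, sigma, audit):
    """Apply window_guard, then missing-value routing, then normalization.

    `field`, the record layout, and the "missing:<field>" flag text are hypothetical.
    Each drop or flag decision is appended to `audit` (a list of dicts).
    """
    out = []
    for rec in records:
        ts = rec.get("ts")
        # filters.window_guard: drop if ts ∉ [ts_start, ts_end]
        if ts is None or not (ts_start <= ts <= ts_end):
            audit.append({"id": rec.get("id"), "action": "drop", "filter": "window_guard"})
            continue
        rec = dict(rec)  # shallow copy so the caller's record is not mutated
        value = rec.get(field)
        if is_missing(value):
            # missing: numeric -> null, route_to quality.flags
            rec[field] = None
            quality = dict(rec.get("quality") or {})
            quality["flags"] = list(quality.get("flags", [])) + [f"missing:{field}"]
            rec["quality"] = quality
            audit.append({"id": rec.get("id"), "action": "flag", "reason": f"missing:{field}"})
        else:
            # normalize: z-score with training-split statistics
            rec[field] = (value - mu) / sigma if sigma > 0 else 0.0
        out.append(rec)
    return out


def write_audit(audit, path="reports/audit.jsonl"):
    """audits.write_to: append one JSON object per line (JSON Lines)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        for entry in audit:
            fh.write(json.dumps(entry, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    mu, sigma = fit_train_stats([1.0, 3.0, None])
    log = []
    rows = [{"id": 1, "ts": 5, "x": 3.0}, {"id": 2, "ts": 50, "x": 1.0}, {"id": 3, "ts": 6, "x": None}]
    print(preprocess(rows, "x", 0, 10, mu, sigma, log))
    print(log)
```

Fitting μ_train and σ_train only on the training split, then reusing them for validation and test data, avoids leaking evaluation statistics into training.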


C. lineage_graph.json (excerpt)

{
  "nodes": [
    { "id": "RAW-telemetry", "version": "1.0.0", "checksum": "sha256:..." },
    { "id": "CAL-telemetry", "version": "1.0.1", "checksum": "sha256:..." },
    { "id": "DER-train", "version": "1.0.0", "checksum": "sha256:..." }
  ],
  "edges": [
    { "from": "RAW-telemetry", "to": "CAL-telemetry", "type": "calibrate" },
    { "from": "CAL-telemetry", "to": "DER-train", "type": "derive" }
  ]
}
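One way to use this graph for traceability is to walk upstream from a derived artifact and to confirm that the lineage is acyclic. The sketch below embeds the excerpt verbatim (the `sha256:...` placeholders are left as-is) and uses Kahn's algorithm for the topological order; the function names and the rule that every node must carry a version and a checksum are illustrative assumptions.

```python
import json
from collections import defaultdict, deque

# The excerpt above, verbatim ("sha256:..." placeholders left as-is).
LINEAGE = """
{
  "nodes": [
    { "id": "RAW-telemetry", "version": "1.0.0", "checksum": "sha256:..." },
    { "id": "CAL-telemetry", "version": "1.0.1", "checksum": "sha256:..." },
    { "id": "DER-train", "version": "1.0.0", "checksum": "sha256:..." }
  ],
  "edges": [
    { "from": "RAW-telemetry", "to": "CAL-telemetry", "type": "calibrate" },
    { "from": "CAL-telemetry", "to": "DER-train", "type": "derive" }
  ]
}
"""


def load_graph(text):
    """Parse the graph, require version/checksum on every node, and index parents."""
    graph = json.loads(text)
    ids = set()
    for node in graph["nodes"]:
        if not node.get("version") or not node.get("checksum"):
            raise ValueError(f"node {node.get('id')} lacks version or checksum")
        ids.add(node["id"])
    parents = defaultdict(list)
    for edge in graph["edges"]:
        if edge["from"] not in ids or edge["to"] not in ids:
            raise ValueError(f"edge references an unknown node: {edge}")
        parents[edge["to"]].append((edge["from"], edge["type"]))
    return graph, parents


def trace(node_id, parents):
    """Breadth-first walk upstream; returns (parent, edge_type, child) triples, nearest first."""
    steps, queue, seen = [], deque([node_id]), {node_id}
    while queue:
        child = queue.popleft()
        for parent, etype in parents.get(child, []):
            steps.append((parent, etype, child))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return steps


def topo_order(graph):
    """Kahn's algorithm; raises ValueError if the lineage contains a cycle."""
    indeg = {n["id"]: 0 for n in graph["nodes"]}
    children = defaultdict(list)
    for edge in graph["edges"]:
        children[edge["from"]].append(edge["to"])
        indeg[edge["to"]] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != len(indeg):
        raise ValueError("lineage graph contains a cycle")
    return order


if __name__ == "__main__":
    graph, parents = load_graph(LINEAGE)
    for parent, etype, child in trace("DER-train", parents):
        print(f"{child} <-{etype}- {parent}")
    print(topo_order(graph))  # ['RAW-telemetry', 'CAL-telemetry', 'DER-train']
```

For this excerpt, tracing `DER-train` yields the `derive` edge from `CAL-telemetry` followed by the `calibrate` edge from `RAW-telemetry`.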

X. Anti-Patterns & Fixes


XI. Cross-References


XII. Checklist


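Copyright & License (CC BY 4.0)

Copyright: Unless otherwise stated, the copyright in Energy Filament Theory (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author, Mr. 屠广林 (Tu Guanglin).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Suggested attribution: Author: 屠广林; Work: Energy Filament Theory (《能量丝理论》); Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/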