Technical White Paper 52 - Dataset Card Template v1.0

Chapter 12 Appendix (Forms/Checklists/Templates)


I. Purpose & Scope


II. Recommended Directory Structure (DS_EXPORT/ Layout)

DS_EXPORT/
  figs/
  tables/
  reports/
  manifests/
  schemas/
  contracts/
  splits/
  policies/
  benchmarks/
  SIGNATURE.asc


III. Structure & Contract

A. schemas/dataset/schema.json (minimal structure)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Dataset v1.0.0 (structure)",
  "type": "object",
  "required": [ "record_id", "acq", "path", "medium", "ref", "see", "version" ],
  "properties": {
    "record_id": { "type": "string" },
    "acq": {
      "type": "object",
      "required": [ "ts_start", "ts_end" ],
      "properties": {
        "ts_start": { "type": "string", "format": "date-time" },
        "ts_end": { "type": "string", "format": "date-time" }
      }
    },
    "path": {
      "type": "object",
      "required": [ "gamma_ell", "d_ell" ],
      "properties": {
        "gamma_ell": { "type": "array", "items": { "type": "number" }, "minItems": 2 },
        "d_ell": { "type": "array", "items": { "type": "number" }, "minItems": 2 }
      }
    },
    "medium": {
      "type": "object",
      "required": [ "n_eff_profile" ],
      "properties": {
        "n_eff_profile": { "type": "array", "items": { "type": "number" }, "minItems": 2 }
      }
    },
    "ref": {
      "type": "object",
      "properties": { "c_ref": { "type": "number" }, "lambda_ref": { "type": "number" } }
    },
    "see": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
    "version": { "type": "string" }
  }
}
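The required-field lists in the schema above can be checked with a few lines of standard-library Python. This is only a minimal sketch, not a full JSON Schema validator: the helper name `missing_fields` and the sample record are hypothetical, and types, formats, and `minItems` are not checked here.

```python
# Required keys copied from the "required" lists in schema.json.
REQUIRED_TOP = ["record_id", "acq", "path", "medium", "ref", "see", "version"]
REQUIRED_NESTED = {
    "acq": ["ts_start", "ts_end"],
    "path": ["gamma_ell", "d_ell"],
    "medium": ["n_eff_profile"],
}


def missing_fields(record: dict) -> list:
    """Return dotted names of required fields absent from `record` (hypothetical helper)."""
    missing = [key for key in REQUIRED_TOP if key not in record]
    for parent, children in REQUIRED_NESTED.items():
        block = record.get(parent)
        if isinstance(block, dict):
            missing += [f"{parent}.{child}" for child in children if child not in block]
    return missing


# Hypothetical record that omits "version" and "acq.ts_end".
record = {
    "record_id": "r-001",
    "acq": {"ts_start": "2025-09-24T16:00:00Z"},
    "path": {"gamma_ell": [0.0, 1.0], "d_ell": [0.5, 0.5]},
    "medium": {"n_eff_profile": [1.45, 1.46]},
    "ref": {},
    "see": ["EFT.WP.Core.Equations v1.1:S20-1"],
}
print(missing_fields(record))
```

An empty result would correspond to gate G1 (`schema_required`) passing for the required-field part of the structure.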

B. contracts/contract.yaml (path block and coverage)

version: "1.0.0"
units: { T_arr: "s", Phi: "rad", c_ref: "m/s", lambda_ref: "m" }
path:
  required: true
  gamma: "gamma(ell)"
  measure: "d ell"
  delta_form: "general" # or "factored"
missing:
  numeric: "null"
  reason_to: "quality.flags"
coverage:
  mode: "k" # k | alpha | quantile
  k: 2


IV. Splits/Versioning/Freshness

A. splits/split.yaml

version: "1.0.0"
seed: 20250924
strategy:
  group_by: ["entity_id"]
  time_ordered: true
splits: { train: 0.70, val: 0.15, test: 0.15 }
constraints:
  leakage: { time: { enforce: true }, entity: { enforce: true } }
path:
  require_alignment: true
  delta_form: "general"
coverage: { mode: "k", k: 2 }
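A grouped, time-ordered split along the lines of split.yaml could be sketched as follows. Everything beyond the config keys is an assumption: the function name `assign_groups`, the `ts` field (any sortable timestamp), and the rule of ordering entities by first appearance. The sketch keeps each `entity_id` in exactly one split, which addresses entity leakage only; time leakage needs a separate check, and `seed` is unused because the ordering is deterministic.

```python
from collections import Counter


def assign_groups(records, fractions):
    """Map each entity_id to a split name, earliest-seen entities first (hypothetical helper)."""
    first_seen = {}
    for rec in records:
        eid, ts = rec["entity_id"], rec["ts"]
        first_seen[eid] = min(ts, first_seen.get(eid, ts))
    # Sort entities by first timestamp; the id breaks ties deterministically.
    ordered = sorted(first_seen, key=lambda eid: (first_seen[eid], eid))

    assignment, start, cumulative = {}, 0, 0.0
    names = list(fractions)
    for index, name in enumerate(names):
        cumulative += fractions[name]
        # The last split takes the remainder so every entity is assigned.
        is_last = index == len(names) - 1
        end = len(ordered) if is_last else round(cumulative * len(ordered))
        for eid in ordered[start:end]:
            assignment[eid] = name
        start = end
    return assignment


# Hypothetical data: 20 entities, two records each, ts given as an integer.
records = [{"entity_id": f"e{i:02d}", "ts": i} for i in range(20) for _ in range(2)]
assignment = assign_groups(records, {"train": 0.70, "val": 0.15, "test": 0.15})
print(Counter(assignment.values()))
```

With 20 entities the fractions 0.70/0.15/0.15 resolve to 14, 3, and 3 entities. Record-level counts, such as those in a split manifest, follow from how many records each entity contributes.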

B. splits/split_manifest.json (excerpt)

{
  "dataset_version": "1.2.0",
  "splits": {
    "train": { "count": 120345, "checksum": "sha256:..." },
    "val": { "count": 25780, "checksum": "sha256:..." },
    "test": { "count": 25812, "checksum": "sha256:..." }
  },
  "slices": { "low_snr": { "count": 8142, "rule": "snr<5" } },
  "freshness": {
    "valid_from": "2025-09-01T00:00:00Z",
    "valid_to": "2026-03-01T00:00:00Z",
    "policy": { "tau_calib_s_max": 86400, "clock_state": "locked" }
  }
}
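The `freshness` block can be evaluated mechanically. The sketch below rests on several assumptions that the manifest does not state: `valid_to` is treated as exclusive, `tau_calib_s_max` is read as the maximum allowed age of the last calibration in seconds, and the names `parse_utc`, `is_fresh`, and `calib_age_s` are hypothetical.

```python
from datetime import datetime, timezone

FRESHNESS = {
    "valid_from": "2025-09-01T00:00:00Z",
    "valid_to": "2026-03-01T00:00:00Z",
    "policy": {"tau_calib_s_max": 86400, "clock_state": "locked"},
}


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as timezone-aware UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_fresh(freshness: dict, now: datetime, calib_age_s: float, clock_state: str) -> bool:
    """True when `now` is inside the validity window and the policy holds."""
    policy = freshness["policy"]
    in_window = parse_utc(freshness["valid_from"]) <= now < parse_utc(freshness["valid_to"])
    calib_ok = calib_age_s <= policy["tau_calib_s_max"]
    clock_ok = clock_state == policy["clock_state"]
    return in_window and calib_ok and clock_ok


now = parse_utc("2025-09-24T16:00:00Z")
print(is_fresh(FRESHNESS, now, calib_age_s=3600, clock_state="locked"))
```

A `False` result here would correspond to the condition `freshness_expired or clock_state!=locked` in the gate rules.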


C. manifests/version_matrix.yaml (compatibility matrix)

dataset: "ds-core"
current: "1.2.0"
compatibility:
  "1.2.x": { api: ">=1.2,<2.0", schema: ">=1.2,<2.0" }
  "1.1.x": { api: ">=1.1,<1.3", schema: ">=1.1,<1.3" }
migration:
  from: "1.1.x"
  to: "1.2.x"
  steps:
    - "add slice 'low_snr'"
    - "add field quality.score_Q"
  rollback: { tag: "v1.1.3-lock" }
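Range strings such as ">=1.2,<2.0" in the compatibility matrix can be checked with a small comparison routine. This is a sketch that handles only the two operators appearing above (`>=` and `<`); the function names are hypothetical, and a production setup would more likely rely on an established version-parsing library.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.2' or '1.2.0' into a 3-tuple of ints, padding with zeros."""
    parts = tuple(int(piece) for piece in text.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a comma-separated spec using only '>=' and '<'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if candidate < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if candidate >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("1.2.0", ">=1.2,<2.0"))
print(satisfies("1.3.0", ">=1.1,<1.3"))
```

The first call accepts version 1.2.0 for the "1.2.x" row; the second rejects 1.3.0 for the "1.1.x" row because the upper bound is exclusive.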


V. QC Gates & Validation

A. manifests/gate_rules.yaml

version: "1.0.0"
gates:
  G1: { schema_required: true }
  G2: { anchor_coverage_min: 0.90, forbid_external_links: true }
  G3: { path_required: true, min_samples: 2, delta_form: ["general","factored"], delta_ell_guard: "c_ref/fs/max(n_eff)" }
  G4: { require_dim_check: true, p_dim: 1.0 }
  G5: { tau_calib_s_max: 86400, clock_state: "locked" }
  G6: { coverage_allowed: ["k","alpha","quantile"] }
  G7: { cov_pd: true, kernel_allowed: ["exp","matern","ar1","const"] }
  G8: { unique_record_id: true, unique_checksum: true, lineage_acyclic: true }
stops:
  S1: "dim_check_fail or p_dim<1"
  S2: "freshness_expired or clock_state!=locked"
  S3: "path_block_missing or delta_ell_violate"
  S4: "covariance_not_pd or cov_model_mismatch"
  S5: "anchor_coverage_below_min or external_link_found"
labels: { restricted: "[Restricted]" }
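The `delta_ell_guard` expression in G3 can be evaluated numerically. The sketch below makes two assumptions that the rules file does not state: `fs` is a sampling rate in Hz, and the guard is an upper bound on every path step in `d_ell`. The function names and the sample values are hypothetical.

```python
def delta_ell_guard(c_ref: float, fs: float, n_eff: list) -> float:
    """Evaluate c_ref / fs / max(n_eff), in metres when c_ref is in m/s and fs in Hz."""
    return c_ref / fs / max(n_eff)


def delta_ell_violated(d_ell: list, guard: float) -> bool:
    """True when any path step exceeds the guard (assumed reading of delta_ell_violate)."""
    return any(step > guard for step in d_ell)


# Hypothetical inputs: vacuum light speed, 1 GHz sampling, two-sample n_eff profile.
guard = delta_ell_guard(c_ref=299792458.0, fs=1.0e9, n_eff=[1.45, 1.46])
print(guard)
print(delta_ell_violated([0.10, 0.20], guard))
print(delta_ell_violated([0.10, 0.30], guard))
```

Under these assumptions the guard is about 0.205 m, so a 0.30 m step would trigger stop S3 while steps of 0.10 m and 0.20 m would not.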

B. reports/validate_report.json (example)

{
  "dataset_id": "ds-core",
  "timestamp": "2025-09-24T16:00:00Z",
  "global": { "G1": true, "G2": 0.94, "G3": true, "G4": true, "G5": true, "G6": true, "G7": true, "G8": true },
  "stops_triggered": [],
  "links": { "check_dim_report": "reports/check_dim_report.json", "audit": "reports/audit.jsonl" }
}


VI. Uncertainty & Covariance (UQ & Covariance)

A. policies/dataset_uq.yaml

version: "1.0.0"
targets: ["T_arr","Phi","epsilon_flux","Q_res","p_dim"]
methods:
  T_arr: { type: "delta", jacobian: "auto", cov_group: "medium" }
  Phi: { type: "mc", draws: 10000, coverage: { quantile: [0.025,0.975] } }
covariance:
  medium: { kernel: "exp", params: { sigma2: 9.0e-6, L_c_m: 25.0 } }
coverage: { mode: "k", k: 2 }
split_scope: "per_split"
freshness: { policy: { tau_calib_s_max: 86400, clock_state: "locked" } }
outputs: { attach: ["uq_summary.json","cov_blocks.json"] }
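The `medium` covariance entry can be turned into a matrix and tested for positive definiteness, as gate G7 (`cov_pd`) requires. The sketch assumes that kernel "exp" means C_ij = sigma2 * exp(-|x_i - x_j| / L_c_m) with positions in metres; the policy file does not spell out the form. The sample positions and function names are hypothetical, and the check is a plain Cholesky attempt in pure Python.

```python
import math


def exp_kernel_cov(positions_m, sigma2, L_c_m):
    """Covariance matrix for the assumed form sigma2 * exp(-|xi - xj| / L_c_m)."""
    return [[sigma2 * math.exp(-abs(a - b) / L_c_m) for b in positions_m]
            for a in positions_m]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorisation; a non-positive pivot means not PD."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Hypothetical sample positions along the path, in metres.
cov = exp_kernel_cov([0.0, 10.0, 20.0, 30.0], sigma2=9.0e-6, L_c_m=25.0)
print(is_positive_definite(cov))
```

A failed factorisation would map to the `covariance_not_pd` condition in the gate rules.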

B. reports/uq_summary.json (example)

{
  "split": "test",
  "T_arr": { "point": 1.23e-8, "U_k2": 1.5e-9 },
  "Phi": { "median": 0.035, "q025": 0.028, "q975": 0.043 },
  "epsilon_flux": { "p95": 0.011 },
  "Q_res": 0.13
}


VII. Bias/Ethics/Privacy

A. policies/privacy_policy.yaml

version: "1.0.0"
deid: { techniques: ["hash","mask","generalize"], k_anonymity: 10, l_diversity: 2, t_closeness: 0.2 }
access_control:
  roles: { reader: ["get"], publisher: ["get","export"], admin: ["get","export","write"] }
retention: { policy_days: 365 }
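The `k_anonymity: 10` setting can be verified by counting how many rows share each combination of quasi-identifier values. The sketch below is illustrative only: the column names `age_band` and `site`, the sample rows, and the function names are hypothetical, and l-diversity and t-closeness are not covered.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifier columns."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values(), default=0)


def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical de-identified rows: one class of 10 and one class of 3.
rows = (
    [{"age_band": "30-39", "site": "lab-A"}] * 10
    + [{"age_band": "40-49", "site": "lab-B"}] * 3
)
print(smallest_group(rows, ["age_band", "site"]))
print(satisfies_k_anonymity(rows, ["age_band", "site"], k=10))
```

Here the class of three rows falls below k = 10, so those rows would need further generalization or suppression before release.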

B. reports/bias_report.md (outline)

# Bias Report

- Stratified coverage + CIs

- Measurement bias: δt_abs/Δτ_ch/σ_y(τ)/n_eff residuals

- Labeling consistency: κ/MAE/DTW

- High-risk slices & mitigation


C. docs/ethics.md (outline)

# Ethics Statement

- Consent & purpose limitation

- Minimization & de-identification

- Governance roles & escalation

- Third-party license terms


VIII. Benchmarks & Scoring (Bench/Score)

A. benchmarks/bench_plan.yaml

version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s","Q_res","p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi","epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025,0.975] }
baseline: { id: "base-001", version: "1.2.3" }
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
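The `weights` entry suggests a weighted aggregate score. The sketch below assumes that each metric has already been mapped to a normalized score in [0, 1] with higher meaning better; the plan does not define that mapping, so the function `q_score` and its inputs are hypothetical and the result is not claimed to reproduce any published Q value.

```python
import math

# Weights copied from bench_plan.yaml.
WEIGHTS = {
    "DeltaT_arr_s": 0.35,
    "r_phi": 0.25,
    "epsilon_flux": 0.15,
    "p_dim": 0.15,
    "Q_res": 0.10,
}


def q_score(normalized: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-metric scores that are assumed to lie in [0, 1]."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    if set(normalized) != set(weights):
        raise ValueError("metric names must match the weight keys")
    return sum(weights[name] * normalized[name] for name in weights)


# Sanity check with uniform hypothetical scores: the result equals the common score.
print(q_score({name: 0.5 for name in WEIGHTS}))
```

Checking that the weights sum to 1 keeps the aggregate on the same [0, 1] scale as the inputs, which makes a pass threshold easier to interpret.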

B. tables/scorecard.csv (header)

split,DeltaT_arr_s_mean,DeltaT_arr_s_Uk2,r_phi_lb95,r_phi_ub95,epsilon_flux_p95,p_dim,Q_res,Q_score


C. benchmarks/scorecard.json (example)

{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "ds-core", "Q": 0.78 },
  "weights": { "DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.1 },
  "decision": "pass",
  "see": [ "EFT.WP.Core.Equations v1.1:S20-1", "Data.Benchmarks v1.0:PROTO" ]
}

IX. Provenance & Lineage

A. manifests/provenance.yaml

version: "1.0.0"
source: { id: "SRC-obs-labA-2025Q3", type: "instrument", license: "CC-BY-4.0" }
instrument: { make: "Acme", model: "DPO-7k", serial: "SN123456", firmware: "v2.1.3" }
calibration:
  calib_run_id: "CAL2025-09-24-01"
  clock_state: "locked"
  sigma_y_1s: 1.1e-11
  delta_t_abs_ns: 18
  delta_tau_ch_ns: 2

B. manifests/lineage_graph.json

{
  "nodes": [
    { "id": "RAW-telemetry", "version": "1.0.0", "checksum": "sha256:..." },
    { "id": "CAL-telemetry", "version": "1.0.1", "checksum": "sha256:..." },
    { "id": "DER-features", "version": "1.0.0", "checksum": "sha256:..." }
  ],
  "edges": [
    { "from": "RAW-telemetry", "to": "CAL-telemetry", "type": "calibrate" },
    { "from": "CAL-telemetry", "to": "DER-features", "type": "derive" }
  ],
  "meta": { "generated_at": "2025-09-24T16:00:00Z" }
}
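Gate G8 requires `lineage_acyclic`. One way to test a graph in the nodes/edges shape above is Kahn's topological sort, sketched here; the function name `is_acyclic` is hypothetical, and the sketch assumes every edge endpoint appears in `nodes`.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed in order."""
    indegree = {node["id"]: 0 for node in nodes}
    children = {node["id"]: [] for node in nodes}
    for edge in edges:
        children[edge["from"]].append(edge["to"])
        indegree[edge["to"]] += 1

    queue = deque(name for name, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(indegree)


nodes = [{"id": "RAW-telemetry"}, {"id": "CAL-telemetry"}, {"id": "DER-features"}]
edges = [
    {"from": "RAW-telemetry", "to": "CAL-telemetry", "type": "calibrate"},
    {"from": "CAL-telemetry", "to": "DER-features", "type": "derive"},
]
print(is_acyclic(nodes, edges))
```

Adding an edge from DER-features back to RAW-telemetry would leave no node with zero in-degree, and the function would return `False`.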


C. reports/audit.jsonl (example line)

{
  "ts": "2025-09-24T16:05:00Z",
  "event": "acquire",
  "source_id": "SRC-obs-labA-2025Q3",
  "window": "2025-09-24T16:00:00Z/16:01:00Z",
  "input_hashes": [ "sha256:..." ],
  "user": "collector",
  "signature": "PGP:...",
  "checksum": "sha256:..."
}

X. Manifests & Release

A. manifests/report_manifest.yaml

version: "1.0.0"
bundle:
  figs:
    - "figs/scale_dist.pdf"
    - "figs/path_profile.pdf"
    - "figs/scorecard_bar.pdf"
  tables:
    - "tables/kpi_summary.csv"
    - "tables/scorecard.csv"
  reports:
    - "reports/check_dim_report.json"
    - "reports/validate_report.json"
    - "reports/audit.jsonl"
metadata:
  dataset_id: "ds-core"
  method_version: "2.0.0"
  created_at: "2025-09-24T16:00:00Z"
checksums:
  schema: "sha256:..."
  contract: "sha256:..."
  splits: "sha256:..."
sign: "SIGNATURE.asc"
see:
  - "EFT.WP.Core.Metrology v1.0:check_dim"
  - "EFT.WP.Core.Equations v1.1:S20-1"
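The `checksums` entries can be produced with Python's `hashlib`. The manifest elides the digest values, so the exact string format is an assumption here: the sketch emits "sha256:" followed by the lowercase hex digest. The function names are hypothetical.

```python
import hashlib


def sha256_tag(data: bytes) -> str:
    """Return 'sha256:<hex digest>' for in-memory bytes (assumed tag format)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 so large exports need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


print(sha256_tag(b""))
```

Calling `file_checksum` on the schema, contract, and split files would yield the three values for the `checksums` block, which can then be compared at import time to detect modified files.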


XI. Tables & KPIs

A. tables/kpi_summary.csv (header)

split,Latency_P95_s,Throughput_rps,p_dim,epsilon_flux_p95,Q_res,allan_1s,delta_t_abs_ns,delta_tau_ch_ns


XII. Normative Path Forms

The text states the path and the measure explicitly; the data records delta_form; the path arrays satisfy len(gamma_ell) = len(d_ell) = len(n_eff_profile) ≥ 2.
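The array-length rule can be expressed as a short check. The sketch assumes the three arrays live at `path.gamma_ell`, `path.d_ell`, and `medium.n_eff_profile`, as in schema.json; the function name and sample records are hypothetical.

```python
def path_arrays_aligned(record: dict) -> bool:
    """True when the three path arrays share one length and that length is at least 2."""
    lengths = {
        len(record["path"]["gamma_ell"]),
        len(record["path"]["d_ell"]),
        len(record["medium"]["n_eff_profile"]),
    }
    return len(lengths) == 1 and min(lengths) >= 2


good = {
    "path": {"gamma_ell": [0.0, 1.0, 2.0], "d_ell": [1.0, 1.0, 1.0]},
    "medium": {"n_eff_profile": [1.45, 1.45, 1.46]},
}
bad = {
    "path": {"gamma_ell": [0.0, 1.0, 2.0], "d_ell": [1.0, 1.0]},
    "medium": {"n_eff_profile": [1.45, 1.45, 1.46]},
}
print(path_arrays_aligned(good), path_arrays_aligned(bad))
```

The JSON Schema `minItems: 2` constraint covers the lower bound per array, but equal lengths across arrays are not expressible there, which is why a separate check of this kind is useful.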


XIII. Execution Checklist


Copyright & License (CC BY 4.0)

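Copyright notice: Unless otherwise stated, the copyright in "Energy Filament Theory" (including text, charts, illustrations, symbols, and formulas) belongs to the author (Mr. "屠广林").
License: This work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0). Copying, reposting, excerpting, adaptation, and redistribution are permitted for commercial or non-commercial purposes, provided the author and source are credited.
Attribution format (suggested): Author: "屠广林"; Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.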

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/