52 - Dataset Card Template v1.0
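I. Purpose & Scope
- Collects the forms, manifests, and templates needed to execute this volume, covering structure/contract, splits/versioning and freshness, quality gates and validation, uncertainty and covariance, bias/ethics/privacy, benchmarks and scoring, and release artifacts.
- Any template involving path quantities (arrival time/phase) must state gamma(ell) and the measure d ell explicitly in the text, and record delta_form ∈ {general, factored} on the data side; the parenthesized form is used throughout, and a release requires p_dim = 1.0 with check_dim_report.json attached.
II. Recommended Directory Structure (DS_EXPORT/ Layout)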
DS_EXPORT/
  figs/
  tables/
  reports/
  manifests/
  schemas/
  contracts/
  splits/
  policies/
  benchmarks/
  SIGNATURE.asc
III. Structure & Contract
A. schemas/dataset/schema.json (minimal structure)
{
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Dataset v1.0.0 (structure)",
"type":"object",
"required":["record_id","acq","path","medium","ref","see","version"],
"properties":{
"record_id":{"type":"string"},
"acq":{"type":"object","required":["ts_start","ts_end"],
"properties":{"ts_start":{"type":"string","format":"date-time"},"ts_end":{"type":"string","format":"date-time"}}},
"path":{"type":"object","required":["gamma_ell","d_ell"],
"properties":{"gamma_ell":{"type":"array","items":{"type":"number"},"minItems":2},
"d_ell":{"type":"array","items":{"type":"number"},"minItems":2}}},
"medium":{"type":"object","required":["n_eff_profile"],
"properties":{"n_eff_profile":{"type":"array","items":{"type":"number"},"minItems":2}}},
"ref":{"type":"object","properties":{"c_ref":{"type":"number"},"lambda_ref":{"type":"number"}}},
"see":{"type":"array","items":{"type":"string"},"minItems":1},
"version":{"type":"string"}
}
}
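As a rough illustration of what the minimal structure above asks of a record, the sketch below checks the required top-level fields, the acq timestamps, and the array lengths using only the standard library. It is not a JSON Schema validator; the function name check_record is hypothetical, and the equal-length rule is taken from the path-array condition stated in Section XII rather than from schema.json itself.

```python
# Hypothetical helper: a hand-rolled subset of the checks implied by schema.json.
REQUIRED_TOP = ["record_id", "acq", "path", "medium", "ref", "see", "version"]

def check_record(rec):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP if k not in rec]
    if problems:
        return problems
    for k in ("ts_start", "ts_end"):
        if k not in rec["acq"]:
            problems.append(f"missing field: acq.{k}")
    arrays = {
        "path.gamma_ell": rec["path"].get("gamma_ell"),
        "path.d_ell": rec["path"].get("d_ell"),
        "medium.n_eff_profile": rec["medium"].get("n_eff_profile"),
    }
    bad_arrays = False
    for name, arr in arrays.items():
        if not isinstance(arr, list) or len(arr) < 2:
            problems.append(f"{name} must be an array with at least 2 items")
            bad_arrays = True
    # Equal lengths: len(gamma_ell) = len(d_ell) = len(n_eff), per Section XII.
    if not bad_arrays and len({len(a) for a in arrays.values()}) != 1:
        problems.append("gamma_ell, d_ell and n_eff_profile must have equal length")
    if not isinstance(rec["see"], list) or len(rec["see"]) < 1:
        problems.append("see must be a non-empty array")
    return problems
```

A full validation against the schema (formats such as date-time, item types) would need a proper JSON Schema implementation; this sketch only covers the structural minimum.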
B. contracts/contract.yaml (path block and coverage)
version: "1.0.0"
units: { T_arr: "s", Phi: "rad", c_ref: "m/s", lambda_ref: "m" }
path:
  required: true
  gamma: "gamma(ell)"
  measure: "d ell"
  delta_form: "general" # or "factored"
missing:
  numeric: "null"
  reason_to: "quality.flags"
coverage:
  mode: "k" # k | alpha | quantile
  k: 2
IV. Splits, Versioning & Freshness
A. splits/split.yaml
version: "1.0.0"
seed: 20250924
strategy:
  group_by: ["entity_id"]
  time_ordered: true
splits: { train: 0.70, val: 0.15, test: 0.15 }
constraints:
  leakage: { time: { enforce: true }, entity: { enforce: true } }
path:
  require_alignment: true
  delta_form: "general"
coverage: { mode: "k", k: 2 }
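One way to read group_by plus time_ordered is sketched below: whole entities are assigned to a split, in order of their earliest timestamp, so no entity appears in two splits. This is an assumption about the intended strategy, not the reference implementation. The record keys entity_id and ts and the function name grouped_time_split are illustrative, and because the ordering is deterministic the seed from split.yaml is not used here.

```python
def grouped_time_split(records, fractions=(0.70, 0.15, 0.15)):
    """Assign whole entities to train/val/test by earliest timestamp.

    records: list of dicts with 'entity_id' and a sortable 'ts'.
    Returns {"train": [...], "val": [...], "test": [...]}.
    """
    # Earliest timestamp seen for each entity.
    first_seen = {}
    for r in records:
        e = r["entity_id"]
        first_seen[e] = min(first_seen.get(e, r["ts"]), r["ts"])
    # Oldest entities go to train, newest to test; ties broken by entity id.
    entities = sorted(first_seen, key=lambda e: (first_seen[e], e))
    n = len(entities)
    cut1 = round(n * fractions[0])
    cut2 = round(n * (fractions[0] + fractions[1]))
    assign = {}
    for i, e in enumerate(entities):
        assign[e] = "train" if i < cut1 else "val" if i < cut2 else "test"
    out = {"train": [], "val": [], "test": []}
    for r in records:
        out[assign[r["entity_id"]]].append(r)
    return out
```

Grouping by entity rules out entity leakage by construction. Time leakage is only approximated, since a train entity may still have records later than a test entity's first record; a strict time constraint would need an additional cutoff per split.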
B. splits/split_manifest.json (excerpt)
{
"dataset_version":"1.2.0",
"splits":{
"train":{"count":120345,"checksum":"sha256:..."},
"val":{"count":25780,"checksum":"sha256:..."},
"test":{"count":25812,"checksum":"sha256:..."}
},
"slices":{"low_snr":{"count":8142,"rule":"snr<5"}},
"freshness":{
"valid_from":"2025-09-01T00:00:00Z",
"valid_to":"2026-03-01T00:00:00Z",
"policy":{"tau_calib_s_max":86400,"clock_state":"locked"}
}
}
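The freshness block above can be checked mechanically. The sketch below assumes that valid_from/valid_to bound the validity window inclusively, that tau_calib_s_max is the maximum allowed calibration age in seconds, and that clock_state must match the policy value exactly; these readings, and the names parse_utc and is_fresh, are assumptions for illustration.

```python
from datetime import datetime, timezone

def parse_utc(s):
    """Parse timestamps of the form 2025-09-01T00:00:00Z as UTC."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def is_fresh(freshness, now, calib_age_s, clock_state):
    """True when `now` is inside the window and the policy limits hold."""
    in_window = parse_utc(freshness["valid_from"]) <= now <= parse_utc(freshness["valid_to"])
    policy = freshness["policy"]
    return (in_window
            and calib_age_s <= policy["tau_calib_s_max"]
            and clock_state == policy["clock_state"])
```

A False result corresponds to the condition that stop S2 in gate_rules.yaml names ("freshness_expired or clock_state!=locked").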
C. manifests/version_matrix.yaml (compatibility matrix)
dataset: "ds-core"
current: "1.2.0"
compatibility:
  "1.2.x": { api: ">=1.2,<2.0", schema: ">=1.2,<2.0" }
  "1.1.x": { api: ">=1.1,<1.3", schema: ">=1.1,<1.3" }
migration:
  from: "1.1.x"
  to: "1.2.x"
  steps:
    - "add slice 'low_snr'"
    - "add field quality.score_Q"
  rollback: { tag: "v1.1.3-lock" }
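The range strings in the compatibility matrix (for example ">=1.2,<2.0") can be evaluated with a few lines of standard-library code. The sketch below assumes purely numeric dotted versions and comma-separated clauses that must all hold; the names parse_version and satisfies are hypothetical.

```python
import operator

# Two-character operators are listed first so ">=" is not read as ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        (">", operator.gt), ("<", operator.lt)]

def parse_version(v):
    """'1.2.0' -> (1, 2, 0); numeric components only."""
    return tuple(int(p) for p in v.strip().split("."))

def _pad(a, b):
    """Right-pad the shorter tuple with zeros so 1.2 compares equal to 1.2.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))

def satisfies(version, spec):
    """True when `version` meets every clause of a spec such as '>=1.2,<2.0'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for sym, fn in _OPS:
            if clause.startswith(sym):
                a, b = _pad(v, parse_version(clause[len(sym):]))
                if not fn(a, b):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

For instance, satisfies("1.2.0", ">=1.2,<2.0") holds, while a 1.3.0 consumer falls outside the ">=1.1,<1.3" range listed for "1.1.x".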
V. Quality Gates & Validation (QC Gates & Validate)
A. manifests/gate_rules.yaml
version: "1.0.0"
gates:
  G1: { schema_required: true }
  G2: { anchor_coverage_min: 0.90, forbid_external_links: true }
  G3: { path_required: true, min_samples: 2, delta_form: ["general","factored"], delta_ell_guard: "c_ref/fs/max(n_eff)" }
  G4: { require_dim_check: true, p_dim: 1.0 }
  G5: { tau_calib_s_max: 86400, clock_state: "locked" }
  G6: { coverage_allowed: ["k","alpha","quantile"] }
  G7: { cov_pd: true, kernel_allowed: ["exp","matern","ar1","const"] }
  G8: { unique_record_id: true, unique_checksum: true, lineage_acyclic: true }
stops:
  S1: "dim_check_fail or p_dim<1"
  S2: "freshness_expired or clock_state!=locked"
  S3: "path_block_missing or delta_ell_violate"
  S4: "covariance_not_pd or cov_model_mismatch"
  S5: "anchor_coverage_below_min or external_link_found"
labels: { restricted: "[Restricted]" }
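To make the stop conditions concrete, the sketch below evaluates the G3 guard expression c_ref/fs/max(n_eff) and maps a dictionary of findings to stop codes S1 to S5. Two points are assumptions rather than statements from the rules file: fs is taken to be a sampling rate in Hz, and a violation is taken to mean that some step in d_ell exceeds the guard. The function names and the state keys mirror the identifiers in the stop strings but are otherwise hypothetical.

```python
def delta_ell_guard(c_ref, fs, n_eff):
    """Evaluate the guard expression c_ref/fs/max(n_eff) from gate G3."""
    return c_ref / fs / max(n_eff)

def violates_guard(d_ell, guard):
    """Assumed reading: any path step larger than the guard is a violation."""
    return any(d > guard for d in d_ell)

def triggered_stops(state):
    """Return the stop codes whose condition is met, in S1..S5 order."""
    rules = [
        ("S1", state["dim_check_fail"] or state["p_dim"] < 1),
        ("S2", state["freshness_expired"] or state["clock_state"] != "locked"),
        ("S3", state["path_block_missing"] or state["delta_ell_violate"]),
        ("S4", state["covariance_not_pd"] or state["cov_model_mismatch"]),
        ("S5", state["anchor_coverage_below_min"] or state["external_link_found"]),
    ]
    return [code for code, hit in rules if hit]
```

An empty list from triggered_stops corresponds to "stops_triggered": [] in validate_report.json.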
B. reports/validate_report.json (example)
{
"dataset_id":"ds-core",
"timestamp":"2025-09-24T16:00:00Z",
"global":{"G1":true,"G2":0.94,"G3":true,"G4":true,"G5":true,"G6":true,"G7":true,"G8":true},
"stops_triggered":[],
"links":{"check_dim_report":"reports/check_dim_report.json","audit":"reports/audit.jsonl"}
}
VI. Uncertainty & Covariance (UQ & Covariance)
A. policies/dataset_uq.yaml
version: "1.0.0"
targets: ["T_arr","Phi","epsilon_flux","Q_res","p_dim"]
methods:
  T_arr: { type: "delta", jacobian: "auto", cov_group: "medium" }
  Phi: { type: "mc", draws: 10000, coverage: { quantile: [0.025,0.975] } }
covariance:
  medium: { kernel: "exp", params: { sigma2: 9.0e-6, L_c_m: 25.0 } }
coverage: { mode: "k", k: 2 }
split_scope: "per_split"
freshness: { policy: { tau_calib_s_max: 86400, clock_state: "locked" } }
outputs: { attach: ["uq_summary.json","cov_blocks.json"] }
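The covariance entry above names an "exp" kernel with parameters sigma2 and L_c_m, and gate G7 requires the covariance to be positive definite. The sketch below builds one covariance block and tests it with a Cholesky factorization, using only the standard library. The kernel form sigma2 * exp(-|Δ| / L_c_m) is an assumed reading of "exp", and the coverage mode "k" is read as the expanded uncertainty U = k * u; all function names are hypothetical.

```python
import math

def exp_kernel_cov(positions_m, sigma2, L_c_m):
    """Covariance block C[i][j] = sigma2 * exp(-|x_i - x_j| / L_c_m) (assumed form)."""
    return [[sigma2 * math.exp(-abs(a - b) / L_c_m) for b in positions_m]
            for a in positions_m]

def is_positive_definite(C):
    """Attempt a Cholesky factorization; it succeeds only for PD matrices."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def expanded_uncertainty(u_std, k=2):
    """Coverage mode 'k': expanded uncertainty U = k * u."""
    return k * u_std
```

With distinct sample positions the exponential kernel yields a positive definite block, so a False result usually points at duplicated positions or a mismatched model, which is the situation stop S4 is meant to catch.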
B. reports/uq_summary.json (example)
{
"split":"test",
"T_arr":{"point":1.23e-8,"U_k2":1.5e-9},
"Phi":{"median":0.035,"q025":0.028,"q975":0.043},
"epsilon_flux":{"p95":0.011},
"Q_res":0.13
}
VII. Bias, Ethics & Privacy
A. policies/privacy_policy.yaml
version: "1.0.0"
deid: { techniques: ["hash","mask","generalize"], k_anonymity: 10, l_diversity: 2, t_closeness: 0.2 }
access_control:
  roles: { reader: ["get"], publisher: ["get","export"], admin: ["get","export","write"] }
retention: { policy_days: 365 }
B. reports/bias_report.md (outline)
# Bias Report
- Stratified coverage + CIs
- Measurement bias: δt_abs/Δτ_ch/σ_y(τ)/n_eff residuals
- Labeling consistency: κ/MAE/DTW
- High-risk slices & mitigation
C. docs/ethics.md (outline)
# Ethics Statement
- Consent & purpose limitation
- Minimization & de-identification
- Governance roles & escalation
- Third-party license terms
VIII. Benchmarks & Scoring (Bench/Score)
A. benchmarks/bench_plan.yaml
version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s","Q_res","p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi","epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025,0.975] }
baseline: { id: "base-001", version: "1.2.3" }
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
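The weights in the plan sum to 1.0, which suggests a weighted combination of per-metric scores. The template does not say how raw metrics are normalized or how Q_score is formed, so the sketch below is only one possible reading: each metric is assumed to have already been mapped to a sub-score in [0, 1] (higher is better), and the function weighted_score is hypothetical.

```python
import math

# Weights copied from bench_plan.yaml.
WEIGHTS = {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15,
           "p_dim": 0.15, "Q_res": 0.10}

def weighted_score(subscores, weights=WEIGHTS):
    """Weighted sum of pre-normalized sub-scores (assumed to lie in [0, 1])."""
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    missing = sorted(set(weights) - set(subscores))
    if missing:
        raise KeyError(f"missing sub-scores: {missing}")
    return sum(weights[m] * subscores[m] for m in weights)
```

Checking the weight sum up front keeps an edited plan from silently rescaling the score.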
B. tables/scorecard.csv (header)
split,DeltaT_arr_s_mean,DeltaT_arr_s_Uk2,r_phi_lb95,r_phi_ub95,epsilon_flux_p95,p_dim,Q_res,Q_score
C. benchmarks/scorecard.json (example)
IX. Provenance & Lineage
A. manifests/provenance.yaml
version: "1.0.0"
source: { id: "SRC-obs-labA-2025Q3", type: "instrument", license: "CC-BY-4.0" }
instrument: { make: "Acme", model: "DPO-7k", serial: "SN123456", firmware: "v2.1.3" }
calibration:
  calib_run_id: "CAL2025-09-24-01"
  clock_state: "locked"
  sigma_y_1s: 1.1e-11
  delta_t_abs_ns: 18
  delta_tau_ch_ns: 2
B. manifests/lineage_graph.json
{
"nodes":[
{"id":"RAW-telemetry","version":"1.0.0","checksum":"sha256:..."},
{"id":"CAL-telemetry","version":"1.0.1","checksum":"sha256:..."},
{"id":"DER-features","version":"1.0.0","checksum":"sha256:..."}
],
"edges":[
{"from":"RAW-telemetry","to":"CAL-telemetry","type":"calibrate"},
{"from":"CAL-telemetry","to":"DER-features","type":"derive"}
],
"meta":{"generated_at":"2025-09-24T16:00:00Z"}
}
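Gate G8 requires lineage_acyclic: true. A lineage graph in the shape shown above can be checked with Kahn's topological-sort algorithm, as in the sketch below. The function name is_acyclic is hypothetical, and edges that reference unknown node ids are treated as an error rather than silently ignored.

```python
from collections import deque

def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed
    in order of zero remaining in-degree."""
    indeg = {n["id"]: 0 for n in nodes}
    succ = {n["id"]: [] for n in nodes}
    for e in edges:
        if e["from"] not in indeg or e["to"] not in indeg:
            raise KeyError(f"edge references unknown node: {e}")
        succ[e["from"]].append(e["to"])
        indeg[e["to"]] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return removed == len(indeg)
```

Applied to the example graph (RAW-telemetry to CAL-telemetry to DER-features) the check passes; adding an edge from DER-features back to RAW-telemetry would make it fail.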
C. reports/audit.jsonl (example line)
X. Manifests & Release
A. manifests/report_manifest.yaml
version: "1.0.0"
bundle:
  figs:
    - "figs/scale_dist.pdf"
    - "figs/path_profile.pdf"
    - "figs/scorecard_bar.pdf"
  tables:
    - "tables/kpi_summary.csv"
    - "tables/scorecard.csv"
  reports:
    - "reports/check_dim_report.json"
    - "reports/validate_report.json"
    - "reports/audit.jsonl"
metadata:
  dataset_id: "ds-core"
  method_version: "2.0.0"
  created_at: "2025-09-24T16:00:00Z"
checksums:
  schema: "sha256:..."
  contract: "sha256:..."
  splits: "sha256:..."
sign: "SIGNATURE.asc"
see:
  - "EFT.WP.Core.Metrology v1.0:check_dim"
  - "EFT.WP.Core.Equations v1.1:S20-1"
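The checksums entries use a "sha256:" prefix followed by a digest. A minimal sketch of producing and verifying such a tag with the standard library follows. It assumes the digest is the lowercase hex SHA-256 of the artifact's raw bytes; the template does not say whether any canonicalization is applied first, and the names sha256_tag and verify_tag are hypothetical.

```python
import hashlib

def sha256_tag(data: bytes) -> str:
    """Return 'sha256:<hex digest>' for the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

def verify_tag(data: bytes, expected_tag: str) -> bool:
    """True when the recomputed tag matches the recorded one."""
    return sha256_tag(data) == expected_tag
```

For a file on disk, the bytes would typically come from open(path, "rb").read(), or from a chunked read for large artifacts.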
XI. Result Tables & KPIs
A. tables/kpi_summary.csv (header)
split,Latency_P95_s,Throughput_rps,p_dim,epsilon_flux_p95,Q_res,allan_1s,delta_t_abs_ns,delta_tau_ch_ns
XII. Normative Path Forms
- Arrival time (two equivalent forms):
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
T_arr = ( ∫ ( n_eff / c_ref ) d ell )
- Accumulated phase:
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )
The text states the path and measure explicitly; the data side records delta_form; path arrays satisfy len(gamma_ell) = len(d_ell) = len(n_eff) ≥ 2.
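The two arrival-time forms and the phase form can be evaluated on discrete path arrays. The sketch below uses a plain weighted sum, treating each d_ell entry as the measure weight attached to the matching n_eff sample; that quadrature choice is an assumption. So is the mapping of "factored" to the form with 1/c_ref pulled outside the integral and "general" to the form with it inside. All function names are hypothetical.

```python
import math

def _check(n_eff, d_ell):
    # Path arrays must have equal length and at least 2 samples.
    if len(n_eff) != len(d_ell) or len(n_eff) < 2:
        raise ValueError("need len(n_eff) == len(d_ell) >= 2")

def t_arr_factored(n_eff, d_ell, c_ref):
    """T_arr = ( 1 / c_ref ) * ( sum n_eff * d_ell )."""
    _check(n_eff, d_ell)
    return (1.0 / c_ref) * sum(n * d for n, d in zip(n_eff, d_ell))

def t_arr_general(n_eff, d_ell, c_ref):
    """T_arr = ( sum ( n_eff / c_ref ) * d_ell )."""
    _check(n_eff, d_ell)
    return sum((n / c_ref) * d for n, d in zip(n_eff, d_ell))

def phi(n_eff, d_ell, lambda_ref):
    """Phi = ( 2*pi / lambda_ref ) * ( sum n_eff * d_ell )."""
    _check(n_eff, d_ell)
    return (2.0 * math.pi / lambda_ref) * sum(n * d for n, d in zip(n_eff, d_ell))
```

With a constant c_ref the two arrival-time functions agree up to floating-point rounding, which is the equivalence the two forms express.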
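XIII. Execution Checklist
- All templates are stored as this volume requires; see[]/references[]/version are compliant; the direct anchor coverage rate is ≥ 90%.
- Path-related templates state gamma/measure/delta_form explicitly; arrival time and phase use the unified parenthesized form; p_dim = 1.0.
- report_manifest.yaml, validate_report.json, check_dim_report.json, audit.jsonl, and the signature are all present.
- split.yaml/split_manifest.json, dataset_uq.yaml/uq_summary.json, and bench_plan.yaml/scorecard.json are consistent with the release conventions.
- The release directory DS_EXPORT/ is clearly organized and every artifact has a checksum; non-compliant items are labeled [Restricted] and presented qualitatively only.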
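Copyright and License (CC BY 4.0)
Copyright notice: Unless otherwise stated, the copyright of "Energy Filament Theory" (including text, charts, illustrations, symbols, and formulas) belongs to the author, Mr. 屠广林 (Tu Guanglin).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, copying, reposting, excerpting, adaptation, and redistribution are permitted for commercial or non-commercial purposes.
Suggested attribution: Author: 屠广林 (Tu Guanglin); Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/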