06-EFT.WP.Core.DataSpec v1.0
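I. Goals and Scope
- Provide standard use cases and copy-ready templates for assert_contract(ds; tests), covering uniqueness, not-null, range, regex, enum, cross-field, monotonicity, dimension checks, dual-form consistency, completeness, and drift.
- Use this volume's symbols and cross-volume anchors throughout: gamma(ell), d ell, T_arr, n_eff(x,t), c_ref, check_dim(expr), Trace, hash_sha256, signature, and so on.
II. TestSpec Structure (aligned with ConstraintSpec)
- Field set (minimum required):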
- kind : str ∈ {"unique","not_null","range","regex","enum_set","cross_field","referential","monotonic","dim_check","arrivaltime_dualform","custom"}
- expr : str (constraint expression)
- params : dict (optional)
- severity : str ∈ {"ERROR","WARN","INFO"}
- message : str
- see : list[str] (optional; cross-volume references S/P/M/I)
- Return convention (summary):
- summary = { passed:int, failed:int, severity_max:str, metrics:dict }
- violations = [ { kind, expr, sample:dict, msg } ]
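The field set above can be sketched as a small Python dataclass. This is only an illustrative reading of the structure: the class layout, the validation in `__post_init__`, and the ordering of `SEVERITIES` are assumptions, not the reference implementation of assert_contract.

```python
from dataclasses import dataclass, field
from typing import Any

# Allowed values copied from the field list above.
KINDS = {
    "unique", "not_null", "range", "regex", "enum_set", "cross_field",
    "referential", "monotonic", "dim_check", "arrivaltime_dualform", "custom",
}
SEVERITIES = ("INFO", "WARN", "ERROR")  # assumed order: least to most severe


@dataclass
class TestSpec:
    __test__ = False  # tells pytest this is not a test class

    kind: str
    expr: str
    severity: str
    message: str
    params: dict[str, Any] = field(default_factory=dict)  # optional
    see: list[str] = field(default_factory=list)          # optional

    def __post_init__(self) -> None:
        # Reject values outside the enumerations declared by the spec.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


spec = TestSpec(
    kind="unique",
    expr="unique(pid, seg_id)",
    severity="ERROR",
    message="pk (pid,seg_id) must be unique",
)
```

Validating at construction time means a misspelled kind or severity fails when the test pack is loaded rather than when it runs.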
III. General Contract Template Library (reusable fragments)
# U1 Primary key uniqueness
- kind: "unique"
expr: "unique(<k1>,<k2>,...)"
severity: "ERROR"
message: "primary key must be unique"
# N1 Not null
- kind: "not_null"
expr: "not_null(<field>)"
severity: "ERROR"
message: "<field> must be not null"
# R1 Numeric range
- kind: "range"
expr: "<field> in [min,max]"
params: { min: <min>, max: <max> }
severity: "ERROR"
message: "<field> out of range"
# G1 Regex match
- kind: "regex"
expr: "match(<field>, <pattern>)"
params: { pattern: "^EPSG:[0-9]+$" }
severity: "ERROR"
message: "<field> regex mismatch"
# E1 Enum set
- kind: "enum_set"
expr: "<field> in {<v1>,<v2>,...}"
severity: "ERROR"
message: "<field> not in enum set"
# C1 Cross-field relation
- kind: "cross_field"
expr: "<f_right> >= <f_left>"
severity: "ERROR"
message: "cross-field inequality violated"
# M1 Monotonicity (within group)
- kind: "monotonic"
expr: "groupby(<key>).monotonic(<field>, mode='nondecreasing')"
severity: "ERROR"
message: "<field> must be nondecreasing within group <key>"
# D1 Dimension check
- kind: "dim_check"
expr: "check_dim(<field>)=='<DIM>'"
severity: "ERROR"
message: "dimension of <field> must be <DIM>"
# Q1 Completeness threshold (custom gate)
- kind: "custom"
expr: "quality_metrics(ds).completeness >= p_min"
params: { p_min: 0.98 }
severity: "ERROR"
message: "dataset completeness below threshold"
# H1 Fingerprint and signature present
- kind: "regex"
expr: "match(hash_sha256, '^[a-f0-9]{64}$')"
severity: "ERROR"
message: "invalid sha256 checksum format"
# F1 Foreign key / reference check
- kind: "referential"
expr: "exists_in(parameters, c_ref_ref)"
severity: "ERROR"
message: "c_ref_ref not resolvable in Core.Parameters"
# K1 Drift gate (custom)
- kind: "custom"
expr: "monitor_drift(ds_ref, ds_new, ['<field1>','<field2>'], method='KL').score <= d_max"
params: { d_max: 0.02 }
severity: "WARN"
message: "drift exceeds recommended band"
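To make the semantics of a few templates concrete, the following sketch evaluates U1, N1, R1, G1, and M1 over rows held as plain dictionaries. The function names, the row representation, and the choice to let null values pass the range and regex checks (leaving them to not_null) are all assumptions for illustration; the source does not specify how assert_contract evaluates each kind.

```python
import re
from collections import Counter


def check_unique(rows, keys):
    """U1: return every row whose key tuple occurs more than once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] > 1]


def check_not_null(rows, field):
    """N1: return rows where the field is missing or None."""
    return [r for r in rows if r.get(field) is None]


def check_range(rows, field, lo=None, hi=None):
    """R1: return rows outside [lo, hi]; a bound of None is unbounded."""
    bad = []
    for r in rows:
        v = r.get(field)
        if v is None:
            continue  # nulls are left to the not_null check
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            bad.append(r)
    return bad


def check_regex(rows, field, pattern):
    """G1: return rows whose non-null field value does not match."""
    rx = re.compile(pattern)
    return [r for r in rows
            if r.get(field) is not None and not rx.match(str(r[field]))]


def check_monotonic(rows, key, field):
    """M1: return rows that decrease relative to the previous row of the
    same group (mode='nondecreasing'), in the order rows are given."""
    last, bad = {}, []
    for r in rows:
        g, v = r[key], r[field]
        if g in last and v < last[g]:
            bad.append(r)
        last[g] = v
    return bad


# Hypothetical sample rows, made up for this illustration.
rows = [
    {"pid": "p1", "seg_id": 1, "ts": 10, "CRS": "EPSG:4326", "n_eff_mean": 1.40},
    {"pid": "p1", "seg_id": 2, "ts": 9, "CRS": "EPSG:4326", "n_eff_mean": 3.62},
    {"pid": "p2", "seg_id": 1, "ts": 5, "CRS": "WGS84", "n_eff_mean": None},
    {"pid": "p2", "seg_id": 1, "ts": 6, "CRS": "EPSG:3857", "n_eff_mean": 1.00},
]
duplicates = check_unique(rows, ["pid", "seg_id"])
```

Each checker returns the offending rows rather than a boolean, so a caller can pick one of them as the `sample` of a violation record.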
IV. Dedicated Templates for the Two Arrival-Time Forms
- Physical definitions (text)
- T_arr_const = ( 1 / c_ref_value ) * ( ∫_gamma n_eff d ell )
- T_arr_integrand = ( ∫_gamma ( n_eff / c_ref_value ) d ell )
- delta_form = | T_arr_const - T_arr_integrand |
- Contract templates
# A1 Dual-form consistency
- kind: "arrivaltime_dualform"
expr: "delta_form <= tol_Tarr"
params: { tol_Tarr: "1e-9 s" }
severity: "WARN"
message: "dual-form gap exceeds tolerance"
# A2 Dimensional consistency
- kind: "dim_check"
expr: "check_dim(T_arr_const)=='T'"
severity: "ERROR"
message: "dim(T_arr_const) must be T"
- kind: "dim_check"
expr: "check_dim(T_arr_integrand)=='T'"
severity: "ERROR"
message: "dim(T_arr_integrand) must be T"
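The two forms and the A1 check can be sketched numerically as follows. The sketch assumes n_eff is sampled at path positions ell (in metres), uses a hand-written trapezoid rule, and takes tol_Tarr as a plain float in seconds rather than the string "1e-9 s" used in the template; the function names are hypothetical. Because c_ref_value is a constant factor here, the two forms agree up to floating-point round-off, so a gap above tolerance would point to inconsistent inputs or different integration schemes.

```python
def trapezoid(y, x):
    """Composite trapezoid rule for samples y taken at abscissae x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def dual_form(ell, n_eff, c_ref_value, tol_Tarr=1e-9):
    """Compute both arrival-time forms and the A1 gap (all times in s)."""
    if len(ell) != len(n_eff) or len(ell) < 2:
        raise ValueError("ell and n_eff need equal length >= 2")
    if c_ref_value <= 0:
        raise ValueError("c_ref_value must be > 0")
    # T_arr_const = (1 / c_ref_value) * integral of n_eff d ell
    t_const = trapezoid(n_eff, ell) / c_ref_value
    # T_arr_integrand = integral of (n_eff / c_ref_value) d ell
    t_integrand = trapezoid([n / c_ref_value for n in n_eff], ell)
    delta_form = abs(t_const - t_integrand)
    return {
        "T_arr_const": t_const,
        "T_arr_integrand": t_integrand,
        "delta_form": delta_form,
        "passed": delta_form <= tol_Tarr,
    }


# Hypothetical path: 3 km sampled at four points.
result = dual_form(
    ell=[0.0, 1000.0, 2000.0, 3000.0],
    n_eff=[1.45, 1.46, 1.47, 1.46],
    c_ref_value=299_792_458.0,
)
```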
V. DS.TARR.PathIntegral v1 Contract Example (full checklist)
# Primary key and basics
- { kind: "unique", expr: "unique(pid, seg_id)", severity: "ERROR", message: "pk (pid,seg_id) must be unique" }
- { kind: "not_null", expr: "not_null(pid)", severity: "ERROR", message: "pid required" }
- { kind: "not_null", expr: "not_null(seg_id)", severity: "ERROR", message: "seg_id required" }
- { kind: "not_null", expr: "not_null(ts)", severity: "ERROR", message: "ts required" }
- { kind: "regex", expr: "match(CRS, '^EPSG:[0-9]+$')", severity: "ERROR", message: "CRS must be EPSG code" }
# Time ordering and path
- { kind: "monotonic", expr: "groupby(pid).monotonic(ts, mode='nondecreasing')", severity: "ERROR", message: "ts must be nondecreasing per pid" }
- { kind: "cross_field", expr: "ell_end >= ell_start", severity: "ERROR", message: "ell_end must be >= ell_start" }
# Physical ranges
- { kind: "range", expr: "c_ref_value in (0, +inf)", params: { min: 0.0, max: null }, severity: "ERROR", message: "c_ref_value must be > 0" }
- { kind: "range", expr: "n_eff_mean in [1.0, 3.5]", params: { min: 1.0, max: 3.5 }, severity: "WARN", message: "n_eff_mean out of typical bounds" }
# Dual form and dimensions
- { kind: "dim_check", expr: "check_dim(T_arr_const)=='T'", severity: "ERROR", message: "dim(T_arr_const)=T" }
- { kind: "dim_check", expr: "check_dim(T_arr_integrand)=='T'", severity: "ERROR", message: "dim(T_arr_integrand)=T" }
- { kind: "arrivaltime_dualform", expr: "delta_form <= tol_Tarr", params: { tol_Tarr: "1e-9 s" }, severity: "WARN", message: "dual-form mismatch" }
# Traceability and references
- { kind: "regex", expr: "match(hash_sha256, '^[a-f0-9]{64}$')", severity: "ERROR", message: "invalid sha256" }
- { kind: "referential", expr: "exists_in(parameters, c_ref_ref)", severity: "ERROR", message: "c_ref_ref unresolved" }
# Quality gates
- { kind: "custom", expr: "quality_metrics(ds).completeness >= p_min", params: { p_min: 0.98 }, severity: "ERROR", message: "completeness below 0.98" }
- { kind: "custom", expr: "quality_metrics(ds).validity >= v_min", params: { v_min: 0.99 }, severity: "ERROR", message: "validity below 0.99" }
- { kind: "custom", expr: "monitor_drift(ds_ref, ds_new, ['n_eff_mean'], method='KL').score <= d_max", params: { d_max: 0.02 }, severity: "WARN", message: "n_eff_mean drift high" }
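The drift gate in the last entry compares a reference dataset with a new one using a KL score. The sketch below shows one way such a score could be computed from histograms. It is not the definition of monitor_drift: the binning, the additive smoothing constant eps, and the direction KL(new || ref) are all assumptions, and values outside the bin edges are ignored.

```python
import math


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin also
    includes its right edge. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    last = len(edges) - 2
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1] or (i == last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def kl_drift_score(ref, new, edges, eps=1e-9):
    """KL(new || ref) over a shared histogram, with additive smoothing
    so that empty bins do not cause division by zero."""
    p = [c + eps for c in histogram(new, edges)]
    q = [c + eps for c in histogram(ref, edges)]
    p_total, q_total = sum(p), sum(q)
    return sum((a / p_total) * math.log((a / p_total) / (b / q_total))
               for a, b in zip(p, q))


# Hypothetical n_eff_mean samples and bin edges.
edges = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ds_ref = [1.2, 1.4, 1.4, 1.6, 1.8, 2.0]
ds_new = [2.2, 2.4, 2.6, 2.8, 3.0, 3.2]

score_same = kl_drift_score(ds_ref, ds_ref, edges)
score_shifted = kl_drift_score(ds_ref, ds_new, edges)
gate_passed = score_shifted <= 0.02  # d_max from the contract entry
```

The score depends strongly on the bin edges and on eps, so a threshold such as d_max = 0.02 is only meaningful once those choices are fixed alongside it.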
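VI. Cross-Volume Binding Assertions (with Core.Parameters and Core.Equations)
- Parameter binding: bind_to_parameters(ds, ['c_ref_ref']) == True.
- Equation binding: bind_to_equations(ds, ['S610-1','S610-2']) == True.
- For the dual-form constraint see A1; when A1 persistently raises WARN and delta_form shows a systematic bias, trace back to the modeling of n_eff(x,t) and the resolution of c_ref.
VII. Time and Window Rule Templates
# W1 Timestamps fall within the window declared in the manifest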
- kind: "range"
expr: "ts in [manifest.t0, manifest.t1]"
params: { }
severity: "ERROR"
message: "ts out of declared window"
# W2 Resampling consistency declaration
- kind: "custom"
expr: "resample_policy in {'mean','sum','median','first','last'} and Delta_t > 0"
severity: "ERROR"
message: "invalid resample policy or Delta_t"
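The two window rules W1 and W2 can be read as the following checks. This is a minimal sketch: the function names are hypothetical, timestamps are assumed to be datetime objects, and Delta_t is assumed to be a number of seconds.

```python
from datetime import datetime

# Policy names copied from the W2 expression.
ALLOWED_RESAMPLE_POLICIES = {"mean", "sum", "median", "first", "last"}


def check_window(rows, t0, t1, field="ts"):
    """W1: return rows whose timestamp lies outside the closed window [t0, t1]."""
    return [r for r in rows if not (t0 <= r[field] <= t1)]


def check_resample(resample_policy, delta_t):
    """W2: the policy must be an allowed name and Delta_t strictly positive."""
    return resample_policy in ALLOWED_RESAMPLE_POLICIES and delta_t > 0


# Hypothetical manifest window and rows.
t0 = datetime(2025, 1, 1)
t1 = datetime(2025, 1, 31)
rows = [
    {"pid": "p1", "ts": datetime(2025, 1, 15)},
    {"pid": "p2", "ts": datetime(2025, 2, 2)},
]
out_of_window = check_window(rows, t0, t1)
```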
VIII. Privacy and Governance Check Templates
# P1 Retention period
- kind: "custom"
expr: "governance.retention_days >= 365"
severity: "INFO"
message: "retention shorter than recommended"
# P2 PII classification complete
- kind: "custom"
expr: "forall(f in fields) -> pii_level(f) in {'none','low','moderate','high'}"
severity: "ERROR"
message: "pii_level missing or invalid"
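As an illustration of P1 and P2, the sketch below assumes field metadata is a dictionary mapping each field name to a dict that may carry a pii_level entry, and that governance is a dict with a retention_days entry. Both layouts are assumptions made for this example only.

```python
# Level names copied from the P2 expression.
ALLOWED_PII_LEVELS = {"none", "low", "moderate", "high"}


def check_pii_levels(fields):
    """P2: return names of fields whose pii_level is missing or invalid."""
    return [name for name, meta in fields.items()
            if meta.get("pii_level") not in ALLOWED_PII_LEVELS]


def check_retention(governance, min_days=365):
    """P1: True when the declared retention meets the recommended minimum."""
    return governance.get("retention_days", 0) >= min_days


# Hypothetical field metadata.
fields = {
    "pid": {"pii_level": "none"},
    "operator_id": {"pii_level": "secret"},  # not an allowed level
    "ts": {},                                # level missing
}
invalid_fields = check_pii_levels(fields)
```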
IX. Example Execution Result Structure (summary)
summary:
passed: 23
failed: 2
severity_max: "WARN"
metrics: { completeness: 0.985, validity: 0.996, drift_KL_n_eff_mean: 0.011 }
violations:
- kind: "range"
expr: "n_eff_mean in [1.0, 3.5]"
sample: { pid: "p-007", seg_id: 12, n_eff_mean: 3.62 }
msg: "n_eff_mean out of typical bounds"
- kind: "arrivaltime_dualform"
expr: "delta_form <= tol_Tarr"
sample: { pid: "p-019", delta_form: "1.35e-9 s", tol_Tarr: "1e-9 s" }
msg: "dual-form mismatch"
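A result of this shape could be assembled from per-test outcomes as sketched below. The source does not define severity_max, so the sketch assumes it is the highest severity among failed tests (None when nothing failed) and that the first offending row serves as the sample; build_report and its input format are hypothetical.

```python
SEVERITY_RANK = {"INFO": 0, "WARN": 1, "ERROR": 2}


def build_report(results, metrics=None):
    """results: list of (spec, bad_rows) pairs, where spec is a dict with
    kind, expr, severity, and message, and bad_rows lists offending rows."""
    failed = [(spec, bad) for spec, bad in results if bad]
    severity_max = max(
        (spec["severity"] for spec, _ in failed),
        key=SEVERITY_RANK.get,
        default=None,
    )
    return {
        "summary": {
            "passed": len(results) - len(failed),
            "failed": len(failed),
            "severity_max": severity_max,
            "metrics": dict(metrics or {}),
        },
        "violations": [
            {"kind": spec["kind"], "expr": spec["expr"],
             "sample": bad[0], "msg": spec["message"]}
            for spec, bad in failed
        ],
    }


# Hypothetical outcomes: one passing test and one failing WARN test.
results = [
    ({"kind": "not_null", "expr": "not_null(pid)",
      "severity": "ERROR", "message": "pid required"}, []),
    ({"kind": "range", "expr": "n_eff_mean in [1.0, 3.5]",
      "severity": "WARN", "message": "n_eff_mean out of typical bounds"},
     [{"pid": "p-007", "seg_id": 12, "n_eff_mean": 3.62}]),
]
report = build_report(results, metrics={"completeness": 0.985})
```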
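X. Failure Disposition Policy Mapping (quality gates)
- ERROR: block the release, cancel freeze_release, and issue a repair ticket; roll back to the previous version if necessary.
- WARN: allow the release but downgrade q_score and trigger monitoring; k consecutive WARNs of the same kind are treated as an ERROR.
- INFO: record in Trace; the release is not affected.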
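The mapping above, including the escalation of repeated WARNs, can be sketched as a small decision function. The action labels, the function names, and the convention that warn_streak counts the current occurrence are assumptions; the value of k is left as a parameter because the source does not fix it.

```python
# Hypothetical action labels standing in for the dispositions listed above.
DISPOSITION = {
    "ERROR": "block_release",
    "WARN": "release_with_degraded_q_score",
    "INFO": "log_to_trace",
}


def effective_severity(severity, warn_streak, k):
    """Escalate WARN to ERROR once the same kind of WARN has occurred
    k times in a row (warn_streak includes the current occurrence)."""
    if severity == "WARN" and warn_streak >= k:
        return "ERROR"
    return severity


def decide(severity, warn_streak=0, k=3):
    """Return the disposition for one test outcome; k=3 is only a default
    chosen for this example."""
    return DISPOSITION[effective_severity(severity, warn_streak, k)]


first_warn = decide("WARN", warn_streak=1, k=3)
third_warn = decide("WARN", warn_streak=3, k=3)
```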
XI. Minimal Usable Checklist (placeholders; copy directly and replace as needed)
# Contract Test Pack (skeleton)
- { kind: "unique", expr: "unique(<k...>)", severity: "ERROR", message: "pk unique" }
- { kind: "not_null", expr: "not_null(<f>)", severity: "ERROR", message: "<f> required" }
- { kind: "range", expr: "<f> in [min,max]", params: { min: <m1>, max: <m2> }, severity: "ERROR", message: "range" }
- { kind: "regex", expr: "match(<f>, '<re>')", severity: "ERROR", message: "regex" }
- { kind: "enum_set", expr: "<f> in {<v...>}", severity: "ERROR", message: "enum" }
- { kind: "cross_field", expr: "<f2> >= <f1>", severity: "ERROR", message: "cross" }
- { kind: "monotonic", expr: "groupby(<k>).monotonic(<f>, mode='nondecreasing')", severity: "ERROR", message: "mono" }
- { kind: "dim_check", expr: "check_dim(<f>)=='<DIM>'", severity: "ERROR", message: "dim" }
- { kind: "custom", expr: "quality_metrics(ds).completeness >= <p_min>", params: { p_min: 0.98 }, severity: "ERROR", message: "completeness" }
- { kind: "custom", expr: "monitor_drift(ds_ref, ds_new, ['<f>'], method='KL').score <= <d_max>", params: { d_max: 0.02 }, severity: "WARN", message: "drift" }
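XII. Correspondence with Other Parts of This Volume
- Consistent with Chapter 4: the kind/expr/params fields are the input form of assert_contract; quality-gate metrics and failure-policy results should be produced before and after execution.
- Consistent with Appendix A: templates such as arrivaltime_dualform, dim_check, and referential apply directly to DS.TARR.PathIntegral, ensuring that the two forms T_arr_const and T_arr_integrand agree within tol_Tarr and that dim(T_arr_*) = T.
- Cross-volume citations use a fixed form, e.g. "see the companion white paper 《能量丝》 (Energy Filaments), Chapter 6, S610…".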
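Copyright and License (CC BY 4.0)
Copyright notice: Unless otherwise stated, the copyright of Energy Filament Theory (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author, Mr. Tu Guanglin (屠广林).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Suggested attribution format: Author: Tu Guanglin (屠广林); Work: Energy Filament Theory (《能量丝理论》); Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/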