EFT.WP.Methods.CrossStats v1.0
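One-sentence goal: define the minimal key set, field semantics, validation rules, and multi-scenario samples for manifest.stats, the standard release manifest for cross-statistics outputs, so that estimates, tests, drift monitoring, experiments, and audits are reproducible, traceable, and alignable across systems.
I. Scope and Objects
- This appendix defines the on-disk structure of statistical release artifacts. It applies to offline evaluation, online experiments, drift monitoring, causal estimation, and calibration transfer.
- Input: the metrics, intervals, and diagnostics produced by the chapters of this volume, plus the metadata that must be carried across volumes (see Methods.Cleaning v1.0 and Methods.Imaging v1.0).
- Output: a single JSON document, manifest.stats, containing the TraceID, version, window Delta_t, timebase mapping, contract evaluations, and signature.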
II. Minimal Key Set (all required)
- schema_version, book_ref, release_tag
- TraceID, repro_hash, signature
- timebase.tau_mono_range, timebase.ts_range, timebase.offset/skew/J
- arrival.two_forms.delta_form, arrival.two_forms.tol_Tarr
- window.Delta_t, dataset.N, weights.W_norm
- metrics.core[*] (name, estimate, interval or posterior quantiles, unit)
- contracts[*] (id, status, severity, evidence)
- actions[*] (policy-card decisions and dispositions)
- provenance (data and code sources, environment summary)
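The key set above lends itself to a mechanical presence check. The following is a minimal sketch, assuming that dotted paths address nested objects and that the combined key offset/skew/J is stored literally, as it is in the template in Section IV. The names REQUIRED_PATHS and missing_keys are hypothetical and not part of the specification.

```python
from typing import Any

# Dotted paths for the minimal key set; "offset/skew/J" is one literal key.
REQUIRED_PATHS = [
    "schema_version", "book_ref", "release_tag",
    "TraceID", "repro_hash", "signature",
    "timebase.tau_mono_range", "timebase.ts_range", "timebase.offset/skew/J",
    "arrival.two_forms.delta_form", "arrival.two_forms.tol_Tarr",
    "window.Delta_t", "dataset.N", "weights.W_norm",
    "metrics.core", "contracts", "actions", "provenance",
]


def missing_keys(manifest: dict[str, Any]) -> list[str]:
    """Return the required dotted paths that are absent from the manifest."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = manifest
        for part in path.split("."):
            # Walk one level down; stop as soon as a level is absent.
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing
```

A manifest is structurally publishable under this sketch when missing_keys(manifest) returns an empty list. An empty array (for example "actions": []) still counts as present.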
III. Fields and Types
- schema_version : string (e.g. "1.0.0")
- book_ref : string (fixed value "EFT.WP.Methods.CrossStats v1.0")
- release_tag : string (release label, e.g. "stats-prod-2025-08-31T12:00Z")
- TraceID : string (cross-system trace ID)
- repro_hash : string (hash_sha256(code+params+data_fingerprint))
- signature : string (publisher's signature)
- timebase : object
  - tau_mono_range : [int,int] (interval on the internal monotonic timebase)
  - ts_range : [string,string] (ISO 8601 interval for external publication)
  - offset/skew/J : {offset: double, skew: double, J: double}
- arrival.two_forms : {delta_form: double, tol_Tarr: double}
- window.Delta_t : string (statistics window, e.g. "PT24H")
- dataset : {N: int, N_eff?: double, sampling?: string}
- weights : {W_norm: double, cap_w?: double, p_trim?: double}
- metrics.core[*] : {name: string, est: double, se?: double, ci?: [double,double], posterior?: {q05: double, q50: double, q95: double}, unit?: string, dim?: string, notes?: string}
- metrics.drift? : {W1?: double, KL?: double, psi?: double}
- metrics.ab? : {lift: double, se: double, ci: [double,double], mde?: double, alpha_spent?: double}
- metrics.causal? : {ATE: double, U?: double, SMD_max?: double, overlap_min?: double}
- contracts[*] : {id: string, status: string, severity: string, evidence: object}
- actions[*] : {policy_id: string, decision: string, reason: string, at: string}
- provenance : {data_uri: string, code_uri: string, env: {python?: string, pkg?: object}}
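The type notation for metrics.core[*] can be turned into a small per-entry check. This is an illustrative sketch only: the function name check_core_metric and the error strings are hypothetical, and the ordering constraints (lo <= hi for ci, q05 <= q50 <= q95 for posterior, se >= 0) are properties of intervals, quantiles, and standard errors rather than rules stated by the specification.

```python
from numbers import Real


def _is_number(x) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, Real) and not isinstance(x, bool)


def check_core_metric(m: dict) -> list[str]:
    """Return a list of type or ordering problems for one metrics.core entry."""
    errors = []
    if not isinstance(m.get("name"), str):
        errors.append("name: expected string")
    if not _is_number(m.get("est")):
        errors.append("est: expected number")
    if "se" in m and not (_is_number(m["se"]) and m["se"] >= 0):
        errors.append("se: expected non-negative number")
    if "ci" in m:
        ci = m["ci"]
        ok = (isinstance(ci, list) and len(ci) == 2
              and all(_is_number(v) for v in ci) and ci[0] <= ci[1])
        if not ok:
            errors.append("ci: expected [lo, hi] with lo <= hi")
    if "posterior" in m:
        q = m["posterior"]
        ok = (isinstance(q, dict)
              and all(_is_number(q.get(k)) for k in ("q05", "q50", "q95"))
              and q["q05"] <= q["q50"] <= q["q95"])
        if not ok:
            errors.append("posterior: expected q05 <= q50 <= q95")
    for key in ("unit", "dim", "notes"):
        if key in m and not isinstance(m[key], str):
            errors.append(f"{key}: expected string")
    return errors
```

An entry passes when the returned list is empty. Optional keys are checked only when they are present.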
IV. Template: Minimal Publishable Manifest
{
"schema_version": "1.0.0",
"book_ref": "EFT.WP.Methods.CrossStats v1.0",
"release_tag": "stats-prod-2025-08-31T12:00Z",
"TraceID": "trc_01HXYZ...",
"repro_hash": "sha256:REPRO_HASH",
"signature": "SIG_BASE64",
"timebase": {
"tau_mono_range": [1725062400, 1725148800],
"ts_range": ["2025-08-31T00:00:00Z", "2025-09-01T00:00:00Z"],
"offset/skew/J": {"offset": 0.0012, "skew": 2.3e-6, "J": 0.0031}
},
"arrival": {
"two_forms": {
"delta_form": 2.1e-6,
"tol_Tarr": 5.0e-6
}
},
"window": {"Delta_t": "PT24H"},
"dataset": {"N": 125034},
"weights": {"W_norm": 0.9996},
"metrics": {
"core": [
{"name": "conversion_rate", "est": 0.0842, "ci": [0.0831, 0.0853], "unit": "1", "dim": "[]"},
{"name": "avg_order_value", "est": 56.73, "se": 0.42, "unit": "USD", "dim": "[M]"}
]
},
"contracts": [
{"id": "C30-000", "status": "pass", "severity": "info", "evidence": {"checks": 128}},
{"id": "C30-001", "status": "pass", "severity": "info", "evidence": {}},
{"id": "C30-004", "status": "pass", "severity": "info", "evidence": {"W_norm": 0.9996}},
{"id": "C30-342", "status": "pass", "severity": "info", "evidence": {"coverage_rate": 0.949}}
],
"actions": [
{"policy_id": "SC-SLO-01", "decision": "ship", "reason": "all guardrails pass", "at": "2025-08-31T12:01:05Z"}
],
"provenance": {
"data_uri": "s3://bucket/ds/2025-08-31/",
"code_uri": "git+https://repo/commit/abcdef",
"env": {"python": "3.11.5", "pkg": {"numpy": "2.0.1", "scipy": "1.14.0"}}
}
}
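One consistency property the template exhibits is that window.Delta_t ("PT24H") equals the span of timebase.ts_range. The specification does not list this as a required check, so the sketch below is an assumption about how a publisher might cross-check the two fields. It handles only the day, hour, minute, and second designators of ISO 8601 durations, and the names parse_duration and window_matches_range are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Supports forms such as "P7D", "PT24H", "P1DT12H"; no years, months, or weeks.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a restricted ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def window_matches_range(manifest: dict, tol: timedelta = timedelta(seconds=1)) -> bool:
    """Check that ts_range spans window.Delta_t, up to a tolerance."""
    start, end = (
        # Replace the "Z" suffix so fromisoformat also works before Python 3.11.
        datetime.fromisoformat(t.replace("Z", "+00:00"))
        for t in manifest["timebase"]["ts_range"]
    )
    expected = parse_duration(manifest["window"]["Delta_t"])
    return abs((end - start) - expected) <= tol
```

The one-second default tolerance is chosen so that a range written with an inclusive end such as "...T23:59:59Z" still matches its nominal window.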
V. Sample A: Online A/B Experiment (sequential alpha, guardrail SLO)
{
"schema_version": "1.0.0",
"book_ref": "EFT.WP.Methods.CrossStats v1.0",
"release_tag": "ab-exp-42-int-07",
"TraceID": "trc_AB42_07",
"repro_hash": "sha256:HASH_AB42_07",
"signature": "SIG_BASE64",
"timebase": {
"tau_mono_range": [1725148800, 1725235200],
"ts_range": ["2025-09-01T00:00:00Z", "2025-09-02T00:00:00Z"],
"offset/skew/J": {"offset": 0.0007, "skew": 2.0e-6, "J": 0.0025}
},
"arrival": { "two_forms": {"delta_form": 1.8e-6, "tol_Tarr": 5.0e-6} },
"window": {"Delta_t": "PT24H"},
"dataset": {"N": 98023, "sampling": "online_randomized"},
"weights": {"W_norm": 1.0002},
"metrics": {
"core": [
{"name": "lift_cr_B_vs_A", "est": 0.0124, "se": 0.0039, "ci": [0.0048, 0.0200], "unit": "1", "dim": "[]"},
{"name": "guardrail_latency_ms_p99", "est": 245.0, "unit": "ms", "dim": "[T]"}
],
"ab": {
"lift": 0.0124,
"se": 0.0039,
"ci": [0.0048, 0.0200],
"mde": 0.01,
"alpha_spent": 0.043
}
},
"contracts": [
{"id": "C30-382", "status": "pass", "severity": "info", "evidence": {"latency_ms_p99": 245}},
{"id": "C30-383", "status": "pass", "severity": "info", "evidence": {"alpha_spent": 0.043, "alpha_budget": 0.05}},
{"id": "C30-381", "status": "pass", "severity": "info", "evidence": {"p_t": 0.501, "p_c": 0.499, "eps_exp": 0.01}}
],
"actions": [
{"policy_id": "SC-AB-01", "decision": "ship", "reason": "sequential boundary crossed; guardrails pass", "at": "2025-09-02T00:00:30Z"}
],
"provenance": {
"data_uri": "kafka://topic/exp42/day=2025-09-01",
"code_uri": "git+https://repo/commit/1122aabb",
"env": {"python": "3.11.5", "pkg": {"pandas": "2.2.2", "statsmodels": "0.14.2"}}
}
}
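The metrics.ab block in Sample A can be sanity-checked arithmetically. In the sample, the reported interval agrees with lift ± 1.96·se (0.0124 ± 0.007644, which rounds to [0.0048, 0.0200]), and alpha_spent stays below the alpha_budget recorded in the C30-383 evidence. The sketch below encodes those two observations. The function name ab_checks and the rounding tolerance are assumptions, and a sequential design may well use adjusted boundaries instead of a fixed z of 1.96.

```python
def ab_checks(ab: dict, alpha_budget: float, z: float = 1.96, tol: float = 1e-4) -> dict:
    """Arithmetic sanity checks on a metrics.ab block (illustrative only)."""
    lo, hi = ab["ci"]
    half_width = z * ab["se"]
    return {
        # Does the reported interval match a normal-approximation interval?
        "ci_matches_normal": (
            abs(lo - (ab["lift"] - half_width)) <= tol
            and abs(hi - (ab["lift"] + half_width)) <= tol
        ),
        # Is zero outside the interval?
        "ci_excludes_zero": lo > 0 or hi < 0,
        # Has the experiment stayed within its alpha budget?
        "alpha_within_budget": ab["alpha_spent"] <= alpha_budget,
    }


sample_ab = {"lift": 0.0124, "se": 0.0039, "ci": [0.0048, 0.0200],
             "mde": 0.01, "alpha_spent": 0.043}
result = ab_checks(sample_ab, alpha_budget=0.05)
```

With the values from Sample A, all three checks evaluate to True.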
VI. Sample B: Distribution Drift Monitoring (alignment and recalibration trigger)
{
"schema_version": "1.0.0",
"book_ref": "EFT.WP.Methods.CrossStats v1.0",
"release_tag": "drift-week-2025W36",
"TraceID": "trc_DRIFT_W36",
"repro_hash": "sha256:HASH_DRIFT_W36",
"signature": "SIG_BASE64",
"timebase": {
"tau_mono_range": [1725148800, 1725753600],
"ts_range": ["2025-09-01T00:00:00Z", "2025-09-08T00:00:00Z"],
"offset/skew/J": {"offset": 0.0011, "skew": 2.6e-6, "J": 0.0030}
},
"arrival": { "two_forms": {"delta_form": 2.5e-6, "tol_Tarr": 5.0e-6} },
"window": {"Delta_t": "P7D"},
"dataset": {"N": 705211},
"weights": {"W_norm": 1.0000},
"metrics": {
"core": [
{"name": "score_calibration_ece", "est": 0.024, "unit": "1", "dim": "[]"}
],
"drift": {"W1": 0.095, "KL": 0.021, "psi": 0.14}
},
"contracts": [
{"id": "C30-370", "status": "fail", "severity": "warn", "evidence": {"W1": 0.095, "W1_max": 0.08}},
{"id": "C30-373", "status": "fail", "severity": "error", "evidence": {"r_win": 3}}
],
"actions": [
{"policy_id": "SC-DRIFT-01", "decision": "align_then_recalibrate", "reason": "persistent W1 breach; psi elevated", "at": "2025-09-08T00:01:00Z"},
{"policy_id": "SC-CAL-01", "decision": "canary_10pct", "reason": "ECE_after improves >= delta_min", "at": "2025-09-09T12:00:00Z"}
],
"provenance": {
"data_uri": "s3://bucket/weekly_snap/2025W36",
"code_uri": "git+https://repo/commit/55cc66dd",
"env": {"python": "3.11.5", "pkg": {"scikit-learn": "1.5.1"}}
}
}
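This appendix does not say how the W1, KL, and psi values in metrics.drift are computed. As an illustration only, the sketch below uses common textbook definitions on two binned distributions over the same ordered bins: KL(q‖p), the population stability index as the symmetrized sum, and a 1-D Wasserstein distance measured in bin-index units. The function name drift_metrics, the eps floor, and the choice of the reference distribution p versus the current distribution q are all assumptions.

```python
import math
from itertools import accumulate


def drift_metrics(p: list[float], q: list[float], eps: float = 1e-12) -> dict:
    """Drift between reference bins p and current bins q (same ordered bins)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Floor at eps to avoid log(0), then renormalize to proportions.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    kl = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    psi = sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
    # 1-D Wasserstein-1 with unit bin spacing: sum of absolute CDF differences.
    w1 = sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
    return {"W1": w1, "KL": kl, "psi": psi}
```

A contract such as C30-370 would then compare the resulting W1 against its threshold (W1_max in the evidence object). With the Sample B values, 0.095 exceeds 0.08 and the contract fails.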
VII. Sample C: Causal Estimation (doubly robust, with overlap check)
{
"schema_version": "1.0.0",
"book_ref": "EFT.WP.Methods.CrossStats v1.0",
"release_tag": "causal-ATE-geoQ3",
"TraceID": "trc_CAUSAL_GEO_Q3",
"repro_hash": "sha256:HASH_CAUSAL_GEO_Q3",
"signature": "SIG_BASE64",
"timebase": {
"tau_mono_range": [1719782400, 1727568000],
"ts_range": ["2024-07-01T00:00:00Z", "2024-09-30T23:59:59Z"],
"offset/skew/J": {"offset": 0.0009, "skew": 1.8e-6, "J": 0.0021}
},
"arrival": { "two_forms": {"delta_form": 1.2e-6, "tol_Tarr": 5.0e-6} },
"window": {"Delta_t": "P92D"},
"dataset": {"N": 40211, "N_eff": 27894.5, "sampling": "observational"},
"weights": {"W_norm": 1.0007, "cap_w": 20.0, "p_trim": 0.7},
"metrics": {
"core": [
{"name": "ATE", "est": 1.84, "ci": [0.95, 2.73], "unit": "USD", "dim": "[M]"}
],
"causal": {"ATE": 1.84, "U": 0.62, "SMD_max": 0.06, "overlap_min": 0.04}
},
"contracts": [
{"id": "C30-400", "status": "pass", "severity": "info", "evidence": {"overlap_min": 0.04, "eps_ol": 0.02}},
{"id": "C30-401", "status": "pass", "severity": "info", "evidence": {"SMD_max": 0.06, "smd_max": 0.10}},
{"id": "C30-402", "status": "pass", "severity": "info", "evidence": {"ATE_IPW": 1.79, "ATE_OR": 1.88}},
{"id": "C30-350", "status": "pass", "severity": "info", "evidence": {"B": 2000}}
],
"actions": [
{"policy_id": "SC-COVER-01", "decision": "publish_readonly", "reason": "coverage met; trimmed weights applied", "at": "2024-10-01T10:00:00Z"}
],
"provenance": {
"data_uri": "warehouse://table/geo_q3",
"code_uri": "git+https://repo/commit/aa77bb88",
"env": {"python": "3.10.14", "pkg": {"econml": "0.15.0", "pymc": "5.13.1"}}
}
}
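Sample C reports SMD_max as a balance diagnostic. For orientation, the sketch below computes the standardized mean difference for one covariate with the usual pooled-standard-deviation form, then takes the maximum absolute value across covariates. It is unweighted, whereas a weighted analysis like the one in the sample would need weighted means and variances. The function names and the dictionary layout are hypothetical.

```python
import math
import statistics


def smd(treated: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    mean_t = statistics.fmean(treated)
    mean_c = statistics.fmean(control)
    var_t = statistics.variance(treated)   # sample variance, needs >= 2 points
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt((var_t + var_c) / 2)
    if pooled_sd == 0:
        # Both groups are constant: balanced if equal, otherwise unbounded.
        return 0.0 if mean_t == mean_c else math.inf
    return (mean_t - mean_c) / pooled_sd


def smd_max(cov_treated: dict, cov_control: dict) -> float:
    """Largest absolute SMD across covariates present in both groups."""
    return max(abs(smd(cov_treated[k], cov_control[k])) for k in cov_treated)
```

In the sample's C30-401 evidence, the reported SMD_max of 0.06 is compared against the threshold 0.10 stored under the lowercase key smd_max.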
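VIII. Validation and Assertions (automated)
- Structural validation
  required_keys ⊆ manifest.stats.keys(); a required field must still be present when its array is empty.
- Dimensional validation
  For each metrics.core[*] entry, check that unit(x) and dim(x) are consistent: check_dim( est - ref_unit_transform(est) ) = true.
- Timebase and arrival time
  non_decreasing(tau_mono_range); delta_form ≤ tol_Tarr.
- Weights and sample size
  |W_norm - 1| ≤ tol_w; if N_eff is provided, verify N_eff = ( ( ∑ w )^2 ) / ( ∑ w^2 ).
- Contracts and policies
  contracts[*].id belongs to the C30-* namespace; status ∈ {pass, fail}; severity ∈ {info, warn, error, fatal}; actions[*].policy_id belongs to the policy-card namespace (e.g. SC-DRIFT-01).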
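The timebase, weight, and contract assertions above can be sketched as a single function that returns a list of violations. Several details are assumptions: the default tol_w of 1e-3 (no value is given in this appendix), the use of the "SC-" prefix to recognize the policy-card namespace, and the names validate and kish_n_eff. The dimensional check is omitted because check_dim and ref_unit_transform are not defined here.

```python
STATUS = {"pass", "fail"}
SEVERITY = {"info", "warn", "error", "fatal"}


def kish_n_eff(weights: list[float]) -> float:
    """Effective sample size: (sum w)^2 / (sum w^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2


def validate(manifest: dict, tol_w: float = 1e-3) -> list[str]:
    """Return human-readable violations; an empty list means all checks pass."""
    errors = []
    lo, hi = manifest["timebase"]["tau_mono_range"]
    if lo > hi:
        errors.append("tau_mono_range must be non-decreasing")
    forms = manifest["arrival"]["two_forms"]
    if forms["delta_form"] > forms["tol_Tarr"]:
        errors.append("delta_form exceeds tol_Tarr")
    if abs(manifest["weights"]["W_norm"] - 1.0) > tol_w:
        errors.append("W_norm deviates from 1 by more than tol_w")
    for c in manifest["contracts"]:
        if not c["id"].startswith("C30-"):
            errors.append(f"contract id outside C30-*: {c['id']}")
        if c["status"] not in STATUS:
            errors.append(f"invalid status: {c['status']}")
        if c["severity"] not in SEVERITY:
            errors.append(f"invalid severity: {c['severity']}")
    for a in manifest["actions"]:
        if not a["policy_id"].startswith("SC-"):
            errors.append(f"policy_id outside SC-*: {a['policy_id']}")
    return errors
```

When the raw weights are available, a reported dataset.N_eff can be compared against kish_n_eff(weights) within a numerical tolerance. For equal weights the function returns the plain sample count.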
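IX. Mapping to the Contract Library
- contracts[*] entries correspond to the C30-* items in Appendix B; evidence carries the triggering metrics, such as coverage_rate, alpha_spent, W1/KL/psi, and SMD_max.
- actions[*] entries correspond to the SC-* policy cards in Appendix B and record the decision, the reason, and a timestamp.
- Failed entries must align with the release gate in Chapter 10 of Methods.Cleaning v1.0: assert_contract(ds, tests) -> report.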
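X. Traceability and Signature Specification
- repro_hash = hash_sha256( code_uri ∥ params ∥ data_fingerprint )
- signature = sign( private_key, repro_hash ); verification requires verify(public_key, signature, repro_hash) = true
- provenance.env must be recorded so that the container and package versions can be rebuilt.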
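The repro_hash rule leaves the byte-level encoding of the concatenation open. The sketch below makes two assumptions to obtain a deterministic result: params is serialized as canonical JSON (sorted keys, compact separators), and the three parts are joined with a NUL byte so that adjacent fields cannot run together. The "sha256:" prefix follows the samples in this appendix. Signing is left out because sign and verify depend on the key scheme the publisher chooses.

```python
import hashlib
import json


def repro_hash(code_uri: str, params: dict, data_fingerprint: str) -> str:
    """SHA-256 over code_uri, canonical params, and data_fingerprint."""
    # Canonical JSON keeps the hash stable regardless of dict key order.
    params_bytes = json.dumps(params, sort_keys=True, separators=(",", ":")).encode("utf-8")
    payload = b"\x00".join([
        code_uri.encode("utf-8"),
        params_bytes,
        data_fingerprint.encode("utf-8"),
    ])
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

Two runs with the same code reference, parameters, and data fingerprint yield the same string under this sketch, and a change in any one of the three changes the hash.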
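XI. Versioning and Compatibility
- schema_version follows MAJOR.MINOR.PATCH.
  A MAJOR change may break readers; a MINOR change adds optional fields; a PATCH change corrects explanatory notes.
- Backward-compatibility policy
  Across MINOR increments, readers should backfill any missing newly added fields with default values (e.g. the optional subkeys of metrics.*).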
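A reader following this policy might look like the sketch below. It rejects an unknown MAJOR version and fills the optional metrics.* subkeys with a default when they are absent. The default value None, the table OPTIONAL_DEFAULTS, and the function name read_manifest are assumptions for illustration, since this appendix does not prescribe specific defaults.

```python
import copy

SUPPORTED_MAJOR = 1

# Optional paths and the default a reader substitutes when they are missing.
OPTIONAL_DEFAULTS = {
    ("metrics", "drift"): None,
    ("metrics", "ab"): None,
    ("metrics", "causal"): None,
}


def read_manifest(doc: dict) -> dict:
    """Return a copy of doc with missing optional fields backfilled."""
    major, _minor, _patch = (int(part) for part in doc["schema_version"].split("."))
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported MAJOR version: {major}")
    out = copy.deepcopy(doc)  # never mutate the caller's document
    for path, default in OPTIONAL_DEFAULTS.items():
        node = out
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node.setdefault(path[-1], default)
    return out
```

Because the input is deep-copied, the published document itself stays byte-for-byte unchanged, which keeps its repro_hash and signature meaningful.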
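Summary
- This template persists the volume's core metrics, the two arrival-time forms, the timebase semantics, and the contract-policy loop into a single manifest.stats.
- The minimal key set and the samples (A/B, drift, causal) enable cross-system reuse of statistical results, compliance auditing, and rollback-capable releases.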
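Copyright and License (CC BY 4.0)
Copyright notice: unless otherwise stated, the copyright of Energy Filament Theory (《能量丝理论》), including its text, charts, illustrations, symbols, and formulas, belongs to the author (Mr. 屠广林).
License: this work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). With attribution to the author and source, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Attribution format (suggested): Author: 屠广林; Work: Energy Filament Theory; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/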