Catalog Documents - Technical Whitepaper 44 - EFT.WP.Data.ModelCards v1.0

Chapter 10 Objective Functions, Optimization, and Hyperparameters


I. Purpose and Scope of This Chapter

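This chapter normatively fixes the definitions of the optimization and hyperparams fields in the model card, covering the search space and value ranges, randomness and stopping criteria, regularization and constraints, learning rate and scheduler, mixed precision and gradient clipping, and the early-stopping and rollback strategy. These definitions must remain consistent with the chapters "Tasks and I/O", "Training Data and Sampling Binding", "Evaluation Protocol and Metrics", and "Preprocessing and Feature Engineering", and with the metrology chapter.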

II. Fields and Structure (Normative)

optimization:
  objective:  # objective function and weights
    name: "<cross_entropy|mse|mae|nll|ctc|triplet|contrastive|custom>"
    reduction: "<mean|sum|none>"
    weights?: {class: "<inverse_freq|log_inv|custom>", pos_neg: 1.0}
    formula?: "L(θ) = ( E_{(x,y)∼D} [ ℓ(f_θ(x), y ) ] )"  # plain text
  regularization:  # regularization and constraints
    weight_decay: 0.05
    l1: 0.0
    label_smoothing: 0.0
    grad_clip: {type: "<norm|value>", value: 1.0}
    constraints?: ["orthogonal_init", "spectral_norm"]
  optimizer:  # optimizer
    name: "<sgd|adam|adamw|lamb|adagrad|lion|custom>"
    lr: 3.0e-4
    betas?: [0.9, 0.999]
    momentum?: 0.9
    eps?: 1.0e-8
    weight_decay?: 0.05
    amsgrad?: false
  scheduler:  # learning-rate / temperature / weight schedules
    name: "<cosine|step|multistep|linear|poly|plateau|onecycle|custom>"
    warmup:
      steps: 500
      mode: "<linear|cosine|none>"
    params?: {step_size: 30, gamma: 0.1}
  early_stopping:  # early stopping and rollback
    monitor: "val/f1_macro"
    mode: "max"
    patience: 12
    min_delta: 0.0
    rollback: true
  precision:  # precision and loss scaling
    amp: {train: "<fp16|bf16|fp32>", infer: "<fp16|bf16|fp32>", loss_scale: "<dynamic|static|none>"}
  seeds:  # randomness and reproducibility
    global: 1701
    per_phase?: {train: [1701, 1702, 1703], eval: [1701]}
  stopping_criteria:  # stopping criteria (other than early stopping)
    max_epochs: 200
    max_steps?: null
    wallclock_hours?: null
  budget:  # resource / search budget
    gpu_hours: 120
    trials: 32
  notes?: "<non-normative>"
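A minimal sketch of how the optimization block above could be checked mechanically. It assumes that schema keys without a trailing `?` are required and that each `"<a|b|...>"` placeholder lists the permitted string values; `validate_optimization`, `REQUIRED_KEYS`, and `ALLOWED_VALUES` are hypothetical names, not part of this specification.

```python
# Hypothetical validator for the `optimization` block (not defined by the spec).
# Assumptions: schema keys without a trailing "?" are required, and a
# "<a|b|...>" placeholder enumerates the allowed string values.

REQUIRED_KEYS = {
    "objective", "regularization", "optimizer", "scheduler",
    "early_stopping", "precision", "seeds", "stopping_criteria", "budget",
}

ALLOWED_VALUES = {
    ("objective", "name"): {"cross_entropy", "mse", "mae", "nll", "ctc",
                            "triplet", "contrastive", "custom"},
    ("objective", "reduction"): {"mean", "sum", "none"},
    ("optimizer", "name"): {"sgd", "adam", "adamw", "lamb", "adagrad", "lion", "custom"},
    ("scheduler", "name"): {"cosine", "step", "multistep", "linear", "poly",
                            "plateau", "onecycle", "custom"},
}


def validate_optimization(opt: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    # Missing required sections come first, in a stable (sorted) order.
    problems = [f"missing required key: optimization.{key}"
                for key in sorted(REQUIRED_KEYS - set(opt))]
    # Then check enumerated fields that are present.
    for (section, field), allowed in ALLOWED_VALUES.items():
        value = (opt.get(section) or {}).get(field)
        if value is not None and value not in allowed:
            problems.append(
                f"optimization.{section}.{field}={value!r} is not one of {sorted(allowed)}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a model-card linter report every problem in one pass.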

hyperparams:
  batch_size: 256
  accum_steps: 1
  epochs: 200
  grad_accum?: true
  dropout: 0.1
  label_smoothing?: 0.0
  temperature?: null
  mixup_cutmix?: {mixup_alpha: 0.0, cutmix_alpha: 0.0}
  search_space?:  # hyperparameter search space (optional)
    lr: {type: "loguniform", low: 1.0e-5, high: 1.0e-3}
    weight_decay: {type: "loguniform", low: 1.0e-5, high: 1.0e-1}
    batch_size: {type: "choice", values: [128, 256, 512]}
  search_algo?: "<grid|random|bayes|evolution|pbt>"
  search_seed?: 1701
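The `search_space` entries can be sampled as sketched below. This is plain random search (the `random` option of `search_algo`), not `bayes`; `sample_trials` is a hypothetical helper, and seeding a single `random.Random` instance with `search_seed` is an assumed way of making the draws reproducible.

```python
import math
import random

SEARCH_SPACE = {
    "lr": {"type": "loguniform", "low": 1.0e-5, "high": 1.0e-3},
    "weight_decay": {"type": "loguniform", "low": 1.0e-5, "high": 1.0e-1},
    "batch_size": {"type": "choice", "values": [128, 256, 512]},
}


def sample_trial(space: dict, rng: random.Random) -> dict:
    """Draw one configuration from a search_space mapping."""
    trial = {}
    for name in sorted(space):  # fixed iteration order keeps draws reproducible
        spec = space[name]
        if spec["type"] == "loguniform":
            # Uniform in log space, then exponentiate back.
            lo, hi = math.log(spec["low"]), math.log(spec["high"])
            trial[name] = math.exp(rng.uniform(lo, hi))
        elif spec["type"] == "choice":
            trial[name] = rng.choice(spec["values"])
        else:
            raise ValueError(f"unsupported search-space type: {spec['type']!r}")
    return trial


def sample_trials(space: dict, n_trials: int, search_seed: int) -> list[dict]:
    """Random search: n_trials independent draws from one seeded generator."""
    rng = random.Random(search_seed)
    return [sample_trial(space, rng) for _ in range(n_trials)]
```

For example, `sample_trials(SEARCH_SPACE, 32, 1701)` yields 32 configurations (matching `budget.trials: 32`), and calling it again with the same seed reproduces them exactly, which is what makes a recorded trial log auditable.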


III. Objective Function and Weighting Conventions
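As an illustration of `weights.class: "inverse_freq"` combined with `reduction`, the sketch below uses the common definition w_c = N / (K · n_c) (N samples, K classes, n_c samples of class c) and normalizes the `mean` reduction by the summed weights of the batch targets, which is the convention PyTorch's `CrossEntropyLoss` follows when class weights are given. Both choices are assumptions: the chapter does not fix the formula behind `inverse_freq`, and `log_inv` is not implemented here.

```python
import math
from collections import Counter


def inverse_freq_weights(labels: list[int], num_classes: int) -> list[float]:
    """Assumed definition w_c = N / (K * n_c); a perfectly balanced set gives 1.0 per class."""
    counts = Counter(labels)
    n, k = len(labels), num_classes
    # Classes absent from `labels` get weight 0.0 instead of dividing by zero.
    return [n / (k * counts[c]) if counts[c] else 0.0 for c in range(k)]


def log_softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # shift by the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def weighted_cross_entropy(batch_logits: list[list[float]], targets: list[int],
                           weights: list[float], reduction: str = "mean"):
    """Per-sample loss w[y] * -log p(y | x), reduced according to `reduction`."""
    losses = [-weights[y] * log_softmax(z)[y] for z, y in zip(batch_logits, targets)]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Weighted mean: divide by the total weight of the targets, not the batch size.
        return sum(losses) / sum(weights[y] for y in targets)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

With labels [0, 0, 0, 1] and two classes the weights are about [0.667, 2.0], so each minority-class sample contributes three times as much as a majority-class sample.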


IV. Optimizer, Learning Rate, and Scheduling
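A minimal sketch of the example `scheduler` values (`name: "cosine"`, `warmup: {steps: 500, mode: "linear"}`) applied to `optimizer.lr`. The total step count and the floor `min_lr` are not schema fields; they are assumed inputs here (for instance, total steps derived from `epochs` times the number of steps per epoch), and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step: int, base_lr: float, total_steps: int,
          warmup_steps: int = 500, warmup_mode: str = "linear",
          min_lr: float = 0.0) -> float:
    """Learning rate for a 0-based optimizer step: warmup, then cosine decay."""
    if warmup_mode == "none":
        warmup_steps = 0
    if step < warmup_steps:
        frac = (step + 1) / warmup_steps  # reaches 1.0 on the last warmup step
        if warmup_mode == "linear":
            return base_lr * frac
        if warmup_mode == "cosine":
            return base_lr * 0.5 * (1.0 - math.cos(math.pi * frac))
        raise ValueError(f"unknown warmup mode: {warmup_mode!r}")
    # Cosine decay from base_lr (progress 0) down to min_lr (progress 1).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With `base_lr = 3.0e-4` and 10,000 total steps, the rate climbs linearly to 3.0e-4 over the first 500 steps, falls to half of that at step 5,250, and reaches `min_lr` at the final step.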


V. Regularization and Gradient Constraints
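The two `grad_clip.type` options can be sketched as follows: `value` clamps each gradient element to [-value, value], while `norm` rescales all gradients together when their global L2 norm exceeds `value`. Plain Python lists stand in for parameter tensors, and `clip_gradients` is a hypothetical helper. Frameworks ship their own versions (PyTorch's `clip_grad_norm_`, for example, adds a small epsilon to the norm), so treat this as an illustration of the arithmetic only.

```python
import math


def clip_gradients(grads: list[list[float]], clip_type: str, value: float) -> list[list[float]]:
    """Return clipped copies of per-parameter gradient lists."""
    if clip_type == "value":
        # Clamp every element independently into [-value, value].
        return [[max(-value, min(value, g)) for g in p] for p in grads]
    if clip_type == "norm":
        # Global L2 norm across all parameters; rescale only when it exceeds `value`.
        total = math.sqrt(sum(g * g for p in grads for g in p))
        if total <= value:
            return [list(p) for p in grads]
        scale = value / total
        return [[g * scale for g in p] for p in grads]
    raise ValueError(f"unknown grad_clip type: {clip_type!r}")
```

Norm clipping preserves the direction of the update and only shortens it, whereas value clipping can change the direction because each element is clamped separately.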


VI. Randomness, Stopping, and Budget
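The `early_stopping` fields and `stopping_criteria.max_epochs` can be combined as in this sketch. It assumes one evaluation per epoch, treats `mode: "min"` as the counterpart for loss-like metrics (the schema example shows only `"max"`), and interprets `rollback: true` as keeping the checkpoint of the best epoch. `EarlyStopping` and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Tracks the monitored metric; mirrors the early_stopping fields of the schema."""
    monitor: str = "val/f1_macro"
    mode: str = "max"  # "min" is an assumed counterpart for loss-like metrics
    patience: int = 12
    min_delta: float = 0.0
    rollback: bool = True
    best: Optional[float] = None
    best_epoch: Optional[int] = None
    bad_epochs: int = 0

    def _improved(self, value: float) -> bool:
        if self.best is None:
            return True
        if self.mode == "max":
            return value > self.best + self.min_delta
        return value < self.best - self.min_delta

    def step(self, epoch: int, metrics: dict) -> bool:
        """Record one evaluation; return True once `patience` evaluations pass without improvement."""
        value = metrics[self.monitor]
        if self._improved(value):
            self.best, self.best_epoch, self.bad_epochs = value, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

    def checkpoint_to_keep(self, last_epoch: int) -> int:
        """With rollback, keep the best epoch's checkpoint; otherwise keep the last one."""
        return self.best_epoch if self.rollback else last_epoch


def run(history: list[float], stopper: EarlyStopping, max_epochs: int) -> tuple[int, int]:
    """Replay per-epoch metric values; return (last epoch run, epoch whose checkpoint is kept)."""
    last = -1
    for epoch, value in enumerate(history[:max_epochs]):  # hard cap from stopping_criteria
        last = epoch
        if stopper.step(epoch, {stopper.monitor: value}):
            break
    return last, stopper.checkpoint_to_keep(last)
```

For a metric history of [0.50, 0.60, 0.62, 0.61, 0.62, 0.60, 0.70] with `patience=3`, training stops after epoch 5 and rollback restores epoch 2; the later 0.70 is never reached.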


VII. Metrology and Units (Physical, Time, Frequency, and Performance Quantities)

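  1. Fields such as learning rate, latency, throughput, and energy consumption must declare their units and measurement conventions and must pass check_dim;
  2. If an objective or constraint involves a path-dependent quantity (such as T_arr), register delta_form together with one of the two equivalent expressions below for the consistency check:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ).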
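For a sampled path, the two expressions above can be compared numerically. This sketch integrates with the trapezoidal rule and assumes c_ref is constant along the path (in which case the two forms are mathematically identical and differ only by floating-point rounding), with ell in metres and c_ref in metres per second so that T_arr comes out in seconds. The helper names are hypothetical, and the sketch does not implement `delta_form` or `check_dim`.

```python
import math


def trapezoid(values: list[float], positions: list[float]) -> float:
    """Trapezoidal-rule integral of sampled `values` over `positions`."""
    return sum(0.5 * (values[i] + values[i + 1]) * (positions[i + 1] - positions[i])
               for i in range(len(values) - 1))


def t_arr_both_forms(n_eff: list[float], ell: list[float], c_ref: float) -> tuple[float, float]:
    form_a = (1.0 / c_ref) * trapezoid(n_eff, ell)        # (1 / c_ref) * ( ∫ n_eff d ell )
    form_b = trapezoid([n / c_ref for n in n_eff], ell)   # ∫ ( n_eff / c_ref ) d ell
    return form_a, form_b


def forms_consistent(n_eff: list[float], ell: list[float], c_ref: float,
                     rel_tol: float = 1e-9) -> bool:
    """True when both forms agree within a relative tolerance."""
    a, b = t_arr_both_forms(n_eff, ell, c_ref)
    return math.isclose(a, b, rel_tol=rel_tol)
```

For a 1 km path whose n_eff rises linearly from 1.0 to 1.1, both forms give 1050 m / c_ref, about 3.5 µs when c_ref is the vacuum speed of light.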

VIII. Machine-Readable Fragment (Directly Embeddable)

optimization:
  objective: {name: "cross_entropy", reduction: "mean", weights: {class: "inverse_freq"}}
  regularization: {weight_decay: 0.05, label_smoothing: 0.0, grad_clip: {type: "norm", value: 1.0}}
  optimizer: {name: "adamw", lr: 3.0e-4, betas: [0.9, 0.999], eps: 1.0e-8, weight_decay: 0.05}
  scheduler:
    name: "cosine"
    warmup: {steps: 500, mode: "linear"}
  early_stopping: {monitor: "val/f1_macro", mode: "max", patience: 12, rollback: true}
  precision: {amp: {train: "bf16", infer: "bf16", loss_scale: "dynamic"}}
  seeds: {global: 1701}
  stopping_criteria: {max_epochs: 200}
  budget: {gpu_hours: 120, trials: 32}

hyperparams:
  batch_size: 256
  accum_steps: 1
  epochs: 200
  dropout: 0.1
  search_space:
    lr: {type: "loguniform", low: 1.0e-5, high: 1.0e-3}
    weight_decay: {type: "loguniform", low: 1.0e-5, high: 1.0e-1}
    batch_size: {type: "choice", values: [128, 256, 512]}
  search_algo: "bayes"
  search_seed: 1701
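Before the fragment is embedded, a few cross-field checks can catch copy errors. The rules below (final `lr` and `weight_decay` inside their search ranges, `batch_size` among the search choices, `epochs` not above `max_epochs`, and the two `weight_decay` entries equal) are assumptions chosen for illustration, not requirements stated in this chapter. `CARD` mirrors part of the fragment as the Python dictionary a YAML loader would return, and `consistency_problems` is a hypothetical helper.

```python
# Partial Python mirror of the fragment; only fields used by the checks are included.
CARD = {
    "optimization": {
        "regularization": {"weight_decay": 0.05},
        "optimizer": {"name": "adamw", "lr": 3.0e-4, "weight_decay": 0.05},
        "stopping_criteria": {"max_epochs": 200},
    },
    "hyperparams": {
        "batch_size": 256,
        "epochs": 200,
        "search_space": {
            "lr": {"type": "loguniform", "low": 1.0e-5, "high": 1.0e-3},
            "weight_decay": {"type": "loguniform", "low": 1.0e-5, "high": 1.0e-1},
            "batch_size": {"type": "choice", "values": [128, 256, 512]},
        },
    },
}


def consistency_problems(card: dict) -> list[str]:
    """Hypothetical cross-field checks; none of these rules is mandated by the chapter."""
    opt, hp = card["optimization"], card["hyperparams"]
    space = hp.get("search_space", {})
    problems = []
    # Final optimizer values should lie inside the declared log-uniform ranges.
    for name in ("lr", "weight_decay"):
        spec, value = space.get(name), opt["optimizer"].get(name)
        if spec and value is not None and not (spec["low"] <= value <= spec["high"]):
            problems.append(
                f"optimizer.{name}={value} outside search_space [{spec['low']}, {spec['high']}]")
    # The chosen batch size should be one of the searched choices.
    choice = space.get("batch_size")
    if choice and hp["batch_size"] not in choice["values"]:
        problems.append(f"batch_size={hp['batch_size']} not in {choice['values']}")
    if hp["epochs"] > opt["stopping_criteria"]["max_epochs"]:
        problems.append("hyperparams.epochs exceeds stopping_criteria.max_epochs")
    # The fragment declares weight_decay twice; the two copies should agree.
    if opt["regularization"].get("weight_decay") != opt["optimizer"].get("weight_decay"):
        problems.append("weight_decay differs between regularization and optimizer")
    return problems
```

The fragment as written passes all five checks; changing `optimizer.lr` to 5.0e-3, for instance, would be reported as lying outside the searched range.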


IX. Consistency with the Evaluation Protocol, Architecture, and Features


X. Export Manifest and Audit Trail

export_manifest:
  artifacts:
    - {path: "opt/hparams.yaml", sha256: "..."}
    - {path: "opt/search_space.yaml", sha256: "..."}
    - {path: "opt/search_trials.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"

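The search space, the trial records, and the final hyperparameters must be exported; each exported artifact must be verifiable and consistent with the model-card fields.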
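The `sha256` placeholders in the manifest can be filled and later re-checked with the standard library, as sketched here. It assumes artifact paths are relative to a common root directory; `build_artifacts` and `verify_artifacts` are hypothetical helpers, and the EXPORT rules referenced in the manifest may prescribe a different procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_artifacts(root: Path, rel_paths: list[str]) -> list[dict]:
    """Manifest entries of the form {path, sha256} for files under `root`."""
    return [{"path": p, "sha256": sha256_of(root / p)} for p in rel_paths]


def verify_artifacts(root: Path, artifacts: list[dict]) -> list[str]:
    """Return the paths that are missing or whose digest no longer matches the manifest."""
    bad = []
    for entry in artifacts:
        target = root / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            bad.append(entry["path"])
    return bad
```

Recording the digests at export time and re-running `verify_artifacts` during audit shows whether `opt/hparams.yaml` and the other artifacts still match what the model card describes.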

XI. Chapter Compliance Self-Check


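Copyright and License (CC BY 4.0)

Copyright notice: Unless otherwise stated, the copyright of "Energy Filament Theory" (including its text, charts, illustrations, symbols, and formulas) belongs to the author, Mr. "Tu Guanglin" (屠广林).
License: This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Provided the author and source are credited, it may be copied, reposted, excerpted, adapted, and redistributed for commercial or non-commercial purposes.
Attribution format (suggested): Author: "Tu Guanglin"; Work: "Energy Filament Theory"; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/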