# Optimizers
Gradien includes a wide variety of optimizers for different use cases.
## Adaptive Methods
### Adam
Adaptive Moment Estimation. Good default. Supports L2 weight decay (applied to gradients).
```lua
(params: {Tensor}, lr: number, b1: number?, b2: number?, eps: number?, weight_decay: number?) -> Optimizer
```

```lua
local opt = Gradien.Optim.Adam(params, 0.001, 0.9, 0.999, 1e-8, 0.01) -- with weight decay
```

Note: Adam with `weight_decay` applies L2 regularization by modifying the gradients. For decoupled weight decay, use AdamW instead.
### AdamW
Adam with decoupled weight decay. Often generalizes better than Adam.
```lua
(
    params: {Tensor},
    lr: number,
    wd: number, -- weight decay
    b1: number?,
    b2: number?,
    eps: number?
) -> Optimizer
```
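A minimal construction sketch. The `Gradien.Optim.AdamW` path is an assumption based on the `Gradien.Optim.Adam` example above, and the beta/epsilon values are conventional Adam-style choices rather than Gradien's documented defaults.

```lua
-- params is the {Tensor} list of model parameters, as in the signature.
-- Gradien.Optim.AdamW is assumed to mirror the Gradien.Optim.Adam constructor shown earlier.
local opt = Gradien.Optim.AdamW(params, 0.001, 0.01, 0.9, 0.999, 1e-8)
-- lr = 0.001, wd = 0.01 (decoupled), b1 = 0.9, b2 = 0.999, eps = 1e-8
```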
### Lion

Evolved Sign Momentum. Memory-efficient and often faster than Adam.
```lua
(
    params: {Tensor},
    lr: number,
    beta1: number?, -- default 0.9
    beta2: number?, -- default 0.99
    weightDecay: number?
) -> Optimizer
```
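A construction sketch under the same assumption that Lion lives at `Gradien.Optim.Lion`. Lion is commonly run with a smaller learning rate (and a larger weight decay) than Adam; the values below are illustrative, not Gradien defaults.

```lua
-- Assumed Gradien.Optim.Lion path; the betas match the documented defaults.
local opt = Gradien.Optim.Lion(params, 1e-4, 0.9, 0.99, 0.01)
-- lr = 1e-4 (smaller than a typical Adam lr), weightDecay = 0.01
```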
### Adafactor

Memory-efficient adaptive optimization. Can operate in a 2D factored mode to save memory on large matrices.
```lua
(
    params: {Tensor},
    lr: number,
    beta2: number?,
    eps: number?,
    clipThreshold: number?,
    weightDecay: number?
) -> Optimizer
```
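A sketch assuming the same `Gradien.Optim.Adafactor` constructor pattern. The optional arguments are omitted here so that whatever defaults the library ships with apply.

```lua
-- Assumed Gradien.Optim.Adafactor path; only the required arguments are passed,
-- so beta2, eps, clipThreshold and weightDecay fall back to the library defaults.
local opt = Gradien.Optim.Adafactor(params, 1e-3)
```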
### RMSProp

Maintains a moving average of the squared gradient and scales each update by its root.
```lua
(params: {Tensor}, lr: number, decay: number?, eps: number?) -> Optimizer
```
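A sketch with an assumed `Gradien.Optim.RMSProp` path; the decay and epsilon values are conventional RMSProp settings, not necessarily Gradien's defaults.

```lua
-- Assumed Gradien.Optim.RMSProp path; 0.99 / 1e-8 are common RMSProp settings.
local opt = Gradien.Optim.RMSProp(params, 0.01, 0.99, 1e-8)
```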
### Adagrad

Adapts learning rates based on the frequency of parameter updates.
```lua
(params: {Tensor}, lr: number, eps: number?) -> Optimizer
```
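A sketch assuming an analogous `Gradien.Optim.Adagrad` constructor; the epsilon shown is a conventional small stabilizing constant, not a documented Gradien default.

```lua
-- Assumed Gradien.Optim.Adagrad path; eps is a typical small stabilizing constant.
local opt = Gradien.Optim.Adagrad(params, 0.01, 1e-10)
```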
## Stochastic Methods

### SGD
Stochastic Gradient Descent with optional momentum and Nesterov acceleration.
```lua
(params: {Tensor}, lr: number, momentum: number?, nesterov: boolean?) -> Optimizer
```
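A sketch assuming SGD is exposed as `Gradien.Optim.SGD` like the other optimizers above; momentum 0.9 with Nesterov enabled is a common configuration, not a Gradien default.

```lua
-- Assumed Gradien.Optim.SGD path; classic momentum SGD with Nesterov acceleration.
local opt = Gradien.Optim.SGD(params, 0.01, 0.9, true)
```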