RL Agents Module
Gradien provides a unified API for Reinforcement Learning agents.
Agent Types
DQL & DoubleDQN Off-Policy
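Deep Q-Learning algorithms. DoubleDQN reduces the overestimation bias of standard Q-learning by selecting the next action with the online network and evaluating it with the target network.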
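To make the difference concrete, here is a minimal sketch of the two bootstrap targets. It is not Gradien's implementation: plain Lua tables stand in for network outputs, and the names `argmax`, `dqnTarget`, `doubleDqnTarget`, `qOnlineNext` and `qTargetNext` are hypothetical, chosen only for illustration.

```lua
-- Illustrative only: plain tables stand in for the Q-values that the online
-- and target networks would produce for the next state (one entry per action).

local function argmax(values)
    local bestIndex, bestValue = 1, values[1]
    for i = 2, #values do
        if values[i] > bestValue then
            bestIndex, bestValue = i, values[i]
        end
    end
    return bestIndex
end

-- Standard DQN target: the target network both picks and scores the action.
local function dqnTarget(reward, gamma, done, qTargetNext)
    if done then
        return reward
    end
    return reward + gamma * qTargetNext[argmax(qTargetNext)]
end

-- Double DQN target: the online network picks the action,
-- the target network scores it.
local function doubleDqnTarget(reward, gamma, done, qOnlineNext, qTargetNext)
    if done then
        return reward
    end
    return reward + gamma * qTargetNext[argmax(qOnlineNext)]
end

local qOnlineNext = { 1.0, 2.0, 0.5 } -- online network prefers action 2
local qTargetNext = { 3.0, 1.5, 0.0 } -- target network prefers action 1

local plain = dqnTarget(1, 0.9, false, qTargetNext)                     -- 1 + 0.9 * 3.0
local double = doubleDqnTarget(1, 0.9, false, qOnlineNext, qTargetNext) -- 1 + 0.9 * 1.5
print(plain, double)
```

In this toy case the Double DQN target is lower because the action chosen by the online network is scored by the target network rather than taking the target network's own maximum.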
lua
{
    actionDim: number,
    batchSize: number,
    gamma: number,
    epsilonStart: number?,
    epsilonEnd: number?,
    epsilonDecay: number?,
    modelFactory: () -> Module,
    optimizerFactory: (params) -> Optimizer,
    replay: ReplayBuffer?,
    targetSyncInterval: number?,
    tau: number? -- Soft update factor
}
PPO On-Policy
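Proximal Policy Optimization. It clips the policy probability ratio so that each update stays close to the previous policy, which tends to make training stable while reusing each batch for several epochs.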
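The clipping idea can be sketched in a few lines. This is a generic illustration of the PPO clipped surrogate and of generalized advantage estimation, not Gradien's code. It assumes that the `clip`, `gamma` and `lam` config fields play their usual roles (clip range, discount factor, GAE lambda). The function names and the plain-table inputs are hypothetical.

```lua
-- Illustrative only: numbers and plain tables stand in for tensors.

local function clamp(x, lo, hi)
    return math.max(lo, math.min(hi, x))
end

-- ratio = pi_new(a|s) / pi_old(a|s); advantage is an estimate such as GAE.
local function clippedSurrogate(ratio, advantage, clip)
    local unclipped = ratio * advantage
    local clipped = clamp(ratio, 1 - clip, 1 + clip) * advantage
    return math.min(unclipped, clipped)
end

-- Policy loss over a batch: negative mean of the clipped surrogate.
local function ppoPolicyLoss(ratios, advantages, clip)
    local total = 0
    for i = 1, #ratios do
        total = total + clippedSurrogate(ratios[i], advantages[i], clip)
    end
    return -total / #ratios
end

-- Generalized advantage estimation over one rollout.
-- values[t] is V(s_t); lastValue is V of the state after the final step.
local function gae(rewards, values, dones, lastValue, gamma, lam)
    local advantages = {}
    local running = 0
    for t = #rewards, 1, -1 do
        local nextValue = (t == #rewards) and lastValue or values[t + 1]
        local notDone = dones[t] and 0 or 1
        local delta = rewards[t] + gamma * nextValue * notDone - values[t]
        running = delta + gamma * lam * notDone * running
        advantages[t] = running
    end
    return advantages
end

local advantages = gae({ 1, 1 }, { 0, 0 }, { false, true }, 0, 1, 1)
print(advantages[1], advantages[2])
print(ppoPolicyLoss({ 1.5, 0.5 }, { 2, -1 }, 0.2))
```

With a positive advantage, a ratio above `1 + clip` earns no extra objective, so the update has no incentive to move the policy further in that direction.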
lua
{
    policy: Module,
    value: Module,
    gamma: number,
    lam: number,
    clip: number,
    epochs: number,
    minBatch: number?,
    maxBuffer: number?,
    optimizerFactory: (params) -> Optimizer
}
A2C On-Policy
Advantage Actor-Critic.
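As a rough illustration of what "advantage" means here, the sketch below computes discounted returns and the usual actor and critic losses for one short rollout. It is a textbook formulation under stated assumptions, not Gradien's implementation. `discountedReturns`, `a2cLosses` and the plain-table inputs are hypothetical names.

```lua
-- Illustrative only: plain tables stand in for tensors.

-- Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards.
local function discountedReturns(rewards, gamma)
    local returns = {}
    local running = 0
    for t = #rewards, 1, -1 do
        running = rewards[t] + gamma * running
        returns[t] = running
    end
    return returns
end

-- logProbs[t] is log pi(a_t | s_t); values[t] is the critic's V(s_t).
-- Advantage A_t = G_t - V(s_t). The actor minimizes -log pi * A,
-- the critic minimizes the squared error between G_t and V(s_t).
local function a2cLosses(logProbs, values, returns)
    local policyLoss, valueLoss = 0, 0
    local n = #returns
    for t = 1, n do
        local advantage = returns[t] - values[t]
        policyLoss = policyLoss - logProbs[t] * advantage
        valueLoss = valueLoss + advantage * advantage
    end
    return policyLoss / n, valueLoss / n
end

local returns = discountedReturns({ 1, 1, 1 }, 0.5)
local policyLoss, valueLoss = a2cLosses({ -1, -1, -1 }, { 0, 0, 0 }, returns)
print(returns[1], returns[2], returns[3])
print(policyLoss, valueLoss)
```

In a real implementation the advantage is treated as a constant when differentiating the policy loss, so only the critic loss trains the value network.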
lua
{
    policy: Module,
    value: Module,
    gamma: number,
    minBatch: number?,
    optimizerFactory: (params) -> Optimizer
}
Common Interface
:act
lua
(state: Tensor, stepIndex: number?) -> number
:observe
lua
(transition: {
    state: Tensor,
    action: number,
    reward: number,
    nextState: Tensor,
    done: boolean
}) -> ()
:trainStep Parallel
lua
() -> { loss: number, avgReturn: number? }?
:getPolicy
lua
() -> Module
:loadParameters (DQN only)
lua
(snapshot: any, strict: boolean?) -> ()
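The signatures above suggest an act, observe, train loop. The sketch below shows that loop against a stand-in agent so it runs on its own. `makeRandomAgent` and `makeToyEnv` are hypothetical helpers, not part of Gradien. The state is a plain number rather than a Tensor, and 1-based action indices are an assumption. A real agent would be constructed through the library and would expose the same three methods.

```lua
-- Hypothetical stand-in that mirrors the :act / :observe / :trainStep shape.
local function makeRandomAgent(actionDim)
    local agent = { buffer = {} }

    function agent:act(state, stepIndex)
        return math.random(1, actionDim) -- assumes 1-based action indices
    end

    function agent:observe(transition)
        table.insert(self.buffer, transition)
    end

    -- Returns nil when there is nothing to train on, matching the optional
    -- return type in the signature above.
    function agent:trainStep()
        if #self.buffer == 0 then
            return nil
        end
        local total = 0
        for _, transition in ipairs(self.buffer) do
            total = total + transition.reward
        end
        local stats = { loss = 0, avgReturn = total / #self.buffer }
        self.buffer = {}
        return stats
    end

    return agent
end

-- Hypothetical toy environment: reward 1 for action 1, episode ends after maxSteps.
local function makeToyEnv(maxSteps)
    local env = { t = 0 }
    function env:reset()
        self.t = 0
        return self.t
    end
    function env:step(action)
        self.t = self.t + 1
        local reward = (action == 1) and 1 or 0
        return self.t, reward, self.t >= maxSteps
    end
    return env
end

local agent = makeRandomAgent(2)
local env = makeToyEnv(5)
local stepIndex = 0
local lastStats = nil

for episode = 1, 3 do
    local state, done = env:reset(), false
    while not done do
        stepIndex = stepIndex + 1
        local action = agent:act(state, stepIndex)
        local nextState, reward, isDone = env:step(action)
        agent:observe({
            state = state,
            action = action,
            reward = reward,
            nextState = nextState,
            done = isDone,
        })
        state, done = nextState, isDone
    end

    local stats = agent:trainStep()
    if stats then -- trainStep may return nil, so check before reading fields
        lastStats = stats
        print(string.format("episode %d: loss=%.3f", episode, stats.loss))
    end
end
```

Checking the result of `:trainStep` for nil before reading `loss` or `avgReturn` follows from the trailing `?` on its return type.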