RL Agents Module

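Gradien provides a unified API for reinforcement learning agents. Each agent type below is configured with its own table, and all of them expose the methods listed under Common Interface.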

Agent Types

DQL & DoubleDQN (Off-Policy)

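Deep Q-Learning algorithms that learn action values from replayed transitions. DoubleDQN reduces overestimation bias by choosing the next action with the online network and scoring it with the target network.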
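To make the difference concrete, here is a minimal sketch of how the two bootstrap targets are usually computed for a single transition. It uses plain Lua tables in place of Gradien tensors, and the helper names (`argmax`, `dqnTarget`, `doubleDqnTarget`) are illustrative assumptions, not part of the Gradien API.

```lua
-- Illustrative only: plain tables stand in for Q-value tensors.
-- None of these helpers are Gradien functions.

-- Index of the largest value in an array.
local function argmax(values)
    local bestIndex, bestValue = 1, values[1]
    for i = 2, #values do
        if values[i] > bestValue then
            bestIndex, bestValue = i, values[i]
        end
    end
    return bestIndex
end

-- Standard DQN: the target network both picks and scores the next action.
local function dqnTarget(reward, gamma, done, targetNextQ)
    if done then return reward end
    return reward + gamma * targetNextQ[argmax(targetNextQ)]
end

-- Double DQN: the online network picks the action, the target network scores it.
local function doubleDqnTarget(reward, gamma, done, onlineNextQ, targetNextQ)
    if done then return reward end
    return reward + gamma * targetNextQ[argmax(onlineNextQ)]
end

local onlineNextQ = { 1.0, 2.0, 0.5 } -- online network prefers action 2
local targetNextQ = { 3.0, 1.0, 0.5 } -- target network overrates action 1

local plain = dqnTarget(1, 0.5, false, targetNextQ)                      -- 1 + 0.5 * 3.0
local double = doubleDqnTarget(1, 0.5, false, onlineNextQ, targetNextQ)  -- 1 + 0.5 * 1.0
print(plain, double)
```

In this toy case the Double DQN target is lower because the action is chosen by one network and evaluated by the other, so a single inflated estimate does not feed directly into the target.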

lua
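{
    actionDim: number, -- Number of discrete actions
    batchSize: number, -- Transitions sampled per training step
    gamma: number, -- Discount factor
    epsilonStart: number?, -- Initial exploration rate
    epsilonEnd: number?, -- Final exploration rate
    epsilonDecay: number?, -- Controls how epsilon decays from start to end
    modelFactory: () -> Module, -- Builds the Q-network
    optimizerFactory: (params) -> Optimizer, -- Builds the optimizer for the model parameters
    replay: ReplayBuffer?, -- Optional replay buffer
    targetSyncInterval: number?, -- Interval between target network syncs
    tau: number? -- Soft update factor
}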
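The last two options describe how the target network follows the online network. The sketch below shows the two conventional schemes on plain number arrays: a hard copy every `targetSyncInterval` steps, and a soft (Polyak) update weighted by `tau`. How Gradien combines the two options internally is not stated here, so `hardSync` and `softUpdate` are hypothetical helpers and the interval value is made up for the example.

```lua
-- Illustrative only: parameters are flat arrays of numbers, not Gradien tensors.

-- Hard sync: copy every online parameter into the target.
local function hardSync(target, online)
    for i = 1, #online do
        target[i] = online[i]
    end
end

-- Soft update: move each target parameter a fraction tau toward the online value.
local function softUpdate(target, online, tau)
    for i = 1, #online do
        target[i] = tau * online[i] + (1 - tau) * target[i]
    end
end

local online = { 1.0, -2.0 }
local target = { 0.0, 0.0 }

softUpdate(target, online, 0.25) -- target moves a quarter of the way to online

local targetSyncInterval = 100 -- assumed value for the example
local syncCount = 0
for step = 1, 200 do
    if step % targetSyncInterval == 0 then
        hardSync(target, online) -- runs at steps 100 and 200
        syncCount = syncCount + 1
    end
end
```

A small `tau` makes the target change smoothly on every step, while a hard sync keeps the target fixed between copies.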

PPO (On-Policy)

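Proximal Policy Optimization. It limits how far each update can move the policy by clipping the probability ratio (the clip option), which keeps training stable while each collected batch is reused for several epochs.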

lua
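{
    policy: Module, -- Policy (actor) network
    value: Module, -- Value (critic) network
    gamma: number, -- Discount factor
    lam: number, -- GAE lambda
    clip: number, -- Clipping range for the probability ratio
    epochs: number, -- Optimization epochs per update
    minBatch: number?,
    maxBuffer: number?,
    optimizerFactory: (params) -> Optimizer -- Builds the optimizer for the given parameters
}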
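As a rough guide to what `gamma`, `lam`, and `clip` control, the sketch below computes advantages with Generalized Advantage Estimation and then evaluates the clipped surrogate objective for a single sample. It assumes `lam` is the GAE lambda, which the option list does not confirm, and the helpers `gae` and `clippedObjective` are written for this example rather than taken from Gradien.

```lua
-- Illustrative only: none of these helpers are Gradien functions.

-- Generalized Advantage Estimation over one trajectory.
-- rewards[t] and values[t] are per-step; lastValue bootstraps after the final step.
local function gae(rewards, values, lastValue, gamma, lam)
    local advantages = {}
    local running = 0
    for t = #rewards, 1, -1 do
        local nextValue = values[t + 1]
        if t == #rewards then
            nextValue = lastValue
        end
        local delta = rewards[t] + gamma * nextValue - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    end
    return advantages
end

-- Clipped surrogate objective for one sample.
-- ratio = newProbability / oldProbability for the action that was taken.
local function clippedObjective(ratio, advantage, clip)
    local clipped = math.max(1 - clip, math.min(1 + clip, ratio))
    return math.min(ratio * advantage, clipped * advantage)
end

local advantages = gae({ 1, 1 }, { 0, 0 }, 0, 0.5, 1.0)

-- A ratio of 1.5 is clipped to 1.2 when the advantage is positive.
local gain = clippedObjective(1.5, 2, 0.2)
-- A ratio of 0.5 is clipped to 0.8 when the advantage is negative.
local penalty = clippedObjective(0.5, -1, 0.2)
print(advantages[1], advantages[2], gain, penalty)
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards moving the ratio further outside the `1 - clip` to `1 + clip` band.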

A2C (On-Policy)

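Advantage Actor-Critic. The policy network is updated using advantages, that is, how much better an action turned out than the value network predicted.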

lua
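{
    policy: Module, -- Policy (actor) network
    value: Module, -- Value (critic) network
    gamma: number, -- Discount factor
    minBatch: number?,
    optimizerFactory: (params) -> Optimizer -- Builds the optimizer for the given parameters
}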
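The advantage signal can be sketched in a few lines: discount the rewards of a finished episode with `gamma`, then subtract the critic's estimate at each step. This is the textbook Monte Carlo form on plain arrays. Gradien's actual estimator may differ, and `discountedReturns` and `computeAdvantages` are hypothetical names.

```lua
-- Illustrative only: plain arrays stand in for tensors and network outputs.

-- Discounted return for every step of one finished episode.
local function discountedReturns(rewards, gamma)
    local returns = {}
    local running = 0
    for t = #rewards, 1, -1 do
        running = rewards[t] + gamma * running
        returns[t] = running
    end
    return returns
end

-- Advantage = observed return minus the critic's estimate for that step.
local function computeAdvantages(returns, values)
    local result = {}
    for t = 1, #values do
        result[t] = returns[t] - values[t]
    end
    return result
end

local returns = discountedReturns({ 1, 0, 2 }, 0.5)
local adv = computeAdvantages(returns, { 1.0, 1.0, 1.0 })
print(adv[1], adv[2], adv[3])
```

A positive advantage raises the probability of the action that was taken, and a negative one lowers it.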

Common Interface

:act

lua
(state: Tensor, stepIndex: number?) -> number

:observe

lua
(transition: {state: Tensor, action: number, reward: number, nextState: Tensor, done: boolean}) -> ()

:trainStep (Parallel)

lua
() -> { loss: number, avgReturn: number? }?

:getPolicy

lua
() -> Module

:loadParameters (DQN only)

lua
(snapshot: any, strict: boolean?) -> ()
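The methods above are typically combined into an act, observe, train loop. The sketch below shows that loop with a stub agent and a toy environment so it can run on its own. `StubAgent` and `CountdownEnv` are stand-ins invented for this example: they mimic the documented signatures, use plain tables where a real agent expects a Tensor, and say nothing about how Gradien agents are constructed.

```lua
-- Illustrative only: StubAgent and CountdownEnv are written for this example.

local StubAgent = {}
StubAgent.__index = StubAgent

function StubAgent.new(minBatch)
    return setmetatable({ buffer = {}, minBatch = minBatch }, StubAgent)
end

function StubAgent:act(state, stepIndex)
    return 1 -- always picks action 1
end

function StubAgent:observe(transition)
    table.insert(self.buffer, transition)
end

-- Returns nil until enough transitions are buffered, then a stats table.
function StubAgent:trainStep()
    if #self.buffer < self.minBatch then
        return nil
    end
    self.buffer = {} -- a real agent would compute a loss and update here
    return { loss = 0 }
end

-- Toy environment: the episode ends after a fixed number of steps.
local CountdownEnv = {}
CountdownEnv.__index = CountdownEnv

function CountdownEnv.new(length)
    return setmetatable({ length = length, t = 0 }, CountdownEnv)
end

function CountdownEnv:reset()
    self.t = 0
    return { self.t }
end

function CountdownEnv:step(action)
    self.t = self.t + 1
    return { self.t }, 1, self.t >= self.length -- nextState, reward, done
end

local agent = StubAgent.new(4)
local env = CountdownEnv.new(3)
local updates = 0
local stepIndex = 0

for episode = 1, 2 do
    local state = env:reset()
    local done = false
    while not done do
        stepIndex = stepIndex + 1
        local action = agent:act(state, stepIndex)
        local nextState, reward
        nextState, reward, done = env:step(action)
        agent:observe({
            state = state, action = action, reward = reward,
            nextState = nextState, done = done,
        })
        local stats = agent:trainStep()
        if stats then -- trainStep may return nil, so check before reading loss
            updates = updates + 1
        end
        state = nextState
    end
end
```

Because `:trainStep` has an optional return type, callers should guard against `nil` before reading `loss` or `avgReturn`, as the loop does.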