
# Activations (Parallel)

Activation functions introduce non-linearity to the network. Located in `Gradien.NN.Activations`.

## Standard

| Function | Definition |
| --- | --- |
| ReLU(x) | max(0, x) |
| Sigmoid(x) | 1 / (1 + exp(-x)) |
| Tanh(x) | (exp(x) - exp(-x)) / (exp(x) + exp(-x)) |
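A hedged usage sketch for the standard functions, mirroring the `forward` call shape of the Softmax example below; the exact module paths (e.g. `Gradien.NN.ReLU`) are assumptions:

```lua
-- Apply ReLU elementwise to a pre-activation tensor z (module path assumed)
local a = Gradien.NN.ReLU.forward(z) -- max(0, z) per element
```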

## Probability

### Softmax

Converts a vector of values to a probability distribution. The elements of the output vector are in the range (0, 1) and sum to 1.

```lua
(logits: Tensor) -> Tensor
```

```lua
local probs = Gradien.NN.Softmax.forward(logits)
```

## Advanced

### GELU

Gaussian Error Linear Unit: GELU(x) = x * Φ(x), where Φ is the standard normal CDF. Often used in Transformers.

```lua
(x: Tensor) -> Tensor
```

### LeakyReLU

ReLU with a small slope for negative inputs (x for x >= 0, alpha * x for x < 0) to prevent dead neurons.

```lua
(x: Tensor, alpha: number?) -> Tensor -- alpha defaults to 0.01
```
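If LeakyReLU follows the same `forward` convention as the Softmax example on this page (an assumption), the optional `alpha` overrides the default negative slope:

```lua
local y  = Gradien.NN.LeakyReLU.forward(x)       -- default alpha = 0.01
local y2 = Gradien.NN.LeakyReLU.forward(x, 0.2)  -- steeper negative slope
```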

### ELU

Exponential Linear Unit: x for x >= 0, alpha * (exp(x) - 1) for x < 0.

```lua
(x: Tensor, alpha: number?) -> Tensor -- alpha defaults to 1.0
```

### SwiGLU

Swish-Gated Linear Unit. Requires two inputs; conventionally computed elementwise as SiLU(a) * b.

```lua
(a: Tensor, b: Tensor) -> Tensor
```
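In Transformer feed-forward blocks the two inputs are typically separate linear projections of the same activation. A sketch, assuming the Softmax-style `forward` API and hypothetical projection layers `Wg` and `Wv`:

```lua
local a = Wg:forward(x) -- gate branch (passed through Swish)
local b = Wv:forward(x) -- value branch
local y = Gradien.NN.SwiGLU.forward(a, b)
```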

### SwiGLUSplit

Splits the input tensor into two halves and applies SwiGLU to them.

```lua
(x: Tensor, hidden: number?) -> Tensor
```
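A sketch of the split variant (call shape assumed): one fused projection produces both halves, which presumably requires the split dimension of `x` to be even.

```lua
-- SwiGLUSplit halves x and gates one half with the Swish of the other,
-- so the output is presumably half the width of x
local y = Gradien.NN.SwiGLUSplit.forward(x)
```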

### SiLU (Swish)

SiLU(x) = x * sigmoid(x)

```lua
(x: Tensor) -> Tensor
```

### Mish

Mish(x) = x * tanh(ln(1 + exp(x)))

```lua
(x: Tensor) -> Tensor
```

### SeLU

Scaled Exponential Linear Unit: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise. The standard self-normalizing constants are alpha ≈ 1.6733 and lambda ≈ 1.0507.

```lua
(x: Tensor, alpha: number?, lambda: number?) -> Tensor
```