# Activations
Activation functions introduce non-linearity into the network. Located in `Gradien.NN.Activations`.
## Standard
| Function | Definition |
|---|---|
| ReLU(x) | max(0, x) |
| Sigmoid(x) | 1 / (1 + exp(-x)) |
| Tanh(x) | (exp(x) - exp(-x)) / (exp(x) + exp(-x)) |
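The definitions in the table can be sketched numerically in plain Python (independent of Gradien, which operates on `Tensor`s rather than scalars):

```python
import math

def relu(x):
    # max(0, x)
    return max(0.0, x)

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # (exp(x) - exp(-x)) / (exp(x) + exp(-x))
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(round(sigmoid(0.0), 4))  # 0.5
print(round(tanh(0.0), 4))     # 0.0
```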
## Probability
### Softmax
Converts a vector of values into a probability distribution. Each element of the output vector lies in the range (0, 1), and the elements sum to 1.
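A scalar sketch of the math (plain Python, not the Gradien API), using the standard max-subtraction trick for numerical stability:

```python
import math

def softmax(logits):
    # subtract the max so exp() never overflows
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0
```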
```lua
(logits: Tensor) -> Tensor
```

```lua
local probs = Gradien.NN.Softmax.forward(logits)
```

## Advanced
### GELU
Gaussian Error Linear Unit. Often used in Transformers.
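The exact GELU is x · Φ(x), where Φ is the standard normal CDF. A scalar sketch (assuming Gradien uses the exact form rather than a tanh approximation):

```python
import math

def gelu(x):
    # x * Phi(x), with Phi the standard normal CDF via erf
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(gelu(0.0), 4))  # 0.0
```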
```lua
(x: Tensor) -> Tensor
```

### LeakyReLU
ReLU with a small slope for negative values to prevent dead neurons.
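A scalar sketch of the definition, with the same 0.01 default slope as the signature below:

```python
def leaky_relu(x, alpha=0.01):
    # pass positives through; scale negatives by alpha
    return x if x > 0 else alpha * x

print(leaky_relu(5.0))   # 5.0
print(leaky_relu(-5.0))  # -0.05
```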
```lua
(x: Tensor, alpha: number?) -> Tensor -- alpha defaults to 0.01
```

### ELU
Exponential Linear Unit.
```lua
(x: Tensor, alpha: number?) -> Tensor -- alpha defaults to 1.0
```

### SwiGLU
Swish-Gated Linear Unit. Requires two inputs.
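A common formulation is SiLU(a) ⊙ b, i.e. the first input is passed through Swish and used to gate the second elementwise. Which input Gradien gates is an assumption here; this is a scalar-list sketch, not the library's implementation:

```python
import math

def silu(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(a, b):
    # elementwise Swish-gated product (gate placement is an assumption)
    return [silu(u) * v for u, v in zip(a, b)]

out = swiglu([1.0, -1.0], [2.0, 2.0])
```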
```lua
(a: Tensor, b: Tensor) -> Tensor
```

### SwiGLUSplit
Splits the input tensor into two halves and applies SwiGLU.
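A sketch of the splitting step on a flat list, reusing the SwiGLU form above (how Gradien splits multi-dimensional tensors, and which half acts as the gate, are assumptions):

```python
import math

def silu(x):
    return x / (1.0 + math.exp(-x))

def swiglu_split(x):
    # split into two equal halves, gate the second half with the first
    h = len(x) // 2
    a, b = x[:h], x[h:]
    return [silu(u) * v for u, v in zip(a, b)]

out = swiglu_split([1.0, -1.0, 2.0, 2.0])  # output has half the input length
```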
```lua
(x: Tensor, hidden: number?) -> Tensor
```

### SiLU (Swish)
x * sigmoid(x)
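A scalar sketch of the definition:

```python
import math

def silu(x):
    # x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

print(round(silu(0.0), 4))  # 0.0
```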
```lua
(x: Tensor) -> Tensor
```

### Mish
x * tanh(ln(1 + exp(x)))
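A scalar sketch of the definition, using `log1p` for the softplus term:

```python
import math

def mish(x):
    # x * tanh(ln(1 + exp(x))), i.e. x * tanh(softplus(x))
    return x * math.tanh(math.log1p(math.exp(x)))

print(round(mish(0.0), 4))  # 0.0
```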
```lua
(x: Tensor) -> Tensor
```

### SeLU
Scaled Exponential Linear Unit.
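SeLU is λ·ELU(x, α). The canonical self-normalizing constants (α ≈ 1.6733, λ ≈ 1.0507) are assumed as defaults in this sketch; the signature below shows both are overridable, but Gradien's actual defaults are not documented here:

```python
import math

def selu(x, alpha=1.6732632423543772, lam=1.0507009873554805):
    # lambda-scaled ELU with the canonical constants (assumed defaults)
    return lam * (x if x > 0 else alpha * (math.exp(x) - 1.0))

print(round(selu(1.0), 4))  # 1.0507
```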
```lua
(x: Tensor, alpha: number?, lambda: number?) -> Tensor
```