Data Pipeline Utils

Tools for loading, scaling, and encoding data. Located in Gradien.Preprocess.

Data Loading

DataLoader

Creates an iterator that yields batches of data from a dataset.

lua
(
    dataset: Dataset,
    batchSize: number,
    shuffle: boolean,
    generator: { randint: (a: number, b: number) -> number }?,
    drop_last: boolean?
) -> Iterator
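
To illustrate what `shuffle`, `generator`, and `drop_last` typically control, here is a minimal standalone sketch in plain Lua. It is not Gradien's implementation: `makeBatches` is a hypothetical name, the dataset is assumed to be a plain array, and falling back to `math.random` when no generator is given is an assumption.

```lua
-- Hypothetical stand-in for DataLoader: batches a plain array of samples.
-- `makeBatches` is an illustrative name, not part of the Gradien API.
local function makeBatches(samples, batchSize, shuffle, generator, dropLast)
    -- Shuffle a list of indices so the caller's data is never reordered.
    local order = {}
    for i = 1, #samples do
        order[i] = i
    end

    if shuffle then
        -- Fisher-Yates shuffle; randint(a, b) must return an integer in [a, b].
        -- Assumption: math.random is used when no generator is supplied.
        local randint = generator and generator.randint or math.random
        for i = #order, 2, -1 do
            local j = randint(1, i)
            order[i], order[j] = order[j], order[i]
        end
    end

    local pos = 1
    return function()
        local remaining = #order - pos + 1
        if remaining <= 0 then
            return nil -- dataset exhausted
        end
        if dropLast and remaining < batchSize then
            return nil -- discard the final, incomplete batch
        end
        local batch = {}
        for k = pos, math.min(pos + batchSize - 1, #order) do
            batch[#batch + 1] = samples[order[k]]
        end
        pos = pos + batchSize
        return batch
    end
end

-- Usage: 5 samples, batch size 2, no shuffling, keep the short last batch.
for batch in makeBatches({ 10, 20, 30, 40, 50 }, 2, false) do
    print(table.concat(batch, ", "))
end
```

Passing a seeded `generator` makes the shuffle order reproducible between runs, which is the usual reason for exposing that parameter.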

Scaling & Normalization

StandardScaler Parallel

Standardizes features by removing the mean and scaling to unit variance.

Formula: z = (x - mean) / std

lua
() -> StandardScaler
lua
scaler:fit(X)                            -- X: Tensor
local transformed = scaler:transform(X)  -- X: Tensor
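
The formula z = (x - mean) / std can be worked through on a single feature with plain Lua arrays. This is a sketch rather than Gradien's code: `fitStandard` and `transformStandard` are hypothetical helpers, and dividing by n (population standard deviation) is an assumption, since the source does not say whether n or n - 1 is used.

```lua
-- Standalone sketch of z = (x - mean) / std for one feature.
-- Assumption: population standard deviation (divide by n).
local function fitStandard(values)
    local n = #values
    local sum = 0
    for i = 1, n do
        sum = sum + values[i]
    end
    local mean = sum / n

    local squared = 0
    for i = 1, n do
        local d = values[i] - mean
        squared = squared + d * d
    end
    return { mean = mean, std = math.sqrt(squared / n) }
end

local function transformStandard(stats, values)
    local out = {}
    for i = 1, #values do
        out[i] = (values[i] - stats.mean) / stats.std
    end
    return out
end

-- Fit on one array, then reuse the learned statistics on another.
local stats = fitStandard({ 2, 4, 4, 4, 5, 5, 7, 9 }) -- mean 5, std 2
local z = transformStandard(stats, { 2, 5, 9 })
print(z[1], z[2], z[3])
```

A constant feature has a standard deviation of zero, so a production scaler needs a guard against division by zero; this sketch omits one.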

MinMaxScaler Parallel

Scales features to a specific range (default 0 to 1).

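Formula (default range): x_scaled = (x - min) / (max - min), where min and max are the minimum and maximum of the fitted data, not the constructor arguments.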

lua
(min: number?, max: number?) -> MinMaxScaler
lua
scaler:fit(X)                            -- X: Tensor
local transformed = scaler:transform(X)  -- X: Tensor
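
The sketch below applies the formula to one feature and then maps the result into a custom target range. It is illustrative only: `fitMinMax` and `transformMinMax` are hypothetical helpers, and the mapping `rangeMin + unit * (rangeMax - rangeMin)` is the conventional one, assumed here rather than confirmed for Gradien.

```lua
-- Standalone sketch of min-max scaling for one feature.
local function fitMinMax(values)
    local lo, hi = values[1], values[1]
    for i = 2, #values do
        if values[i] < lo then lo = values[i] end
        if values[i] > hi then hi = values[i] end
    end
    return { dataMin = lo, dataMax = hi }
end

-- rangeMin/rangeMax play the role of the constructor's min/max (default 0 to 1).
local function transformMinMax(stats, values, rangeMin, rangeMax)
    rangeMin = rangeMin or 0
    rangeMax = rangeMax or 1
    local span = stats.dataMax - stats.dataMin
    local out = {}
    for i = 1, #values do
        -- unit lies in [0, 1] for values seen during fitting
        local unit = (values[i] - stats.dataMin) / span
        out[i] = rangeMin + unit * (rangeMax - rangeMin)
    end
    return out
end

local mm = fitMinMax({ 10, 20, 30, 50 })
local scaled = transformMinMax(mm, { 10, 30, 50 })          -- 0, 0.5, 1
local centered = transformMinMax(mm, { 10, 30, 50 }, -1, 1) -- -1, 0, 1
print(scaled[2], centered[2])
```

Values outside the fitted minimum and maximum land outside the target range, so data seen only at inference time may need clamping.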

RunningNorm Stream

Maintains a running mean and variance for a stream of scalars. Useful in reinforcement learning, where the full dataset isn't available upfront.

lua
(eps: number?) -> RunningNorm
lua
norm:update(x: number)      -- Updates stats with new value
norm:normalize(x: number)   -- Returns (x - mean) / std
norm:var()                  -- Current variance
norm:std()                  -- Current standard deviation
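
One common way to keep such statistics without storing the stream is Welford's online algorithm, sketched below in plain Lua. Whether Gradien uses Welford's method is not stated in the source. The placement of `eps` in the denominator and the use of population variance are also assumptions, and `newRunningNorm` is a hypothetical constructor name.

```lua
-- Standalone sketch of a running normalizer using Welford's algorithm.
local RunningNorm = {}
RunningNorm.__index = RunningNorm

local function newRunningNorm(eps)
    -- n: samples seen, mean: running mean, m2: sum of squared deviations
    return setmetatable({ n = 0, mean = 0, m2 = 0, eps = eps or 1e-8 }, RunningNorm)
end

function RunningNorm:update(x)
    self.n = self.n + 1
    local delta = x - self.mean
    self.mean = self.mean + delta / self.n
    -- Uses the deviation from the old mean times the deviation from the new mean.
    self.m2 = self.m2 + delta * (x - self.mean)
end

function RunningNorm:var()
    if self.n == 0 then
        return 0
    end
    return self.m2 / self.n -- assumption: population variance
end

function RunningNorm:std()
    return math.sqrt(self:var())
end

function RunningNorm:normalize(x)
    -- Assumption: eps is added to std to avoid division by zero.
    return (x - self.mean) / (self:std() + self.eps)
end

-- Feed values one at a time, as they would arrive from an environment.
local norm = newRunningNorm()
for _, reward in ipairs({ 2, 4, 4, 4, 5, 5, 7, 9 }) do
    norm:update(reward)
end
print(norm.mean, norm:std(), norm:normalize(9))
```

Each update costs constant time and memory, which is what makes this approach suitable for unbounded streams.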

Encoders & Transformers

OneHot Parallel

Creates a function that converts a list of class indices into a one-hot encoded batch.

lua
(numClasses: number) -> (indices: {number}) -> Tensor
lua
local encoder = Gradien.Preprocess.OneHot(10)
-- Batch of 3 samples with classes 1, 5, and 9
local batch = encoder({1, 5, 9})
-- Result: Tensor of shape {10, 3}, i.e. {numClasses, batchSize}
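
The encoding itself is simple enough to sketch with nested Lua tables in place of a Tensor. This is an illustration, not Gradien's code: `oneHot` is a hypothetical helper, and 1-based class indices are assumed, following Lua convention and the example above.

```lua
-- Standalone sketch of one-hot encoding with a {numClasses, batch} layout.
local function oneHot(numClasses)
    return function(indices)
        local rows = {}
        for class = 1, numClasses do
            rows[class] = {}
            for sample = 1, #indices do
                -- 1 where the sample's class matches this row, 0 elsewhere
                rows[class][sample] = (indices[sample] == class) and 1 or 0
            end
        end
        return rows
    end
end

local encode = oneHot(10)
local encoded = encode({ 1, 5, 9 })
-- encoded[class][sample]: encoded[5][2] is 1 because sample 2 has class 5.
print(encoded[5][2], encoded[5][1])
```

Each column holds exactly one 1, so summing down any column gives 1.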

PCA (Principal Component Analysis) Parallel

Performs dimensionality reduction by projecting data onto its principal components.

lua
(X: Tensor, K: number, iters: number?) -> PCA_Model
lua
-- Fit on X, keeping the top K principal components
local pca = Gradien.Preprocess.PCA(X, K)
-- Projects new data X_new onto the K principal components found during init
local reduced = pca:transform(X_new)
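
The optional `iters` argument suggests an iterative solver. The sketch below uses power iteration to find the first principal component, which is an assumption about the method and not something the source confirms. `firstComponent` and `project` are hypothetical helpers, samples are stored as rows of a plain table, and only K = 1 is covered.

```lua
-- Standalone sketch: first principal component via power iteration.
-- Assumption: Gradien's actual algorithm and data layout may differ.
local function firstComponent(points, iters)
    local n, d = #points, #points[1]

    -- 1. Per-feature means, used to center the data.
    local mean = {}
    for j = 1, d do
        local s = 0
        for i = 1, n do
            s = s + points[i][j]
        end
        mean[j] = s / n
    end

    -- 2. Covariance matrix of the centered data.
    local cov = {}
    for a = 1, d do
        cov[a] = {}
        for b = 1, d do
            local s = 0
            for i = 1, n do
                s = s + (points[i][a] - mean[a]) * (points[i][b] - mean[b])
            end
            cov[a][b] = s / n
        end
    end

    -- 3. Power iteration: apply the covariance matrix, then renormalize.
    local v = {}
    for j = 1, d do
        v[j] = 1 / math.sqrt(d)
    end
    for _ = 1, iters or 50 do
        local w, norm = {}, 0
        for a = 1, d do
            local s = 0
            for b = 1, d do
                s = s + cov[a][b] * v[b]
            end
            w[a] = s
            norm = norm + s * s
        end
        norm = math.sqrt(norm)
        for a = 1, d do
            v[a] = w[a] / norm
        end
    end
    return v, mean
end

-- Project one sample onto the component after centering it.
local function project(point, component, mean)
    local s = 0
    for j = 1, #point do
        s = s + (point[j] - mean[j]) * component[j]
    end
    return s
end

-- Points on the line y = 2x: the component should point along (1, 2).
local pts = { { 1, 2 }, { 2, 4 }, { 3, 6 }, { 4, 8 } }
local pc, mu = firstComponent(pts, 50)
print(pc[1], pc[2], project({ 5, 10 }, pc, mu))
```

The sketch does not handle degenerate inputs such as data with zero variance, where the normalization step would divide by zero.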

SinusoidalPE Parallel

Adds sinusoidal positional embeddings to a sequence tensor. Transformer models need these because self-attention on its own does not distinguish token order.

lua
(x: Tensor, sequenceLength: number) -> Tensor
lua
-- x: {EmbeddingDim, Batch * SeqLen}
local output = Gradien.Preprocess.SinusoidalPE(x, 128)
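
For reference, the standard sinusoidal scheme from the original Transformer paper assigns sin(pos / 10000^(2i/d)) to even dimensions and cos of the same angle to odd ones. The sketch below builds that table in plain Lua. `positionalEncoding` is a hypothetical helper, positions are stored as rows, which is not the tensor layout shown above, and Gradien's exact frequency base and indexing are assumed rather than confirmed.

```lua
-- Standalone sketch of the standard sinusoidal positional encoding table.
-- pe[pos][k] covers positions 0..seqLen-1 (stored at Lua indices 1..seqLen).
local function positionalEncoding(seqLen, dim)
    local pe = {}
    for pos = 1, seqLen do
        pe[pos] = {}
        for i = 0, dim - 1 do
            -- Dimensions 2k and 2k+1 share one frequency.
            local pairIndex = math.floor(i / 2)
            local angle = (pos - 1) / (10000 ^ (2 * pairIndex / dim))
            if i % 2 == 0 then
                pe[pos][i + 1] = math.sin(angle)
            else
                pe[pos][i + 1] = math.cos(angle)
            end
        end
    end
    return pe
end

-- Add the encoding to one token embedding at a given (1-based) position.
local function addPosition(embedding, pe, pos)
    local out = {}
    for k = 1, #embedding do
        out[k] = embedding[k] + pe[pos][k]
    end
    return out
end

local pe = positionalEncoding(4, 8)
-- The first position alternates sin(0) = 0 and cos(0) = 1.
local shifted = addPosition({ 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 }, pe, 1)
print(shifted[1], shifted[2])
```

Because the table depends only on the sequence length and embedding size, it can be computed once and reused across batches.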