Data Pipeline Utils
Tools for loading, scaling, and encoding data. Located in Gradien.Preprocess.
Data Loading
DataLoader
Creates an iterator that yields batches of data from a dataset.
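The loader optionally shuffles, then hands out consecutive slices of batchSize. When drop_last is set, it drops a short final batch. Below is a minimal sketch of that behaviour in plain Lua, which is also valid Luau.

It works on index lists rather than a Dataset. `makeBatches` is a hypothetical helper. It assumes `generator.randint(a, b)` is inclusive on both ends and falls back to `math.random`, which may not match Gradien's internals.

```lua
-- Hypothetical sketch of DataLoader-style batching over dataset indices (not Gradien's code).
-- Assumes generator.randint(a, b) is inclusive on both ends, like math.random(a, b).
local function makeBatches(n, batchSize, shuffle, generator, dropLast)
	local order = {}
	for i = 1, n do
		order[i] = i
	end
	if shuffle then
		local randint = (generator and generator.randint) or math.random
		-- Fisher-Yates shuffle: swap slot i with a random slot in 1..i
		for i = n, 2, -1 do
			local j = randint(1, i)
			order[i], order[j] = order[j], order[i]
		end
	end
	local start = 1
	-- Each call returns the next batch of indices, or nil when exhausted
	return function()
		if start > n then
			return nil
		end
		local stop = math.min(start + batchSize - 1, n)
		if dropLast and stop - start + 1 < batchSize then
			return nil -- skip the short trailing batch
		end
		local batch = {}
		for k = start, stop do
			batch[#batch + 1] = order[k]
		end
		start = stop + 1
		return batch
	end
end

-- 10 samples in batches of 4: sizes are 4, 4, 2 (or 4, 4 with dropLast = true)
local sizes = {}
for batch in makeBatches(10, 4, false, nil, false) do
	sizes[#sizes + 1] = #batch
end
```

Shuffling a permutation of indices, rather than the data itself, keeps each pass complete: every sample appears exactly once, and drop_last only affects the final partial batch.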
(
dataset: Dataset,
batchSize: number,
shuffle: boolean,
generator: { randint: (a: number, b: number) -> number }?,
drop_last: boolean?
) -> Iterator
Scaling & Normalization
StandardScaler Parallel
Standardizes features by removing the mean and scaling to unit variance.
Formula: z = (x - mean) / std
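Below is a minimal sketch of that formula on a single feature column, stored as a plain Lua array rather than a Gradien Tensor. `fitStandard` and `transformStandard` are hypothetical names. Two choices are assumptions, not documented behaviour:

- It uses the population standard deviation (dividing by n).
- It substitutes 1 for a zero std.

```lua
-- Hypothetical sketch: z = (x - mean) / std for one feature column (plain Lua array).
-- Population std (divide by n) and std = 1 for a constant column are assumptions.
local function fitStandard(xs)
	local n = #xs
	local mean = 0
	for _, x in ipairs(xs) do
		mean = mean + x
	end
	mean = mean / n
	local sumSq = 0
	for _, x in ipairs(xs) do
		sumSq = sumSq + (x - mean) ^ 2
	end
	local std = math.sqrt(sumSq / n)
	if std == 0 then
		std = 1 -- avoid dividing by zero for a constant feature
	end
	return { mean = mean, std = std }
end

local function transformStandard(stats, xs)
	local out = {}
	for i, x in ipairs(xs) do
		out[i] = (x - stats.mean) / stats.std
	end
	return out
end

local stats = fitStandard({ 2, 4, 4, 4, 5, 5, 7, 9 }) -- mean 5, std 2
local z = transformStandard(stats, { 2, 9 })          -- { -1.5, 2 }
```

Keeping fit and transform separate mirrors the scaler API: the statistics are learned once and then reused on new data.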
() -> StandardScaler
local scaler = Gradien.Preprocess.StandardScaler()
scaler:fit(X) -- X: Tensor; learns the mean and std
local transformed = scaler:transform(X)
MinMaxScaler Parallel
Scales features to a specific range (default 0 to 1).
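Formula: x_scaled = (x - min) / (max - min), where min and max are the feature's minimum and maximum learned by fit, not the constructor's min and max arguments, which set the target range. As written, the formula gives the default 0-to-1 range.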
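Below is the same idea as a sketch for one feature column in plain Lua. `fitMinMax` and `transformMinMax` are hypothetical names. Two choices are assumptions about cases the formula does not cover:

- The linear mapping into an optional target range [lo, hi].
- Returning 0 for a constant feature.

```lua
-- Hypothetical sketch of min-max scaling for one feature column (not Gradien's code).
-- (x - dataMin) / (dataMax - dataMin) lands in [0, 1]; the linear map into an
-- optional [lo, hi] and the 0 for a constant feature are assumptions.
local function fitMinMax(xs)
	local mn, mx = math.huge, -math.huge
	for _, x in ipairs(xs) do
		if x < mn then mn = x end
		if x > mx then mx = x end
	end
	return { min = mn, max = mx }
end

local function transformMinMax(stats, xs, lo, hi)
	lo, hi = lo or 0, hi or 1
	local span = stats.max - stats.min
	local out = {}
	for i, x in ipairs(xs) do
		local unit = span ~= 0 and (x - stats.min) / span or 0
		out[i] = lo + unit * (hi - lo)
	end
	return out
end

local stats = fitMinMax({ 10, 20, 30 })
local unitRange = transformMinMax(stats, { 10, 25, 30 })          -- { 0, 0.75, 1 }
local signedRange = transformMinMax(stats, { 10, 25, 30 }, -1, 1) -- { -1, 0.5, 1 }
```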
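(min: number?, max: number?) -> MinMaxScaler
local scaler = Gradien.Preprocess.MinMaxScaler() -- default range 0 to 1
scaler:fit(X) -- X: Tensor; learns the feature min and max
local transformed = scaler:transform(X)
RunningNorm Stream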
Maintains a running mean and variance for scalar streams. Useful in reinforcement learning, where the full dataset isn't available upfront.
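One common way to keep such statistics is Welford's online algorithm. It updates the mean and a running sum of squared deviations one value at a time, without storing the stream. Gradien does not document its update rule, so the sketch below is only one possibility. `RunningNormSketch` is a hypothetical class, and two choices are assumptions:

- `var` returns the population variance.
- `eps` is added inside the square root.

```lua
-- Hypothetical sketch of a running normalizer using Welford's online algorithm.
-- Not Gradien's code: population variance and eps inside the sqrt are assumptions.
local RunningNormSketch = {}
RunningNormSketch.__index = RunningNormSketch

function RunningNormSketch.new(eps)
	return setmetatable({ n = 0, mean = 0, m2 = 0, eps = eps or 1e-8 }, RunningNormSketch)
end

-- Fold one value into the count, mean, and sum of squared deviations (m2)
function RunningNormSketch:update(x)
	self.n = self.n + 1
	local delta = x - self.mean
	self.mean = self.mean + delta / self.n
	self.m2 = self.m2 + delta * (x - self.mean)
end

function RunningNormSketch:var()
	if self.n == 0 then
		return 0
	end
	return self.m2 / self.n
end

function RunningNormSketch:std()
	return math.sqrt(self:var() + self.eps)
end

function RunningNormSketch:normalize(x)
	return (x - self.mean) / self:std()
end

local norm = RunningNormSketch.new()
for _, x in ipairs({ 2, 4, 4, 4, 5, 5, 7, 9 }) do
	norm:update(x)
end
-- norm.mean is 5 and norm:var() is 4, so norm:normalize(9) is close to 2
```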
(eps: number?) -> RunningNorm
norm:update(x: number) -- Updates stats with new value
norm:normalize(x: number) -- Returns (x - mean) / std
norm:var() -- Current variance
norm:std() -- Current standard deviation
Encoders & Transformers
OneHot Parallel
Creates a function that converts a list of class indices into a one-hot encoded batch.
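Below is a plain-Lua sketch of the same closure pattern. The outer call fixes numClasses and returns an encoder for batches of 1-based class indices. The result is a nested table with one row per class and one column per sample. This matches the {numClasses, batch} orientation of OneHot's Tensor result. `oneHotSketch` is a hypothetical name, and the range check is an added assumption.

```lua
-- Hypothetical sketch: one-hot encode 1-based class indices into a
-- numClasses x batch table (rows = classes, columns = samples).
local function oneHotSketch(numClasses)
	return function(indices)
		local m = {}
		for c = 1, numClasses do
			local row = {}
			for s = 1, #indices do
				row[s] = 0
			end
			m[c] = row
		end
		-- Set a single 1 in each sample's column, at its class row
		for s, class in ipairs(indices) do
			assert(class >= 1 and class <= numClasses, "class index out of range")
			m[class][s] = 1
		end
		return m
	end
end

local encode = oneHotSketch(10)
local m = encode({ 1, 5, 9 }) -- 10 rows, 3 columns; m[5][2] == 1
```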
(numClasses: number) -> (indices: {number}) -> Tensor
local encoder = Gradien.Preprocess.OneHot(10)
-- Batch of 3 samples with classes 1, 5, and 9
local batch = encoder({1, 5, 9})
-- Result: Tensor of shape {10, 3}
PCA (Principal Component Analysis) Parallel
Performs dimensionality reduction by projecting data onto its principal components.
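The optional iters argument suggests an iterative solver. One such approach is power iteration with deflation on the covariance matrix, sketched below in plain Lua; whether Gradien uses this method is an assumption. `pcaSketch` is hypothetical, and X is a list of samples (rows) rather than a Tensor. The fixed starting vector can miss a component that happens to be orthogonal to it.

```lua
-- Hypothetical sketch of PCA via power iteration with deflation (not Gradien's code).
-- X is a plain Lua array of samples, each an array of d feature values.
local function pcaSketch(X, K, iters)
	iters = iters or 100
	local n, d = #X, #X[1]
	-- Per-feature means, used to center the data
	local mean = {}
	for j = 1, d do
		local s = 0
		for i = 1, n do s = s + X[i][j] end
		mean[j] = s / n
	end
	-- Covariance matrix of the centered data (divides by n)
	local C = {}
	for a = 1, d do
		C[a] = {}
		for b = 1, d do
			local s = 0
			for i = 1, n do
				s = s + (X[i][a] - mean[a]) * (X[i][b] - mean[b])
			end
			C[a][b] = s / n
		end
	end
	local function matVec(v)
		local out = {}
		for a = 1, d do
			local s = 0
			for b = 1, d do s = s + C[a][b] * v[b] end
			out[a] = s
		end
		return out
	end
	local components = {}
	for k = 1, K do
		local v = {}
		for j = 1, d do v[j] = 1 / math.sqrt(d) end -- fixed starting vector
		for _ = 1, iters do
			local w = matVec(v)
			local norm = 0
			for j = 1, d do norm = norm + w[j] * w[j] end
			norm = math.sqrt(norm)
			if norm < 1e-12 then break end -- no variance left in this direction
			for j = 1, d do v[j] = w[j] / norm end
		end
		-- Rayleigh quotient gives the eigenvalue; deflate C by lambda * v v^T
		local Cv = matVec(v)
		local lambda = 0
		for j = 1, d do lambda = lambda + v[j] * Cv[j] end
		for a = 1, d do
			for b = 1, d do C[a][b] = C[a][b] - lambda * v[a] * v[b] end
		end
		components[k] = v
	end
	return {
		components = components,
		-- Center each new sample, then take its dot product with every component
		transform = function(self, Xnew)
			local out = {}
			for i, row in ipairs(Xnew) do
				out[i] = {}
				for k, v in ipairs(self.components) do
					local s = 0
					for j = 1, d do s = s + (row[j] - mean[j]) * v[j] end
					out[i][k] = s
				end
			end
			return out
		end,
	}
end

local pca = pcaSketch({ { 1, 2 }, { 2, 4 }, { 3, 6 } }, 1)
local reduced = pca:transform({ { 4, 8 } }) -- reduced[1][1] is about 2 * sqrt(5)
```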
(X: Tensor, K: number, iters: number?) -> PCA_Model
local pca = Gradien.Preprocess.PCA(X, K) -- fits on X at construction
-- Projects new data X_new onto the K principal components found during init
local reduced = pca:transform(X_new)
SinusoidalPE Parallel
Adds sinusoidal positional embeddings to a sequence tensor. Transformer models rely on them to represent token order, since self-attention on its own does not distinguish positions.
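Below is a minimal sketch of the standard sinusoidal encoding from the original Transformer paper, added to a plain Lua matrix:

PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d))

`sinusoidalPESketch` is hypothetical. Three details may differ from Gradien's implementation:

- The 10000 base.
- The sin/cos interleaving.
- The assumption that the Batch * SeqLen columns are grouped one sequence after another.

```lua
-- Hypothetical sketch of adding sinusoidal positional encodings (not Gradien's code).
-- x[dim][col]: rows = embedding dims, cols = batch * seqLen.
-- Assumes columns are grouped sequence by sequence, so position = (col - 1) % seqLen.
local function sinusoidalPESketch(x, seqLen)
	local d = #x
	local cols = #x[1]
	local out = {}
	for j = 1, d do
		out[j] = {}
		local i = math.floor((j - 1) / 2) -- index of the sin/cos pair
		local freq = 1 / (10000 ^ (2 * i / d))
		for c = 1, cols do
			local pos = (c - 1) % seqLen -- 0-based position within its sequence
			local angle = pos * freq
			-- even (0-based) dimensions get sin, odd ones get cos
			local pe = ((j - 1) % 2 == 0) and math.sin(angle) or math.cos(angle)
			out[j][c] = x[j][c] + pe
		end
	end
	return out
end

-- 4 embedding dims, 2 sequences of length 3, all zeros before encoding
local zeros = {}
for j = 1, 4 do
	zeros[j] = {}
	for c = 1, 6 do zeros[j][c] = 0 end
end
local encoded = sinusoidalPESketch(zeros, 3)
```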
(x: Tensor, sequenceLength: number) -> Tensor
-- x: {EmbeddingDim, Batch * SeqLen}
local output = Gradien.Preprocess.SinusoidalPE(x, 128)