Glossary

Terms and concepts used throughout RustyStats documentation.


A

AIC (Akaike Information Criterion)

A measure of model quality balancing fit and complexity: AIC = -2×loglik + 2p, where p is the number of estimated parameters. Lower is better.

Alpha (α)

Regularization strength parameter. A higher α applies a stronger penalty and shrinks coefficients more.


B

BIC (Bayesian Information Criterion)

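Similar to AIC but with a stronger penalty for complexity: BIC = -2×loglik + p×log(n), where p is the number of estimated parameters and n is the number of observations. Lower is better.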
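
The two criteria can be compared side by side with a minimal sketch. The function names and the example numbers below are hypothetical and are not part of the RustyStats API.

```python
import math


def aic(loglik: float, p: int) -> float:
    """AIC = -2*loglik + 2*p."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik: float, p: int, n: int) -> float:
    """BIC = -2*loglik + p*log(n)."""
    return -2.0 * loglik + p * math.log(n)


# Made-up fit: log-likelihood -120.5, 4 parameters, 200 observations.
loglik, p, n = -120.5, 4, 200
aic_value = aic(loglik, p)
bic_value = bic(loglik, p, n)
```

Here log(200) is larger than 2, so BIC charges more per parameter than AIC does.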

B-spline

Piecewise polynomial basis functions defined by knots. See Splines.

Binomial Family

Distribution for binary (0/1) or proportion data. Variance: V(μ) = μ(1-μ).


C

Canonical Link

The "natural" link function for a family, derived from its exponential family form. Examples: identity for Gaussian, log for Poisson, logit for Binomial.

Coordinate Descent

Optimization algorithm that updates one coefficient at a time while holding the others fixed. Used for regularized GLMs.

Cross-Validation (CV)

Technique for model selection and evaluation: the model is repeatedly trained on a subset of the data and scored on the held-out remainder.
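
As an illustration, k-fold cross-validation splits the row indices into k folds and holds each fold out once. The helper below is a hypothetical sketch (no shuffling, interleaved folds) and does not reflect how RustyStats builds its folds.

```python
def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k interleaved folds over n rows."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        # Training rows are all rows outside the held-out fold.
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


splits = list(k_fold_indices(n=6, k=3))
```

Each row appears in exactly one test fold, so every observation is used for validation once.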


D

Design Matrix

The matrix X containing predictor values. Each row is an observation, each column a feature.

Deviance

A measure of model fit: D = 2×[ℓ(saturated) - ℓ(fitted)]. Lower is better.
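
For a concrete case, take the Poisson family: the saturated model sets μ = y, which gives the per-observation term 2×[y×log(y/μ) - (y - μ)], with y×log(y/μ) taken as 0 when y = 0. The sketch below uses hypothetical function names and made-up data.

```python
import math


def poisson_unit_deviance(y: float, mu: float) -> float:
    """Per-observation Poisson deviance; the log term is 0 when y == 0."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mu))


def poisson_deviance(ys, mus) -> float:
    """Total deviance: the sum of the unit deviances."""
    return sum(poisson_unit_deviance(y, mu) for y, mu in zip(ys, mus))


y = [0, 1, 3, 2]
mu_model = [0.5, 1.2, 2.6, 1.9]   # made-up fitted means
mu_null = [1.5] * 4               # intercept-only fit: the mean of y

deviance_model = poisson_deviance(y, mu_model)
deviance_null = poisson_deviance(y, mu_null)
```

The fitted means that track y more closely give the smaller deviance; the second value is the deviance of the intercept-only model.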

Dispersion (φ)

Scale parameter relating variance to the variance function: Var(Y) = φ×V(μ).

Dummy Coding

Representing a categorical variable with k levels using k-1 binary columns.
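
A minimal sketch of the idea, assuming the reference level defaults to the first level in sorted order. The `dummy_code` helper is hypothetical and is not the RustyStats encoder.

```python
def dummy_code(values, reference=None):
    """Return (column_levels, rows) with one 0/1 column per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


columns, rows = dummy_code(["red", "green", "blue", "green"], reference="blue")
# columns is ["green", "red"]; the "blue" row is all zeros.
```

Three levels produce two columns; the reference level is the row of zeros.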


E

Elastic Net

Regularization combining L1 and L2 penalties: α×[ρ×‖β‖₁ + (1-ρ)/2×‖β‖₂²], where α is the regularization strength and ρ is the L1 Ratio.
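
The penalty term can be evaluated directly from this formula. The function name and the coefficient values below are made up for illustration.

```python
def elastic_net_penalty(beta, alpha: float, l1_ratio: float) -> float:
    """alpha * [l1_ratio * ||beta||_1 + (1 - l1_ratio) / 2 * ||beta||_2^2]."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2_squared)


beta = [1.0, -2.0, 0.0]
pure_l1 = elastic_net_penalty(beta, alpha=0.5, l1_ratio=1.0)   # 0.5 * 3
pure_l2 = elastic_net_penalty(beta, alpha=0.5, l1_ratio=0.0)   # 0.5 * 5 / 2
mixed = elastic_net_penalty(beta, alpha=0.5, l1_ratio=0.5)
```

Setting the ratio to 1.0 leaves only the L1 term and 0.0 leaves only the L2 term.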

Exponential Family

Class of probability distributions sharing a common functional form, including the Gaussian, Poisson, and Binomial.


F

Family

The distribution of the response variable in a GLM. See Families.

Fisher Scoring

Variant of Newton-Raphson that uses the expected Hessian (Fisher information) in place of the observed Hessian. Equivalent to IRLS for GLMs.

Fitted Values

Predicted means μ̂ = g⁻¹(Xβ̂).
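
A small sketch of this computation for a log link, where g⁻¹ is exp, including an optional offset in the linear predictor. The helper names and data are hypothetical.

```python
import math


def linear_predictor(X, beta, offset=None):
    """eta = X @ beta + offset, written out with plain lists."""
    if offset is None:
        offset = [0.0] * len(X)
    return [sum(x * b for x, b in zip(row, beta)) + o for row, o in zip(X, offset)]


def fitted_values(X, beta, inverse_link=math.exp, offset=None):
    """mu_hat = g^{-1}(eta); the default inverse link is exp (log link)."""
    return [inverse_link(eta) for eta in linear_predictor(X, beta, offset)]


X = [[1.0, 0.0], [1.0, 1.0]]            # intercept column plus one indicator
beta = [math.log(2.0), math.log(1.5)]   # made-up coefficients
mu_hat = fitted_values(X, beta)         # approximately [2.0, 3.0]
mu_hat_exposure = fitted_values(X, beta, offset=[math.log(10.0), math.log(20.0)])
```

With an offset of log(exposure), the fitted means scale in proportion to exposure: approximately 20 and 60 here.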


G

Gamma Family

Distribution for positive continuous data. Variance: V(μ) = μ².

Gaussian Family

Normal distribution. Variance: V(μ) = 1 (constant).

Gini Coefficient

Measure of model discrimination: Gini = 2×AUC - 1. Ranges from -1 to 1; 0 corresponds to random ranking and 1 to perfect ranking.
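
The relationship to AUC can be sketched with a pairwise AUC: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. This is an illustrative O(n²) helper with hypothetical names, not the RustyStats implementation.

```python
def auc(y_true, scores) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini(y_true, scores) -> float:
    """Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


y_true = [0, 0, 1, 1]
gini_partial = gini(y_true, [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ordered correctly
gini_perfect = gini(y_true, [0.1, 0.2, 0.7, 0.9])
```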

GLM (Generalized Linear Model)

Extension of linear regression allowing different response distributions and link functions.


H

HC0, HC1, HC2, HC3

Heteroscedasticity-consistent standard error estimators. See Robust SE.

Hosmer-Lemeshow Test

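Goodness-of-fit test for the calibration of binary-response models: compares observed and predicted event counts within groups of observations ordered by predicted probability.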


I

Identity Link

Link function g(μ) = μ. Used with the Gaussian family.

Intercept

The constant term β₀ in the linear predictor.

IRLS (Iteratively Reweighted Least Squares)

Algorithm for fitting GLMs by solving a sequence of weighted least squares problems. See IRLS.


L

L1 Ratio

Parameter controlling Elastic Net mix: 1.0 = Lasso, 0.0 = Ridge.

Lasso

L1-penalized regression that performs variable selection: min D(β) + α‖β‖₁.

Linear Predictor (η)

The linear combination η = Xβ + offset.

Link Function

Function g(μ) connecting the mean to the linear predictor: g(μ) = η. See Links.

Log Link

Link function g(μ) = log(μ). Ensures positive predictions.

Logit Link

Link function g(μ) = log(μ/(1-μ)). Maps (0,1) to ℝ.
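
The identity, log, and logit links can be written as pairs of a link g and its inverse g⁻¹. The `LINKS` table below is a hypothetical sketch for illustration, not the RustyStats API.

```python
import math

# Each entry maps a link name to (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def round_trip(name: str, mu: float) -> float:
    """Apply g, then g_inverse; should approximately return mu."""
    g, g_inverse = LINKS[name]
    return g_inverse(g(mu))


log_inverse_at_minus_3 = LINKS["log"][1](-3.0)      # positive for any eta
logit_inverse_at_minus_5 = LINKS["logit"][1](-5.0)  # inside (0, 1) for any eta
```

The inverse links show why the log link keeps predictions positive and why the logit link keeps them inside (0, 1).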

Log-Likelihood

The logarithm of the probability (or density) of the observed data given the model parameters.


M

Maximum Likelihood Estimation (MLE)

Finding parameters that maximize the likelihood of the data.


N

Natural Spline

Spline constrained to be linear beyond the boundary knots, which gives more stable extrapolation.

Negative Binomial

Distribution for overdispersed counts. Variance: V(μ) = μ + μ²/θ.

Newton-Raphson

Iterative optimization algorithm that uses the gradient and Hessian of the objective.

Null Deviance

Deviance of the intercept-only model.


O

Odds Ratio

exp(β) in logistic regression: the multiplicative change in the odds for a one-unit increase in the predictor.
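
A quick numerical check of this interpretation, using made-up coefficients and hypothetical helper names: the ratio of the odds at x + 1 to the odds at x equals exp(β) regardless of x.

```python
import math


def predict_probability(x: float, intercept: float, slope: float) -> float:
    """Inverse logit of the linear predictor intercept + slope * x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def odds(probability: float) -> float:
    return probability / (1.0 - probability)


intercept, slope = -1.0, 0.7   # made-up logistic regression coefficients
odds_ratio = odds(predict_probability(3.0, intercept, slope)) / odds(
    predict_probability(2.0, intercept, slope)
)
# odds_ratio matches math.exp(slope) up to floating-point error.
```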

Offset

Known component added to the linear predictor with its coefficient fixed at 1 rather than estimated. Example: log(exposure).

Overdispersion

When the observed variance exceeds the variance implied by the model, for example count data whose variance is larger than the Poisson mean.


P

P-value

Probability, under the null hypothesis, of observing a test statistic at least as extreme as the one computed.

Pearson Residual

Residual scaled by the square root of the variance function: (y - μ) / √V(μ).
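
A minimal sketch for the Poisson case, where V(μ) = μ. It also shows one common dispersion estimate, the Pearson chi-square divided by the residual degrees of freedom. The function names and data are hypothetical.

```python
import math


def pearson_residuals(y, mu, variance=lambda m: m):
    """(y - mu) / sqrt(V(mu)); the default V(mu) = mu is the Poisson variance."""
    return [(yi - mi) / math.sqrt(variance(mi)) for yi, mi in zip(y, mu)]


def pearson_dispersion(residuals, n_params: int) -> float:
    """Pearson chi-square over residual degrees of freedom (n - p)."""
    return sum(r * r for r in residuals) / (len(residuals) - n_params)


y = [3, 0, 5]
mu = [4.0, 1.0, 4.0]                     # made-up fitted means
residuals = pearson_residuals(y, mu)     # [-0.5, -1.0, 0.5]
phi_hat = pearson_dispersion(residuals, n_params=1)
```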

Poisson Family

Distribution for count data. Variance: V(μ) = μ.

Prior Weight

Known weight for each observation, used in likelihood.

Probit

Link function using the inverse standard normal CDF: g(μ) = Φ⁻¹(μ).

PyO3

Rust library for writing Python bindings, used to build native Python extension modules in Rust.


Q

QuasiPoisson / QuasiBinomial

Quasi-likelihood families that estimate dispersion from data.


R

Rayon

Rust library for data parallelism.

Reference Level

The omitted category in dummy coding (absorbed into intercept).

Regularization

Adding penalty to prevent overfitting. See Regularization.

Relativity

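exp(β) for log-link models: the multiplicative effect on the mean of a one-unit increase in the predictor, or of a categorical level relative to its reference level.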

Residual

Difference between observed and predicted values. Several types exist, such as response, Pearson, and deviance residuals.

Ridge

L2-penalized regression: min D(β) + (α/2)×‖β‖₂², which is the Elastic Net penalty with L1 Ratio 0.0.

Robust Standard Errors

Standard errors that remain valid when the model's variance assumption is misspecified, for example under heteroscedasticity. See HC0, HC1, HC2, HC3.


S

Sandwich Estimator

Robust covariance estimator: (X'WX)⁻¹ × Meat × (X'WX)⁻¹.

Score Function

Derivative of log-likelihood with respect to parameters.

Soft Thresholding

Operation used in Lasso: S(z,γ) = sign(z)×max(|z|-γ, 0).
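
The operator shrinks z toward zero by γ and returns exactly zero when |z| ≤ γ, which is how the Lasso sets coefficients to zero. A minimal sketch with a hypothetical function name:

```python
def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


shrunk_positive = soft_threshold(3.0, 1.0)    # 2.0
shrunk_negative = soft_threshold(-3.0, 1.0)   # -2.0
zeroed = soft_threshold(0.5, 1.0)             # 0.0
```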

Spline

Piecewise polynomial function. See Splines.

Standard Error (SE)

Estimated standard deviation of a parameter estimate.


T

Target Encoding

Replacing categories with target-based statistics. See Target Encoding.

Theta (θ)

Negative Binomial dispersion parameter. Larger θ = less overdispersion.

Tweedie

Family with variance V(μ) = μᵖ. Useful for insurance pure premiums.


U

Unit Deviance

Per-observation contribution to total deviance.


V

Variance Function

V(μ) defining how variance relates to mean: Var(Y) = φ×V(μ).
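
The variance functions listed in this glossary can be collected in one place. The dictionary and helper names below are hypothetical and only illustrate the formulas; they are not the RustyStats family objects.

```python
# V(mu) for the one-parameter families in this glossary.
VARIANCE_FUNCTIONS = {
    "gaussian": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "binomial": lambda mu: mu * (1.0 - mu),
    "gamma": lambda mu: mu ** 2,
}


def negative_binomial_variance(mu: float, theta: float) -> float:
    """V(mu) = mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


def tweedie_variance(mu: float, p: float) -> float:
    """V(mu) = mu^p."""
    return mu ** p


def response_variance(family: str, mu: float, phi: float = 1.0) -> float:
    """Var(Y) = phi * V(mu)."""
    return phi * VARIANCE_FUNCTIONS[family](mu)


gamma_variance = response_variance("gamma", mu=2.0, phi=0.5)   # 0.5 * 2^2
```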

Variance Power (p)

Parameter in Tweedie family: V(μ) = μᵖ.


W

Wald Test

Hypothesis test using (β̂/SE)² as the test statistic, compared against a χ² distribution with 1 degree of freedom.

Weights

Either prior observation weights or IRLS working weights, depending on context. See Prior Weight and Working Weights.

Working Response

Linearized response in IRLS: z = η + (y - μ)×g'(μ).

Working Weights

IRLS weights: W = 1/[V(μ)×g'(μ)²].
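
To make the working response and working weights concrete, the sketch below applies them to a Poisson model with a log link, where V(μ) = μ and g'(μ) = 1/μ, so W simplifies to μ. It fits an intercept-only model, for which each weighted least squares step reduces to a weighted mean of z. Function names and data are hypothetical; prior weights and dispersion are omitted.

```python
import math


def working_response_and_weights(y, eta):
    """Poisson + log link: z = eta + (y - mu) * g'(mu), W = 1 / (V(mu) * g'(mu)^2)."""
    z, w = [], []
    for yi, eta_i in zip(y, eta):
        mu = math.exp(eta_i)       # inverse log link
        g_prime = 1.0 / mu         # derivative of log(mu)
        variance = mu              # Poisson variance function
        z.append(eta_i + (yi - mu) * g_prime)
        w.append(1.0 / (variance * g_prime ** 2))
    return z, w


def fit_intercept_only(y, n_iter: int = 25) -> float:
    """IRLS for an intercept-only model: each step is a weighted mean of z."""
    beta0 = 0.0
    for _ in range(n_iter):
        z, w = working_response_and_weights(y, [beta0] * len(y))
        beta0 = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    return beta0


y = [0, 1, 3, 2]
beta0_hat = fit_intercept_only(y)   # converges to log(mean(y)) = log(1.5)
```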


Z

z-statistic

Test statistic: z = β̂ / SE(β̂). Used for hypothesis testing.
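
A short sketch of the calculation, assuming the statistic is compared against a standard normal distribution to get a two-sided p-value. The function names and numbers are made up for illustration.

```python
import math


def z_statistic(beta_hat: float, standard_error: float) -> float:
    """z = beta_hat / SE(beta_hat)."""
    return beta_hat / standard_error


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = z_statistic(beta_hat=0.49, standard_error=0.25)   # 1.96
p_value = two_sided_p_value(z)                        # close to 0.05
wald_statistic = z ** 2                               # the Wald test uses z squared
```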