RustyStats Documentation

High-performance Generalized Linear Models with a Rust backend and Python API

RustyStats is a statistical modeling library designed for actuarial and data science applications. It pairs a Rust computational core with a Python interface and provides a statsmodels-compatible API, which the project reports as 5-23x faster than statsmodels.

Why RustyStats?

Feature                    | RustyStats                  | Statsmodels
Parallel IRLS Solver       | ✅ Multi-threaded via Rayon | ❌ Single-threaded
Native Polars Support      | ✅ Formula API with Polars  | ❌ Pandas only
Built-in Lasso/Elastic Net | ✅ All GLM families         | ⚠️ Limited
Performance                | 5-23x faster                | Baseline
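
For readers new to GLMs, the IRLS (iteratively reweighted least squares) idea behind the solver can be sketched in a few lines of NumPy. This is a minimal single-threaded illustration for a Poisson model with a log link, not the RustyStats solver. The function name `poisson_irls` and its parameters are hypothetical, and the sketch assumes the offset is already on the linear-predictor (log) scale.

```python
import numpy as np


def poisson_irls(X, y, offset=None, max_iter=25, tol=1e-8):
    """Fit a Poisson GLM with log link by IRLS (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    offset = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ beta + offset             # linear predictor
        mu = np.exp(eta)                    # inverse of the log link
        z = (eta - offset) + (y - mu) / mu  # working response, offset removed
        w = mu                              # IRLS weights for Poisson + log link
        XtW = X.T * w                       # equivalent to X.T @ diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Intercept-only check: the fitted intercept should match log(mean(y)).
X = np.ones((3, 1))
y = np.array([1.0, 2.0, 3.0])
beta = poisson_irls(X, y)
print(beta, np.log(y.mean()))
```

Each iteration is a weighted least-squares solve, which is the step a multi-threaded implementation would typically parallelize.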

Quick Example

import rustystats as rs
import polars as pl

data = pl.read_parquet("insurance.parquet")

# Dict API (recommended for production)
result = rs.glm_dict(
    response="ClaimCount",
    terms={
        "VehPower": {"type": "linear"},
        "VehAge": {"type": "bs"},  # Penalized smooth
        "Area": {"type": "categorical"},
        "Region": {"type": "categorical"},
        "Brand": {"type": "target_encoding"},
    },
    data=data,
    family="poisson",
    offset="Exposure",
).fit()

print(result.summary())
print(result.relativities())  # exp(coef) for pricing
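
The `exp(coef)` comment above reflects a general property of log-link models: exponentiating a coefficient gives a multiplicative factor on the predicted mean. The following sketch shows that arithmetic with plain NumPy. The coefficient names and values are made up for illustration and are not RustyStats output.

```python
import numpy as np

# Hypothetical log-link coefficients, for illustration only.
coefs = {"Intercept": -2.0, "VehPower": 0.05, "Area[B]": 0.10}

# Under a log link, exp(coef) is the multiplicative effect on the predicted mean.
relativities = {name: float(np.exp(beta)) for name, beta in coefs.items()}

for name, rel in relativities.items():
    print(f"{name}: {rel:.4f}")
```

With these example values, a one-unit increase in `VehPower` multiplies the expected claim frequency by about 1.05, and the exponentiated intercept acts as the base rate.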

Documentation Structure

This documentation is organized for maintainers who may be new to Rust and/or GLMs:

  • API Reference
  • For Understanding the Math
  • For Understanding the Code
  • For Maintaining the Code

Key Features

Distribution Families

  • Gaussian - Continuous data (linear regression)
  • Poisson - Count data (claim frequency)
  • Binomial - Binary outcomes (logistic regression)
  • Gamma - Positive continuous (claim severity)
  • Tweedie - Mixed zeros and positives (pure premium)
  • QuasiPoisson/QuasiBinomial - Overdispersed data
  • Negative Binomial - Alternative for overdispersed counts
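
What mainly distinguishes these families in the fitting algorithm is the variance function V(mu), which describes how the variance scales with the mean. The sketch below lists the standard textbook forms. The function name `variance_function`, the family strings, and the `power` and `theta` parameters are hypothetical and do not describe the RustyStats API. Negative binomial parameterizations vary between libraries; this one assumes V(mu) = mu + mu^2 / theta.

```python
import numpy as np


def variance_function(family, mu, power=1.5, theta=1.0):
    """Textbook GLM variance functions V(mu) (illustrative, not the library API)."""
    mu = np.asarray(mu, dtype=float)
    if family == "gaussian":
        return np.ones_like(mu)        # constant variance
    if family in ("poisson", "quasipoisson"):
        return mu                      # variance proportional to the mean
    if family in ("binomial", "quasibinomial"):
        return mu * (1.0 - mu)         # mu is a probability in (0, 1)
    if family == "gamma":
        return mu**2                   # constant coefficient of variation
    if family == "tweedie":
        return mu**power               # 1 < power < 2 allows exact zeros
    if family == "negative_binomial":
        return mu + mu**2 / theta      # extra quadratic term for overdispersion
    raise ValueError(f"unknown family: {family}")


for fam in ("gaussian", "poisson", "gamma", "tweedie", "negative_binomial"):
    print(fam, variance_function(fam, [2.0]))
```

The quasi families share the variance function of their base family and differ only in estimating the dispersion parameter rather than fixing it at 1.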

Advanced Features

  • Regularization - Ridge, Lasso, Elastic Net with automatic CV-based alpha selection
  • Splines - B-splines and natural splines for non-linear effects
  • Target Encoding - Target and frequency encoding for high-cardinality categoricals
  • Coefficient Constraints - Monotonicity constraints on coefficients
  • Robust Standard Errors - HC0, HC1, HC2, HC3 sandwich estimators
  • Model Diagnostics - Calibration, discrimination, base model comparison
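
To make the HC0 to HC3 labels concrete, here is a minimal NumPy sketch of the sandwich estimators for ordinary least squares (the Gaussian, identity-link case). It uses the standard textbook definitions and is not the RustyStats implementation; the function name `sandwich_cov` and the simulated data are hypothetical.

```python
import numpy as np


def sandwich_cov(X, resid, kind="HC0"):
    """Heteroskedasticity-robust covariance for OLS coefficients (sketch)."""
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    # Leverage h_i = x_i' (X'X)^-1 x_i for each observation.
    h = np.einsum("ij,jk,ik->i", X, bread, X)
    e2 = resid**2
    if kind == "HC0":
        omega = e2
    elif kind == "HC1":
        omega = e2 * n / (n - p)       # degrees-of-freedom correction
    elif kind == "HC2":
        omega = e2 / (1.0 - h)         # leverage correction
    elif kind == "HC3":
        omega = e2 / (1.0 - h) ** 2    # stronger leverage correction
    else:
        raise ValueError(f"unknown estimator: {kind}")
    meat = (X.T * omega) @ X           # X' diag(omega) X
    return bread @ meat @ bread


# Simulated heteroskedastic data, for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

for kind in ("HC0", "HC1", "HC2", "HC3"):
    se = np.sqrt(np.diag(sandwich_cov(X, resid, kind)))
    print(kind, se)
```

The four variants differ only in how the squared residuals are rescaled before forming the middle term, so HC3 gives the most conservative standard errors when some observations have high leverage.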

Installation

# Development installation
cd rustystats
uv run maturin develop

# Run tests
uv run pytest tests/python/ -v

Project Structure

rustystats/
├── crates/
│   ├── rustystats-core/        # Pure Rust GLM library (no Python deps)
│   │   ├── Cargo.toml
│   │   └── src/
│   │       ├── lib.rs          # Crate entry, re-exports
│   │       ├── error.rs        # Error types
│   │       ├── families/       # Distribution families (Gaussian, Poisson, etc.)
│   │       ├── links/          # Link functions (Identity, Log, Logit)
│   │       ├── solvers/        # IRLS, coordinate descent
│   │       ├── inference/      # Standard errors, p-values, robust SEs
│   │       ├── diagnostics/    # Residuals, calibration, discrimination
│   │       ├── splines/        # B-splines, natural splines
│   │       ├── formula/        # Formula parsing
│   │       ├── design_matrix/  # Design matrix construction
│   │       ├── regularization/ # Lasso, Ridge, Elastic Net
│   │       ├── target_encoding/# Ordered target encoding
│   │       └── interactions/   # Interaction term handling
│   │
│   └── rustystats/             # Python bindings (PyO3)
│       ├── Cargo.toml
│       └── src/lib.rs          # PyO3 wrappers, NumPy conversion
├── python/rustystats/          # Python package
│   ├── __init__.py             # Public API exports
│   ├── glm.py                  # Summary formatting functions
│   ├── formula.py              # Formula API, glm()
│   ├── families.py             # Python family wrappers
│   ├── links.py                # Python link wrappers
│   ├── splines.py              # bs(), ns() functions
│   ├── diagnostics.py          # ModelDiagnostics, explore_data()
│   ├── interactions.py         # Interaction utilities
│   └── target_encoding.py      # TargetEncoder class
├── tests/
│   └── python/                 # Python integration tests
│       ├── test_glm.py
│       ├── test_families.py
│       └── ...
├── docs/                       # MkDocs documentation (you are here)
├── examples/                   # Jupyter notebook examples
├── Cargo.toml                  # Workspace configuration
├── pyproject.toml              # Python build config (maturin)
└── mkdocs.yml                  # Documentation config