RustyStats Documentation
High-performance Generalized Linear Models with a Rust backend and Python API
RustyStats is a statistical modeling library designed for actuarial and data science applications. It pairs a Rust computational core with a Python front end, providing a statsmodels-compatible API that runs 5-23x faster than statsmodels (see the comparison table below).
Why RustyStats?
| Feature | RustyStats | Statsmodels |
|---|---|---|
| Parallel IRLS Solver | ✅ Multi-threaded via Rayon | ❌ Single-threaded |
| Native Polars Support | ✅ Formula API with Polars | ❌ Pandas only |
| Built-in Lasso/Elastic Net | ✅ All GLM families | ⚠️ Limited |
| Performance | 5-23x faster | Baseline |
Quick Example
import rustystats as rs
import polars as pl

data = pl.read_parquet("insurance.parquet")

# Dict API (recommended for production)
result = rs.glm_dict(
    response="ClaimCount",
    terms={
        "VehPower": {"type": "linear"},
        "VehAge": {"type": "bs"},  # Penalized smooth
        "Area": {"type": "categorical"},
        "Region": {"type": "categorical"},
        "Brand": {"type": "target_encoding"},
    },
    data=data,
    family="poisson",
    offset="Exposure",
).fit()

print(result.summary())
print(result.relativities()) # exp(coef) for pricing
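The comment on `relativities()` describes its values as exp(coef). As a minimal sketch of that transformation, independent of RustyStats itself, the snippet below applies it to made-up coefficients. The coefficient names and values are hypothetical and are not output from any fitted model, and `to_relativities` is an illustrative helper, not a library function.

```python
import math

# Hypothetical log-link coefficients; illustrative values only,
# not output from RustyStats.
coefficients = {
    "Intercept": -2.0,
    "VehPower": 0.05,
    "Area[B]": 0.0,
}


def to_relativities(coefs):
    """Map each coefficient to exp(coef), its multiplicative effect on the mean."""
    return {name: math.exp(value) for name, value in coefs.items()}


relativities = to_relativities(coefficients)
for name, factor in relativities.items():
    print(f"{name}: {factor:.4f}")
```

Under a log link, a relativity of 1.0 leaves the predicted mean unchanged, while a value such as 1.05 scales it up by roughly 5% per unit of the predictor.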
Documentation Structure
This documentation is organized for maintainers who may be new to Rust, GLMs, or both. The sections below group pages by purpose:
API Reference
- Dict API - Primary API for programmatic model building
- Results Object - GLMResults and GLMModel methods
- Diagnostics API - Model diagnostics and exploration
- Model Serialization - Save and load fitted models
For Understanding the Math
- GLM Theory - Complete mathematical foundation
- Distribution Families - Variance functions and when to use each
- Link Functions - Connecting linear predictors to means
- IRLS Algorithm - How GLMs are actually fitted
For Understanding the Code
- Architecture Overview - How components connect
- Rust Core Library - The computational engine
- Python Bindings - PyO3 bridge layer
For Maintaining the Code
- Rust Best Practices - Code style and patterns
- Adding New Components - Extending the library
- Testing Strategy - Test organization and practices
Key Features
Distribution Families
- Gaussian - Continuous data (linear regression)
- Poisson - Count data (claim frequency)
- Binomial - Binary outcomes (logistic regression)
- Gamma - Positive continuous (claim severity)
- Tweedie - Mixed zeros and positives (pure premium)
- QuasiPoisson/QuasiBinomial - Overdispersed data
- Negative Binomial - Alternative for overdispersed counts
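The families listed above differ mainly in how the variance is assumed to scale with the mean. The sketch below collects the standard textbook variance functions V(mu) in one place. The string keys, the `power` argument (Tweedie), and the `alpha` argument (Negative Binomial in its NB2 form) are illustrative names chosen for this example and may not match how RustyStats names or parameterizes them.

```python
def variance(family, mu, power=1.5, alpha=1.0):
    """Return the textbook variance function V(mu) for the named family.

    `power` and `alpha` are illustrative parameter names, not RustyStats arguments.
    """
    if family == "gaussian":
        return 1.0                    # constant variance
    if family in ("poisson", "quasipoisson"):
        return mu                     # variance equals the mean
    if family in ("binomial", "quasibinomial"):
        return mu * (1.0 - mu)        # largest at mu = 0.5
    if family == "gamma":
        return mu ** 2                # constant coefficient of variation
    if family == "tweedie":
        return mu ** power            # 1 < power < 2 allows exact zeros
    if family == "negative_binomial":
        return mu + alpha * mu ** 2   # NB2: Poisson plus a quadratic term
    raise ValueError(f"unknown family: {family}")


for name in ("poisson", "gamma", "tweedie", "negative_binomial"):
    print(name, variance(name, 2.0))
```

The quasi-families share a variance function with their base family; the difference is that the dispersion is estimated from the data rather than fixed at 1.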
Advanced Features
- Regularization - Ridge, Lasso, Elastic Net with automatic CV-based alpha selection
- Splines - B-splines and natural splines for non-linear effects
- Target Encoding - Target and frequency encoding for high-cardinality categoricals
- Coefficient Constraints - Monotonicity constraints on coefficients
- Robust Standard Errors - HC0, HC1, HC2, HC3 sandwich estimators
- Model Diagnostics - Calibration, discrimination, base model comparison
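To make the regularization entry above concrete, here is a minimal sketch of an elastic net penalty and the soft-thresholding step commonly used by coordinate descent solvers for the L1 part. It assumes the parameterization used by scikit-learn's `ElasticNet` (`alpha` for overall strength, `l1_ratio` for the Lasso/Ridge mix). RustyStats may define these differently, and both function names are hypothetical.

```python
def soft_threshold(z, threshold):
    """Shrink z toward zero by `threshold`; values inside the band become 0."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty = alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2).

    l1_ratio = 1 gives the Lasso penalty, l1_ratio = 0 gives Ridge.
    Assumed parameterization; not confirmed against RustyStats.
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [1.0, -2.0]
print(elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0))  # pure Lasso
print(elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0))  # pure Ridge
print(soft_threshold(3.0, 1.0), soft_threshold(-0.5, 1.0))
```

Soft-thresholding is what lets the Lasso set coefficients exactly to zero: any update that falls within the threshold band is clipped to 0 rather than merely shrunk.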
Installation
# Development installation
cd rustystats
uv run maturin develop
# Run tests
uv run pytest tests/python/ -v
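After the development build finishes, one quick way to confirm that the package landed in the active environment is to check that Python can locate it. The sketch below uses only the standard library. `is_installed` is a hypothetical helper name, and the module name `rustystats` is taken from the import in the Quick Example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


print("rustystats importable:", is_installed("rustystats"))
```

Running this through the same `uv run` prefix as the commands above should check the environment that maturin built into.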
Project Structure
rustystats/
├── crates/
│   ├── rustystats-core/  # Pure Rust GLM library (no Python deps)
│   │   ├── Cargo.toml
│   │   └── src/
│   │       ├── lib.rs  # Crate entry, re-exports
│   │       ├── error.rs  # Error types
│   │       ├── families/  # Distribution families (Gaussian, Poisson, etc.)
│   │       ├── links/  # Link functions (Identity, Log, Logit)
│   │       ├── solvers/  # IRLS, coordinate descent
│   │       ├── inference/  # Standard errors, p-values, robust SEs
│   │       ├── diagnostics/  # Residuals, calibration, discrimination
│   │       ├── splines/  # B-splines, natural splines
│   │       ├── formula/  # Formula parsing
│   │       ├── design_matrix/  # Design matrix construction
│   │       ├── regularization/  # Lasso, Ridge, Elastic Net
│   │       ├── target_encoding/  # Ordered target encoding
│   │       └── interactions/  # Interaction term handling
│   │
│   └── rustystats/  # Python bindings (PyO3)
│       ├── Cargo.toml
│       └── src/lib.rs  # PyO3 wrappers, NumPy conversion
│
├── python/rustystats/  # Python package
│   ├── __init__.py  # Public API exports
│   ├── glm.py  # Summary formatting functions
│   ├── formula.py  # Formula API, glm()
│   ├── families.py  # Python family wrappers
│   ├── links.py  # Python link wrappers
│   ├── splines.py  # bs(), ns() functions
│   ├── diagnostics.py  # ModelDiagnostics, explore_data()
│   ├── interactions.py  # Interaction utilities
│   └── target_encoding.py  # TargetEncoder class
│
├── tests/
│   └── python/  # Python integration tests
│       ├── test_glm.py
│       ├── test_families.py
│       └── ...
│
├── docs/  # MkDocs documentation (you are here)
├── examples/  # Jupyter notebook examples
├── Cargo.toml  # Workspace configuration
├── pyproject.toml  # Python build config (maturin)
└── mkdocs.yml  # Documentation config