Architecture Overview
RustyStats is a hybrid Rust/Python library. This chapter explains how the components fit together and the design principles behind the architecture.
High-Level Architecture
┌───────────────────────────────────────────────────────────────┐
│                       Python User Code                        │
│          import rustystats as rs; rs.glm(...).fit()           │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                       Python API Layer                        │
│                    python/rustystats/*.py                     │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │   glm.py   │ │ formula.py │ │ splines.py │ │ diagnostics│  │
│  └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘  │
└────────┼──────────────┼──────────────┼──────────────┼─────────┘
         │              │              │              │
         └──────────────┴───────┬──────┴──────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      PyO3 Bindings Layer                      │
│                 crates/rustystats/src/lib.rs                  │
│     Converts Python objects ↔ Rust types using NumPy/PyO3     │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                       Rust Core Library                       │
│                  crates/rustystats-core/src/                  │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │  families  │ │   links    │ │  solvers   │ │ inference  │  │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │  splines   │ │  formula   │ │ design_mx  │ │ diagnostics│  │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │
└───────────────────────────────────────────────────────────────┘
Design Principles
1. Separation of Concerns
The codebase is split into three layers:
| Layer | Location | Responsibility |
|---|---|---|
| Python API | python/rustystats/ | User-facing interface, DataFrame handling |
| PyO3 Bindings | crates/rustystats/ | Type conversion, Python ↔ Rust bridge |
| Rust Core | crates/rustystats-core/ | All mathematical computation |
2. Pure Rust Core
rustystats-core has no Python dependencies. It's a pure Rust library that could be used independently. Benefits:
- Testable without Python
- Could support other language bindings (R, Julia)
- Clear API boundary
3. Minimal Python Dependencies
The Python layer requires only NumPy. Optional dependencies such as polars are imported lazily, so they are loaded only when actually used.
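4. Trait-Based Extensibility
Core abstractions are expressed as Rust traits: distribution families implement the Family trait (families/mod.rs) and link functions implement the Link trait (links/mod.rs).
New families and links can be added by implementing these traits.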
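A minimal, std-only sketch of this pattern for families follows. It uses scalar `f64` methods rather than the ndarray arrays the real crate works with, and the method names `variance` and `unit_deviance`, the `Poisson`/`Gaussian` structs, and the `total_deviance` helper are assumptions for illustration, not the crate's actual API. A link trait would follow the same shape.

```rust
/// Hypothetical sketch of a family trait (not the crate's actual definition).
/// `Send + Sync` lets one family object be shared across worker threads.
pub trait Family: Send + Sync {
    /// Variance function V(mu).
    fn variance(&self, mu: f64) -> f64;
    /// Unit deviance d(y, mu); the model deviance is the sum over observations.
    fn unit_deviance(&self, y: f64, mu: f64) -> f64;
}

pub struct Poisson;

impl Family for Poisson {
    fn variance(&self, mu: f64) -> f64 {
        mu // V(mu) = mu
    }
    fn unit_deviance(&self, y: f64, mu: f64) -> f64 {
        // 2 [y ln(y / mu) - (y - mu)], using the convention 0 ln 0 = 0
        let term = if y > 0.0 { y * (y / mu).ln() } else { 0.0 };
        2.0 * (term - (y - mu))
    }
}

/// Adding a family is just another impl block.
pub struct Gaussian;

impl Family for Gaussian {
    fn variance(&self, _mu: f64) -> f64 {
        1.0 // V(mu) = 1
    }
    fn unit_deviance(&self, y: f64, mu: f64) -> f64 {
        (y - mu).powi(2)
    }
}

/// Generic code is written once against the trait and works for every family.
pub fn total_deviance(family: &dyn Family, y: &[f64], mu: &[f64]) -> f64 {
    y.iter()
        .zip(mu)
        .map(|(&yi, &mi)| family.unit_deviance(yi, mi))
        .sum()
}
```

In this sketch, code such as a solver only sees `&dyn Family`, so a new distribution plugs in without touching it.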
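5. Parallel by Default
Computationally intensive operations, such as accumulating X'WX and X'Wz in the IRLS solver, are parallelized with Rayon:
use rayon::prelude::*;
// Parallel fold/reduce over the n observations (x, w: &[f64] of length n).
// Each Rayon task folds its share of rows into a private partial sum,
// and the partial sums are then combined pairwise.
let total: f64 = (0..n)
    .into_par_iter()
    .fold(|| 0.0, |acc, i| acc + w[i] * x[i] * x[i])
    .reduce(|| 0.0, |a, b| a + b);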
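Rayon's `fold`/`reduce` pair gives each worker a private accumulator and then merges the partial results. The same structure can be sketched with the standard library alone using `std::thread::scope`, here applied to the full weighted cross-product X'WX that IRLS needs on every iteration. This is a stand-in for Rayon, not the crate's code; the row-major layout of `x`, the fixed thread count, and the name `xtwx_parallel` are assumptions for illustration.

```rust
use std::thread;

/// Hypothetical sketch: compute X'WX (p x p, row-major) from a row-major
/// n x p design matrix `x` and weights `w`, splitting rows across threads.
pub fn xtwx_parallel(x: &[f64], w: &[f64], p: usize, n_threads: usize) -> Vec<f64> {
    let n = w.len();
    assert_eq!(x.len(), n * p, "x must be n x p");
    let n_threads = n_threads.max(1);
    let chunk = ((n + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        // "fold": each thread accumulates its own p x p partial sum.
        // Collecting first spawns every worker before any join.
        let handles: Vec<_> = (0..n)
            .step_by(chunk)
            .map(|start| {
                let end = (start + chunk).min(n);
                s.spawn(move || {
                    let mut acc = vec![0.0; p * p];
                    for i in start..end {
                        let row = &x[i * p..(i + 1) * p];
                        for j in 0..p {
                            for k in 0..p {
                                acc[j * p + k] += w[i] * row[j] * row[k];
                            }
                        }
                    }
                    acc
                })
            })
            .collect();

        // "reduce": element-wise sum of the partial matrices
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(vec![0.0; p * p], |mut total, part| {
                for (t, v) in total.iter_mut().zip(part) {
                    *t += v;
                }
                total
            })
    })
}
```

Rayon additionally handles chunking and work stealing automatically; the explicit version only makes the fold/reduce split visible.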
Crate Structure
rustystats-core
The pure Rust computation library:
crates/rustystats-core/
├── Cargo.toml
└── src/
    ├── lib.rs                 # Re-exports, module declarations
    ├── error.rs               # Error types
    ├── families/              # Distribution families
    │   ├── mod.rs             # Family trait
    │   ├── gaussian.rs
    │   ├── poisson.rs
    │   └── ...
    ├── links/                 # Link functions
    │   ├── mod.rs             # Link trait
    │   ├── identity.rs
    │   ├── log.rs
    │   └── logit.rs
    ├── solvers/               # Fitting algorithms
    │   ├── mod.rs
    │   ├── irls.rs            # Main IRLS solver
    │   └── coordinate_descent.rs
    ├── inference/             # Statistical inference
    │   └── mod.rs             # SEs, p-values, robust SEs
    ├── diagnostics/           # Model diagnostics
    │   ├── mod.rs
    │   ├── residuals.rs
    │   ├── calibration.rs
    │   ├── negbinomial.rs     # NegBin theta estimation, log-likelihood
    │   └── ...
    ├── splines/               # Spline basis functions
    │   └── mod.rs
    ├── formula/               # Formula parsing
    │   └── mod.rs
    ├── design_matrix/         # Design matrix construction
    │   └── mod.rs
    ├── target_encoding/       # Target encoding
    │   └── mod.rs
    ├── regularization/        # Penalty configuration
    │   └── mod.rs
    └── interactions/          # Interaction terms
        └── mod.rs
rustystats (Python bindings)
The PyO3 bridge lives in a single file, crates/rustystats/src/lib.rs. This file:
- Wraps Rust types as Python classes (#[pyclass])
- Exposes functions to Python (#[pyfunction])
- Handles NumPy array conversion
Python Package
High-level Python API:
python/rustystats/
├── __init__.py # Public exports
├── glm.py # Summary formatting functions
├── formula.py # Formula API, glm()
├── families.py # Python family wrappers
├── links.py # Python link wrappers
├── splines.py # bs(), ns() functions
├── target_encoding.py # target_encode(), TargetEncoder
├── interactions.py # Interaction term utilities
└── diagnostics.py # ModelDiagnostics, explore_data()
Data Flow
Fitting a Model
User calls rs.glm("y ~ x1 + C(cat)", data).fit()
                      │
                      ▼
┌──────────────────────────────────────────┐
│ python/rustystats/formula.py             │
│ - Parse formula string                   │
│ - Extract columns from DataFrame         │
│ - Build design matrix                    │
│ - Handle categoricals, splines, etc.     │
│ - Call Rust via _rustystats              │
└─────────────────────┬────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────┐
│ crates/rustystats/src/lib.rs             │
│ - Convert PyArray → ndarray::Array       │
│ - Create Rust Family/Link objects        │
│ - Call fit_glm_full()                    │
└─────────────────────┬────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────┐
│ crates/rustystats-core/src/solvers/      │
│ - Run IRLS iterations                    │
│ - Compute X'WX, X'Wz (parallel)          │
│ - Solve linear system                    │
│ - Return IRLSResult                      │
└─────────────────────┬────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────┐
│ Back to Python                           │
│ - Wrap IRLSResult as PyGLMResults        │
│ - Convert arrays back to NumPy           │
│ - Return GLMModel to user                │
└──────────────────────────────────────────┘
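The solver step above can be illustrated with a textbook IRLS loop for a Poisson GLM with log link. Each iteration computes working weights w = μ and working response z = η + (y − μ)/μ, forms X'WX and X'Wz, and solves the normal equations. This is a minimal single-threaded sketch under stated assumptions (dense row-major `x`, Gaussian elimination with partial pivoting but no singularity check, convergence judged by the largest coefficient change). The names `irls_poisson` and `solve` are hypothetical; this is not `fit_glm_full()` or `IRLSResult` from the crate.

```rust
/// Solve A b = r for a dense p x p system (row-major `a`) by Gaussian
/// elimination with partial pivoting. No check for singular matrices.
fn solve(mut a: Vec<f64>, mut r: Vec<f64>, p: usize) -> Vec<f64> {
    for col in 0..p {
        // Bring the row with the largest |pivot| into position
        let piv = (col..p)
            .max_by(|&i, &j| a[i * p + col].abs().total_cmp(&a[j * p + col].abs()))
            .unwrap();
        if piv != col {
            for k in 0..p {
                a.swap(col * p + k, piv * p + k);
            }
            r.swap(col, piv);
        }
        // Eliminate entries below the pivot
        for row in col + 1..p {
            let f = a[row * p + col] / a[col * p + col];
            for k in col..p {
                let delta = f * a[col * p + k];
                a[row * p + k] -= delta;
            }
            let delta = f * r[col];
            r[row] -= delta;
        }
    }
    // Back substitution on the upper-triangular system
    let mut b = vec![0.0; p];
    for row in (0..p).rev() {
        let s: f64 = (row + 1..p).map(|k| a[row * p + k] * b[k]).sum();
        b[row] = (r[row] - s) / a[row * p + row];
    }
    b
}

/// Hypothetical IRLS sketch for a Poisson GLM with log link.
/// `x` is a row-major n x p design matrix (include a column of 1s for an
/// intercept). Returns (coefficients, iterations used, converged).
pub fn irls_poisson(
    x: &[f64],
    y: &[f64],
    p: usize,
    max_iter: usize,
    tol: f64,
) -> (Vec<f64>, usize, bool) {
    let n = y.len();
    assert_eq!(x.len(), n * p, "x must be n x p");
    let mut beta = vec![0.0; p]; // start at eta = 0, i.e. mu = 1
    for iter in 1..=max_iter {
        let mut xtwx = vec![0.0; p * p];
        let mut xtwz = vec![0.0; p];
        for i in 0..n {
            let row = &x[i * p..(i + 1) * p];
            let eta: f64 = row.iter().zip(&beta).map(|(a, b)| a * b).sum();
            let mu = eta.exp();
            // Poisson + log link: w = 1 / (V(mu) g'(mu)^2) = mu
            let w = mu;
            // Working response: z = eta + (y - mu) g'(mu), with g'(mu) = 1/mu
            let z = eta + (y[i] - mu) / mu;
            for j in 0..p {
                xtwz[j] += w * row[j] * z;
                for k in 0..p {
                    xtwx[j * p + k] += w * row[j] * row[k];
                }
            }
        }
        // Weighted least-squares step: solve (X'WX) beta = X'Wz
        let new_beta = solve(xtwx, xtwz, p);
        let change = new_beta
            .iter()
            .zip(&beta)
            .map(|(a, b)| (a - b).abs())
            .fold(0.0, f64::max);
        beta = new_beta;
        if change < tol {
            return (beta, iter, true);
        }
    }
    (beta, max_iter, false)
}
```

For an intercept-only model the fitted coefficient should be ln(mean(y)), which gives a quick sanity check of the loop.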
Error Handling
Rust Errors
Custom error type with context:
#[derive(Debug)]
pub enum RustyStatsError {
    InvalidInput(String),
    ConvergenceFailure { iterations: usize, tolerance: f64 },
    NumericalError(String),
    DimensionMismatch { expected: usize, got: usize },
}
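Formatting an error with `{}` (as the Python conversion does) requires `RustyStatsError` to implement `Display`. A minimal sketch of such an impl, plus the marker `std::error::Error` impl, is shown here; the message wording and the `check_len` helper are assumptions for illustration, not taken from the crate.

```rust
use std::fmt;

#[derive(Debug)]
pub enum RustyStatsError {
    InvalidInput(String),
    ConvergenceFailure { iterations: usize, tolerance: f64 },
    NumericalError(String),
    DimensionMismatch { expected: usize, got: usize },
}

// Human-readable messages; this text is what ends up in the Python exception.
impl fmt::Display for RustyStatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RustyStatsError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
            RustyStatsError::ConvergenceFailure { iterations, tolerance } => write!(
                f,
                "failed to converge after {iterations} iterations (tolerance {tolerance:e})"
            ),
            RustyStatsError::NumericalError(msg) => write!(f, "numerical error: {msg}"),
            RustyStatsError::DimensionMismatch { expected, got } => {
                write!(f, "dimension mismatch: expected {expected}, got {got}")
            }
        }
    }
}

// Lets the error be used as Box<dyn std::error::Error> and with `?`.
impl std::error::Error for RustyStatsError {}

/// Hypothetical validation helper returning the error type.
pub fn check_len(expected: usize, got: usize) -> Result<(), RustyStatsError> {
    if expected == got {
        Ok(())
    } else {
        Err(RustyStatsError::DimensionMismatch { expected, got })
    }
}
```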
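Python Errors
Rust errors are converted to Python exceptions: every RustyStatsError is raised as a ValueError whose message is the error's Display text.
use pyo3::exceptions::PyValueError;
use pyo3::PyErr;
impl From<RustyStatsError> for PyErr {
    fn from(err: RustyStatsError) -> PyErr {
        // Requires `impl std::fmt::Display for RustyStatsError`
        PyValueError::new_err(err.to_string())
    }
}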
Memory Management
Zero-Copy When Possible
NumPy arrays can be borrowed as read-only views without copying:
use numpy::PyReadonlyArray1;
fn process(arr: PyReadonlyArray1<f64>) -> ... {
    let view = arr.as_array(); // ArrayView1<f64> over NumPy's buffer; no copy
    // ... work with view
}
Copies When Necessary
When Rust needs ownership of the data, or the array will be modified, the data is copied into an owned Rust array.
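The borrow-versus-copy decision maps directly onto Rust's ownership rules. In this std-only sketch, `&[f64]` stands in for a read-only NumPy view and `Vec<f64>` for an owned array; both function names are hypothetical.

```rust
/// Read-only work borrows the caller's buffer: no allocation, no copy.
pub fn weighted_sum(view: &[f64], w: &[f64]) -> f64 {
    view.iter().zip(w).map(|(v, wi)| v * wi).sum()
}

/// Work that modifies values takes an owned copy first, so the caller's
/// (Python-owned) buffer is never mutated behind its back.
pub fn centered(view: &[f64]) -> Vec<f64> {
    let mut owned = view.to_vec(); // explicit copy
    if owned.is_empty() {
        return owned;
    }
    let mean = owned.iter().sum::<f64>() / owned.len() as f64;
    for v in owned.iter_mut() {
        *v -= mean;
    }
    owned
}
```

In ndarray terms, this is the difference between working on an `ArrayView` and calling `.to_owned()` on it.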
Returning to Python
Results computed in Rust are converted back to NumPy arrays before being returned to Python.
Thread Safety
Rust Side
All core traits (such as Family and Link) require Send + Sync.
This allows a single family or link object to be shared by Rayon worker threads during parallel iteration.
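Because the trait requires `Send + Sync`, one `&dyn Link` can be read by several worker threads with no cloning or locking. The sketch below uses `std::thread::scope` in place of Rayon, and its `Link` trait (with only an `inverse` method) is a simplified, hypothetical stand-in for the crate's.

```rust
use std::thread;

/// Simplified, hypothetical link trait. Requiring `Send + Sync` is what
/// makes a shared `&dyn Link` usable from several threads at once.
pub trait Link: Send + Sync {
    /// Inverse link: mu = g^{-1}(eta)
    fn inverse(&self, eta: f64) -> f64;
}

pub struct LogLink;

impl Link for LogLink {
    fn inverse(&self, eta: f64) -> f64 {
        eta.exp()
    }
}

pub struct LogitLink;

impl Link for LogitLink {
    fn inverse(&self, eta: f64) -> f64 {
        1.0 / (1.0 + (-eta).exp())
    }
}

/// Apply the inverse link in parallel chunks; every worker borrows the
/// same link object.
pub fn inverse_parallel(link: &dyn Link, eta: &[f64], chunk: usize) -> Vec<f64> {
    let chunk = chunk.max(1);
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently
        let handles: Vec<_> = eta
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || part.iter().map(|&e| link.inverse(e)).collect::<Vec<f64>>())
            })
            .collect();
        // Join in order, so the output lines up with the input
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Dropping `Sync` from the trait's supertraits would make `inverse_parallel` fail to compile, because `&dyn Link` would no longer be `Send`.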
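Python GIL
PyO3 manages the GIL automatically at the Python/Rust boundary. During heavy computation, the Rust code releases it explicitly with py.allow_threads, so other Python threads can keep running:
py.allow_threads(|| {
    // This closure runs without holding the GIL,
    // so other Python threads can make progress meanwhile
    expensive_computation()
})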
Testing Strategy
Rust Unit Tests
Each module has its own tests:
#[cfg(test)]
mod tests {
    use super::*;
    use ndarray::array;

    #[test]
    fn test_variance_function() {
        let family = PoissonFamily;
        let mu = array![1.0, 2.0, 3.0];
        let var = family.variance(&mu);
        assert_eq!(var, mu); // Poisson: V(μ) = μ
    }
}
Run with: cargo test -p rustystats-core
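Python Integration Tests
Integration tests live in tests/python/:
import numpy as np
import polars as pl
import rustystats as rs

def test_poisson_fit():
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    data = pl.DataFrame({
        "y": rng.poisson(5, 100),
        "x": rng.standard_normal(100),
    })
    result = rs.glm("y ~ x", data, family="poisson").fit()
    assert result.converged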
Run with: uv run pytest tests/python/ -v
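Comparison Tests
Results are checked against statsmodels:
import numpy as np
import polars as pl
import statsmodels.api as sm
import rustystats as rs

def test_vs_statsmodels():
    rng = np.random.default_rng(42)
    x = rng.normal(size=200)
    y = 1.0 + 2.0 * x + rng.normal(size=200)
    data = pl.DataFrame({"y": y, "x": x})
    # Fit with RustyStats (the formula adds the intercept)
    rs_result = rs.glm("y ~ x", data, family="gaussian").fit()
    # Fit with statsmodels (intercept added explicitly)
    sm_result = sm.GLM(y, sm.add_constant(x), family=sm.families.Gaussian()).fit()
    np.testing.assert_allclose(rs_result.params, sm_result.params, rtol=1e-5)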
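The tolerance check above follows NumPy's `assert_allclose` rule: each element must satisfy |actual − desired| ≤ atol + rtol·|desired| (`atol` defaults to 0). A Rust equivalent is convenient for the same kind of cross-check on the Rust side. The name `allclose` is hypothetical, and this sketch does not reproduce NumPy's special handling of NaN.

```rust
/// Hypothetical helper mirroring numpy.testing.assert_allclose's criterion:
/// equal lengths and element-wise |a - d| <= atol + rtol * |d|.
pub fn allclose(actual: &[f64], desired: &[f64], rtol: f64, atol: f64) -> bool {
    actual.len() == desired.len()
        && actual
            .iter()
            .zip(desired)
            .all(|(a, d)| (a - d).abs() <= atol + rtol * d.abs())
}
```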
Build System
Maturin
The Python package is built using maturin:
# pyproject.toml
[build-system]
requires = ["maturin>=1.4"]
build-backend = "maturin"
[tool.maturin]
manifest-path = "crates/rustystats/Cargo.toml"
python-source = "python"
module-name = "rustystats._rustystats"