RustyStats Code Walkthrough
This guide walks through the actual RustyStats source code, explaining every Rust syntax element. Each section takes a real function from the codebase and annotates it line-by-line.
Goal: After reading this, you should be able to read and modify any function in the codebase.
Part 1: Understanding Rust Syntax Basics
Before diving into the code, here's a quick reference for Rust syntax you'll encounter:
| Syntax |
Meaning |
Example |
fn name() -> T |
Function returning type T |
fn variance() -> Array1<f64> |
& |
Immutable reference (borrow) |
&self, &Array1<f64> |
&mut |
Mutable reference |
&mut self |
pub |
Public (visible outside module) |
pub fn fit_glm() |
let |
Variable binding (immutable) |
let x = 5; |
let mut |
Mutable variable |
let mut count = 0; |
:: |
Path separator |
Array1::zeros(n) |
<T> |
Generic type parameter |
Vec<f64> |
impl |
Implementation block |
impl Family for PoissonFamily |
self |
Current instance |
self.name() |
? |
Propagate error if Result is Err |
result? |
\|x\| expr |
Closure (anonymous function) |
\|x\| x * 2 |
.iter() |
Create iterator |
vec.iter() |
.map() |
Transform each element |
.map(\|x\| x + 1) |
.collect() |
Gather iterator into collection |
.collect::<Vec<_>>() |
Part 2: The Family Trait
File: crates/rustystats-core/src/families/mod.rs
A trait defines shared behavior. Every distribution family implements this trait.
/// The Family trait defines the interface for all distribution families.
pub trait Family: Send + Sync {
Line-by-line breakdown:
| Code |
Explanation |
/// |
Documentation comment (appears in generated docs) |
pub |
This trait is public (can be used outside this module) |
trait Family |
Declares a trait named Family |
: Send + Sync |
Trait bounds: Types implementing Family must also implement Send and Sync (required for thread safety with Rayon) |
{ |
Start of trait definition |
Trait Methods
/// Returns the name of this family.
fn name(&self) -> &str;
| Code |
Explanation |
fn name |
Method named name |
(&self) |
Takes an immutable reference to self (the instance) |
-> &str |
Returns a string slice (borrowed string) |
; |
No body—this is a required method that implementors must define |
/// Compute the variance function V(μ).
fn variance(&self, mu: &Array1<f64>) -> Array1<f64>;
| Code |
Explanation |
mu: &Array1<f64> |
Parameter mu is a reference to a 1D array of f64 (64-bit floats) |
-> Array1<f64> |
Returns an owned array (not a reference) |
/// Compute the total deviance with optional weights.
fn deviance(&self, y: &Array1<f64>, mu: &Array1<f64>, weights: Option<&Array1<f64>>) -> f64 {
let unit_dev = self.unit_deviance(y, mu);
match weights {
Some(w) => (&unit_dev * w).sum(),
None => unit_dev.sum(),
}
}
| Code |
Explanation |
weights: Option<&Array1<f64>> |
Option type: either Some(value) or None. This makes the parameter optional. |
-> f64 |
Returns a single float |
{ ... } |
Has a body—this is a default implementation (can be overridden) |
let unit_dev = ... |
Bind result to immutable variable |
self.unit_deviance(y, mu) |
Call another method on self |
match weights { ... } |
Pattern matching on the Option |
Some(w) => |
If weights is Some, bind the inner value to w |
(&unit_dev * w).sum() |
Element-wise multiply arrays, then sum |
None => |
If weights is None |
unit_dev.sum() |
Just sum without weighting |
Part 3: Implementing a Family (PoissonFamily)
File: crates/rustystats-core/src/families/poisson.rs
The Struct Definition
use ndarray::Array1;
use crate::links::{Link, LogLink};
use super::Family;
#[derive(Debug, Clone, Copy)]
pub struct PoissonFamily;
| Code |
Explanation |
use ndarray::Array1 |
Import Array1 from the ndarray crate |
use crate::links::{Link, LogLink} |
Import from our own crate's links module |
use super::Family |
Import Family from parent module (super = parent) |
#[derive(...)] |
Derive macro: auto-generate trait implementations |
Debug |
Enables {:?} formatting for debugging |
Clone |
Enables .clone() method |
Copy |
Type can be copied implicitly (no move semantics) |
pub struct PoissonFamily; |
Empty struct (no fields)—a "unit struct" |
The Implementation Block
impl Family for PoissonFamily {
fn name(&self) -> &str {
"Poisson"
}
| Code |
Explanation |
impl Family for PoissonFamily |
"Implement the Family trait for PoissonFamily" |
fn name(&self) -> &str |
Must match trait signature exactly |
"Poisson" |
String literal; last expression without ; is the return value |
The Variance Function
fn variance(&self, mu: &Array1<f64>) -> Array1<f64> {
mu.clone()
}
| Code |
Explanation |
mu.clone() |
Create a copy of the array. For Poisson, V(μ) = μ, so we return a copy of mu. |
The Unit Deviance Function
fn unit_deviance(&self, y: &Array1<f64>, mu: &Array1<f64>) -> Array1<f64> {
ndarray::Zip::from(y)
.and(mu)
.map_collect(|&yi, &mui| {
if yi == 0.0 {
2.0 * mui
} else {
2.0 * (yi * (yi / mui).ln() - (yi - mui))
}
})
}
| Code |
Explanation |
ndarray::Zip::from(y) |
Create a parallel zipper starting with array y |
.and(mu) |
Add array mu to the zip |
.map_collect(\|&yi, &mui\| { ... }) |
Apply closure to each pair, collect into new array |
\|&yi, &mui\| |
Closure parameters. & pattern-matches the reference to get the value. |
if yi == 0.0 { ... } else { ... } |
Conditional expression (returns a value) |
(yi / mui).ln() |
Divide, then natural log. Method chaining. |
Returning a Boxed Trait Object
fn default_link(&self) -> Box<dyn Link> {
Box::new(LogLink)
}
| Code |
Explanation |
Box<dyn Link> |
A heap-allocated trait object. dyn Link means "some type implementing Link" |
Box::new(LogLink) |
Allocate LogLink on the heap and return a pointer |
Why Box? The trait method must return a consistent type, but different families return different link types. Box<dyn Link> erases the concrete type, allowing runtime polymorphism.
Array Mapping
fn initialize_mu(&self, y: &Array1<f64>) -> Array1<f64> {
y.mapv(|yi| (yi + 0.1).max(0.1))
}
| Code |
Explanation |
y.mapv(\|yi\| ...) |
Map over values. mapv = map values (as opposed to map which maps references) |
(yi + 0.1).max(0.1) |
Add 0.1, then take max with 0.1. Ensures result ≥ 0.1 |
Iterator with All
fn is_valid_mu(&self, mu: &Array1<f64>) -> bool {
mu.iter().all(|&x| x > 0.0 && x.is_finite())
}
| Code |
Explanation |
mu.iter() |
Create an iterator over the array |
.all(\|&x\| ...) |
Returns true if predicate is true for ALL elements |
x > 0.0 && x.is_finite() |
Check positive AND finite (not NaN or infinity) |
Part 4: The Link Trait and LogLink
File: crates/rustystats-core/src/links/log.rs
use ndarray::Array1;
use super::Link;
#[derive(Debug, Clone, Copy)]
pub struct LogLink;
impl Link for LogLink {
fn name(&self) -> &str {
"log"
}
fn link(&self, mu: &Array1<f64>) -> Array1<f64> {
mu.mapv(|x| x.ln())
}
fn inverse(&self, eta: &Array1<f64>) -> Array1<f64> {
eta.mapv(|x| x.exp())
}
fn derivative(&self, mu: &Array1<f64>) -> Array1<f64> {
mu.mapv(|x| 1.0 / x)
}
}
| Code |
Explanation |
x.ln() |
Natural logarithm (method on f64) |
x.exp() |
Exponential function e^x |
1.0 / x |
Division. 1.0 is a float literal. |
Part 5: Error Handling
File: crates/rustystats-core/src/error.rs
Defining Error Types
use thiserror::Error;
#[derive(Error, Debug)]
pub enum RustyStatsError {
#[error("Dimension mismatch: {0}")]
DimensionMismatch(String),
#[error("Invalid value: {0}")]
InvalidValue(String),
#[error("Convergence failed: {0}")]
ConvergenceFailure(String),
}
| Code |
Explanation |
use thiserror::Error |
Import the Error derive macro from thiserror crate |
#[derive(Error, Debug)] |
Auto-implement std::error::Error and Debug traits |
pub enum RustyStatsError |
Define an enum (sum type) for all error variants |
#[error("...")] |
Attribute macro: defines the error message |
{0} |
Placeholder for first field in the variant |
DimensionMismatch(String) |
Variant with one String field |
The Result Type Alias
pub type Result<T> = std::result::Result<T, RustyStatsError>;
| Code |
Explanation |
pub type Result<T> |
Define a public type alias with generic parameter T |
= std::result::Result<T, RustyStatsError> |
Alias for Result with our error type |
Now instead of writing Result<IRLSResult, RustyStatsError>, we write Result<IRLSResult>.
Part 6: The IRLS Solver
File: crates/rustystats-core/src/solvers/irls.rs
Configuration Struct
#[derive(Debug, Clone)]
pub struct IRLSConfig {
pub max_iterations: usize,
pub tolerance: f64,
pub min_weight: f64,
pub verbose: bool,
}
| Code |
Explanation |
#[derive(Debug, Clone)] |
Auto-implement Debug and Clone |
pub struct IRLSConfig |
Public struct |
pub max_iterations: usize |
Public field of type usize (unsigned integer, pointer-sized) |
pub tolerance: f64 |
64-bit floating point |
pub verbose: bool |
Boolean |
Default Implementation
impl Default for IRLSConfig {
fn default() -> Self {
Self {
max_iterations: 25,
tolerance: 1e-8,
min_weight: 1e-10,
verbose: false,
}
}
}
| Code |
Explanation |
impl Default for IRLSConfig |
Implement the Default trait |
fn default() -> Self |
Returns an instance of Self (IRLSConfig) |
Self { ... } |
Struct literal. Self is an alias for the implementing type. |
1e-8 |
Scientific notation: 1 × 10⁻⁸ |
The Main Fitting Function
pub fn fit_glm(
y: &Array1<f64>,
x: &Array2<f64>,
family: &dyn Family,
link: &dyn Link,
config: &IRLSConfig,
) -> Result<IRLSResult> {
fit_glm_full(y, x, family, link, config, None, None)
}
| Code |
Explanation |
pub fn fit_glm(...) |
Public function |
y: &Array1<f64> |
Borrow a 1D array |
x: &Array2<f64> |
Borrow a 2D array (matrix) |
family: &dyn Family |
Borrow a trait object (any type implementing Family) |
-> Result<IRLSResult> |
Returns Ok(IRLSResult) on success, Err(...) on failure |
None, None |
Pass None for optional parameters |
pub fn fit_glm_full(
y: &Array1<f64>,
x: &Array2<f64>,
family: &dyn Family,
link: &dyn Link,
config: &IRLSConfig,
offset: Option<&Array1<f64>>,
weights: Option<&Array1<f64>>,
) -> Result<IRLSResult> {
let n = y.len();
let p = x.ncols();
if x.nrows() != n {
return Err(RustyStatsError::DimensionMismatch(format!(
"X has {} rows but y has {} elements",
x.nrows(),
n
)));
}
| Code |
Explanation |
offset: Option<&Array1<f64>> |
Optional borrowed array |
let n = y.len() |
Get array length |
let p = x.ncols() |
Get number of columns |
x.nrows() |
Get number of rows |
return Err(...) |
Early return with error |
format!("...", x.nrows(), n) |
String formatting macro (like Python f-strings) |
Handling Optional Parameters
let offset_vec = match offset {
Some(o) => {
if o.len() != n {
return Err(RustyStatsError::DimensionMismatch(...));
}
o.clone()
}
None => Array1::zeros(n),
};
| Code |
Explanation |
match offset { ... } |
Pattern match on the Option |
Some(o) => { ... } |
If Some, bind inner value to o, execute block |
o.clone() |
Clone the borrowed array to get an owned copy |
None => Array1::zeros(n) |
If None, create zero array of length n |
The IRLS Loop
let mut converged = false;
let mut iteration = 0;
while iteration < config.max_iterations {
iteration += 1;
// Compute variance and link derivative
let variance = family.variance(&mu);
let link_deriv = link.derivative(&mu);
| Code |
Explanation |
let mut converged = false |
Mutable boolean, initially false |
while condition { ... } |
While loop |
iteration += 1 |
Increment (shorthand for iteration = iteration + 1) |
family.variance(&mu) |
Call trait method, passing borrow of mu |
Parallel Iteration with Rayon
let results: Vec<(f64, f64, f64)> = (0..n)
.into_par_iter()
.map(|i| {
let v = variance[i];
let d = link_deriv[i];
let iw = (1.0 / (v * d * d)).max(min_weight).min(1e10);
let cw = prior_weights_vec[i] * iw;
let wr = (eta[i] - offset_vec[i]) + (y[i] - mu[i]) * d;
(iw, cw, wr)
})
.collect();
| Code |
Explanation |
Vec<(f64, f64, f64)> |
Vector of tuples, each containing 3 floats |
(0..n) |
Range from 0 to n-1 |
.into_par_iter() |
Convert to parallel iterator (Rayon) |
.map(\|i\| { ... }) |
Transform each index in parallel |
variance[i] |
Index into array |
(1.0 / (v * d * d)) |
Arithmetic expression |
.max(min_weight) |
Take maximum of value and min_weight |
.min(1e10) |
Take minimum (clamp to upper bound) |
(iw, cw, wr) |
Return a tuple |
.collect() |
Gather parallel results into Vec |
Building Arrays from Vectors
let mut irls_weights_vec = Vec::with_capacity(n);
let mut combined_weights_vec = Vec::with_capacity(n);
let mut working_response_vec = Vec::with_capacity(n);
for (iw, cw, wr) in results {
irls_weights_vec.push(iw);
combined_weights_vec.push(cw);
working_response_vec.push(wr);
}
let irls_weights = Array1::from_vec(irls_weights_vec);
| Code |
Explanation |
Vec::with_capacity(n) |
Create Vec with pre-allocated capacity (performance) |
for (iw, cw, wr) in results |
Destructure tuple in for loop |
.push(iw) |
Append to vector |
Array1::from_vec(...) |
Convert Vec to ndarray Array1 |
Matrix Operations
let eta_base = x.dot(&new_coefficients);
eta = &eta_base + &offset_vec;
mu = link.inverse(&eta);
| Code |
Explanation |
x.dot(&new_coefficients) |
Matrix-vector multiplication |
&eta_base + &offset_vec |
Element-wise addition of array references |
link.inverse(&eta) |
Apply inverse link function |
Step-Halving for Stability
When deviance increases (step too aggressive), we halve the step size:
// Step-halving: if deviance increased, try smaller steps
if deviance_new > deviance_old * 1.0001 && iteration > 1 {
let mut step_size = 0.5;
for _half_step in 0..4 {
// Blend: eta = (1-step)*old + step*new
let eta_blend: Array1<f64> = eta_old.iter()
.zip(eta_new.iter())
.map(|(&old, &new)| (1.0 - step_size) * old + step_size * new)
.collect();
let mu_blend = link.inverse(&eta_blend);
let dev_blend = family.deviance(y, &mu_blend, weights);
if dev_blend <= deviance_old * 1.0001 {
break; // Accept this step
}
step_size *= 0.5;
}
}
| Code |
Explanation |
deviance_new > deviance_old * 1.0001 |
Check if deviance increased by >0.01% |
step_size *= 0.5 |
Halve the step size |
(1.0 - step_size) * old + step_size * new |
Linear interpolation between old and new |
Convergence Check
let rel_change = if deviance_old.abs() > 1e-10 {
(deviance_old - deviance).abs() / deviance_old.abs()
} else {
(deviance_old - deviance).abs()
};
if rel_change < config.tolerance {
converged = true;
break;
}
| Code |
Explanation |
if ... { ... } else { ... } |
If-else expression (returns value) |
.abs() |
Absolute value |
break |
Exit the loop |
Returning the Result
Ok(IRLSResult {
coefficients: final_coefficients,
fitted_values: mu,
linear_predictor: eta,
deviance,
iterations: iteration,
converged,
covariance_unscaled: cov_unscaled,
irls_weights: final_weights,
prior_weights: prior_weights_vec,
offset: offset_vec,
y: y.to_owned(),
family_name: family.name().to_string(),
penalty: Penalty::None,
design_matrix: None,
})
}
| Code |
Explanation |
Ok(IRLSResult { ... }) |
Return success with struct |
coefficients: final_coefficients |
Field name: value |
deviance, |
Shorthand when field name equals variable name |
y.to_owned() |
Convert borrowed slice to owned data |
.to_string() |
Convert &str to owned String |
Penalty::None |
Enum variant |
None |
Option::None |
Part 7: Tests
File: crates/rustystats-core/src/families/poisson.rs
#[cfg(test)]
mod tests {
use super::*;
use ndarray::array;
use approx::assert_abs_diff_eq;
#[test]
fn test_poisson_variance() {
let family = PoissonFamily;
let mu = array![0.5, 1.0, 2.0, 10.0];
let var = family.variance(&mu);
assert_abs_diff_eq!(var, mu, epsilon = 1e-10);
}
| Code |
Explanation |
#[cfg(test)] |
Conditional compilation: only compile when testing |
mod tests |
Nested module for tests |
use super::* |
Import everything from parent module |
use ndarray::array |
Import the array! macro |
#[test] |
Mark function as a test |
fn test_poisson_variance() |
Test function (no parameters, no return) |
array![0.5, 1.0, 2.0, 10.0] |
Macro to create array literal |
assert_abs_diff_eq!(var, mu, epsilon = 1e-10) |
Assert arrays are approximately equal |
Part 8: Coordinate Descent Specifics
File: crates/rustystats-core/src/solvers/coordinate_descent.rs
Soft Thresholding
pub fn soft_threshold(z: f64, gamma: f64) -> f64 {
if z > gamma {
z - gamma
} else if z < -gamma {
z + gamma
} else {
0.0
}
}
| Code |
Explanation |
pub fn soft_threshold(z: f64, gamma: f64) -> f64 |
Public function taking two floats, returning float |
if ... else if ... else |
Chained conditionals |
0.0 |
Float literal (no semicolon = return value) |
Parallel Fold-Reduce for Gram Matrix
let xwx: Vec<f64> = (0..n)
.into_par_iter()
.fold(
|| vec![0.0; p * p],
|mut acc, i| {
let w_i = combined_weights[i];
let x_i = x.row(i);
for j in 0..p {
let xij_w = x_i[j] * w_i;
for k in j..p {
acc[j * p + k] += xij_w * x_i[k];
}
}
acc
},
)
.reduce(
|| vec![0.0; p * p],
|mut a, b| {
for i in 0..a.len() {
a[i] += b[i];
}
a
},
);
| Code |
Explanation |
.fold(init, f) |
Parallel fold with initial value factory and accumulator function |
\|\| vec![0.0; p * p] |
Closure returning new zero vector (called per thread) |
vec![0.0; p * p] |
Vec macro: create vector of p*p zeros |
\|mut acc, i\| |
Closure taking mutable accumulator and index |
x.row(i) |
Get row i of matrix as array view |
acc[j * p + k] += ... |
Accumulate into flattened 2D index |
.reduce(init, combine) |
Combine thread-local results |
\|mut a, b\| |
Combine two accumulators |
Part 9: Common Patterns Summary
Pattern: Method Chaining
mu.iter().all(|&x| x > 0.0 && x.is_finite())
Read left-to-right: take mu, create iterator, check if all elements satisfy condition.
Pattern: Error Propagation with ?
let result = some_fallible_operation()?;
If some_fallible_operation() returns Err, immediately return that error. Otherwise, unwrap the Ok value.
Pattern: Builder-style APIs
let config = IRLSConfig {
max_iterations: 50,
..Default::default() // Fill remaining fields with defaults
};
Pattern: Iterating and Collecting
let squares: Vec<f64> = (0..10).map(|x| (x * x) as f64).collect();
Pattern: Parallel Processing with Rayon
use rayon::prelude::*;
let results: Vec<_> = data.par_iter().map(|x| expensive_operation(x)).collect();
Just change .iter() to .par_iter() for parallelism.
Quick Reference: Common Methods
| Method |
On Type |
Returns |
Description |
.len() |
Array/Vec |
usize |
Number of elements |
.nrows() |
Array2 |
usize |
Number of rows |
.ncols() |
Array2 |
usize |
Number of columns |
.clone() |
Any Clone |
Owned copy |
Deep copy |
.iter() |
Collection |
Iterator |
Iterate by reference |
.into_iter() |
Collection |
Iterator |
Iterate by value (consumes) |
.map(f) |
Iterator |
Iterator |
Transform elements |
.filter(f) |
Iterator |
Iterator |
Keep matching elements |
.collect() |
Iterator |
Collection |
Gather into collection |
.sum() |
Iterator |
Number |
Sum all elements |
.all(f) |
Iterator |
bool |
True if all match |
.any(f) |
Iterator |
bool |
True if any match |
.zip(other) |
Iterator |
Iterator |
Pair with another iterator |
.mapv(f) |
ndarray |
Array |
Map values |
.dot(&other) |
Array |
Array/scalar |
Matrix multiplication |
.abs() |
Number |
Number |
Absolute value |
.ln() |
f64 |
f64 |
Natural log |
.exp() |
f64 |
f64 |
Exponential |
.max(other) |
f64 |
f64 |
Maximum |
.min(other) |
f64 |
f64 |
Minimum |
.is_finite() |
f64 |
bool |
Not NaN or infinity |
Next Steps