RustyStats Code Walkthrough¶

This guide walks through the actual RustyStats source code, explaining every Rust syntax element. Each section takes a real function from the codebase and annotates it line-by-line.

Goal: After reading this, you should be able to read and modify any function in the codebase.

Part 1: Understanding Rust Syntax Basics¶

Before diving into the code, here's a quick reference for Rust syntax you'll encounter:

Syntax	Meaning	Example
`fn name() -> T`	Function returning type `T`	`fn variance() -> Array1<f64>`
`&`	Immutable reference (borrow)	`&self`, `&Array1<f64>`
`&mut`	Mutable reference	`&mut self`
`pub`	Public (visible outside module)	`pub fn fit_glm()`
`let`	Variable binding (immutable)	`let x = 5;`
`let mut`	Mutable variable	`let mut count = 0;`
`::`	Path separator	`Array1::zeros(n)`
`<T>`	Generic type parameter	`Vec<f64>`
`impl`	Implementation block	`impl Family for PoissonFamily`
`self`	Current instance	`self.name()`
`?`	Propagate error if Result is Err	`result?`
`\\|x\\| expr`	Closure (anonymous function)	`\\|x\\| x * 2`
`.iter()`	Create iterator	`vec.iter()`
`.map()`	Transform each element	`.map(\\|x\\| x + 1)`
`.collect()`	Gather iterator into collection	`.collect::<Vec<_>>()`

Part 2: The Family Trait¶

File: crates/rustystats-core/src/families/mod.rs

A trait defines shared behavior. Every distribution family implements this trait.

/// The Family trait defines the interface for all distribution families.
pub trait Family: Send + Sync {

Line-by-line breakdown:

Code	Explanation
`///`	Documentation comment (appears in generated docs)
`pub`	This trait is public (can be used outside this module)
`trait Family`	Declares a trait named `Family`
`: Send + Sync`	Trait bounds: Types implementing `Family` must also implement `Send` and `Sync` (required for thread safety with Rayon)
`{`	Start of trait definition

Trait Methods¶

    /// Returns the name of this family.
    fn name(&self) -> &str;

Code	Explanation
`fn name`	Method named `name`
`(&self)`	Takes an immutable reference to self (the instance)
`-> &str`	Returns a string slice (borrowed string)
`;`	No body—this is a required method that implementors must define

    /// Compute the variance function V(μ).
    fn variance(&self, mu: &Array1<f64>) -> Array1<f64>;

Code	Explanation
`mu: &Array1<f64>`	Parameter `mu` is a reference to a 1D array of `f64` (64-bit floats)
`-> Array1<f64>`	Returns an owned array (not a reference)

    /// Compute the total deviance with optional weights.
    fn deviance(&self, y: &Array1<f64>, mu: &Array1<f64>, weights: Option<&Array1<f64>>) -> f64 {
        let unit_dev = self.unit_deviance(y, mu);
        match weights {
            Some(w) => (&unit_dev * w).sum(),
            None => unit_dev.sum(),
        }
    }

Code	Explanation
`weights: Option<&Array1<f64>>`	`Option` type: either `Some(value)` or `None`. This makes the parameter optional.
`-> f64`	Returns a single float
`{ ... }`	Has a body—this is a default implementation (can be overridden)
`let unit_dev = ...`	Bind result to immutable variable
`self.unit_deviance(y, mu)`	Call another method on self
`match weights { ... }`	Pattern matching on the Option
`Some(w) =>`	If weights is Some, bind the inner value to `w`
`(&unit_dev * w).sum()`	Element-wise multiply arrays, then sum
`None =>`	If weights is None
`unit_dev.sum()`	Just sum without weighting

Part 3: Implementing a Family (PoissonFamily)¶

File: crates/rustystats-core/src/families/poisson.rs

The Struct Definition¶

use ndarray::Array1;
use crate::links::{Link, LogLink};
use super::Family;

#[derive(Debug, Clone, Copy)]
pub struct PoissonFamily;

Code	Explanation
`use ndarray::Array1`	Import `Array1` from the `ndarray` crate
`use crate::links::{Link, LogLink}`	Import from our own crate's `links` module
`use super::Family`	Import `Family` from parent module (`super` = parent)
`#[derive(...)]`	Derive macro: auto-generate trait implementations
`Debug`	Enables `{:?}` formatting for debugging
`Clone`	Enables `.clone()` method
`Copy`	Type can be copied implicitly (no move semantics)
`pub struct PoissonFamily;`	Empty struct (no fields)—a "unit struct"

The Implementation Block¶

impl Family for PoissonFamily {
    fn name(&self) -> &str {
        "Poisson"
    }

Code	Explanation
`impl Family for PoissonFamily`	"Implement the Family trait for PoissonFamily"
`fn name(&self) -> &str`	Must match trait signature exactly
`"Poisson"`	String literal; last expression without `;` is the return value

The Variance Function¶

    fn variance(&self, mu: &Array1<f64>) -> Array1<f64> {
        mu.clone()
    }

Code	Explanation
`mu.clone()`	Create a copy of the array. For Poisson, V(μ) = μ, so we return a copy of mu.

The Unit Deviance Function¶

    fn unit_deviance(&self, y: &Array1<f64>, mu: &Array1<f64>) -> Array1<f64> {
        ndarray::Zip::from(y)
            .and(mu)
            .map_collect(|&yi, &mui| {
                if yi == 0.0 {
                    2.0 * mui
                } else {
                    2.0 * (yi * (yi / mui).ln() - (yi - mui))
                }
            })
    }

Code	Explanation
`ndarray::Zip::from(y)`	Create a parallel zipper starting with array `y`
`.and(mu)`	Add array `mu` to the zip
`.map_collect(\\|&yi, &mui\\| { ... })`	Apply closure to each pair, collect into new array
`\\|&yi, &mui\\|`	Closure parameters. `&` pattern-matches the reference to get the value.
`if yi == 0.0 { ... } else { ... }`	Conditional expression (returns a value)
`(yi / mui).ln()`	Divide, then natural log. Method chaining.

Returning a Boxed Trait Object¶

    fn default_link(&self) -> Box<dyn Link> {
        Box::new(LogLink)
    }

Code	Explanation
`Box<dyn Link>`	A heap-allocated trait object. `dyn Link` means "some type implementing Link"
`Box::new(LogLink)`	Allocate `LogLink` on the heap and return a pointer

Why Box? The trait method must return a consistent type, but different families return different link types. Box<dyn Link> erases the concrete type, allowing runtime polymorphism.

Array Mapping¶

    fn initialize_mu(&self, y: &Array1<f64>) -> Array1<f64> {
        y.mapv(|yi| (yi + 0.1).max(0.1))
    }

Code	Explanation
`y.mapv(\\|yi\\| ...)`	Map over values. `mapv` = map values (as opposed to `map` which maps references)
`(yi + 0.1).max(0.1)`	Add 0.1, then take max with 0.1. Ensures result ≥ 0.1

Iterator with All¶

    fn is_valid_mu(&self, mu: &Array1<f64>) -> bool {
        mu.iter().all(|&x| x > 0.0 && x.is_finite())
    }

Code	Explanation
`mu.iter()`	Create an iterator over the array
`.all(\\|&x\\| ...)`	Returns `true` if predicate is true for ALL elements
`x > 0.0 && x.is_finite()`	Check positive AND finite (not NaN or infinity)

Part 4: The Link Trait and LogLink¶

File: crates/rustystats-core/src/links/log.rs

use ndarray::Array1;
use super::Link;

#[derive(Debug, Clone, Copy)]
pub struct LogLink;

impl Link for LogLink {
    fn name(&self) -> &str {
        "log"
    }

    fn link(&self, mu: &Array1<f64>) -> Array1<f64> {
        mu.mapv(|x| x.ln())
    }

    fn inverse(&self, eta: &Array1<f64>) -> Array1<f64> {
        eta.mapv(|x| x.exp())
    }

    fn derivative(&self, mu: &Array1<f64>) -> Array1<f64> {
        mu.mapv(|x| 1.0 / x)
    }
}

Code	Explanation
`x.ln()`	Natural logarithm (method on `f64`)
`x.exp()`	Exponential function e^x
`1.0 / x`	Division. `1.0` is a float literal.

Part 5: Error Handling¶

File: crates/rustystats-core/src/error.rs

Defining Error Types¶

use thiserror::Error;

#[derive(Error, Debug)]
pub enum RustyStatsError {
    #[error("Dimension mismatch: {0}")]
    DimensionMismatch(String),

    #[error("Invalid value: {0}")]
    InvalidValue(String),

    #[error("Convergence failed: {0}")]
    ConvergenceFailure(String),
}

Code	Explanation
`use thiserror::Error`	Import the `Error` derive macro from `thiserror` crate
`#[derive(Error, Debug)]`	Auto-implement `std::error::Error` and `Debug` traits
`pub enum RustyStatsError`	Define an enum (sum type) for all error variants
`#[error("...")]`	Attribute macro: defines the error message
`{0}`	Placeholder for first field in the variant
`DimensionMismatch(String)`	Variant with one `String` field

The Result Type Alias¶

pub type Result<T> = std::result::Result<T, RustyStatsError>;

Code	Explanation
`pub type Result<T>`	Define a public type alias with generic parameter `T`
`= std::result::Result<T, RustyStatsError>`	Alias for `Result` with our error type

Now instead of writing Result<IRLSResult, RustyStatsError>, we write Result<IRLSResult>.

Part 6: The IRLS Solver¶

File: crates/rustystats-core/src/solvers/irls.rs

Configuration Struct¶

#[derive(Debug, Clone)]
pub struct IRLSConfig {
    pub max_iterations: usize,
    pub tolerance: f64,
    pub min_weight: f64,
    pub verbose: bool,
}

Code	Explanation
`#[derive(Debug, Clone)]`	Auto-implement Debug and Clone
`pub struct IRLSConfig`	Public struct
`pub max_iterations: usize`	Public field of type `usize` (unsigned integer, pointer-sized)
`pub tolerance: f64`	64-bit floating point
`pub verbose: bool`	Boolean

Default Implementation¶

impl Default for IRLSConfig {
    fn default() -> Self {
        Self {
            max_iterations: 25,
            tolerance: 1e-8,
            min_weight: 1e-10,
            verbose: false,
        }
    }
}

Code	Explanation
`impl Default for IRLSConfig`	Implement the `Default` trait
`fn default() -> Self`	Returns an instance of Self (IRLSConfig)
`Self { ... }`	Struct literal. `Self` is an alias for the implementing type.
`1e-8`	Scientific notation: 1 × 10⁻⁸

The Main Fitting Function¶

pub fn fit_glm(
    y: &Array1<f64>,
    x: &Array2<f64>,
    family: &dyn Family,
    link: &dyn Link,
    config: &IRLSConfig,
) -> Result<IRLSResult> {
    fit_glm_full(y, x, family, link, config, None, None)
}

Code	Explanation
`pub fn fit_glm(...)`	Public function
`y: &Array1<f64>`	Borrow a 1D array
`x: &Array2<f64>`	Borrow a 2D array (matrix)
`family: &dyn Family`	Borrow a trait object (any type implementing Family)
`-> Result<IRLSResult>`	Returns `Ok(IRLSResult)` on success, `Err(...)` on failure
`None, None`	Pass `None` for optional parameters

Input Validation¶

pub fn fit_glm_full(
    y: &Array1<f64>,
    x: &Array2<f64>,
    family: &dyn Family,
    link: &dyn Link,
    config: &IRLSConfig,
    offset: Option<&Array1<f64>>,
    weights: Option<&Array1<f64>>,
) -> Result<IRLSResult> {
    let n = y.len();
    let p = x.ncols();

    if x.nrows() != n {
        return Err(RustyStatsError::DimensionMismatch(format!(
            "X has {} rows but y has {} elements",
            x.nrows(),
            n
        )));
    }

Code	Explanation
`offset: Option<&Array1<f64>>`	Optional borrowed array
`let n = y.len()`	Get array length
`let p = x.ncols()`	Get number of columns
`x.nrows()`	Get number of rows
`return Err(...)`	Early return with error
`format!("...", x.nrows(), n)`	String formatting macro (like Python f-strings)

Handling Optional Parameters¶

    let offset_vec = match offset {
        Some(o) => {
            if o.len() != n {
                return Err(RustyStatsError::DimensionMismatch(...));
            }
            o.clone()
        }
        None => Array1::zeros(n),
    };

Code	Explanation
`match offset { ... }`	Pattern match on the Option
`Some(o) => { ... }`	If Some, bind inner value to `o`, execute block
`o.clone()`	Clone the borrowed array to get an owned copy
`None => Array1::zeros(n)`	If None, create zero array of length n

The IRLS Loop¶

    let mut converged = false;
    let mut iteration = 0;

    while iteration < config.max_iterations {
        iteration += 1;

        // Compute variance and link derivative
        let variance = family.variance(&mu);
        let link_deriv = link.derivative(&mu);

Code	Explanation
`let mut converged = false`	Mutable boolean, initially false
`while condition { ... }`	While loop
`iteration += 1`	Increment (shorthand for `iteration = iteration + 1`)
`family.variance(&mu)`	Call trait method, passing borrow of mu

Parallel Iteration with Rayon¶

        let results: Vec<(f64, f64, f64)> = (0..n)
            .into_par_iter()
            .map(|i| {
                let v = variance[i];
                let d = link_deriv[i];
                let iw = (1.0 / (v * d * d)).max(min_weight).min(1e10);
                let cw = prior_weights_vec[i] * iw;
                let wr = (eta[i] - offset_vec[i]) + (y[i] - mu[i]) * d;
                (iw, cw, wr)
            })
            .collect();

Code	Explanation
`Vec<(f64, f64, f64)>`	Vector of tuples, each containing 3 floats
`(0..n)`	Range from 0 to n-1
`.into_par_iter()`	Convert to parallel iterator (Rayon)
`.map(\\|i\\| { ... })`	Transform each index in parallel
`variance[i]`	Index into array
`(1.0 / (v * d * d))`	Arithmetic expression
`.max(min_weight)`	Take maximum of value and min_weight
`.min(1e10)`	Take minimum (clamp to upper bound)
`(iw, cw, wr)`	Return a tuple
`.collect()`	Gather parallel results into Vec

Building Arrays from Vectors¶

        let mut irls_weights_vec = Vec::with_capacity(n);
        let mut combined_weights_vec = Vec::with_capacity(n);
        let mut working_response_vec = Vec::with_capacity(n);

        for (iw, cw, wr) in results {
            irls_weights_vec.push(iw);
            combined_weights_vec.push(cw);
            working_response_vec.push(wr);
        }

        let irls_weights = Array1::from_vec(irls_weights_vec);

Code	Explanation
`Vec::with_capacity(n)`	Create Vec with pre-allocated capacity (performance)
`for (iw, cw, wr) in results`	Destructure tuple in for loop
`.push(iw)`	Append to vector
`Array1::from_vec(...)`	Convert Vec to ndarray Array1

Matrix Operations¶

        let eta_base = x.dot(&new_coefficients);
        eta = &eta_base + &offset_vec;
        mu = link.inverse(&eta);

Code	Explanation
`x.dot(&new_coefficients)`	Matrix-vector multiplication
`&eta_base + &offset_vec`	Element-wise addition of array references
`link.inverse(&eta)`	Apply inverse link function

Step-Halving for Stability¶

When deviance increases (step too aggressive), we halve the step size:

        // Step-halving: if deviance increased, try smaller steps
        if deviance_new > deviance_old * 1.0001 && iteration > 1 {
            let mut step_size = 0.5;
            for _half_step in 0..4 {
                // Blend: eta = (1-step)*old + step*new
                let eta_blend: Array1<f64> = eta_old.iter()
                    .zip(eta_new.iter())
                    .map(|(&old, &new)| (1.0 - step_size) * old + step_size * new)
                    .collect();

                let mu_blend = link.inverse(&eta_blend);
                let dev_blend = family.deviance(y, &mu_blend, weights);

                if dev_blend <= deviance_old * 1.0001 {
                    break;  // Accept this step
                }
                step_size *= 0.5;
            }
        }

Code	Explanation
`deviance_new > deviance_old * 1.0001`	Check if deviance increased by >0.01%
`step_size *= 0.5`	Halve the step size
`(1.0 - step_size) * old + step_size * new`	Linear interpolation between old and new

Convergence Check¶

        let rel_change = if deviance_old.abs() > 1e-10 {
            (deviance_old - deviance).abs() / deviance_old.abs()
        } else {
            (deviance_old - deviance).abs()
        };

        if rel_change < config.tolerance {
            converged = true;
            break;
        }

Code	Explanation
`if ... { ... } else { ... }`	If-else expression (returns value)
`.abs()`	Absolute value
`break`	Exit the loop

Returning the Result¶

    Ok(IRLSResult {
        coefficients: final_coefficients,
        fitted_values: mu,
        linear_predictor: eta,
        deviance,
        iterations: iteration,
        converged,
        covariance_unscaled: cov_unscaled,
        irls_weights: final_weights,
        prior_weights: prior_weights_vec,
        offset: offset_vec,
        y: y.to_owned(),
        family_name: family.name().to_string(),
        penalty: Penalty::None,
        design_matrix: None,
    })
}

Code	Explanation
`Ok(IRLSResult { ... })`	Return success with struct
`coefficients: final_coefficients`	Field name: value
`deviance,`	Shorthand when field name equals variable name
`y.to_owned()`	Convert borrowed slice to owned data
`.to_string()`	Convert &str to owned String
`Penalty::None`	Enum variant
`None`	Option::None

Part 7: Tests¶