
econometron.estimation.regression

  • ols_estimator Function

Overview

The ols_estimator function implements Ordinary Least Squares (OLS) regression for estimating parameters of a linear model. It supports multivariate regression, allowing multiple dependent variables to be regressed on a set of independent variables.

The function computes:

  • Parameter estimates (beta)
  • Fitted values (fitted)
  • Residuals (resid)
  • Diagnostic statistics (res) including R-squared, standard errors, z-values, p-values, and log-likelihood.

It automatically handles intercept inclusion, works with both NumPy arrays and pandas DataFrames, and is suitable for econometric and statistical applications.

Linear Regression Model

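The OLS estimator fits a model:

$$Y = X\beta + \varepsilon$$

Where:

  • $Y$: Dependent variable matrix (T × K, T = observations, K = dependent variables)
  • $X$: Independent variable matrix (T × M, M = regressors including intercept if added)
  • $\beta$: Regression coefficients (M × K)
  • $\varepsilon$: Normally distributed error term with covariance $\Sigma$ (K × K)

OLS minimizes the sum of squared residuals:

$$\min_{\beta} \; \operatorname{tr}\left[(Y - X\beta)^\top (Y - X\beta)\right]$$

The solution is:

$$\hat{\beta} = (X^\top X)^{-1} X^\top Y$$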
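As a quick check of the closed-form solution, the sketch below solves the normal equations directly and compares the result with a least-squares solver. The data, shapes, and variable names are made up for illustration, and the snippet uses only NumPy rather than econometron.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, K = 200, 3, 2  # observations, regressors, dependent variables (illustrative)

X = rng.standard_normal((T, M))
beta_true = rng.standard_normal((M, K))
Y = X @ beta_true + 0.05 * rng.standard_normal((T, K))

# Closed form: solve the normal equations (X'X) beta = X'Y
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Same estimate from a least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the optimum the residuals are orthogonal to every regressor
resid = Y - X @ beta_normal

print("solvers agree:", np.allclose(beta_normal, beta_lstsq))
print("X'resid is zero:", np.allclose(X.T @ resid, 0.0, atol=1e-8))
```

Solving the normal equations is fine for well-conditioned X; a least-squares solver is generally the safer choice when columns are nearly collinear.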


Function Definition

```python
from econometron.regression import ols_estimator

beta, fitted, resid, res = ols_estimator(X, Y, add_intercept=None, tol=1e-6)
```

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | np.ndarray or pd.DataFrame | Independent variables (T × M) | None |
| Y | np.ndarray or pd.DataFrame | Dependent variables (T × K) | None |
| add_intercept | bool or None | If True, adds an intercept. If False, uses X as provided. If None, adds an intercept only if X is not mean-centered. | None |
| tol | float | Tolerance for checking mean-centering (used if add_intercept=None) | 1e-6 |
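The interaction between add_intercept and tol can be pictured with a small sketch. The helper `needs_intercept` below is hypothetical and only mirrors the behaviour described in the table; the library's actual check may differ, and the position of the ones column (first here) is also an assumption.

```python
import numpy as np

def needs_intercept(X, add_intercept=None, tol=1e-6):
    """Hypothetical helper: decide whether a column of ones should be added."""
    if add_intercept is not None:
        return bool(add_intercept)  # explicit user choice wins
    # Auto mode: X counts as mean-centered only if every column mean is within tol of 0
    return not np.all(np.abs(X.mean(axis=0)) < tol)

rng = np.random.default_rng(1)
X_raw = rng.standard_normal((50, 2)) + 3.0   # column means near 3
X_centered = X_raw - X_raw.mean(axis=0)      # column means near 0

print(needs_intercept(X_raw))                          # auto mode, uncentered data
print(needs_intercept(X_centered))                     # auto mode, centered data
print(needs_intercept(X_centered, add_intercept=True)) # forced on

# Building the design matrix once the decision is made (ones column placed first
# here purely for illustration)
X_design = np.column_stack([np.ones(len(X_raw)), X_raw])
print(X_design.shape)
```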

Returns

  • beta (np.ndarray): Estimated coefficients (M × K)

  • fitted (np.ndarray): Fitted values (T × K)

  • resid (np.ndarray): Residuals (T × K)

  • res (dict): Diagnostics

    • resid: Residuals
    • se: Standard errors of coefficients (M × K)
    • z_values: Z-statistics (M × K)
    • p_values: P-values (M × K)
    • R2: Overall R-squared
    • R2_per_var: R-squared per dependent variable (K)
    • log_likelihood: Model log-likelihood

Function Details

Purpose: Performs OLS regression, returning coefficients and diagnostics for model evaluation. Handles numerical issues robustly using np.linalg.lstsq and pinv for singular matrices.

Key Steps:

  1. Input Validation

    • Checks that X and Y are not empty.
    • Confirms matching observation counts (T).
    • Converts DataFrames to NumPy arrays.
  2. Intercept Handling

    • add_intercept=None: Checks column means against tol to decide on intercept.
    • add_intercept=True: Adds a column of ones.
    • add_intercept=False: Uses X as provided.
  3. OLS Estimation

    • Computes the coefficient estimates (beta) using np.linalg.lstsq.
    • Calculates fitted values and residuals.
  4. Diagnostics

    • Residual Sum of Squares (RSS) and Total Sum of Squares (TSS)
    • Overall and per-variable R-squared
    • Error variance per variable
    • Standard errors, z-values, p-values (t-distribution for small T, normal otherwise)
    • Log-likelihood assuming multivariate normal errors (adds a small term for numerical stability)
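The estimation and diagnostic steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `ols_sketch` is hypothetical, the overall R-squared is taken as one minus pooled RSS over pooled TSS, the error variance uses a T − M degrees-of-freedom correction, p-values use the normal approximation only, and the log-likelihood is evaluated at the maximum-likelihood covariance estimate without any stabilizing term.

```python
import math
import numpy as np

def ols_sketch(X, Y):
    """Illustrative OLS with diagnostics (hypothetical, not econometron's code)."""
    T, M = X.shape
    K = Y.shape[1]

    # Step 3: coefficients, fitted values, residuals
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ beta
    resid = Y - fitted

    # Step 4: sums of squares and R-squared (per variable and pooled)
    rss = (resid ** 2).sum(axis=0)
    tss = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    r2_per_var = 1.0 - rss / tss
    r2 = 1.0 - rss.sum() / tss.sum()

    # Error variance per dependent variable (assumed T - M correction)
    sigma2 = rss / (T - M)

    # se[m, k] = sqrt(sigma2[k] * diag((X'X)^+)[m]); pinv tolerates a singular X'X
    xtx_inv_diag = np.diag(np.linalg.pinv(X.T @ X))
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))

    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

    # Gaussian log-likelihood at the ML covariance estimate resid'resid / T
    sigma_ml = resid.T @ resid / T
    _, logdet = np.linalg.slogdet(sigma_ml)
    loglik = -0.5 * T * (K * math.log(2.0 * math.pi) + logdet + K)

    res = {"resid": resid, "se": se, "z_values": z, "p_values": p,
           "R2": r2, "R2_per_var": r2_per_var, "log_likelihood": loglik}
    return beta, fitted, resid, res

# Synthetic data with an explicit intercept column
rng = np.random.default_rng(42)
T = 100
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
true_beta = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, -0.2], [0.3, -0.7, 1.0]])
Y = X @ true_beta + 0.1 * rng.standard_normal((T, 3))

beta, fitted, resid, res = ols_sketch(X, Y)
print("R2 per variable:", res["R2_per_var"].round(3))
print("Standard errors:\n", res["se"].round(4))
```

Each column of Y is effectively its own regression on the same X, which is why the standard errors factor into a regressor-specific term and a per-variable error variance.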

Usage Example

```python
import numpy as np
import pandas as pd
from econometron.regression import ols_estimator

# Synthetic data
np.random.seed(42)
T, M, K = 100, 2, 3
X = np.random.randn(T, M)
true_beta = np.array([[1, 0.5, -0.2], [0.3, -0.7, 1.0]])
Y = X @ true_beta + np.random.randn(T, K) * 0.1

# Run OLS
beta, fitted, resid, res = ols_estimator(X, Y, add_intercept=True)

print("Estimated Coefficients:\n", beta)
print("R-squared (overall):", res['R2'])
print("R-squared (per variable):", res['R2_per_var'])
print("Standard Errors:\n", res['se'])
print("P-values:\n", res['p_values'])
print("Log-Likelihood:", res['log_likelihood'])

# Using pandas DataFrame
X_df = pd.DataFrame(X, columns=['x1', 'x2'])
Y_df = pd.DataFrame(Y, columns=['y1', 'y2', 'y3'])
beta_df, fitted_df, resid_df, res_df = ols_estimator(X_df, Y_df, add_intercept=None)
print("DataFrame Results - R-squared:", res_df['R2'])
```

Notes

  • Intercept Handling: Automatic if add_intercept=None, otherwise user-controlled.
  • Numerical Stability: Uses np.linalg.lstsq and pinv for singular matrices.
  • Diagnostics: Comprehensive statistics for hypothesis testing.
  • Flexibility: Supports NumPy arrays and pandas DataFrames.
  • P-values: Uses t-distribution for small samples; normal distribution otherwise.
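To illustrate the numerical-stability note, the sketch below builds a deliberately rank-deficient design matrix and shows that np.linalg.lstsq and np.linalg.pinv still return a usable (minimum-norm) solution where inverting X'X would fail. The data and the explicit rcond cutoff of 1e-10 are illustrative choices, and the snippet does not call econometron.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 60
x1 = rng.standard_normal(T)

# The third column duplicates the second, so X'X is singular
X = np.column_stack([np.ones(T), x1, x1])
Y = (2.0 + 3.0 * x1 + 0.1 * rng.standard_normal(T)).reshape(-1, 1)

# lstsq reports the effective rank and returns the minimum-norm solution
beta_lstsq, _, rank, _ = np.linalg.lstsq(X, Y, rcond=1e-10)

# The pseudoinverse gives the same minimum-norm coefficients
beta_pinv = np.linalg.pinv(X, rcond=1e-10) @ Y

print("rank of X:", rank)
print("lstsq and pinv agree:", np.allclose(beta_lstsq, beta_pinv))
print("coefficients:", beta_lstsq.ravel())
```

The slope of 3 is split evenly across the two identical columns. Only their sum is identified, so individual coefficients and standard errors for collinear regressors should be read with care.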