econometron.estimation.regression
ols_estimator Function
Overview
The ols_estimator function implements Ordinary Least Squares (OLS) regression for estimating parameters of a linear model. It supports multivariate regression, allowing multiple dependent variables to be regressed on a set of independent variables.
The function computes:
- Parameter estimates (beta)
- Fitted values (fitted)
- Residuals (resid)
- Diagnostic statistics (res), including R-squared, standard errors, z-values, p-values, and log-likelihood.
It automatically handles intercept inclusion, works with both NumPy arrays and pandas DataFrames, and is suitable for econometric and statistical applications.
Linear Regression Model
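The OLS estimator fits a model:
$$Y = X\beta + \varepsilon$$
Where:
- $Y$: Dependent variable matrix (T × K, T = observations, K = dependent variables)
- $X$: Independent variable matrix (T × M, M = regressors including intercept if added)
- $\beta$: Regression coefficients (M × K)
- $\varepsilon$: Normally distributed error term with covariance $\Sigma$ (K × K)
OLS minimizes the sum of squared residuals:
$$\hat{\beta} = \arg\min_{\beta} \, \mathrm{tr}\left[(Y - X\beta)^\top (Y - X\beta)\right]$$
The solution is:
$$\hat{\beta} = (X^\top X)^{-1} X^\top Y$$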
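The closed-form solution can be checked numerically. The following is a minimal sketch that assumes only NumPy; it does not call ols_estimator and is not the library's implementation. The data, seed, and variable names are illustrative. It compares the normal-equations solution with the one returned by np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, K = 200, 3, 2

# Design matrix with an explicit intercept column (first column of ones).
X = np.column_stack([np.ones(T), rng.standard_normal((T, M - 1))])
true_beta = rng.standard_normal((M, K))
Y = X @ true_beta + 0.1 * rng.standard_normal((T, K))

# Closed-form normal-equations solution: solve (X'X) beta = X'Y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver, which avoids forming X'X explicitly.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the minimizer, residuals are orthogonal to every column of X.
resid = Y - X @ beta_lstsq
print("Max difference between solutions:", np.max(np.abs(beta_normal - beta_lstsq)))
print("Max |X' resid|:", np.max(np.abs(X.T @ resid)))
```

When X has full column rank the two routes agree up to floating-point error; the least-squares solver is generally preferred because forming X'X squares the condition number.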
Function Definition
from econometron.regression import ols_estimator
beta, fitted, resid, res = ols_estimator(X, Y, add_intercept=None, tol=1e-6)
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray or pd.DataFrame | Independent variables (T × M) | None |
| Y | np.ndarray or pd.DataFrame | Dependent variables (T × K) | None |
| add_intercept | bool or None | If True, adds an intercept. If None, adds one only if X is not mean-centered. | None |
| tol | float | Tolerance for checking mean-centering (used if add_intercept=None) | 1e-6 |
Returns
- beta (np.ndarray): Estimated coefficients (M × K)
- fitted (np.ndarray): Fitted values (T × K)
- resid (np.ndarray): Residuals (T × K)
- res (dict): Diagnostics
  - resid: Residuals
  - se: Standard errors of coefficients (M × K)
  - z_values: Z-statistics (M × K)
  - p_values: P-values (M × K)
  - R2: Overall R-squared
  - R2_per_var: R-squared per dependent variable (K)
  - log_likelihood: Model log-likelihood
Function Details
Purpose: Performs OLS regression, returning coefficients and diagnostics for model evaluation. Handles numerical issues robustly using np.linalg.lstsq and pinv for singular matrices.
Key Steps:
Input Validation
- Checks that X and Y are not empty.
- Confirms matching observation counts (T).
- Converts DataFrames to NumPy arrays.
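The validation steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: validate_inputs is a hypothetical helper name, the error messages are invented, and promoting 1-D inputs to column vectors is an assumption not stated in the description. np.asarray accepts pandas DataFrames as well as arrays and lists.

```python
import numpy as np

def validate_inputs(X, Y):
    """Hypothetical helper mirroring the validation steps described above."""
    # np.asarray converts DataFrames, lists, and arrays to NumPy arrays.
    X_arr = np.asarray(X, dtype=float)
    Y_arr = np.asarray(Y, dtype=float)

    # Reject empty inputs before doing any linear algebra.
    if X_arr.size == 0 or Y_arr.size == 0:
        raise ValueError("X and Y must not be empty.")

    # Assumption: treat 1-D inputs as a single column.
    if X_arr.ndim == 1:
        X_arr = X_arr[:, None]
    if Y_arr.ndim == 1:
        Y_arr = Y_arr[:, None]

    # Both matrices must have the same number of observations T.
    if X_arr.shape[0] != Y_arr.shape[0]:
        raise ValueError(
            f"X has {X_arr.shape[0]} rows but Y has {Y_arr.shape[0]} rows."
        )
    return X_arr, Y_arr

X_ok, Y_ok = validate_inputs([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0, 2.0, 3.0])
print(X_ok.shape, Y_ok.shape)
```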
Intercept Handling
- add_intercept=None: Checks column means against tol to decide on intercept.
- add_intercept=True: Adds a column of ones.
- add_intercept=False: Uses X as provided.
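The three intercept modes can be sketched as below. This is a hedged illustration: maybe_add_intercept is a hypothetical helper, and two details are assumptions rather than documented behavior: X counts as mean-centered only when every column mean is within tol of zero, and the column of ones is placed first.

```python
import numpy as np

def maybe_add_intercept(X, add_intercept=None, tol=1e-6):
    """Hypothetical helper illustrating the intercept rule described above."""
    X = np.asarray(X, dtype=float)
    if add_intercept is None:
        # Assumption: add an intercept if any column mean exceeds tol in magnitude.
        add_intercept = bool(np.any(np.abs(X.mean(axis=0)) > tol))
    if add_intercept:
        # Assumption: the column of ones goes first.
        X = np.column_stack([np.ones(X.shape[0]), X])
    return X, add_intercept

rng = np.random.default_rng(0)
raw = rng.standard_normal((50, 2)) + 3.0      # column means near 3
centered = raw - raw.mean(axis=0)             # column means near 0

X_auto_raw, added_raw = maybe_add_intercept(raw)
X_auto_centered, added_centered = maybe_add_intercept(centered)
X_forced_off, added_off = maybe_add_intercept(raw, add_intercept=False)
print(added_raw, added_centered, added_off)
```

With centered regressors the intercept estimate would equal the mean of Y, so skipping it is harmless only if Y is also centered; passing add_intercept explicitly avoids relying on the automatic check.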
OLS Estimation
- Computes the coefficient estimates using np.linalg.lstsq.
- Calculates fitted values and residuals.
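To illustrate why a least-squares solver and a pseudo-inverse help with singular designs, the sketch below fits a rank-deficient X using only NumPy. How ols_estimator combines the two internally is not shown in this document, so this is an independent illustration; the data and the explicit rcond cutoff are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
x1 = rng.standard_normal(T)

# The third column duplicates the second, so X has rank 2 rather than 3
# and X'X cannot be inverted.
X = np.column_stack([np.ones(T), x1, x1])
Y = 2.0 + 3.0 * x1[:, None] + 0.05 * rng.standard_normal((T, 1))

# lstsq returns the minimum-norm least-squares solution and the detected rank.
beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=1e-10)

# The Moore-Penrose pseudo-inverse yields the same minimum-norm solution.
beta_pinv = np.linalg.pinv(X, rcond=1e-10) @ Y

# Fitted values and residuals remain well defined despite the collinearity.
fitted = X @ beta
resid = Y - fitted
print("Detected rank:", rank)
print("Coefficients:\n", beta)
```

The minimum-norm solution splits the slope evenly across the two identical columns, so individual coefficients are not identified even though the fitted values are.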
Diagnostics
- Residual Sum of Squares (RSS) and Total Sum of Squares (TSS)
- Overall and per-variable R-squared
- Error variance per variable
- Standard errors, z-values, p-values (t-distribution for small T, normal otherwise)
- Log-likelihood assuming multivariate normal errors (adds for stability)
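The diagnostics listed above can be sketched with standard textbook formulas. Everything in this block is an assumption about how such quantities are commonly computed, not a transcription of the library: ols_diagnostics is a hypothetical helper, the error variance uses a T − M degrees-of-freedom correction, overall R-squared pools RSS and TSS across variables, p-values use the normal approximation only, and the log-likelihood is evaluated at the maximum-likelihood covariance estimate without any stabilizing term.

```python
import math
import numpy as np

def ols_diagnostics(X, Y, beta):
    """Hypothetical helper; textbook formulas, not confirmed library code."""
    T, M = X.shape
    K = Y.shape[1]
    resid = Y - X @ beta

    # Sums of squares per dependent variable (each of length K).
    rss = np.sum(resid ** 2, axis=0)
    tss = np.sum((Y - Y.mean(axis=0)) ** 2, axis=0)
    r2_per_var = 1.0 - rss / tss
    r2_overall = 1.0 - rss.sum() / tss.sum()

    # Error variance per variable with a degrees-of-freedom correction (assumed).
    sigma2 = rss / (T - M)

    # se[m, k] = sqrt(sigma2[k] * [(X'X)^+]_{mm}); pinv tolerates a singular X'X.
    xtx_inv_diag = np.diag(np.linalg.pinv(X.T @ X))
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))

    z = beta / se
    # Two-sided p-values under the normal approximation: p = erfc(|z| / sqrt(2)).
    p = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

    # Gaussian log-likelihood at the ML covariance estimate resid'resid / T.
    sigma_ml = resid.T @ resid / T
    _, logdet = np.linalg.slogdet(sigma_ml)
    loglik = -0.5 * T * (K * math.log(2.0 * math.pi) + logdet + K)

    return {"se": se, "z_values": z, "p_values": p, "R2": r2_overall,
            "R2_per_var": r2_per_var, "log_likelihood": loglik}

rng = np.random.default_rng(2)
T = 120
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
true_beta = np.array([[0.5, -1.0], [1.0, 0.5], [-0.7, 1.2]])
Y = X @ true_beta + 0.1 * rng.standard_normal((T, 2))
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
diag = ols_diagnostics(X, Y, beta_hat)
print("R2 per variable:", diag["R2_per_var"])
```

For small samples a t-distribution with T − M degrees of freedom would replace the normal approximation; that variant needs a statistics library such as SciPy and is omitted here.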
Usage Example
import numpy as np
import pandas as pd
from econometron.regression import ols_estimator
# Synthetic data
np.random.seed(42)
T, M, K = 100, 2, 3
X = np.random.randn(T, M)
true_beta = np.array([[1, 0.5, -0.2], [0.3, -0.7, 1.0]])
Y = X @ true_beta + np.random.randn(T, K) * 0.1
# Run OLS
beta, fitted, resid, res = ols_estimator(X, Y, add_intercept=True)
print("Estimated Coefficients:\n", beta)
print("R-squared (overall):", res['R2'])
print("R-squared (per variable):", res['R2_per_var'])
print("Standard Errors:\n", res['se'])
print("P-values:\n", res['p_values'])
print("Log-Likelihood:", res['log_likelihood'])
# Using pandas DataFrame
X_df = pd.DataFrame(X, columns=['x1', 'x2'])
Y_df = pd.DataFrame(Y, columns=['y1', 'y2', 'y3'])
beta_df, fitted_df, resid_df, res_df = ols_estimator(X_df, Y_df, add_intercept=None)
print("DataFrame Results - R-squared:", res_df['R2'])
Notes
- Intercept Handling: Automatic if add_intercept=None, otherwise user-controlled.
- Numerical Stability: Uses np.linalg.lstsq and pinv for singular matrices.
- Diagnostics: Comprehensive statistics for hypothesis testing.
- Flexibility: Supports NumPy arrays and pandas DataFrames.
- P-values: Uses t-distribution for small samples; normal distribution otherwise.
