econometron.utils.data_preparation.process_timeseries
TransformTS class from econometron.utils.data_preparation.process_timeseries
Overview
TransformTS is a comprehensive time series transformation and analysis utility. It provides methods for:
- Transforming series to achieve stationarity via differencing, log-differencing, Box-Cox, or Hodrick-Prescott filtering.
- Performing stationarity checks using the Augmented Dickey-Fuller (ADF) test.
- Applying inverse transformations to recover original scale data.
- Conducting exploratory analysis, including summary statistics, correlation matrices, and optional ACF/PACF plots.
This class is particularly useful for preprocessing time series before fitting models such as ARIMA, VAR, or state-space models, so that the data is suitable for estimation or filtering.
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| data | Union[pd.DataFrame, pd.Series] | Input time series data. | required |
| columns | Optional[List[str]] | List of columns to transform. If None, all numeric columns are selected. | None |
| method | str | Transformation method: 'diff', 'boxcox', 'log', 'log-diff', 'hp', 'inverse'. | 'diff' |
| demean | bool | If True, remove mean before transformation. | True |
| analysis | bool | If True, perform time series analysis (ADF test, correlation, summary). | True |
| plot | bool | If True, generate diagnostic plots (time series, ACF, PACF). | False |
| lamb | float | Lambda parameter for Hodrick-Prescott filter. | 1600 |
| log_data | bool | If True, apply log transformation for 'log' or 'log-diff' methods when data is not already in log form. | True |
| max_diff | int | Maximum differencing order before switching to log-diff for non-stationary series. | 2 |
Methods
_validate_inputs()
Validates the input data and parameters, checks for numeric columns, ensures method validity, and warns if NaNs are present.
_check_stationarity(series, col) -> bool
Runs the ADF test on a single series. Returns True if the p-value is below 0.05, meaning the unit-root null hypothesis is rejected at the 5% level.
_check_stationarity_all()
Checks stationarity of all selected columns and stores results in self.stationary_status.
_check_if_log(series) -> bool
Heuristically determines if a series is likely in log form (positive values and reasonable range).
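A heuristic of this kind can be sketched as follows. This is only an illustration of the "positive values and reasonable range" idea: the function name `probably_in_logs` and the cutoff `max_level=20.0` are assumptions made for this example, not the thresholds TransformTS actually uses.

```python
import pandas as pd


def probably_in_logs(series: pd.Series, max_level: float = 20.0) -> bool:
    """Guess whether a series is already log-transformed.

    Assumed rule (hypothetical): every value is positive and the largest
    value stays below max_level, since exp(20) is already about 4.9e8.
    """
    values = series.dropna()
    return bool((values > 0).all() and values.max() < max_level)
```

A level series with small positive values passes the same check, which is why a rule like this can only be a hint and not a guarantee.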
_make_stationary(series, col) -> pd.Series
Applies differencing until the series is stationary, or switches to log-differencing once further differencing would exceed max_diff, to avoid over-differencing.
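The check-then-difference loop can be sketched with NumPy and pandas alone. The sketch below swaps the ADF test for a plain Dickey-Fuller regression (no lagged differences) compared against the asymptotic 5% critical value for a constant-only regression, and it simply stops at `max_diff` instead of falling back to log-differencing. The names `df_tstat`, `looks_stationary` and `difference_until_stationary` are hypothetical and not part of TransformTS.

```python
import numpy as np
import pandas as pd

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic
# for a regression with a constant and no trend.
DF_CRIT_5PCT = -2.86


def df_tstat(series: pd.Series) -> float:
    """t-statistic of rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y = series.dropna().to_numpy(dtype=float)
    dy = np.diff(y)                                   # left-hand side
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return float(beta[1] / np.sqrt(cov[1, 1]))


def looks_stationary(series: pd.Series) -> bool:
    """Reject the unit-root null when the statistic is below the 5% value."""
    return df_tstat(series) < DF_CRIT_5PCT


def difference_until_stationary(series: pd.Series, max_diff: int = 2):
    """Difference repeatedly; stop when the check passes or max_diff is hit."""
    current, order = series, 0
    while order < max_diff and not looks_stationary(current):
        current = current.diff().dropna()
        order += 1
    return current, order
```

The function returns both the differenced series and the order that was applied, which is the kind of information needed later to report the differencing order or to invert the transformation.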
transform() -> pd.DataFrame
Applies the specified transformation method to all selected columns:
- 'diff': Differencing until stationarity.
- 'boxcox': Box-Cox transformation (requires positive values).
- 'log': Log transformation.
- 'log-diff': Log followed by differencing.
- 'hp': Hodrick-Prescott filter (extracts cyclical component).
- 'inverse': Reverts transformed series to original scale.
Returns the transformed DataFrame.
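To make the 'hp' option concrete, here is a minimal NumPy sketch of the Hodrick-Prescott filter under its textbook definition: the trend minimizes the squared deviation from the data plus `lamb` times the squared second differences of the trend. The function name `hp_cycle` is hypothetical, and this dense linear solve is an illustration, not how TransformTS computes the filter internally.

```python
import numpy as np


def hp_cycle(y, lamb: float = 1600.0):
    """Return (cycle, trend) of the Hodrick-Prescott decomposition.

    The trend solves (I + lamb * D'D) trend = y, where D is the
    second-difference operator with rows [1, -2, 1].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)               # shape (n - 2, n)
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return y - trend, trend
```

A perfectly linear series has zero second differences, so the filter returns it unchanged as trend and the cyclical component is zero. The value 1600 is the conventional choice for quarterly data; larger values give a smoother trend.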
_inverse_transform(series, col) -> pd.Series
Applies the inverse transformation that matches the method used:
- Cumulative sum for differencing.
- Exponential for log transforms, or the inverse power formula for Box-Cox.
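The inverse operations listed above can be sketched as round trips. The helper names (`invert_diff`, `invert_log_diff`, `boxcox`, `inv_boxcox`) are hypothetical, and the Box-Cox lambda is passed in explicitly here instead of being estimated. Note that undoing a difference requires the first original value, which a transformer has to store at transform time.

```python
import numpy as np
import pandas as pd


def invert_diff(diffed: pd.Series, first_value: float) -> pd.Series:
    """Undo first differencing with a cumulative sum anchored at first_value."""
    restored = diffed.copy()
    restored.iloc[0] = first_value      # the slot .diff() left as NaN
    return restored.cumsum()


def invert_log_diff(log_diffed: pd.Series, first_value: float) -> pd.Series:
    """Undo log-differencing: rebuild the log levels, then exponentiate."""
    return np.exp(invert_diff(log_diffed, np.log(first_value)))


def boxcox(x, lmbda: float):
    """Box-Cox transform for positive x with a given lambda."""
    return np.log(x) if lmbda == 0 else (x ** lmbda - 1.0) / lmbda


def inv_boxcox(z, lmbda: float):
    """Inverse Box-Cox: exponential when lambda is 0, power formula otherwise."""
    return np.exp(z) if lmbda == 0 else (lmbda * z + 1.0) ** (1.0 / lmbda)


x = pd.Series([1.2, 2.3, 3.5, 4.1, 5.2, 6.8])
print(np.allclose(invert_diff(x.diff(), x.iloc[0]), x))
print(np.allclose(invert_log_diff(np.log(x).diff(), x.iloc[0]), x))
print(np.allclose(inv_boxcox(boxcox(x, 0.5), 0.5), x))
```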
analyze()
Performs exploratory time series analysis, including:
- Stationarity reporting (ADF test results).
- Summary statistics.
- NaN counts.
- Correlation matrix (if multiple columns).
- Optional plots (time series, ACF, PACF).
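The non-graphical parts of such a report map onto standard pandas calls. The sketch below assumes a hypothetical helper named `quick_analysis`; it leaves out the ADF results and the plots, and it adds a lag-1 autocorrelation as a rough numeric stand-in for the ACF plot.

```python
import pandas as pd


def quick_analysis(df: pd.DataFrame) -> dict:
    """Collect summary statistics, NaN counts and correlations for df."""
    report = {
        "summary": df.describe(),                      # count, mean, std, quantiles
        "nan_counts": df.isna().sum(),                 # missing values per column
        "lag1_autocorr": df.apply(lambda s: s.autocorr(lag=1)),
    }
    if df.shape[1] > 1:                                # correlation needs 2+ columns
        report["correlation"] = df.corr()
    return report


data = pd.DataFrame({
    "y": [1.2, 2.3, 3.5, 4.1, 5.2, 6.8],
    "x": [2.1, 2.9, 3.0, 3.5, 4.0, 4.8],
})
report = quick_analysis(data)
print(report["nan_counts"])
print(report["correlation"])
```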
get_transformed_data() -> pd.DataFrame
Returns the transformed data, dropping NaNs.
trns_info() -> dict
Provides detailed transformation and stationarity info for each column, including:
- Transformation method.
- Differencing order.
- Stationarity status and ADF statistics.
- Log transformation and Box-Cox lambda info.
- Original series stationarity.
- Additional notes depending on method applied.
Example Usage
import pandas as pd
from econometron.utils.data_preparation.process_timeseries import TransformTS
# Sample time series
data = pd.DataFrame({
'y': [1.2, 2.3, 3.5, 4.1, 5.2, 6.8],
'x': [2.1, 2.9, 3.0, 3.5, 4.0, 4.8]
})
# Create transformer
ts_transformer = TransformTS(data, method='log-diff', analysis=True, plot=True)
# Access transformed data
transformed = ts_transformer.get_transformed_data()
# View transformation info
info = ts_transformer.trns_info()
print(info)
Notes
- Automatically detects non-stationary series and applies appropriate transformations.
- Handles non-positive values gracefully for log/Box-Cox methods.
- Useful as a preprocessing step for econometric and machine learning time series models.
- Stores all relevant information for reporting, diagnostics, and inverse transformations.
