API Documentation
Main Pipeline
- nabqr.nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]
Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data if both are not provided.
- Parameters:
n_samples (int, optional) – Number of time steps to simulate if no data is provided, by default 2000.
phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.
sigma (float, optional) – Standard deviation of noise for simulation, by default 8.
offset_start (int, optional) – Start value for offset range, by default 10.
offset_end (int, optional) – End value for offset range, by default 500.
offset_step (int, optional) – Step size for offset range, by default 15.
correlation (float, optional) – Base correlation between dimensions, by default 0.8.
data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.
training_size (float, optional) – Proportion of data to use for training, by default 0.7.
epochs (int, optional) – Number of epochs for model training, by default 20.
timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].
quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].
X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.
actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.
simulation_type (str, optional) – Type of simulation to use, by default “sde”. “sde” is the more advanced and more realistic option and uses an SDE model; “ar1” uses an AR(1) process.
visualize (bool, optional) – Whether visual elements are plotted to the screen or saved as figures, by default True.
taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.
save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.
- Returns:
A tuple containing:
- corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
- taqr_results: list of numpy.ndarray
The TAQR results.
- actuals_output: list of numpy.ndarray
The actual output values.
- BETA_output: list of numpy.ndarray
The BETA parameters.
- scores: pd.DataFrame
The scores for the predictions and original/corrected ensembles.
- Return type:
tuple
- Raises:
ValueError – If user opts not to simulate data when both X and actuals are missing.
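A minimal usage sketch, assuming the package is installed and importable as `nabqr`. The keyword values mirror the defaults in the signature above, except `visualize=False`, which is an illustrative choice to suppress plotting. `PIPELINE_KWARGS` and `run_example` are hypothetical names used only for this example. Because `X` and `actuals` are left at `None`, the function is documented to prompt before simulating data.

```python
# Keyword arguments copied from the documented signature; only `visualize`
# differs from the default (an illustrative choice, not a recommendation).
PIPELINE_KWARGS = {
    "n_samples": 2000,
    "training_size": 0.7,
    "epochs": 20,
    "timesteps": [0, 1, 2, 6, 12, 24],
    "quantiles": [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99],
    "simulation_type": "sde",
    "visualize": False,
    "taqr_limit": 5000,
    "save_files": True,
}


def run_example():
    """Run the pipeline and unpack the documented five-element tuple."""
    # Imported lazily so this sketch can be loaded without nabqr installed.
    from nabqr import run_nabqr_pipeline

    (corrected_ensembles, taqr_results, actuals_output,
     BETA_output, scores) = run_nabqr_pipeline(**PIPELINE_KWARGS)
    return scores
```

Calling `run_example()` trains the model and may take a while; the returned `scores` is the last element of the documented tuple.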
Core Functions
Neural Adaptive Basis Quantile Regression (NABQR) Core Functions
This module provides the core functionality for NABQR.
This module includes:
- Scoring metrics (Variogram, CRPS, QSS)
- Dataset creation and preprocessing
- Model definitions and training
- TAQR (Time-Adaptive Quantile Regression) implementation
- class nabqr.functions.QuantileRegressionLSTM(*args, **kwargs)[source]
Bases: Model
LSTM-based model for quantile regression. Input: x -> LSTM -> Dense -> Dense -> output
- Parameters:
n_quantiles (int) – Number of quantiles to predict
units (int) – Number of LSTM units
n_timesteps (int) – Number of time steps in input
- call(inputs, training=None)[source]
Forward pass of the model.
- Parameters:
inputs (tensorflow.Tensor) – Input tensor
training (bool, optional) – Whether in training mode, by default None
- Returns:
Model output
- Return type:
tensorflow.Tensor
- nabqr.functions.calculate_crps(actuals, corrected_ensembles)[source]
Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.
- Parameters:
actuals (numpy.ndarray) – Actual observations
corrected_ensembles (numpy.ndarray) – Ensemble forecasts
- Returns:
Mean CRPS score
- Return type:
float
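As an illustration of what a mean ensemble CRPS measures, the sketch below uses the standard ensemble estimator, CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|. It is a dependency-free approximation for intuition only, not the code that `calculate_crps` runs; the function names `crps_ensemble` and `mean_crps` are hypothetical, and the input layout (one list of members per time step) is an assumption.

```python
def crps_ensemble(observation, members):
    """CRPS of one ensemble forecast against one observation."""
    m = len(members)
    # Average distance between the ensemble members and the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Average distance between all ordered pairs of members (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


def mean_crps(actuals, ensembles):
    """Average the per-time-step CRPS; ensembles[t] holds the members at time t."""
    scores = [crps_ensemble(y, ens) for y, ens in zip(actuals, ensembles)]
    return sum(scores) / len(scores)
```

Lower values indicate a sharper and better-calibrated ensemble; a single-member ensemble reduces to the absolute error.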
- nabqr.functions.calculate_qss(actuals, taqr_results, quantiles)[source]
Calculate the Quantile Skill Score (QSS).
- Parameters:
actuals (numpy.ndarray) – Actual observations
taqr_results (numpy.ndarray) – TAQR ensemble forecasts
quantiles (array-like) – Quantile levels to evaluate
- Returns:
Quantile Skill Score
- Return type:
float
- nabqr.functions.calculate_scores(actuals, taqr_results, raw_ensembles, corrected_ensembles, quantiles_taqr, data_source, plot_reliability=True, visualize=True)[source]
Calculate Variogram, CRPS, QSS and MAE for the predictions and corrected ensembles.
- Parameters:
actuals (numpy.ndarray) – The actual values
taqr_results (numpy.ndarray) – The predicted values (TAQR results)
raw_ensembles (numpy.ndarray) – The raw ensembles
corrected_ensembles (numpy.ndarray) – The corrected ensembles
quantiles_taqr (list) – The quantiles to calculate the scores for
data_source (str) – The data source
- nabqr.functions.create_dataset_for_lstm(X, Y, time_steps)[source]
Create a dataset suitable for LSTM training with multiple time steps (i.e. lags).
- Parameters:
X (numpy.ndarray) – Input features
Y (numpy.ndarray) – Target values
time_steps (list) – List of time steps to include
- Returns:
(X_lstm, Y_lstm) LSTM-ready datasets
- Return type:
tuple
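The idea of building lagged inputs from a list of time steps can be sketched as follows. This is an assumption about the layout (one block of features per lag, aligned with the target at time t), not the library's implementation; `create_lagged_dataset` is a hypothetical name.

```python
def create_lagged_dataset(X, Y, time_steps):
    """Stack the rows X[t - lag] for every lag in time_steps.

    X is a list of feature rows, Y a list of targets of the same length.
    The first max(time_steps) rows are dropped because their lags
    would reach before the start of the series.
    """
    max_lag = max(time_steps)
    X_lagged, Y_aligned = [], []
    for t in range(max_lag, len(X)):
        # One sample contains the feature row at each requested lag.
        X_lagged.append([X[t - lag] for lag in time_steps])
        Y_aligned.append(Y[t])
    return X_lagged, Y_aligned
```

With `time_steps=[0, 1, 2, 6, 12, 24]`, each sample would hold six feature rows and the first 24 observations would be consumed as history.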
- nabqr.functions.legend_without_duplicate_labels(ax)[source]
Create a legend without duplicate labels. Primarily used for ensemble plots.
- Parameters:
ax (matplotlib.axes.Axes) – Axes object to create legend for
- nabqr.functions.map_range(values, input_start, input_end, output_start, output_end)[source]
Map values from one range to another.
- Parameters:
values (list) – Values to map
input_start (float) – Start of input range
input_end (float) – End of input range
output_start (float) – Start of output range
output_end (float) – End of output range
- Returns:
Mapped values
- Return type:
numpy.ndarray
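Mapping between ranges is a linear rescaling. The sketch below shows the usual formula; it returns a plain list rather than a `numpy.ndarray`, and `map_range_sketch` is a hypothetical name used to keep it distinct from the library function.

```python
def map_range_sketch(values, input_start, input_end, output_start, output_end):
    """Linearly map values from [input_start, input_end] to [output_start, output_end]."""
    # Ratio between the widths of the output and input ranges.
    scale = (output_end - output_start) / (input_end - input_start)
    return [output_start + (v - input_start) * scale for v in values]
```

For example, mapping `[0, 5, 10]` from the range 0 to 10 onto the range 0 to 1 yields values at the start, midpoint, and end of the output range.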
- nabqr.functions.multi_quantile_skill_score(y_true, y_pred, quantiles)[source]
Calculate the Quantile Skill Score (QSS) for multiple quantile forecasts.
- Parameters:
y_true (numpy.ndarray) – True observed values
y_pred (numpy.ndarray) – Predicted quantile values
quantiles (list) – Quantile levels between 0 and 1
- Returns:
QSS for each quantile forecast
- Return type:
numpy.ndarray
- nabqr.functions.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000, print_output=True)[source]
Perform one-step quantile prediction using TAQR.
This function takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.
An easy wrapper function to run TAQR.
- Parameters:
X_input (numpy.ndarray or pd.DataFrame) – Input features
Y_input (numpy.ndarray or pd.Series) – Target values
n_init (int) – Number of initial observations for warm start
n_full (int) – Total number of observations to process
quantile (float, optional) – Quantile level for prediction, by default 0.5
already_correct_size (bool, optional) – Whether input data is already correctly sized, by default False
n_in_X (int, optional) – Number of observations to include in design matrix, by default 5000
- Returns:
(predictions, actual values, coefficients)
- Return type:
tuple
- nabqr.functions.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]
Main pipeline for NABQR model training and evaluation.
The pipeline:
1. Trains an LSTM network to correct the provided ensembles
2. Runs the TAQR algorithm on the corrected ensembles to predict observations
3. Saves results and model artifacts
- Parameters:
X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data
y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations
name (str, optional) – Dataset identifier, by default “TEST”
training_size (float, optional) – Fraction of data to use for training, by default 0.8
epochs (int, optional) – Number of training epochs, by default 100
timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]
**kwargs (dict) – Additional keyword arguments
- Returns:
A tuple containing:
- corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
- taqr_results: list of numpy.ndarray
The TAQR results.
- actuals_output: list of numpy.ndarray
The actual output values.
- BETA_output: list of numpy.ndarray
The BETA parameters.
- Return type:
tuple
- nabqr.functions.quantile_loss_3(q, y_true, y_pred)[source]
Calculate quantile loss for a single quantile.
- Parameters:
q (float) – Quantile level
y_true (tensorflow.Tensor) – True values
y_pred (tensorflow.Tensor) – Predicted values
- Returns:
Quantile loss value
- Return type:
tensorflow.Tensor
- nabqr.functions.quantile_loss_func(quantiles)[source]
Create a loss function for multiple quantiles.
- Parameters:
quantiles (list) – List of quantile levels
- Returns:
Loss function for multiple quantiles
- Return type:
function
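The two loss helpers above follow the pinball (quantile) loss pattern. The sketch below shows that pattern in plain Python rather than TensorFlow. How the library combines the per-quantile losses is not stated here, so averaging them is an assumption; `pinball_loss` and `make_multi_quantile_loss` are hypothetical names.

```python
def pinball_loss(q, y_true, y_pred):
    """Mean pinball loss for quantile level q."""
    errors = [y - f for y, f in zip(y_true, y_pred)]
    # Under-prediction (e > 0) is weighted by q, over-prediction by (1 - q).
    return sum(max(q * e, (q - 1.0) * e) for e in errors) / len(errors)


def make_multi_quantile_loss(quantiles):
    """Return a loss function that averages the pinball loss over all quantiles."""
    def loss(y_true, y_pred):
        # y_pred[i] is the forecast series for quantiles[i].
        per_quantile = [pinball_loss(q, y_true, y_pred[i])
                        for i, q in enumerate(quantiles)]
        return sum(per_quantile) / len(per_quantile)
    return loss
```

The closure keeps the quantile levels fixed, which matches the usual way a multi-output loss is handed to a training loop.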
- nabqr.functions.reliability_func(quantile_forecasts, corrected_ensembles, ensembles, actuals, corrected_taqr_quantiles, data_source, plot_reliability=True)[source]
- nabqr.functions.remove_straight_line_outliers(ensembles)[source]
Remove ensemble members that are perfectly straight lines (constant slope). Explanation: Sometimes the output from the LSTM is a straight line, which is not useful for the ensemble.
- Parameters:
ensembles (numpy.ndarray) – 2D array where rows are time steps and columns are ensemble members
- Returns:
Filtered ensemble data without straight-line outliers
- Return type:
numpy.ndarray
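One way to detect a perfectly straight ensemble member is to check whether its first differences are all equal. The sketch below takes that approach on nested lists; the tolerance `tol` and the name `remove_straight_lines` are assumptions, and the library may use a different criterion.

```python
def remove_straight_lines(ensembles, tol=1e-9):
    """Drop columns whose consecutive differences are all equal (constant slope).

    ensembles is a list of rows (time steps); each row lists the ensemble members.
    """
    n_rows = len(ensembles)
    n_cols = len(ensembles[0])
    keep = []
    for j in range(n_cols):
        column = [ensembles[i][j] for i in range(n_rows)]
        diffs = [b - a for a, b in zip(column, column[1:])]
        # A straight line (including a constant) has identical differences.
        is_straight = all(abs(d - diffs[0]) <= tol for d in diffs)
        if not is_straight:
            keep.append(j)
    return [[row[j] for j in keep] for row in ensembles]
```

Constant columns count as straight lines with slope zero, so they are removed as well.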
- nabqr.functions.remove_zero_columns(df)[source]
Wrapper function to remove columns that contain only zeros from a DataFrame.
- Parameters:
df (pandas.DataFrame) – Input DataFrame
- Returns:
DataFrame with zero columns removed
- Return type:
pandas.DataFrame
- nabqr.functions.remove_zero_columns_numpy(arr)[source]
Remove columns that contain only zeros or constant values from a numpy array.
- Parameters:
arr (numpy.ndarray) – Input array
- Returns:
Array with zero/constant columns removed
- Return type:
numpy.ndarray
- nabqr.functions.run_r_script(X_filename, Y_filename, tau)[source]
Run R script for quantile regression.
- Parameters:
X_filename (str) – Path to X data CSV file
Y_filename (str) – Path to Y data CSV file
tau (float) – Quantile level
- nabqr.functions.run_taqr(corrected_ensembles, actuals, quantiles, n_init, n_full, n_in_X)[source]
Wrapper function to run TAQR on corrected ensembles.
- Parameters:
corrected_ensembles (numpy.ndarray) – Shape (n_timesteps, n_ensembles)
actuals (numpy.ndarray) – Shape (n_timesteps,)
quantiles (list) – Quantiles to predict
n_init (int) – Number of initial timesteps for warm start
n_full (int) – Total number of timesteps
n_in_X (int) – Number of timesteps in design matrix
- Returns:
TAQR results for each quantile
- Return type:
list
- nabqr.functions.train_model_lstm(quantiles, epochs: int, lr: float, batch_size: int, x, y, x_val, y_val, n_timesteps, data_name)[source]
Train LSTM model for quantile regression. The @tf.function decorator is used to speed up the training process.
- Parameters:
quantiles (list) – List of quantile levels to predict
epochs (int) – Number of training epochs
lr (float) – Learning rate for optimizer
batch_size (int) – Batch size for training
x (tensor) – Training input data
y (tensor) – Training target data
x_val (tensor) – Validation input data
y_val (tensor) – Validation target data
n_timesteps (int) – Number of time steps in input sequence
data_name (str) – Name identifier for saving model artifacts
- Returns:
Trained LSTM model
- Return type:
tf.keras.Model
- nabqr.functions.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]
Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.
- Parameters:
x (numpy.ndarray) – Ensemble forecast (m x k)
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter, by default 0.5
t1 (int, optional) – Start hour (inclusive), by default 12
t2 (int, optional) – End hour (exclusive), by default 36
- Returns:
(score, score_list) Overall score and list of individual scores
- Return type:
tuple
- nabqr.functions.variogram_score_R_v2(x, y, p=0.5, t1=12, t2=36)[source]
Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the Energy and AI paper “An introduction to multivariate probabilistic forecast evaluation”. Assumes that x and y start from day 0, 00:00.
- Parameters:
x (array) – Ensemble forecast (m x k), where m is the size of the ensemble and k is the maximal forecast horizon
y (array) – Actual observations (k,)
p (float) – Power parameter for the variogram score
t1 (int) – Start of the hour range for comparison (inclusive)
t2 (int) – End of the hour range for comparison (exclusive)
- Returns:
(score, score_list) Overall score/100_000 and list of individual VarS contributions
- Return type:
tuple
- nabqr.functions.variogram_score_single_observation(x, y, p=0.5)[source]
Calculate the Variogram score for a given observation.
Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.
- Parameters:
x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter for the variogram score, by default 0.5
- Returns:
Variogram score for the observation
- Return type:
float
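The variogram score of order p compares pairwise differences in the observation with the ensemble-mean pairwise differences. The sketch below uses the common unweighted form summed over all ordered pairs of horizons; whether the library sums over all pairs or only i < j, and whether it applies weights, is not stated here, so treat those choices as assumptions. `variogram_score` is a hypothetical name.

```python
def variogram_score(x, y, p=0.5):
    """Unweighted variogram score for one observation vector.

    x: list of m ensemble members, each a list of k values.
    y: list of k observed values.
    """
    m = len(x)
    k = len(y)
    score = 0.0
    for i in range(k):
        for j in range(k):
            # Observed variogram term for the horizon pair (i, j).
            obs_term = abs(y[i] - y[j]) ** p
            # Ensemble estimate of the same term, averaged over members.
            ens_term = sum(abs(member[i] - member[j]) ** p for member in x) / m
            score += (obs_term - ens_term) ** 2
    return score
```

An ensemble whose members all equal the observation scores zero, which is the best possible value.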
TAQR Implementation
Helper Functions
- nabqr.helper_functions.build_ar1_covariance(n, rho, sigma=1.0)[source]
Build the AR(1) covariance matrix for an n-dimensional process.
- Parameters:
n (int) – Dimension of the covariance matrix.
rho (float) – AR(1) correlation parameter (the AR coefficient).
sigma (float, optional) – Standard deviation of the noise (innovation), defaults to 1.0.
- Returns:
The AR(1) covariance matrix of shape (n, n), with elements sigma^2 * rho^(|i-j|).
- Return type:
numpy.ndarray
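The element formula sigma^2 * rho^(|i-j|) given above translates directly into code. The sketch below builds the matrix as nested lists rather than a `numpy.ndarray`; `build_ar1_covariance_sketch` is a hypothetical name.

```python
def build_ar1_covariance_sketch(n, rho, sigma=1.0):
    """Return an n x n matrix with entries sigma^2 * rho^|i - j|."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]
```

The diagonal holds sigma^2, and entries decay geometrically with distance from the diagonal, so the matrix is symmetric.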
- nabqr.helper_functions.generate_ou_ensembles(X: ndarray, kappa: float, sigma: float, chunk_size: int = 24, n_ensembles: int = 50) -> ndarray[source]
Generate continuous Ornstein-Uhlenbeck (OU) ensemble paths that revert to the given reference series X[t], in chunk_size increments, but also simulate ‘extra’ future steps to account for OU lag and shift them back so the paths better align with X in real-time.
The ensemble is clipped to remain within [0,1].
- Parameters:
X (np.ndarray) – Reference series of length T that serves as the time-varying mean for each OU path.
kappa (float) – Mean-reversion speed for the OU process. The characteristic lag ~ 1/kappa.
sigma (float) – Diffusion (volatility) parameter.
chunk_size (int, optional) – Size of each chunk in timesteps. Defaults to 24.
n_ensembles (int, optional) – Number of ensemble paths to generate. Defaults to 50.
- Returns:
Y_corrected – The lag-corrected OU ensemble paths, each of length T.
- Return type:
np.ndarray, shape (T, n_ensembles)
Notes
We break the timeline [0..T-1] into blocks of chunk_size steps. At chunk boundaries, each ensemble path is continuous (that is, the new chunk starts where the old chunk ended).
We simulate extra steps (about 1/kappa) at the end, then shift the entire simulation backward by ~1/kappa to reduce the effective lag in real time.
For fractional lag, a simple linear interpolation is applied.
This “lag correction” is heuristic but often aligns the OU paths with X(t) more tightly when the reversion is slow.
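To make the lag described in the notes concrete, the sketch below simulates OU paths that revert to X[t] with a unit-step Euler-Maruyama update and clips them to [0, 1]. It deliberately omits the chunking and the lag correction, so it is a simplification of the idea and not the library's algorithm; `ou_ensembles_sketch` and the `seed` parameter are hypothetical.

```python
import random


def ou_ensembles_sketch(X, kappa, sigma, n_ensembles=50, seed=0):
    """Simulate OU paths reverting to X[t]; returns a T x n_ensembles nested list."""
    rng = random.Random(seed)
    T = len(X)
    paths = [[0.0] * n_ensembles for _ in range(T)]
    for e in range(n_ensembles):
        y = X[0]
        for t in range(T):
            # Clip the state so the path stays inside [0, 1].
            y = min(1.0, max(0.0, y))
            paths[t][e] = y
            # Euler-Maruyama step with dt = 1: drift toward X[t] plus noise.
            y = y + kappa * (X[t] - y) + sigma * rng.gauss(0.0, 1.0)
    return paths
```

With `sigma=0` and `kappa=1`, each path reproduces X delayed by one step, which is the kind of lag the correction described above is meant to reduce.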
- nabqr.helper_functions.get_parameter_bounds() -> Dict[str, Tuple[float, float]][source]
Define bounds for all parameters for SDE simulation. Used to ensure that the parameters are within a reasonable range.
- nabqr.helper_functions.quantile_score(p, z, q)[source]
Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.
Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”
- Parameters:
p (float) – The probability level (between 0 and 1)
z (numpy.ndarray) – The observed values
q (numpy.ndarray) – The predicted quantiles
- Returns:
The Quantile Score (QS)
- Return type:
float
- nabqr.helper_functions.set_n_closest_to_zero(arr, n)[source]
Set the n elements closest to zero in an array to zero.
- Parameters:
arr (array-like) – Input array of numbers
n (int) – Number of elements closest to zero to set to zero
- Returns:
Modified array with n elements closest to zero set to zero
- Return type:
numpy.ndarray
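A small sketch of the operation, assuming ties are broken by position (the library's tie-breaking is not specified here). It works on a plain list and returns a new list; `set_n_closest_to_zero_sketch` is a hypothetical name.

```python
def set_n_closest_to_zero_sketch(arr, n):
    """Return a copy of arr with the n smallest-magnitude elements set to zero."""
    result = list(arr)
    # Indices ordered by absolute value, smallest magnitude first.
    order = sorted(range(len(result)), key=lambda i: abs(result[i]))
    for i in order[:n]:
        result[i] = 0
    return result
```

Note the difference from setting the n smallest elements to zero: here a large negative value such as -4 is kept, because its magnitude is large.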
- nabqr.helper_functions.set_n_smallest_to_zero(arr, n)[source]
Set the n smallest elements in an array to zero.
- Parameters:
arr (array-like) – Input array of numbers
n (int) – Number of smallest elements to set to zero
- Returns:
Modified array with n smallest elements set to zero
- Return type:
numpy.ndarray
Simulate a correlated AR(1) process with multiple dimensions.
- Parameters:
n (int) – Number of time steps to simulate
phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)
sigma (float) – Standard deviation of the noise
m (int) – Number of dimensions/variables
corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.
offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector
smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”
- Returns:
(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise
- Return type:
tuple
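The core recursion of the AR(1) simulation described above is x_t = phi * x_(t-1) + eps_t with eps_t drawn from a normal distribution with standard deviation sigma. The sketch below applies it to m dimensions with independent noise, so it leaves out the cross-dimension correlation, the smoothing option, and the construction of actuals. `simulate_ar1_sketch` and `seed` are hypothetical names.

```python
import random


def simulate_ar1_sketch(n, phi, sigma, m, offset=None, seed=0):
    """Simulate m independent AR(1) series of length n; returns n rows of m values."""
    rng = random.Random(seed)
    offset = offset if offset is not None else [0.0] * m
    state = [0.0] * m
    rows = []
    for _ in range(n):
        # AR(1) update: decay the previous state and add fresh Gaussian noise.
        state = [phi * s + rng.gauss(0.0, sigma) for s in state]
        # The offset shifts each dimension without entering the recursion.
        rows.append([s + o for s, o in zip(state, offset)])
    return rows
```

A value of phi close to 1, such as the pipeline default of 0.995, produces slowly varying, highly persistent series.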
- nabqr.helper_functions.simulate_wind_power_sde(params: Dict[str, float], T: float = 500, dt: float = 1.0) -> Tuple[ndarray, ndarray][source]
Simulate wind power production using an Ornstein-Uhlenbeck process with GARCH volatility and jumps of normally distributed sizes driven by a Poisson process. The mean reversion is state-dependent with a repelling mechanism near 1.0 (upper boundary), and the diffusion term vanishes at the boundaries to avoid unphysical values outside [0, 1].
A few additional tweaks include:
- GARCH volatility that captures ‘vol_shock’ from recent values.
- Repellent forces that strengthen near 1.0, reducing both the drift and diffusion.
- Jumps that can persist over multiple steps, and become more negative if values are near 1.0.
- Parameters:
params (Dict[str, float]) – A dictionary containing all model parameters:
- X0: float
Initial wind power production level in [0, 1].
- theta: float
Long-term mean level; typically in [0, 1].
- kappa: float
Mean reversion speed (absolute value is used).
- sigma_base: float
Base volatility level (absolute value is used).
- alpha: float
ARCH parameter (absolute value is used).
- beta: float
GARCH parameter; must be in [0, 1].
- lambda_jump: float
Intensity of jump arrivals in the Poisson process (absolute value is used).
- jump_mu: float
Mean jump size (can be positive or negative).
- jump_sigma: float
Standard deviation of jump sizes (absolute value is used).
T (float, optional) – The end time of the simulation (total number of steps is T/dt). Default is 500.
dt (float, optional) – The size of each time step. Default is 1.0.
- Returns:
t (np.ndarray) – Array of time points of length N = int(T/dt).
X (np.ndarray) – Simulated wind power production values of length N, clipped to the interval [0, 1].
Notes
The drift term implements a state-dependent mean reversion that weakens near 1.0 and introduces a strong downward force very close to 1.0.
The diffusion term is modified as (X_t * (1 - X_t)) * (X_t / (X_t + 0.5)) dB_t, ensuring it decreases to zero when X_t is near 0 or 1.
GARCH effects are included to model changing volatility based on recent shocks in the process.
Jumps arrive according to a Poisson process with random normal magnitudes, and can persist over multiple time steps with some decay.
Examples
>>> from nabqr.helper_functions import simulate_wind_power_sde
>>> params = {
...     'X0': 0.5, 'theta': 0.7, 'kappa': 1.0, 'sigma_base': 0.1,
...     'alpha': 0.2, 'beta': 0.5, 'lambda_jump': 0.05,
...     'jump_mu': 0.0, 'jump_sigma': 0.02
... }
>>> t, X = simulate_wind_power_sde(params, T=100, dt=1.0)
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, X)
>>> plt.show()
Visualization
- nabqr.visualization.visualize_results(y_hat, q_hat, ylabel)[source]
Create a visualization of prediction intervals with actual values.
- Parameters:
y_hat (numpy.ndarray) – Actual observed values
q_hat (numpy.ndarray) – Predicted quantiles for different probability levels
ylabel (str) – Label for the y-axis
- Returns:
Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it
- Return type:
None
Notes
Creates a filled plot showing prediction intervals using a blue gradient
Overlays actual values as a black line
Automatically adjusts x-axis date formatting
Package Contents
NABQR: Neural Adaptive Basis Quantile Regression
A method for sequential error correction tailored to wind power forecasting in Denmark.
- nabqr.calculate_crps(actuals, corrected_ensembles)[source]
Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.
- Parameters:
actuals (numpy.ndarray) – Actual observations
corrected_ensembles (numpy.ndarray) – Ensemble forecasts
- Returns:
Mean CRPS score
- Return type:
float
- nabqr.calculate_qss(actuals, taqr_results, quantiles)[source]
Calculate the Quantile Skill Score (QSS).
- Parameters:
actuals (numpy.ndarray) – Actual observations
taqr_results (numpy.ndarray) – TAQR ensemble forecasts
quantiles (array-like) – Quantile levels to evaluate
- Returns:
Quantile Skill Score
- Return type:
float
- nabqr.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000)[source]
Perform one-step quantile prediction using TAQR.
Takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.
- Parameters:
X_input (numpy.ndarray) – Input features matrix
Y_input (numpy.ndarray) – Target values array
n_init (int) – Number of initial observations for training
n_full (int) – Total number of observations to use
quantile (float, optional) – Quantile level to predict, by default 0.5
already_correct_size (bool, optional) – Whether inputs are already correctly sized, by default False
n_in_X (int, optional) – Number of observations to use in X, by default 5000
- Returns:
(y_pred, y_actual, BETA) Predictions, actual values, and coefficients
- Return type:
tuple
- nabqr.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]
Main pipeline for NABQR model training and evaluation.
The pipeline:
1. Trains an LSTM network to correct the provided ensembles
2. Runs the TAQR algorithm on the corrected ensembles to predict observations
3. Saves results and model artifacts
- Parameters:
X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data
y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations
name (str, optional) – Dataset identifier, by default “TEST”
training_size (float, optional) – Fraction of data to use for training, by default 0.8
epochs (int, optional) – Number of training epochs, by default 100
timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]
**kwargs (dict) – Additional keyword arguments
- Returns:
A tuple containing:
- corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
- taqr_results: list of numpy.ndarray
The TAQR results.
- actuals_output: list of numpy.ndarray
The actual output values.
- BETA_output: list of numpy.ndarray
The BETA parameters.
- Return type:
tuple
- nabqr.quantile_score(p, z, q)[source]
Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.
Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”
- Parameters:
p (float) – The probability level (between 0 and 1)
z (numpy.ndarray) – The observed values
q (numpy.ndarray) – The predicted quantiles
- Returns:
The Quantile Score (QS)
- Return type:
float
- nabqr.rq_simplex_final(X, IX, Iy, Iex, r, beta, n, tau, bins, n_in_bin)[source]
Calculate solution to an adaptive simplex algorithm for quantile regression.
The function uses knowledge of the solution at time t to calculate the solution at time t+1. The basic idea is that the solution to the quantile regression problem can be written as: y(t) = X(t)’*beta + r(t)
where beta = X(h)^(-1)*y(h) for some index set h. Simplex algorithm is used to calculate the optimal h at time t+1 based on the solution at time t.
- Parameters:
X (numpy.ndarray) – Design matrix for the linear quantile regression problem
IX (numpy.ndarray) – Index set referring to columns of X which is the design matrix
Iy (int) – Index referring to response column in X
Iex (int) – Index referring to grouping variable column in X
r (numpy.ndarray) – Residuals from initial solution
beta (numpy.ndarray) – Initial solution coefficients
n (int) – Number of elements in r
tau (float) – Required probability
bins (numpy.ndarray) – Vector defining partition intervals
n_in_bin (int) – Number of elements per bin
- Returns:
(N, BETA, GAIN, Ld, Rny, Mx, Re, CON1, T)
- N: Number of simplex steps
- BETA: Solution matrix
- GAIN: Loss function gain
- Ld: Number of descent directions
- Rny: One-step-ahead prediction residuals
- Mx: Minimum constraint solution
- Re: Training set reliability
- CON1: Condition numbers
- T: Computation times
- Return type:
tuple
- nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]
Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data if both are not provided.
- Parameters:
n_samples (int, optional) – Number of time steps to simulate if no data is provided, by default 2000.
phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.
sigma (float, optional) – Standard deviation of noise for simulation, by default 8.
offset_start (int, optional) – Start value for offset range, by default 10.
offset_end (int, optional) – End value for offset range, by default 500.
offset_step (int, optional) – Step size for offset range, by default 15.
correlation (float, optional) – Base correlation between dimensions, by default 0.8.
data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.
training_size (float, optional) – Proportion of data to use for training, by default 0.7.
epochs (int, optional) – Number of epochs for model training, by default 20.
timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].
quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].
X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.
actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.
simulation_type (str, optional) – Type of simulation to use, by default “sde”. “sde” is the more advanced and more realistic option and uses an SDE model; “ar1” uses an AR(1) process.
visualize (bool, optional) – Whether visual elements are plotted to the screen or saved as figures, by default True.
taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.
save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.
- Returns:
A tuple containing:
- corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
- taqr_results: list of numpy.ndarray
The TAQR results.
- actuals_output: list of numpy.ndarray
The actual output values.
- BETA_output: list of numpy.ndarray
The BETA parameters.
- scores: pd.DataFrame
The scores for the predictions and original/corrected ensembles.
- Return type:
tuple
- Raises:
ValueError – If user opts not to simulate data when both X and actuals are missing.
- nabqr.set_n_closest_to_zero(arr, n)[source]
Set the n elements closest to zero in an array to zero.
- Parameters:
arr (array-like) – Input array of numbers
n (int) – Number of elements closest to zero to set to zero
- Returns:
Modified array with n elements closest to zero set to zero
- Return type:
numpy.ndarray
- nabqr.set_n_smallest_to_zero(arr, n)[source]
Set the n smallest elements in an array to zero.
- Parameters:
arr (array-like) – Input array of numbers
n (int) – Number of smallest elements to set to zero
- Returns:
Modified array with n smallest elements set to zero
- Return type:
numpy.ndarray
Simulate a correlated AR(1) process with multiple dimensions.
- Parameters:
n (int) – Number of time steps to simulate
phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)
sigma (float) – Standard deviation of the noise
m (int) – Number of dimensions/variables
corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.
offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector
smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”
- Returns:
(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise
- Return type:
tuple
- nabqr.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]
Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.
- Parameters:
x (numpy.ndarray) – Ensemble forecast (m x k)
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter, by default 0.5
t1 (int, optional) – Start hour (inclusive), by default 12
t2 (int, optional) – End hour (exclusive), by default 36
- Returns:
(score, score_list) Overall score and list of individual scores
- Return type:
tuple
- nabqr.variogram_score_single_observation(x, y, p=0.5)[source]
Calculate the Variogram score for a given observation.
Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.
- Parameters:
x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter for the variogram score, by default 0.5
- Returns:
Variogram score for the observation
- Return type:
float
- nabqr.visualize_results(y_hat, q_hat, ylabel)[source]
Create a visualization of prediction intervals with actual values.
- Parameters:
y_hat (numpy.ndarray) – Actual observed values
q_hat (numpy.ndarray) – Predicted quantiles for different probability levels
ylabel (str) – Label for the y-axis
- Returns:
Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it
- Return type:
None
Notes
Creates a filled plot showing prediction intervals using a blue gradient
Overlays actual values as a black line
Automatically adjusts x-axis date formatting