API Documentation

Main Pipeline

nabqr.nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]

Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data if both are not provided.

Parameters:

n_samples (int, optional) – Number of time steps to simulate if no data provided, by default 2000.
phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.
sigma (float, optional) – Standard deviation of noise for simulation, by default 8.
offset_start (int, optional) – Start value for offset range, by default 10.
offset_end (int, optional) – End value for offset range, by default 500.
offset_step (int, optional) – Step size for offset range, by default 15.
correlation (float, optional) – Base correlation between dimensions, by default 0.8.
data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.
training_size (float, optional) – Proportion of data to use for training, by default 0.7.
epochs (int, optional) – Number of epochs for model training, by default 20.
timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].
quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].
X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.
actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.
simulation_type (str, optional) – Type of simulation to use, either “ar1” or “sde”, by default “sde”. The “sde” option is more advanced: it uses an SDE model and produces more realistic data.
visualize (bool, optional) – Determines if any visual elements will be plotted to the screen or saved as figures, by default True.
taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.
save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.

Returns:

A tuple containing:

corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
taqr_results: list of numpy.ndarray
The TAQR results.
actuals_output: list of numpy.ndarray
The actual output values.
BETA_output: list of numpy.ndarray
The BETA parameters.
scores: pd.DataFrame
The scores for the predictions and original/corrected ensembles.

Return type:

tuple

Raises:

ValueError – If the user opts not to simulate data when both X and actuals are missing.
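
A minimal usage sketch, assuming the package is installed together with its TensorFlow and R dependencies. The argument names come from the signature above; the values are arbitrary choices for a quick run, and `run_example` is a hypothetical helper that wraps the call so that nothing heavy runs when the snippet is loaded.

```python
# Illustrative keyword arguments; every name comes from the documented
# signature, the values are arbitrary choices for a short run.
pipeline_kwargs = {
    "n_samples": 500,
    "training_size": 0.7,
    "epochs": 5,
    "quantiles": [0.1, 0.5, 0.9],
    "simulation_type": "ar1",
    "visualize": False,
    "save_files": True,  # the R step needs its .csv files
}


def run_example(kwargs=pipeline_kwargs):
    """Hypothetical helper: run the pipeline and return the scores table."""
    # Imported lazily so this snippet can be loaded without nabqr installed.
    from nabqr import run_nabqr_pipeline

    (corrected_ensembles, taqr_results, actuals_output,
     BETA_output, scores) = run_nabqr_pipeline(**kwargs)
    return scores
```

Because X and actuals are omitted here, the function may ask for confirmation before simulating data, as described in the parameter list.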

Core Functions

Neural Adaptive Basis Quantile Regression (NABQR) Core Functions

This module provides the core functionality for NABQR.

This module includes:

- Scoring metrics (Variogram, CRPS, QSS)
- Dataset creation and preprocessing
- Model definitions and training
- TAQR (Time-Adaptive Quantile Regression) implementation

class nabqr.functions.QuantileRegressionLSTM(*args, **kwargs)[source]

Bases: Model

LSTM-based model for quantile regression. Input: x -> LSTM -> Dense -> Dense -> output

Parameters:

n_quantiles (int) – Number of quantiles to predict
units (int) – Number of LSTM units
n_timesteps (int) – Number of time steps in input

call(inputs, training=None)[source]

Forward pass of the model.

Parameters:

inputs (tensorflow.Tensor) – Input tensor
training (bool, optional) – Whether in training mode, by default None

Returns:

Model output

Return type:

tensorflow.Tensor

classmethod from_config(config)[source]

Create model from configuration.

Parameters:

config (dict) – Model configuration

Returns:

Model instance

Return type:

QuantileRegressionLSTM

get_config()[source]

Get model configuration.

Returns:

Model configuration

Return type:

dict

nabqr.functions.calculate_crps(actuals, corrected_ensembles)[source]

Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.

Parameters:

actuals (numpy.ndarray) – Actual observations
corrected_ensembles (numpy.ndarray) – Ensemble forecasts

Returns:

Mean CRPS score

Return type:

float
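
To make the quantity concrete, the sketch below computes the standard ensemble form of the CRPS, E|X - y| - 0.5 E|X - X'|, in plain Python. It is an illustration of the score only, not the code path used by `calculate_crps`, and the names `crps_ensemble_single` and `mean_crps` are hypothetical.

```python
def crps_ensemble_single(observation, members):
    """Ensemble CRPS for one observation: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    # Mean absolute error of the members against the observation.
    error_term = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference over all ordered pairs of members.
    spread_term = sum(abs(a - b) for a in members for b in members) / (m * m)
    return error_term - 0.5 * spread_term


def mean_crps(actuals, ensembles):
    """Average the per-time-step CRPS; ensembles[t] holds the members at time t."""
    scores = [crps_ensemble_single(y, row) for y, row in zip(actuals, ensembles)]
    return sum(scores) / len(scores)
```

For the ensemble [1, 2, 3] and the observation 2, the error term is 2/3 and the spread term is 8/9, giving a CRPS of 2/9.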

nabqr.functions.calculate_qss(actuals, taqr_results, quantiles)[source]

Calculate the Quantile Skill Score (QSS).

Parameters:

actuals (numpy.ndarray) – Actual observations
taqr_results (numpy.ndarray) – TAQR ensemble forecasts
quantiles (array-like) – Quantile levels to evaluate

Returns:

Quantile Skill Score

Return type:

float

nabqr.functions.calculate_scores(actuals, taqr_results, raw_ensembles, corrected_ensembles, quantiles_taqr, data_source, plot_reliability=True, visualize=True)[source]

Calculate Variogram, CRPS, QSS and MAE for the predictions and corrected ensembles.

Parameters:

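actuals (numpy.ndarray) – The actual values
taqr_results (numpy.ndarray) – The predicted values (TAQR results)
raw_ensembles (numpy.ndarray) – The raw ensembles
corrected_ensembles (numpy.ndarray) – The corrected ensembles
quantiles_taqr (list) – The quantiles to calculate the scores for
data_source (str) – The data source
plot_reliability (bool, optional) – Whether to plot reliability, by default True
visualize (bool, optional) – Whether to produce visual output, by default True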

nabqr.functions.create_dataset_for_lstm(X, Y, time_steps)[source]

Create a dataset suitable for LSTM training with multiple time steps (i.e. lags).

Parameters:

X (numpy.ndarray) – Input features
Y (numpy.ndarray) – Target values
time_steps (list) – List of time steps to include

Returns:

(X_lstm, Y_lstm) LSTM-ready datasets

Return type:

tuple
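
One way to picture the lagged layout is the sketch below, which stacks the rows X[t - lag] for every lag in time_steps. It assumes that samples start at the largest lag so every requested lag exists; the actual array shapes produced by `create_dataset_for_lstm` may differ, and `make_lagged_dataset` is a hypothetical name.

```python
def make_lagged_dataset(X, Y, time_steps):
    """Stack the rows X[t - lag] for each lag in time_steps.

    X is a list of feature rows and Y a list of targets. Returns
    (X_lstm, Y_lstm), where X_lstm[i] holds one feature row per lag.
    """
    max_lag = max(time_steps)
    X_lstm, Y_lstm = [], []
    # Start at max_lag so that every requested lag has data available.
    for t in range(max_lag, len(X)):
        X_lstm.append([X[t - lag] for lag in time_steps])
        Y_lstm.append(Y[t])
    return X_lstm, Y_lstm
```

With five time steps and time_steps=[0, 1, 2], this yields three samples, the first built from rows 2, 1 and 0.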

nabqr.functions.legend_without_duplicate_labels(ax)[source]

Create a legend without duplicate labels. Primarily used for ensemble plots.

Parameters:

ax (matplotlib.axes.Axes) – Axes object to create legend for

nabqr.functions.map_range(values, input_start, input_end, output_start, output_end)[source]

Map values from one range to another.

Parameters:

values (list) – Values to map
input_start (float) – Start of input range
input_end (float) – End of input range
output_start (float) – Start of output range
output_end (float) – End of output range

Returns:

Mapped values

Return type:

numpy.ndarray
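
A minimal sketch of the mapping, assuming it is a plain linear rescaling from the input interval to the output interval. It returns a list rather than a numpy.ndarray, and `map_range_sketch` is a hypothetical name.

```python
def map_range_sketch(values, input_start, input_end, output_start, output_end):
    """Linearly rescale values from the input interval to the output interval."""
    scale = (output_end - output_start) / (input_end - input_start)
    return [output_start + (v - input_start) * scale for v in values]
```

For example, mapping 0, 5 and 10 from the range [0, 10] to [0, 1] gives approximately 0, 0.5 and 1.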

nabqr.functions.multi_quantile_skill_score(y_true, y_pred, quantiles)[source]

Calculate the Quantile Skill Score (QSS) for multiple quantile forecasts.

Parameters:

y_true (numpy.ndarray) – True observed values
y_pred (numpy.ndarray) – Predicted quantile values
quantiles (list) – Quantile levels between 0 and 1

Returns:

QSS for each quantile forecast

Return type:

numpy.ndarray

nabqr.functions.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000, print_output=True)[source]

Perform one-step quantile prediction using TAQR.

This function takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.

An easy wrapper function to run TAQR.

Parameters:

X_input (numpy.ndarray or pd.DataFrame) – Input features
Y_input (numpy.ndarray or pd.Series) – Target values
n_init (int) – Number of initial observations for warm start
n_full (int) – Total number of observations to process
quantile (float, optional) – Quantile level for prediction, by default 0.5
already_correct_size (bool, optional) – Whether input data is already correctly sized, by default False
n_in_X (int, optional) – Number of observations to include in design matrix, by default 5000
print_output (bool, optional) – Whether to print output while running, by default True

Returns:

(predictions, actual values, coefficients)

Return type:

tuple

nabqr.functions.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]

Main pipeline for NABQR model training and evaluation.

The pipeline:

1. Trains an LSTM network to correct the provided ensembles
2. Runs TAQR algorithm on corrected ensembles to predict observations
3. Saves results and model artifacts

Parameters:

X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data
y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations
name (str, optional) – Dataset identifier, by default “TEST”
training_size (float, optional) – Fraction of data to use for training, by default 0.8
epochs (int, optional) – Number of training epochs, by default 100
timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]
**kwargs (dict) – Additional keyword arguments

Returns:

A tuple containing:

corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
taqr_results: list of numpy.ndarray
The TAQR results.
actuals_output: list of numpy.ndarray
The actual output values.
BETA_output: list of numpy.ndarray
The BETA parameters.

Return type:

tuple

nabqr.functions.quantile_loss_3(q, y_true, y_pred)[source]

Calculate quantile loss for a single quantile.

Parameters:

q (float) – Quantile level
y_true (tensorflow.Tensor) – True values
y_pred (tensorflow.Tensor) – Predicted values

Returns:

Quantile loss value

Return type:

tensorflow.Tensor

nabqr.functions.quantile_loss_func(quantiles)[source]

Create a loss function for multiple quantiles.

Parameters:

quantiles (list) – List of quantile levels

Returns:

Loss function for multiple quantiles

Return type:

function
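
The pinball loss behind `quantile_loss_3`, and the factory pattern that `quantile_loss_func` describes, can be sketched in plain Python as follows. This is an illustration on scalars rather than TensorFlow tensors; averaging over quantile levels is an assumption, and the names `pinball_loss` and `make_multi_quantile_loss` are hypothetical.

```python
def pinball_loss(q, y_true, y_pred):
    """Pinball (quantile) loss for one quantile level q and one observation."""
    error = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * error, (q - 1) * error)


def make_multi_quantile_loss(quantiles):
    """Return a function that averages the pinball loss over all quantile levels."""
    def loss(y_true, y_preds):
        # y_preds holds one prediction per quantile level, in the same order.
        losses = [pinball_loss(q, y_true, p) for q, p in zip(quantiles, y_preds)]
        return sum(losses) / len(losses)
    return loss
```

With q = 0.9, predicting 8 when the truth is 10 costs about 1.8, while predicting 10 when the truth is 8 costs only about 0.2, which is what pushes the 0.9 quantile estimate upward.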

nabqr.functions.reliability_func(quantile_forecasts, corrected_ensembles, ensembles, actuals, corrected_taqr_quantiles, data_source, plot_reliability=True)[source]

nabqr.functions.remove_straight_line_outliers(ensembles)[source]

Remove ensemble members that are perfectly straight lines (constant slope). Explanation: Sometimes the output from the LSTM is a straight line, which is not useful for the ensemble.

Parameters:

ensembles (numpy.ndarray) – 2D array where rows are time steps and columns are ensemble members

Returns:

Filtered ensemble data without straight-line outliers

Return type:

numpy.ndarray
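
One possible way to detect such members is to check whether consecutive differences are all equal. The sketch below works on nested lists and uses a hypothetical tolerance `tol`; the library's actual criterion is not shown in this reference and may differ.

```python
def is_straight_line(series, tol=1e-12):
    """True if consecutive differences are all equal, i.e. the slope is constant."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return all(abs(d - diffs[0]) <= tol for d in diffs)


def drop_straight_line_members(ensembles):
    """Keep only the columns (members) that are not straight lines."""
    n_members = len(ensembles[0])
    columns = [[row[j] for row in ensembles] for j in range(n_members)]
    kept = [col for col in columns if not is_straight_line(col)]
    # Transpose back to rows = time steps, columns = members.
    return [list(row) for row in zip(*kept)]
```

Note that a constant member counts as a straight line here, since its slope is zero everywhere.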

nabqr.functions.remove_zero_columns(df)[source]

Wrapper function to remove columns that contain only zeros from a DataFrame.

Parameters:

df (pandas.DataFrame) – Input DataFrame

Returns:

DataFrame with zero columns removed

Return type:

pandas.DataFrame

nabqr.functions.remove_zero_columns_numpy(arr)[source]

Remove columns that contain only zeros or constant values from a numpy array.

Parameters:

arr (numpy.ndarray) – Input array

Returns:

Array with zero/constant columns removed

Return type:

numpy.ndarray

nabqr.functions.run_r_script(X_filename, Y_filename, tau)[source]

Run R script for quantile regression.

Parameters:

X_filename (str) – Path to X data CSV file
Y_filename (str) – Path to Y data CSV file
tau (float) – Quantile level

nabqr.functions.run_taqr(corrected_ensembles, actuals, quantiles, n_init, n_full, n_in_X)[source]

Wrapper function to run TAQR on corrected ensembles.

Parameters:

corrected_ensembles (numpy.ndarray) – Shape (n_timesteps, n_ensembles)
actuals (numpy.ndarray) – Shape (n_timesteps,)
quantiles (list) – Quantiles to predict
n_init (int) – Number of initial timesteps for warm start
n_full (int) – Total number of timesteps
n_in_X (int) – Number of timesteps in design matrix

Returns:

TAQR results for each quantile

Return type:

list

nabqr.functions.train_model_lstm(quantiles, epochs: int, lr: float, batch_size: int, x, y, x_val, y_val, n_timesteps, data_name)[source]

Train LSTM model for quantile regression. The @tf.function decorator is used to speed up the training process.

Parameters:

quantiles (list) – List of quantile levels to predict
epochs (int) – Number of training epochs
lr (float) – Learning rate for optimizer
batch_size (int) – Batch size for training
x (tensor) – Training input data
y (tensor) – Training target data
x_val (tensor) – Validation input data
y_val (tensor) – Validation target data
n_timesteps (int) – Number of time steps in input sequence
data_name (str) – Name identifier for saving model artifacts

Returns:

Trained LSTM model

Return type:

tf.keras.Model

nabqr.functions.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.

Parameters:

x (numpy.ndarray) – Ensemble forecast (m x k)
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter, by default 0.5
t1 (int, optional) – Start hour (inclusive), by default 12
t2 (int, optional) – End hour (exclusive), by default 36

Returns:

(score, score_list) Overall score and list of individual scores

Return type:

tuple

nabqr.functions.variogram_score_R_v2(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the paper in Energy and AI, “An introduction to multivariate probabilistic forecast evaluation”. Assumes that x and y start from day 0, 00:00.

Parameters:

x (array) – Ensemble forecast (m x k), where m is the size of the ensemble, and k is the maximal forecast horizon.
y (array) – Actual observations (k,)
p (float) – Power parameter for the variogram score.
t1 (int) – Start of the hour range for comparison (inclusive).
t2 (int) – End of the hour range for comparison (exclusive).

Returns:

(score, score_list) Overall score/100_000 and list of individual VarS contributions

Return type:

tuple

nabqr.functions.variogram_score_single_observation(x, y, p=0.5)[source]

Calculate the Variogram score for a given observation.

Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.

Parameters:

x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter for the variogram score, by default 0.5

Returns:

Variogram score for the observation

Return type:

float
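
As a rough illustration, the textbook variogram score of order p with unit weights can be written as below. It sums over all ordered pairs (i, j); the library's version, translated from R, may weight or normalise the terms differently, and `variogram_score` is a hypothetical name.

```python
def variogram_score(x, y, p=0.5):
    """Variogram score of order p with unit weights.

    x is a list of m ensemble members, each of length k; y has length k.
    """
    m, k = len(x), len(y)
    score = 0.0
    for i in range(k):
        for j in range(k):
            observed = abs(y[i] - y[j]) ** p
            # Ensemble estimate of the expected |X_i - X_j|^p.
            expected = sum(abs(member[i] - member[j]) ** p for member in x) / m
            score += (observed - expected) ** 2
    return score
```

An ensemble whose members all equal the observations scores exactly zero; larger values indicate that the ensemble misrepresents the differences between horizons.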

TAQR Implementation

Helper Functions

nabqr.helper_functions.build_ar1_covariance(n, rho, sigma=1.0)[source]

Build the AR(1) covariance matrix for an n-dimensional process.

Parameters:

n (int) – Dimension of the covariance matrix.
rho (float) – AR(1) correlation parameter (the AR coefficient).
sigma (float, optional) – Standard deviation of the noise (innovation), defaults to 1.0.

Returns:

The AR(1) covariance matrix of shape (n, n), with elements sigma^2 * rho^(|i-j|).

Return type:

numpy.ndarray
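
The element formula sigma^2 * rho^(|i-j|) translates directly into code. The sketch below builds the matrix as nested lists instead of a numpy.ndarray, under the hypothetical name `ar1_covariance`.

```python
def ar1_covariance(n, rho, sigma=1.0):
    """Return the n x n matrix with entries sigma**2 * rho**|i - j|."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]
```

For n=3, rho=0.5 and sigma=2.0 the rows are [4, 2, 1], [2, 4, 2] and [1, 2, 4].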

nabqr.helper_functions.generate_ou_ensembles(X: ndarray, kappa: float, sigma: float, chunk_size: int = 24, n_ensembles: int = 50) → ndarray[source]

Generate continuous Ornstein-Uhlenbeck (OU) ensemble paths that revert to the given reference series X[t], in chunk_size increments, but also simulate ‘extra’ future steps to account for OU lag and shift them back so the paths better align with X in real-time.

The ensemble is clipped to remain within [0,1].

Parameters:

X (np.ndarray) – Reference series of length T that serves as the time-varying mean for each OU path.
kappa (float) – Mean-reversion speed for the OU process. The characteristic lag ~ 1/kappa.
sigma (float) – Diffusion (volatility) parameter.
chunk_size (int, optional) – Size of each chunk in timesteps. Defaults to 24.
n_ensembles (int, optional) – Number of ensemble paths to generate. Defaults to 50.

Returns:

Y_corrected – The lag-corrected OU ensemble paths, each of length T.

Return type:

np.ndarray, shape (T, n_ensembles)

Notes

We break the timeline [0..T-1] into blocks of chunk_size steps. At chunk boundaries, each ensemble path is continuous: the new chunk starts where the old chunk ended.
We simulate extra steps (about 1/kappa) at the end, then shift the entire simulation backward by ~1/kappa to reduce the effective lag in real time.
For fractional lag, a simple linear interpolation is applied.
This “lag correction” is heuristic but often aligns the OU paths with X(t) more tightly when the reversion is slow.
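
The core mean-reverting step can be sketched with a simple Euler-Maruyama discretisation. This illustration deliberately omits the chunking and the lag correction described in the notes; the `dt` and `seed` arguments and the name `ou_paths` are assumptions made for the example.

```python
import random


def ou_paths(reference, kappa, sigma, n_ensembles=5, dt=1.0, seed=0):
    """Euler-Maruyama OU paths reverting to reference[t], clipped to [0, 1].

    Returns a list of T rows, each holding n_ensembles values.
    """
    rng = random.Random(seed)
    state = [reference[0]] * n_ensembles
    paths = []
    for target in reference:
        new_state = []
        for value in state:
            drift = kappa * (target - value) * dt
            noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            # Clip so every path stays inside the unit interval.
            new_state.append(min(1.0, max(0.0, value + drift + noise)))
        state = new_state
        paths.append(state)
    return paths
```

With a small kappa the paths trail behind changes in the reference series, which is the lag that the shift-back heuristic in the notes is meant to reduce.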

nabqr.helper_functions.get_parameter_bounds() → Dict[str, Tuple[float, float]][source]

Define bounds for all parameters for SDE simulation. Used to ensure that the parameters are within a reasonable range.

nabqr.helper_functions.quantile_score(p, z, q)[source]

Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.

Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”

Parameters:

p (float) – The probability level (between 0 and 1)
z (numpy.ndarray) – The observed values
q (numpy.ndarray) – The predicted quantiles

Returns:

The Quantile Score (QS)

Return type:

float

nabqr.helper_functions.set_n_closest_to_zero(arr, n)[source]

Set the n elements closest to zero in an array to zero.

Parameters:

arr (array-like) – Input array of numbers
n (int) – Number of elements closest to zero to set to zero

Returns:

Modified array with n elements closest to zero set to zero

Return type:

numpy.ndarray

nabqr.helper_functions.set_n_smallest_to_zero(arr, n)[source]

Set the n smallest elements in an array to zero.

Parameters:

arr (array-like) – Input array of numbers
n (int) – Number of smallest elements to set to zero

Returns:

Modified array with n smallest elements set to zero

Return type:

numpy.ndarray
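
The difference between `set_n_closest_to_zero` and `set_n_smallest_to_zero` is only the ranking key: absolute value in the first case, signed value in the second. The sketch below shows this with a shared hypothetical helper, `zero_n_by_key`, operating on plain lists; tie handling in the library may differ.

```python
def zero_n_by_key(arr, n, key):
    """Return a copy of arr with the n elements ranking lowest under key set to zero."""
    order = sorted(range(len(arr)), key=lambda i: key(arr[i]))
    result = list(arr)
    for i in order[:n]:
        result[i] = 0
    return result


def closest_to_zero(arr, n):
    """Zero the n elements with the smallest absolute value."""
    return zero_n_by_key(arr, n, key=abs)


def smallest(arr, n):
    """Zero the n elements with the smallest signed value."""
    return zero_n_by_key(arr, n, key=lambda v: v)
```

For the input [-5, 0.2, 3, -1] with n=2, the first variant gives [-5, 0, 3, 0] and the second gives [0, 0.2, 3, 0].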

nabqr.helper_functions.simulate_correlated_ar1_process(n, phi, sigma, m, corr_matrix=None, offset=None, smooth='no')[source]

Simulate a correlated AR(1) process with multiple dimensions.

Parameters:

n (int) – Number of time steps to simulate
phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)
sigma (float) – Standard deviation of the noise
m (int) – Number of dimensions/variables
corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.
offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector
smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”

Returns:

(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise

Return type:

tuple
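
A simplified picture of such a simulation is sketched below. Instead of a full correlation matrix it assumes a single common shock shared by all dimensions, which gives every pair of dimensions the same noise correlation; this choice, the `correlation` and `seed` arguments, and the name `simulate_ar1_sketch` are assumptions for illustration only.

```python
import random
import statistics


def simulate_ar1_sketch(n, phi, sigma, m, correlation=0.8, offset=None, seed=0):
    """Simulate m equicorrelated AR(1) series of length n plus noisy actuals."""
    rng = random.Random(seed)
    offset = [0.0] * m if offset is None else list(offset)
    state = [0.0] * m
    ensembles = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)  # shock shared by all dimensions
        shocks = [
            sigma * (correlation ** 0.5 * common
                     + (1.0 - correlation) ** 0.5 * rng.gauss(0.0, 1.0))
            for _ in range(m)
        ]
        # AR(1) update: x_t = phi * x_{t-1} + shock_t
        state = [phi * s + e for s, e in zip(state, shocks)]
        ensembles.append([s + o for s, o in zip(state, offset)])
    # Actuals: median across dimensions with additional noise.
    actuals = [statistics.median(row) + sigma * rng.gauss(0.0, 1.0)
               for row in ensembles]
    return ensembles, actuals
```

Each shock has variance sigma^2, and any two dimensions share a noise covariance of correlation * sigma^2.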

nabqr.helper_functions.simulate_wind_power_sde(params: Dict[str, float], T: float = 500, dt: float = 1.0) → Tuple[ndarray, ndarray][source]

Simulate wind power production using an Ornstein-Uhlenbeck process with GARCH volatility and jumps of normally distributed sizes driven by a Poisson process. The mean reversion is state-dependent with a repelling mechanism near 1.0 (upper boundary), and the diffusion term vanishes at the boundaries to avoid unphysical values outside [0, 1].

A few additional tweaks include:

- GARCH volatility that captures ‘vol_shock’ from recent values.
- Repellent forces that strengthen near 1.0, reducing both the drift and diffusion.
- Jumps that can persist over multiple steps, and become more negative if values are near 1.0.

Parameters:

params (Dict[str, float]) – A dictionary containing all model parameters:
- X0 (float): Initial wind power production level in [0, 1].
- theta (float): Long-term mean level; typically in [0, 1].
- kappa (float): Mean reversion speed (absolute value is used).
- sigma_base (float): Base volatility level (absolute value is used).
- alpha (float): ARCH parameter (absolute value is used).
- beta (float): GARCH parameter; must be in [0, 1].
- lambda_jump (float): Intensity of jump arrivals in the Poisson process (absolute value is used).
- jump_mu (float): Mean jump size (can be positive or negative).
- jump_sigma (float): Standard deviation of jump sizes (absolute value is used).
T (float, optional) – The end time of the simulation (total number of steps is T/dt). Default is 500.
dt (float, optional) – The size of each time step. Default is 1.0.

Returns:

t (np.ndarray) – Array of time points of length N = int(T/dt).
X (np.ndarray) – Simulated wind power production values of length N, clipped to the interval [0, 1].

Notes

The drift term implements a state-dependent mean reversion that weakens near 1.0 and introduces a strong downward force very close to 1.0.
The diffusion term is modified as (X_t * (1 - X_t)) * (X_t / (X_t + 0.5)) dB_t, ensuring it decreases to zero when X_t is near 0 or 1.
GARCH effects are included to model changing volatility based on recent shocks in the process.
Jumps arrive according to a Poisson process with random normal magnitudes, and can persist over multiple time steps with some decay.

Examples

>>> from nabqr.helper_functions import simulate_wind_power_sde
>>> params = {
...     'X0': 0.5, 'theta': 0.7, 'kappa': 1.0, 'sigma_base': 0.1,
...     'alpha': 0.2, 'beta': 0.5, 'lambda_jump': 0.05,
...     'jump_mu': 0.0, 'jump_sigma': 0.02
... }
>>> t, X = simulate_wind_power_sde(params, T=100, dt=1.0)
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, X)
>>> plt.show()

Visualization

nabqr.visualization.visualize_results(y_hat, q_hat, ylabel)[source]

Create a visualization of prediction intervals with actual values.

Parameters:

y_hat (numpy.ndarray) – Actual observed values
q_hat (numpy.ndarray) – Predicted quantiles for different probability levels
ylabel (str) – Label for the y-axis

Returns:

Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it

Return type:

None

Notes

Creates a filled plot showing prediction intervals using a blue gradient
Overlays actual values as a black line
Automatically adjusts x-axis date formatting

Package Contents

NABQR: Neural Adaptive Basis Quantile Regression

A method for sequential error correction tailored to wind power forecasting in Denmark.

nabqr.calculate_crps(actuals, corrected_ensembles)[source]

Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.

Parameters:

actuals (numpy.ndarray) – Actual observations
corrected_ensembles (numpy.ndarray) – Ensemble forecasts

Returns:

Mean CRPS score

Return type:

float

nabqr.calculate_qss(actuals, taqr_results, quantiles)[source]

Calculate the Quantile Skill Score (QSS).

Parameters:

actuals (numpy.ndarray) – Actual observations
taqr_results (numpy.ndarray) – TAQR ensemble forecasts
quantiles (array-like) – Quantile levels to evaluate

Returns:

Quantile Skill Score

Return type:

float

nabqr.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000)[source]

Perform one-step quantile prediction using TAQR.

Takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.

Parameters:

X_input (numpy.ndarray) – Input features matrix
Y_input (numpy.ndarray) – Target values array
n_init (int) – Number of initial observations for training
n_full (int) – Total number of observations to use
quantile (float, optional) – Quantile level to predict, by default 0.5
already_correct_size (bool, optional) – Whether inputs are already correctly sized, by default False
n_in_X (int, optional) – Number of observations to use in X, by default 5000

Returns:

(y_pred, y_actual, BETA) Predictions, actual values, and coefficients

Return type:

tuple

nabqr.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]

Main pipeline for NABQR model training and evaluation.

The pipeline:

1. Trains an LSTM network to correct the provided ensembles
2. Runs TAQR algorithm on corrected ensembles to predict observations
3. Saves results and model artifacts

Parameters:

X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data
y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations
name (str, optional) – Dataset identifier, by default “TEST”
training_size (float, optional) – Fraction of data to use for training, by default 0.8
epochs (int, optional) – Number of training epochs, by default 100
timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]
**kwargs (dict) – Additional keyword arguments

Returns:

A tuple containing:

corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
taqr_results: list of numpy.ndarray
The TAQR results.
actuals_output: list of numpy.ndarray
The actual output values.
BETA_output: list of numpy.ndarray
The BETA parameters.

Return type:

tuple

nabqr.quantile_score(p, z, q)[source]

Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.

Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”

Parameters:

p (float) – The probability level (between 0 and 1)
z (numpy.ndarray) – The observed values
q (numpy.ndarray) – The predicted quantiles

Returns:

The Quantile Score (QS)

Return type:

float

nabqr.rq_simplex_final(X, IX, Iy, Iex, r, beta, n, tau, bins, n_in_bin)[source]

Calculate solution to an adaptive simplex algorithm for quantile regression.

The function uses knowledge of the solution at time t to calculate the solution at time t+1. The basic idea is that the solution to the quantile regression problem can be written as:

y(t) = X(t)' * beta + r(t)

where beta = X(h)^(-1) * y(h) for some index set h. The simplex algorithm is used to calculate the optimal h at time t+1 based on the solution at time t.

Parameters:

X (numpy.ndarray) – Design matrix for the linear quantile regression problem
IX (numpy.ndarray) – Index set referring to columns of X which is the design matrix
Iy (int) – Index referring to response column in X
Iex (int) – Index referring to grouping variable column in X
r (numpy.ndarray) – Residuals from initial solution
beta (numpy.ndarray) – Initial solution coefficients
n (int) – Number of elements in r
tau (float) – Required probability
bins (numpy.ndarray) – Vector defining partition intervals
n_in_bin (int) – Number of elements per bin

Returns:

(N, BETA, GAIN, Ld, Rny, Mx, Re, CON1, T), where:

- N: Number of simplex steps
- BETA: Solution matrix
- GAIN: Loss function gain
- Ld: Number of descent directions
- Rny: One-step-ahead prediction residuals
- Mx: Minimum constraint solution
- Re: Training set reliability
- CON1: Condition numbers
- T: Computation times

Return type:

tuple
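
The relation beta = X(h)^(-1) * y(h) says that the fitted coefficients interpolate the observations in the index set h exactly, so their residuals are zero. The sketch below illustrates only this relation for a two-parameter model; it does not implement the simplex update, and the names `solve_basis` and `residuals` are hypothetical.

```python
def solve_basis(X, y, h):
    """Solve beta = X(h)^(-1) y(h) for a 2-parameter model via Cramer's rule."""
    (a, b), (c, d) = X[h[0]], X[h[1]]
    det = a * d - b * c
    beta0 = (y[h[0]] * d - b * y[h[1]]) / det
    beta1 = (a * y[h[1]] - c * y[h[0]]) / det
    return beta0, beta1


def residuals(X, y, beta):
    """Residuals r = y - X * beta for every observation."""
    return [yi - (row[0] * beta[0] + row[1] * beta[1]) for row, yi in zip(X, y)]


# Intercept-plus-slope design matrix with four observations.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1.0, 2.5, 2.0, 4.0]
beta = solve_basis(X, y, h=(0, 3))
r = residuals(X, y, beta)
```

Here the basis h = (0, 3) gives beta = (1, 1), so the residuals at indices 0 and 3 vanish while the others are 0.5 and -1. A quantile regression solver searches for the basis h whose residual signs match the required probability tau.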

nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]

Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data if both are not provided.

Parameters:

n_samples (int, optional) – Number of time steps to simulate if no data provided, by default 2000.
phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.
sigma (float, optional) – Standard deviation of noise for simulation, by default 8.
offset_start (int, optional) – Start value for offset range, by default 10.
offset_end (int, optional) – End value for offset range, by default 500.
offset_step (int, optional) – Step size for offset range, by default 15.
correlation (float, optional) – Base correlation between dimensions, by default 0.8.
data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.
training_size (float, optional) – Proportion of data to use for training, by default 0.7.
epochs (int, optional) – Number of epochs for model training, by default 20.
timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].
quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].
X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.
actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.
simulation_type (str, optional) – Type of simulation to use, either “ar1” or “sde”, by default “sde”. The “sde” option is more advanced: it uses an SDE model and produces more realistic data.
visualize (bool, optional) – Determines if any visual elements will be plotted to the screen or saved as figures, by default True.
taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.
save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.

Returns:

A tuple containing:

corrected_ensembles: pd.DataFrame
The corrected ensemble predictions.
taqr_results: list of numpy.ndarray
The TAQR results.
actuals_output: list of numpy.ndarray
The actual output values.
BETA_output: list of numpy.ndarray
The BETA parameters.
scores: pd.DataFrame
The scores for the predictions and original/corrected ensembles.

Return type:

tuple

Raises:

ValueError – If the user opts not to simulate data when both X and actuals are missing.

nabqr.set_n_closest_to_zero(arr, n)[source]

Set the n elements closest to zero in an array to zero.

Parameters:

arr (array-like) – Input array of numbers
n (int) – Number of elements closest to zero to set to zero

Returns:

Modified array with n elements closest to zero set to zero

Return type:

numpy.ndarray

nabqr.set_n_smallest_to_zero(arr, n)[source]

Set the n smallest elements in an array to zero.

Parameters:

arr (array-like) – Input array of numbers
n (int) – Number of smallest elements to set to zero

Returns:

Modified array with n smallest elements set to zero

Return type:

numpy.ndarray

nabqr.simulate_correlated_ar1_process(n, phi, sigma, m, corr_matrix=None, offset=None, smooth='no')[source]

Simulate a correlated AR(1) process with multiple dimensions.

Parameters:

n (int) – Number of time steps to simulate
phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)
sigma (float) – Standard deviation of the noise
m (int) – Number of dimensions/variables
corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.
offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector
smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”

Returns:

(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise

Return type:

tuple

nabqr.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.

Parameters:

x (numpy.ndarray) – Ensemble forecast (m x k)
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter, by default 0.5
t1 (int, optional) – Start hour (inclusive), by default 12
t2 (int, optional) – End hour (exclusive), by default 36

Returns:

(score, score_list) Overall score and list of individual scores

Return type:

tuple

nabqr.variogram_score_single_observation(x, y, p=0.5)[source]

Calculate the Variogram score for a given observation.

Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.

Parameters:

x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon
y (numpy.ndarray) – Actual observations (k,)
p (float, optional) – Power parameter for the variogram score, by default 0.5

Returns:

Variogram score for the observation

Return type:

float

nabqr.visualize_results(y_hat, q_hat, ylabel)[source]

Create a visualization of prediction intervals with actual values.

Parameters:

y_hat (numpy.ndarray) – Actual observed values
q_hat (numpy.ndarray) – Predicted quantiles for different probability levels
ylabel (str) – Label for the y-axis

Returns:

Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it

Return type:

None

Notes

Creates a filled plot showing prediction intervals using a blue gradient
Overlays actual values as a black line
Automatically adjusts x-axis date formatting