API Documentation

Main Pipeline

nabqr.nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]

Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data when they are not provided.

Parameters:
  • n_samples (int, optional) – Number of time steps to simulate if no data provided, by default 2000.

  • phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.

  • sigma (float, optional) – Standard deviation of noise for simulation, by default 8.

  • offset_start (int, optional) – Start value for offset range, by default 10.

  • offset_end (int, optional) – End value for offset range, by default 500.

  • offset_step (int, optional) – Step size for offset range, by default 15.

  • correlation (float, optional) – Base correlation between dimensions, by default 0.8.

  • data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.

  • training_size (float, optional) – Proportion of data to use for training, by default 0.7.

  • epochs (int, optional) – Number of epochs for model training, by default 20.

  • timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].

  • quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].

  • X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.

  • actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.

  • simulation_type (str, optional) – Type of simulation to use, by default “sde”. “sde” uses a more advanced, more realistic SDE model; “ar1” uses a simpler AR(1) simulation.

  • visualize (bool, optional) – Determines if any visual elements will be plotted to the screen or saved as figures, by default True.

  • taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.

  • save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.

Returns:

A tuple containing:

  • corrected_ensembles: pd.DataFrame

    The corrected ensemble predictions.

  • taqr_results: list of numpy.ndarray

    The TAQR results.

  • actuals_output: list of numpy.ndarray

    The actual output values.

  • BETA_output: list of numpy.ndarray

    The BETA parameters.

  • scores: pd.DataFrame

    The scores for the predictions and original/corrected ensembles.

Return type:

tuple

Raises:

ValueError – If user opts not to simulate data when both X and actuals are missing.
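A minimal usage sketch, assuming nabqr, NumPy, and the R tooling the pipeline relies on are installed. The helper names make_toy_inputs and run_example are hypothetical, and the toy sizes and noise level are arbitrary choices rather than recommended settings. Passing both X and actuals avoids the simulation prompt described above.

```python
import math
import random


def make_toy_inputs(n_steps=200, n_members=5, seed=0):
    """Build a toy ensemble matrix (n_steps x n_members) and matching observations."""
    rng = random.Random(seed)
    # A daily-cycle signal in [0.1, 0.9] stands in for normalised wind power.
    actuals = [0.5 + 0.4 * math.sin(2 * math.pi * t / 24) for t in range(n_steps)]
    # Each ensemble member is the signal plus independent Gaussian noise.
    X = [[a + rng.gauss(0.0, 0.05) for _ in range(n_members)] for a in actuals]
    return X, actuals


def run_example():
    """Call the documented entry point with user-supplied data (not run on import)."""
    import numpy as np  # imported lazily so the helper above stays dependency-free
    import nabqr

    X, actuals = make_toy_inputs()
    return nabqr.run_nabqr_pipeline(
        X=np.array(X),
        actuals=np.array(actuals),
        data_source="toy-example",
        training_size=0.7,
        epochs=20,
        quantiles=[0.1, 0.5, 0.9],
        visualize=False,
    )
```

According to the Returns section, the result unpacks as corrected_ensembles, taqr_results, actuals_output, BETA_output, scores.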

Core Functions

Neural Adaptive Basis Quantile Regression (NABQR) Core Functions

This module provides the core functionality for NABQR.

This module includes:

  • Scoring metrics (Variogram, CRPS, QSS)

  • Dataset creation and preprocessing

  • Model definitions and training

  • TAQR (Time-Adaptive Quantile Regression) implementation

class nabqr.functions.QuantileRegressionLSTM(*args, **kwargs)[source]

Bases: Model

LSTM-based model for quantile regression. Input: x -> LSTM -> Dense -> Dense -> output

Parameters:
  • n_quantiles (int) – Number of quantiles to predict

  • units (int) – Number of LSTM units

  • n_timesteps (int) – Number of time steps in input

call(inputs, training=None)[source]

Forward pass of the model.

Parameters:
  • inputs (tensorflow.Tensor) – Input tensor

  • training (bool, optional) – Whether in training mode, by default None

Returns:

Model output

Return type:

tensorflow.Tensor

classmethod from_config(config)[source]

Create model from configuration.

Parameters:

config (dict) – Model configuration

Returns:

Model instance

Return type:

QuantileRegressionLSTM

get_config()[source]

Get model configuration.

Returns:

Model configuration

Return type:

dict

nabqr.functions.calculate_crps(actuals, corrected_ensembles)[source]

Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.

Parameters:
  • actuals (numpy.ndarray) – Actual observations

  • corrected_ensembles (numpy.ndarray) – Ensemble forecasts

Returns:

Mean CRPS score

Return type:

float
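For orientation, the ensemble CRPS can be sketched in its standard form, E|X - y| - 0.5 E|X - X'|, averaged over time steps. This is a dependency-free illustration of the score, not the properscoring implementation that calculate_crps calls; the function names below are hypothetical.

```python
def crps_ensemble_single(observation, members):
    """CRPS of one ensemble against one observation (empirical-distribution form)."""
    m = len(members)
    # Mean absolute distance between members and the observation.
    term_obs = sum(abs(x - observation) for x in members) / m
    # Mean absolute distance between all ordered pairs of members.
    term_spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return term_obs - 0.5 * term_spread


def mean_crps(actuals, ensembles):
    """Average CRPS, where ensembles[t] holds the members for time step t."""
    scores = [crps_ensemble_single(y, ens) for y, ens in zip(actuals, ensembles)]
    return sum(scores) / len(scores)
```

Lower values are better; a single-member ensemble reduces to the absolute error.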

nabqr.functions.calculate_qss(actuals, taqr_results, quantiles)[source]

Calculate the Quantile Skill Score (QSS).

Parameters:
  • actuals (numpy.ndarray) – Actual observations

  • taqr_results (numpy.ndarray) – TAQR ensemble forecasts

  • quantiles (array-like) – Quantile levels to evaluate

Returns:

Quantile Skill Score

Return type:

float

nabqr.functions.calculate_scores(actuals, taqr_results, raw_ensembles, corrected_ensembles, quantiles_taqr, data_source, plot_reliability=True, visualize=True)[source]

Calculate Variogram, CRPS, QSS and MAE for the predictions and corrected ensembles.

Parameters:
  • actuals (numpy.ndarray) – The actual values

  • taqr_results (numpy.ndarray) – The predicted values (TAQR results)

  • raw_ensembles (numpy.ndarray) – The raw ensembles

  • corrected_ensembles (numpy.ndarray) – The corrected ensembles

  • quantiles_taqr (list) – The quantiles to calculate the scores for

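  • data_source (str) – The data source

  • plot_reliability (bool, optional) – Whether to produce the reliability plot, by default True

  • visualize (bool, optional) – Whether to produce visual output, by default True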

nabqr.functions.create_dataset_for_lstm(X, Y, time_steps)[source]

Create a dataset suitable for LSTM training with multiple time steps (i.e. lags).

Parameters:
  • X (numpy.ndarray) – Input features

  • Y (numpy.ndarray) – Target values

  • time_steps (list) – List of time steps to include

Returns:

(X_lstm, Y_lstm) LSTM-ready datasets

Return type:

tuple
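One plausible reading of "multiple time steps (i.e. lags)" is sketched below: for each time t, the feature rows at t - lag are stacked for every lag in time_steps, and the first max(time_steps) rows are dropped because their lags are unavailable. The exact alignment used by create_dataset_for_lstm is an assumption here, and make_lagged_dataset is a hypothetical name.

```python
def make_lagged_dataset(X, Y, time_steps):
    """Stack lagged feature rows: output[t] = [X[t - lag] for lag in time_steps]."""
    max_lag = max(time_steps)
    X_lagged, Y_aligned = [], []
    # Start at max_lag so that every requested lag exists.
    for t in range(max_lag, len(X)):
        X_lagged.append([X[t - lag] for lag in time_steps])
        Y_aligned.append(Y[t])
    return X_lagged, Y_aligned
```

With n input rows, the result has n - max(time_steps) samples, each of shape (len(time_steps), n_features).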

nabqr.functions.legend_without_duplicate_labels(ax)[source]

Create a legend without duplicate labels. Primarily used for ensemble plots.

Parameters:

ax (matplotlib.axes.Axes) – Axes object to create legend for

nabqr.functions.map_range(values, input_start, input_end, output_start, output_end)[source]

Map values from one range to another.

Parameters:
  • values (list) – Values to map

  • input_start (float) – Start of input range

  • input_end (float) – End of input range

  • output_start (float) – Start of output range

  • output_end (float) – End of output range

Returns:

Mapped values

Return type:

numpy.ndarray
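The mapping is a linear rescaling. A minimal sketch with plain lists follows; map_range_sketch is a hypothetical stand-in and returns a list rather than a numpy.ndarray.

```python
def map_range_sketch(values, input_start, input_end, output_start, output_end):
    """Linearly map values from [input_start, input_end] to [output_start, output_end]."""
    scale = (output_end - output_start) / (input_end - input_start)
    return [output_start + (v - input_start) * scale for v in values]
```

For example, mapping [0, 5, 10] from the range 0..10 onto 0..1 gives [0.0, 0.5, 1.0].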

nabqr.functions.multi_quantile_skill_score(y_true, y_pred, quantiles)[source]

Calculate the Quantile Skill Score (QSS) for multiple quantile forecasts.

Parameters:
  • y_true (numpy.ndarray) – True observed values

  • y_pred (numpy.ndarray) – Predicted quantile values

  • quantiles (list) – Quantile levels between 0 and 1

Returns:

QSS for each quantile forecast

Return type:

numpy.ndarray

nabqr.functions.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000, print_output=True)[source]

Perform one-step quantile prediction using TAQR.

This function takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.

An easy wrapper function to run TAQR.

Parameters:
  • X_input (numpy.ndarray or pd.DataFrame) – Input features

  • Y_input (numpy.ndarray or pd.Series) – Target values

  • n_init (int) – Number of initial observations for warm start

  • n_full (int) – Total number of observations to process

  • quantile (float, optional) – Quantile level for prediction, by default 0.5

  • already_correct_size (bool, optional) – Whether input data is already correctly sized, by default False

  • n_in_X (int, optional) – Number of observations to include in design matrix, by default 5000

Returns:

(predictions, actual values, coefficients)

Return type:

tuple

nabqr.functions.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]

Main pipeline for NABQR model training and evaluation.

The pipeline:

  1. Trains an LSTM network to correct the provided ensembles

  2. Runs TAQR algorithm on corrected ensembles to predict observations

  3. Saves results and model artifacts

Parameters:
  • X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data

  • y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations

  • name (str, optional) – Dataset identifier, by default “TEST”

  • training_size (float, optional) – Fraction of data to use for training, by default 0.8

  • epochs (int, optional) – Number of training epochs, by default 100

  • timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]

  • **kwargs (dict) – Additional keyword arguments

Returns:

A tuple containing:

  • corrected_ensembles: pd.DataFrame

    The corrected ensemble predictions.

  • taqr_results: list of numpy.ndarray

    The TAQR results.

  • actuals_output: list of numpy.ndarray

    The actual output values.

  • BETA_output: list of numpy.ndarray

    The BETA parameters.

Return type:

tuple

nabqr.functions.quantile_loss_3(q, y_true, y_pred)[source]

Calculate quantile loss for a single quantile.

Parameters:
  • q (float) – Quantile level

  • y_true (tensorflow.Tensor) – True values

  • y_pred (tensorflow.Tensor) – Predicted values

Returns:

Quantile loss value

Return type:

tensorflow.Tensor
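The quantile (pinball) loss for a single level q is conventionally max(q * e, (q - 1) * e) with e = y_true - y_pred. The scalar sketch below illustrates that formula without TensorFlow; whether quantile_loss_3 reduces over a batch is not stated here, so no reduction is shown. pinball_loss is a hypothetical name.

```python
def pinball_loss(q, y_true, y_pred):
    """Quantile (pinball) loss for one observation at quantile level q."""
    error = y_true - y_pred
    # Under-prediction (error > 0) is weighted by q, over-prediction by 1 - q.
    return max(q * error, (q - 1.0) * error)
```

At q = 0.9, predicting 8 when the truth is 10 costs 1.8, while predicting 12 costs only 0.2, which pushes the fit upward toward the 90th percentile.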

nabqr.functions.quantile_loss_func(quantiles)[source]

Create a loss function for multiple quantiles.

Parameters:

quantiles (list) – List of quantile levels

Returns:

Loss function for multiple quantiles

Return type:

function
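The factory pattern described here can be sketched as a closure that captures the quantile levels and returns a loss taking one prediction per level. Averaging across quantiles is an assumption (the library may sum instead), and make_multi_quantile_loss is a hypothetical name.

```python
def make_multi_quantile_loss(quantiles):
    """Return a loss(y_true, y_pred) where y_pred holds one value per quantile."""

    def loss(y_true, y_pred):
        total = 0.0
        for q, pred in zip(quantiles, y_pred):
            error = y_true - pred
            # Pinball loss for this quantile level.
            total += max(q * error, (q - 1.0) * error)
        # Assumption: average over quantile levels.
        return total / len(quantiles)

    return loss
```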

nabqr.functions.reliability_func(quantile_forecasts, corrected_ensembles, ensembles, actuals, corrected_taqr_quantiles, data_source, plot_reliability=True)[source]

nabqr.functions.remove_straight_line_outliers(ensembles)[source]

Remove ensemble members that are perfectly straight lines (constant slope). Explanation: Sometimes the output from the LSTM is a straight line, which is not useful for the ensemble.

Parameters:

ensembles (numpy.ndarray) – 2D array where rows are time steps and columns are ensemble members

Returns:

Filtered ensemble data without straight-line outliers

Return type:

numpy.ndarray
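A straight line has constant first differences, so one way to detect such members is to compare each column's successive differences. The sketch below uses nested lists and a tolerance tol, both of which are assumptions for illustration; it is not the library's implementation, and drop_straight_line_members is a hypothetical name.

```python
def drop_straight_line_members(ensembles, tol=1e-12):
    """Drop columns (members) whose first differences are all equal."""
    n_steps = len(ensembles)
    n_members = len(ensembles[0])
    keep = []
    for j in range(n_members):
        column = [ensembles[t][j] for t in range(n_steps)]
        diffs = [b - a for a, b in zip(column, column[1:])]
        # Constant slope (including slope zero) means a straight line.
        is_straight = all(abs(d - diffs[0]) <= tol for d in diffs)
        if not is_straight:
            keep.append(j)
    return [[row[j] for j in keep] for row in ensembles]
```

Note that constant members count as straight lines with slope zero, and that at least three time steps are needed for the check to be meaningful.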

nabqr.functions.remove_zero_columns(df)[source]

Wrapper function to remove columns that contain only zeros from a DataFrame.

Parameters:

df (pandas.DataFrame) – Input DataFrame

Returns:

DataFrame with zero columns removed

Return type:

pandas.DataFrame

nabqr.functions.remove_zero_columns_numpy(arr)[source]

Remove columns that contain only zeros or constant values from a numpy array.

Parameters:

arr (numpy.ndarray) – Input array

Returns:

Array with zero/constant columns removed

Return type:

numpy.ndarray

nabqr.functions.run_r_script(X_filename, Y_filename, tau)[source]

Run R script for quantile regression.

Parameters:
  • X_filename (str) – Path to X data CSV file

  • Y_filename (str) – Path to Y data CSV file

  • tau (float) – Quantile level

nabqr.functions.run_taqr(corrected_ensembles, actuals, quantiles, n_init, n_full, n_in_X)[source]

Wrapper function to run TAQR on corrected ensembles.

Parameters:
  • corrected_ensembles (numpy.ndarray) – Shape (n_timesteps, n_ensembles)

  • actuals (numpy.ndarray) – Shape (n_timesteps,)

  • quantiles (list) – Quantiles to predict

  • n_init (int) – Number of initial timesteps for warm start

  • n_full (int) – Total number of timesteps

  • n_in_X (int) – Number of timesteps in design matrix

Returns:

TAQR results for each quantile

Return type:

list

nabqr.functions.train_model_lstm(quantiles, epochs: int, lr: float, batch_size: int, x, y, x_val, y_val, n_timesteps, data_name)[source]

Train LSTM model for quantile regression. The @tf.function decorator is used to speed up the training process.

Parameters:
  • quantiles (list) – List of quantile levels to predict

  • epochs (int) – Number of training epochs

  • lr (float) – Learning rate for optimizer

  • batch_size (int) – Batch size for training

  • x (tensor) – Training input data

  • y (tensor) – Training target data

  • x_val (tensor) – Validation input data

  • y_val (tensor) – Validation target data

  • n_timesteps (int) – Number of time steps in input sequence

  • data_name (str) – Name identifier for saving model artifacts

Returns:

Trained LSTM model

Return type:

tf.keras.Model

nabqr.functions.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.

Parameters:
  • x (numpy.ndarray) – Ensemble forecast (m x k)

  • y (numpy.ndarray) – Actual observations (k,)

  • p (float, optional) – Power parameter, by default 0.5

  • t1 (int, optional) – Start hour (inclusive), by default 12

  • t2 (int, optional) – End hour (exclusive), by default 36

Returns:

(score, score_list) Overall score and list of individual scores

Return type:

tuple

nabqr.functions.variogram_score_R_v2(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the paper in Energy and AI, “An introduction to multivariate probabilistic forecast evaluation”. Assumes that x and y start from day 0, 00:00.

Parameters:
  • x (array) – Ensemble forecast (m x k), where m is the size of the ensemble, and k is the maximal forecast horizon.

  • y (array) – Actual observations (k,)

  • p (float) – Power parameter for the variogram score.

  • t1 (int) – Start of the hour range for comparison (inclusive).

  • t2 (int) – End of the hour range for comparison (exclusive).

Returns:

(score, score_list) Overall score/100_000 and list of individual VarS contributions

Return type:

tuple

nabqr.functions.variogram_score_single_observation(x, y, p=0.5)[source]

Calculate the Variogram score for a given observation.

Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.

Parameters:
  • x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon

  • y (numpy.ndarray) – Actual observations (k,)

  • p (float, optional) – Power parameter for the variogram score, by default 0.5

Returns:

Variogram score for the observation

Return type:

float
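The variogram score of order p compares pairwise differences of the observation with the ensemble-mean pairwise differences. The sketch below uses unit weights and sums over all ordered pairs (i, j); both choices are assumptions and may differ from the translated R code. variogram_score is a hypothetical name.

```python
def variogram_score(x, y, p=0.5):
    """Unweighted variogram score; x is m members by k horizons, y has length k."""
    m = len(x)
    k = len(y)
    score = 0.0
    for i in range(k):
        for j in range(k):
            obs_diff = abs(y[i] - y[j]) ** p
            # Ensemble estimate of E|X_i - X_j|^p.
            ens_diff = sum(abs(member[i] - member[j]) ** p for member in x) / m
            score += (obs_diff - ens_diff) ** 2
    return score
```

An ensemble whose members all equal the observation scores exactly zero.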

TAQR Implementation

Helper Functions

nabqr.helper_functions.build_ar1_covariance(n, rho, sigma=1.0)[source]

Build the AR(1) covariance matrix for an n-dimensional process.

Parameters:
  • n (int) – Dimension of the covariance matrix.

  • rho (float) – AR(1) correlation parameter (the AR coefficient).

  • sigma (float, optional) – Standard deviation of the noise (innovation), defaults to 1.0.

Returns:

The AR(1) covariance matrix of shape (n, n), with elements sigma^2 * rho^(|i-j|).

Return type:

numpy.ndarray
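The stated element formula, sigma^2 * rho^(|i-j|), can be written out directly. This sketch returns nested lists instead of a numpy.ndarray, and ar1_covariance is a hypothetical stand-in for the documented function.

```python
def ar1_covariance(n, rho, sigma=1.0):
    """n x n matrix with entries sigma^2 * rho^|i - j|."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]
```

For n=3, rho=0.5, sigma=2.0 this gives rows [4, 2, 1], [2, 4, 2], [1, 2, 4].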

nabqr.helper_functions.generate_ou_ensembles(X: ndarray, kappa: float, sigma: float, chunk_size: int = 24, n_ensembles: int = 50) → ndarray[source]

Generate continuous Ornstein-Uhlenbeck (OU) ensemble paths that revert to the given reference series X[t], in chunk_size increments, but also simulate ‘extra’ future steps to account for OU lag and shift them back so the paths better align with X in real-time.

The ensemble is clipped to remain within [0,1].

Parameters:
  • X (np.ndarray) – Reference series of length T that serves as the time-varying mean for each OU path.

  • kappa (float) – Mean-reversion speed for the OU process. The characteristic lag ~ 1/kappa.

  • sigma (float) – Diffusion (volatility) parameter.

  • chunk_size (int, optional) – Size of each chunk in timesteps. Defaults to 24.

  • n_ensembles (int, optional) – Number of ensemble paths to generate. Defaults to 50.

Returns:

Y_corrected – The lag-corrected OU ensemble paths, each of length T.

Return type:

np.ndarray, shape (T, n_ensembles)

Notes

  • We break the timeline [0..T-1] into blocks of chunk_size steps. At chunk boundaries, each ensemble path is continuous, meaning that the new chunk starts where the old chunk ended.

  • We simulate extra steps (about 1/kappa) at the end, then shift the entire simulation backward by ~1/kappa to reduce the effective lag in real time.

  • For fractional lag, a simple linear interpolation is applied.

  • This “lag correction” is heuristic but often aligns the OU paths with X(t) more tightly when the reversion is slow.

nabqr.helper_functions.get_parameter_bounds() → Dict[str, Tuple[float, float]][source]

Define bounds for all parameters for SDE simulation. Used to ensure that the parameters are within a reasonable range.

nabqr.helper_functions.quantile_score(p, z, q)[source]

Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.

Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”

Parameters:
  • p (float) – The probability level (between 0 and 1)

  • z (numpy.ndarray) – The observed values

  • q (numpy.ndarray) – The predicted quantiles

Returns:

The Quantile Score (QS)

Return type:

float
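A quantile score of this kind is commonly defined as the mean check-function loss of z - q at level p. The sketch below follows that definition; taking the mean over observations is an assumption about how the library reduces the result to a float, and quantile_score_sketch is a hypothetical name.

```python
def quantile_score_sketch(p, z, q):
    """Mean check loss rho_p(z - q) over paired observations and predicted quantiles."""
    total = 0.0
    for obs, pred in zip(z, q):
        u = obs - pred
        # rho_p(u) = p * u for u >= 0, and (p - 1) * u otherwise.
        total += p * u if u >= 0 else (p - 1.0) * u
    return total / len(z)
```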

nabqr.helper_functions.set_n_closest_to_zero(arr, n)[source]

Set the n elements closest to zero in an array to zero.

Parameters:
  • arr (array-like) – Input array of numbers

  • n (int) – Number of elements closest to zero to set to zero

Returns:

Modified array with n elements closest to zero set to zero

Return type:

numpy.ndarray
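The behaviour can be sketched by ranking indices by absolute value. Tie-breaking by original position and returning a list are assumptions of this illustration; zero_n_closest is a hypothetical name.

```python
def zero_n_closest(arr, n):
    """Return a copy of arr with the n entries of smallest magnitude set to zero."""
    order = sorted(range(len(arr)), key=lambda i: abs(arr[i]))
    result = list(arr)
    for i in order[:n]:
        result[i] = 0
    return result
```

For [3, -1, 0.5, -4] with n=2, the entries 0.5 and -1 are zeroed.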

nabqr.helper_functions.set_n_smallest_to_zero(arr, n)[source]

Set the n smallest elements in an array to zero.

Parameters:
  • arr (array-like) – Input array of numbers

  • n (int) – Number of smallest elements to set to zero

Returns:

Modified array with n smallest elements set to zero

Return type:

numpy.ndarray
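Here "smallest" is read as smallest signed value rather than smallest magnitude, which is an interpretation of the description. Tie-breaking by position is also assumed, and zero_n_smallest is a hypothetical name.

```python
def zero_n_smallest(arr, n):
    """Return a copy of arr with the n smallest (most negative) entries set to zero."""
    order = sorted(range(len(arr)), key=lambda i: arr[i])
    result = list(arr)
    for i in order[:n]:
        result[i] = 0
    return result
```

For [3, -1, 0.5, -4] with n=2, the entries -4 and -1 are zeroed, while 0.5 is kept.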

nabqr.helper_functions.simulate_correlated_ar1_process(n, phi, sigma, m, corr_matrix=None, offset=None, smooth='no')[source]

Simulate a correlated AR(1) process with multiple dimensions.

Parameters:
  • n (int) – Number of time steps to simulate

  • phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)

  • sigma (float) – Standard deviation of the noise

  • m (int) – Number of dimensions/variables

  • corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.

  • offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector

  • smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”

Returns:

(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise

Return type:

tuple
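The underlying recursion is x[t] = phi * x[t-1] + noise. The sketch below simulates a single, uncorrelated path only; the cross-dimension correlation, the offsets, and the construction of actuals are left out. The starting value of 0.0, the seed argument, and the name simulate_ar1 are assumptions for illustration.

```python
import random


def simulate_ar1(n, phi, sigma, seed=0):
    """One AR(1) path of length n with Gaussian innovations of std sigma."""
    rng = random.Random(seed)
    x = [0.0]  # assumed starting value
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x
```

With phi close to 1, such as the pipeline default of 0.995, the path is highly persistent and wanders slowly.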

nabqr.helper_functions.simulate_wind_power_sde(params: Dict[str, float], T: float = 500, dt: float = 1.0) → Tuple[ndarray, ndarray][source]

Simulate wind power production using an Ornstein-Uhlenbeck process with GARCH volatility and jumps of normally distributed sizes driven by a Poisson process. The mean reversion is state-dependent with a repelling mechanism near 1.0 (upper boundary), and the diffusion term vanishes at the boundaries to avoid unphysical values outside [0, 1].

A few additional tweaks include:

  • GARCH volatility that captures ‘vol_shock’ from recent values.

  • Repellent forces that strengthen near 1.0, reducing both the drift and diffusion.

  • Jumps that can persist over multiple steps, and become more negative if values are near 1.0.

Parameters:
  • params (Dict[str, float]) – A dictionary containing all model parameters:

    • X0 (float) – Initial wind power production level in [0, 1].

    • theta (float) – Long-term mean level; typically in [0, 1].

    • kappa (float) – Mean reversion speed (absolute value is used).

    • sigma_base (float) – Base volatility level (absolute value is used).

    • alpha (float) – ARCH parameter (absolute value is used).

    • beta (float) – GARCH parameter; must be in [0, 1].

    • lambda_jump (float) – Intensity of jump arrivals in the Poisson process (absolute value is used).

    • jump_mu (float) – Mean jump size (can be positive or negative).

    • jump_sigma (float) – Standard deviation of jump sizes (absolute value is used).

  • T (float, optional) – The end time of the simulation (total number of steps is T/dt). Default is 500.

  • dt (float, optional) – The size of each time step. Default is 1.0.

Returns:

  • t (np.ndarray) – Array of time points of length N = int(T/dt).

  • X (np.ndarray) – Simulated wind power production values of length N, clipped to the interval [0, 1].

Notes

  • The drift term implements a state-dependent mean reversion that weakens near 1.0 and introduces a strong downward force very close to 1.0.

  • The diffusion term is modified as (X_t * (1 - X_t)) * (X_t / (X_t + 0.5)) dB_t, ensuring it decreases to zero when X_t is near 0 or 1.

  • GARCH effects are included to model changing volatility based on recent shocks in the process.

  • Jumps arrive according to a Poisson process with random normal magnitudes, and can persist over multiple time steps with some decay.
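To illustrate the boundary behaviour in the notes above, here is a stripped-down Euler-Maruyama sketch that keeps only plain mean reversion and the state-dependent diffusion factor (X_t * (1 - X_t)) * (X_t / (X_t + 0.5)). GARCH volatility, jumps, and the repelling drift near 1.0 are omitted, and scaling the factor by a constant sigma is an assumption. The names diffusion_scale and simulate_bounded_ou are hypothetical.

```python
import math
import random


def diffusion_scale(x):
    """State-dependent factor from the notes; zero at both x = 0 and x = 1."""
    return x * (1.0 - x) * (x / (x + 0.5))


def simulate_bounded_ou(x0, theta, kappa, sigma, T=100, dt=1.0, seed=0):
    """Euler-Maruyama path of a mean-reverting process clipped to [0, 1]."""
    rng = random.Random(seed)
    n_steps = int(T / dt)
    path = [x0]
    for _ in range(n_steps - 1):
        x = path[-1]
        drift = kappa * (theta - x) * dt
        noise = sigma * diffusion_scale(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Clip as a final safeguard, mirroring the documented [0, 1] range.
        path.append(min(1.0, max(0.0, x + drift + noise)))
    return path
```

Because the factor vanishes at 0 and 1, random shocks shrink as the path approaches either boundary.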

Examples

>>> params = {
...     'X0': 0.5, 'theta': 0.7, 'kappa': 1.0, 'sigma_base': 0.1,
...     'alpha': 0.2, 'beta': 0.5, 'lambda_jump': 0.05,
...     'jump_mu': 0.0, 'jump_sigma': 0.02
... }
>>> t, X = simulate_wind_power_sde(params, T=100, dt=1.0)
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, X)
>>> plt.show()

Visualization

nabqr.visualization.visualize_results(y_hat, q_hat, ylabel)[source]

Create a visualization of prediction intervals with actual values.

Parameters:
  • y_hat (numpy.ndarray) – Actual observed values

  • q_hat (numpy.ndarray) – Predicted quantiles for different probability levels

  • ylabel (str) – Label for the y-axis

Returns:

Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it

Return type:

None

Notes

  • Creates a filled plot showing prediction intervals using a blue gradient

  • Overlays actual values as a black line

  • Automatically adjusts x-axis date formatting

Package Contents

NABQR: Neural Adaptive Basis Quantile Regression

A method for sequential error-corrections tailored for wind power forecasting in Denmark.

nabqr.calculate_crps(actuals, corrected_ensembles)[source]

Calculate the Continuous Ranked Probability Score (CRPS) using the properscoring package. If the ensembles do not have the correct dimensions, we transpose them.

Parameters:
  • actuals (numpy.ndarray) – Actual observations

  • corrected_ensembles (numpy.ndarray) – Ensemble forecasts

Returns:

Mean CRPS score

Return type:

float

nabqr.calculate_qss(actuals, taqr_results, quantiles)[source]

Calculate the Quantile Skill Score (QSS).

Parameters:
  • actuals (numpy.ndarray) – Actual observations

  • taqr_results (numpy.ndarray) – TAQR ensemble forecasts

  • quantiles (array-like) – Quantile levels to evaluate

Returns:

Quantile Skill Score

Return type:

float

nabqr.one_step_quantile_prediction(X_input, Y_input, n_init, n_full, quantile=0.5, already_correct_size=False, n_in_X=5000)[source]

Perform one-step quantile prediction using TAQR.

Takes the entire training set and, based on the last n_init observations, calculates residuals and coefficients for the quantile regression.

Parameters:
  • X_input (numpy.ndarray) – Input features matrix

  • Y_input (numpy.ndarray) – Target values array

  • n_init (int) – Number of initial observations for training

  • n_full (int) – Total number of observations to use

  • quantile (float, optional) – Quantile level to predict, by default 0.5

  • already_correct_size (bool, optional) – Whether inputs are already correctly sized, by default False

  • n_in_X (int, optional) – Number of observations to use in X, by default 5000

Returns:

(y_pred, y_actual, BETA) Predictions, actual values, and coefficients

Return type:

tuple

nabqr.pipeline(X, y, name='TEST', training_size=0.8, epochs=100, timesteps_for_lstm=[0, 1, 2, 6, 12, 24, 48], **kwargs)[source]

Main pipeline for NABQR model training and evaluation.

The pipeline:

  1. Trains an LSTM network to correct the provided ensembles

  2. Runs TAQR algorithm on corrected ensembles to predict observations

  3. Saves results and model artifacts

Parameters:
  • X (pd.DataFrame or numpy.ndarray) – Shape (n_samples, n_features) - Ensemble data

  • y (pd.Series or numpy.ndarray) – Shape (n_samples,) - Observations

  • name (str, optional) – Dataset identifier, by default “TEST”

  • training_size (float, optional) – Fraction of data to use for training, by default 0.8

  • epochs (int, optional) – Number of training epochs, by default 100

  • timesteps_for_lstm (list, optional) – Time steps to use for LSTM input, by default [0, 1, 2, 6, 12, 24, 48]

  • **kwargs (dict) – Additional keyword arguments

Returns:

A tuple containing:

  • corrected_ensembles: pd.DataFrame

    The corrected ensemble predictions.

  • taqr_results: list of numpy.ndarray

    The TAQR results.

  • actuals_output: list of numpy.ndarray

    The actual output values.

  • BETA_output: list of numpy.ndarray

    The BETA parameters.

Return type:

tuple

nabqr.quantile_score(p, z, q)[source]

Calculate the Quantile Score (QS) for a given probability and set of observations and quantiles.

Implementation based on Fauer et al. (2021): “Flexible and consistent quantile estimation for intensity–duration–frequency curves”

Parameters:
  • p (float) – The probability level (between 0 and 1)

  • z (numpy.ndarray) – The observed values

  • q (numpy.ndarray) – The predicted quantiles

Returns:

The Quantile Score (QS)

Return type:

float

nabqr.rq_simplex_final(X, IX, Iy, Iex, r, beta, n, tau, bins, n_in_bin)[source]

Calculate solution to an adaptive simplex algorithm for quantile regression.

The function uses knowledge of the solution at time t to calculate the solution at time t+1. The basic idea is that the solution to the quantile regression problem can be written as:

y(t) = X(t)' * beta + r(t)

where beta = X(h)^(-1) * y(h) for some index set h. The simplex algorithm is used to calculate the optimal h at time t+1 based on the solution at time t.

Parameters:
  • X (numpy.ndarray) – Design matrix for the linear quantile regression problem

  • IX (numpy.ndarray) – Index set referring to columns of X which is the design matrix

  • Iy (int) – Index referring to response column in X

  • Iex (int) – Index referring to grouping variable column in X

  • r (numpy.ndarray) – Residuals from initial solution

  • beta (numpy.ndarray) – Initial solution coefficients

  • n (int) – Number of elements in r

  • tau (float) – Required probability

  • bins (numpy.ndarray) – Vector defining partition intervals

  • n_in_bin (int) – Number of elements per bin

Returns:

(N, BETA, GAIN, Ld, Rny, Mx, Re, CON1, T)

  • N: Number of simplex steps

  • BETA: Solution matrix

  • GAIN: Loss function gain

  • Ld: Number of descent directions

  • Rny: One-step-ahead prediction residuals

  • Mx: Minimum constraint solution

  • Re: Training set reliability

  • CON1: Condition numbers

  • T: Computation times

Return type:

tuple
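The relation beta = X(h)^(-1) * y(h) used by rq_simplex_final can be illustrated on a tiny two-coefficient problem: pick an index set h with two rows, solve the 2x2 system exactly, and compute the residuals r = y - X * beta, which are zero at the rows in h. This sketch shows only that basis relation, not the adaptive simplex update or the choice of the optimal h; solve_2x2 and basis_solution are hypothetical helpers.

```python
def solve_2x2(A, b):
    """Solve A x = b for a 2x2 matrix A by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("basis rows are linearly dependent")
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - b[0] * a21) / det]


def basis_solution(X, y, h):
    """Coefficients interpolating the rows in h, plus residuals for every row."""
    beta = solve_2x2([X[h[0]], X[h[1]]], [y[h[0]], y[h[1]]])
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y)
    ]
    return beta, residuals
```

With X = [[1, 0], [1, 1], [1, 2], [1, 3]], y = [1, 3, 4, 7] and h = (0, 2), this gives beta = [1.0, 1.5] and residuals [0.0, 0.5, 0.0, 1.5]; the signs of the non-basis residuals are what a quantile regression solver weighs against tau.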


nabqr.run_nabqr_pipeline(n_samples=2000, phi=0.995, sigma=8, offset_start=10, offset_end=500, offset_step=15, correlation=0.8, data_source='NABQR-TEST', training_size=0.7, epochs=20, timesteps=[0, 1, 2, 6, 12, 24], quantiles=[0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99], X=None, actuals=None, simulation_type='sde', visualize=True, taqr_limit=5000, save_files=True)[source]

Run the complete NABQR pipeline, which may include data simulation, model training, and visualization. The user can either provide pre-computed inputs (X, actuals) or opt to simulate data when they are not provided.

Parameters:
  • n_samples (int, optional) – Number of time steps to simulate if no data provided, by default 2000.

  • phi (float, optional) – AR(1) coefficient for simulation, by default 0.995.

  • sigma (float, optional) – Standard deviation of noise for simulation, by default 8.

  • offset_start (int, optional) – Start value for offset range, by default 10.

  • offset_end (int, optional) – End value for offset range, by default 500.

  • offset_step (int, optional) – Step size for offset range, by default 15.

  • correlation (float, optional) – Base correlation between dimensions, by default 0.8.

  • data_source (str, optional) – Identifier for the data source, by default “NABQR-TEST”.

  • training_size (float, optional) – Proportion of data to use for training, by default 0.7.

  • epochs (int, optional) – Number of epochs for model training, by default 20.

  • timesteps (list, optional) – List of timesteps to use for LSTM, by default [0, 1, 2, 6, 12, 24].

  • quantiles (list, optional) – List of quantiles to predict, by default [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99].

  • X (array-like, optional) – Pre-computed input features. If not provided along with actuals, the function will prompt to simulate data.

  • actuals (array-like, optional) – Pre-computed actual target values. If not provided along with X, the function will prompt to simulate data.

  • simulation_type (str, optional) – Type of simulation to use, by default “sde”. “sde” uses a more advanced, more realistic SDE model; “ar1” uses a simpler AR(1) simulation.

  • visualize (bool, optional) – Determines if any visual elements will be plotted to the screen or saved as figures, by default True.

  • taqr_limit (int, optional) – The lookback limit for the TAQR model, by default 5000.

  • save_files (bool, optional) – Determines if any files will be saved, by default True. Note: the R-file needs to save some .csv files to run properly.

Returns:

A tuple containing:

  • corrected_ensembles: pd.DataFrame

    The corrected ensemble predictions.

  • taqr_results: list of numpy.ndarray

    The TAQR results.

  • actuals_output: list of numpy.ndarray

    The actual output values.

  • BETA_output: list of numpy.ndarray

    The BETA parameters.

  • scores: pd.DataFrame

    The scores for the predictions and original/corrected ensembles.

Return type:

tuple

Raises:

ValueError – If user opts not to simulate data when both X and actuals are missing.

nabqr.set_n_closest_to_zero(arr, n)[source]

Set the n elements closest to zero in an array to zero.

Parameters:
  • arr (array-like) – Input array of numbers

  • n (int) – Number of elements closest to zero to set to zero

Returns:

Modified array with n elements closest to zero set to zero

Return type:

numpy.ndarray

nabqr.set_n_smallest_to_zero(arr, n)[source]

Set the n smallest elements in an array to zero.

Parameters:
  • arr (array-like) – Input array of numbers

  • n (int) – Number of smallest elements to set to zero

Returns:

Modified array with n smallest elements set to zero

Return type:

numpy.ndarray

nabqr.simulate_correlated_ar1_process(n, phi, sigma, m, corr_matrix=None, offset=None, smooth='no')[source]

Simulate a correlated AR(1) process with multiple dimensions.

Parameters:
  • n (int) – Number of time steps to simulate

  • phi (float) – AR(1) coefficient (persistence parameter, often denoted rho)

  • sigma (float) – Standard deviation of the noise

  • m (int) – Number of dimensions/variables

  • corr_matrix (numpy.ndarray, optional) – Correlation (or covariance) matrix between dimensions. If None, an AR(1) covariance structure will be generated.

  • offset (numpy.ndarray, optional) – Offset vector for each dimension. Defaults to zero vector

  • smooth (int or str, optional) – Number of initial time steps to discard for smoothing. Defaults to “no”

Returns:

(simulated_ensembles, actuals) where simulated_ensembles is the AR(1) process and actuals is the median of ensembles with added noise

Return type:

tuple

nabqr.variogram_score_R_multivariate(x, y, p=0.5, t1=12, t2=36)[source]

Calculate the Variogram score for all observations for the time horizon t1 to t2. Modified from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al. Here we use t1 -> t2 as our forecast horizon.

Parameters:
  • x (numpy.ndarray) – Ensemble forecast (m x k)

  • y (numpy.ndarray) – Actual observations (k,)

  • p (float, optional) – Power parameter, by default 0.5

  • t1 (int, optional) – Start hour (inclusive), by default 12

  • t2 (int, optional) – End hour (exclusive), by default 36

Returns:

(score, score_list) Overall score and list of individual scores

Return type:

tuple

nabqr.variogram_score_single_observation(x, y, p=0.5)[source]

Calculate the Variogram score for a given observation.

Translated from the R code in Energy and AI paper: “An introduction to multivariate probabilistic forecast evaluation” by Mathias B.B. et al.

Parameters:
  • x (numpy.ndarray) – Ensemble forecast (m x k), where m is ensemble size, k is forecast horizon

  • y (numpy.ndarray) – Actual observations (k,)

  • p (float, optional) – Power parameter for the variogram score, by default 0.5

Returns:

Variogram score for the observation

Return type:

float

nabqr.visualize_results(y_hat, q_hat, ylabel)[source]

Create a visualization of prediction intervals with actual values.

Parameters:
  • y_hat (numpy.ndarray) – Actual observed values

  • q_hat (numpy.ndarray) – Predicted quantiles for different probability levels

  • ylabel (str) – Label for the y-axis

Returns:

Saves the plot as ‘TEST_NABQR_taqr_pi_plot.pdf’ and displays it

Return type:

None

Notes

  • Creates a filled plot showing prediction intervals using a blue gradient

  • Overlays actual values as a black line

  • Automatically adjusts x-axis date formatting