bayesflow.diagnostics module

bayesflow.diagnostics.plot_recovery(post_samples, prior_samples, point_agg=<function median>, uncertainty_agg=<function median_abs_deviation>, param_names=None, fig_size=None, label_fontsize=16, title_fontsize=18, metric_fontsize=16, tick_fontsize=12, add_corr=True, add_r2=True, color='#8f2727', n_col=None, n_row=None, xlabel='Ground truth', ylabel='Estimated', **kwargs)

Creates and plots a publication-ready recovery plot of the true values vs. the point estimates plus uncertainty. The point estimate can be controlled with the point_agg argument, and the uncertainty estimate with the uncertainty_agg argument.

This plot yields information similar to the “posterior z-score”, but allows for generic point and uncertainty estimates:

https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

Important: Posterior aggregates play no special role in Bayesian inference and should only be used heuristically. For instance, in the case of multi-modal posteriors, common point estimates, such as the mean, (geometric) median, or maximum a posteriori (MAP) estimate, are not meaningful.

Parameters:
post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)

The posterior draws obtained from n_data_sets

prior_samples : np.ndarray of shape (n_data_sets, n_params)

The prior draws (true parameters) used to generate the n_data_sets

point_agg : callable, optional, default: np.median

The function applied to the posterior draws to obtain a point estimate for each marginal. The default computes the median of each marginal posterior as a robust point estimate.

uncertainty_agg : callable or None, optional, default: scipy.stats.median_abs_deviation

The function applied to the posterior draws to obtain an uncertainty estimate. If None, a simple scatter plot using only point_agg is drawn.

param_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

label_fontsize : int, optional, default: 16

The font size of the y-label text

title_fontsize : int, optional, default: 18

The font size of the title text

metric_fontsize : int, optional, default: 16

The font size of the goodness-of-fit metrics (if added)

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

add_corr : bool, optional, default: True

A flag for adding the correlation between true values and point estimates to the plot

add_r2 : bool, optional, default: True

A flag for adding the R^2 between true values and point estimates to the plot

color : str, optional, default: ‘#8f2727’

The color for the true vs. estimated scatter points and error bars

n_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

n_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

xlabel : str, optional, default: ‘Ground truth’

The label on the x-axis of the plot

ylabel : str, optional, default: ‘Estimated’

The label on the y-axis of the plot

**kwargs : optional

Additional keyword arguments passed to ax.errorbar or ax.scatter, e.g., rasterized=True to reduce the PDF file size when plotting many points.

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
ShapeError

If there is a deviation from the expected shapes of post_samples and prior_samples.
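The quantities a recovery plot summarizes can also be computed directly. The following is a minimal NumPy sketch, not BayesFlow's implementation: it assumes the default aggregates (median point estimate, unscaled median absolute deviation as in scipy.stats.median_abs_deviation with its default scale) and computes R^2 as the ordinary coefficient of determination of the estimates against the ground truth. The helper names mad and recovery_summary are hypothetical, and the synthetic posteriors only stand in for the output of a trained network.

```python
import numpy as np


def mad(x, axis):
    """Unscaled median absolute deviation (like scipy's median_abs_deviation, scale=1.0)."""
    med = np.median(x, axis=axis, keepdims=True)
    return np.median(np.abs(x - med), axis=axis)


def recovery_summary(post_samples, prior_samples):
    """Return point estimates, uncertainties, correlations and R^2 per parameter.

    post_samples: (n_data_sets, n_post_draws, n_params)
    prior_samples: (n_data_sets, n_params)
    """
    if post_samples.ndim != 3 or prior_samples.ndim != 2:
        raise ValueError("expected 3-D post_samples and 2-D prior_samples")
    # Aggregate over the posterior draws (axis 1) for every data set and parameter
    estimates = np.median(post_samples, axis=1)
    uncertainty = mad(post_samples, axis=1)
    n_params = prior_samples.shape[1]
    corr = np.empty(n_params)
    r2 = np.empty(n_params)
    for j in range(n_params):
        truth, est = prior_samples[:, j], estimates[:, j]
        corr[j] = np.corrcoef(truth, est)[0, 1]
        # Coefficient of determination of the estimates with respect to the truth
        ss_res = np.sum((truth - est) ** 2)
        ss_tot = np.sum((truth - truth.mean()) ** 2)
        r2[j] = 1.0 - ss_res / ss_tot
    return estimates, uncertainty, corr, r2


rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 2))  # ground truth for 200 simulated data sets
# Synthetic "posteriors": draws scattered around the truth (stand-in for a trained network)
post = theta[:, None, :] + 0.1 * rng.normal(size=(200, 500, 2))
est, unc, corr, r2 = recovery_summary(post, theta)
print(est.shape, unc.shape, corr.round(3), r2.round(3))
```

Each point in a recovery plot corresponds to one row of est (with unc as its error bar), and the correlation and R^2 are the per-panel metrics.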

bayesflow.diagnostics.plot_z_score_contraction(post_samples, prior_samples, param_names=None, fig_size=None, label_fontsize=16, title_fontsize=18, tick_fontsize=12, color='#8f2727', n_col=None, n_row=None)

Implements a graphical check for global model sensitivity by plotting the posterior z-score against the posterior contraction for each set of posterior samples in post_samples, according to [1].

  • The definition of the posterior z-score is:

post_z_score = (posterior_mean - true_parameters) / posterior_std

The z-scores are adequate if they are centered around zero and spread roughly within the interval [-3, 3].

  • The definition of posterior contraction is:

post_contraction = 1 - (posterior_variance / prior_variance)

In other words, the posterior contraction is a proxy for the reduction in uncertainty gained by replacing the prior with the posterior. The ideal posterior contraction tends to 1. Contraction near zero indicates that the posterior variance is almost identical to the prior variance for the particular marginal parameter distribution.

Note: Means and variances will be estimated via their sample-based estimators.
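The two definitions above translate into a few lines of NumPy. This is a sketch under stated assumptions rather than BayesFlow's code: sample estimators with ddof=1 are used throughout, the prior variance is assumed to be estimated across the true parameters of all data sets, and the conjugate normal toy model (for which the exact posterior is known) is purely illustrative. The function name is hypothetical.

```python
import numpy as np


def z_score_and_contraction(post_samples, prior_samples):
    """Posterior z-scores and contractions per data set and parameter.

    post_samples: (n_data_sets, n_post_draws, n_params)
    prior_samples: (n_data_sets, n_params)
    """
    post_means = post_samples.mean(axis=1)
    post_vars = post_samples.var(axis=1, ddof=1)
    # Assumption: the prior variance is estimated from the true parameters
    # of all data sets (one value per parameter)
    prior_vars = prior_samples.var(axis=0, ddof=1, keepdims=True)
    z_score = (post_means - prior_samples) / np.sqrt(post_vars)
    contraction = 1.0 - post_vars / prior_vars
    return z_score, contraction


# Conjugate toy model: theta ~ N(0, 1), one observation y ~ N(theta, 0.5**2)
rng = np.random.default_rng(1)
n_sets, n_draws, noise_var = 300, 1000, 0.25
theta = rng.normal(size=(n_sets, 1))
y = theta + np.sqrt(noise_var) * rng.normal(size=theta.shape)
post_var = 1.0 / (1.0 + 1.0 / noise_var)  # exact posterior variance, 0.2
post_mean = post_var * y / noise_var      # exact posterior mean (prior mean is 0)
post = post_mean[:, None, :] + np.sqrt(post_var) * rng.normal(size=(n_sets, n_draws, 1))

z, c = z_score_and_contraction(post, theta)
# Expect z-scores roughly N(0, 1) and contractions near 1 - 0.2 / 1 = 0.8
print(z.mean().round(2), z.std().round(2), c.mean().round(2))
```

Because the toy posterior is exact, the z-scores center on zero within [-3, 3], and the contraction reflects how much a single noisy observation shrinks the prior variance.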

[1] Schad, D. J., Betancourt, M., & Vasishth, S. (2021). Toward a principled Bayesian workflow in cognitive science. Psychological Methods, 26(1), 103.

Paper also available at https://arxiv.org/abs/1904.12765

Parameters:
post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)

The posterior draws obtained from n_data_sets

prior_samples : np.ndarray of shape (n_data_sets, n_params)

The prior draws (true parameters) used to generate the n_data_sets

param_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

label_fontsize : int, optional, default: 16

The font size of the y-label text

title_fontsize : int, optional, default: 18

The font size of the title text

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

color : str, optional, default: ‘#8f2727’

The color for the scatter points

n_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

n_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
ShapeError

If there is a deviation from the expected shapes of post_samples and prior_samples.

bayesflow.diagnostics.plot_sbc_ecdf(post_samples, prior_samples, difference=False, stacked=False, fig_size=None, param_names=None, label_fontsize=16, legend_fontsize=14, title_fontsize=18, tick_fontsize=12, rank_ecdf_color='#a34f4f', fill_color='grey', n_row=None, n_col=None, **kwargs)

Creates the empirical CDFs for each marginal rank distribution and plots them against a uniform ECDF. Simultaneous ECDF bands are drawn using simulations from the uniform distribution, as proposed by [1].

For models with many parameters, use stacked=True to obtain an idea of the overall calibration of a posterior approximator.

[1] Säilynoja, T., Bürkner, P. C., & Vehtari, A. (2022). Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 32(2), 1-21. https://arxiv.org/abs/2103.10522
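The rank statistics behind these ECDFs can be sketched as follows, assuming that the rank of a true parameter is the number of its posterior draws lying below it, scaled to [0, 1]. The sketch computes only the ECDF difference shown when difference=True; it does not reproduce the simultaneous bands of [1], and the helper names are hypothetical.

```python
import numpy as np


def fractional_ranks(post_samples, prior_samples):
    """Rank of each true value among its posterior draws, scaled to [0, 1].

    post_samples: (n_data_sets, n_post_draws, n_params)
    prior_samples: (n_data_sets, n_params)
    """
    # Number of posterior draws below the true value, divided by the number of draws
    ranks = np.sum(post_samples < prior_samples[:, None, :], axis=1)
    return ranks / post_samples.shape[1]


def ecdf_difference(frac_ranks, grid):
    """ECDF of the fractional ranks minus the uniform CDF, evaluated on `grid`.

    Returns an array of shape (len(grid), n_params).
    """
    ecdf = np.mean(frac_ranks[None, :, :] <= grid[:, None, None], axis=1)
    return ecdf - grid[:, None]


rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 101)
theta = rng.normal(size=(500, 3))  # true values drawn from a N(0, 1) prior
# Calibrated (if uninformative): posterior draws come from the same N(0, 1) prior
good = ecdf_difference(fractional_ranks(rng.normal(size=(500, 100, 3)), theta), grid)
# Overconfident: posterior draws are too narrow, so ranks pile up at both ends
bad = ecdf_difference(fractional_ranks(0.5 * rng.normal(size=(500, 100, 3)), theta), grid)
print(np.abs(good).max().round(3), np.abs(bad).max().round(3))
```

Under calibration the ranks are uniform, so the difference stays close to zero; the overly narrow toy posterior pushes the ranks toward 0 and 1, which appears as a large deviation.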

Parameters:
post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)

The posterior draws obtained from n_data_sets

prior_samples : np.ndarray of shape (n_data_sets, n_params)

The prior draws used to generate the n_data_sets

difference : bool, optional, default: False

If True, plots the ECDF difference (ECDF minus the uniform CDF), which enables a more dynamic visualization range.

stacked : bool, optional, default: False

If True, all ECDFs are plotted in the same plot. If False, each ECDF gets its own subplot, similar to the behavior of plot_sbc_histograms.

param_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None. Only relevant if stacked=False.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

label_fontsize : int, optional, default: 16

The font size of the x-label and y-label texts

legend_fontsize : int, optional, default: 14

The font size of the legend text

title_fontsize : int, optional, default: 18

The font size of the title text. Only relevant if stacked=False.

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

rank_ecdf_color : str, optional, default: ‘#a34f4f’

The color to use for the rank ECDFs

fill_color : str, optional, default: ‘grey’

The fill color of the simultaneous ECDF bands

n_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

n_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

**kwargs : dict, optional, default: {}

Keyword arguments can be passed to control the behavior of the simultaneous ECDF band computation through the ecdf_bands_kwargs dictionary. See simultaneous_ecdf_bands for the available keyword arguments.

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
ShapeError

If there is a deviation from the expected shapes of post_samples and prior_samples.

bayesflow.diagnostics.plot_sbc_histograms(post_samples, prior_samples, param_names=None, fig_size=None, num_bins=None, binomial_interval=0.99, label_fontsize=16, title_fontsize=18, tick_fontsize=12, hist_color='#a34f4f', n_row=None, n_col=None)

Creates and plots publication-ready histograms of rank statistics for simulation-based calibration (SBC) checks according to [1].

Any deviation from uniformity indicates miscalibration and thus poor convergence of the networks or a poor match between the generative model and the networks.

[1] Talts, S., Betancourt, M., Simpson, D., Vehtari, A., & Gelman, A. (2018). Validating Bayesian inference algorithms with simulation-based calibration. arXiv preprint arXiv:1804.06788.
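Under perfect calibration, the number of data sets whose rank falls into each of num_bins equal-width bins follows a Binomial(n_data_sets, 1/num_bins) distribution [1], which is where a binomial reference interval comes from. A dependency-free sketch of that interval (computed from the exact binomial CDF, in the spirit of scipy.stats.binom.interval) and of the per-bin rank counts follows; the function names are hypothetical and the toy data only mimic a calibrated approximator.

```python
import math

import numpy as np


def binom_interval(confidence, n, p):
    """Central interval (lo, hi) of Binomial(n, p), like scipy.stats.binom.interval."""
    tail = (1.0 - confidence) / 2.0
    cdf, lo = 0.0, None
    for k in range(n + 1):
        # Binomial pmf evaluated in log space to avoid overflow for large n
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        cdf += math.exp(log_pmf)
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1.0 - tail:
            return lo, k
    return lo, n


def rank_counts(post_samples, prior_samples, num_bins):
    """Histogram counts of SBC ranks, shape (num_bins, n_params).

    Assumes n_post_draws + 1 is divisible by num_bins, so that every bin covers
    the same number of possible rank values (0, ..., n_post_draws).
    """
    n_draws = post_samples.shape[1]
    ranks = np.sum(post_samples < prior_samples[:, None, :], axis=1)
    return np.stack(
        [np.histogram(ranks[:, j], bins=num_bins, range=(0, n_draws + 1))[0]
         for j in range(ranks.shape[1])],
        axis=1,
    )


rng = np.random.default_rng(3)
n_sets, num_bins = 1000, 10
theta = rng.normal(size=(n_sets, 2))
post = rng.normal(size=(n_sets, 99, 2))  # calibrated toy posterior (equals the prior)
counts = rank_counts(post, theta, num_bins)
lo, hi = binom_interval(0.99, n_sets, 1.0 / num_bins)
print(lo, hi)
print(counts.T)
```

With 99 posterior draws there are 100 possible ranks, so 10 bins each cover 10 of them; for a calibrated approximator most bin counts should fall inside [lo, hi].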

Parameters:
post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)

The posterior draws obtained from n_data_sets

prior_samples : np.ndarray of shape (n_data_sets, n_params)

The prior draws used to generate the n_data_sets

param_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

num_bins : int or None, optional, default: None

The number of bins to use for each marginal histogram. Determined automatically if None.

binomial_interval : float in (0, 1), optional, default: 0.99

The confidence level of the binomial interval that indicates the expected variation of the bin counts under uniformity

label_fontsize : int, optional, default: 16

The font size of the y-label text

title_fontsize : int, optional, default: 18

The font size of the title text

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

hist_color : str, optional, default: ‘#a34f4f’

The color to use for the histogram body

n_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

n_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
ShapeError

If there is a deviation from the expected shapes of post_samples and prior_samples.

bayesflow.diagnostics.plot_posterior_2d(posterior_draws, prior=None, prior_draws=None, param_names=None, height=3, label_fontsize=14, legend_fontsize=16, tick_fontsize=12, post_color='#8f2727', prior_color='gray', post_alpha=0.9, prior_alpha=0.7)

Generates a bivariate pairplot given posterior draws and optional prior or prior draws.

Parameters:
posterior_draws : np.ndarray of shape (n_post_draws, n_params)

The posterior draws obtained for a SINGLE observed data set.

prior : bayesflow.forward_inference.Prior instance or None, optional, default: None

The optional prior object having an input-output signature as given by bayesflow.forward_inference.Prior

prior_draws : np.ndarray of shape (n_prior_draws, n_params) or None, optional, default: None

The optional prior draws obtained from the prior. If both prior and prior_draws are provided, prior_draws will be used.

param_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None.

height : float, optional, default: 3

The height of the pairplot

label_fontsize : int, optional, default: 14

The font size of the x-label and y-label texts (parameter names)

legend_fontsize : int, optional, default: 16

The font size of the legend text

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

post_color : str, optional, default: ‘#8f2727’

The color for the posterior histograms and KDEs

prior_color : str, optional, default: ‘gray’

The color for the optional prior histograms and KDEs

post_alpha : float in [0, 1], optional, default: 0.9

The opacity of the posterior plots

prior_alpha : float in [0, 1], optional, default: 0.7

The opacity of the prior plots

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
AssertionError

If the shape of posterior_draws is not 2-dimensional.

bayesflow.diagnostics.plot_losses(train_losses, val_losses=None, moving_average=False, ma_window_fraction=0.01, fig_size=None, train_color='#8f2727', val_color='black', lw_train=2, lw_val=3, grid_alpha=0.5, legend_fontsize=14, label_fontsize=14, title_fontsize=16)

A generic helper function to plot the losses of a series of training epochs and runs.

Parameters:
train_losses : pd.DataFrame

The (plottable) history as returned by a train_[…] method of a Trainer instance. Alternatively, you can pass a data frame of validation losses instead of train losses if you only want to plot the validation loss.

val_losses : pd.DataFrame or None, optional, default: None

The (plottable) validation history as returned by a train_[…] method of a Trainer instance. If None, only train losses are plotted. Should have the same number of columns as train_losses.

moving_average : bool, optional, default: False

A flag for adding a moving average line of the train_losses.

ma_window_fraction : float, optional, default: 0.01

Window size for the moving average as a fraction of total training steps.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

train_color : str, optional, default: ‘#8f2727’

The color for the train loss trajectory

val_color : str, optional, default: ‘black’

The color for the optional validation loss trajectory

lw_train : int, optional, default: 2

The line width for the training loss curve

lw_val : int, optional, default: 3

The line width for the validation loss curve

grid_alpha : float, optional, default: 0.5

The opacity factor for the background gridlines

legend_fontsize : int, optional, default: 14

The font size of the legend text

label_fontsize : int, optional, default: 14

The font size of the y-label text

title_fontsize : int, optional, default: 16

The font size of the title text

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
AssertionError

If the number of columns in train_losses does not match the number of columns in val_losses.
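The moving average controlled by moving_average and ma_window_fraction can be sketched as a trailing mean. This assumes the window is int(ma_window_fraction * n_steps) steps, clipped to at least one step; BayesFlow's own computation may differ in such details, and the function name is hypothetical.

```python
import numpy as np


def trailing_moving_average(losses, window_fraction=0.01):
    """Trailing mean over a window of int(window_fraction * n_steps) steps (at least 1).

    The first window - 1 entries are NaN, since no full window is available there.
    """
    losses = np.asarray(losses, dtype=float)
    window = max(1, int(window_fraction * losses.size))
    # Cumulative sums with a leading zero: csum[i] is the sum of the first i losses
    csum = np.concatenate(([0.0], np.cumsum(losses)))
    out = np.full(losses.size, np.nan)
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


losses = np.array([4.0, 3.0, 3.5, 2.0, 2.5, 1.0, 1.5, 0.5])
print(trailing_moving_average(losses, window_fraction=0.25))  # window of 2 steps
```

For a pandas DataFrame of losses, df.rolling(window).mean() yields the same NaN-padded trailing mean per column.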

bayesflow.diagnostics.plot_prior2d(prior, param_names=None, n_samples=2000, height=2.5, color='#8f2727', **kwargs)

Creates pair-plots for a given joint prior.

Parameters:
prior : callable

The prior object which takes a single integer argument and generates random draws.

param_names : list of str or None, optional, default: None

An optional list of strings used as parameter names in the plot labels

n_samples : int, optional, default: 2000

The number of random draws from the joint prior

height : float, optional, default: 2.5

The height of the pair plot

color : str, optional, default: ‘#8f2727’

The color of the plot

**kwargs : dict, optional

Additional keyword arguments passed to the sns.PairGrid constructor

Returns:
f : plt.Figure - the figure instance for optional saving

bayesflow.diagnostics.plot_latent_space_2d(z_samples, height=2.5, color='#8f2727', **kwargs)

Creates pair plots for the latent space learned by the inference network. Enables visual inspection of the latent space and whether its structure corresponds to the one enforced by the optimization criterion.
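A numeric companion to these pair plots, assuming the optimization criterion enforces a standard normal latent space (as in maximum-likelihood training of a normalizing flow), is to compare the sample mean of z_samples with zero and the sample covariance with the identity. The helper below is a hypothetical sketch; it checks only the first two moments, so the pair plots remain useful for spotting non-Gaussian structure.

```python
import numpy as np


def latent_moment_check(z_samples):
    """Largest deviations of the latent sample mean from 0 and covariance from I.

    z_samples: (n_sim, n_params); the standard normal target is an assumption.
    """
    z = np.asarray(z_samples, dtype=float)
    mean_dev = np.max(np.abs(z.mean(axis=0)))
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    cov_dev = np.max(np.abs(cov - np.eye(z.shape[1])))
    return mean_dev, cov_dev


rng = np.random.default_rng(4)
well_trained = rng.standard_normal((5000, 3))
too_wide = 2.0 * rng.standard_normal((5000, 3))  # latent variances near 4 instead of 1
print(latent_moment_check(well_trained))
print(latent_moment_check(too_wide))
```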

Parameters:
z_samples : np.ndarray or tf.Tensor of shape (n_sim, n_params)

The latent samples computed through a forward pass of the inference network.

height : float, optional, default: 2.5

The height of the pair plot.

color : str, optional, default: ‘#8f2727’

The color of the plot

**kwargs : dict, optional

Additional keyword arguments passed to the sns.PairGrid constructor

Returns:
f : plt.Figure - the figure instance for optional saving

bayesflow.diagnostics.plot_calibration_curves(true_models, pred_models, model_names=None, num_bins=10, label_fontsize=16, legend_fontsize=14, title_fontsize=18, tick_fontsize=12, epsilon=0.02, fig_size=None, color='#8f2727', n_row=None, n_col=None)

Plots the calibration curves, the ECEs and the marginal histograms of predicted posterior model probabilities for a model comparison problem. The marginal histograms inform about the fraction of predictions in each bin. Depends on the expected_calibration_error function for computing the ECE.
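The expected calibration error (ECE) bins the predicted probabilities for each model and averages the gap between the observed frequency of the true model and the mean predicted probability per bin, weighted by the fraction of data sets in that bin. The sketch below is a hypothetical stand-in for BayesFlow's expected_calibration_error and may differ from it in details such as bin placement and the handling of empty bins.

```python
import numpy as np


def ece_sketch(true_models, pred_models, num_bins=10):
    """Per-model expected calibration error with equal-width probability bins.

    true_models: one-hot array (num_data_sets, num_models)
    pred_models: predicted PMPs (num_data_sets, num_models)
    """
    interior_edges = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    num_models = true_models.shape[1]
    eces = np.zeros(num_models)
    for k in range(num_models):
        probs, hits = pred_models[:, k], true_models[:, k]
        bin_idx = np.digitize(probs, interior_edges)  # 0, ..., num_bins - 1
        for b in range(num_bins):
            in_bin = bin_idx == b
            if in_bin.any():
                # |observed frequency - mean predicted probability|, weighted by bin size
                gap = abs(hits[in_bin].mean() - probs[in_bin].mean())
                eces[k] += in_bin.mean() * gap
    return eces


rng = np.random.default_rng(5)
n = 20000
p = rng.random(n)                      # predicted PMP for model 0
is_m0 = rng.random(n) < p              # model 0 is true with exactly that probability
true = np.stack([is_m0, ~is_m0], axis=1).astype(float)
calibrated = np.stack([p, 1.0 - p], axis=1)
miscalibrated = np.tile([0.9, 0.1], (n, 1))  # ignores the data; model 0 true ~50% of the time
print(ece_sketch(true, calibrated).round(3), ece_sketch(true, miscalibrated).round(3))
```

The calibrated predictions give an ECE near zero, while the constant 0.9/0.1 predictions are off by about 0.4 for both models.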

Parameters:
true_models : np.ndarray of shape (num_data_sets, num_models)

The one-hot-encoded true model indices per data set.

pred_models : np.ndarray of shape (num_data_sets, num_models)

The predicted posterior model probabilities (PMPs) per data set.

model_names : list or None, optional, default: None

The model names for nice plot titles. Inferred if None.

num_bins : int, optional, default: 10

The number of bins to use for the calibration curves (and marginal histograms).

label_fontsize : int, optional, default: 16

The font size of the x-label and y-label texts

legend_fontsize : int, optional, default: 14

The font size of the legend text (ECE value)

title_fontsize : int, optional, default: 18

The font size of the title text

tick_fontsize : int, optional, default: 12

The font size of the axis tick labels

epsilon : float, optional, default: 0.02

A small amount to pad the [0, 1]-bounded axes on both sides.

fig_size : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

color : str, optional, default: ‘#8f2727’

The color of the calibration curves

n_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

n_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

Returns:
fig : plt.Figure - the figure instance for optional saving

bayesflow.diagnostics.plot_confusion_matrix(true_models, pred_models, model_names=None, fig_size=(5, 5), label_fontsize=16, title_fontsize=18, value_fontsize=10, tick_fontsize=12, xtick_rotation=None, ytick_rotation=None, normalize=True, cmap=None, title=True)

Plots a confusion matrix for validating a neural network trained for Bayesian model comparison.
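A minimal sketch of the matrix being plotted, assuming each data set is assigned to the model with the highest predicted PMP and that normalize=True divides every row by its sum. The function name is hypothetical.

```python
import numpy as np


def confusion_from_pmps(true_models, pred_models, normalize=True):
    """Rows index the true model, columns the model with the highest PMP."""
    num_models = true_models.shape[1]
    true_idx = np.argmax(true_models, axis=1)
    pred_idx = np.argmax(pred_models, axis=1)
    cm = np.zeros((num_models, num_models))
    np.add.at(cm, (true_idx, pred_idx), 1)  # count each (true, predicted) pair
    if normalize:
        # Each row sums to 1; rows of models that never occur stay at zero
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    return cm


true_models = np.eye(2)[[0, 0, 1, 1, 1]]  # one-hot encoding of the true model indices
pred_models = np.array([[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.4, 0.6], [0.6, 0.4]])
print(confusion_from_pmps(true_models, pred_models))  # rows: [0.5, 0.5] and [1/3, 2/3]
```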

Parameters:
true_models : np.ndarray of shape (num_data_sets, num_models)

The one-hot-encoded true model indices per data set.

pred_models : np.ndarray of shape (num_data_sets, num_models)

The predicted posterior model probabilities (PMPs) per data set.

model_names : list or None, optional, default: None

The model names for nice plot titles. Inferred if None.

fig_size : tuple or None, optional, default: (5, 5)

The figure size passed to the matplotlib constructor. Inferred if None.

label_fontsize : int, optional, default: 16

The font size of the x-label and y-label texts

title_fontsize : int, optional, default: 18

The font size of the title text.

value_fontsize : int, optional, default: 10

The font size of the text annotations and the colorbar tick labels.

tick_fontsize : int, optional, default: 12

The font size of the axis label and model name texts.

xtick_rotation : int, optional, default: None

Rotation of x-axis tick labels (helps with long model names).

ytick_rotation : int, optional, default: None

Rotation of y-axis tick labels (helps with long model names).

normalize : bool, optional, default: True

A flag for normalization of the confusion matrix. If True, each row of the confusion matrix is normalized to sum to 1.

cmap : matplotlib.colors.Colormap or str, optional, default: None

Colormap to be used for the cells. If a str, it should be the name of a registered colormap, e.g., ‘viridis’. The default colormap matches the BayesFlow defaults by ranging from white to red.

title : bool, optional, default: True

A flag for adding ‘Confusion Matrix’ above the matrix.

Returns:
fig : plt.Figure - the figure instance for optional saving

bayesflow.diagnostics.plot_mmd_hypothesis_test(mmd_null, mmd_observed=None, alpha_level=0.05, null_color=(0.16407, 0.020171, 0.577478), observed_color='red', alpha_color='orange', truncate_vlines_at_kde=False, xmin=None, xmax=None, bw_factor=1.5)

Plots the sampling distribution of the maximum mean discrepancy (MMD) under the null hypothesis, together with the observed MMD and the rejection region at the given alpha_level.

Parameters:
mmd_null : np.ndarray

The samples from the MMD sampling distribution under the null hypothesis “the model is well-specified”

mmd_observed : float or None, optional, default: None

The observed MMD value

alpha_level : float, optional, default: 0.05

The rejection probability (type I error)

null_color : str or tuple, optional, default: (0.16407, 0.020171, 0.577478)

The color of the H0 sampling distribution

observed_color : str or tuple, optional, default: “red”

The color of the observed MMD

alpha_color : str or tuple, optional, default: “orange”

The color of the rejection area

truncate_vlines_at_kde : bool, optional, default: False

If True, the vertical lines are cut off at the KDE curve. If False, they extend across the whole plot.

xmin : float, optional, default: None

The lower x-axis limit

xmax : float, optional, default: None

The upper x-axis limit

bw_factor : float, optional, default: 1.5

The bandwidth factor (i.e., smoothing parameter) of the kernel density estimate

Returns:
f : plt.Figure - the figure instance for optional saving
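The decision this plot visualizes can be sketched as an upper-tail test, assuming the rejection region consists of MMD values above the (1 - alpha_level) quantile of mmd_null. The helper name is hypothetical, and the gamma-distributed null sample below is only a synthetic stand-in for simulated MMD values.

```python
import numpy as np


def mmd_test_decision(mmd_null, mmd_observed, alpha_level=0.05):
    """Critical value, empirical p-value and rejection decision for an upper-tail test."""
    mmd_null = np.asarray(mmd_null, dtype=float)
    # Rejection region: MMD values above the (1 - alpha_level) quantile of the null
    critical = np.quantile(mmd_null, 1.0 - alpha_level)
    p_value = np.mean(mmd_null >= mmd_observed)
    return critical, p_value, bool(mmd_observed > critical)


rng = np.random.default_rng(6)
# Synthetic stand-in for MMD values simulated under "the model is well-specified"
mmd_null = rng.gamma(shape=2.0, scale=0.01, size=5000)
critical, p_value, reject = mmd_test_decision(mmd_null, mmd_observed=0.08)
print(f"critical={critical:.4f}, p={p_value:.4f}, reject={reject}")
```

An observed MMD to the right of the critical value falls into the rejection area, which suggests that the model may be misspecified at the chosen alpha_level.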