bayesflow.diagnostics module
- bayesflow.diagnostics.plot_recovery(post_samples, prior_samples, point_agg=<function median>, uncertainty_agg=<function median_abs_deviation>, param_names=None, fig_size=None, label_fontsize=16, title_fontsize=18, metric_fontsize=16, tick_fontsize=12, add_corr=True, add_r2=True, color='#8f2727', n_col=None, n_row=None, xlabel='Ground truth', ylabel='Estimated', **kwargs)
Creates and plots a publication-ready recovery plot of ground truth vs. point estimate with uncertainty. The point estimate can be controlled with the point_agg argument, and the uncertainty estimate can be controlled with the uncertainty_agg argument.
This plot yields similar information as the “posterior z-score”, but allows for generic point and uncertainty estimates:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
Important: Posterior aggregates play no special role in Bayesian inference and should only be used heuristically. For instance, for multi-modal posteriors, common point estimates such as the mean, the (geometric) median, or the maximum a posteriori (MAP) estimate are not meaningful summaries.
- Parameters:
- post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)
The posterior draws obtained from n_data_sets
- prior_samples : np.ndarray of shape (n_data_sets, n_params)
The prior draws (true parameters) obtained for generating the n_data_sets
- point_agg : callable, optional, default: np.median
The function to apply to the posterior draws to get a point estimate for each marginal. The default computes the marginal median for each marginal posterior as a robust point estimate.
- uncertainty_agg : callable or None, optional, default: scipy.stats.median_abs_deviation
The function to apply to the posterior draws to get an uncertainty estimate. If None is provided, a simple scatter plot using only point_agg will be plotted.
- param_names : list or None, optional, default: None
The parameter names for nice plot titles. Inferred if None.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- label_fontsize : int, optional, default: 16
The font size of the y-label text
- title_fontsize : int, optional, default: 18
The font size of the title text
- metric_fontsize : int, optional, default: 16
The font size of the goodness-of-fit metric (if provided)
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- add_corr : bool, optional, default: True
A flag for adding the correlation between true values and estimates to the plot
- add_r2 : bool, optional, default: True
A flag for adding the R^2 between true values and estimates to the plot
- color : str, optional, default: ‘#8f2727’
The color for the true vs. estimated scatter points and error bars
- n_row : int, optional, default: None
The number of rows for the subplots. Dynamically determined if None.
- n_col : int, optional, default: None
The number of columns for the subplots. Dynamically determined if None.
- xlabel : str, optional, default: ‘Ground truth’
The label on the x-axis of the plot
- ylabel : str, optional, default: ‘Estimated’
The label on the y-axis of the plot
- **kwargs : optional
Additional keyword arguments passed to ax.errorbar or ax.scatter. Example: rasterized=True to reduce PDF file size with many dots
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- ShapeError
If there is a deviation from the expected shapes of post_samples and prior_samples.
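The quantities shown in a recovery plot can be sketched with NumPy alone. The snippet below is an illustration under stated assumptions, not BayesFlow's implementation: the toy data are invented, the median absolute deviation is computed by hand rather than with scipy.stats.median_abs_deviation, and the R^2 convention (ground truth as the reference, estimates as predictions) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_data_sets, n_post_draws, n_params = 200, 500, 2

# Toy data (assumed for illustration): posterior draws scattered around the truth
prior_samples = rng.normal(size=(n_data_sets, n_params))
noise = rng.normal(size=(n_data_sets, n_post_draws, n_params))
post_samples = prior_samples[:, None, :] + 0.3 * noise

# Point estimate per data set and parameter: marginal median over the draws
point_est = np.median(post_samples, axis=1)

# Uncertainty per data set and parameter: median absolute deviation over the draws
mad = np.median(np.abs(post_samples - point_est[:, None, :]), axis=1)

# Agreement metrics between ground truth and estimates, one value per parameter
corr = np.array(
    [np.corrcoef(prior_samples[:, j], point_est[:, j])[0, 1] for j in range(n_params)]
)
ss_res = np.sum((prior_samples - point_est) ** 2, axis=0)
ss_tot = np.sum((prior_samples - prior_samples.mean(axis=0)) ** 2, axis=0)
r2 = 1.0 - ss_res / ss_tot
```

Both point_est and mad have shape (n_data_sets, n_params), which is the per-marginal layout that a true vs. estimated scatter with error bars needs.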
- bayesflow.diagnostics.plot_z_score_contraction(post_samples, prior_samples, param_names=None, fig_size=None, label_fontsize=16, title_fontsize=18, tick_fontsize=12, color='#8f2727', n_col=None, n_row=None)
Implements a graphical check for global model sensitivity by plotting the posterior z-score over the posterior contraction for each set of posterior samples in post_samples, according to [1].
The definition of the posterior z-score is:
post_z_score = (posterior_mean - true_parameters) / posterior_std
The score is adequate if it centers around zero and spreads roughly over the interval [-3, 3].
The definition of posterior contraction is:
post_contraction = 1 - (posterior_variance / prior_variance)
In other words, the posterior contraction is a proxy for the reduction in uncertainty gained by replacing the prior with the posterior. The ideal posterior contraction tends to 1. Contraction near zero indicates that the posterior variance is almost identical to the prior variance for the particular marginal parameter distribution.
Note: Means and variances will be estimated via their sample-based estimators.
[1] Schad, D. J., Betancourt, M., & Vasishth, S. (2021). Toward a principled Bayesian workflow in cognitive science. Psychological Methods, 26(1), 103.
Paper also available at https://arxiv.org/abs/1904.12765
- Parameters:
- post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)
The posterior draws obtained from n_data_sets
- prior_samples : np.ndarray of shape (n_data_sets, n_params)
The prior draws (true parameters) obtained for generating the n_data_sets
- param_names : list or None, optional, default: None
The parameter names for nice plot titles. Inferred if None.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- label_fontsize : int, optional, default: 16
The font size of the y-label text
- title_fontsize : int, optional, default: 18
The font size of the title text
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- color : str, optional, default: ‘#8f2727’
The color for the true vs. estimated scatter points and error bars
- n_row : int, optional, default: None
The number of rows for the subplots. Dynamically determined if None.
- n_col : int, optional, default: None
The number of columns for the subplots. Dynamically determined if None.
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- ShapeError
If there is a deviation from the expected shapes of post_samples and prior_samples.
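The two definitions above translate directly into array operations. This is a minimal sketch with invented toy data; the choice of ddof=1 and of estimating the prior variance across the n_data_sets prior draws are assumptions, and BayesFlow may compute these details differently.

```python
import numpy as np

rng = np.random.default_rng(1)
n_data_sets, n_post_draws, n_params = 300, 400, 2

# Toy setup (assumed): standard normal prior, posterior sd of 0.2 around a noisy center
prior_samples = rng.normal(size=(n_data_sets, n_params))
centers = prior_samples + 0.2 * rng.normal(size=(n_data_sets, n_params))
post_samples = centers[:, None, :] + 0.2 * rng.normal(
    size=(n_data_sets, n_post_draws, n_params)
)

# Sample-based estimators over the posterior draws (axis 1)
post_means = post_samples.mean(axis=1)
post_stds = post_samples.std(axis=1, ddof=1)
post_vars = post_samples.var(axis=1, ddof=1)

# Prior variance per parameter, estimated across the data sets (axis 0)
prior_vars = prior_samples.var(axis=0, ddof=1)

# post_z_score = (posterior_mean - true_parameters) / posterior_std
post_z_score = (post_means - prior_samples) / post_stds

# post_contraction = 1 - (posterior_variance / prior_variance)
post_contraction = 1.0 - post_vars / prior_vars
```

In this toy case the z-scores center around zero and the contraction is close to 1, which is the pattern the text describes as adequate.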
- bayesflow.diagnostics.plot_sbc_ecdf(post_samples, prior_samples, difference=False, stacked=False, fig_size=None, param_names=None, label_fontsize=16, legend_fontsize=14, title_fontsize=18, tick_fontsize=12, rank_ecdf_color='#a34f4f', fill_color='grey', n_row=None, n_col=None, **kwargs)
Creates the empirical CDFs (ECDFs) for each marginal rank distribution and plots them against a uniform ECDF. ECDF simultaneous bands are drawn using simulations from the uniform distribution, as proposed by [1].
For models with many parameters, use stacked=True to obtain an idea of the overall calibration of a posterior approximator.
[1] Säilynoja, T., Bürkner, P. C., & Vehtari, A. (2022). Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 32(2), 1-21. https://arxiv.org/abs/2103.10522
- Parameters:
- post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)
The posterior draws obtained from n_data_sets
- prior_samples : np.ndarray of shape (n_data_sets, n_params)
The prior draws obtained for generating n_data_sets
- difference : bool, optional, default: False
If True, plots the ECDF difference. Enables a more dynamic visualization range.
- stacked : bool, optional, default: False
If True, all ECDFs will be plotted on the same plot. If False, each ECDF will have its own subplot, similar to the behavior of plot_sbc_histograms.
- param_names : list or None, optional, default: None
The parameter names for nice plot titles. Inferred if None. Only relevant if stacked=False.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- label_fontsize : int, optional, default: 16
The font size of the x-label and y-label texts
- legend_fontsize : int, optional, default: 14
The font size of the legend text
- title_fontsize : int, optional, default: 18
The font size of the title text. Only relevant if stacked=False.
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- rank_ecdf_color : str, optional, default: ‘#a34f4f’
The color to use for the rank ECDFs
- fill_color : str, optional, default: ‘grey’
The fill color of the ECDF simultaneous bands.
- n_row : int, optional, default: None
The number of rows for the subplots. Dynamically determined if None.
- n_col : int, optional, default: None
The number of columns for the subplots. Dynamically determined if None.
- **kwargs : dict, optional, default: {}
Keyword arguments can be passed to control the behavior of the ECDF simultaneous band computation through the ecdf_bands_kwargs dictionary. See simultaneous_ecdf_bands for keyword arguments.
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- ShapeError
If there is a deviation from the expected shapes of post_samples and prior_samples.
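As a rough illustration of what the rank ECDF and its difference represent, the following sketch computes fractional rank statistics and their ECDF with NumPy. The toy data and the evaluation grid are assumptions, and the simultaneous bands of [1] are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_data_sets, n_post_draws, n_params = 500, 100, 2

# Toy calibrated case (assumed): truth and posterior draws share one distribution
prior_samples = rng.normal(size=(n_data_sets, n_params))
post_samples = rng.normal(size=(n_data_sets, n_post_draws, n_params))

# Fractional rank of each true value among its posterior draws, in [0, 1]
ranks = np.mean(post_samples < prior_samples[:, None, :], axis=1)

# ECDF of the ranks on a grid, one column per parameter
z = np.linspace(0.0, 1.0, 101)
ecdf = np.mean(ranks[:, None, :] <= z[None, :, None], axis=0)

# Difference to the uniform CDF, the quantity shown when difference=True
ecdf_diff = ecdf - z[:, None]
```

For a well-calibrated approximator the ranks are approximately uniform, so ecdf_diff stays close to zero across the whole grid.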
- bayesflow.diagnostics.plot_sbc_histograms(post_samples, prior_samples, param_names=None, fig_size=None, num_bins=None, binomial_interval=0.99, label_fontsize=16, title_fontsize=18, tick_fontsize=12, hist_color='#a34f4f', n_row=None, n_col=None)
Creates and plots publication-ready histograms of rank statistics for simulation-based calibration (SBC) checks according to [1].
Any deviation from uniformity indicates miscalibration and thus poor convergence of the networks or a poor match between the generative model and the networks.
[1] Talts, S., Betancourt, M., Simpson, D., Vehtari, A., & Gelman, A. (2018). Validating Bayesian inference algorithms with simulation-based calibration. arXiv preprint arXiv:1804.06788.
- Parameters:
- post_samples : np.ndarray of shape (n_data_sets, n_post_draws, n_params)
The posterior draws obtained from n_data_sets
- prior_samples : np.ndarray of shape (n_data_sets, n_params)
The prior draws obtained for generating n_data_sets
- param_names : list or None, optional, default: None
The parameter names for nice plot titles. Inferred if None.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- num_bins : int or None, optional, default: None
The number of bins to use for each marginal histogram. Chosen automatically if None.
- binomial_interval : float in (0, 1), optional, default: 0.99
The width of the confidence interval for the binomial distribution
- label_fontsize : int, optional, default: 16
The font size of the y-label text
- title_fontsize : int, optional, default: 18
The font size of the title text
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- hist_color : str, optional, default: ‘#a34f4f’
The color to use for the histogram body
- n_row : int, optional, default: None
The number of rows for the subplots. Dynamically determined if None.
- n_col : int, optional, default: None
The number of columns for the subplots. Dynamically determined if None.
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- ShapeError
If there is a deviation from the expected shapes of post_samples and prior_samples.
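The role of num_bins and binomial_interval can be illustrated with a small sketch: rank statistics are binned per parameter, and under uniformity each bin count is approximately binomial. The helper binom_quantile is hypothetical (written here with the standard library to avoid extra dependencies), the toy data are invented, and the exact binning BayesFlow uses may differ.

```python
import math
import numpy as np


def binom_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p) (hypothetical helper)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


rng = np.random.default_rng(3)
n_data_sets, n_post_draws, n_params = 500, 100, 2
num_bins, binomial_interval = 10, 0.99

# Toy calibrated case (assumed): truth and posterior draws share one distribution
prior_samples = rng.normal(size=(n_data_sets, n_params))
post_samples = rng.normal(size=(n_data_sets, n_post_draws, n_params))

# Integer rank statistic in {0, ..., n_post_draws} per data set and parameter
ranks = np.sum(post_samples < prior_samples[:, None, :], axis=1)

# Histogram counts of the ranks, one column per parameter
edges = np.linspace(0, n_post_draws, num_bins + 1)
counts = np.stack(
    [np.histogram(ranks[:, j], bins=edges)[0] for j in range(n_params)], axis=1
)

# Central interval for a single bin count under (approximate) uniformity
tail = (1.0 - binomial_interval) / 2.0
lower = binom_quantile(tail, n_data_sets, 1.0 / num_bins)
upper = binom_quantile(1.0 - tail, n_data_sets, 1.0 / num_bins)
```

Bin counts that fall outside [lower, upper] more often than the interval width suggests would hint at miscalibration.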
- bayesflow.diagnostics.plot_posterior_2d(posterior_draws, prior=None, prior_draws=None, param_names=None, height=3, label_fontsize=14, legend_fontsize=16, tick_fontsize=12, post_color='#8f2727', prior_color='gray', post_alpha=0.9, prior_alpha=0.7)
Generates a bivariate pairplot given posterior draws and optional prior or prior draws.
- Parameters:
- posterior_draws : np.ndarray of shape (n_post_draws, n_params)
The posterior draws obtained for a SINGLE observed data set.
- prior : bayesflow.forward_inference.Prior instance or None, optional, default: None
The optional prior object having an input-output signature as given by bayesflow.forward_inference.Prior
- prior_draws : np.ndarray of shape (n_prior_draws, n_params) or None, optional, default: None
The optional prior draws obtained from the prior. If both prior and prior_draws are provided, prior_draws will be used.
- param_names : list or None, optional, default: None
The parameter names for nice plot titles. Inferred if None.
- height : float, optional, default: 3
The height of the pairplot
- label_fontsize : int, optional, default: 14
The font size of the x-label and y-label texts (parameter names)
- legend_fontsize : int, optional, default: 16
The font size of the legend text
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- post_color : str, optional, default: ‘#8f2727’
The color for the posterior histograms and KDEs
- prior_color : str, optional, default: ‘gray’
The color for the optional prior histograms and KDEs
- post_alpha : float in [0, 1], optional, default: 0.9
The opacity of the posterior plots
- prior_alpha : float in [0, 1], optional, default: 0.7
The opacity of the prior plots
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- AssertionError
If the shape of posterior_draws is not 2-dimensional.
- bayesflow.diagnostics.plot_losses(train_losses, val_losses=None, moving_average=False, ma_window_fraction=0.01, fig_size=None, train_color='#8f2727', val_color='black', lw_train=2, lw_val=3, grid_alpha=0.5, legend_fontsize=14, label_fontsize=14, title_fontsize=16)
A generic helper function to plot the losses of a series of training epochs and runs.
- Parameters:
- train_losses : pd.DataFrame
The (plottable) history as returned by a train_[…] method of a Trainer instance. Alternatively, you can pass a data frame of validation losses instead of train losses if you only want to plot the validation loss.
- val_losses : pd.DataFrame or None, optional, default: None
The (plottable) validation history as returned by a train_[…] method of a Trainer instance. If left None, only train losses are plotted. Should have the same number of columns as train_losses.
- moving_average : bool, optional, default: False
A flag for adding a moving average line of the train_losses.
- ma_window_fraction : float, optional, default: 0.01
Window size for the moving average as a fraction of total training steps.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- train_color : str, optional, default: ‘#8f2727’
The color for the train loss trajectory
- val_color : str, optional, default: ‘black’
The color for the optional validation loss trajectory
- lw_train : int, optional, default: 2
The line width for the training loss curve
- lw_val : int, optional, default: 3
The line width for the validation loss curve
- grid_alpha : float, optional, default: 0.5
The opacity factor for the background grid lines
- legend_fontsize : int, optional, default: 14
The font size of the legend text
- label_fontsize : int, optional, default: 14
The font size of the y-label text
- title_fontsize : int, optional, default: 16
The font size of the title text
- Returns:
- f : plt.Figure - the figure instance for optional saving
- Raises:
- AssertionError
If the number of columns in train_losses does not match the number of columns in val_losses.
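To make the meaning of ma_window_fraction concrete, the sketch below smooths a synthetic loss curve with a window equal to a fraction of the total number of steps. It uses a plain (unweighted) moving average, which is an assumption; BayesFlow may use a different smoother, and the function name moving_average is hypothetical.

```python
import numpy as np


def moving_average(values, window_fraction=0.01):
    """Unweighted moving average with a window given as a fraction of the series length."""
    values = np.asarray(values, dtype=float)
    window = max(1, int(round(window_fraction * len(values))))
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the full window fits
    return np.convolve(values, kernel, mode="valid")


rng = np.random.default_rng(4)
steps = np.arange(1000)

# Synthetic noisy loss trajectory (assumed for illustration)
train_loss = np.exp(-steps / 200.0) + 0.1 * rng.normal(size=steps.size)

# A 5% window over 1000 steps averages 50 consecutive values
smoothed = moving_average(train_loss, window_fraction=0.05)
```

A larger fraction gives a smoother but shorter curve, since the first window - 1 steps have no complete window.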
- bayesflow.diagnostics.plot_prior2d(prior, param_names=None, n_samples=2000, height=2.5, color='#8f2727', **kwargs)
Creates pair-plots for a given joint prior.
- Parameters:
- prior : callable
The prior object which takes a single integer argument and generates random draws.
- param_names : list of str or None, optional, default: None
An optional list of parameter names.
- n_samples : int, optional, default: 2000
The number of random draws from the joint prior
- height : float, optional, default: 2.5
The height of the pair plot
- color : str, optional, default: ‘#8f2727’
The color of the plot
- **kwargs : dict, optional
Additional keyword arguments passed to the sns.PairGrid constructor
- Returns:
- f : plt.Figure - the figure instance for optional saving
- bayesflow.diagnostics.plot_latent_space_2d(z_samples, height=2.5, color='#8f2727', **kwargs)
Creates pair plots for the latent space learned by the inference network. Enables visual inspection of the latent space and whether its structure corresponds to the one enforced by the optimization criterion.
- Parameters:
- z_samples : np.ndarray or tf.Tensor of shape (n_sim, n_params)
The latent samples computed through a forward pass of the inference network.
- height : float, optional, default: 2.5
The height of the pair plot.
- color : str, optional, default: ‘#8f2727’
The color of the plot
- **kwargs : dict, optional
Additional keyword arguments passed to the sns.PairGrid constructor
- Returns:
- f : plt.Figure - the figure instance for optional saving
- bayesflow.diagnostics.plot_calibration_curves(true_models, pred_models, model_names=None, num_bins=10, label_fontsize=16, legend_fontsize=14, title_fontsize=18, tick_fontsize=12, epsilon=0.02, fig_size=None, color='#8f2727', n_row=None, n_col=None)
Plots the calibration curves, the expected calibration errors (ECEs), and the marginal histograms of predicted posterior model probabilities for a model comparison problem. The marginal histograms inform about the fraction of predictions in each bin. Depends on the expected_calibration_error function for computing the ECE.
- Parameters:
- true_models : np.ndarray of shape (num_data_sets, num_models)
The one-hot-encoded true model indices per data set.
- pred_models : np.ndarray of shape (num_data_sets, num_models)
The predicted posterior model probabilities (PMPs) per data set.
- model_names : list or None, optional, default: None
The model names for nice plot titles. Inferred if None.
- num_bins : int, optional, default: 10
The number of bins to use for the calibration curves (and marginal histograms).
- label_fontsize : int, optional, default: 16
The font size of the x-label and y-label texts
- legend_fontsize : int, optional, default: 14
The font size of the legend text (ECE value)
- title_fontsize : int, optional, default: 18
The font size of the title text.
- tick_fontsize : int, optional, default: 12
The font size of the axis tick labels
- epsilon : float, optional, default: 0.02
A small amount to pad the [0, 1]-bounded axes on both sides.
- fig_size : tuple or None, optional, default: None
The figure size passed to the matplotlib constructor. Inferred if None.
- color : str, optional, default: ‘#8f2727’
The color of the calibration curves
- n_row : int, optional, default: None
The number of rows for the subplots. Dynamically determined if None.
- n_col : int, optional, default: None
The number of columns for the subplots. Dynamically determined if None.
- Returns:
- fig : plt.Figure - the figure instance for optional saving
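For readers unfamiliar with the ECE, the following sketch shows one common binned definition: per model, the weighted average gap between the mean predicted probability and the observed frequency in each bin. The function ece_per_model is hypothetical and is not the library's expected_calibration_error, whose binning and aggregation may differ.

```python
import numpy as np


def ece_per_model(true_onehot, pred_probs, num_bins=10):
    """Binned expected calibration error, one value per model (illustrative sketch)."""
    num_models = pred_probs.shape[1]
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    eces = np.zeros(num_models)
    for m in range(num_models):
        probs, labels = pred_probs[:, m], true_onehot[:, m]
        # Assign each prediction to one of num_bins equal-width bins on [0, 1]
        bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, num_bins - 1)
        for b in range(num_bins):
            mask = bin_ids == b
            if mask.any():
                gap = abs(labels[mask].mean() - probs[mask].mean())
                # Weight each bin by its share of the predictions
                eces[m] += mask.mean() * gap
    return eces


rng = np.random.default_rng(5)

# Toy calibrated case (assumed): true models are drawn from the predicted probabilities
pred_models = rng.dirichlet(np.ones(3), size=2000)
true_idx = np.array([rng.choice(3, p=p) for p in pred_models])
true_models = np.eye(3)[true_idx]

ece = ece_per_model(true_models, pred_models)
```

The arrays follow the shapes documented above: both true_models and pred_models are (num_data_sets, num_models).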
- bayesflow.diagnostics.plot_confusion_matrix(true_models, pred_models, model_names=None, fig_size=(5, 5), label_fontsize=16, title_fontsize=18, value_fontsize=10, tick_fontsize=12, xtick_rotation=None, ytick_rotation=None, normalize=True, cmap=None, title=True)
Plots a confusion matrix for validating a neural network trained for Bayesian model comparison.
- Parameters:
- true_models : np.ndarray of shape (num_data_sets, num_models)
The one-hot-encoded true model indices per data set.
- pred_models : np.ndarray of shape (num_data_sets, num_models)
The predicted posterior model probabilities (PMPs) per data set.
- model_names : list or None, optional, default: None
The model names for nice plot titles. Inferred if None.
- fig_size : tuple or None, optional, default: (5, 5)
The figure size passed to the matplotlib constructor. Inferred if None.
- label_fontsize : int, optional, default: 16
The font size of the x-label and y-label texts
- title_fontsize : int, optional, default: 18
The font size of the title text.
- value_fontsize : int, optional, default: 10
The font size of the text annotations and the colorbar tick labels.
- tick_fontsize : int, optional, default: 12
The font size of the axis label and model name texts.
- xtick_rotation : int or None, optional, default: None
Rotation of x-axis tick labels (helps with long model names).
- ytick_rotation : int or None, optional, default: None
Rotation of y-axis tick labels (helps with long model names).
- normalize : bool, optional, default: True
A flag for normalization of the confusion matrix. If True, each row of the confusion matrix is normalized to sum to 1.
- cmap : matplotlib.colors.Colormap or str, optional, default: None
Colormap to be used for the cells. If a str, it should be the name of a registered colormap, e.g., ‘viridis’. The default colormap matches the BayesFlow defaults by ranging from white to red.
- title : bool, optional, default: True
A flag for adding ‘Confusion Matrix’ above the matrix.
- Returns:
- fig : plt.Figure - the figure instance for optional saving
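The matrix being plotted, including the row normalization described for normalize=True, can be sketched as follows. Taking the argmax of the PMPs as the predicted model is an assumption, and confusion_matrix_sketch is a hypothetical helper rather than part of BayesFlow.

```python
import numpy as np


def confusion_matrix_sketch(true_models, pred_models, normalize=True):
    """Rows index the true model, columns the predicted model (argmax of the PMPs)."""
    num_models = true_models.shape[1]
    true_idx = np.argmax(true_models, axis=1)
    pred_idx = np.argmax(pred_models, axis=1)
    cm = np.zeros((num_models, num_models))
    # Accumulate one count per data set, handling repeated index pairs correctly
    np.add.at(cm, (true_idx, pred_idx), 1.0)
    if normalize:
        row_sums = cm.sum(axis=1, keepdims=True)
        # Each non-empty row is scaled to sum to 1; empty rows stay at zero
        cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    return cm


# Four data sets, two candidate models (toy values)
true_models = np.eye(2)[[0, 0, 1, 1]]
pred_models = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])

cm = confusion_matrix_sketch(true_models, pred_models)
```

With normalization, entry (i, j) reads as the fraction of data sets from model i that were assigned to model j.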
- bayesflow.diagnostics.plot_mmd_hypothesis_test(mmd_null, mmd_observed=None, alpha_level=0.05, null_color=(0.16407, 0.020171, 0.577478), observed_color='red', alpha_color='orange', truncate_vlines_at_kde=False, xmin=None, xmax=None, bw_factor=1.5)
- Parameters:
- mmd_null : np.ndarray
The samples from the MMD sampling distribution under the null hypothesis “the model is well-specified”
- mmd_observed : float or None, optional, default: None
The observed MMD value
- alpha_level : float, optional, default: 0.05
The rejection probability (type I error)
- null_color : str or tuple, optional, default: (0.16407, 0.020171, 0.577478)
The color of the H0 sampling distribution
- observed_color : str or tuple, optional, default: “red”
The color of the observed MMD
- alpha_color : str or tuple, optional, default: “orange”
The color of the rejection area
- truncate_vlines_at_kde : bool, optional, default: False
If True, the vertical lines are cut off at the KDE. If False, the lines continue across the plot.
- xmin : float, optional, default: None
The lower x-axis limit
- xmax : float, optional, default: None
The upper x-axis limit
- bw_factor : float, optional, default: 1.5
The bandwidth (i.e., smoothing parameter) of the kernel density estimate
- Returns:
- f : plt.Figure - the figure instance for optional saving
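The relationship between mmd_null, mmd_observed, and alpha_level can be illustrated numerically. The sketch assumes a one-sided test in which the rejection area is the upper alpha_level tail of the null distribution; the gamma-distributed null samples are an invented stand-in, not output from BayesFlow.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in for MMD values simulated under H0 (assumed distribution, for illustration)
mmd_null = rng.gamma(shape=2.0, scale=0.01, size=5000)
mmd_observed = 0.08
alpha_level = 0.05

# Critical value: the (1 - alpha_level) quantile of the null samples
critical_value = np.quantile(mmd_null, 1.0 - alpha_level)

# Reject "the model is well-specified" if the observed MMD lies in the upper tail
reject_h0 = mmd_observed > critical_value

# Empirical p-value: share of null samples at least as large as the observation
p_value = np.mean(mmd_null >= mmd_observed)
```

Under this reading, the observed value falling to the right of the critical value corresponds to landing inside the colored rejection area.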