calibration_ecdf

bayesflow.diagnostics.calibration_ecdf(estimates: dict[str, ndarray] | ndarray, targets: dict[str, ndarray] | ndarray, variable_keys: Sequence[str] = None, variable_names: Sequence[str] = None, difference: bool = False, stacked: bool = False, rank_type: str | ndarray = 'fractional', figsize: Sequence[float] = None, label_fontsize: int = 16, legend_fontsize: int = 14, title_fontsize: int = 18, tick_fontsize: int = 12, rank_ecdf_color: str = '#132a70', fill_color: str = 'grey', num_row: int = None, num_col: int = None, **kwargs) -> Figure

Creates the empirical CDF (ECDF) of each marginal rank distribution and plots it against a uniform ECDF. Simultaneous confidence bands for the ECDF are computed from simulations of the uniform distribution, as proposed by [1].
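The quantity being plotted can be sketched with NumPy alone. The block below is a minimal illustration, not the library's implementation: the conjugate Gaussian toy model, the helper `ecdf`, and all variable names are assumptions made for this example. It computes fractional ranks, their ECDF, and the difference from the uniform CDF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model with a known posterior (assumption for illustration):
# theta ~ N(0, 1), x | theta ~ N(theta, 1)  =>  theta | x ~ N(x / 2, 1 / 2)
n_data_sets, n_post_draws = 500, 200
targets = rng.normal(size=(n_data_sets, 1))
observations = targets + rng.normal(size=(n_data_sets, 1))
estimates = observations[:, None, :] / 2 + np.sqrt(0.5) * rng.normal(
    size=(n_data_sets, n_post_draws, 1)
)

# Fractional rank: share of posterior draws that fall below the true value.
ranks = np.mean(estimates < targets[:, None, :], axis=1)  # (n_data_sets, n_params)


def ecdf(samples, grid):
    """Evaluate the empirical CDF of 1D `samples` at each point of `grid`."""
    return np.searchsorted(np.sort(samples), grid, side="right") / samples.size


grid = np.linspace(0.0, 1.0, 101)
rank_ecdf = ecdf(ranks[:, 0], grid)

# The Uniform(0, 1) CDF is the identity on [0, 1], so the ECDF difference
# is simply the rank ECDF minus the grid itself.
ecdf_difference = rank_ecdf - grid
```

For a well-calibrated approximator such as this exact posterior, the ranks should be close to uniform, so `ecdf_difference` should stay near zero across the grid.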

For models with many parameters, use stacked=True to obtain an idea of the overall calibration of a posterior approximator.

To compute ranks based on the Euclidean distance to the origin, use rank_type='distance'; to measure distances to a different reference instead, additionally pass a reference array. This can be used to check the joint calibration of the posterior approximator and might reveal biases in the posterior approximation that the fractional ranks do not detect (e.g., when the prior equals the posterior). This is motivated by [2].
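As a rough sketch of the distance-based ranks described above, the following NumPy function counts, for each data set, the fraction of posterior draws that lie closer to the reference than the true parameter does. The function name `distance_ranks_sketch`, its `references` argument, and the choice of one reference point per data set are assumptions made for this example; the library's own rank computation may differ in details.

```python
import numpy as np


def distance_ranks_sketch(estimates, targets, references=None):
    """Hypothetical helper: distance-based ranks over all parameters jointly.

    estimates:  (n_data_sets, n_post_draws, n_params) posterior draws
    targets:    (n_data_sets, n_params) true parameter values
    references: (n_data_sets, n_params) reference points; origin if None
    """
    if references is None:
        references = np.zeros_like(targets)
    # Euclidean distance of every posterior draw to its reference point.
    draw_dist = np.linalg.norm(estimates - references[:, None, :], axis=-1)
    # Euclidean distance of the true parameter to the same reference point.
    target_dist = np.linalg.norm(targets - references, axis=-1)
    # Fraction of draws that are closer to the reference than the target is.
    return np.mean(draw_dist < target_dist[:, None], axis=1)


# One data set, four posterior draws, two parameters.
est = np.array([[[0.0, 0.0], [1.0, 1.0], [6.0, 0.0], [3.0, 3.0]]])
tgt = np.array([[3.0, 4.0]])  # distance 5 from the origin

# Three of the four draws lie closer to the origin than the target.
ranks_origin = distance_ranks_sketch(est, tgt)

# With the reference moved to (6, 0), two of the four draws are closer.
ranks_shifted = distance_ranks_sketch(est, tgt, references=np.array([[6.0, 0.0]]))
```

Because the distance collapses all parameters into one number per draw, this yields a single rank per data set rather than one per parameter.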

[1] Säilynoja, T., Bürkner, P. C., & Vehtari, A. (2022). Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 32(2), 1-21. https://arxiv.org/abs/2103.10522

[2] Lemos, P., et al. (2023). Sampling-based accuracy testing of posterior estimators for general inference. International Conference on Machine Learning, PMLR. https://proceedings.mlr.press/v202/lemos23a.html

Parameters:
estimates : np.ndarray of shape (n_data_sets, n_post_draws, n_params)

The posterior draws obtained for each of the n_data_sets data sets.

targets : np.ndarray of shape (n_data_sets, n_params)

The prior draws used to generate the n_data_sets data sets.

difference : bool, optional, default: False

If True, plots the ECDF difference. Enables a more dynamic visualization range.

stacked : bool, optional, default: False

If True, all ECDFs will be plotted on the same plot. If False, each ECDF will have its own subplot, similar to the behavior of calibration_histogram.

rank_type : str, optional, default: 'fractional'

If 'fractional' (default), the ranks are computed as the fraction of posterior samples that are smaller than the corresponding prior draw. If 'distance', the ranks are computed as the fraction of posterior samples that are closer to a reference point than the prior draw is (by default, the reference point is the origin). You can pass a reference array in the same shape as the estimates array by setting targets in the ranks_kwargs. This is motivated by [2].

variable_keys : list or None, optional, default: None

Select keys from the dictionaries provided in estimates and targets. By default, select all keys.

variable_names : list or None, optional, default: None

The parameter names for nice plot titles. Inferred if None. Only relevant if stacked=False.

figsize : tuple or None, optional, default: None

The figure size passed to the matplotlib constructor. Inferred if None.

label_fontsize : int, optional, default: 16

The font size of the x-label and y-label texts.

legend_fontsize : int, optional, default: 14

The font size of the legend text

title_fontsize : int, optional, default: 18

The font size of the title text. Only relevant if stacked=False

tick_fontsize : int, optional, default: 12

The font size of the axis ticklabels

rank_ecdf_color : str, optional, default: '#132a70'

The color to use for the rank ECDFs

fill_color : str, optional, default: 'grey'

The fill color of the simultaneous ECDF bands.

num_row : int, optional, default: None

The number of rows for the subplots. Dynamically determined if None.

num_col : int, optional, default: None

The number of columns for the subplots. Dynamically determined if None.

**kwargs : dict, optional, default: {}

Keyword arguments can be passed to control the behavior of ECDF simultaneous band computation through the ecdf_bands_kwargs dictionary. See simultaneous_ecdf_bands for keyword arguments. Moreover, additional keyword arguments can be passed to control the behavior of the rank computation through the ranks_kwargs dictionary.

Returns:
f : plt.Figure - the figure instance for optional saving
Raises:
ShapeError

If there is a deviation from the expected shapes of estimates and targets.

ValueError

If an unknown rank_type is passed.
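A call might look like the sketch below, which builds synthetic arrays with the documented shapes and passes them using only keyword arguments from the signature above. The toy Gaussian model, the variable names, and the wrapper `plot_calibration` are assumptions made for this example; the import sits inside the wrapper so that the data preparation runs even where BayesFlow is not installed.

```python
import numpy as np

rng = np.random.default_rng(42)
n_data_sets, n_post_draws, n_params = 300, 100, 2

# Synthetic, well-calibrated toy problem (assumption for illustration):
# theta ~ N(0, 1), x | theta ~ N(theta, 1)  =>  theta | x ~ N(x / 2, 1 / 2)
targets = rng.normal(size=(n_data_sets, n_params))
observations = targets + rng.normal(size=(n_data_sets, n_params))
estimates = observations[:, None, :] / 2 + np.sqrt(0.5) * rng.normal(
    size=(n_data_sets, n_post_draws, n_params)
)


def plot_calibration(estimates, targets):
    """Hypothetical wrapper around calibration_ecdf for the arrays above."""
    import bayesflow as bf  # lazy import; requires an installed BayesFlow

    return bf.diagnostics.calibration_ecdf(
        estimates=estimates,  # (n_data_sets, n_post_draws, n_params)
        targets=targets,      # (n_data_sets, n_params)
        variable_names=["theta_1", "theta_2"],
        difference=True,      # plot ECDF differences
        stacked=False,        # one subplot per parameter
    )


# Uncomment to draw and save the figure:
# fig = plot_calibration(estimates, targets)
# fig.savefig("calibration_ecdf.png")
```

The returned object is a matplotlib Figure, so it can be saved with `savefig` as shown in the commented lines.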