calibration_log_gamma

bayesflow.diagnostics.calibration_log_gamma(estimates: Mapping[str, ndarray] | ndarray, targets: Mapping[str, ndarray] | ndarray, variable_keys: Sequence[str] = None, variable_names: Sequence[str] = None, test_quantities: dict[str, Callable] = None, num_null_draws: int = 1000, quantile: float = 0.05)

Compute the log gamma discrepancy statistic to test posterior calibration; see [1] for additional information. Log gamma is log(gamma / gamma_null), where gamma_null is the quantile of the null distribution of gamma under uniformity of ranks selected by the quantile argument (by default, the 5th percentile). In a hypothesis-testing framework, log_gamma < 0 therefore implies rejection of the hypothesis of uniform ranks at the corresponding level (5% by default). This diagnostic is typically more sensitive than the Kolmogorov-Smirnov test or the chi-squared test.

[1] Martin Modrák, Angie H. Moon, Shinyoung Kim, Paul Bürkner, Niko Huurre, Kateřina Faltejsková, Andrew Gelman, Aki Vehtari. “Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity.” Bayesian Anal. 20 (2), 461-488, June 2025. https://doi.org/10.1214/23-BA1404
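The definition above can be made concrete with a small standard-library sketch. It assumes the gamma statistic as described in [1] (twice the smallest binomial tail probability of the empirical rank counts) and a crude empirical quantile for the null threshold. The helper names binom_pmf, gamma_statistic, and null_threshold are hypothetical and are not part of BayesFlow; the library's own implementation may differ in details.

```python
import math
import random


def binom_pmf(j, n, p):
    """P(X = j) for X ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1.0 - p) ** (n - j)


def gamma_statistic(ranks, num_draws):
    """Gamma discrepancy for ranks in {0, ..., num_draws} (assumed form, after [1])."""
    n = len(ranks)
    tails = []
    for i in range(1, num_draws + 1):
        z = i / (num_draws + 1)              # expected share of ranks below i
        r = sum(rank < i for rank in ranks)  # observed count of ranks below i
        lower = sum(binom_pmf(j, n, z) for j in range(r + 1))     # P(X <= r)
        upper = sum(binom_pmf(j, n, z) for j in range(r, n + 1))  # P(X >= r)
        tails.append(min(lower, upper))
    return 2.0 * min(tails)


def null_threshold(num_sims, num_draws, num_null_draws=200, quantile=0.05, seed=0):
    """Empirical quantile of gamma when ranks are drawn uniformly at random."""
    rng = random.Random(seed)
    gammas = sorted(
        gamma_statistic([rng.randint(0, num_draws) for _ in range(num_sims)], num_draws)
        for _ in range(num_null_draws)
    )
    return gammas[int(quantile * (num_null_draws - 1))]  # no interpolation


num_draws, num_sims = 20, 50
gamma_null = null_threshold(num_sims, num_draws)

uniform_like = [i % (num_draws + 1) for i in range(num_sims)]  # ranks spread evenly
biased = [0] * num_sims                                        # every target below all draws

log_gamma_uniform = math.log(gamma_statistic(uniform_like, num_draws) / gamma_null)
log_gamma_biased = math.log(gamma_statistic(biased, num_draws) / gamma_null)
print(log_gamma_uniform, log_gamma_biased)
```

The strongly biased ranks yield a log gamma far below zero, while evenly spread ranks yield a much larger value. The sizes are kept small because the binomial sums are computed by brute force.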

Parameters:

estimates : np.ndarray of shape (num_datasets, num_draws, num_variables)

The random draws from the approximate posteriors, one set of num_draws draws for each of the num_datasets data sets.

targets : np.ndarray of shape (num_datasets, num_variables)

The corresponding ground-truth values sampled from the prior.
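To illustrate how the two array shapes fit together, the sketch below builds toy inputs and computes the standard simulation-based calibration rank of each target among its posterior draws. It assumes NumPy is available; the toy data and the rank convention (count of draws strictly below the target) are assumptions for illustration and are not confirmed to match BayesFlow's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
num_datasets, num_draws, num_variables = 100, 50, 2

# Hypothetical toy data: targets and draws come from the same distribution,
# which mimics a perfectly calibrated approximation.
targets = rng.normal(size=(num_datasets, num_variables))
estimates = rng.normal(size=(num_datasets, num_draws, num_variables))

# Broadcast targets against the draw axis and count draws below each target.
ranks = np.sum(estimates < targets[:, np.newaxis, :], axis=1)

print(ranks.shape)  # one rank per data set and variable
```

Each rank lies between 0 and num_draws, and it is the uniformity of these ranks across data sets that the log gamma statistic tests.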

variable_keys : Sequence[str], optional (default = None)

Select keys from the dictionaries provided in estimates and targets. By default, select all keys.

variable_names : Sequence[str], optional (default = None)

Optional variable names to show in the output.

test_quantities : dict or None, optional, default: None

A dict that maps plot titles to functions that compute test quantities based on estimate/target draws.

The dict keys are automatically added to variable_keys and variable_names. Test quantity functions are expected to accept a dict of draws with shape (batch_size, ...) as the first (typically only) positional argument and return a NumPy array of shape (batch_size,). The functions do not have to deal with an additional sample dimension, as appropriate reshaping is done internally.
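A test_quantities dict following this contract might look like the sketch below. The variable key "theta" and the two functions are hypothetical; substitute the keys present in your own draws. Each function receives a dict of arrays with a leading batch axis and returns one value per batch entry.

```python
import numpy as np


def mean_of_theta(draws):
    """Average over all non-batch axes, giving shape (batch_size,)."""
    theta = draws["theta"]  # hypothetical key, shape (batch_size, ...)
    return theta.reshape(theta.shape[0], -1).mean(axis=1)


def squared_norm(draws):
    """Sum of squares over all non-batch axes, giving shape (batch_size,)."""
    theta = draws["theta"]
    return np.sum(theta.reshape(theta.shape[0], -1) ** 2, axis=1)


# Keys serve as titles and are added to variable_keys and variable_names.
test_quantities = {"mean(theta)": mean_of_theta, "||theta||^2": squared_norm}

# Quick shape check on a toy batch of three two-dimensional draws.
batch = {"theta": np.arange(6.0).reshape(3, 2)}
outputs = {title: fn(batch) for title, fn in test_quantities.items()}
```

Flattening the trailing axes before reducing keeps the functions valid for any per-draw shape, so they do not depend on how the sample dimension is reshaped internally.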

quantile : float in (0, 1), optional, default 0.05

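The quantile from the null distribution to be used as a threshold. Because log gamma is negative only when gamma falls below this threshold, a lower quantile makes the check more conservative: fewer false alarms under uniformity, but larger deviations are needed before log gamma becomes negative.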

Returns:

result : dict

Dictionary containing:

“values” : float or np.ndarray
The log gamma values per variable.
“metric_name” : str
The name of the metric (“Log Gamma”).
“variable_names” : str
The (inferred) variable names.
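As a reading aid, the sketch below shows one way to interpret a returned dictionary. The result literal is hypothetical, with made-up numbers and variable names that only mirror the documented keys; it is not output from an actual run.

```python
# Hypothetical result with made-up values, mirroring the documented keys.
result = {
    "values": [0.42, -1.7],
    "metric_name": "Log Gamma",
    "variable_names": ["mu", "sigma"],
}

# A negative log gamma means uniformity of ranks is rejected for that variable.
flagged = [
    name
    for name, value in zip(result["variable_names"], result["values"])
    if value < 0
]

print(f"{result['metric_name']} flags: {flagged}")
```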