calibration_log_gamma
- bayesflow.diagnostics.calibration_log_gamma(estimates: Mapping[str, ndarray] | ndarray, targets: Mapping[str, ndarray] | ndarray, variable_keys: Sequence[str] = None, variable_names: Sequence[str] = None, test_quantities: dict[str, Callable] = None, num_null_draws: int = 1000, quantile: float = 0.05)
Compute the log gamma discrepancy statistic to test posterior calibration; see [1] for additional information. Log gamma is log(gamma / gamma_null), where gamma_null is the 5th percentile (with the default quantile of 0.05) of the null distribution of gamma under uniformity of ranks. In a hypothesis-testing framework, log_gamma < 0 therefore implies rejection of the hypothesis of uniform ranks at the 5% level. This diagnostic is typically more sensitive than the Kolmogorov-Smirnov test or the chi-squared test.
[1] Martin Modrák, Angie H. Moon, Shinyoung Kim, Paul Bürkner, Niko Huurre, Kateřina Faltejsková, Andrew Gelman, and Aki Vehtari. “Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity.” Bayesian Anal. 20 (2) 461-488, June 2025. https://doi.org/10.1214/23-BA1404
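To make the definition log(gamma / gamma_null) concrete, the following is a minimal NumPy sketch for a single variable. It is not BayesFlow's implementation: the specific form of gamma used here (twice the smallest binomial tail probability of the empirical rank counts over all rank thresholds) is an assumption based on the approach described in [1], and the helper names `binom_tails`, `gamma_statistic`, and `log_gamma_sketch` are hypothetical.

```python
from math import lgamma

import numpy as np


def binom_tails(n, p):
    """Lower tails P(X <= k) and upper tails P(X >= k) of Binomial(n, p_i).

    Returns two arrays of shape (len(p), n + 1), indexed by [i, k].
    """
    k = np.arange(n + 1)
    log_choose = np.array(
        [lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1) for j in k]
    )
    log_pmf = log_choose + k * np.log(p)[:, None] + (n - k) * np.log1p(-p)[:, None]
    pmf = np.exp(log_pmf)
    lower = np.cumsum(pmf, axis=1)
    upper = np.cumsum(pmf[:, ::-1], axis=1)[:, ::-1]
    return lower, upper


def gamma_statistic(ranks, num_draws, tails):
    """Assumed form of gamma: 2 * min over thresholds of the smaller tail probability."""
    thresholds = np.arange(1, num_draws + 1)
    # counts[i] = number of datasets whose rank is strictly below threshold i + 1
    counts = (ranks[None, :] < thresholds[:, None]).sum(axis=1)
    idx = np.arange(num_draws)
    lower, upper = tails
    return 2.0 * np.minimum(lower[idx, counts], upper[idx, counts]).min()


def log_gamma_sketch(ranks, num_draws, num_null_draws=1000, quantile=0.05, seed=0):
    """log(gamma / gamma_null), with gamma_null simulated from uniform ranks."""
    rng = np.random.default_rng(seed)
    n = len(ranks)
    # Under uniform ranks on {0, ..., num_draws}, P(rank < i) = i / (num_draws + 1)
    z = np.arange(1, num_draws + 1) / (num_draws + 1)
    tails = binom_tails(n, z)
    null_gammas = np.array([
        gamma_statistic(rng.integers(0, num_draws + 1, size=n), num_draws, tails)
        for _ in range(num_null_draws)
    ])
    gamma_null = np.quantile(null_gammas, quantile)
    return float(np.log(gamma_statistic(ranks, num_draws, tails) / gamma_null))


# Illustration with synthetic ranks (100 datasets, 20 posterior draws each)
demo_rng = np.random.default_rng(1)
uniform_ranks = demo_rng.integers(0, 21, size=100)  # ranks spread over 0..20
skewed_ranks = demo_rng.integers(0, 11, size=100)   # ranks piled into 0..10
print(log_gamma_sketch(uniform_ranks, num_draws=20, num_null_draws=200))
print(log_gamma_sketch(skewed_ranks, num_draws=20, num_null_draws=200))
```

Ranks that are concentrated in the lower half of the range yield a strongly negative value, matching the reading that log_gamma < 0 signals a departure from uniform ranks.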
- Parameters:
- estimates : np.ndarray of shape (num_datasets, num_draws, num_variables)
The random draws from the approximate posteriors over num_datasets.
- targets : np.ndarray of shape (num_datasets, num_variables)
The corresponding ground-truth values sampled from the prior.
- variable_keys : Sequence[str], optional (default = None)
Select keys from the dictionaries provided in estimates and targets. By default, select all keys.
- variable_names : Sequence[str], optional (default = None)
Optional variable names to show in the output.
- test_quantities : dict or None, optional, default: None
A dict that maps plot titles to functions that compute test quantities based on estimate/target draws.
The dict keys are automatically added to variable_keys and variable_names. Test quantity functions are expected to accept a dict of draws with shape (batch_size, ...) as the first (typically only) positional argument and return a NumPy array of shape (batch_size,). The functions do not have to deal with an additional sample dimension, as appropriate reshaping is done internally.
- quantile : float in (0, 1), optional, default 0.05
The quantile from the null distribution to be used as a threshold. A lower quantile increases sensitivity to deviations from uniformity.
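The shape contract for test_quantities can be illustrated with a short sketch. The variable keys "mu" and "sigma", the two functions, and the dict titles are hypothetical; only the contract itself (a dict of arrays with leading dimension batch_size in, an array of shape (batch_size,) out) comes from the description above.

```python
import numpy as np


def mean_of_mu(draws):
    """Average the hypothetical variable "mu" within each batch entry."""
    mu = np.asarray(draws["mu"])
    return mu.reshape(mu.shape[0], -1).mean(axis=1)


def mu_sigma_product(draws):
    """Combine two hypothetical variables into one scalar per batch entry."""
    mu = np.asarray(draws["mu"])
    sigma = np.asarray(draws["sigma"])
    mu_sum = mu.reshape(mu.shape[0], -1).sum(axis=1)
    return mu_sum * sigma.reshape(sigma.shape[0], -1)[:, 0]


# Keys become the titles for the derived quantities
test_quantities = {
    "Mean of mu": mean_of_mu,
    "sum(mu) * sigma": mu_sigma_product,
}

# Check the contract on a toy batch: every function must return shape (batch_size,)
batch = {"mu": np.ones((4, 3)), "sigma": np.full((4, 1), 2.0)}
outputs = {title: fn(batch) for title, fn in test_quantities.items()}
for title, out in outputs.items():
    assert out.shape == (4,), title
```

A dict like this would be passed as the test_quantities argument; checking the output shapes on a toy batch first helps catch functions that accidentally return a scalar or keep a trailing dimension.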
- Returns:
- result : dict
Dictionary containing:
- “values” : float or np.ndarray
The log gamma values per variable.
- “metric_name” : str
The name of the metric (“Log Gamma”).
- “variable_names” : str
The (inferred) variable names.
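As a sketch of how the returned dictionary might be consumed, the snippet below flags variables whose log gamma is negative. The `result` dict is a hand-written placeholder with made-up numbers and names, not real output, and the helper `flag_miscalibrated` is hypothetical; it also assumes that “variable_names” holds one name per entry of “values”.

```python
import numpy as np

# Placeholder standing in for the function's return value (illustrative numbers only)
result = {
    "values": np.array([0.8, -1.3, 0.1]),
    "metric_name": "Log Gamma",
    "variable_names": ["mu", "sigma", "tau"],
}


def flag_miscalibrated(result):
    """Return the names of variables whose log gamma falls below zero."""
    values = np.atleast_1d(result["values"])  # also handles a single float
    return [name for name, v in zip(result["variable_names"], values) if v < 0]


flagged = flag_miscalibrated(result)
print(f'{result["metric_name"]} < 0 for: {flagged}')
```

Because a negative value corresponds to rejecting uniform ranks at the chosen quantile, the flagged variables are the ones whose posterior calibration deserves a closer look.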