6. Posterior Estimation for SIR-like Models#

6.1. Table of Contents#

import datetime
from functools import partial

import numpy as np
import pandas as pd
import bayesflow.diagnostics as diag
from bayesflow.amortizers import AmortizedPosterior
from bayesflow.networks import InvertibleNetwork, SequenceNetwork
from bayesflow.simulation import GenerativeModel, Prior, Simulator
from bayesflow.trainers import Trainer

6.2. Introduction #

In this tutorial, we will illustrate how to perform posterior inference on simple, stationary SIR-like models (complex models will be tackled in a further notebook). SIR-like models comprise suitable illustrative examples, since they generate time-series and their outputs represent the results of solving a system of ordinary differential equations (ODEs).

The details for tackling stochastic epidemiological models with neural networks are described in our corresponding paper, which you can consult for a more formal exposition and a more comprehensive treatment of neural architectures:

OutbreakFlow: Model-based Bayesian inference of disease outbreak dynamics with invertible neural networks and its application to the COVID-19 pandemics in Germany https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009472

RNG = np.random.default_rng(2023)

6.3. Defining the Generative Model #

As described in our very first notebook, a generative model consists of a prior (encoding suitable parameter ranges) and a simulator (generating data given simulations). Our underlying model distinguishes between susceptible, \(S\), infected, \(I\), and recovered, \(R\), individuals with infection and recovery occurring at a constant transmission rate \(\lambda\) and constant recovery rate \(\mu\), respectively. The model dynamics are governed by the following system of ODEs:

\[\begin{split} \begin{align} \frac{dS}{dt} &= -\lambda\,\left(\frac{S\,I}{N}\right) \\ \frac{dI}{dt} &= \lambda\,\left(\frac{S\,I}{N}\right) - \mu\,I \\ \frac{dR}{dt} &= \mu\,I, \end{align} \end{split}\]

with \(N = S + I + R\) denoting the total population size. For the purpose of forward inference (simulation), we will use a time step of \(dt = 1\), corresponding to daily case reports. In addition to the ODE parameters \(\lambda\) and \(\mu\), we consider a reporting delay parameter \(L\) and a dispersion parameter \(\psi\), which affect the number of reported infected individuals via a negative binomial disttribution (https://en.wikipedia.org/wiki/Negative_binomial_distribution):

\[ \begin{equation} I_t^{(obs)} \sim \textrm{NegBinomial}(I^{(new)}_{t-L}, \psi), \end{equation} \]

In this way, we connect the latent disease model to an observation model, which renders the relationship between parameters and data a stochastic one. Note, that the observation model induces a further parameter \(\psi\), responsible for the dispersion of the noise. Finally, we will also treat the number of initially infected individuals, \(I_0\) as an unknown parameter (having its own prior distribution).

6.3.1. Prior #

We will place the following prior distributions over the five model parameters, summarized in the table below:

\[\begin{split} \begin{aligned} & \text {Table 1. Description of model parameters and corresponding prior distributions}\\ &\begin{array}{lcl} \hline \hline \text { Description} & \text { Symbol } & \text { Prior Distribution } \\ \hline \hline \text{Initial transmission rate} & \text{$\lambda$} & \text{$\textrm{LogNormal}(\log(0.4), 0.5)$} \\ \text{Recovery rate of infected individuals} & \text{$\mu$} & \text{$\textrm{LogNormal}(\log(1/8), 0.2)$} \\ \text{Reporting delay (lag)} & \text{$L$} & \text{$\textrm{LogNormal}(\log(8), 0.2)$} \\ \text{Number of initially infected individuals} & \text{$I_0$} & \text{$\textrm{Gamma}(2, 20)$} \\ \text{Dispersion of the negative binomial distribution} & \text{$\psi$} & \text{$\textrm{Exponential}(5)$} \\ \hline \end{array} \end{aligned} \end{split}\]

How did we come up with these priors? In this case, we rely on the domain expertise and previous research by https://www.science.org/doi/10.1126/science.abb9789. In addition, the new parameter \(\psi\) follows an exponential distribution, which restricts it to positive numbers. Below is the implementation of these priors:

def model_prior():
    """Generates a random draw from the joint prior."""

    lambd = RNG.lognormal(mean=np.log(0.4), sigma=0.5)
    mu = RNG.lognormal(mean=np.log(1 / 8), sigma=0.2)
    D = RNG.lognormal(mean=np.log(8), sigma=0.2)
    I0 = RNG.gamma(shape=2, scale=20)
    psi = RNG.exponential(5)
    return np.array([lambd, mu, D, I0, psi])
prior = Prior(prior_fun=model_prior, param_names=[r"$\lambda$", r"$\mu$", r"$L$", r"$I_0$", r"$\psi$"])

During training, we will also standardize the prior draws, that is, ensure zero location and unit scale. We will do this purely for technical reasons - neural networks like scaled values. In addition, our current prior ranges differ vastly, so each parameter axis may contribute disproportionately to the loss function.

Here, we will use the estimate_means_and_stds() method of a Prior instance, which will estimate the prior means and standard deviations from random draws. We could have also just taken the analytic means and standard deviations, but these may not be available in all settings (e.g., implicit priors).

Caution: Make sure you have a seed or you set a seed whenever you are doing Monte Carlo estimation, since your results might differ slightly due to the empirical variation of the estimates!

prior_means, prior_stds = prior.estimate_means_and_stds()

6.3.2. Simulator (Implicit Likelihood Function) #

from scipy.stats import nbinom

def convert_params(mu, phi):
    """Helper function to convert mean/dispersion parameterization of a negative binomial to N and p,
    as expected by numpy's negative_binomial.

    See https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations

    r = phi
    var = mu + 1 / r * mu**2
    p = (var - mu) / var
    return r, 1 - p

def stationary_SIR(params, N, T, eps=1e-5):
    """Performs a forward simulation from the stationary SIR model given a random draw from the prior."""

    # Extract parameters and round I0 and D
    lambd, mu, D, I0, psi = params
    I0 = np.ceil(I0)
    D = int(round(D))

    # Initial conditions
    S, I, R = [N - I0], [I0], [0]

    # Reported new cases
    C = [I0]

    # Simulate T-1 timesteps
    for t in range(1, T + D):
        # Calculate new cases
        I_new = lambd * (I[-1] * S[-1] / N)

        # SIR equations
        S_t = S[-1] - I_new
        I_t = np.clip(I[-1] + I_new - mu * I[-1], 0.0, N)
        R_t = np.clip(R[-1] + mu * I[-1], 0.0, N)

        # Track

    reparam = convert_params(np.clip(np.array(C[D:]), 0, N) + eps, psi)
    C_obs = RNG.negative_binomial(reparam[0], reparam[1])
    return C_obs[:, np.newaxis]

As you can see, in addition to the parameters, our simulator requires two further arguments: the total population size \(N\) and the time horizon \(T\). These are quantities over which we can amortize (i.e., context variables), but for this example, we will just use the population of Germany and the first two weeks of the pandemics (i.e., \(T=14\)), in the same vein as https://www.science.org/doi/10.1126/science.abb9789.

6.3.3. Loading Real Data #

We will define a simple helper function to load the actually reported cases for the first 2 weeks in Germany.

def load_data():
    """Helper function to load cumulative cases and transform them to new cases."""

    confirmed_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
    confirmed_cases = pd.read_csv(confirmed_cases_url, sep=",")

    date_data_begin = datetime.date(2020, 3, 1)
    date_data_end = datetime.date(2020, 3, 15)
    format_date = lambda date_py: f"{date_py.month}/{date_py.day}/{str(date_py.year)[2:4]}"
    date_formatted_begin = format_date(date_data_begin)
    date_formatted_end = format_date(date_data_end)

    cases_obs = np.array(
        confirmed_cases.loc[confirmed_cases["Country/Region"] == "Germany", date_formatted_begin:date_formatted_end]
    new_cases_obs = np.diff(cases_obs)
    return new_cases_obs

We then collect the context and real data into a Python dictionary for convenience.

config = {"T": 14, "N": 83e6, "obs_data": load_data()}

Since we won’t vary the context variables during training, we can also define our simulator with fixed keyword arguments with the help of the partial function:

simulator = Simulator(simulator_fun=partial(stationary_SIR, T=config["T"], N=config["N"]))

Thus, whenever we call the simulator object, it will always use the keyword arguments provided to the partial function. Also, pay attention that we are passing the simulator function as a simulator_fun argument. A Simulator instance can also be initialized with a batched_simulator_fun, which implies that the simulator works on multiple (batched), instead of single, random draws from the prior.

6.3.4. Generative Model #

We now connect the prior and the simulator through the GenerativeModel wrapper:

model = GenerativeModel(prior, simulator, name="basic_covid_simulator")
INFO:root:Performing 2 pilot runs with the basic_covid_simulator model...
INFO:root:Shape of parameter batch after 2 pilot simulations: (batch_size = 2, 5)
INFO:root:Shape of simulation batch after 2 pilot simulations: (batch_size = 2, 14, 1)
INFO:root:No optional prior non-batchable context provided.
INFO:root:No optional prior batchable context provided.
INFO:root:No optional simulation non-batchable context provided.
INFO:root:No optional simulation batchable context provided.

6.4. Prior Checking #

Any principled Bayesian workflow requires some prior predictive or prior pushforward checks to ensure that the prior specification is consistent with domain expertise (see https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html). The BayesFlow library provides some rudimentary visual tools for performing prior checking. For instance, we can visually inspect the joint prior in the form of bivariate plots:

# As per default, the plot_prior2d function will obtain 1000 draws from the joint prior.
f = prior.plot_prior2d()
/home/radevs/anaconda3/envs/BayesFlowDev/lib/python3.10/site-packages/seaborn/axisgrid.py:64: UserWarning: The figure layout has changed to tight
  self.fig.tight_layout(*args, **kwargs)

6.5. Defining the Neural Approximator #

We can now proceed to define our BayesFlow neural architecture, that is, combine a summary network with an invertible inference network.

6.5.1. Summary Network #

Since our simulator outputs 3D tensors of shape (batch_size, T = 14, 1), we need to reduce this three-dimensional tensor into a two-dimensional tensor of shape (batch_size, summary_dim). Our model outputs are actually so simple that we could have just removed the trailing dimension of the raw outputs and simply fed the data directly to the inference network.

However, for the purpose of illustration (and generalization), we will create a more elaborate summary network consisting of 1D convolutional layers (https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/blocks/1d-convolution) followed by a Long Short-Term Memory (LSTM) network (https://colah.github.io/posts/2015-08-Understanding-LSTMs/). Such an architecture not only does what we want, but also generalizes to much more complex models and longer time-series of varying time steps, see for instance our OutbreakFlow architecture:


Feel free to experiment with different summary architectures as well!

summary_net = SequenceNetwork()

6.5.2. Inference Network #

inference_net = InvertibleNetwork(num_params=len(prior.param_names), num_coupling_layers=4)

6.5.3. Amortized Posterior #

We can now connect the summary and inference networks via the AmortizedPosterior wrapper:

amortizer = AmortizedPosterior(inference_net, summary_net, name="covid_amortizer")

Note, that the name keyword argument is optional, but it is good practice to name your models and amortizers.

6.6. Defining the Configurator #

As a reminder, a configurator acts as an intermediary between a generative model and an amortizer:


In other words, we need to ensure the outputs of the forward model are suitable for processing with neural networks. Currently, they are not, since our data \(\boldsymbol{x}_{1:T}\) consists of large integer (count) values. However, neural networks like scaled data. Furthermore, our parameters \(\boldsymbol{\theta}\) exhibit widely different scales due to their prior specification and role in the simulator, so we will standardize them using our previously computed prior means and standard deviations. In addition, ODE models are prone to divergences and exploding outputs, which will mess up our training. In sum, our configurator does the following:

  1. Initializes a new dictionary (line 7).

  2. Performs a log-transform on the simulated data and convert it to float32 type (line 10).

  3. Converts the prior draws to float32 type and standardizes them (lines 13 - 14).

  4. Removes potentially problematic simulations from the batch (lines 17 - 19).

def configure_input(forward_dict):
    """Function to configure the simulated quantities (i.e., simulator outputs)
    into a neural network-friendly (BayesFlow) format.

    # Prepare placeholder dict
    out_dict = {}

    # Convert data to logscale
    logdata = np.log1p(forward_dict["sim_data"]).astype(np.float32)

    # Extract prior draws and z-standardize with previously computed means
    params = forward_dict["prior_draws"].astype(np.float32)
    params = (params - prior_means) / prior_stds

    # Remove a batch if it contains nan, inf or -inf
    idx_keep = np.all(np.isfinite(logdata), axis=(1, 2))
    if not np.all(idx_keep):
        print("Invalid value encountered...removing from batch")

    # Add to keys
    out_dict["summary_conditions"] = logdata[idx_keep]
    out_dict["parameters"] = params[idx_keep]

    return out_dict

6.7. Defining the Trainer #

Finally, we are in a position to define our Trainer instance. Notice that we also pass out custom configurator function to the constructer. The default configurator won’t do in this case!

Note, that you should supply a checkpoint_path for the Trainer instance, if you don’t want to save the neural approximators manually! Note also, that we are using memory=True so that the trainer keeps some simulations encountered during training. After training, we can quickly perform some diagnostics on these “remembered” simulations to judge the convergence quality.

trainer = Trainer(amortizer=amortizer, generative_model=model, configurator=configure_input, memory=True)

Great, the trainer informs us that the consistency check (i.e., simulation -> configuration -> transformation -> loss computation) was successful. We can now train our networks on epidemiological simulations. We can also check out the number of trainable neural network parameters for the composite approximator:

Model: "covid_amortizer"
 Layer (type)                Output Shape              Param #   
 invertible_network (Invert  multiple                  297080    
 sequence_network (Sequence  multiple                  91178     
Total params: 388258 (1.48 MB)
Trainable params: 388218 (1.48 MB)
Non-trainable params: 40 (160.00 Byte)

6.8. Training Phase #

Ready to train! Since our simulator is pretty fast, we can safely go with online training. Let’s glean the time taken for a batch of \(32\) simulations:

_ = model(32)
CPU times: user 13.8 ms, sys: 0 ns, total: 13.8 ms
Wall time: 13.4 ms

Still, for the purpose of this illustration, we will use offline training on as little as \(10000\) simulated data sets.

offline_data = model(10000)

Depending on your machine, offline training for \(30\) epochs should take between \(1\) and \(3\) minutes. In practice, you should train for a longer number of epochs (\(100\) is a good starting heuristic).

history = trainer.train_offline(offline_data, epochs=30, batch_size=32, validation_sims=200)

6.8.1. Inspecting the Loss #

Following our online simulation-based training, we can quickly visualize the loss trajectory using the plot_losses function from the diagnostics module.

f = diag.plot_losses(history["train_losses"], history["val_losses"], moving_average=True)

Great, it seems that our approximator has converged! Before we get too excited and throw our networks at real data, we need to make sure that they meet our expectations in silico, that is, given the small world of simulations the networks have seen.

6.9. Validation Phase #

6.9.1. Inspecting the Latent Space #

A quick and useful diagnostic is to check whether the marginal latent distribution \(p(\boldsymbol{z})\) has the prescribed probabilistic structure. Since, by default, we optimize the amortizer with the Kullback-Leibler (KL) loss (also known as maximum likelihood training, which is not to be confused with maximum likelihood estimation!), we expect to observe approximately Gaussian latent space with independent axes. Moreover, since the trainer also keeps an internal SimulationMemory instance, we can also directly call it’s diagnose_latent2d method (also available in the diagnostics module):

f = trainer.diagnose_latent2d()
/home/radevs/anaconda3/envs/BayesFlowDev/lib/python3.10/site-packages/seaborn/axisgrid.py:64: UserWarning: The figure layout has changed to tight
  self.fig.tight_layout(*args, **kwargs)

6.9.2. Simulation-Based Calibration - Rank Histograms #

As a further small world (i.e., before real data) sanity check, we can also test the calibration of the amortizer through simulation-based calibration (SBC). See the corresponding paper by Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman for more details:


Accordingly, we expect to observe approximately uniform rank statistic histograms.

f = trainer.diagnose_sbc_histograms()

6.9.3. Simulation-Based Calibration - Rank ECDF #

For models with many parameters, inspecting many histograms can become unwieldly. Moreover, the num_bins hyperparameter for the construction of SBC rank histograms can be hard to choose. An alternative diagnostic approach for calibration is through empirical cumulative distribution functions (ECDF) of rank statistics. You can read more about this approach in the corresponding paper by Teemu Säilynoja, Paul-Christian Bürkner, and Aki Vehtari:


In order to inspect the ECDFs of marginal distributions, we will simulate \(300\) new pairs of simulated data and generating parameters \((\boldsymbol{x}, \boldsymbol{\theta})\) and use the function plot_sbc_ecdf from the diagnostics module:

# Generate some validation data
validation_sims = trainer.configurator(model(batch_size=300))

# Generate posterior draws for all simulations
post_samples = amortizer.sample(validation_sims, n_samples=100)
# Create ECDF plot
f = diag.plot_sbc_ecdf(post_samples, validation_sims["parameters"], param_names=prior.param_names)

We can also produce stacked ECDFs and compute ECDF differences for a more dynamic visualization range.

f = diag.plot_sbc_ecdf(
    post_samples, validation_sims["parameters"], stacked=True, difference=True, legend_fontsize=12, fig_size=(6, 5)

Fianlly, we can also compute SBC histograms on the new validation data by calling the function plot_sbc_histograms directly.

f = diag.plot_sbc_histograms(post_samples, validation_sims["parameters"], param_names=prior.param_names)
INFO:root:The ratio of simulations / posterior draws should be > 20 for reliable variance reduction, but your ratio is 3.                    Confidence intervals might be unreliable!

6.9.4. Inferential Adequacy (Global) #

Depending on the application, it might be interesting to see how well summaries of the full posterior (e.g., means, medians) recover the assumed true parameter values. We can test this in silico via the plot_recovery function in the diagnostics module. For instance, we can compare how well posterior means recover the true parameter (i.e., posterior z-score, https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html). Below, we re-use the \(300\) simulations we took for computing the rank ECDFs, but obtain a larger number of posterior draws per data set for more stable results:

post_samples = amortizer.sample(validation_sims, n_samples=1000)
f = diag.plot_recovery(post_samples, validation_sims["parameters"], param_names=prior.param_names)

6.10. Inference Phase #

We can now move on to using real data. As an important general remark: remember that the real and simulated data need to be in the same format (i.e., discrete indicators should be one-hot-encoded, transformations during training should also be applied during inference, etc.).

6.10.1. Bivariate Posteriors #

Finally, we can feed the real case data from the first two weeks and inspect the approximate posteriors or obtain model-based predictions.

# Format data into a 3D array of shape (1, n_time_steps, 1) and perform log transform
obs_data = np.log1p(config["obs_data"])[np.newaxis, :, np.newaxis].astype(np.float32)
# Obtain 500 posterior draws given real data
post_samples = amortizer.sample({"summary_conditions": obs_data}, 500)

# Undo standardization to get parameters on their original (unstandardized) scales
post_samples = prior_means + post_samples * prior_stds Standalone #

Using the plot_posterior_2d function from the diagnostics module, we can look at the bivariate posteriors in isolation:

f = diag.plot_posterior_2d(post_samples, param_names=prior.param_names)
/home/radevs/anaconda3/envs/BayesFlowDev/lib/python3.10/site-packages/seaborn/axisgrid.py:64: UserWarning: The figure layout has changed to tight
  self.fig.tight_layout(*args, **kwargs)
../_images/980019d5d9075d24b0e14a3917c8c32777695794a48a8b14aa43d89c955249a2.png Compared to Prior #

In addition, we can have a more informative plot which indicates the Bayesian surprise (i.e., difference between prior and posterior) by also supplying the prior object to the function:

f = diag.plot_posterior_2d(post_samples, prior=prior)
/home/radevs/anaconda3/envs/BayesFlowDev/lib/python3.10/site-packages/seaborn/axisgrid.py:64: UserWarning: The figure layout has changed to tight
  self.fig.tight_layout(*args, **kwargs)

6.10.2. Posterior Retrodictive Checks #

These are also called posterior predictive checks, but here we want to explicitly highlight the fact that we are not predicting future data but testing the generative performance or re-simulation performance of the model. In other words, we want to test how well the simulator can reproduce the actually observed data given the parameter posterior \(p(\boldsymbol{\theta} | \boldsymbol{x}_{1:T})\).

Here, we will create a custom function which plots the observed data and then overlays draws from the posterior predictive.

import matplotlib.pyplot as plt

def plot_ppc(config, post_samples, logscale=True, color="#8f2727", dummy=True, figsize=(12, 6), font_size=18):
    Helper function to perform some plotting of the posterior predictive.
    # Plot settings
    plt.rcParams["font.size"] = font_size

    # Remove parameters < 0
    samples = post_samples[np.sum(post_samples < 0, axis=1) == 0]

    f, ax = plt.subplots(1, 1, figsize=figsize)

    # Re-simulations
    sims = []
    for i in range(samples.shape[0]):
        # Note - simulator returns 2D arrays of shape (T, 1), so we remove trailing dim
        sim_cases = stationary_SIR(samples[i], config["N"], config["T"])[:, 0]
    sims = np.array(sims)

    # Compute quantiles for each t = 1,...,T
    qs_50 = np.quantile(sims, q=[0.25, 0.75], axis=0)
    qs_90 = np.quantile(sims, q=[0.05, 0.95], axis=0)
    qs_95 = np.quantile(sims, q=[0.025, 0.975], axis=0)

    # Plot median predictions and observed data
    ax.plot(np.median(sims, axis=0), label="Median predicted cases", color=color)
    ax.plot(config["obs_data"], marker="o", label="Reported cases", color="black", linestyle="dashed", alpha=0.8)

    # Add compatibility intervals (also called credible intervals)
    ax.fill_between(range(config["T"]), qs_50[0], qs_50[1], color=color, alpha=0.5, label="50% CI")
    ax.fill_between(range(config["T"]), qs_90[0], qs_90[1], color=color, alpha=0.3, label="90% CI")
    ax.fill_between(range(config["T"]), qs_95[0], qs_95[1], color=color, alpha=0.1, label="95% CI")

    # Grid and schmuck
    ax.grid(color="grey", linestyle="-", linewidth=0.25, alpha=0.5)
    ax.set_xlabel("Days since pandemic onset")
    ax.set_ylabel("Number of cases")
    if logscale:
    return f

We can now go on and plot the re-simulations:

f = plot_ppc(config, post_samples)

That’s it for this tutorial! You now know how to use the basic building blocks of BayesFlow to create amortized neural approximators. :)