3. Data Processing: Adapters#

An Adapter is a composable preprocessing pipeline that transforms the raw simulator output dict into the exact tensor format expected by the networks. It runs during both training and inference, so preprocessing stays consistent automatically.

3.1. Reserved Keys#

BayesFlow routes tensors by name. Three main keys are recognized:

| Key | Purpose |
| --- | --- |
| "inference_variables" | Inference network target: what is being inferred (e.g. parameters) |
| "inference_conditions" | Passed directly to the inference network, bypassing the summary network |
| "summary_variables" | Passed to the summary network; its output becomes an inference condition |

Five further special keys are recognized:

| Key | Purpose |
| --- | --- |
| "sample_weight" | Can be used to assign different weights to entries in a batch, or to mask them out |
| "summary_attention_mask" | Passed to the summary network's attention blocks (if any) |
| "summary_mask" | Passed to the summary network as a general mask (if consumed by any subnets) |
| "inference_attention_mask" | Passed to the inference network's attention blocks (if any) |
| "inference_mask" | Passed to the inference network as a general mask (if consumed by any subnets) |

Any key not renamed to one of these is ignored during training and inference.
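
For example, per-sample weights can be routed simply by renaming them to the reserved key. The sketch below is hedged: the key "w" is a hypothetical entry in the simulator output holding one weight per simulation, and the remaining transforms mirror the posterior example in the next section.

adapter = (
    bf.Adapter()
    .convert_dtype("float64", "float32")
    .concatenate(["theta_1", "theta_2"], into="inference_variables")
    .rename("x", "summary_variables")
    .rename("w", "sample_weight")  # hypothetical per-sample weights produced by the simulator
)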

3.2. Workflow Shorthand vs. Explicit Adapter#

BasicWorkflow accepts inference_variables=, summary_variables=, and inference_conditions= as keyword arguments for simple cases. However, the shorthand and an explicit adapter= argument are mutually exclusive:

Warning

Never pass both adapter= and the naming keyword arguments (inference_variables=, etc.) to BasicWorkflow. They will silently conflict. When in doubt, always use an explicit bf.Adapter().

Use an explicit adapter whenever you need structural transforms (.as_set(), .broadcast()), parameter constraints (.constrain()), dtype conversion, or any derived quantity not already present in the simulator output.
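
As an illustration, the two setups below are intended to be equivalent for a plain renaming case. This is a hedged sketch, not a definitive recipe: it assumes a simulator whose output contains theta_1, theta_2, and x (as in the next section), and it picks bf.networks.CouplingFlow and bf.networks.DeepSet purely as placeholder network choices.

# Shorthand: naming keywords only, no adapter= argument
workflow = bf.BasicWorkflow(
    simulator=simulator,
    inference_network=bf.networks.CouplingFlow(),
    summary_network=bf.networks.DeepSet(),
    inference_variables=["theta_1", "theta_2"],
    summary_variables=["x"],
)

# Explicit adapter: required as soon as structural transforms or constraints come in
adapter = (
    bf.Adapter()
    .convert_dtype("float64", "float32")
    .concatenate(["theta_1", "theta_2"], into="inference_variables")
    .rename("x", "summary_variables")
)
workflow = bf.BasicWorkflow(
    simulator=simulator,
    inference_network=bf.networks.CouplingFlow(),
    summary_network=bf.networks.DeepSet(),
    adapter=adapter,  # do not also pass inference_variables= etc. here
)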

3.3. Example: Posterior Estimation#

The adapter below converts a simple dict of arrays into the standard BayesFlow format, assuming a summary network is used.

import bayesflow as bf
import numpy as np
batch_size = 2
rng = np.random.default_rng(seed=2025)
data = {
    "theta_1": np.zeros((batch_size, 1)),
    "theta_2": np.ones((batch_size, 1)),
    "x": rng.uniform(size=(batch_size, 3)),
}
print("Shapes:", {k: v.shape for k, v in data.items()})
Shapes: {'theta_1': (2, 1), 'theta_2': (2, 1), 'x': (2, 3)}

Build an adapter to convert theta_1, theta_2, and x into the named routing keys:

adapter = (
    bf.Adapter()
    .convert_dtype("float64", "float32")
    .concatenate(["theta_1", "theta_2"], into="inference_variables")
    .rename("x", "summary_variables")
)

print(adapter)
Adapter([0: ConvertDType -> 1: Concatenate(['theta_1', 'theta_2'] -> 'inference_variables') -> 2: Rename('x' -> 'summary_variables')])

Apply the adapter to get the transformed dict:

transformed_data = adapter(data)
print(transformed_data)
print("Shapes:", {k: v.shape for k, v in transformed_data.items()})
{'inference_variables': array([[0., 1.],
       [0., 1.]], dtype=float32), 'summary_variables': array([[0.9944578 , 0.38200974, 0.827148  ],
       [0.8372553 , 0.97580904, 0.07722503]], dtype=float32)}
Shapes: {'inference_variables': (2, 2), 'summary_variables': (2, 3)}

Most transforms are invertible. Pass inverse=True to recover the original tensors:

cycled_data = adapter(transformed_data, inverse=True)
print("Shapes:", {k: v.shape for k, v in cycled_data.items()})
Shapes: {'x': (2, 3), 'theta_1': (2, 1), 'theta_2': (2, 1)}

3.4. Example: Likelihood Estimation#

For likelihood estimation the roles are reversed: x becomes the inference target ("inference_variables") and the parameters are passed directly as conditions ("inference_conditions"), without a summary network.

adapter = (
    bf.Adapter()
    .convert_dtype("float64", "float32")
    .concatenate(["theta_1", "theta_2"], into="inference_conditions")
    .rename("x", "inference_variables")
)

print(adapter)
transformed_data = adapter(data)
print("Shapes:", {k: v.shape for k, v in transformed_data.items()})
Adapter([0: ConvertDType -> 1: Concatenate(['theta_1', 'theta_2'] -> 'inference_conditions') -> 2: Rename('x' -> 'inference_variables')])
Shapes: {'inference_conditions': (2, 2), 'inference_variables': (2, 3)}

3.5. Full Example: Varying-N Linear Regression#

The following example shows all major adapter operations together. The simulator draws regression parameters from a prior and generates N observations, where N itself varies across simulations via a meta function:

def prior():
    beta = np.random.normal([2, 0], [3, 1])
    sigma = np.random.gamma(1, 1)
    return dict(beta=beta, sigma=sigma)

def likelihood(beta, sigma, N):
    x = np.random.normal(0, 1, size=N)
    y = np.random.normal(beta[0] + beta[1] * x, sigma, size=N)
    return dict(y=y, x=x)

def meta():
    return dict(N=np.random.randint(5, 50))

simulator = bf.make_simulator([prior, likelihood], meta_fn=meta)

data = simulator.sample(2)

for k, v in data.items():
    print(f"{k}: {v.shape if isinstance(v, np.ndarray) else v}")
N: 21
beta: (2, 2)
sigma: (2, 1)
y: (2, 21)
x: (2, 21)

The adapter transforms the raw output dict into the tensor layout the networks expect:

adapter = (
    bf.Adapter()
    .broadcast("N", to="x")                              # replicate scalar N to batch size of x
    .as_set(["x", "y"])                                  # add trailing feature axis for SetTransformer
    .constrain("sigma", lower=0)                         # softplus bijection for positive-only values
    .sqrt("N")                                           # compress wide dynamic range of N
    .convert_dtype("float64", "float32")                 # backends expect float32
    .concatenate(["beta", "sigma"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
    .rename("N", "inference_conditions")
)

adapted_data = adapter(data)

for k, v in adapted_data.items():
    print(f"{k}: {v.shape if isinstance(v, np.ndarray) else v}")
inference_variables: (2, 3)
summary_variables: (2, 21, 2)
inference_conditions: (2, 1)

3.5.1. Step-by-Step#

| Step | Why |
| --- | --- |
| .broadcast("N", to="x") | N is a scalar shared across the batch. Replicates it to match the batch size of x. Required before any later operation treats N as a per-sample tensor. |
| .as_set(["x", "y"]) | Marks observations as exchangeable (i.i.d.) and adds a trailing feature axis, turning each scalar into a (1,) vector. Summary networks require at least 3-D input (batch, N, D). Use .as_time_series() for ordered sequences instead. |
| .constrain("sigma", lower=0) | Maps sigma through a softplus bijection so the network operates in unconstrained space while back-transformed samples are always positive. Also supports upper=1 (probabilities) and lower=a, upper=b (bounded); these variants are sketched after this table. Always constrain before concatenating. |
| .sqrt("N") | Compresses the dynamic range of N before it enters the inference network. Apply similar monotone transforms (.log(), .standardize()) to any context variable with a wide range. |
| .convert_dtype("float64", "float32") | NumPy defaults to float64; all backends default to float32. Always call this near the end, before concatenation and renaming. |
| .concatenate([...], into=...) | Stacks arrays along the last axis into a single named tensor. The column order here defines the column order in posterior samples. |
| .rename("N", "inference_conditions") | Final routing step: changes the key name only, no value transformation. |
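
The table mentions several alternatives (.as_time_series(), bounded .constrain(), .log()). The adapter below is a hedged sketch of how they combine, using purely hypothetical keys: "p" for a probability, "scale" for a positive parameter, "y" for an ordered (non-exchangeable) series, and "n_obs" for a wide-ranged context variable.

adapter = (
    bf.Adapter()
    .broadcast("n_obs", to="y")                # replicate shared context to the batch size of y
    .as_time_series("y")                       # ordered observations instead of an exchangeable set
    .constrain("p", lower=0, upper=1)          # probability, bounded on both sides
    .constrain("scale", lower=0)               # positive-only, as with sigma above
    .log("n_obs")                              # alternative to .sqrt() for a wide dynamic range
    .convert_dtype("float64", "float32")
    .concatenate(["p", "scale"], into="inference_variables")
    .rename("y", "summary_variables")
    .rename("n_obs", "inference_conditions")
)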

3.5.2. Typical Ordering Rules#

Steps execute sequentially. Follow this order to avoid subtle bugs:

  1. Structural: .broadcast(), .as_set(), .as_time_series()

  2. Constraints: .constrain()

  3. Feature engineering: .sqrt(), .log(), .standardize()

  4. Dtype: .convert_dtype() (always near the end)

  5. Assembly: .concatenate(), .rename() (always last)

For the full list of available transforms, see the Adapter API reference. For applied examples, see the Examples section.