EnsembleDataset

class bayesflow.datasets.EnsembleDataset(dataset: PyDataset, member_names: Sequence[str], data_reuse: float = 1.0, **kwargs)

Bases: PyDataset

Wrap a BayesFlow dataset to provide per-ensemble-member batches.

This dataset class is the recommended entry point for training ensembles. The wrapped dataset should meet the requirements of any single approximator in the EnsembleApproximator. EnsembleDataset supports OnlineDataset, OfflineDataset, and DiskDataset. Each batch it returns contains one key-value pair per ensemble member, keyed by member name, whose value has the same structure as the output of the wrapped dataset.

The wrapper controls how much data is shared between ensemble members through the data_reuse parameter:

  • data_reuse = 1.0: all ensemble members receive identical data.

  • data_reuse = 0.0: each member receives maximally different data.

  • intermediate values: the total amount of data used per step / per epoch interpolates linearly between these extremes.
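The linear interpolation described in the list above can be sketched for the per-step case as follows. This is only an illustration under stated assumptions: `pool_size` is a hypothetical helper, not part of BayesFlow, and the rounding rule is a guess at how a fractional size might be handled.

```python
def pool_size(batch_size: int, num_members: int, data_reuse: float) -> int:
    """Hypothetical helper: number of distinct samples needed per training step."""
    if not 0.0 <= data_reuse <= 1.0:
        raise ValueError("data_reuse must lie in [0, 1]")
    shared = batch_size                  # data_reuse = 1.0: one batch shared by all members
    disjoint = batch_size * num_members  # data_reuse = 0.0: one separate batch per member
    # Linear interpolation between the two extremes; rounding is an assumption.
    return round(disjoint - data_reuse * (disjoint - shared))


for reuse in (0.0, 0.5, 1.0):
    print(reuse, pool_size(batch_size=32, num_members=4, data_reuse=reuse))
# prints:
# 0.0 128
# 0.5 80
# 1.0 32
```

Under this reading, lowering data_reuse increases the amount of data needed per step, up to one full batch per member.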

Parameters:
dataset: keras.utils.PyDataset

The BayesFlow dataset to wrap (OnlineDataset, OfflineDataset, or DiskDataset).

member_names: Sequence[str]

Names of ensemble members, used as dictionary keys.

data_reuse: float, default=1.0

Degree of data sharing between ensemble members, in [0, 1]: 1.0 gives all members identical data, 0.0 gives each member maximally different data. See Notes for how it is applied for different dataset types.

Notes

Implementation details differ by dataset type:

OnlineDataset

A larger “pool” of simulations is generated per training step and split into overlapping member batches (sharing is enforced per batch). This is implemented by EnsembleOnlineDataset.
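One way to picture the pool-splitting step is the sketch below. It assumes evenly spaced, equally sized windows over the pool; `split_pool` is a hypothetical helper and the actual slicing rule in EnsembleOnlineDataset may differ.

```python
def split_pool(pool, member_names, batch_size):
    """Hypothetical helper: cut one pool into evenly spaced windows of batch_size."""
    if len(pool) < batch_size:
        raise ValueError("pool must hold at least one full batch")
    n = len(member_names)
    spare = len(pool) - batch_size  # offset of the last window relative to the first
    batches = {}
    for i, name in enumerate(member_names):
        # Spread window starts evenly from 0 to spare (assumed layout).
        start = 0 if n == 1 else round(i * spare / (n - 1))
        batches[name] = pool[start:start + batch_size]
    return batches


pool = list(range(80))  # stand-in for 80 freshly simulated samples
batches = split_pool(pool, ["m0", "m1", "m2", "m3"], batch_size=32)
overlap = len(set(batches["m0"]) & set(batches["m1"]))  # samples shared by neighbours
```

With a pool of exactly batch_size samples every member receives the same batch, and with a pool of batch_size times the number of members the windows become disjoint, which matches the two extremes of data_reuse.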

OfflineDataset / DiskDataset

A member-specific subdataset (a window into the full index set) is constructed once on initialization. Batches are drawn from these subdatasets, which are reshuffled in on_epoch_end (sharing is enforced at the subdataset level). This is implemented by EnsembleIndexedDataset.
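The windowing idea for indexed datasets can be sketched as below. `MemberWindows` is a hypothetical stand-in, not EnsembleIndexedDataset itself, and the rule that the window length grows linearly from a disjoint share to the full index set is an assumption made for illustration.

```python
import random


class MemberWindows:
    """Hypothetical stand-in: fixed per-member index windows, reshuffled each epoch."""

    def __init__(self, num_samples, member_names, data_reuse, seed=0):
        n = len(member_names)
        smallest = num_samples // n  # data_reuse = 0.0: disjoint windows
        # Assumed rule: window length grows linearly up to the full index set.
        length = round(smallest + data_reuse * (num_samples - smallest))
        spare = num_samples - length
        self._rng = random.Random(seed)
        self.indices = {}
        for i, name in enumerate(member_names):
            start = 0 if n == 1 else round(i * spare / (n - 1))
            self.indices[name] = list(range(start, start + length))

    def on_epoch_end(self):
        # Reorder within each window; which samples a member sees never changes.
        for idx in self.indices.values():
            self._rng.shuffle(idx)


windows = MemberWindows(num_samples=100, member_names=["m0", "m1"], data_reuse=0.0)
windows.on_epoch_end()  # new order, same membership
```

Because the windows are built once, the degree of sharing is a property of the subdatasets rather than of any individual batch.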

property num_batches: int

Number of batches in the PyDataset.

Returns:

The number of batches in the PyDataset or None to indicate that the dataset is infinite.
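Since the return value may be None, calling code generally needs a finite fallback. A minimal sketch, where `resolve_steps` and its `fallback` argument are hypothetical names introduced only for this example:

```python
from typing import Optional


def resolve_steps(num_batches: Optional[int], fallback: int) -> int:
    """Hypothetical helper: choose a finite number of steps per epoch."""
    if num_batches is None:  # infinite dataset: the caller must pick a length
        return fallback
    return num_batches


steps_infinite = resolve_steps(None, fallback=1000)
steps_finite = resolve_steps(250, fallback=1000)
```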

property batch_size: int
property max_queue_size
on_epoch_begin()

Method called at the beginning of every epoch.

on_epoch_end()

Method called at the end of every epoch.

property use_multiprocessing
property workers