EnsembleDataset
- class bayesflow.datasets.EnsembleDataset(dataset: PyDataset, member_names: Sequence[str], data_reuse: float = 1.0, **kwargs)
Bases: PyDataset
Wrap a BayesFlow dataset to provide per-ensemble-member batches.
This dataset class is the recommended entry point for training ensembles. The wrapped dataset should meet the requirements of any single approximator in the EnsembleApproximator. EnsembleDataset supports OnlineDataset, OfflineDataset, and DiskDataset and returns a key-value pair for each ensemble member, containing output of the same structure as the wrapped dataset.
The wrapper controls how much data is shared between ensemble members through the data_reuse parameter:
- data_reuse = 1.0: all ensemble members receive identical data.
- data_reuse = 0.0: each member receives maximally different data.
- intermediate values: the total amount of data used per step / per epoch interpolates linearly between these extremes.
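The linear interpolation described above can be sketched in plain Python. This is only an illustration of the stated behavior, not BayesFlow's implementation: the helper name pool_size, its arguments, and the rounding rule are assumptions made for this example.

```python
def pool_size(batch_size: int, num_members: int, data_reuse: float) -> int:
    """Hypothetical helper: distinct samples needed per training step."""
    if not 0.0 <= data_reuse <= 1.0:
        raise ValueError("data_reuse must lie in [0, 1]")
    shared = batch_size                  # data_reuse = 1.0: one batch serves every member
    disjoint = batch_size * num_members  # data_reuse = 0.0: one separate batch per member
    # Linear interpolation between the two extremes; rounding is an assumption.
    return round(data_reuse * shared + (1.0 - data_reuse) * disjoint)


for r in (1.0, 0.5, 0.0):
    print(r, pool_size(batch_size=32, num_members=4, data_reuse=r))
```

Under these assumptions, four members with a batch size of 32 need between 32 samples (full reuse) and 128 samples (no reuse) per step.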
- Parameters:
- dataset: keras.utils.PyDataset
A BayesFlow dataset (OnlineDataset, OfflineDataset, DiskDataset).
- member_names: Sequence[str]
Names of ensemble members, used as dictionary keys.
- data_reuse: float, default=1.0
Degree of data sharing between ensemble members, in [0, 1], where 1.0 means identical data for all members. See Notes for how it is applied for different dataset types.
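The shape of the returned value can be pictured with a small stand-in for the data_reuse = 1.0 case, where every member receives the same batch under its own key. The function replicate_batch and the batch key names below are hypothetical and chosen only for illustration; they are not part of the documented API.

```python
from typing import Any, Dict, Sequence


def replicate_batch(
    batch: Dict[str, Any], member_names: Sequence[str]
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical stand-in: with full reuse, every member sees the same batch."""
    return {name: batch for name in member_names}


# Illustrative batch; the key names are assumptions, not taken from this page.
batch = {
    "inference_variables": [[0.1], [0.4]],
    "summary_variables": [[1.0, 2.0], [3.0, 4.0]],
}
ensemble_batch = replicate_batch(batch, member_names=["member_a", "member_b"])

# Each member name maps to output with the same structure as the wrapped batch.
for name, member_batch in ensemble_batch.items():
    print(name, sorted(member_batch))
```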
Notes
Implementation details differ by dataset type:
- OnlineDataset
A larger “pool” of simulations is generated per training step and split into overlapping member batches (sharing is enforced per batch). This is implemented by EnsembleOnlineDataset.
- OfflineDataset / DiskDataset
A member-specific subdataset (window into the full index set) is constructed once on initialization. Batches are drawn from these subdatasets and reshuffled on on_epoch_end (sharing is enforced at the subdataset level). This is implemented by EnsembleIndexedDataset.
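One way to picture the overlapping member batches (or, analogously, member windows into an index set) is to cut evenly spaced contiguous windows from a shared pool. The sketch below is an assumption about how such a split could look, not the code in EnsembleOnlineDataset or EnsembleIndexedDataset; split_pool and its arguments are hypothetical names.

```python
from typing import Dict, List, Sequence


def split_pool(
    pool: Sequence[int], batch_size: int, member_names: Sequence[str]
) -> Dict[str, List[int]]:
    """Hypothetical helper: evenly spaced, possibly overlapping windows per member."""
    if len(pool) < batch_size:
        raise ValueError("pool must hold at least one full batch")
    spare = len(pool) - batch_size         # samples beyond a single batch
    steps = max(len(member_names) - 1, 1)  # avoid division by zero for one member
    windows = {}
    for i, name in enumerate(member_names):
        start = (i * spare) // steps       # first window starts at 0, last ends at len(pool)
        windows[name] = list(pool[start:start + batch_size])
    return windows


members = ["m0", "m1", "m2"]
print(split_pool(range(4), batch_size=4, member_names=members))   # pool == batch: identical
print(split_pool(range(6), batch_size=4, member_names=members))   # partial overlap
print(split_pool(range(12), batch_size=4, member_names=members))  # pool == 3 batches: disjoint
```

In this sketch the pool length alone determines the overlap: a pool of one batch reproduces the identical-data extreme, and a pool of one batch per member reproduces the fully disjoint extreme.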
- property num_batches: int
Number of batches in the PyDataset.
- Returns:
The number of batches in the PyDataset or None to indicate that the dataset is infinite.
- property max_queue_size
- on_epoch_begin()
Method called at the beginning of every epoch.
- property use_multiprocessing
- property workers