OfflineDataset#

class bayesflow.datasets.OfflineDataset(data: Mapping[str, ndarray], batch_size: int, adapter: Adapter | None, num_samples: int = None, *, stage: str = 'training', augmentations: Callable | Mapping[str, Callable] | Sequence[Callable] = None, shuffle: bool = True, **kwargs)[source]#

Bases: PyDataset

A dataset that is pre-simulated and stored in memory. When storing and loading data from disk, it is recommended to save any pre-simulated data in raw form and to create the OfflineDataset object only after loading the raw data back into memory. See the DiskDataset class for handling large datasets that are split into multiple smaller files.

Initialize an OfflineDataset instance for offline training with optional data augmentations.

Parameters:
data : Mapping[str, np.ndarray]

Pre-simulated data stored in a dictionary, where each key maps to a NumPy array.

batch_size : int

Number of samples per batch.

adapter : Adapter or None

Optional adapter to transform the batch.

num_samples : int, optional

Number of samples in the dataset. If None, it will be inferred from the data.

stage : str, default="training"

Current stage (e.g., "training", "validation") used by the adapter.

augmentations : Callable or Mapping[str, Callable] or Sequence[Callable], optional

A single augmentation function, a dictionary of augmentation functions, or a sequence of augmentation functions to apply to each batch.

If you provide a dictionary of functions, each function should accept the corresponding element of the batch and return that element in transformed form.

Otherwise, each function should accept the entire batch dictionary and return a dictionary. See the examples below the parameter list.

Note: augmentations are applied before the adapter is called and are generally transforms that you only want to apply during training.

shuffle : bool, optional

Whether to shuffle the dataset at initialization and at the end of each epoch. Default is True.

**kwargs

Additional keyword arguments passed to the base PyDataset.
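Example: a minimal construction sketch. The data keys, array shapes, and the use of adapter=None below are illustrative assumptions; only the OfflineDataset signature documented above is taken as given.

>>> import numpy as np
>>> from bayesflow.datasets import OfflineDataset
>>> # Hypothetical pre-simulated data: 1000 parameter draws and observations.
>>> data = {
...     "theta": np.random.normal(size=(1000, 2)).astype("float32"),
...     "x": np.random.normal(size=(1000, 50)).astype("float32"),
... }
>>> dataset = OfflineDataset(data=data, batch_size=32, adapter=None)
>>> batch = dataset[0]  # one batch as a dict of arrays (PyDataset indexing)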
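Example: passing an augmentation. As described above, a single callable receives the entire batch dictionary and must return a dictionary; the noise transform and its scale here are purely illustrative.

>>> def add_observation_noise(batch):
...     # A single-callable augmentation gets the whole batch dict
...     # and must return a dict.
...     batch["x"] = batch["x"] + np.random.normal(
...         scale=0.1, size=batch["x"].shape
...     ).astype("float32")
...     return batch
>>> augmented = OfflineDataset(
...     data=data,
...     batch_size=32,
...     adapter=None,
...     augmentations=add_observation_noise,
... )

Assuming the dictionary keys refer to batch keys, passing augmentations={"x": some_fn} would instead transform only the "x" entry; in either form, the augmentations run before the adapter is applied.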

property num_batches: int | None#

Number of batches in the PyDataset.

Returns:

The number of batches in the PyDataset or None to indicate that the dataset is infinite.

on_epoch_end() None[source]#

Method called at the end of every epoch.

shuffle() None[source]#

Shuffle the dataset in-place.

property max_queue_size#

on_epoch_begin()#

Method called at the beginning of every epoch.

property use_multiprocessing#

property workers#