OfflineDataset

class bayesflow.datasets.OfflineDataset(data: Mapping[str, ndarray], batch_size: int, adapter: Adapter | None, num_samples: int = None, *, augmentations: Callable | Mapping[str, Callable] | Sequence[Callable] = None, shuffle: bool = True, **kwargs)

Bases: PyDataset

A dataset that is pre-simulated and stored in memory.

When storing data on disk and loading it back, it is recommended to save the pre-simulated data in raw form and to create the OfflineDataset object only after the raw data has been loaded. See DiskDataset for handling large datasets that are split into multiple smaller files.
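A minimal sketch of the recommended workflow, assuming the raw simulations are a dict of NumPy arrays with the illustrative (hypothetical) keys `theta` and `x`: save the arrays with `np.savez`, load them back into a plain dictionary, and only then build the dataset. The final `OfflineDataset` call is left as a comment because it needs BayesFlow and a configured Keras backend; its arguments follow the signature above.

```python
import os
import tempfile

import numpy as np

# Hypothetical raw simulation output: 1000 draws of 2 parameters ("theta")
# and 10 observations ("x"); the key names are illustrative only.
rng = np.random.default_rng(0)
raw = {
    "theta": rng.normal(size=(1000, 2)),
    "x": rng.normal(size=(1000, 10)),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "simulations.npz")

    # Store the raw arrays themselves, not a dataset object.
    np.savez(path, **raw)

    # Load the arrays back into an ordinary in-memory dict.
    with np.load(path) as archive:
        data = {key: archive[key] for key in archive.files}

# Only now build the dataset from the loaded arrays, e.g. (requires BayesFlow):
# dataset = bayesflow.datasets.OfflineDataset(data, batch_size=32, adapter=None)
```

Keeping the files as plain arrays means they can be reused with a different batch size, adapter, or augmentation setup without re-simulating.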

Parameters:

data : Mapping[str, np.ndarray]

Pre-simulated data stored in a dictionary, where each key maps to a NumPy array whose leading axis indexes the samples.

batch_size : int

Number of samples per batch.

adapter : Adapter or None

Optional adapter to transform the batch.

num_samples : int, optional

Number of samples in the dataset. If None, it will be inferred from the data.

augmentations : Callable or Mapping[str, Callable] or Sequence[Callable], optional

A single augmentation function, a dictionary of augmentation functions, or a sequence of augmentation functions to apply to each batch.

If you provide a dictionary of functions, each key should name an entry of the batch, and its function should accept the value stored under that key and return the transformed value.

Otherwise, the single function, or each function in the sequence, should accept the entire batch dictionary and return a dictionary.

Note: augmentations are applied before the adapter is called and are generally transforms that you only want to apply during training.

shuffle : bool, optional

Whether to shuffle the dataset at initialization and at the end of each epoch. Default is True.

**kwargs

Additional keyword arguments passed to the base PyDataset.
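The three accepted forms of `augmentations` can be illustrated with a small dispatcher. This is a sketch assuming the rules stated for that parameter (a dictionary transforms individual batch entries; a single callable, or each callable in a sequence, transforms the whole batch dictionary). `apply_augmentations` and `add_noise` are hypothetical names, running sequence entries in list order is an assumption, and BayesFlow's internal handling may differ in details such as in-place mutation.

```python
from collections.abc import Mapping, Sequence

import numpy as np


def apply_augmentations(batch: dict, augmentations) -> dict:
    """Hypothetical dispatcher mirroring the documented augmentation forms."""
    if augmentations is None:
        return batch
    if isinstance(augmentations, Mapping):
        # Dictionary form: each function receives and returns one batch entry.
        batch = dict(batch)  # copy so the caller's dict is not modified
        for key, fn in augmentations.items():
            batch[key] = fn(batch[key])
        return batch
    if isinstance(augmentations, Sequence):
        # Sequence form (assumed to run in list order): each function maps dict -> dict.
        for fn in augmentations:
            batch = fn(batch)
        return batch
    if callable(augmentations):
        # Single-function form: receives and returns the whole batch dict.
        return augmentations(batch)
    raise TypeError(f"unsupported augmentations type: {type(augmentations)!r}")


rng = np.random.default_rng(1)


def add_noise(x: np.ndarray) -> np.ndarray:
    # Illustrative training-time perturbation applied to one entry only.
    return x + 0.1 * rng.normal(size=x.shape)


batch = {"theta": np.zeros((4, 2)), "x": np.ones((4, 5))}
per_key = apply_augmentations(batch, {"x": add_noise})
whole = apply_augmentations(batch, lambda b: {**b, "x": b["x"] * 2.0})
chained = apply_augmentations(
    batch,
    [lambda b: {**b, "x": b["x"] + 1.0}, lambda b: {**b, "x": b["x"] * 3.0}],
)
```

In the chained case `x` becomes `(1 + 1) * 3 = 6`, while `theta` passes through every form untouched.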

property num_batches: int

Number of batches in the PyDataset.

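Returns: The number of batches in the PyDataset. The base PyDataset interface allows None here to indicate an infinite dataset; because an OfflineDataset holds a finite number of samples in memory, this is always an integer.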
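For an in-memory dataset the batch count follows from `num_samples` and `batch_size`. A worked sketch, assuming a final, partially filled batch is kept rather than dropped (ceiling division); that choice is an assumption here, not something stated above, and both helpers are hypothetical.

```python
def num_batches(num_samples: int, batch_size: int) -> int:
    # Ceiling division in integer arithmetic: a final partial batch still counts.
    return (num_samples + batch_size - 1) // batch_size


def batch_sizes(num_samples: int, batch_size: int) -> list:
    # Size of each batch; only the last one can be smaller than batch_size.
    return [
        min(batch_size, num_samples - i * batch_size)
        for i in range(num_batches(num_samples, batch_size))
    ]


print(num_batches(1000, 32))      # 32
print(batch_sizes(1000, 32)[-1])  # 8
```

With 1000 samples and a batch size of 32, 31 full batches cover 992 samples and the remaining 8 form the last batch.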

get_batch_by_sample_indices(indices: ndarray) → dict[str, ndarray]

Return a batch for explicit sample indices.

This method is the index-based access primitive used by ensemble dataset wrappers. It selects samples from the underlying in-memory arrays, then applies augmentations and the adapter in the same way as __getitem__().

Parameters:

indices : np.ndarray

1D integer array of sample indices in the range [0, num_samples). The returned batch has leading dimension len(indices).

Returns:

dict of str to np.ndarray: A batch dictionary where each NumPy array has shape (len(indices), ...). Non-array entries are passed through unchanged.
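Index-based access makes resampling schemes straightforward, for example drawing a bootstrap resample for one member of an ensemble (an illustrative use, not necessarily what the ensemble wrappers do). The sketch reproduces only the selection step on a plain dictionary, assuming array entries are gathered along their leading axis and non-array entries pass through unchanged as described above. `select_samples` is a hypothetical stand-in; augmentations and the adapter are omitted.

```python
import numpy as np


def select_samples(data: dict, indices: np.ndarray) -> dict:
    """Hypothetical stand-in for the selection step of get_batch_by_sample_indices."""
    indices = np.asarray(indices)
    if indices.ndim != 1 or not np.issubdtype(indices.dtype, np.integer):
        raise ValueError("indices must be a 1D integer array")
    # Arrays are gathered along the sample axis; anything else passes through.
    return {
        key: value[indices] if isinstance(value, np.ndarray) else value
        for key, value in data.items()
    }


num_samples = 100
data = {
    "theta": np.arange(num_samples * 2).reshape(num_samples, 2),
    "x": np.arange(num_samples, dtype=float)[:, None],
    "note": "simulated with prior v1",  # illustrative non-array entry
}

# Bootstrap indices: drawn with replacement from [0, num_samples).
rng = np.random.default_rng(42)
indices = rng.integers(0, num_samples, size=16)
batch = select_samples(data, indices)
```

Because indices may repeat, the same sample can appear several times in one batch, which is exactly what a bootstrap resample requires.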

on_epoch_end() → None: Method called at the end of every epoch. If shuffle is True, the dataset is reshuffled here.

shuffle() → None: Shuffle the dataset in-place.
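A common way to implement per-epoch shuffling for in-memory data is to permute an index array instead of copying the arrays, then slice each batch's sample indices out of the permuted array. The sketch assumes that approach; `ShuffledIndexer` and its `seed` argument are hypothetical, and BayesFlow's internal bookkeeping may differ.

```python
from typing import Optional

import numpy as np


class ShuffledIndexer:
    """Hypothetical sketch: shuffle via an index permutation, batch via slicing."""

    def __init__(self, num_samples: int, batch_size: int, shuffle: bool = True, seed: Optional[int] = None):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self._shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(num_samples)
        if self._shuffle:
            self.shuffle()  # shuffle once at initialization

    def shuffle(self) -> None:
        # In-place permutation of the index array; the data arrays are never copied.
        self.rng.shuffle(self.indices)

    def on_epoch_end(self) -> None:
        # Reshuffle between epochs only when shuffling is enabled.
        if self._shuffle:
            self.shuffle()

    def batch_indices(self, item: int) -> np.ndarray:
        # Sample indices for batch number `item`; the last batch may be smaller.
        return self.indices[item * self.batch_size : (item + 1) * self.batch_size]


indexer = ShuffledIndexer(num_samples=10, batch_size=4, seed=0)
epoch_1 = np.concatenate([indexer.batch_indices(i) for i in range(3)])
indexer.on_epoch_end()
epoch_2 = np.concatenate([indexer.batch_indices(i) for i in range(3)])
```

Each epoch visits every sample exactly once, only in a different order, so shuffling costs one permutation of `num_samples` integers rather than a copy of the data.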

property max_queue_size

on_epoch_begin(): Method called at the beginning of every epoch.

property use_multiprocessing

property workers