OfflineDataset
- class bayesflow.datasets.OfflineDataset(data: Mapping[str, ndarray], batch_size: int, adapter: Adapter | None, num_samples: int = None, *, augmentations: Callable | Mapping[str, Callable] | Sequence[Callable] = None, shuffle: bool = True, **kwargs)
Bases: PyDataset
A dataset that is pre-simulated and stored in memory.
When storing and loading data from disk, it is recommended to save any pre-simulated data in raw form and create the OfflineDataset object only after loading in the raw data. See DiskDataset for handling large datasets that are split into multiple smaller files.
- Parameters:
- data : Mapping[str, np.ndarray]
Pre-simulated data stored in a dictionary, where each key maps to a NumPy array.
- batch_size : int
Number of samples per batch.
- adapter : Adapter or None
Optional adapter to transform the batch.
- num_samples : int, optional
Number of samples in the dataset. If None, it will be inferred from the data.
- augmentations : Callable or Mapping[str, Callable] or Sequence[Callable], optional
A single augmentation function, dictionary of augmentation functions, or sequence of augmentation functions to apply to the batch.
If you provide a dictionary of functions, each function should accept one element of your output batch and return the corresponding transformed element.
Otherwise, your function should accept the entire dictionary output and return a dictionary.
Note: augmentations are applied before the adapter is called and are generally transforms that you only want to apply during training.
- shuffle : bool, optional
Whether to shuffle the dataset at initialization and at the end of each epoch. Default is True.
- **kwargs
Additional keyword arguments passed to the base PyDataset.
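The augmentation forms described above can be illustrated with plain NumPy. This is only a sketch of the documented calling conventions, not the library's internal code: the helper `apply_augmentations` and the keys `theta` and `x` are hypothetical. That a sequence of callables is applied in order to the whole batch is an assumption.

```python
from collections.abc import Mapping

import numpy as np


def apply_augmentations(batch, augmentations):
    """Hypothetical helper mirroring the documented augmentation forms."""
    if augmentations is None:
        return batch
    if isinstance(augmentations, Mapping):
        # Mapping form: each function receives one element of the batch
        # and returns the corresponding transformed element.
        out = dict(batch)
        for key, fn in augmentations.items():
            out[key] = fn(out[key])
        return out
    if callable(augmentations):
        # Single callable: receives and returns the whole batch dictionary.
        return augmentations(batch)
    # Sequence form (assumption): each callable receives the whole batch
    # dictionary and the callables are applied in order.
    for fn in augmentations:
        batch = fn(batch)
    return batch


rng = np.random.default_rng(0)
batch = {"theta": np.zeros((4, 2)), "x": np.ones((4, 10))}


def add_noise(x):
    # Per-element augmentation: operates on a single array.
    return x + rng.normal(scale=0.1, size=x.shape)


def keep_first_half(b):
    # Whole-batch augmentation: operates on the dictionary.
    return {**b, "x": b["x"][:, :5]}


def double_x(b):
    return {**b, "x": b["x"] * 2}


per_key = apply_augmentations(batch, {"x": add_noise})
whole = apply_augmentations(batch, keep_first_half)
chained = apply_augmentations(batch, [keep_first_half, double_x])
```

In actual use, the same callables would be passed through the `augmentations` argument of `OfflineDataset`, which applies them before the adapter.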
- property num_batches: int
Number of batches in the PyDataset.
- Returns:
The number of batches in the PyDataset or None to indicate that the dataset is infinite.
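This page does not state how a trailing partial batch is counted. The following is a minimal sketch, assuming ceiling division, as in the `__len__` example in the Keras `PyDataset` documentation. The function `count_batches` is hypothetical and not part of the library.

```python
import math


def count_batches(num_samples: int, batch_size: int) -> int:
    # Assumption: ceiling division, so a final partial batch is counted.
    return math.ceil(num_samples / batch_size)


with_partial = count_batches(1000, 32)  # 31 full batches plus one partial
exact = count_batches(960, 32)          # divides evenly
```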
- get_batch_by_sample_indices(indices: ndarray) -> dict[str, ndarray]
Return a batch for explicit sample indices.
This method is the index-based access primitive used by ensemble dataset wrappers. It selects samples from the underlying in-memory arrays, then applies augmentations and the adapter just like in __getitem__().
- Parameters:
- indices : np.ndarray
1D integer array of sample indices in the range [0, num_samples). The returned batch will have leading dimension len(indices).
- Returns:
- dict of str to np.ndarray
A batch dictionary where each NumPy array has shape (len(indices), ...). Non-array entries are passed through unchanged.
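The selection step described above can be sketched with NumPy alone. The function `batch_by_indices` and the keys below are hypothetical stand-ins. The sketch covers only the index-based gather and the pass-through of non-array entries, and omits the augmentation and adapter steps.

```python
import numpy as np


def batch_by_indices(data, indices):
    """Hypothetical stand-in: gather rows along axis 0 for array entries."""
    indices = np.asarray(indices)
    if indices.ndim != 1 or not np.issubdtype(indices.dtype, np.integer):
        raise ValueError("indices must be a 1D integer array")
    return {
        # Arrays are indexed along the sample axis; anything else is kept as-is.
        key: np.take(value, indices, axis=0) if isinstance(value, np.ndarray) else value
        for key, value in data.items()
    }


data = {
    "theta": np.arange(12, dtype=float).reshape(6, 2),
    "x": np.arange(18, dtype=float).reshape(6, 3),
    "meta": "simulator-v1",  # non-array entry, passed through unchanged
}

# Indices may repeat and need not be sorted.
batch = batch_by_indices(data, np.array([4, 0, 4]))
```

Every array in `batch` has leading dimension `len(indices)`, here 3, regardless of the dataset's `batch_size`.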
- property max_queue_size
- on_epoch_begin()
Method called at the beginning of every epoch.
- property use_multiprocessing
- property workers