bayesflow.summary_networks module#

class bayesflow.summary_networks.TimeSeriesTransformer(*args, **kwargs)[source]#

Bases: Model

Implements a many-to-one transformer architecture for time series encoding. Some ideas can be found in [1]:

[1] Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., & Sun, L. (2022). Transformers in time series: A survey. arXiv preprint arXiv:2202.07125. https://arxiv.org/abs/2202.07125

__init__(input_dim, attention_settings=None, dense_settings=None, use_layer_norm=True, num_dense_fc=2, summary_dim=10, num_attention_blocks=2, template_type='lstm', bidirectional=False, template_dim=64, **kwargs)[source]#

Creates a transformer architecture for encoding time series data into fixed-size vectors given by summary_dim. It features a recurrent network given by template_type, which is responsible for providing a single summary of the time series; this summary then attends to each point in the time series processed via a series of num_attention_blocks self-attention layers.

Important: Assumes that positional encodings have been appended to the input time series, e.g., through a custom configurator.
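For instance, a minimal sketch of such a configurator step (the helper name and the simple linear time index are illustrative assumptions, not part of the library):

import numpy as np

def append_positional_encodings(x):
    """Appends a normalized time index as an extra channel.

    x : np.ndarray of shape (batch_size, num_time_points, data_dim)
    Returns an array of shape (batch_size, num_time_points, data_dim + 1).
    """
    batch_size, num_time_points, _ = x.shape
    # Linear time index in [0, 1], tiled across the batch dimension
    time = np.linspace(0.0, 1.0, num_time_points, dtype=np.float32)
    time = np.tile(time[None, :, None], (batch_size, 1, 1))
    return np.concatenate([x.astype(np.float32), time], axis=-1)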

Recommended: When using transformers as summary networks, you may want to use a smaller learning rate during training, e.g., setting default_lr=5e-5 in a Trainer instance.
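A minimal sketch of this setting (assuming amortizer and generative_model are defined elsewhere; the import path follows BayesFlow's trainers module):

from bayesflow.trainers import Trainer

# `amortizer` and `generative_model` are assumed to be defined elsewhere
trainer = Trainer(
    amortizer=amortizer,
    generative_model=generative_model,
    default_lr=5e-5,  # smaller learning rate, as recommended above
)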

Layer normalization (controllable through the use_layer_norm keyword argument) may not always work well in certain applications. Consider setting it to False if the network is underperforming.

Parameters:
input_dim : int

The dimensionality of the input data (last axis).

attention_settings : dict or None, optional, default: None

A dictionary which will be unpacked as the arguments for the MultiHeadAttention layer. If None, default settings will be used (see bayesflow.default_settings). For instance, to use an attention block with 4 heads and key dimension 32, you can do:

attention_settings=dict(num_heads=4, key_dim=32)

You may also want to include dropout regularization in small-to-medium data regimes:

attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1)

For more details and arguments, see: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention

dense_settings : dict or None, optional, default: None

A dictionary which will be unpacked as the arguments for the Dense layer. For instance, to use hidden layers with 32 units and a relu activation, you can do:

dict(units=32, activation='relu')

For more details and arguments, see: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense

use_layer_norm : boolean, optional, default: True

Whether to use layer normalization before and after attention + feedforward.

num_dense_fc : int, optional, default: 2

The number of hidden layers for the internal feedforward network.

summary_dim : int

The dimensionality of the learned permutation-invariant representation.

num_attention_blocks : int, optional, default: 2

The number of self-attention blocks to use before pooling.

template_type : str or callable, optional, default: 'lstm'

The many-to-one (learnable) transformation of the time series. If 'lstm', an LSTM network will be used; if 'gru', a GRU unit will be used; if callable, a reference to template_type will be stored as an attribute.

bidirectional : bool, optional, default: False

Indicates whether the involved LSTM template network is bidirectional (i.e., forward and backward in time) or unidirectional (forward in time). Defaults to False, but setting it to True may increase performance in some applications.

template_dim : int, optional, default: 64

Only used if template_type is in ['lstm', 'gru']. The number of hidden units (equivalently, output dimensions) of the recurrent network. When bidirectional=True, the output dimensionality of the template will be twice template_dim, so consider halving it.

**kwargs : dict, optional, default: {}

Optional keyword arguments passed to the __init__() method of tf.keras.Model

call(x, **kwargs)[source]#

Performs the forward pass through the transformer.

Parameters:
x : tf.Tensor

Time series input of shape (batch_size, num_time_points, input_dim)

Returns:
out : tf.Tensor

Output of shape (batch_size, summary_dim)
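A minimal usage sketch (shapes and settings are illustrative; note that input_dim counts any appended positional-encoding channels):

import tensorflow as tf
from bayesflow.summary_networks import TimeSeriesTransformer

# 3 data channels + 1 positional-encoding channel -> input_dim=4
net = TimeSeriesTransformer(input_dim=4, summary_dim=16)
x = tf.random.normal((8, 100, 4))  # (batch_size, num_time_points, input_dim)
out = net(x)                       # shape: (8, 16), i.e., (batch_size, summary_dim)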

class bayesflow.summary_networks.SetTransformer(*args, **kwargs)[source]#

Bases: Model

Implements the set transformer architecture from [1] which ultimately represents a learnable permutation-invariant function. Designed to naturally model interactions in the input set, which may be hard to capture with the simpler DeepSet architecture.

[1] Lee, J., Lee, Y., Kim, J., Kosiorek, A., Choi, S., & Teh, Y. W. (2019). Set transformer: A framework for attention-based permutation-invariant neural networks. In International Conference on Machine Learning (pp. 3744-3753). PMLR.

__init__(input_dim, attention_settings=None, dense_settings=None, use_layer_norm=False, num_dense_fc=2, summary_dim=10, num_attention_blocks=2, num_inducing_points=32, num_seeds=1, **kwargs)[source]#

Creates a set transformer architecture according to [1] which will extract permutation-invariant features from an input set using a set of seed vectors (typically one for a single summary) with summary_dim output dimensions.

Recommended: When using transformers as summary networks, you may want to use a smaller learning rate during training, e.g., setting default_lr=1e-4 in a Trainer instance.

Parameters:
input_dim : int

The dimensionality of the input data (last axis).

attention_settings : dict or None, optional, default: None

A dictionary which will be unpacked as the arguments for the MultiHeadAttention layer. For instance, to use an attention block with 4 heads and key dimension 32, you can do:

attention_settings=dict(num_heads=4, key_dim=32)

You may also want to include stronger dropout regularization in small-to-medium data regimes:

attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1)

For more details and arguments, see: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention

dense_settings : dict or None, optional, default: None

A dictionary which will be unpacked as the arguments for the Dense layer. For instance, to use hidden layers with 32 units and a relu activation, you can do:

dict(units=32, activation='relu')

For more details and arguments, see: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense

use_layer_norm : boolean, optional, default: False

Whether to use layer normalization before and after attention + feedforward

num_dense_fc : int, optional, default: 2

The number of hidden layers for the internal feedforward network.

summary_dim : int

The dimensionality of the learned permutation-invariant representation.

num_attention_blocks : int, optional, default: 2

The number of self-attention blocks to use before pooling.

num_inducing_points : int or None, optional, default: 32

The number of inducing points. Should be lower than the smallest set size. If None, a vanilla self-attention block (SAB) will be used; otherwise, ISAB blocks will be used. For num_attention_blocks > 1, we currently recommend always using some number of inducing points.

num_seeds : int, optional, default: 1

The number of “seed vectors” to use. Each seed vector represents a permutation-invariant summary of the entire set. If you use num_seeds > 1, the resulting seeds will be flattened into a 2-dimensional output, which will have a dimensionality of num_seeds * summary_dim.

**kwargs : dict, optional, default: {}

Optional keyword arguments passed to the __init__() method of tf.keras.Model

call(x, **kwargs)[source]#

Performs the forward pass through the set-transformer.

Parameters:
x : tf.Tensor

The input set of shape (batch_size, set_size, input_dim)

Returns:
out : tf.Tensor

Output of shape (batch_size, summary_dim * num_seeds)
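A minimal usage sketch (shapes and settings are illustrative):

import tensorflow as tf
from bayesflow.summary_networks import SetTransformer

net = SetTransformer(input_dim=3, summary_dim=10, num_inducing_points=16, num_seeds=2)
x = tf.random.normal((8, 64, 3))  # (batch_size, set_size, input_dim); set size > num_inducing_points
out = net(x)                      # shape: (8, 20), i.e., (batch_size, summary_dim * num_seeds)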

class bayesflow.summary_networks.DeepSet(*args, **kwargs)[source]#

Bases: Model

Implements a deep permutation-invariant network according to [1] and [2].

[1] Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., & Smola, A. J. (2017). Deep sets. Advances in neural information processing systems, 30.

[2] Bloem-Reddy, B., & Teh, Y. W. (2020). Probabilistic Symmetries and Invariant Neural Networks. J. Mach. Learn. Res., 21, 90-1.

__init__(summary_dim=10, num_dense_s1=2, num_dense_s2=2, num_dense_s3=2, num_equiv=2, dense_s1_args=None, dense_s2_args=None, dense_s3_args=None, pooling_fun='mean', **kwargs)[source]#

Creates a stack of ‘num_equiv’ equivariant layers followed by a final invariant layer.

Parameters:
summary_dim : int, optional, default: 10

The number of learned summary statistics.

num_dense_s1 : int, optional, default: 2

The number of dense layers in the inner function of a deep set.

num_dense_s2 : int, optional, default: 2

The number of dense layers in the outer function of a deep set.

num_dense_s3 : int, optional, default: 2

The number of dense layers in an equivariant layer.

num_equiv : int, optional, default: 2

The number of equivariant layers in the network.

dense_s1_args : dict or None, optional, default: None

The arguments for the dense layers of s1 (inner, pre-pooling function). If None, defaults will be used (see default_settings). Otherwise, all arguments for a tf.keras.layers.Dense layer are supported.

dense_s2_args : dict or None, optional, default: None

The arguments for the dense layers of s2 (outer, post-pooling function). If None, defaults will be used (see default_settings). Otherwise, all arguments for a tf.keras.layers.Dense layer are supported.

dense_s3_args : dict or None, optional, default: None

The arguments for the dense layers of s3 (equivariant function). If None, defaults will be used (see default_settings). Otherwise, all arguments for a tf.keras.layers.Dense layer are supported.

pooling_fun : str or callable, optional, default: 'mean'

If a string is provided, it should be one of ['mean', 'max']. Alternatively, an actual neural network can be passed for learnable pooling.

**kwargs : dict, optional, default: {}

Optional keyword arguments passed to the __init__() method of tf.keras.Model.

call(x, **kwargs)[source]#

Performs the forward pass of a learnable deep invariant transformation consisting of a sequence of equivariant transforms followed by an invariant transform.

Parameters:
x : tf.Tensor

Input of shape (batch_size, n_obs, data_dim)

Returns:
out : tf.Tensor

Output of shape (batch_size, out_dim)
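A minimal usage sketch (shapes and settings are illustrative):

import tensorflow as tf
from bayesflow.summary_networks import DeepSet

net = DeepSet(summary_dim=10, num_equiv=2, pooling_fun="mean")
x = tf.random.normal((8, 50, 3))  # (batch_size, n_obs, data_dim)
out = net(x)                      # shape: (8, 10), i.e., (batch_size, summary_dim)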

class bayesflow.summary_networks.InvariantNetwork(*args, **kwargs)[source]#

Bases: DeepSet

Deprecated class; use DeepSet instead.

__init__(*args, **kwargs)[source]#

Creates a stack of ‘num_equiv’ equivariant layers followed by a final invariant layer.

The parameters are identical to those of DeepSet.__init__() above.

class bayesflow.summary_networks.SequenceNetwork(*args, **kwargs)[source]#

Bases: Model

Implements a sequence of MultiConv1D layers followed by a (bidirectional) LSTM network.

For details and rationale, see [1]:

[1] Radev, S. T., Graw, F., Chen, S., Mutters, N. T., Eichel, V. M., Bärnighausen, T., & Köthe, U. (2021). OutbreakFlow: Model-based Bayesian inference of disease outbreak dynamics with invertible neural networks and its application to the COVID-19 pandemics in Germany. PLoS computational biology, 17(10), e1009472. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009472

__init__(summary_dim=10, num_conv_layers=2, lstm_units=128, bidirectional=False, conv_settings=None, **kwargs)[source]#

Creates a stack of inception-like layers followed by an LSTM network, with the idea of learning vector representations from multivariate time series data.

Parameters:
summary_dim : int, optional, default: 10

The number of learned summary statistics.

num_conv_layers : int, optional, default: 2

The number of convolutional layers to use.

lstm_units : int, optional, default: 128

The number of hidden LSTM units.

conv_settings : dict or None, optional, default: None

The arguments passed to the MultiConv1D internal networks. If None, defaults will be used from default_settings. If a dictionary is provided, it should contain the following keys:

- layer_args (dict) : arguments for tf.keras.layers.Conv1D without kernel_size
- min_kernel_size (int) : the minimum kernel size (>= 1)
- max_kernel_size (int) : the maximum kernel size
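For instance, a sketch of a valid conv_settings dictionary (the concrete values are illustrative):

conv_settings = dict(
    layer_args=dict(filters=32, activation="relu"),  # Conv1D arguments without kernel_size
    min_kernel_size=1,
    max_kernel_size=3,
)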

bidirectional : bool, optional, default: False

Indicates whether the involved LSTM network is bidirectional (forward and backward in time) or unidirectional (forward in time). Defaults to False, but setting it to True may increase performance in some applications.

**kwargs : dict

Optional keyword arguments passed to the __init__() method of tf.keras.Model

call(x, **kwargs)[source]#

Performs a forward pass through the network by first passing x through the sequence of multi-convolutional layers and then applying the LSTM network.

Parameters:
x : tf.Tensor

Input of shape (batch_size, n_time_steps, n_time_series)

Returns:
out : tf.Tensor

Output of shape (batch_size, summary_dim)
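A minimal usage sketch (shapes and settings are illustrative):

import tensorflow as tf
from bayesflow.summary_networks import SequenceNetwork

net = SequenceNetwork(summary_dim=10, num_conv_layers=2, lstm_units=128)
x = tf.random.normal((8, 100, 2))  # (batch_size, n_time_steps, n_time_series)
out = net(x)                       # shape: (8, 10), i.e., (batch_size, summary_dim)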

class bayesflow.summary_networks.SequentialNetwork(*args, **kwargs)[source]#

Bases: SequenceNetwork

Deprecated class; use SequenceNetwork instead.

__init__(*args, **kwargs)[source]#

Creates a stack of inception-like layers followed by an LSTM network, with the idea of learning vector representations from multivariate time series data.

The parameters are identical to those of SequenceNetwork.__init__() above.

class bayesflow.summary_networks.SplitNetwork(*args, **kwargs)[source]#

Bases: Model

Implements a vertical stack of networks and concatenates their individual outputs. Allows the data to be split, with an individual network processing each split.

__init__(num_splits, split_data_configurator, network_type=<class 'bayesflow.summary_networks.DeepSet'>, network_kwargs={}, **kwargs)[source]#

Creates a composite network of num_splits subnetworks of type network_type, each configured according to network_kwargs.

Parameters:
num_splits : int

The number of splits for the data, which will equal the number of sub-networks.

split_data_configurator : callable

Function that takes the arguments i and x, where i is the index of the network and x is the input to the SplitNetwork. Should return the input for the corresponding subnetwork.

For example, to achieve a network which is permutation-invariant both vertically (i.e., across rows) and horizontally (i.e., across columns), one could do:

def split(i, x):
    selector = tf.where(x[:, :, 0] == i, 1.0, 0.0)
    selected = x[:, :, 1] * selector
    split_x = tf.stack((selector, selected), axis=-1)
    return split_x

where x[:, :, 0] contains an integer indicating which split the data in x[:, :, 1] belongs to. All values in x[:, :, 1] that are not selected are set to zero. The selector is passed along with the modified data, indicating which rows belong to split i.

network_type : callable, optional, default: DeepSet

Type of neural network to use.

network_kwargs : dict, optional, default: {}

A dictionary containing the configuration for the networks.

**kwargs

Optional keyword arguments to be passed to the tf.keras.Model superclass.

call(x, **kwargs)[source]#

Performs a forward pass through the subnetworks and concatenates their output.

Parameters:
x : tf.Tensor

Input of shape (batch_size, n_obs, data_dim)

Returns:
out : tf.Tensor

Output of shape (batch_size, out_dim)
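A minimal usage sketch building on the split example above (the number of splits and all shapes are illustrative):

import tensorflow as tf
from bayesflow.summary_networks import DeepSet, SplitNetwork

def split(i, x):
    selector = tf.where(x[:, :, 0] == i, 1.0, 0.0)
    selected = x[:, :, 1] * selector
    return tf.stack((selector, selected), axis=-1)

net = SplitNetwork(num_splits=3, split_data_configurator=split, network_type=DeepSet)

# First channel holds the split index, second channel the data values
idx = tf.cast(tf.random.uniform((8, 50), 0, 3, dtype=tf.int32), tf.float32)
vals = tf.random.normal((8, 50))
x = tf.stack((idx, vals), axis=-1)  # (batch_size, n_obs, 2)
out = net(x)  # concatenated outputs of the three subnetworks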

class bayesflow.summary_networks.HierarchicalNetwork(*args, **kwargs)[source]#

Bases: Model

Implements a hierarchical summary network according to [1].

[1] Elsemüller, L., Schnuerch, M., Bürkner, P. C., & Radev, S. T. (2023). A Deep Learning Method for Comparing Bayesian Hierarchical Models. arXiv preprint arXiv:2301.11873.

__init__(networks_list, **kwargs)[source]#

Creates a hierarchical network consisting of stacked summary networks (one for each hierarchical level) that are aligned with the probabilistic structure of the processed data.

Note: The networks will start processing from the lowest hierarchical level (e.g., observational level) up to the highest hierarchical level. It is recommended to provide higher-level networks with more expressive power to allow for an adequate compression of lower-level data.

Example: For two-level hierarchical models with the assumption of temporal dependencies on the lowest hierarchical level (e.g., observational level) and exchangeable units at the higher level (e.g., group level), a list of [SequenceNetwork(), DeepSet()] could be passed.


Parameters:
networks_list : list of tf.keras.Model

The list of summary networks (one per hierarchical level), starting from the lowest hierarchical level.

call(x, return_all=False, **kwargs)[source]#

Performs the forward pass through the hierarchical network, transforming the nested input into learned summary statistics.

Parameters:
x : tf.Tensor of shape (batch_size, …, data_dim)

The nested input. For example, hierarchical data sets with two levels have shape (batch_size, D, L, x_dim), which reduces to an output of shape (batch_size, out_dim).

return_all : boolean, optional, default: False

Whether to return all intermediate outputs (True) or just the final one (False).

Returns:
out : tf.Tensor

Output of shape (batch_size, out_dim) if return_all=False, else a tuple with len(outputs) == len(networks), corresponding to the outputs of all networks.
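A minimal usage sketch for a two-level model, following the example above (all shapes are illustrative):

import tensorflow as tf
from bayesflow.summary_networks import DeepSet, HierarchicalNetwork, SequenceNetwork

# Lowest level: time series of length L per unit; higher level: D exchangeable units
net = HierarchicalNetwork([SequenceNetwork(), DeepSet()])
x = tf.random.normal((8, 4, 100, 2))  # (batch_size, D, L, x_dim)
out = net(x)                          # reduces to (batch_size, out_dim)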