MLP

class bayesflow.networks.MLP(*args, **kwargs)

Bases: Sequential

Implements a flexible multi-layer perceptron (MLP) with optional residual connections, dropout, and spectral normalization.

When used inside a coupling network, a diffusion model, or a flow matching model, the MLP assumes that the inputs and conditions have already been concatenated (i.e., it is a single-input model).

This MLP can be used as a general-purpose feature extractor or function approximator, supporting configurable depth, width, activation functions, and weight initializations.

If residual is enabled, each layer includes a skip connection for improved gradient flow. The model also supports dropout for regularization and spectral normalization for stability in learning smooth functions.
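To make the description above concrete, here is a minimal NumPy sketch of such a forward pass. It is not BayesFlow's implementation: the helper names (mish, he_normal, init_mlp, mlp_forward), the untruncated He initialization, inverted dropout applied after each activation, and the rule that a skip connection is only added when a layer's input and output widths match are all assumptions made for illustration. It also shows the single-input convention, where inputs and conditions are concatenated before the first layer.

```python
import numpy as np


def mish(x):
    # mish(x) = x * tanh(softplus(x)); logaddexp(0, x) is a numerically stable softplus.
    return x * np.tanh(np.logaddexp(0.0, x))


def he_normal(fan_in, fan_out, rng):
    # Untruncated He normal, N(0, 2 / fan_in). Keras' "he_normal" draws from a
    # truncated normal instead; this is a simplification.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))


def init_mlp(in_dim, widths, rng):
    # One (kernel, bias) pair per entry of `widths`: len(widths) layers in total.
    params = []
    for width in widths:
        params.append((he_normal(in_dim, width, rng), np.zeros(width)))
        in_dim = width
    return params


def mlp_forward(params, inputs, conditions, residual=False, dropout=0.05,
                training=False, rng=None):
    # Single-input model: concatenate inputs and conditions along the last axis.
    h = np.concatenate([inputs, conditions], axis=-1)
    for kernel, bias in params:
        out = mish(h @ kernel + bias)
        if training and dropout:
            # Inverted dropout: zero each unit with probability `dropout` and
            # rescale the survivors so the expected activation is unchanged.
            keep = rng.random(out.shape) >= dropout
            out = out * keep / (1.0 - dropout)
        if residual and out.shape == h.shape:
            # Skip connection; in this sketch only added when widths agree.
            out = out + h
        h = out
    return h


rng = np.random.default_rng(0)
params = init_mlp(in_dim=3 + 2, widths=(16, 16), rng=rng)
x = rng.normal(size=(4, 3))  # 4 samples, 3 input features
c = rng.normal(size=(4, 2))  # 4 samples, 2 condition features
y = mlp_forward(params, x, c, residual=True)
print(y.shape)  # (4, 16)
```

With residual=True, only the second layer receives a skip connection in this sketch, because the first layer maps the 5 concatenated features to 16 units.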

Parameters:
widths : Sequence[int], optional

Number of hidden units in each layer; the length of the sequence also sets the number of layers.

activation : str, optional

Activation function applied in the hidden layers. Default is “mish”.

kernel_initializer : str, optional

Initialization strategy for the kernel weights. Default is “he_normal”.

residual : bool, optional

Whether to use residual (skip) connections for improved training stability. Default is False.

dropout : float or None, optional

Dropout rate applied within the MLP layers for regularization. Default is 0.05.

norm : str, optional

spectral_normalization : bool, optional

Whether to apply spectral normalization to stabilize training. Default is False.

**kwargs

Additional keyword arguments passed to the Keras layer initialization.
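Spectral normalization divides a kernel by an estimate of its largest singular value, so that the layer's linear map has a spectral norm close to 1. The NumPy sketch below uses the common power-iteration estimate. This reference does not say how BayesFlow implements the spectral_normalization option (for example, whether it relies on Keras' SpectralNormalization layer wrapper), so treat the helper spectral_normalize as hypothetical.

```python
import numpy as np


def spectral_normalize(kernel, n_iter=50, rng=None):
    # Power iteration for the largest singular value sigma of `kernel`
    # (shape: fan_in x fan_out), then rescale so the spectral norm is ~1.
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.normal(size=kernel.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = kernel @ u
        v /= np.linalg.norm(v)
        u = kernel.T @ v
        u /= np.linalg.norm(u)
    sigma = v @ kernel @ u  # estimate of the largest singular value
    return kernel / sigma, sigma


kernel = np.array([[3.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])  # singular values 3 and 1
kernel_sn, sigma = spectral_normalize(kernel)
print(round(float(sigma), 6))                         # 3.0
print(round(float(np.linalg.norm(kernel_sn, 2)), 6))  # 1.0
```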

build(input_shape=None)
classmethod from_config(config, custom_objects=None)

Creates an operation from its config.

This method is the reverse of get_config, capable of instantiating the same operation from the config dictionary.

Note: If you override this method, you might receive a serialized dtype config, which is a dict. You can deserialize it as follows:

from keras import dtype_policies

if "dtype" in config and isinstance(config["dtype"], dict):
    policy = dtype_policies.deserialize(config["dtype"])
Args:

config: A Python dictionary, typically the output of get_config.

Returns:

An operation instance.

get_config()

Returns the config of the object.

An object config is a Python dictionary (serializable) containing the information needed to re-instantiate it.
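The get_config/from_config pair is meant to round-trip: the config is a plain, serializable dictionary, and from_config rebuilds an equivalent object from it. The pure-Python sketch below illustrates that contract with a hypothetical TinyMLP class whose constructor arguments mirror some of the documented MLP parameters. It does not use Keras, and the real MLP.from_config may do more (for example, deserializing nested objects or dtype policies).

```python
import json


class TinyMLP:
    # Hypothetical stand-in that only stores its constructor arguments.
    def __init__(self, widths, activation="mish", residual=False, dropout=0.05):
        self.widths = tuple(widths)
        self.activation = activation
        self.residual = residual
        self.dropout = dropout

    def get_config(self):
        # Everything needed to re-instantiate the object, as JSON-friendly values.
        return {
            "widths": list(self.widths),
            "activation": self.activation,
            "residual": self.residual,
            "dropout": self.dropout,
        }

    @classmethod
    def from_config(cls, config):
        # Reverse of get_config: pass the stored arguments back to the constructor.
        return cls(**config)


original = TinyMLP(widths=(64, 64), residual=True)
config = json.loads(json.dumps(original.get_config()))  # survives serialization
restored = TinyMLP.from_config(config)
```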

__call__(*args, **kwargs)

Call self as a function.

add(layer, rebuild=True)

Adds a layer instance on top of the layer stack.

Args:

layer: layer instance.

add_loss(loss)

Can be called inside of the call() method to add a scalar loss.

Example:

from keras import ops
from keras.layers import Layer

class MyLayer(Layer):
    ...
    def call(self, x):
        self.add_loss(ops.sum(x))
        return x
add_metric(*args, **kwargs)
add_variable(shape, initializer, dtype=None, trainable=True, autocast=True, regularizer=None, constraint=None, name=None)

Add a weight variable to the layer.

Alias of add_weight().

add_weight(shape=None, initializer=None, dtype=None, trainable=True, autocast=True, regularizer=None, constraint=None, aggregation='none', name=None)

Add a weight variable to the layer.

Args:

shape: Shape tuple for the variable. Must be fully defined (no None entries). Defaults to () (scalar) if unspecified.

initializer: Initializer object to use to populate the initial variable value, or string name of a built-in initializer (e.g. “random_normal”). If unspecified, defaults to “glorot_uniform” for floating-point variables and to “zeros” for all other types (e.g. int, bool).

dtype: Dtype of the variable to create, e.g. “float32”. If unspecified, defaults to the layer’s variable dtype (which itself defaults to “float32” if unspecified).

trainable: Boolean, whether the variable should be trainable via backprop or whether its updates are managed manually. Defaults to True.

autocast: Boolean, whether to autocast layer variables when accessing them. Defaults to True.

regularizer: Regularizer object to call to apply penalty on the weight. These penalties are summed into the loss function during optimization. Defaults to None.

constraint: Constraint object to call on the variable after any optimizer update, or string name of a built-in constraint. Defaults to None.

aggregation: Optional string, one of None, “none”, “mean”, “sum” or “only_first_replica”. Annotates the variable with the type of multi-replica aggregation to be used for this variable when writing custom data parallel training loops. Defaults to “none”.

name: String name of the variable. Useful for debugging purposes.
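As a concrete reading of the initializer default above: “glorot_uniform” draws each entry of a kernel of shape (fan_in, fan_out) from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). A NumPy sketch for the 2D case follows; the function name glorot_uniform is ours, and fan computation for higher-rank kernels (such as convolutions) is omitted.

```python
import numpy as np


def glorot_uniform(shape, rng):
    # For a 2D kernel of shape (fan_in, fan_out): U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)).
    fan_in, fan_out = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape), limit


w, limit = glorot_uniform((10, 20), np.random.default_rng(0))
print(round(float(limit), 4))  # 0.4472, i.e. sqrt(6 / 30)
```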

build_from_config(config)

Builds the layer’s states with the supplied config dict.

By default, this method calls the build(config[“input_shape”]) method, which creates weights based on the layer’s input shape in the supplied config. If your config contains other information needed to load the layer’s state, you should override this method.

Args:

config: Dict containing the input shape associated with this layer.

call(inputs, training=None, mask=None)
compile(optimizer='rmsprop', loss=None, loss_weights=None, metrics=None, weighted_metrics=None, run_eagerly=False, steps_per_execution=1, jit_compile='auto', auto_scale_loss=True)

Configures the model for training.

Example:

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[
        keras.metrics.BinaryAccuracy(),
        keras.metrics.FalseNegatives(),
    ],
)
Args:

optimizer: String (name of optimizer) or optimizer instance. See keras.optimizers.

loss: Loss function. May be a string (name of loss function), or a keras.losses.Loss instance. See keras.losses. A loss function is any callable with the signature loss = fn(y_true, y_pred), where y_true are the ground truth values, and y_pred are the model’s predictions. y_true should have shape (batch_size, d0, .. dN) (except in the case of sparse loss functions such as sparse categorical crossentropy which expects integer arrays of shape (batch_size, d0, .. dN-1)). y_pred should have shape (batch_size, d0, .. dN). The loss function should return a float tensor.

loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model’s outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.

metrics: List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), a function, or a keras.metrics.Metric instance. See keras.metrics. Typically you will use metrics=['accuracy']. A function is any callable with the signature result = fn(y_true, y_pred). To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary, such as metrics={'a': 'accuracy', 'b': ['accuracy', 'mse']}. You can also pass a list to specify a metric or a list of metrics for each output, such as metrics=[['accuracy'], ['accuracy', 'mse']] or metrics=['accuracy', ['accuracy', 'mse']]. When you pass the strings “accuracy” or “acc”, we convert this to one of keras.metrics.BinaryAccuracy, keras.metrics.CategoricalAccuracy, keras.metrics.SparseCategoricalAccuracy based on the shapes of the targets and of the model output. A similar conversion is done for the strings “crossentropy” and “ce” as well. The metrics passed here are evaluated without sample weighting; if you would like sample weighting to apply, you can specify your metrics via the weighted_metrics argument instead.

weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.

run_eagerly: Bool. If True, this model’s forward pass will never be compiled. It is recommended to leave this as False when training (for best performance), and to set it to True when debugging.

steps_per_execution: Int. The number of batches to run during a single compiled function call. Running multiple batches inside a single compiled function call can greatly improve performance on TPUs or small models with a large Python overhead. At most, one full epoch will be run each execution. If a number larger than the size of the epoch is passed, the execution will be truncated to the size of the epoch. Note that if steps_per_execution is set to N, Callback.on_batch_begin and Callback.on_batch_end methods will only be called every N batches (i.e. before/after each compiled function execution). Not supported with the PyTorch backend.

jit_compile: Bool or “auto”. Whether to use XLA compilation when compiling a model. For jax and tensorflow backends, jit_compile="auto" enables XLA compilation if the model supports it, and disables it otherwise. For the torch backend, “auto” defaults to eager execution, and jit_compile=True runs with torch.compile using the “inductor” backend.

auto_scale_loss: Bool. If True and the model dtype policy is “mixed_float16”, the passed optimizer will be automatically wrapped in a LossScaleOptimizer, which will dynamically scale the loss to prevent underflow.
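Any callable with the signature fn(y_true, y_pred) can serve as a loss or metric. The NumPy functions below only illustrate that contract; they are our own helpers, not the objects in keras.losses or keras.metrics. To pass them to compile() under Keras 3 they would typically need to operate on backend tensors, for example by replacing the NumPy calls with keras.ops equivalents such as keras.ops.mean.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    # Per-sample loss: mean over the last axis, result has shape (batch_size,).
    return np.mean((y_pred - y_true) ** 2, axis=-1)


def mean_absolute_error(y_true, y_pred):
    # A metric with the same fn(y_true, y_pred) signature.
    return np.mean(np.abs(y_pred - y_true), axis=-1)


y_true = np.array([[0.0, 1.0], [2.0, 2.0]])
y_pred = np.array([[1.0, 1.0], [2.0, 0.0]])
print(mean_squared_error(y_true, y_pred))   # [0.5 2. ]
print(mean_absolute_error(y_true, y_pred))  # [0.5 1. ]
```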

compile_from_config(config)

Compiles the model with the information given in config.

This method uses the information in the config (optimizer, loss, metrics, etc.) to compile the model.

Args:

config: Dict containing information for compiling the model.

property compute_dtype

The dtype of the computations performed by the layer.

compute_loss(x=None, y=None, y_pred=None, sample_weight=None, training=True)

Compute the total loss, validate it, and return it.

Subclasses can optionally override this method to provide custom loss computation logic.

Example:

from keras import Model, layers, metrics, ops
from keras.optimizers import SGD

class MyModel(Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_tracker = metrics.Mean(name='loss')

    def compute_loss(self, x, y, y_pred, sample_weight, training=True):
        loss = ops.mean((y_pred - y) ** 2)
        loss += ops.sum(self.losses)
        self.loss_tracker.update_state(loss)
        return loss

    def reset_metrics(self):
        self.loss_tracker.reset_state()

    @property
    def metrics(self):
        return [self.loss_tracker]

inputs = layers.Input(shape=(10,), name='my_input')
outputs = layers.Dense(10)(inputs)
model = MyModel(inputs, outputs)
model.add_loss(ops.sum(outputs))

optimizer = SGD()
model.compile(optimizer, loss='mse', steps_per_execution=10)
dataset = ...
model.fit(dataset, epochs=2, steps_per_epoch=10)
print(f"Custom loss: {model.loss_tracker.result()}")
Args:

x: Input data.

y: Target data.

y_pred: Predictions returned by the model (output of model(x)).

sample_weight: Sample weights for weighting the loss function.

training: Whether we are training or evaluating the model.

Returns:

The total loss as a scalar tensor, or None if no loss results (which is the case when called by Model.test_step).

compute_loss_and_updates(trainable_variables, non_trainable_variables, metrics_variables, x, y, sample_weight, training=False, optimizer_variables=None)

This method is stateless and is intended for use with jax.grad.

compute_mask(inputs, previous_mask)
compute_metrics(x, y, y_pred, sample_weight=None)

Update metric states and collect all metrics to be returned.

Subclasses can optionally override this method to provide custom metric updating and collection logic. Custom metrics are not passed in compile(); they can be created in __init__ or build. They are automatically tracked and returned by self.metrics.

Example:

class MyModel(Sequential):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.custom_metric = MyMetric(name="custom_metric")

    def compute_metrics(self, x, y, y_pred, sample_weight):
        # This super call updates metrics from `compile` and returns
        # results for all metrics listed in `self.metrics`.
        metric_results = super().compute_metrics(
            x, y, y_pred, sample_weight)

        # `metric_results` contains the previous result for
        # `custom_metric`, this is where we update it.
        self.custom_metric.update_state(x, y, y_pred, sample_weight)
        metric_results['custom_metric'] = self.custom_metric.result()
        return metric_results
Args:

x: Input data.

y: Target data.

y_pred: Predictions returned by the model (output of model.call(x)).

sample_weight: Sample weights for weighting the loss function.

Returns:

A dict containing values that will be passed to keras.callbacks.CallbackList.on_train_batch_end(). Typically, the values of the metrics listed in self.metrics are returned. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.

compute_output_shape(input_shape)
compute_output_spec(inputs, training=None, mask=None)
count_params()

Count the total number of scalars composing the weights.

Returns:

An integer count.

property dtype

Alias of layer.variable_dtype.

property dtype_policy
evaluate(x=None, y=None, batch_size=None, verbose='auto', sample_weight=None, steps=None, callbacks=None, return_dict=False, **kwargs)

Returns the loss value & metrics values for the model in test mode.

Computation is done in batches (see the batch_size arg.)

Args:

x: Input data. It can be:
  • A NumPy array (or array-like), or a list of arrays (in case the model has multiple inputs).
  • A backend-native tensor, or a list of tensors (in case the model has multiple inputs).
  • A dict mapping input names to the corresponding arrays/tensors, if the model has named inputs.
  • A keras.utils.PyDataset returning (inputs, targets) or (inputs, targets, sample_weights).
  • A tf.data.Dataset yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A torch.utils.data.DataLoader yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A Python generator function yielding (inputs, targets) or (inputs, targets, sample_weights).

y: Target data. Like the input data x, it can be either NumPy array(s) or backend-native tensor(s). If x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or a Python generator function, y should not be specified since targets will be obtained from x.

batch_size: Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

verbose: “auto”, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. “auto” becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to “auto”.

sample_weight: Optional NumPy array or tensor of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) NumPy array or tensor with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D NumPy array or tensor with shape (samples, sequence_length) to apply a different weight to every timestep of every sample. This argument is not supported when x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function. Instead, provide sample_weights as the third element of x. Note that sample weighting does not apply to metrics specified via the metrics argument in compile(). To apply sample weighting to your metrics, you can specify them via the weighted_metrics in compile() instead.

steps: Integer or None. Total number of steps (batches of samples) to draw before declaring the evaluation round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely.

callbacks: List of keras.callbacks.Callback instances to apply during evaluation.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.

export(filepath, format='tf_saved_model', verbose=None, input_signature=None, **kwargs)

Export the model as an artifact for inference.

Args:

filepath: str or pathlib.Path object. The path to save the artifact.

format: str. The export format. Supported values: “tf_saved_model” and “onnx”. Defaults to “tf_saved_model”.

verbose: bool. Whether to print a message during export. Defaults to None, which uses the default value set by different backends and formats.

input_signature: Optional. Specifies the shape and dtype of the model inputs. Can be a structure of keras.InputSpec, tf.TensorSpec, backend.KerasTensor, or backend tensor. If not provided, it will be automatically computed. Defaults to None.

**kwargs: Additional keyword arguments:
  • Specific to the JAX backend and format="tf_saved_model":
    • is_static: Optional bool. Indicates whether fn is static. Set to False if fn involves state updates (e.g., RNG seeds and counters).
    • jax2tf_kwargs: Optional dict. Arguments for jax2tf.convert. See the documentation for jax2tf.convert. If native_serialization and polymorphic_shapes are not provided, they will be automatically computed.

Note: This feature is currently supported only with TensorFlow, JAX and Torch backends.

Note: Be aware that the exported artifact may contain information from the local file system when using format="onnx", verbose=True and the Torch backend.

Examples:

Here’s how to export a TensorFlow SavedModel for inference.

import tensorflow as tf

# Export the model as a TensorFlow SavedModel artifact
model.export("path/to/location", format="tf_saved_model")

# Load the artifact in a different process/environment
reloaded_artifact = tf.saved_model.load("path/to/location")
predictions = reloaded_artifact.serve(input_data)

Here’s how to export an ONNX model for inference.

import onnxruntime

# Export the model as an ONNX artifact
model.export("path/to/location", format="onnx")

# Load the artifact in a different process/environment
ort_session = onnxruntime.InferenceSession("path/to/location")
ort_inputs = {
    k.name: v for k, v in zip(ort_session.get_inputs(), input_data)
}
predictions = ort_session.run(None, ort_inputs)
fit(x=None, y=None, batch_size=None, epochs=1, verbose='auto', callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=None, validation_freq=1)

Trains the model for a fixed number of epochs (dataset iterations).

Args:

x: Input data. It can be:
  • A NumPy array (or array-like), or a list of arrays (in case the model has multiple inputs).
  • A backend-native tensor, or a list of tensors (in case the model has multiple inputs).
  • A dict mapping input names to the corresponding arrays/tensors, if the model has named inputs.
  • A keras.utils.PyDataset returning (inputs, targets) or (inputs, targets, sample_weights).
  • A tf.data.Dataset yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A torch.utils.data.DataLoader yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A Python generator function yielding (inputs, targets) or (inputs, targets, sample_weights).

y: Target data. Like the input data x, it can be either NumPy array(s) or backend-native tensor(s). If x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or a Python generator function, y should not be specified since targets will be obtained from x.

batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided (unless the steps_per_epoch flag is set to something other than None). Note that in conjunction with initial_epoch, epochs is to be understood as “final epoch”. The model is not trained for a number of iterations given by epochs, but merely until the epoch of index epochs is reached.

verbose: “auto”, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. “auto” becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g., in a production environment). Defaults to “auto”.

callbacks: List of keras.callbacks.Callback instances to apply during training. See keras.callbacks. Note that keras.callbacks.ProgbarLogger and keras.callbacks.History callbacks are created automatically and need not be passed to model.fit(). Whether keras.callbacks.ProgbarLogger is created depends on the verbose argument in model.fit().

validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. This argument is only supported when x and y are made of NumPy arrays or tensors. If both validation_data and validation_split are provided, validation_data will override validation_split.

validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. Note that the validation loss of data provided using validation_split or validation_data is not affected by regularization layers like noise and dropout. validation_data will override validation_split. It can be:
  • A tuple (x_val, y_val) of NumPy arrays or tensors.
  • A tuple (x_val, y_val, val_sample_weights) of NumPy arrays.
  • A keras.utils.PyDataset, a tf.data.Dataset, a torch.utils.data.DataLoader yielding (inputs, targets) or a Python generator function yielding (x_val, y_val) or (inputs, targets, sample_weights).

shuffle: Boolean, whether to shuffle the training data before each epoch. This argument is ignored when x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function.

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to “pay more attention” to samples from an under-represented class. When class_weight is specified and targets have a rank of 2 or greater, either y must be one-hot encoded, or an explicit final dimension of 1 must be included for sparse class labels.

sample_weight: Optional NumPy array or tensor of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) NumPy array or tensor with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D NumPy array or tensor with shape (samples, sequence_length) to apply a different weight to every timestep of every sample. This argument is not supported when x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function. Instead, provide sample_weights as the third element of x. Note that sample weighting does not apply to metrics specified via the metrics argument in compile(). To apply sample weighting to your metrics, you can specify them via the weighted_metrics in compile() instead.

initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training run).

steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors or NumPy arrays, the default None means that the value used is the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. If x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function, the epoch will run until the input dataset is exhausted. When passing an infinitely repeating dataset, you must specify the steps_per_epoch argument, otherwise the training will run indefinitely.

validation_steps: Integer or None. Only relevant if validation_data is provided. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch. If validation_steps is None, validation will run until the validation_data dataset is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. If validation_steps is specified and only part of the dataset is consumed, the evaluation will start from the beginning of the dataset at each epoch. This ensures that the same validation samples are used every time.

validation_batch_size: Integer or None. Number of samples per validation batch. If unspecified, will default to batch_size. Do not specify the validation_batch_size if your data is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

validation_freq: Only relevant if validation data is provided. Specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs.
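The bookkeeping rules above for validation_split, the default steps_per_epoch, initial_epoch/epochs, and validation_freq can be summarized in a few lines of plain Python. This is a simplified sketch of the documented behavior for in-memory arrays, not Keras code: plan_fit is a hypothetical helper, and rounding the split point down, rounding the step count up, and counting epochs from 1 for validation_freq are our assumptions.

```python
import math


def plan_fit(num_samples, batch_size=32, epochs=1, initial_epoch=0,
             validation_split=0.0, validation_freq=1):
    # validation_split: the LAST fraction of samples (before shuffling) is held out.
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    num_train, num_val = split_at, num_samples - split_at
    # Default steps_per_epoch: enough batches to cover the training samples once
    # (a final partial batch counts as a step).
    steps_per_epoch = math.ceil(num_train / batch_size)
    # `epochs` is the index of the final epoch: training runs initial_epoch..epochs-1.
    epochs_run = list(range(initial_epoch, epochs))
    # Validation runs after every validation_freq-th epoch, counting epochs from 1.
    validation_epochs = [e for e in epochs_run if (e + 1) % validation_freq == 0]
    return {
        "num_train": num_train,
        "num_val": num_val,
        "first_val_index": split_at if num_val else None,
        "steps_per_epoch": steps_per_epoch,
        "epochs_run": epochs_run,
        "validation_epochs": validation_epochs,
    }


plan = plan_fit(num_samples=1000, batch_size=32, epochs=10, initial_epoch=4,
                validation_split=0.25, validation_freq=2)
print(plan["epochs_run"])         # [4, 5, 6, 7, 8, 9]
print(plan["validation_epochs"])  # [5, 7, 9]
```

Resuming with initial_epoch=4 and epochs=10 therefore trains for six more epochs, not ten.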

Unpacking behavior for iterator-like inputs:

A common pattern is to pass an iterator-like object such as a tf.data.Dataset or a keras.utils.PyDataset to fit(), which will in fact yield not only features (x) but optionally targets (y) and sample weights (sample_weight). Keras requires that the output of such iterator-likes be unambiguous. The iterator should return a tuple of length 1, 2, or 3, where the optional second and third elements will be used for y and sample_weight respectively. Any other type provided will be wrapped in a length-one tuple, effectively treating everything as x. When yielding dicts, they should still adhere to the top-level tuple structure, e.g. ({"x0": x0, "x1": x1}, y). Keras will not attempt to separate features, targets, and weights from the keys of a single dict.

A notable unsupported data type is the namedtuple, because it behaves like both an ordered datatype (tuple) and a mapping datatype (dict). Given a namedtuple of the form namedtuple("example_tuple", ["y", "x"]), it is ambiguous whether to reverse the order of the elements when interpreting the value. Even worse is a tuple of the form namedtuple("other_tuple", ["x", "y", "z"]), where it is unclear whether the tuple was intended to be unpacked into x, y, and sample_weight or passed through as a single element to x.
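The unpacking rules can be expressed as a small pure-Python function. unpack_batch is a hypothetical helper written for illustration; it follows the rules as described here, not Keras' internal implementation.

```python
from collections import namedtuple


def unpack_batch(data):
    # Namedtuples are rejected: they are both ordered and keyed, hence ambiguous.
    if isinstance(data, tuple) and hasattr(data, "_fields"):
        raise TypeError("namedtuple batches are ambiguous; yield a plain tuple instead")
    # Any non-tuple (array, dict, ...) is wrapped, i.e. treated as x alone.
    if not isinstance(data, tuple):
        data = (data,)
    if len(data) not in (1, 2, 3):
        raise ValueError(f"expected a tuple of length 1, 2 or 3, got {len(data)}")
    # Pad the missing y / sample_weight entries with None.
    x, y, sample_weight = data + (None,) * (3 - len(data))
    return x, y, sample_weight


features = {"x0": [1.0, 2.0], "x1": [3.0]}
print(unpack_batch((features, [0, 1])))  # ({'x0': [1.0, 2.0], 'x1': [3.0]}, [0, 1], None)
print(unpack_batch(features))            # ({'x0': [1.0, 2.0], 'x1': [3.0]}, None, None)

Example = namedtuple("Example", ["x", "y"])
try:
    unpack_batch(Example(x=features, y=[0, 1]))
except TypeError as err:
    print(err)  # namedtuple batches are ambiguous; yield a plain tuple instead
```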

Returns:

A History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).

get_build_config()

Returns a dictionary with the layer’s input shape.

This method returns a config dict that can be used by build_from_config(config) to create all states (e.g. Variables and Lookup tables) needed by the layer.

By default, the config only contains the input shape that the layer was built with. If you’re writing a custom layer that creates state in an unusual way, you should override this method to make sure this state is already created when Keras attempts to load its value upon model loading.

Returns:

A dict containing the input shape associated with the layer.

get_compile_config()

Returns a serialized config with information for compiling the model.

This method returns a config dictionary containing all the information (optimizer, loss, metrics, etc.) with which the model was compiled.

Returns:

A dict containing information for compiling the model.

get_layer(name=None, index=None)

Retrieves a layer based on either its name (unique) or index.

If name and index are both provided, index will take precedence. Indices are based on order of horizontal graph traversal (bottom-up).

Args:

name: String, name of layer.

index: Integer, index of layer.

Returns:

A layer instance.

get_metrics_result()

Returns the model’s metrics values as a dict.

If any metric result is a dict (containing multiple metrics), each of its entries is added to the top level of the dict returned by this method.

Returns:

A dict containing values of the metrics listed in self.metrics. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.
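The flattening rule can be sketched in a few lines of plain Python. flatten_metric_results and the metric names below are hypothetical, and the sketch covers only the flattening step.

```python
def flatten_metric_results(results):
    # Metrics whose result is a dict contribute their entries at the top level.
    flat = {}
    for name, value in results.items():
        if isinstance(value, dict):
            flat.update(value)
        else:
            flat[name] = value
    return flat


results = {"loss": 0.2, "f1_report": {"precision": 0.8, "recall": 0.6}}
flat = flatten_metric_results(results)
print(flat)  # {'loss': 0.2, 'precision': 0.8, 'recall': 0.6}
```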

get_state_tree(value_format='backend_tensor')

Retrieves a tree-like structure of model variables.

This method allows retrieval of different model variables (trainable, non-trainable, optimizer, and metrics). The variables are returned in a nested dictionary format, where the keys correspond to the variable names and the values are the nested representations of the variables.

Args:

value_format: One of “backend_tensor”, “numpy_array”. The kind of array to return as the leaves of the nested state tree.

Returns:

dict: A dictionary containing the nested representations of the requested variables. The keys are the variable names, and the values are the corresponding nested dictionaries.

Example:

import keras
import numpy as np

model = keras.Sequential([
    keras.Input(shape=(1,), name="my_input"),
    keras.layers.Dense(1, activation="sigmoid", name="my_dense"),
], name="my_sequential")
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(np.array([[1.0]]), np.array([[1.0]]))
state_tree = model.get_state_tree()

The state_tree dictionary returned looks like:

{
    'metrics_variables': {
        'loss': {
            'count': ...,
            'total': ...,
        },
        'mean_absolute_error': {
            'count': ...,
            'total': ...,
        }
    },
    'trainable_variables': {
        'my_sequential': {
            'my_dense': {
                'bias': ...,
                'kernel': ...,
            }
        }
    },
    'non_trainable_variables': {},
    'optimizer_variables': {
        'adam': {
            'iteration': ...,
            'learning_rate': ...,
            'my_sequential_my_dense_bias_momentum': ...,
            'my_sequential_my_dense_bias_velocity': ...,
            'my_sequential_my_dense_kernel_momentum': ...,
            'my_sequential_my_dense_kernel_velocity': ...,
        }
    }
}
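Because the state tree is a nested dictionary, its leaves can be enumerated with a short recursive walk, which is handy for checking which variables a model holds. iter_leaves is a hypothetical helper, and the leaves below are placeholder strings rather than tensors.

```python
def iter_leaves(tree, prefix=()):
    # Yield (path, value) for every non-dict leaf, depth first, in insertion order.
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from iter_leaves(value, path)
        else:
            yield path, value


state_tree = {
    "trainable_variables": {
        "my_sequential": {"my_dense": {"bias": "b", "kernel": "k"}},
    },
    "non_trainable_variables": {},  # empty subtrees contribute no leaves
}
paths = ["/".join(path) for path, _ in iter_leaves(state_tree)]
print(paths)
# ['trainable_variables/my_sequential/my_dense/bias',
#  'trainable_variables/my_sequential/my_dense/kernel']
```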
get_weights()

Return the values of layer.weights as a list of NumPy arrays.

property input

Retrieves the input tensor(s) of a symbolic operation.

Only returns the tensor(s) corresponding to the first time the operation was called.

Returns:

Input tensor or list of input tensors.

property input_dtype

The dtype layer inputs should be converted to.

property input_shape
property input_spec
property inputs
jax_state_sync()
property jit_compile
property layers
load_own_variables(store)

Loads the state of the layer.

You can override this method to take full control of how the state of the layer is loaded upon calling keras.models.load_model().

Args:

store: Dict from which the state of the model will be loaded.

load_weights(filepath, skip_mismatch=False, **kwargs)

Load weights from a file saved via save_weights().

Weights are loaded based on the network’s topology. This means the architecture should be the same as when the weights were saved. Note that layers that don’t have weights are not taken into account in the topological ordering, so adding or removing layers is fine as long as they don’t have weights.

Partial weight loading

If you have modified your model, for instance by adding a new layer (with weights) or by changing the shape of the weights of a layer, you can choose to ignore errors and continue loading by setting skip_mismatch=True. In this case any layer with mismatching weights will be skipped. A warning will be displayed for each skipped layer.

Args:
filepath: String, path to the weights file to load. It can either be a .weights.h5 file or a legacy .h5 weights file.

skip_mismatch: Boolean, whether to skip loading of layers where there is a mismatch in the number of weights, or a mismatch in the shape of the weights.
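The round trip below is a minimal sketch, assuming Keras 3 with its bundled h5py dependency. A plain keras.Sequential stands in for the documented class, and the build_model helper and the example.weights.h5 file name are hypothetical. Because both models come from the same builder, their topologies match and the loaded weights reproduce the source model's predictions.

```python
import os
import tempfile

import keras
import numpy as np


def build_model():
    # Hypothetical builder: every call yields the same topology,
    # so weights saved from one instance fit any other instance.
    return keras.Sequential([
        keras.Input(shape=(3,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(2),
    ])


source = build_model()
target = build_model()  # independently (randomly) initialized

with tempfile.TemporaryDirectory() as tmp_dir:
    # save_weights() requires the .weights.h5 suffix.
    path = os.path.join(tmp_dir, "example.weights.h5")
    source.save_weights(path)
    target.load_weights(path)

x = np.random.rand(5, 3).astype("float32")
same_outputs = np.allclose(
    source.predict(x, verbose=0), target.predict(x, verbose=0)
)
print(same_outputs)
```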

property losses

List of scalar losses from add_loss, regularizers and sublayers.

make_predict_function(force=False)
make_test_function(force=False)
make_train_function(force=False)
property metrics

List of all metrics.

property metrics_names
property metrics_variables

List of all metric variables.

property non_trainable_variables

List of all non-trainable layer state.

This extends layer.non_trainable_weights to include all state used by the layer, including state for metrics and SeedGenerator objects.

property non_trainable_weights

List of all non-trainable weight variables of the layer.

These are the weights that should not be updated by the optimizer during training. Unlike layer.non_trainable_variables, this excludes metric state and random seeds.

property output

Retrieves the output tensor(s) of a layer.

Only returns the tensor(s) corresponding to the first time the operation was called.

Returns:

Output tensor or list of output tensors.
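For a Sequential model built from keras.Input, the symbolic tensors can be inspected without calling the model on data. This sketch, assuming Keras 3, reads them through the inputs and outputs lists; the layer names are illustrative.

```python
import keras

model = keras.Sequential([
    keras.Input(shape=(3,), name="features"),
    keras.layers.Dense(2, name="head"),
])

# A single-input, single-output model has one entry in each list.
input_tensor = model.inputs[0]
output_tensor = model.outputs[0]
print(input_tensor.shape)   # (None, 3): the batch dimension is unknown
print(output_tensor.shape)  # (None, 2)
```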

property output_shape
property outputs
property path

The path of the layer.

If the layer has not been built yet, it will be None.

pop(rebuild=True)

Removes the last layer in the model.

Args:

rebuild: bool. Whether to rebuild the model after removing the layer. Defaults to True.

Returns:

layer: layer instance.

predict(x, batch_size=None, verbose='auto', steps=None, callbacks=None)

Generates output predictions for the input samples.

Computation is done in batches. This method is designed for batch processing of large numbers of inputs. It is not intended for use inside of loops that iterate over your data and process small numbers of inputs at a time.

For small numbers of inputs that fit in one batch, directly use __call__() for faster execution, e.g., model(x), or model(x, training=False) if you have layers such as BatchNormalization that behave differently during inference.

Note: See this FAQ entry (https://keras.io/getting_started/faq/#whats-the-difference-between-model-methods-predict-and-call) for more details about the difference between the Model methods predict() and __call__().

Args:
x: Input data. It can be:
  • A NumPy array (or array-like), or a list of arrays (in case the model has multiple inputs).
  • A backend-native tensor, or a list of tensors (in case the model has multiple inputs).
  • A dict mapping input names to the corresponding arrays/tensors, if the model has named inputs.
  • A keras.utils.PyDataset.
  • A tf.data.Dataset.
  • A torch.utils.data.DataLoader.
  • A Python generator function.

batch_size: Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

verbose: “auto”, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. “auto” becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to “auto”.

steps: Total number of steps (batches of samples) to draw before declaring the prediction round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely.

callbacks: List of keras.callbacks.Callback instances to apply during prediction.

Returns:

NumPy array(s) of predictions.

predict_on_batch(x)

Returns predictions for a single batch of samples.

Args:

x: Input data. It must be array-like.

Returns:

NumPy array(s) of predictions.

predict_step(state, data)
property quantization_mode

The quantization mode of this layer, None if not quantized.

quantize(mode, **kwargs)

Quantize the weights of the model.

Note that the model must be built before calling this method. quantize will recursively call quantize(mode) on all layers; layers that don’t implement it are skipped.

Args:
mode: The mode of the quantization. Only ‘int8’ is supported at this time.

quantized_build(input_shape, mode)
quantized_call(*args, **kwargs)
rematerialized_call(layer_call, *args, **kwargs)

Enables rematerialization dynamically for a layer’s call method.

Args:

layer_call: The original call method of a layer.

Returns:

The rematerialized call method of the layer.

reset_metrics()
property run_eagerly
save(filepath, overwrite=True, zipped=None, **kwargs)

Saves a model as a .keras file.

Args:
filepath: str or pathlib.Path object. The path where to save the model. Must end in .keras (unless saving the model as an unzipped directory via zipped=False).

overwrite: Whether we should overwrite any existing model at the target location, or instead ask the user via an interactive prompt.

zipped: Whether to save the model as a zipped .keras archive (default when saving locally), or as an unzipped directory (default when saving on the Hugging Face Hub).

Example:

import keras
import numpy as np

model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        keras.layers.Dense(5),
        keras.layers.Softmax(),
    ],
)
model.save("model.keras")
loaded_model = keras.saving.load_model("model.keras")
x = keras.random.uniform((10, 3))
assert np.allclose(model.predict(x), loaded_model.predict(x))

Note that model.save() is an alias for keras.saving.save_model().

The saved .keras file contains:

  • The model’s configuration (architecture)

  • The model’s weights

  • The model’s optimizer’s state (if any)

Thus models can be reinstantiated in the exact same state.

save_own_variables(store)

Saves the state of the layer.

You can override this method to take full control of how the state of the layer is saved upon calling model.save().

Args:

store: Dict where the state of the model will be saved.

save_weights(filepath, overwrite=True)

Saves all layer weights to a .weights.h5 file.

Args:
filepath: str or pathlib.Path object. Path where to save the model. Must end in .weights.h5.

overwrite: Whether we should overwrite any existing model at the target location, or instead ask the user via an interactive prompt.

set_state_tree(state_tree)

Assigns values to variables of the model.

This method takes a dictionary of nested variable values, which represents the state tree of the model, and assigns them to the corresponding variables of the model. The dictionary keys represent the variable names (e.g., ‘trainable_variables’, ‘optimizer_variables’), and the values are nested dictionaries containing the variable paths and their corresponding values.

Args:
state_tree: A dictionary representing the state tree of the model. The keys are the variable names, and the values are nested dictionaries representing the variable paths and their values.

set_weights(weights)

Sets the values of layer.weights from a list of NumPy arrays.
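get_weights() and set_weights() can copy parameters between two models with the same architecture. In this sketch, which assumes Keras 3, the build_model helper is hypothetical, and the Dense layer's weights come back in the order kernel, bias.

```python
import keras
import numpy as np


def build_model():
    # Hypothetical builder producing identical architectures.
    return keras.Sequential([
        keras.Input(shape=(3,)),
        keras.layers.Dense(2),
    ])


source = build_model()
target = build_model()

weights = source.get_weights()  # list of NumPy arrays: [kernel, bias]
target.set_weights(weights)     # shapes and order must match target.weights

copied = all(
    np.array_equal(a, b)
    for a, b in zip(source.get_weights(), target.get_weights())
)
print([w.shape for w in weights])  # [(3, 2), (2,)]
```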

stateless_call(trainable_variables, non_trainable_variables, *args, return_losses=False, **kwargs)

Call the layer without any side effects.

Args:

trainable_variables: List of trainable variables of the model.

non_trainable_variables: List of non-trainable variables of the model.

*args: Positional arguments to be passed to call().

return_losses: If True, stateless_call() will return the list of losses created during call() as part of its return values.

**kwargs: Keyword arguments to be passed to call().

Returns:
A tuple. By default, returns (outputs, non_trainable_variables). If return_losses = True, then returns (outputs, non_trainable_variables, losses).

Note: non_trainable_variables include not only non-trainable weights such as BatchNormalization statistics, but also RNG seed state (if there are any random operations part of the layer, such as dropout), and Metric state (if there are any metrics attached to the layer). These are all elements of state of the layer.

Example:

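import keras
import numpy as np

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8),
    keras.layers.BatchNormalization(),
])
data = np.random.rand(2, 4).astype("float32")
trainable_variables = model.trainable_variables
non_trainable_variables = model.non_trainable_variables
# Call the model with zero side effects; training=True makes
# BatchNormalization compute updated moving statistics.
outputs, non_trainable_variables = model.stateless_call(
    trainable_variables,
    non_trainable_variables,
    data,
    training=True,
)
# Attach the updated state to the model
# (until you do this, the model is still in its pre-call state).
for ref_var, value in zip(
    model.non_trainable_variables, non_trainable_variables
):
    ref_var.assign(value)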
stateless_compute_loss(trainable_variables, non_trainable_variables, metrics_variables, x=None, y=None, y_pred=None, sample_weight=None, training=True)
summary(line_length=None, positions=None, print_fn=None, expand_nested=False, show_trainable=False, layer_range=None)

Prints a string summary of the network.

Args:
line_length: Total length of printed lines (e.g. set this to adapt the display to different terminal window sizes).

positions: Relative or absolute positions of log elements in each line. If not provided, becomes [0.3, 0.6, 0.70, 1.]. Defaults to None.

print_fn: Print function to use. By default, prints to stdout. If stdout doesn’t work in your environment, change to print. It will be called on each line of the summary. You can set it to a custom function in order to capture the string summary.

expand_nested: Whether to expand the nested models. Defaults to False.

show_trainable: Whether to show if a layer is trainable. Defaults to False.

layer_range: A list or tuple of 2 strings giving the starting and ending layer names (both inclusive) of the range of layers to be printed in the summary. It also accepts regex patterns instead of exact names. In this case, the start predicate will be the first element that matches layer_range[0] and the end predicate will be the last element that matches layer_range[1]. By default (None), all layers of the model are considered.

Raises:

ValueError: if summary() is called before the model is built.
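To capture the summary as a string rather than printing it, a custom print_fn can collect the text. This sketch assumes Keras 3; the capture helper and the layer and model names are hypothetical. The helper accepts extra keyword arguments defensively, and the collected pieces are joined so the result does not depend on how many times print_fn is called.

```python
import keras

model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        keras.layers.Dense(4, name="hidden"),
        keras.layers.Dense(1, name="head"),
    ],
    name="tiny_mlp",
)

captured = []


def capture(text, **kwargs):
    # Collect whatever summary text Keras passes in; extra keyword
    # arguments are accepted and ignored.
    captured.append(text)


model.summary(print_fn=capture)
summary_text = "\n".join(captured)
print(summary_text)
```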

property supports_masking

Whether this layer supports computing a mask using compute_mask.

symbolic_call(*args, **kwargs)
test_on_batch(x, y=None, sample_weight=None, return_dict=False)

Test the model on a single batch of samples.

Args:

x: Input data. Must be array-like.

y: Target data. Must be array-like.

sample_weight: Optional array of the same length as x, containing weights to apply to the model’s loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

A scalar loss value (when no metrics and return_dict=False), a list of loss and metric values (if there are metrics and return_dict=False), or a dict of metric and loss values (if return_dict=True).

test_step(state, data)
to_json(**kwargs)

Returns a JSON string containing the network configuration.

To load a network from a JSON save file, use keras.models.model_from_json(json_string, custom_objects={…}).

Args:
**kwargs: Additional keyword arguments to be passed to json.dumps().

Returns:

A JSON string.
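A round trip through to_json() and keras.models.model_from_json(), assuming Keras 3 and only built-in layers (custom layers would need custom_objects). The layer names are illustrative. Only the architecture is serialized, so the restored model starts from fresh weights.

```python
import json

import keras

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(4, activation="relu", name="hidden"),
    keras.layers.Dense(1, name="head"),
])

json_string = model.to_json(indent=2)  # indent is forwarded to json.dumps()
config = json.loads(json_string)       # plain JSON, inspectable by hand
print(config["class_name"])            # Sequential

# Rebuild the architecture only; the restored model has fresh weights.
restored = keras.models.model_from_json(json_string)
print([layer.name for layer in restored.layers])  # ['hidden', 'head']
```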

train_on_batch(x, y=None, sample_weight=None, class_weight=None, return_dict=False)

Runs a single gradient update on a single batch of data.

Args:

x: Input data. Must be array-like.

y: Target data. Must be array-like.

sample_weight: Optional array of the same length as x, containing weights to apply to the model’s loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample.

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) to apply to the model’s loss for the samples from this class during training. This can be useful to tell the model to “pay more attention” to samples from an under-represented class. When class_weight is specified and targets have a rank of 2 or greater, either y must be one-hot encoded, or an explicit final dimension of 1 must be included for sparse class labels.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

A scalar loss value (when no metrics and return_dict=False), a list of loss and metric values (if there are metrics and return_dict=False), or a dict of metric and loss values (if return_dict=True).
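train_on_batch() and test_on_batch() can drive a hand-written loop. This sketch, assuming Keras 3, fits a single linear layer to one fixed, randomly generated batch with plain SGD. reset_metrics() is called before each evaluation so the reported loss is not mixed with metric state from earlier calls.

```python
import keras
import numpy as np

# A single linear layer fitted to one fixed batch.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

rng = np.random.default_rng(0)
x = rng.random((16, 3), dtype=np.float32)
y = rng.random((16, 1), dtype=np.float32)

model.reset_metrics()
before = model.test_on_batch(x, y, return_dict=True)  # evaluation only

for _ in range(20):
    model.train_on_batch(x, y)  # one gradient update per call

model.reset_metrics()
after = model.test_on_batch(x, y, return_dict=True)
print(before["loss"], after["loss"])
```

Because the mean squared error of a linear model is a convex quadratic and the default SGD learning rate is small, repeated full-batch updates on the same data should lower the loss on that batch.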

train_step(state, data)
property trainable

Settable boolean, whether this layer should be trainable or not.
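Setting trainable to False freezes every weight in the model. This sketch assumes Keras 3; the two Dense layers contribute one kernel and one bias each.

```python
import keras

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(4),
    keras.layers.Dense(1),
])
print(len(model.trainable_weights))  # 4: two kernels and two biases

# Freezing the model also freezes its sublayers, so every weight
# becomes non-trainable.
model.trainable = False
print(len(model.trainable_weights), len(model.non_trainable_weights))  # 0 4
```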

property trainable_variables

List of all trainable layer state.

This is equivalent to layer.trainable_weights.

property trainable_weights

List of all trainable weight variables of the layer.

These are the weights that get updated by the optimizer during training.

property variable_dtype

The dtype of the state (weights) of the layer.
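The variable dtype differs from the compute dtype under a mixed-precision policy. This sketch assumes Keras 3, which accepts the policy name "mixed_float16" as a layer's dtype argument; compute_dtype is a Keras layer property not listed in this excerpt.

```python
import keras

# "mixed_float16" is a dtype policy name: float32 weights, float16 compute.
layer = keras.layers.Dense(4, dtype="mixed_float16")
layer.build((None, 3))

print(layer.variable_dtype)  # float32
print(layer.compute_dtype)   # float16
print(layer.kernel.dtype)    # float32
```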

property variables

List of all layer state, including random seeds.

This extends layer.weights to include all state used by the layer, including SeedGenerator objects.

Metric variables are not included here; use metrics_variables to access them.

property weights

List of all weight variables of the layer.

Unlike layer.variables, this excludes metric state and random seeds.