RatioApproximator#

class bayesflow.approximators.RatioApproximator(*args, **kwargs)[source]#

Bases: Approximator

Implements contrastive neural likelihood-to-evidence ratio estimation (NRE-C) as described in https://arxiv.org/pdf/2210.06170.

The estimation target is the ratio of likelihood and evidence: p(x | theta) / p(x).
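As a worked illustration of this target (not part of BayesFlow), consider a toy conjugate model chosen here purely as an assumption: theta ~ N(0, 1) and x | theta ~ N(theta, 1), so the evidence is p(x) = N(0, 2) in closed form. The sketch below evaluates the exact log ratio that a trained approximator would try to match; the helper names are hypothetical.

```python
import math


def normal_logpdf(value: float, mean: float, variance: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (value - mean) ** 2 / variance)


def exact_log_ratio(theta: float, x: float) -> float:
    """log p(x | theta) - log p(x) for theta ~ N(0, 1), x | theta ~ N(theta, 1)."""
    log_likelihood = normal_logpdf(x, mean=theta, variance=1.0)
    log_evidence = normal_logpdf(x, mean=0.0, variance=2.0)  # marginal of x
    return log_likelihood - log_evidence


# The same ratio equals posterior / prior; here p(theta | x) = N(x / 2, 1 / 2).
theta, x = 0.3, 1.2
via_posterior = normal_logpdf(theta, x / 2.0, 0.5) - normal_logpdf(theta, 0.0, 1.0)
print(exact_log_ratio(theta, x), via_posterior)
```

Both printed values agree, which is why an estimated ratio can be combined with the prior to recover the posterior.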

Parameters:
inference_network: keras.Layer

A network backbone to perform contrastive learning. A final logits layer is added automatically on top of the inference network.

adapter: bayesflow.adapters.Adapter, optional

Adapter for data processing. You can use build_adapter() to create it. If None (default), an identity adapter is used that makes a shallow copy and passes data through unchanged.

summary_network: SummaryNetwork, optional

The summary network used for data summarization of summary_variables (default is None). When present, summary outputs are automatically concatenated with inference_conditions.

gamma: float, optional

Odds of any pair being drawn dependently versus completely independently. Default is 1.

K: int, optional

Number of parameter candidates used for contrastive learning. Default is 5.

standardize: str | Sequence[str] | None

The variables to standardize before passing to the networks. Can be either “all” or any subset of [“inference_variables”, “inference_conditions”, “summary_variables”]. (default is “inference_variables”).

**kwargs: dict, optional

Additional arguments passed to the bayesflow.approximators.Approximator class.

build(data_shapes: Mapping[str, Tensor])[source]#

Template method for building all network components.

This method orchestrates the build process by:

  1. Building the summary network (if present) and caching its output shape
  2. Enriching data_shapes with computed values for hooks to access
  3. Calling hook methods in the proper sequence
  4. Marking as built

Hooks receive an enriched data_shapes dict that includes “_summary_outputs” if a summary network was built, so they don’t need to recompute this value.
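The build sequence described above follows the template-method pattern. The following is a minimal pure-Python sketch of that pattern only; the class `TemplateBuilder` and the hook names `_build_inference_network` and `_build_extras` are hypothetical stand-ins and are not BayesFlow's actual hooks.

```python
class TemplateBuilder:
    """Hypothetical stand-in that mimics the build sequence described above."""

    def __init__(self, summary_output_dim=None):
        self.summary_output_dim = summary_output_dim  # None means no summary network
        self.built = False
        self.calls = []

    def build(self, data_shapes):
        shapes = dict(data_shapes)  # shallow copy so the caller's dict is untouched
        if self.summary_output_dim is not None:
            # Step 1: "build" the summary network and cache its output shape.
            batch_size = shapes["summary_variables"][0]
            # Step 2: enrich the shapes so hooks do not need to recompute it.
            shapes["_summary_outputs"] = (batch_size, self.summary_output_dim)
        # Step 3: call the hooks in a fixed order.
        self._build_inference_network(shapes)
        self._build_extras(shapes)
        # Step 4: mark as built.
        self.built = True

    def _build_inference_network(self, shapes):
        self.calls.append(("inference", "_summary_outputs" in shapes))

    def _build_extras(self, shapes):
        self.calls.append(("extras", shapes.get("_summary_outputs")))


builder = TemplateBuilder(summary_output_dim=8)
builder.build({"inference_variables": (32, 2), "summary_variables": (32, 100, 3)})
print(builder.calls)
```

Each hook reads "_summary_outputs" from the enriched dict instead of re-running the summary network's shape inference.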

compute_metrics(inference_variables: Tensor, inference_conditions: Tensor = None, summary_variables: Tensor = None, sample_weight: Tensor = None, summary_attention_mask: Tensor = None, summary_mask: Tensor = None, inference_attention_mask: Tensor = None, inference_mask: Tensor = None, stage: str = 'training') dict[str, Tensor][source]#

Computes loss following https://arxiv.org/pdf/2210.06170.

Handles both summary network outputs (if present) and inference conditions, combining them via ConditionBuilder.resolve().
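To make the roles of gamma and K concrete, here is a pure-Python sketch of the NRE-C objective as read from the linked paper: a classifier over K + 1 classes, where class 0 means the observation is independent of every candidate and class k means candidate k was drawn jointly with it. This is an illustration under that reading, not BayesFlow's implementation; the function names are hypothetical.

```python
import math


def nre_c_class_log_probs(logits, gamma=1.0):
    """Log-probabilities of the K + 1 classes given K candidate logits."""
    K = len(logits)
    # Unnormalized log-weights: 0 for class 0, log(gamma / K) + logit otherwise.
    scores = [0.0] + [math.log(gamma / K) + h for h in logits]
    top = max(scores)  # subtract the maximum for a stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


def nre_c_loss(independent_logits, dependent_logits, gamma=1.0):
    """Single-example loss; the jointly drawn candidate sits at index 0
    of dependent_logits (an assumption of this sketch)."""
    log_q_independent = nre_c_class_log_probs(independent_logits, gamma)[0]
    log_q_dependent = nre_c_class_log_probs(dependent_logits, gamma)[1]
    p_independent = 1.0 / (1.0 + gamma)  # marginal probability of class 0
    return -(p_independent * log_q_independent
             + (1.0 - p_independent) * log_q_dependent)


probs = [math.exp(lp) for lp in nre_c_class_log_probs([0.0] * 5, gamma=1.0)]
print(probs)
```

With uninformative (all-zero) logits, gamma = 1 and K = 5, class 0 receives half of the probability mass and the five dependent classes share the other half.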

fit(*args, **kwargs)[source]#

Trains the approximator on the provided dataset or on-demand data generated from the given simulator. If dataset is not provided, a dataset is built from the simulator. If the model has not been built, it will be built using a batch from the dataset.

Parameters:
dataset: keras.utils.PyDataset, optional

A dataset containing simulations for training. If provided, simulator must be None.

simulator: Simulator, optional

A simulator used to generate a dataset. If provided, dataset must be None.

**kwargs

Additional keyword arguments passed to keras.Model.fit(), as described in:

https://github.com/keras-team/keras/blob/v3.13.2/keras/src/backend/tensorflow/trainer.py#L314
Returns:
keras.callbacks.History

A history object containing the training loss and metrics values.

Raises:
ValueError

If both dataset and simulator are provided or neither is provided.
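The exclusivity rule for dataset and simulator can be sketched as a small validation helper. The function name and the error message below are hypothetical; only the rule itself (exactly one of the two must be given) comes from the description above.

```python
def resolve_training_source(dataset=None, simulator=None):
    """Return which source to use, enforcing that exactly one is given."""
    # True when both are None or both are provided.
    if (dataset is None) == (simulator is None):
        raise ValueError("Provide exactly one of `dataset` or `simulator`.")
    return "dataset" if dataset is not None else "simulator"


print(resolve_training_source(simulator=object()))
```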

log_ratio(data: Mapping[str, ndarray], **kwargs) Tensor[source]#

Computes the log likelihood-to-evidence ratio. The data dictionary is preprocessed using the adapter.

Parameters:
data: Mapping[str, np.ndarray]

Dictionary of observed data as NumPy arrays.

**kwargs: dict

Additional keyword arguments for the adapter and log-probability computation.

Returns:
log_ratio: Tensor

The estimated log ratios.
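Because the ratio equals posterior over prior, estimated log ratios can be used to reweight parameter draws from the prior. The sketch below assumes you already have one log ratio per prior draw for a fixed observation; the numbers are made-up placeholders and the helper names are hypothetical.

```python
import math


def normalized_weights(log_ratios):
    """Softmax of log ratios, computed stably by subtracting the maximum."""
    top = max(log_ratios)
    unnormalized = [math.exp(value - top) for value in log_ratios]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_mean(values, weights):
    """Self-normalized importance-sampling estimate of a posterior mean."""
    return sum(v * w for v, w in zip(values, weights))


prior_draws = [-1.0, 0.0, 1.0, 2.0]   # placeholder parameter draws from the prior
log_ratios = [-2.0, 0.1, 0.4, -1.5]   # placeholder estimator outputs
weights = normalized_weights(log_ratios)
print(weighted_mean(prior_draws, weights))
```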

logits(inference_variables: Tensor, inference_conditions: Tensor, stage: str, **kwargs) Tensor[source]#

Computes logits for K batches of variables-conditions pairs.

get_config()[source]#

Returns the config of the object.

An object config is a Python dictionary (serializable) containing the information needed to re-instantiate it.

__call__(*args, **kwargs)#

Call self as a function.

add_loss(loss)#

Can be called inside of the call() method to add a scalar loss.

Example:

class MyLayer(Layer):
    ...
    def call(self, x):
        self.add_loss(ops.sum(x))
        return x
add_metric(*args, **kwargs)#
add_variable(shape, initializer, dtype=None, trainable=True, autocast=True, regularizer=None, constraint=None, name=None)#

Add a weight variable to the layer.

Alias of add_weight().

add_weight(shape=None, initializer=None, dtype=None, trainable=True, autocast=True, regularizer=None, constraint=None, aggregation='none', overwrite_with_gradient=False, name=None)#

Add a weight variable to the layer.

Args:
shape: Shape tuple for the variable. Must be fully-defined (no None entries). Defaults to () (scalar) if unspecified.

initializer: Initializer object to use to populate the initial variable value, or string name of a built-in initializer (e.g. “random_normal”). If unspecified, defaults to “glorot_uniform” for floating-point variables and to “zeros” for all other types (e.g. int, bool).

dtype: Dtype of the variable to create, e.g. “float32”. If unspecified, defaults to the layer’s variable dtype (which itself defaults to “float32” if unspecified).

trainable: Boolean, whether the variable should be trainable via backprop or whether its updates are managed manually. Defaults to True.

autocast: Boolean, whether to autocast the layer’s variables when accessing them. Defaults to True.

regularizer: Regularizer object to call to apply a penalty on the weight. These penalties are summed into the loss function during optimization. Defaults to None.

constraint: Constraint object to call on the variable after any optimizer update, or string name of a built-in constraint. Defaults to None.

aggregation: Optional string, one of None, “none”, “mean”, “sum” or “only_first_replica”. Annotates the variable with the type of multi-replica aggregation to be used for this variable when writing custom data parallel training loops. Defaults to “none”.

overwrite_with_gradient: Boolean, whether to overwrite the variable with the computed gradient. This is useful for float8 training. Defaults to False.

name: String name of the variable. Useful for debugging purposes.

classmethod build_adapter(inference_variables: str | Sequence[str], inference_conditions: str | Sequence[str] = None, summary_variables: str | Sequence[str] = None, sample_weight: str = None, summary_attention_mask: str = None, summary_mask: str = None, inference_attention_mask: str = None, inference_mask: str = None) Adapter#

Create a default Adapter for the approximator.

Handles the common pipeline shared by all approximators: to_array -> convert_dtype -> concatenate -> keep. Subclasses can call super().build_adapter(...) and apply additional steps to the returned adapter.

Parameters:
inference_variables: str or Sequence[str]

Names of the inference variables in the data dict.

inference_conditions: str or Sequence[str], optional

Names of the inference conditions in the data dict.

summary_variables: str or Sequence[str], optional

Names of the summary variables in the data dict.

sample_weight: str, optional

Name of the sample weight variable.

summary_attention_mask: str, optional

Name of the attention mask for the summary network. Forwarded as attention_mask to the summary network.

summary_mask: str, optional

Name of the padding/key mask for the summary network. Forwarded as mask to the summary network.

inference_attention_mask: str, optional

Name of the attention mask for the inference network. Forwarded as attention_mask to the inference network.

inference_mask: str, optional

Name of the padding/key mask for the inference network. Forwarded as mask to the inference network.

build_dataset(*, batch_size: int = 'auto', num_batches: int, adapter: Adapter = 'auto', memory_budget: str | int = 'auto', simulator: Simulator, workers: int = 'auto', use_multiprocessing: bool = False, max_queue_size: int = 32, **kwargs) OnlineDataset#
build_from_config(config)#

Builds the layer’s states with the supplied config dict.

By default, this method calls the build(config[“input_shape”]) method, which creates weights based on the layer’s input shape in the supplied config. If your config contains other information needed to load the layer’s state, you should override this method.

Args:

config: Dict containing the input shape associated with this layer.

build_from_data(adapted_data: Mapping[str, Any])#

Build the approximator from adapted data by extracting shapes.

call(*args, **kwargs)#
compile(*args, inference_metrics: Any = None, summary_metrics: Any = None, **kwargs)#

Compile the approximator, setting metrics on inference and summary networks if provided.

Parameters:
inference_metrics: keras.Metric or Sequence[keras.Metric], optional

Metric(s) to set on the inference_network.

summary_metrics: keras.Metric or Sequence[keras.Metric], optional

Metric(s) to set on the summary_network (if present).

*args, **kwargs

Additional arguments passed to the parent compile method.

compile_from_config(config)#

Compile the approximator from a saved configuration.

property compute_dtype#

The dtype of the computations performed by the layer.

compute_loss(x=None, y=None, y_pred=None, sample_weight=None, training=True)#

Compute the total loss, validate it, and return it.

Subclasses can optionally override this method to provide custom loss computation logic.

Example:

from keras import Model, layers, metrics, ops
from keras.optimizers import SGD

class MyModel(Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_tracker = metrics.Mean(name='loss')

    def compute_loss(self, x, y, y_pred, sample_weight, training=True):
        loss = ops.mean((y_pred - y) ** 2)
        loss += ops.sum(self.losses)
        self.loss_tracker.update_state(loss)
        return loss

    def reset_metrics(self):
        self.loss_tracker.reset_state()

    @property
    def metrics(self):
        return [self.loss_tracker]

inputs = layers.Input(shape=(10,), name='my_input')
outputs = layers.Dense(10)(inputs)
model = MyModel(inputs, outputs)
model.add_loss(ops.sum(outputs))

optimizer = SGD()
model.compile(optimizer, loss='mse', steps_per_execution=10)
dataset = ...
model.fit(dataset, epochs=2, steps_per_epoch=10)
print(f"Custom loss: {model.loss_tracker.result()}")
Args:

x: Input data.

y: Target data.

y_pred: Predictions returned by the model (output of model(x)).

sample_weight: Sample weights for weighting the loss function.

training: Whether we are training or evaluating the model.

Returns:

The total loss as a scalar tensor, or None if no loss results (which is the case when called by Model.test_step).

compute_loss_and_updates(trainable_variables, non_trainable_variables, metrics_variables, x, y, sample_weight, training=False, optimizer_variables=None)#

This method is stateless and is intended for use with jax.grad.

compute_mask(inputs, previous_mask)#
compute_output_shape(*args, **kwargs)#
compute_output_spec(*args, **kwargs)#
count_params()#

Count the total number of scalars composing the weights.

Returns:

An integer count.

property dtype#

Alias of layer.variable_dtype.

property dtype_policy#
evaluate(x=None, y=None, batch_size=None, verbose='auto', sample_weight=None, steps=None, callbacks=None, return_dict=False, **kwargs)#

Returns the loss value & metrics values for the model in test mode.

Computation is done in batches (see the batch_size arg.)

Args:
x: Input data. It can be:
  • A NumPy array (or array-like), or a list of arrays (in case the model has multiple inputs).
  • A backend-native tensor, or a list of tensors (in case the model has multiple inputs).
  • A dict mapping input names to the corresponding array/tensors, if the model has named inputs.
  • A keras.utils.PyDataset returning (inputs, targets) or (inputs, targets, sample_weights).
  • A tf.data.Dataset yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A torch.utils.data.DataLoader yielding (inputs, targets) or (inputs, targets, sample_weights).
  • A Python generator function yielding (inputs, targets) or (inputs, targets, sample_weights).

y: Target data. Like the input data x, it can be either NumPy array(s) or backend-native tensor(s). If x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or a Python generator function, y should not be specified since targets will be obtained from x.

batch_size: Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

verbose: “auto”, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. “auto” becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to “auto”.

sample_weight: Optional NumPy array or tensor of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) NumPy array or tensor with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D NumPy array or tensor with shape (samples, sequence_length) to apply a different weight to every timestep of every sample. This argument is not supported when x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function. Instead, provide sample_weights as the third element of x. Note that sample weighting does not apply to metrics specified via the metrics argument in compile(). To apply sample weighting to your metrics, you can specify them via the weighted_metrics in compile() instead.

steps: Integer or None. Total number of steps (batches of samples) to draw before declaring the evaluation round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely.

callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply during evaluation.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics).

Note: When using compiled metrics, evaluate() may return multiple submetric values, while model.metrics_names often lists only top-level names (e.g., ‘loss’, ‘compile_metrics’), leading to a length mismatch. The order of the evaluate() output corresponds to the order of metrics specified during model.compile(). You can use this order to map the evaluate() results to the intended metric. model.metrics_names itself will still return only the top-level names.

export(filepath, format='tf_saved_model', verbose=None, input_signature=None, **kwargs)#

Export the model as an artifact for inference.

Args:
filepath: str or pathlib.Path object. The path to save the artifact.

format: str. The export format. Supported values: “tf_saved_model”, “onnx”, “openvino”, and “litert”. Defaults to “tf_saved_model”.

verbose: bool. Whether to print a message during export. Defaults to None, which uses the default value set by different backends and formats.

input_signature: Optional. Specifies the shape and dtype of the model inputs. Can be a structure of keras.InputSpec, tf.TensorSpec, backend.KerasTensor, or backend tensor. If not provided, it will be automatically computed. Defaults to None.

**kwargs: Additional keyword arguments.
  • is_static: Optional bool. Specific to the JAX backend and format=“tf_saved_model”. Indicates whether fn is static. Set to False if fn involves state updates (e.g., RNG seeds and counters).

  • jax2tf_kwargs: Optional dict. Specific to the JAX backend and format=“tf_saved_model”. Arguments for jax2tf.convert. See the documentation for jax2tf.convert. If native_serialization and polymorphic_shapes are not provided, they will be automatically computed.

  • opset_version: Optional int. Specific to format=“onnx”. An integer value that specifies the ONNX opset version.

  • LiteRT-specific options: Optional keyword arguments specific to format=“litert”. These are passed directly to the TensorFlow Lite converter and include options like optimizations, representative_dataset, experimental_new_quantizer, allow_custom_ops, enable_select_tf_ops, etc. See TensorFlow Lite documentation for all available options.

Note: This feature is currently supported only with TensorFlow, JAX and Torch backends.

Note: Be aware that the exported artifact may contain information from the local file system when using format=”onnx”, verbose=True and Torch backend.

Examples:

Here’s how to export a TensorFlow SavedModel for inference.

# Export the model as a TensorFlow SavedModel artifact
model.export("path/to/location", format="tf_saved_model")

# Load the artifact in a different process/environment
reloaded_artifact = tf.saved_model.load("path/to/location")
predictions = reloaded_artifact.serve(input_data)

Here’s how to export an ONNX model for inference.

# Export the model as an ONNX artifact
model.export("path/to/location", format="onnx")

# Load the artifact in a different process/environment
ort_session = onnxruntime.InferenceSession("path/to/location")
ort_inputs = {
    k.name: v for k, v in zip(ort_session.get_inputs(), input_data)
}
predictions = ort_session.run(None, ort_inputs)

Here’s how to export a LiteRT (TFLite) model for inference.

# Export the model as a LiteRT artifact
model.export("path/to/location", format="litert")

# Load the artifact in a different process/environment
interpreter = tf.lite.Interpreter(model_path="path/to/location")
interpreter.allocate_tensors()
interpreter.set_tensor(
    interpreter.get_input_details()[0]['index'], input_data
)
interpreter.invoke()
output_data = interpreter.get_tensor(
    interpreter.get_output_details()[0]['index']
)
classmethod from_config(config, custom_objects=None)#

Deserialize and instantiate an approximator from configuration.

get_build_config()#

Returns a dictionary with the layer’s input shape.

This method returns a config dict that can be used by build_from_config(config) to create all states (e.g. Variables and Lookup tables) needed by the layer.

By default, the config only contains the input shape that the layer was built with. If you’re writing a custom layer that creates state in an unusual way, you should override this method to make sure this state is already created when Keras attempts to load its value upon model loading.

Returns:

A dict containing the input shape associated with the layer.

get_compile_config()#

Serialize compile configuration for all network metrics.

Collects metrics from inference_network and summary_network (if present), serializes them, and merges with parent class config.

Returns:
dict

Configuration dictionary with serialized metrics.

get_layer(name=None, index=None)#

Retrieves a layer based on either its name (unique) or index.

If name and index are both provided, index will take precedence. Indices are based on order of horizontal graph traversal (bottom-up).

Args:

name: String, name of layer.

index: Integer, index of layer.

Returns:

A layer instance.

get_metrics_result()#

Returns the model’s metrics values as a dict.

If any metric result is a dict (containing multiple metrics), each of its entries is added to the top-level dict returned by this method.

Returns:

A dict containing values of the metrics listed in self.metrics. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.
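The flattening behaviour described above can be sketched in a few lines of plain Python. This illustrates the described rule only and is not the Keras implementation; the function name and the example metric names are hypothetical.

```python
def flatten_metric_results(results):
    """Lift the entries of nested metric dicts to the top level."""
    flat = {}
    for name, value in results.items():
        if isinstance(value, dict):
            flat.update(value)  # nested metrics replace their container entry
        else:
            flat[name] = value
    return flat


nested = {"loss": 0.2, "compile_metrics": {"accuracy": 0.7, "mae": 0.1}}
print(flatten_metric_results(nested))
```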

get_quantization_layer_structure(mode=None)#

Returns the quantization structure for the model.

This method is intended to be overridden by model authors to provide topology information required for structure-aware quantization modes like ‘gptq’.

Args:

mode: The quantization mode.

Returns:

A dictionary describing the topology, e.g.: {‘pre_block_layers’: [list], ‘sequential_blocks’: [list]} or None if the mode does not require structure or is not supported. ‘pre_block_layers’ is a list of layers that the inputs should be passed through, before being passed to the sequential blocks. For example, inputs to an LLM must first be passed through an embedding layer, followed by the transformer.

get_state_tree(value_format='backend_tensor')#

Retrieves tree-like structure of model variables.

This method allows retrieval of different model variables (trainable, non-trainable, optimizer, and metrics). The variables are returned in a nested dictionary format, where the keys correspond to the variable names and the values are the nested representations of the variables.

Args:

value_format: One of “backend_tensor”, “numpy_array”. The kind of array to return as the leaves of the nested state tree.

Returns:

dict: A dictionary containing the nested representations of the requested variables. The keys are the variable names, and the values are the corresponding nested dictionaries.

Example:

model = keras.Sequential([
    keras.Input(shape=(1,), name="my_input"),
    keras.layers.Dense(1, activation="sigmoid", name="my_dense"),
], name="my_sequential")
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(np.array([[1.0]]), np.array([[1.0]]))
state_tree = model.get_state_tree()

The state_tree dictionary returned looks like:

{
    'metrics_variables': {
        'loss': {
            'count': ...,
            'total': ...,
        },
        'mean_absolute_error': {
            'count': ...,
            'total': ...,
        }
    },
    'trainable_variables': {
        'my_sequential': {
            'my_dense': {
                'bias': ...,
                'kernel': ...,
            }
        }
    },
    'non_trainable_variables': {},
    'optimizer_variables': {
        'adam': {
            'iteration': ...,
            'learning_rate': ...,
            'my_sequential_my_dense_bias_momentum': ...,
            'my_sequential_my_dense_bias_velocity': ...,
            'my_sequential_my_dense_kernel_momentum': ...,
            'my_sequential_my_dense_kernel_velocity': ...,
        }
    }
}
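
The nesting shown in the state tree can be pictured as grouping variables by their path components. The sketch below assumes slash-separated variable paths such as "my_sequential/my_dense/kernel"; that separator, the function name, and the placeholder values are assumptions of this illustration, not a description of how Keras builds the tree internally.

```python
def nest_by_path(flat, separator="/"):
    """Turn {"a/b/c": value} into {"a": {"b": {"c": value}}}."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(separator)
        node = tree
        for key in parents:
            node = node.setdefault(key, {})  # descend, creating levels as needed
        node[leaf] = value
    return tree


flat_variables = {
    "my_sequential/my_dense/kernel": "kernel-value",
    "my_sequential/my_dense/bias": "bias-value",
}
print(nest_by_path(flat_variables))
```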
get_weights()#

Return the values of layer.weights as a list of NumPy arrays.

property input#

Retrieves the input tensor(s) of a symbolic operation.

Only returns the tensor(s) corresponding to the first time the operation was called.

Returns:

Input tensor or list of input tensors.

property input_dtype#

The dtype layer inputs should be converted to.

property input_spec#
jax_state_sync()#
property jit_compile#
property layers#
load_own_variables(store)#

Loads the state of the layer.

You can override this method to take full control of how the state of the layer is loaded upon calling keras.models.load_model().

Args:

store: Dict from which the state of the model will be loaded.

load_weights(filepath, skip_mismatch=False, **kwargs)#

Load the weights from a single file or sharded files.

Weights are loaded based on the network’s topology. This means the architecture should be the same as when the weights were saved. Note that layers that don’t have weights are not taken into account in the topological ordering, so adding or removing layers is fine as long as they don’t have weights.

Partial weight loading

If you have modified your model, for instance by adding a new layer (with weights) or by changing the shape of the weights of a layer, you can choose to ignore errors and continue loading by setting skip_mismatch=True. In this case any layer with mismatching weights will be skipped. A warning will be displayed for each skipped layer.

Sharding

When loading sharded weights, it is important to specify filepath that ends with *.weights.json which is used as the configuration file. Additionally, the sharded files *_xxxxx.weights.h5 must be in the same directory as the configuration file.

Args:
filepath: str or pathlib.Path object. Path from which the weights will be loaded. When sharding, the filepath must end in .weights.json.

skip_mismatch: Boolean, whether to skip loading of layers where there is a mismatch in the number of weights, or a mismatch in the shape of the weights.

Example:

# Load the weights in a single file.
model.load_weights("model.weights.h5")

# Load the weights in sharded files.
model.load_weights("model.weights.json")
property losses#

List of scalar losses from add_loss, regularizers and sublayers.

make_predict_function(force=False)#
make_test_function(force=False)#
make_train_function(force=False)#
property metrics#

List of all metrics.

property metrics_names#
property metrics_variables#

List of all metric variables.

property non_trainable_variables#

List of all non-trainable layer state.

This extends layer.non_trainable_weights to include all state used by the layer including state for metrics and `SeedGenerator`s.

property non_trainable_weights#

List of all non-trainable weight variables of the layer.

These are the weights that should not be updated by the optimizer during training. Unlike layer.non_trainable_variables, this excludes metric state and random seeds.

property output#

Retrieves the output tensor(s) of a layer.

Only returns the tensor(s) corresponding to the first time the operation was called.

Returns:

Output tensor or list of output tensors.

property path#

The path of the layer.

If the layer has not been built yet, it will be None.

predict(x, batch_size=None, verbose='auto', steps=None, callbacks=None)#

Generates output predictions for the input samples.

Computation is done in batches. This method is designed for batch processing of large numbers of inputs. It is not intended for use inside of loops that iterate over your data and process small numbers of inputs at a time.

For small numbers of inputs that fit in one batch, directly use __call__() for faster execution, e.g., model(x), or model(x, training=False) if you have layers such as BatchNormalization that behave differently during inference.

Note: See the FAQ entry at https://keras.io/getting_started/faq/#whats-the-difference-between-model-methods-predict-and-call for more details about the difference between the Model methods predict() and __call__().

Args:
x: Input data. It can be:
  • A NumPy array (or array-like), or a list of arrays (in case the model has multiple inputs).
  • A backend-native tensor, or a list of tensors (in case the model has multiple inputs).
  • A dict mapping input names to the corresponding array/tensors, if the model has named inputs.
  • A keras.utils.PyDataset.
  • A tf.data.Dataset.
  • A torch.utils.data.DataLoader.
  • A Python generator function.

batch_size: Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a keras.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches.

verbose: “auto”, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. “auto” becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to “auto”.

steps: Total number of steps (batches of samples) to draw before declaring the prediction round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely.

callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply during prediction.

Returns:

NumPy array(s) of predictions.

predict_on_batch(x)#

Returns predictions for a single batch of samples.

Args:

x: Input data. It must be array-like.

Returns:

NumPy array(s) of predictions.

predict_step(state, data)#
property quantization_mode#

The quantization mode of this layer, None if not quantized.

quantize(mode=None, config=None, filters=None, **kwargs)#

Quantize the weights of the model.

Note that the model must be built first before calling this method. quantize will recursively call quantize(…) in all layers and will be skipped if the layer doesn’t implement the function.

This method can be called by passing a mode string, which uses the default configuration for that mode. Alternatively, a config object can be passed to customize the behavior of the quantization (e.g. to use specific quantizers for weights or activations).

Args:
mode: The mode of the quantization. Supported modes are: “int8”, “int4”, “float8”, “gptq”. This is optional if config is provided.

config: The configuration object specifying additional quantization options. This argument allows configuring the weight and activation quantizers. Must be an instance of keras.quantizers.QuantizationConfig.

filters: Optional filters to apply to the quantization. Can be a regex string, a list of regex strings, or a callable. Only the layers which match the filter conditions will be quantized.

**kwargs: Additional keyword arguments.
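
The three accepted forms of filters (regex string, list of regex strings, callable) can be pictured with the following stand-alone sketch. It is not the Keras implementation: matching against the layer name, using re.search rather than another matching function, and passing the name to the callable are all assumptions of this illustration, and `layer_matches` is a hypothetical helper.

```python
import re


def layer_matches(layer_name, filters=None):
    """Decide whether a layer would be selected by the given filters."""
    if filters is None:
        return True  # no filter means every layer is eligible
    if callable(filters):
        return bool(filters(layer_name))
    patterns = [filters] if isinstance(filters, str) else list(filters)
    return any(re.search(pattern, layer_name) for pattern in patterns)


names = ["dense_1", "dense_2", "embedding", "output_head"]
print([n for n in names if layer_matches(n, r"^dense_\d+$")])
```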

Example:

Quantize a model to int8 with default configuration:

# Build the model
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(10),
])
model.build((None, 10))

# Quantize with default int8 config
model.quantize("int8")

Quantize a model to int8 with a custom configuration:

from keras.quantizers import Int8QuantizationConfig
from keras.quantizers import AbsMaxQuantizer

# Build the model
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(10),
])
model.build((None, 10))

# Create a custom config
config = Int8QuantizationConfig(
    weight_quantizer=AbsMaxQuantizer(
        axis=0,
        value_range=(-127, 127)
    ),
    activation_quantizer=AbsMaxQuantizer(
        axis=-1,
        value_range=(-127, 127)
    ),
)

# Quantize with custom config
model.quantize(config=config)
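
For intuition about what a value range of (-127, 127) means, the following is a generic per-tensor abs-max quantization sketch in plain Python. It is a simplified illustration of the general technique and makes no claim about how keras.quantizers.AbsMaxQuantizer is implemented (for example, it ignores the axis argument); both helper names are hypothetical.

```python
def abs_max_quantize(values, value_range=(-127, 127)):
    """Scale values so the largest magnitude maps to the edge of the range."""
    low, high = value_range
    largest = max(abs(v) for v in values)
    scale = high / largest if largest > 0 else 1.0
    # Round to the nearest integer and clip to the allowed range.
    quantized = [min(high, max(low, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate floating-point values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
quantized, scale = abs_max_quantize(weights)
print(quantized, dequantize(quantized, scale))
```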
quantized_build(input_shape, mode)#
quantized_call(*args, **kwargs)#
rematerialized_call(layer_call, *args, **kwargs)#

Enable rematerialization dynamically for layer’s call method.

Args:

layer_call: The original call method of a layer.

Returns:

Rematerialized layer’s call method.

reset_metrics()#
property run_eagerly#
save(filepath, overwrite=True, zipped=None, **kwargs)#

Saves a model as a .keras file.

Note that model.save() is an alias for keras.saving.save_model().

The saved .keras file contains:

  • The model’s configuration (architecture)

  • The model’s weights

  • The model’s optimizer’s state (if any)

Thus models can be reinstantiated in the exact same state.

Args:
filepath: str or pathlib.Path object. The path where to save the model. Must end in .keras (unless saving the model as an unzipped directory via zipped=False).

overwrite: Whether we should overwrite any existing model at the target location, or instead ask the user via an interactive prompt.

zipped: Whether to save the model as a zipped .keras archive (default when saving locally), or as an unzipped directory (default when saving on the Hugging Face Hub).

Example:

import numpy as np
import keras

model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        keras.layers.Dense(5),
        keras.layers.Softmax(),
    ],
)
model.save("model.keras")
loaded_model = keras.saving.load_model("model.keras")
x = keras.random.uniform((10, 3))
assert np.allclose(model.predict(x), loaded_model.predict(x))
save_own_variables(store)#

Saves the state of the layer.

You can override this method to take full control of how the state of the layer is saved upon calling model.save().

Args:

store: Dict where the state of the model will be saved.
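As a rough illustration of this override contract, the sketch below uses a plain Python class in place of a real layer, so it runs without Keras. RunningMean, its attributes, and the key names are hypothetical; only the method names save_own_variables() and its loading counterpart load_own_variables() follow the Keras layer API. In a real layer, store would be supplied by the saving machinery rather than created by hand.

```python
class RunningMean:
    """Plain-Python stand-in for a layer with custom state (not a keras.Layer)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    def save_own_variables(self, store):
        # Write each piece of state under an explicit, stable key.
        store["total"] = self.total
        store["count"] = self.count

    def load_own_variables(self, store):
        # Read back exactly the keys written by save_own_variables().
        self.total = store["total"]
        self.count = store["count"]


original = RunningMean()
for value in (1.0, 2.0, 3.0):
    original.update(value)

store = {}  # stands in for the dict-like store passed in during saving
original.save_own_variables(store)

restored = RunningMean()
restored.load_own_variables(store)
```

Keeping the two methods symmetric (same keys, same order of concerns) is the main design point: whatever one writes, the other must be able to read.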

save_weights(filepath, overwrite=True, max_shard_size=None)#

Saves all weights to a single file or sharded files.

By default, the weights will be saved in a single .weights.h5 file. If sharding is enabled (max_shard_size is not None), the weights will be saved in multiple files, each with a size at most max_shard_size (in GB). Additionally, a configuration file .weights.json will contain the metadata for the sharded files.

The saved sharded files contain:

  • *.weights.json: The configuration file containing ‘metadata’ and ‘weight_map’.

  • *_xxxxxx.weights.h5: The sharded files containing only the weights.

Args:
filepath: str or pathlib.Path object. Path where the weights will be saved. When sharding, the filepath must end in .weights.json. If .weights.h5 is provided, it will be overridden.

overwrite: Whether to overwrite any existing weights at the target location or instead ask the user via an interactive prompt.

max_shard_size: int or float. Maximum size in GB for each sharded file. If None, no sharding will be done. Defaults to None.

Example:

import keras
import numpy as np

# Instantiate an EfficientNetV2L model with about 454MB of weights.
model = keras.applications.EfficientNetV2L(weights=None)

# Save the weights in a single file.
model.save_weights("model.weights.h5")

# Save the weights in sharded files. Setting `max_shard_size=0.25` means
# each sharded file will be at most ~250MB.
model.save_weights("model.weights.json", max_shard_size=0.25)

# Load the weights in a new model with the same architecture.
loaded_model = keras.applications.EfficientNetV2L(weights=None)
loaded_model.load_weights("model.weights.h5")
x = keras.random.uniform((1, 480, 480, 3))
assert np.allclose(model.predict(x), loaded_model.predict(x))

# Load the sharded weights in a new model with the same architecture.
loaded_model = keras.applications.EfficientNetV2L(weights=None)
loaded_model.load_weights("model.weights.json")
x = keras.random.uniform((1, 480, 480, 3))
assert np.allclose(model.predict(x), loaded_model.predict(x))
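To make the effect of max_shard_size more concrete, the sketch below greedily groups weights into shards that stay under a size limit and records which file holds each weight, in the spirit of the ‘weight_map’ entry described above. It is a dependency-free illustration and not Keras's actual sharding code: plan_shards, the weight paths, the byte sizes, the binary interpretation of "GB", and the shard file naming are all assumptions made for this example.

```python
def plan_shards(weight_sizes, max_shard_size_gb, prefix="model"):
    """Greedily assign weights to shards of at most `max_shard_size_gb` GB.

    `weight_sizes` maps a weight path to its size in bytes (hypothetical input).
    Returns a dict mapping each weight path to a shard file name.
    """
    limit = max_shard_size_gb * 1024**3  # GB -> bytes (assumed binary GB)
    weight_map = {}
    shard_index, used = 0, 0
    for path, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if used > 0 and used + size > limit:
            shard_index += 1
            used = 0
        weight_map[path] = f"{prefix}_{shard_index:06d}.weights.h5"
        used += size
    return weight_map


MB = 1024**2
sizes = {
    "dense/kernel": 200 * MB,
    "dense/bias": 1 * MB,
    "dense_1/kernel": 100 * MB,
    "dense_1/bias": 1 * MB,
}
weight_map = plan_shards(sizes, max_shard_size_gb=0.25)
```

With a 0.25 GB limit, the first two weights fit into one shard and the remaining two spill into a second one. A single weight larger than the limit would still get a shard of its own in this sketch.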
set_state_tree(state_tree)#

Assigns values to variables of the model.

This method takes a dictionary of nested variable values, which represents the state tree of the model, and assigns them to the corresponding variables of the model. The dictionary keys represent the variable names (e.g., ‘trainable_variables’, ‘optimizer_variables’), and the values are nested dictionaries containing the variable paths and their corresponding values.

Args:
state_tree: A dictionary representing the state tree of the model. The keys are the variable names, and the values are nested dictionaries representing the variable paths and their values.
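The nested layout is easier to see in a small example. The sketch below builds a hypothetical state tree as plain Python data and flattens it into (path, value) pairs. The layer names, the values, and the flatten_state_tree helper are illustrative assumptions rather than part of the API, and a real tree would hold arrays instead of lists and numbers.

```python
def flatten_state_tree(tree, prefix=""):
    """Yield (path, value) pairs from a nested dict, joining keys with '/'."""
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_state_tree(value, path)
        else:
            yield path, value


# Hypothetical state tree: top-level keys name the variable groups,
# nested dicts follow the variable paths down to the values.
state_tree = {
    "trainable_variables": {
        "dense": {"kernel": [[0.1, 0.2]], "bias": [0.0, 0.0]},
    },
    "non_trainable_variables": {},
    "optimizer_variables": {"adam": {"iteration": 0}},
    "metrics_variables": {"loss": {"total": 0.0, "count": 0.0}},
}
flat = dict(flatten_state_tree(state_tree))
```

In practice such a tree would typically come from model.get_state_tree() and be passed back unchanged, or with selected values replaced, to set_state_tree().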

set_weights(weights)#

Sets the values of layer.weights from a list of NumPy arrays.

property standardize_layers#

Shortcut to the standardizer’s per-variable layers.

stateless_call(trainable_variables, non_trainable_variables, *args, return_losses=False, **kwargs)#

Call the layer without any side effects.

Args:

trainable_variables: List of trainable variables of the model.

non_trainable_variables: List of non-trainable variables of the model.

*args: Positional arguments to be passed to call().

return_losses: If True, stateless_call() will return the list of losses created during call() as part of its return values.

**kwargs: Keyword arguments to be passed to call().

Returns:
A tuple. By default, returns (outputs, non_trainable_variables). If return_losses=True, then returns (outputs, non_trainable_variables, losses).

Note: non_trainable_variables include not only non-trainable weights such as BatchNormalization statistics, but also RNG seed state (if there are any random operations part of the layer, such as dropout), and Metric state (if there are any metrics attached to the layer). These are all elements of state of the layer.

Example:

model = ...
data = ...
trainable_variables = model.trainable_variables
non_trainable_variables = model.non_trainable_variables
# Call the model with zero side effects
outputs, non_trainable_variables = model.stateless_call(
    trainable_variables,
    non_trainable_variables,
    data,
)
# Attach the updated state to the model
# (until you do this, the model is still in its pre-call state).
for ref_var, value in zip(
    model.non_trainable_variables, non_trainable_variables
):
    ref_var.assign(value)
stateless_compute_loss(trainable_variables, non_trainable_variables, metrics_variables, x=None, y=None, y_pred=None, sample_weight=None, training=True)#
stateless_compute_metrics(trainable_variables: Any, non_trainable_variables: Any, metrics_variables: Any, data: dict[str, Any], stage: str = 'training') → tuple[Array, tuple]#

Stateless forward pass used as the jax.value_and_grad target.

All model state is injected via keras.StatelessScope so that JAX can differentiate through the computation.

Parameters:
trainable_variables: Any

Current trainable weight values.

non_trainable_variables: Any

Current non-trainable variable values (e.g. batch-norm statistics).

metrics_variables: Any

Current metric tracking variable values.

data: dict[str, Any]

Input data dictionary passed to compute_metrics().

stage: str, default "training"

"training" or "validation".

Returns:
loss: jax.Array

Scalar loss for gradient computation.

aux: tuple

(metrics_dict, updated_non_trainable_variables, updated_metrics_variables).

stateless_test_step(state: tuple, data: dict[str, Any]) → tuple[dict[str, Array], tuple]#

Stateless validation step.

Parameters:
state: tuple

(trainable_variables, non_trainable_variables, metrics_variables).

data: dict[str, Any]

Input data for validation.

Returns:
metrics: dict[str, jax.Array]

Computed evaluation metrics.

state: tuple

Updated state tuple.

stateless_train_step(state: tuple, data: dict[str, Any]) → tuple[dict[str, Array], tuple]#

Stateless training step with jax.value_and_grad.

Computes gradients via jax.value_and_grad on stateless_compute_metrics() and applies the optimizer update statelessly.

Parameters:
state: tuple

(trainable_variables, non_trainable_variables, optimizer_variables, metrics_variables).

data: dict[str, Any]

Input data for training.

Returns:
metrics: dict[str, jax.Array]

Computed training metrics.

state: tuple

Updated state tuple.
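The state-threading pattern behind this method can be sketched without any framework. The example below is a dependency-free illustration under stated assumptions, not the library's implementation: the model is a single scalar weight, the gradient is written by hand where the real method relies on jax.value_and_grad, the optimizer is plain SGD, and the names loss_and_aux and train_step are hypothetical. What it preserves is the shape of the computation: a loss function returning (loss, aux), and a pure step function mapping a four-part state tuple to (metrics, updated state).

```python
def loss_and_aux(trainable, non_trainable, metrics_vars, data):
    """Stand-in for a value-and-grad call: returns ((loss, aux), grads)."""
    (w,) = trainable
    errors = [w * x - y for x, y in zip(data["x"], data["y"])]
    loss = sum(e * e for e in errors) / len(errors)
    # Hand-written d(loss)/dw for the mean squared error above.
    grad = sum(2 * e * x for e, x in zip(errors, data["x"])) / len(errors)
    (steps,) = metrics_vars
    aux = ({"loss": loss}, non_trainable, (steps + 1,))
    return (loss, aux), (grad,)


def train_step(state, data, learning_rate=0.1):
    """Pure function: takes the full state tuple, returns (metrics, new state)."""
    trainable, non_trainable, optimizer_vars, metrics_vars = state
    (loss, aux), grads = loss_and_aux(trainable, non_trainable, metrics_vars, data)
    metrics, non_trainable, metrics_vars = aux
    # Plain SGD update; the optimizer state is only an iteration counter here.
    trainable = tuple(w - learning_rate * g for w, g in zip(trainable, grads))
    (iteration,) = optimizer_vars
    optimizer_vars = (iteration + 1,)
    return metrics, (trainable, non_trainable, optimizer_vars, metrics_vars)


data = {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}  # y = 2 * x
state = ((0.0,), (), (0,), (0,))
first_metrics, state = train_step(state, data)
for _ in range(50):
    metrics, state = train_step(state, data)
```

Because nothing is mutated in place, the caller decides when the returned state replaces the old one, which is what makes such a step safe to trace and compile.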

summarize(conditions: Mapping[str, ndarray], **kwargs) → ndarray#

Computes the learned summary statistics of given summary variables.

The conditions dictionary is preprocessed using the adapter and passed through the summary network.

Parameters:
conditions: Mapping[str, np.ndarray]

Dictionary of simulated or real quantities as NumPy arrays.

**kwargs: dict

Additional keyword arguments for the adapter and the summary network.

Returns:
summaries: np.ndarray

The learned summary statistics. Returns None if no summary network is present.

summary(line_length=None, positions=None, print_fn=None, expand_nested=False, show_trainable=False, layer_range=None)#

Prints a string summary of the network.

Args:
line_length: Total length of printed lines (e.g. set this to adapt the display to different terminal window sizes).

positions: Relative or absolute positions of log elements in each line. If not provided, becomes [0.3, 0.6, 0.70, 1.]. Defaults to None.

print_fn: Print function to use. By default, prints to stdout. If stdout doesn’t work in your environment, change to print. It will be called on each line of the summary. You can set it to a custom function in order to capture the string summary.

expand_nested: Whether to expand the nested models. Defaults to False.

show_trainable: Whether to show if a layer is trainable. Defaults to False.

layer_range: a list or tuple of 2 strings, which is the starting layer name and ending layer name (both inclusive) indicating the range of layers to be printed in summary. It also accepts regex patterns instead of exact names. In this case, the start predicate will be the first element that matches layer_range[0] and the end predicate will be the last element that matches layer_range[1]. By default None considers all layers of the model.

Raises:

ValueError: if summary() is called before the model is built.
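The layer_range matching rule (first match of the start pattern, last match of the end pattern, both inclusive) can be sketched as follows. This is an illustration under assumptions, not the Keras source: layer_range_bounds and the layer names are hypothetical, and matching with re.match (anchored at the start of the name) is assumed.

```python
import re


def layer_range_bounds(layer_names, layer_range):
    """Return (start, end) indices, both inclusive, selected by `layer_range`."""
    start_pattern, end_pattern = layer_range
    starts = [i for i, n in enumerate(layer_names) if re.match(start_pattern, n)]
    ends = [i for i, n in enumerate(layer_names) if re.match(end_pattern, n)]
    if not starts or not ends:
        raise ValueError(f"No layer matches {layer_range!r}.")
    # First match of the start pattern, last match of the end pattern.
    return starts[0], ends[-1]


names = ["input", "dense_1", "dense_2", "dropout", "dense_3", "output"]
bounds = layer_range_bounds(names, ["dense_.*", "dense_.*"])
selected = names[bounds[0] : bounds[1] + 1]
```

Here the same pattern is used for both ends, so the selection runs from the first dense layer to the last one and includes the dropout layer in between.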

property supports_masking#

Whether this layer supports computing a mask using compute_mask.

symbolic_call(*args, **kwargs)#
test_on_batch(x, y=None, sample_weight=None, return_dict=False)#

Test the model on a single batch of samples.

Args:

x: Input data. Must be array-like.

y: Target data. Must be array-like.

sample_weight: Optional array of the same length as x, containing weights to apply to the model’s loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

A scalar loss value (when no metrics and return_dict=False), a list of loss and metric values (if there are metrics and return_dict=False), or a dict of metric and loss values (if return_dict=True).

test_step(*args, **kwargs)#

Alias for stateless_test_step() (required by keras.Model.fit()).

to_json(**kwargs)#

Returns a JSON string containing the network configuration.

To load a network from a JSON save file, use keras.models.model_from_json(json_string, custom_objects={…}).

Args:
**kwargs: Additional keyword arguments to be passed to json.dumps().

Returns:

A JSON string.

train_on_batch(x, y=None, sample_weight=None, class_weight=None, return_dict=False)#

Runs a single gradient update on a single batch of data.

Args:

x: Input data. Must be array-like.

y: Target data. Must be array-like.

sample_weight: Optional array of the same length as x, containing weights to apply to the model’s loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample.

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) to apply to the model’s loss for the samples from this class during training. This can be useful to tell the model to “pay more attention” to samples from an under-represented class. When class_weight is specified and targets have a rank of 2 or greater, either y must be one-hot encoded, or an explicit final dimension of 1 must be included for sparse class labels.

return_dict: If True, loss and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list.

Returns:

A scalar loss value (when no metrics and return_dict=False), a list of loss and metric values (if there are metrics and return_dict=False), or a dict of metric and loss values (if return_dict=True).
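One way to read class_weight is as a shorthand for per-sample weights: every sample receives the weight assigned to its class. The sketch below shows that mapping for sparse integer labels. The helper per_sample_weights and the data are hypothetical, and treating classes that are missing from the dictionary as weight 1.0 is an assumption of this example.

```python
def per_sample_weights(labels, class_weight):
    """Map each integer class label to the weight of its class."""
    # Classes not listed in `class_weight` default to 1.0 (assumption).
    return [class_weight.get(int(label), 1.0) for label in labels]


labels = [0, 0, 0, 1]  # class 1 is under-represented
class_weight = {0: 1.0, 1: 3.0}
weights = per_sample_weights(labels, class_weight)
```

With these weights, the single class-1 sample contributes as much to a weighted loss as the three class-0 samples combined.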

train_step(*args, **kwargs)#

Alias for stateless_train_step() (required by keras.Model.fit()).

property trainable#

Settable boolean, whether this layer should be trainable or not.

property trainable_variables#

List of all trainable layer state.

This is equivalent to layer.trainable_weights.

property trainable_weights#

List of all trainable weight variables of the layer.

These are the weights that get updated by the optimizer during training.

property variable_dtype#

The dtype of the state (weights) of the layer.

property variables#

List of all layer state, including random seeds.

This extends layer.weights to include all state used by the layer including `SeedGenerator`s.

Note that metrics variables are not included here, use metrics_variables to visit all the metric variables.

property weights#

List of all weight variables of the layer.

Unlike layer.variables, this excludes metric state and random seeds.