diff --git a/.gitignore b/.gitignore
index d646eb5568..1670e78af3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -16,7 +16,8 @@ templates/examples/graph/*
templates/**/guides/**/*.md
templates/keras_hub/getting_started.md
templates/keras_tuner/getting_started.md
+templates/keras_rs/examples/*
datasets/*
.history
.vscode/*
-.idea/*
+.idea/*
\ No newline at end of file
diff --git a/scripts/autogen.py b/scripts/autogen.py
index a8e7d1dd0d..18f689f2b6 100644
--- a/scripts/autogen.py
+++ b/scripts/autogen.py
@@ -32,7 +32,7 @@
GUIDES_GH_LOCATION = Path("keras-team") / "keras-io" / "blob" / "master" / "guides"
KERAS_TEAM_GH = "https://github.com/keras-team"
PROJECT_URL = {
- "keras": f"{KERAS_TEAM_GH}/keras/tree/v3.11.3/",
+ "keras": f"{KERAS_TEAM_GH}/keras/tree/v3.12.0/",
"keras_tuner": f"{KERAS_TEAM_GH}/keras-tuner/tree/v1.4.7/",
"keras_hub": f"{KERAS_TEAM_GH}/keras-hub/tree/v0.23.0/",
"tf_keras": f"{KERAS_TEAM_GH}/tf-keras/tree/v2.19.0/",
diff --git a/templates/examples/audio/ctc_asr.md b/templates/examples/audio/ctc_asr.md
deleted file mode 100644
index 095e102a45..0000000000
--- a/templates/examples/audio/ctc_asr.md
+++ /dev/null
@@ -1,659 +0,0 @@
-# Automatic Speech Recognition using CTC
-
-**Authors:** [Mohamed Reda Bouadjenek](https://rbouadjenek.github.io/) and [Ngoc Dung Huynh](https://www.linkedin.com/in/parkerhuynh/)
-**Date created:** 2021/09/26
-**Last modified:** 2021/09/26
-**Description:** Training a CTC-based model for automatic speech recognition.
-
-
-
-ⓘ This example uses Keras 2
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/ctc_asr.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/ctc_asr.py)
-
-
-
----
-## Introduction
-
-Speech recognition is an interdisciplinary subfield of computer science
-and computational linguistics that develops methodologies and technologies
-that enable the recognition and translation of spoken language into text
-by computers. It is also known as automatic speech recognition (ASR),
-computer speech recognition or speech to text (STT). It incorporates
-knowledge and research in the computer science, linguistics and computer
-engineering fields.
-
-This demonstration shows how to combine a 2D CNN, RNN and a Connectionist
-Temporal Classification (CTC) loss to build an ASR. CTC is an algorithm
-used to train deep neural networks in speech recognition, handwriting
-recognition and other sequence problems. CTC is used when we don’t know
-how the input aligns with the output (how the characters in the transcript
-align to the audio). The model we create is similar to
-[DeepSpeech2](https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/deepspeech2.html).
-
-We will use the LJSpeech dataset from the
-[LibriVox](https://librivox.org/) project. It consists of short
-audio clips of a single speaker reading passages from 7 non-fiction books.
-
-We will evaluate the quality of the model using
-[Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate).
-WER is obtained by adding up the substitutions, insertions, and deletions
-that occur in a sequence of recognized words, then dividing that total by
-the number of words originally spoken. To compute the WER you need to install the
-[jiwer](https://pypi.org/project/jiwer/) package, which you can do with the following command:
-
-```
-pip install jiwer
-```
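-
-As a quick sanity check, `jiwer.wer` compares a reference transcript with a
-hypothesis. A minimal sketch (the sentences are hypothetical):
-
-```python
-from jiwer import wer
-
-# One substitution ("dog" for "fox") out of four reference words: WER = 0.25
-print(wer("the quick brown fox", "the quick brown dog"))
-```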
-
-**References:**
-
-- [LJSpeech Dataset](https://keithito.com/LJ-Speech-Dataset/)
-- [Speech recognition](https://en.wikipedia.org/wiki/Speech_recognition)
-- [Sequence Modeling With CTC](https://distill.pub/2017/ctc/)
-- [DeepSpeech2](https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/deepspeech2.html)
-
----
-## Setup
-
-
-```python
-import pandas as pd
-import numpy as np
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-import matplotlib.pyplot as plt
-from IPython import display
-from jiwer import wer
-
-```
-
----
-## Load the LJSpeech Dataset
-
-Let's download the [LJSpeech Dataset](https://keithito.com/LJ-Speech-Dataset/).
-The dataset contains 13,100 audio files as `wav` files in the `/wavs/` folder.
-The label (transcript) for each audio file is a string
-given in the `metadata.csv` file. The fields are:
-
-- **ID**: this is the name of the corresponding .wav file
-- **Transcription**: words spoken by the reader (UTF-8)
-- **Normalized transcription**: transcription with numbers,
-ordinals, and monetary units expanded into full words (UTF-8).
-
-For this demo, we will use the "Normalized transcription" field.
-
-Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22,050 Hz.
-
-
-```python
-data_url = "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2"
-data_path = keras.utils.get_file("LJSpeech-1.1", data_url, untar=True)
-wavs_path = data_path + "/wavs/"
-metadata_path = data_path + "/metadata.csv"
-
-
-# Read metadata file and parse it
-metadata_df = pd.read_csv(metadata_path, sep="|", header=None, quoting=3)
-metadata_df.columns = ["file_name", "transcription", "normalized_transcription"]
-metadata_df = metadata_df[["file_name", "normalized_transcription"]]
-metadata_df = metadata_df.sample(frac=1).reset_index(drop=True)
-metadata_df.head(3)
-
-```
-
-
-|   | file_name  | normalized_transcription                          |
-|---|------------|---------------------------------------------------|
-| 0 | LJ029-0199 | On November eighteen the Dallas City Council a... |
-| 1 | LJ028-0237 | with orders to march into the town by the bed ... |
-| 2 | LJ009-0116 | On the following day the capital convicts, who... |
-
-We now split the data into training and validation set.
-
-
-```python
-split = int(len(metadata_df) * 0.90)
-df_train = metadata_df[:split]
-df_val = metadata_df[split:]
-
-print(f"Size of the training set: {len(df_train)}")
-print(f"Size of the training set: {len(df_val)}")
-
-```
-
-
-```
-Size of the training set: 11790
-Size of the validation set: 1310
-
-```
-
----
-## Preprocessing
-
-We first prepare the vocabulary to be used.
-
-
-```python
-# The set of characters accepted in the transcription.
-characters = [x for x in "abcdefghijklmnopqrstuvwxyz'?! "]
-# Mapping characters to integers
-char_to_num = keras.layers.StringLookup(vocabulary=characters, oov_token="")
-# Mapping integers back to original characters
-num_to_char = keras.layers.StringLookup(
- vocabulary=char_to_num.get_vocabulary(), oov_token="", invert=True
-)
-
-print(
- f"The vocabulary is: {char_to_num.get_vocabulary()} "
- f"(size ={char_to_num.vocabulary_size()})"
-)
-```
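-
-A quick round-trip through the two lookup layers (a minimal sketch; `tf` and the
-layers defined above are assumed to be in scope):
-
-```python
-sample = tf.strings.unicode_split("hello", input_encoding="UTF-8")
-ids = char_to_num(sample)  # characters -> integer ids
-text = tf.strings.reduce_join(num_to_char(ids)).numpy().decode("utf-8")
-print(ids.numpy(), text)  # integer ids, then "hello"
-```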
-
-
----
-## Training and Evaluating
-
-
-```python
-# A utility function to decode the output of the network
-def decode_batch_predictions(pred):
- input_len = np.ones(pred.shape[0]) * pred.shape[1]
- # Use greedy search. For complex tasks, you can use beam search
- results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0]
- # Iterate over the results and get back the text
- output_text = []
- for result in results:
- result = tf.strings.reduce_join(num_to_char(result)).numpy().decode("utf-8")
- output_text.append(result)
- return output_text
-
-
-# A callback class to output a few transcriptions during training
-class CallbackEval(keras.callbacks.Callback):
- """Displays a batch of outputs after every epoch."""
-
- def __init__(self, dataset):
- super().__init__()
- self.dataset = dataset
-
-    def on_epoch_end(self, epoch: int, logs=None):
-        predictions = []
-        targets = []
-        for batch in self.dataset:
-            X, y = batch
-            # Use the model attached to the callback by `fit()`
-            batch_predictions = self.model.predict(X)
-            batch_predictions = decode_batch_predictions(batch_predictions)
-            predictions.extend(batch_predictions)
-            for label in y:
-                label = (
-                    tf.strings.reduce_join(num_to_char(label)).numpy().decode("utf-8")
-                )
-                targets.append(label)
- wer_score = wer(targets, predictions)
- print("-" * 100)
- print(f"Word Error Rate: {wer_score:.4f}")
- print("-" * 100)
- for i in np.random.randint(0, len(predictions), 2):
- print(f"Target : {targets[i]}")
- print(f"Prediction: {predictions[i]}")
- print("-" * 100)
-
-```
-
-Let's start the training process.
-
-
-```python
-# Define the number of epochs.
-epochs = 1
-# Callback function to check transcription on the val set.
-validation_callback = CallbackEval(validation_dataset)
-# Train the model
-history = model.fit(
- train_dataset,
- validation_data=validation_dataset,
- epochs=epochs,
- callbacks=[validation_callback],
-)
-
-```
-
-
-```
-369/369 [==============================] - ETA: 0s - loss: 302.4755----------------------------------------------------------------------------------------------------
-Word Error Rate: 1.0000
-----------------------------------------------------------------------------------------------------
-Target : special agent lyndal l shaneyfelt a photography expert with the fbi
-Prediction: s
-----------------------------------------------------------------------------------------------------
-Target : dissolved in water the sugar is transported down delicate tubes chiefly in the growing bark region of the stem
-Prediction: sss
-----------------------------------------------------------------------------------------------------
-369/369 [==============================] - 407s 1s/step - loss: 302.4755 - val_loss: 252.1534
-
-```
-
-```
-----------------------------------------------------------------------------------------------------
-Word Error Rate: 1.0000
-----------------------------------------------------------------------------------------------------
-Target : the owners of the latter would then issue a second set of warrants on these goods in total ignorance of the fact that they were already pledged
-Prediction: ssnssss
-----------------------------------------------------------------------------------------------------
-Target : till the whole body of the slaves were manumitted in eighteen thirtythree
-Prediction: sr
-----------------------------------------------------------------------------------------------------
-Target : the committee most of all insisted upon the entire individual separation of prisoners except during the hours of labor
-Prediction: ssssss
-----------------------------------------------------------------------------------------------------
-Target : he made no attempt to help her and there are other indications that he did not want her to learn that language
-Prediction: s
-----------------------------------------------------------------------------------------------------
-Target : the building of the babylon so famous in history began with nabopolassar
-Prediction: sssrs
-----------------------------------------------------------------------------------------------------
-
-```
-
----
-## Conclusion
-
-In practice, you should train for around 50 epochs or more. Each epoch
-takes approximately 5-6 minutes using a `GeForce RTX 2080 Ti` GPU.
-The model we trained at 50 epochs has a `Word Error Rate (WER) ≈ 16% to 17%`.
-
-Some of the transcriptions around epoch 50:
-
-**Audio file: LJ017-0009.wav**
-```
-- Target : sir thomas overbury was undoubtedly poisoned by lord rochester in the reign
-of james the first
-- Prediction: cer thomas overbery was undoubtedly poisoned by lordrochester in the reign
-of james the first
-```
-
-**Audio file: LJ003-0340.wav**
-```
-- Target : the committee does not seem to have yet understood that newgate could be
-only and properly replaced
-- Prediction: the committee does not seem to have yet understood that newgate could be
-only and proberly replace
-```
-
-**Audio file: LJ011-0136.wav**
-```
-- Target : still no sentence of death was carried out for the offense and in eighteen
-thirtytwo
-- Prediction: still no sentence of death was carried out for the offense and in eighteen
-thirtytwo
-```
-
-Example available on Hugging Face:
-
-| Trained Model | Demo |
-| :--: | :--: |
-| [Model on Hugging Face](https://huggingface.co/keras-io/ctc_asr) | [Space on Hugging Face](https://huggingface.co/spaces/keras-io/ctc_asr) |
-
diff --git a/templates/examples/audio/melgan_spectrogram_inversion.md b/templates/examples/audio/melgan_spectrogram_inversion.md
deleted file mode 100644
index eef8effd1f..0000000000
--- a/templates/examples/audio/melgan_spectrogram_inversion.md
+++ /dev/null
@@ -1,953 +0,0 @@
-# MelGAN-based spectrogram inversion using feature matching
-
-**Author:** [Darshan Deshpande](https://twitter.com/getdarshan)
-**Date created:** 02/09/2021
-**Last modified:** 15/09/2021
-
-
-
-ⓘ This example uses Keras 2
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/melgan_spectrogram_inversion.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/melgan_spectrogram_inversion.py)
-
-
-**Description:** Inversion of audio from mel-spectrograms using the MelGAN architecture and feature matching.
-
----
-## Introduction
-
-Autoregressive vocoders have been ubiquitous for a majority of the history of speech processing,
-but for most of their existence they have lacked parallelism.
-[MelGAN](https://arxiv.org/pdf/1910.06711v3.pdf) is a
-non-autoregressive, fully convolutional vocoder architecture used for purposes ranging
-from spectral inversion and speech enhancement to present-day state-of-the-art
-speech synthesis when used as a decoder
-with models like Tacotron2 or FastSpeech that convert text to mel spectrograms.
-
-In this tutorial, we will have a look at the MelGAN architecture and how it can achieve
-fast spectral inversion, i.e. conversion of spectrograms to audio waves. The MelGAN
-implemented in this tutorial is similar to the original implementation, with the only
-difference being the padding method used for the convolutions: we use 'same' padding
-instead of reflect padding.
-
----
-## Importing and Defining Hyperparameters
-
-
-```python
-!pip install -qqq tensorflow_addons
-!pip install -qqq tensorflow-io
-```
-
-```python
-import tensorflow as tf
-import tensorflow_io as tfio
-from tensorflow import keras
-from tensorflow.keras import layers
-from tensorflow_addons import layers as addon_layers
-
-# Setting logger level to avoid input shape warnings
-tf.get_logger().setLevel("ERROR")
-
-# Defining hyperparameters
-
-DESIRED_SAMPLES = 8192
-LEARNING_RATE_GEN = 1e-5
-LEARNING_RATE_DISC = 1e-6
-BATCH_SIZE = 16
-
-mse = keras.losses.MeanSquaredError()
-mae = keras.losses.MeanAbsoluteError()
-```
-
----
-## Loading the Dataset
-
-This example uses the [LJSpeech dataset](https://keithito.com/LJ-Speech-Dataset/).
-
-The LJSpeech dataset is primarily used for text-to-speech and consists of 13,100 discrete
-speech samples taken from 7 non-fiction books, having a total length of approximately 24
-hours. The MelGAN training is only concerned with the audio waves so we process only the
-WAV files and ignore the audio annotations.
-
-
-```python
-!wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
-!tar -xf /content/LJSpeech-1.1.tar.bz2
-```
-
-
-```
---2021-09-16 11:45:24-- https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
-Resolving data.keithito.com (data.keithito.com)... 174.138.79.61
-Connecting to data.keithito.com (data.keithito.com)|174.138.79.61|:443... connected.
-HTTP request sent, awaiting response... 200 OK
-Length: 2748572632 (2.6G) [application/octet-stream]
-Saving to: ‘LJSpeech-1.1.tar.bz2’
-```
-
-
-
-```
-LJSpeech-1.1.tar.bz 100%[===================>] 2.56G 68.3MB/s in 36s
-```
-
-
-
-
-We create a `tf.data.Dataset` to load and process the audio files on the fly.
-The `preprocess()` function takes the file path as input and returns two instances of the
-wave, one for input and one as the ground truth for comparison. The input wave will be
-mapped to a spectrogram using the custom `MelSpec` layer as shown later in this example.
-
-
-```python
-# Splitting the dataset into training and testing splits
-wavs = tf.io.gfile.glob("LJSpeech-1.1/wavs/*.wav")
-print(f"Number of audio files: {len(wavs)}")
-
-# Mapper function for loading the audio. This function returns two instances of the wave
-def preprocess(filename):
- audio = tf.audio.decode_wav(tf.io.read_file(filename), 1, DESIRED_SAMPLES).audio
- return audio, audio
-
-
-# Create tf.data.Dataset objects and apply preprocessing
-train_dataset = tf.data.Dataset.from_tensor_slices((wavs,))
-train_dataset = train_dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
-```
-
-
-```
-Number of audio files: 13100
-
-```
-
----
-## Defining custom layers for MelGAN
-
-The MelGAN architecture consists of 3 main modules:
-
-1. The residual block
-2. Dilated convolutional block
-3. Discriminator block
-
-
-
-Since the network takes a mel-spectrogram as input, we will create an additional custom
-layer
-which can convert the raw audio wave to a spectrogram on-the-fly. We use the raw audio
-tensor from `train_dataset` and map it to a mel-spectrogram using the `MelSpec` layer
-below.
-
-
-```python
-# Custom keras layer for on-the-fly audio to spectrogram conversion
-
-
-class MelSpec(layers.Layer):
- def __init__(
- self,
- frame_length=1024,
- frame_step=256,
- fft_length=None,
- sampling_rate=22050,
- num_mel_channels=80,
- freq_min=125,
- freq_max=7600,
- **kwargs,
- ):
- super().__init__(**kwargs)
- self.frame_length = frame_length
- self.frame_step = frame_step
- self.fft_length = fft_length
- self.sampling_rate = sampling_rate
- self.num_mel_channels = num_mel_channels
- self.freq_min = freq_min
- self.freq_max = freq_max
- # Defining mel filter. This filter will be multiplied with the STFT output
- self.mel_filterbank = tf.signal.linear_to_mel_weight_matrix(
- num_mel_bins=self.num_mel_channels,
- num_spectrogram_bins=self.frame_length // 2 + 1,
- sample_rate=self.sampling_rate,
- lower_edge_hertz=self.freq_min,
- upper_edge_hertz=self.freq_max,
- )
-
- def call(self, audio, training=True):
- # We will only perform the transformation during training.
- if training:
- # Taking the Short Time Fourier Transform. Ensure that the audio is padded.
- # In the paper, the STFT output is padded using the 'REFLECT' strategy.
- stft = tf.signal.stft(
- tf.squeeze(audio, -1),
- self.frame_length,
- self.frame_step,
- self.fft_length,
- pad_end=True,
- )
-
- # Taking the magnitude of the STFT output
- magnitude = tf.abs(stft)
-
- # Multiplying the Mel-filterbank with the magnitude and scaling it using the db scale
- mel = tf.matmul(tf.square(magnitude), self.mel_filterbank)
- log_mel_spec = tfio.audio.dbscale(mel, top_db=80)
- return log_mel_spec
- else:
- return audio
-
- def get_config(self):
- config = super().get_config()
- config.update(
- {
- "frame_length": self.frame_length,
- "frame_step": self.frame_step,
- "fft_length": self.fft_length,
- "sampling_rate": self.sampling_rate,
- "num_mel_channels": self.num_mel_channels,
- "freq_min": self.freq_min,
- "freq_max": self.freq_max,
- }
- )
- return config
-
-```
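-
-A quick shape check of the layer (a minimal sketch): `DESIRED_SAMPLES` audio samples
-map to `8192 / 256 = 32` STFT frames, each with 80 mel channels.
-
-```python
-mel_layer = MelSpec()
-dummy_audio = tf.random.uniform([1, DESIRED_SAMPLES, 1])
-print(mel_layer(dummy_audio).shape)  # (1, 32, 80)
-```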
-
-The residual convolutional block extensively uses dilations and has a total receptive
-field of 27 timesteps per block. The dilations must grow as a power of the `kernel_size`
-to ensure reduction of hissing noise in the output. The network proposed by the paper is
-as follows:
-
-
-
-
-```python
-# Creating the residual stack block
-
-
-def residual_stack(input, filters):
-    """Convolutional residual stack with weight normalization.
-
-    Args:
-        input: Tensor, input to the residual stack.
-        filters: int, determines filter size for the residual stack.
-
-    Returns:
-        Residual stack output.
-    """
- c1 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=1, padding="same"), data_init=False
- )(input)
- lrelu1 = layers.LeakyReLU()(c1)
- c2 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=1, padding="same"), data_init=False
- )(lrelu1)
- add1 = layers.Add()([c2, input])
-
- lrelu2 = layers.LeakyReLU()(add1)
- c3 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=3, padding="same"), data_init=False
- )(lrelu2)
- lrelu3 = layers.LeakyReLU()(c3)
- c4 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=1, padding="same"), data_init=False
- )(lrelu3)
- add2 = layers.Add()([add1, c4])
-
- lrelu4 = layers.LeakyReLU()(add2)
- c5 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=9, padding="same"), data_init=False
- )(lrelu4)
- lrelu5 = layers.LeakyReLU()(c5)
- c6 = addon_layers.WeightNormalization(
- layers.Conv1D(filters, 3, dilation_rate=1, padding="same"), data_init=False
- )(lrelu5)
- add3 = layers.Add()([c6, add2])
-
- return add3
-
-```
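-
-As a quick check of the receptive-field claim above, counting only the three dilated
-convolutions (dilations 1, 3, and 9, kernel size 3), each of which widens the field by
-`(kernel_size - 1) * dilation` timesteps:
-
-```python
-receptive_field = 1
-for dilation in [1, 3, 9]:
-    receptive_field += (3 - 1) * dilation  # kernel_size = 3
-print(receptive_field)  # 27
-```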
-
-Each convolutional block uses the dilations offered by the residual stack
-and upsamples the input data by the `upsampling_factor`.
-
-
-```python
-# Dilated convolutional block consisting of the Residual stack
-
-
-def conv_block(input, conv_dim, upsampling_factor):
-    """Dilated Convolutional Block with weight normalization.
-
-    Args:
-        input: Tensor, input to the convolutional block.
-        conv_dim: int, determines filter size for the block.
-        upsampling_factor: int, scale for upsampling.
-
-    Returns:
-        Dilated convolution block output.
-    """
- conv_t = addon_layers.WeightNormalization(
- layers.Conv1DTranspose(conv_dim, 16, upsampling_factor, padding="same"),
- data_init=False,
- )(input)
- lrelu1 = layers.LeakyReLU()(conv_t)
- res_stack = residual_stack(lrelu1, conv_dim)
- lrelu2 = layers.LeakyReLU()(res_stack)
- return lrelu2
-
-```
-
-The discriminator block consists of convolutions and downsampling layers. This block is
-essential for the implementation of the feature matching technique.
-
-Each discriminator outputs a list of feature maps that will be compared during training
-to compute the feature matching loss.
-
-
-```python
-
-def discriminator_block(input):
- conv1 = addon_layers.WeightNormalization(
- layers.Conv1D(16, 15, 1, "same"), data_init=False
- )(input)
- lrelu1 = layers.LeakyReLU()(conv1)
- conv2 = addon_layers.WeightNormalization(
- layers.Conv1D(64, 41, 4, "same", groups=4), data_init=False
- )(lrelu1)
- lrelu2 = layers.LeakyReLU()(conv2)
- conv3 = addon_layers.WeightNormalization(
- layers.Conv1D(256, 41, 4, "same", groups=16), data_init=False
- )(lrelu2)
- lrelu3 = layers.LeakyReLU()(conv3)
- conv4 = addon_layers.WeightNormalization(
- layers.Conv1D(1024, 41, 4, "same", groups=64), data_init=False
- )(lrelu3)
- lrelu4 = layers.LeakyReLU()(conv4)
- conv5 = addon_layers.WeightNormalization(
- layers.Conv1D(1024, 41, 4, "same", groups=256), data_init=False
- )(lrelu4)
- lrelu5 = layers.LeakyReLU()(conv5)
- conv6 = addon_layers.WeightNormalization(
- layers.Conv1D(1024, 5, 1, "same"), data_init=False
- )(lrelu5)
- lrelu6 = layers.LeakyReLU()(conv6)
- conv7 = addon_layers.WeightNormalization(
- layers.Conv1D(1, 3, 1, "same"), data_init=False
- )(lrelu6)
- return [lrelu1, lrelu2, lrelu3, lrelu4, lrelu5, lrelu6, conv7]
-
-```
-
-### Create the generator
-
-
-```python
-
-def create_generator(input_shape):
- inp = keras.Input(input_shape)
- x = MelSpec()(inp)
- x = layers.Conv1D(512, 7, padding="same")(x)
- x = layers.LeakyReLU()(x)
- x = conv_block(x, 256, 8)
- x = conv_block(x, 128, 8)
- x = conv_block(x, 64, 2)
- x = conv_block(x, 32, 2)
- x = addon_layers.WeightNormalization(
- layers.Conv1D(1, 7, padding="same", activation="tanh")
- )(x)
- return keras.Model(inp, x)
-
-
-# We use a dynamic input shape for the generator since the model is fully convolutional
-generator = create_generator((None, 1))
-generator.summary()
-```
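-
-### Create the discriminator
-
-The training code below calls `create_discriminator`, whose definition is elided from
-this excerpt. A sketch consistent with the multi-scale discriminator described in the
-MelGAN paper (three discriminator blocks applied to progressively average-pooled audio):
-
-```python
-
-def create_discriminator(input_shape):
-    inp = keras.Input(input_shape)
-    out_map1 = discriminator_block(inp)
-    pool1 = layers.AveragePooling1D()(inp)
-    out_map2 = discriminator_block(pool1)
-    pool2 = layers.AveragePooling1D()(pool1)
-    out_map3 = discriminator_block(pool2)
-    return keras.Model(inp, [out_map1, out_map2, out_map3])
-
-```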
-
-
----
-## Defining the loss functions
-
-**Generator Loss**
-
-The generator architecture uses a combination of two losses:
-
-1. Mean Squared Error:
-
-This is the standard MSE generator loss, calculated between ones and the final output
-of each of the _K_ discriminators:
-
-$$\mathcal{L}_{gen} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{MSE}\big(\mathbf{1},\, D_k(G(s))\big)$$
-
-2. Feature Matching Loss:
-
-This loss involves extracting the outputs of every layer from the discriminator for both
-the generated and the ground-truth audio, and comparing each intermediate layer output _j_
-using Mean Absolute Error:
-
-$$\mathcal{L}_{fm} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{L-1}\sum_{j=1}^{L-1} \mathrm{MAE}\big(D_k^{(j)}(x),\, D_k^{(j)}(G(s))\big)$$
-
-**Discriminator Loss**
-
-The discriminator uses the Mean Squared Error, comparing the real data predictions
-with ones and the generated predictions with zeros:
-
-$$\mathcal{L}_{disc} = \frac{1}{K}\sum_{k=1}^{K}\Big[\mathrm{MSE}\big(\mathbf{1},\, D_k(x)\big) + \mathrm{MSE}\big(\mathbf{0},\, D_k(G(s))\big)\Big]$$
-
-```python
-# Generator loss
-
-
-def generator_loss(real_pred, fake_pred):
- """Loss function for the generator.
-
- Args:
- real_pred: Tensor, output of the ground truth wave passed through the discriminator.
- fake_pred: Tensor, output of the generator prediction passed through the discriminator.
-
- Returns:
- Loss for the generator.
- """
- gen_loss = []
- for i in range(len(fake_pred)):
- gen_loss.append(mse(tf.ones_like(fake_pred[i][-1]), fake_pred[i][-1]))
-
- return tf.reduce_mean(gen_loss)
-
-
-def feature_matching_loss(real_pred, fake_pred):
- """Implements the feature matching loss.
-
- Args:
- real_pred: Tensor, output of the ground truth wave passed through the discriminator.
- fake_pred: Tensor, output of the generator prediction passed through the discriminator.
-
- Returns:
- Feature Matching Loss.
- """
- fm_loss = []
- for i in range(len(fake_pred)):
- for j in range(len(fake_pred[i]) - 1):
- fm_loss.append(mae(real_pred[i][j], fake_pred[i][j]))
-
- return tf.reduce_mean(fm_loss)
-
-
-def discriminator_loss(real_pred, fake_pred):
- """Implements the discriminator loss.
-
- Args:
- real_pred: Tensor, output of the ground truth wave passed through the discriminator.
- fake_pred: Tensor, output of the generator prediction passed through the discriminator.
-
- Returns:
- Discriminator Loss.
- """
- real_loss, fake_loss = [], []
- for i in range(len(real_pred)):
- real_loss.append(mse(tf.ones_like(real_pred[i][-1]), real_pred[i][-1]))
- fake_loss.append(mse(tf.zeros_like(fake_pred[i][-1]), fake_pred[i][-1]))
-
- # Calculating the final discriminator loss after scaling
- disc_loss = tf.reduce_mean(real_loss) + tf.reduce_mean(fake_loss)
- return disc_loss
-
-```
-
-Defining the MelGAN model for training.
-This subclass overrides the `train_step()` method to implement the training logic.
-
-
-```python
-
-class MelGAN(keras.Model):
- def __init__(self, generator, discriminator, **kwargs):
- """MelGAN trainer class
-
- Args:
- generator: keras.Model, Generator model
- discriminator: keras.Model, Discriminator model
- """
- super().__init__(**kwargs)
- self.generator = generator
- self.discriminator = discriminator
-
- def compile(
- self,
- gen_optimizer,
- disc_optimizer,
- generator_loss,
- feature_matching_loss,
- discriminator_loss,
- ):
- """MelGAN compile method.
-
- Args:
- gen_optimizer: keras.optimizer, optimizer to be used for training
- disc_optimizer: keras.optimizer, optimizer to be used for training
- generator_loss: callable, loss function for generator
- feature_matching_loss: callable, loss function for feature matching
- discriminator_loss: callable, loss function for discriminator
- """
- super().compile()
-
- # Optimizers
- self.gen_optimizer = gen_optimizer
- self.disc_optimizer = disc_optimizer
-
- # Losses
- self.generator_loss = generator_loss
- self.feature_matching_loss = feature_matching_loss
- self.discriminator_loss = discriminator_loss
-
- # Trackers
- self.gen_loss_tracker = keras.metrics.Mean(name="gen_loss")
- self.disc_loss_tracker = keras.metrics.Mean(name="disc_loss")
-
-    def train_step(self, batch):
-        x_batch_train, y_batch_train = batch
-
-        with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
-            # Generating the audio wave
-            gen_audio_wave = self.generator(x_batch_train, training=True)
-
-            # Generating the features using the discriminator
-            real_pred = self.discriminator(y_batch_train)
-            fake_pred = self.discriminator(gen_audio_wave)
-
-            # Calculating the generator losses
-            gen_loss = self.generator_loss(real_pred, fake_pred)
-            fm_loss = self.feature_matching_loss(real_pred, fake_pred)
-
-            # Calculating the final generator loss
-            gen_fm_loss = gen_loss + 10 * fm_loss
-
-            # Calculating the discriminator loss
-            disc_loss = self.discriminator_loss(real_pred, fake_pred)
-
-        # Calculating and applying the gradients for generator and discriminator
-        grads_gen = gen_tape.gradient(gen_fm_loss, self.generator.trainable_weights)
-        grads_disc = disc_tape.gradient(disc_loss, self.discriminator.trainable_weights)
-        self.gen_optimizer.apply_gradients(zip(grads_gen, self.generator.trainable_weights))
-        self.disc_optimizer.apply_gradients(
-            zip(grads_disc, self.discriminator.trainable_weights)
-        )
-
- self.gen_loss_tracker.update_state(gen_fm_loss)
- self.disc_loss_tracker.update_state(disc_loss)
-
- return {
- "gen_loss": self.gen_loss_tracker.result(),
- "disc_loss": self.disc_loss_tracker.result(),
- }
-
-```
-
----
-## Training
-
-The paper suggests that the training with dynamic shapes takes around 400,000 steps (~500
-epochs). For this example, we will run it only for a single epoch (819 steps).
-Longer training time (greater than 300 epochs) will almost certainly provide better results.
-
-
-```python
-gen_optimizer = keras.optimizers.Adam(
- LEARNING_RATE_GEN, beta_1=0.5, beta_2=0.9, clipnorm=1
-)
-disc_optimizer = keras.optimizers.Adam(
- LEARNING_RATE_DISC, beta_1=0.5, beta_2=0.9, clipnorm=1
-)
-
-# Start training
-generator = create_generator((None, 1))
-discriminator = create_discriminator((None, 1))
-
-mel_gan = MelGAN(generator, discriminator)
-mel_gan.compile(
- gen_optimizer,
- disc_optimizer,
- generator_loss,
- feature_matching_loss,
- discriminator_loss,
-)
-mel_gan.fit(
- train_dataset.shuffle(200).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE), epochs=1
-)
-```
-
-
----
-## Testing the model
-
-The trained model can now be used for real-time text-to-speech tasks.
-To test how fast MelGAN inference can be, let us take a sample mel-spectrogram
-and convert it. Note that the actual model pipeline will not include the `MelSpec` layer,
-so this layer is disabled during inference. The inference input will be a
-mel-spectrogram processed similarly to the `MelSpec` layer configuration.
-
-To test this, we will create a uniformly distributed random tensor to simulate the
-behavior of the inference pipeline.
-
-
-```python
-# Sampling a random tensor to mimic a batch of 128 spectrograms of shape [50, 80]
-audio_sample = tf.random.uniform([128, 50, 80])
-```
-
-Let's time the inference speed. Running this, you can see that the average
-inference time per spectrogram ranges from 8 to 10 milliseconds on a K80 GPU, which is
-quite fast.
-
-
-```python
-pred = generator.predict(audio_sample, batch_size=32, verbose=1)
-```
-
-
----
-## Conclusion
-
-MelGAN is a highly effective architecture for spectral inversion, achieving a Mean
-Opinion Score (MOS) of 3.61 that considerably outperforms the Griffin-Lim
-algorithm, which has a MOS of just 1.57. MelGAN is also competitive with
-the state-of-the-art WaveGlow and WaveNet architectures on text-to-speech and speech
-enhancement tasks on
-the LJSpeech and VCTK datasets [1].
-
-This tutorial highlights:
-
-1. The advantages of using dilated convolutions that grow with the filter size
-2. Implementation of a custom layer for on-the-fly conversion of audio waves to
-mel-spectrograms
-3. Effectiveness of using the feature matching loss function for training GAN generators.
-
-Further reading
-
-1. [MelGAN paper](https://arxiv.org/pdf/1910.06711v3.pdf) (Kundan Kumar et al.) to
-understand the reasoning behind the architecture and training process
-2. For in-depth understanding of the feature matching loss, you can refer to [Improved
-Techniques for Training GANs](https://arxiv.org/pdf/1606.03498v1.pdf) (Tim Salimans et
-al.).
-
-Example available on Hugging Face:
-
-| Trained Model | Demo |
-| :--: | :--: |
-| [Model on Hugging Face](https://huggingface.co/keras-io/MelGAN-spectrogram-inversion) | [Space on Hugging Face](https://huggingface.co/spaces/keras-io/MelGAN-spectrogram-inversion) |
-
diff --git a/templates/examples/audio/speaker_recognition_using_cnn.md b/templates/examples/audio/speaker_recognition_using_cnn.md
deleted file mode 100644
index c2d6657891..0000000000
--- a/templates/examples/audio/speaker_recognition_using_cnn.md
+++ /dev/null
@@ -1,852 +0,0 @@
-# Speaker Recognition
-
-**Author:** [Fadi Badine](https://twitter.com/fadibadine)
-**Date created:** 14/06/2020
-**Last modified:** 19/07/2023
-**Description:** Classify speakers using Fast Fourier Transform (FFT) and a 1D Convnet.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/speaker_recognition_using_cnn.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/speaker_recognition_using_cnn.py)
-
-
-
----
-## Introduction
-
-This example demonstrates how to create a model to classify speakers from the
-frequency domain representation of speech recordings, obtained via Fast Fourier
-Transform (FFT).
-
-It shows the following:
-
-- How to use `tf.data` to load, preprocess and feed audio streams into a model
-- How to create a 1D convolutional network with residual
-connections for audio classification.
-
-Our process:
-
-- We prepare a dataset of speech samples from different speakers, with the speaker as label.
-- We add background noise to these samples to augment our data.
-- We take the FFT of these samples.
-- We train a 1D convnet to predict the correct speaker given a noisy FFT speech sample.
-
-Note:
-
-- This example should be run with TensorFlow 2.3 or higher, or `tf-nightly`.
-- The noise samples in the dataset need to be resampled to a sampling rate of 16000 Hz
-before using the code in this example. In order to do this, you will need to have
-installed `ffmpeg` (see the sketch below).
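-
-A hypothetical resampling command (the file names are placeholders):
-
-```python
-!ffmpeg -hide_banner -loglevel panic -y -i noise_original.wav -ar 16000 noise_16k.wav
-```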
-
----
-## Setup
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "tensorflow"
-
-import shutil
-import numpy as np
-
-import tensorflow as tf
-import keras
-
-from pathlib import Path
-from IPython.display import display, Audio
-
-# Get the data from https://www.kaggle.com/kongaevans/speaker-recognition-dataset/
-# and save it to ./speaker-recognition-dataset.zip
-# then unzip it to ./16000_pcm_speeches
-```
-
-
-```python
-!kaggle datasets download -d kongaevans/speaker-recognition-dataset
-!unzip -qq speaker-recognition-dataset.zip
-```
-
-```python
-DATASET_ROOT = "16000_pcm_speeches"
-
-# The folders in which we will put the audio samples and the noise samples
-AUDIO_SUBFOLDER = "audio"
-NOISE_SUBFOLDER = "noise"
-
-DATASET_AUDIO_PATH = os.path.join(DATASET_ROOT, AUDIO_SUBFOLDER)
-DATASET_NOISE_PATH = os.path.join(DATASET_ROOT, NOISE_SUBFOLDER)
-
-# Percentage of samples to use for validation
-VALID_SPLIT = 0.1
-
-# Seed to use when shuffling the dataset and the noise
-SHUFFLE_SEED = 43
-
-# The sampling rate to use.
-# This is the one used in all the audio samples.
-# We will resample all the noise to this sampling rate.
-# This will also be the output size of the audio wave samples
-# (since all samples are of 1 second long)
-SAMPLING_RATE = 16000
-
-# The factor to multiply the noise with according to:
-# noisy_sample = sample + noise * prop * scale
-# where prop = sample_amplitude / noise_amplitude
-SCALE = 0.5
-
-BATCH_SIZE = 128
-EPOCHS = 1
-
-```
-
-```
-Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/fchollet/.kaggle/kaggle.json'
-Downloading speaker-recognition-dataset.zip to /home/fchollet/keras-io/scripts/tmp_5022915
- 90%|████████████████████████████████████▉ | 208M/231M [00:00<00:00, 217MB/s]
-100%|█████████████████████████████████████████| 231M/231M [00:01<00:00, 227MB/s]
-
-```
-
----
-## Data preparation
-
-The dataset is composed of 7 folders, divided into 2 groups:
-
-- Speech samples, with 5 folders for 5 different speakers. Each folder contains
-1500 audio files, each 1 second long and sampled at 16000 Hz.
-- Background noise samples, with 2 folders and a total of 6 files. These files
-are longer than 1 second (and originally not sampled at 16000 Hz, but we will resample them to 16000 Hz).
-We will use those 6 files to create 354 1-second-long noise samples to be used for training.
-
-Let's sort these 2 categories into 2 folders:
-
-- An `audio` folder which will contain all the per-speaker speech sample folders
-- A `noise` folder which will contain all the noise samples
-
-Before sorting the audio and noise categories into 2 folders,
-we have the following directory structure:
-
-```
-main_directory/
-...speaker_a/
-...speaker_b/
-...speaker_c/
-...speaker_d/
-...speaker_e/
-...other/
-..._background_noise_/
-```
-
-After sorting, we end up with the following structure:
-
-```
-main_directory/
-...audio/
-......speaker_a/
-......speaker_b/
-......speaker_c/
-......speaker_d/
-......speaker_e/
-...noise/
-......other/
-......_background_noise_/
-```
-
-
-```python
-for folder in os.listdir(DATASET_ROOT):
- if os.path.isdir(os.path.join(DATASET_ROOT, folder)):
- if folder in [AUDIO_SUBFOLDER, NOISE_SUBFOLDER]:
- # If folder is `audio` or `noise`, do nothing
- continue
- elif folder in ["other", "_background_noise_"]:
- # If folder is one of the folders that contains noise samples,
- # move it to the `noise` folder
- shutil.move(
- os.path.join(DATASET_ROOT, folder),
- os.path.join(DATASET_NOISE_PATH, folder),
- )
- else:
- # Otherwise, it should be a speaker folder, then move it to
- # `audio` folder
- shutil.move(
- os.path.join(DATASET_ROOT, folder),
- os.path.join(DATASET_AUDIO_PATH, folder),
- )
-```
-
----
-## Noise preparation
-
-In this section:
-
-- We load all noise samples (which should have been resampled to 16000)
-- We split those noise samples to chunks of 16000 samples which
-correspond to 1 second duration each
-
-
-```python
-# Get the list of all noise files
-noise_paths = []
-for subdir in os.listdir(DATASET_NOISE_PATH):
- subdir_path = Path(DATASET_NOISE_PATH) / subdir
- if os.path.isdir(subdir_path):
- noise_paths += [
- os.path.join(subdir_path, filepath)
- for filepath in os.listdir(subdir_path)
- if filepath.endswith(".wav")
- ]
-if not noise_paths:
- raise RuntimeError(f"Could not find any files at {DATASET_NOISE_PATH}")
-print(
- "Found {} files belonging to {} directories".format(
- len(noise_paths), len(os.listdir(DATASET_NOISE_PATH))
- )
-)
-```
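-
-The demonstration at the end of this example calls `add_noise` and `audio_to_fft`,
-whose definitions are elided from this excerpt. A sketch consistent with their usage
-and with the `noisy_sample = sample + noise * prop * scale` formula above:
-
-```python
-
-def add_noise(audio, noises=None, scale=0.5):
-    if noises is not None:
-        # Pick a random noise chunk for every audio sample in the batch
-        tf_rnd = tf.random.uniform(
-            (tf.shape(audio)[0],), 0, noises.shape[0], dtype=tf.int32
-        )
-        noise = tf.gather(noises, tf_rnd, axis=0)
-        # prop = sample_amplitude / noise_amplitude
-        prop = tf.math.reduce_max(audio, axis=1) / tf.math.reduce_max(noise, axis=1)
-        prop = tf.repeat(tf.expand_dims(prop, axis=1), tf.shape(audio)[1], axis=1)
-        audio = audio + noise * prop * scale
-    return audio
-
-
-def audio_to_fft(audio):
-    # Keep only the magnitude of the positive frequencies
-    audio = tf.squeeze(audio, axis=-1)
-    fft = tf.signal.fft(tf.cast(tf.complex(real=audio, imag=tf.zeros_like(audio)), tf.complex64))
-    fft = tf.expand_dims(fft, axis=-1)
-    return tf.math.abs(fft[:, : (audio.shape[1] // 2), :])
-
-```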
-
-
-We get ~98% validation accuracy.
-
----
-## Demonstration
-
-Let's take some samples and:
-
-- Predict the speaker
-- Compare the prediction with the real speaker
-- Listen to the audio to see that despite the samples being noisy,
-the model is still pretty accurate
-
-
-```python
-SAMPLES_TO_DISPLAY = 10
-
-test_ds = paths_and_labels_to_dataset(valid_audio_paths, valid_labels)
-test_ds = test_ds.shuffle(buffer_size=BATCH_SIZE * 8, seed=SHUFFLE_SEED).batch(
- BATCH_SIZE
-)
-
-test_ds = test_ds.map(
- lambda x, y: (add_noise(x, noises, scale=SCALE), y),
- num_parallel_calls=tf.data.AUTOTUNE,
-)
-
-for audios, labels in test_ds.take(1):
- # Get the signal FFT
- ffts = audio_to_fft(audios)
- # Predict
- y_pred = model.predict(ffts)
- # Take random samples
- rnd = np.random.randint(0, BATCH_SIZE, SAMPLES_TO_DISPLAY)
- audios = audios.numpy()[rnd, :, :]
- labels = labels.numpy()[rnd]
- y_pred = np.argmax(y_pred, axis=-1)[rnd]
-
- for index in range(SAMPLES_TO_DISPLAY):
- # For every sample, print the true and predicted label
- # as well as run the voice with the noise
- print(
- "Speaker:\33{} {}\33[0m\tPredicted:\33{} {}\33[0m".format(
- "[92m" if labels[index] == y_pred[index] else "[91m",
- class_names[labels[index]],
- "[92m" if labels[index] == y_pred[index] else "[91m",
- class_names[y_pred[index]],
- )
- )
- display(Audio(audios[index, :, :].squeeze(), rate=SAMPLING_RATE))
-```
-
diff --git a/templates/examples/audio/stft.md b/templates/examples/audio/stft.md
deleted file mode 100644
index 331e929f7f..0000000000
--- a/templates/examples/audio/stft.md
+++ /dev/null
@@ -1,1822 +0,0 @@
-# Audio Classification with the STFTSpectrogram layer
-
-**Author:** [Mostafa M. Amin](https://mostafa-amin.com)
-**Date created:** 2024/10/04
-**Last modified:** 2024/10/04
-**Description:** Introducing the `STFTSpectrogram` layer to extract spectrograms for audio classification.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/stft.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/stft.py)
-
-
-
----
-## Introduction
-
-Preprocessing audio as spectrograms is an essential step in the vast majority
-of audio-based applications. Spectrograms represent the frequency content of a
-signal over time and are widely used for this purpose. In this tutorial, we'll
-demonstrate how to use the `STFTSpectrogram` layer in Keras to convert raw
-audio waveforms into spectrograms **within the model**. We'll then feed
-these spectrograms into convolutional networks to perform
-audio classification on the ESC-10 dataset.
-
-We will:
-
-- Load the ESC-10 dataset.
-- Preprocess the raw audio waveforms and generate spectrograms using
- `STFTSpectrogram`.
-- Build two models, one using spectrograms as 1D signals and the other using
-  them as images (2D signals) with a pretrained image model.
-- Train and evaluate the models.
-
----
-## Setup
-
-### Importing the necessary libraries
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax"
-```
-
-
-```python
-import keras
-import matplotlib.pyplot as plt
-import numpy as np
-import pandas as pd
-import scipy.io.wavfile
-from keras import layers
-from scipy.signal import resample
-
-keras.utils.set_random_seed(41)
-```
-
-### Define some variables
-
-
-```python
-BASE_DATA_DIR = "./datasets/esc-50_extracted/ESC-50-master/"
-BATCH_SIZE = 16
-NUM_CLASSES = 10
-EPOCHS = 200
-SAMPLE_RATE = 16000
-```
-
----
-## Download and Preprocess the ESC-10 Dataset
-
-We'll use the Environmental Sound Classification dataset (ESC-10).
-This dataset consists of five-second .wav files of environmental sounds.
-
-### Download and Extract the dataset
-
-
-```python
-keras.utils.get_file(
- "esc-50.zip",
- "https://github.com/karoldvl/ESC-50/archive/master.zip",
- cache_dir="./",
- cache_subdir="datasets",
- extract=True,
-)
-```
-
-```
-'./datasets/esc-50_extracted'
-```
-
-### Read the CSV file
-
-
-```python
-pd_data = pd.read_csv(os.path.join(BASE_DATA_DIR, "meta", "esc50.csv"))
-# filter ESC-50 to ESC-10 and reassign the targets
-pd_data = pd_data[pd_data["esc10"]]
-targets = sorted(pd_data["target"].unique().tolist())
-assert len(targets) == NUM_CLASSES
-old_target_to_new_target = {old: new for new, old in enumerate(targets)}
-pd_data["target"] = pd_data["target"].map(lambda t: old_target_to_new_target[t])
-pd_data
-```
-
-
-|      | filename          | fold | target | category       | esc10 | src_file | take |
-|-----:|:------------------|-----:|-------:|:---------------|:------|---------:|:-----|
-|    0 | 1-100032-A-0.wav  |    1 |      0 | dog            | True  |   100032 | A    |
-|   14 | 1-110389-A-0.wav  |    1 |      0 | dog            | True  |   110389 | A    |
-|   24 | 1-116765-A-41.wav |    1 |      9 | chainsaw       | True  |   116765 | A    |
-|   54 | 1-17150-A-12.wav  |    1 |      4 | crackling_fire | True  |    17150 | A    |
-|   55 | 1-172649-A-40.wav |    1 |      8 | helicopter     | True  |   172649 | A    |
-|  ... | ...               |  ... |    ... | ...            | ...   |      ... | ...  |
-| 1876 | 5-233160-A-1.wav  |    5 |      1 | rooster        | True  |   233160 | A    |
-| 1888 | 5-234879-A-1.wav  |    5 |      1 | rooster        | True  |   234879 | A    |
-| 1889 | 5-234879-B-1.wav  |    5 |      1 | rooster        | True  |   234879 | B    |
-| 1894 | 5-235671-A-38.wav |    5 |      7 | clock_tick     | True  |   235671 | A    |
-| 1999 | 5-9032-A-0.wav    |    5 |      0 | dog            | True  |     9032 | A    |
-
-400 rows × 7 columns
-
-### Define functions to read and preprocess the WAV files
-
-
-```python
-def read_wav_file(path, target_sr=SAMPLE_RATE):
- sr, wav = scipy.io.wavfile.read(os.path.join(BASE_DATA_DIR, "audio", path))
- wav = wav.astype(np.float32) / 32768.0 # normalize to [-1, 1]
- num_samples = int(len(wav) * target_sr / sr) # resample to 16 kHz
- wav = resample(wav, num_samples)
- return wav[:, None] # Add a channel dimension (of size 1)
-```
-
-Create a function that uses the `STFTSpectrogram` to compute a spectrogram,
-then plots it.
-
-
-```python
-def plot_single_spectrogram(sample_wav_data):
- spectrogram = layers.STFTSpectrogram(
- mode="log",
- frame_length=SAMPLE_RATE * 20 // 1000,
- frame_step=SAMPLE_RATE * 5 // 1000,
- fft_length=1024,
- trainable=False,
- )(sample_wav_data[None, ...])[0, ...]
-
- # Plot the spectrogram
- plt.imshow(spectrogram.T, origin="lower")
- plt.title("Single Channel Spectrogram")
- plt.xlabel("Time")
- plt.ylabel("Frequency")
- plt.show()
-```
-
-Create a function that uses the `STFTSpectrogram` layer to compute three
-spectrograms with different bandwidths, then stacks them as the channels of a
-single image to get a multi-bandwidth spectrogram, and plots it.
-
-
-```python
-def plot_multi_bandwidth_spectrogram(sample_wav_data):
- # All spectrograms must use the same `fft_length`, `frame_step`, and
- # `padding="same"` in order to produce spectrograms with identical shapes,
- # hence aligning them together. `expand_dims` ensures that the shapes are
- # compatible with image models.
-
- spectrograms = np.concatenate(
- [
- layers.STFTSpectrogram(
- mode="log",
- frame_length=SAMPLE_RATE * x // 1000,
- frame_step=SAMPLE_RATE * 5 // 1000,
- fft_length=1024,
- padding="same",
- expand_dims=True,
- )(sample_wav_data[None, ...])[0, ...]
- for x in [5, 10, 20]
- ],
- axis=-1,
- ).transpose([1, 0, 2])
-
- # normalize each color channel for better viewing
- mn = spectrograms.min(axis=(0, 1), keepdims=True)
- mx = spectrograms.max(axis=(0, 1), keepdims=True)
- spectrograms = (spectrograms - mn) / (mx - mn)
-
- plt.imshow(spectrograms, origin="lower")
- plt.title("Multi-bandwidth Spectrogram")
- plt.xlabel("Time")
- plt.ylabel("Frequency")
- plt.show()
-```
-
-Demonstrate a sample wav file.
-
-
-```python
-sample_wav_data = read_wav_file(pd_data["filename"].tolist()[52])
-plt.plot(sample_wav_data[:, 0])
-plt.show()
-```
-
-Plot a Spectrogram
-
-
-```python
-plot_single_spectrogram(sample_wav_data)
-```
-
-Plot a multi-bandwidth spectrogram
-
-
-```python
-plot_multi_bandwidth_spectrogram(sample_wav_data)
-```
-
-### Define functions to construct a TF Dataset
-
-
-```python
-def read_dataset(df, folds):
- msk = df["fold"].isin(folds)
- filenames = df["filename"][msk]
- targets = df["target"][msk].values
- waves = np.array([read_wav_file(fil) for fil in filenames], dtype=np.float32)
- return waves, targets
-```
-
-### Create the datasets
-
-
-```python
-train_x, train_y = read_dataset(pd_data, [1, 2, 3])
-valid_x, valid_y = read_dataset(pd_data, [4])
-test_x, test_y = read_dataset(pd_data, [5])
-```
-
----
-## Training the Models
-
-In this tutorial we demonstrate the different use cases of the `STFTSpectrogram`
-layer.
-
-The first model will use a non-trainable `STFTSpectrogram` layer, so it is
-intended purely for preprocessing. Additionally, the model will work with 1D signals,
-hence it makes use of `Conv1D` layers.
-
-The second model will use a trainable `STFTSpectrogram` layer with the
-`expand_dims` option, which expands the shapes to be compatible with image
-models.
-
-### Create the 1D model
-
-1. Create a non-trainable spectrogram layer, extracting a 1D time signal.
-2. Apply `Conv1D` layers with `LayerNormalization`, similar to the
-   classic VGG design.
-3. Apply global maximum pooling to obtain a fixed-size set of features.
-4. Add `Dense` layers to make the final predictions based on the features.
-
-
-```python
-model1d = keras.Sequential(
- [
- layers.InputLayer((None, 1)),
- layers.STFTSpectrogram(
- mode="log",
- frame_length=SAMPLE_RATE * 40 // 1000,
- frame_step=SAMPLE_RATE * 15 // 1000,
- trainable=False,
- ),
- layers.Conv1D(64, 64, activation="relu"),
- layers.Conv1D(128, 16, activation="relu"),
- layers.LayerNormalization(),
- layers.MaxPooling1D(4),
- layers.Conv1D(128, 8, activation="relu"),
- layers.Conv1D(256, 8, activation="relu"),
- layers.Conv1D(512, 4, activation="relu"),
- layers.LayerNormalization(),
- layers.Dropout(0.5),
- layers.GlobalMaxPooling1D(),
- layers.Dense(256, activation="relu"),
- layers.Dense(256, activation="relu"),
- layers.Dropout(0.5),
- layers.Dense(NUM_CLASSES, activation="softmax"),
- ],
-    name="model_1d_non_trainable_stft",
-)
-model1d.compile(
- optimizer=keras.optimizers.Adam(1e-5),
- loss="sparse_categorical_crossentropy",
- metrics=["accuracy"],
-)
-model1d.summary()
-```
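-
-### Train the 1D model
-
-The training call for this model is elided from this excerpt; a minimal sketch using
-the arrays built above:
-
-```python
-history_model1d = model1d.fit(
-    train_x,
-    train_y,
-    batch_size=BATCH_SIZE,
-    validation_data=(valid_x, valid_y),
-    epochs=EPOCHS,
-)
-```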
-
-
-
----
-## Callbacks to display predictions
-
-
-```python
-
-class DisplayOutputs(keras.callbacks.Callback):
- def __init__(
- self, batch, idx_to_token, target_start_token_idx=27, target_end_token_idx=28
- ):
- """Displays a batch of outputs after every epoch
-
- Args:
- batch: A test batch containing the keys "source" and "target"
- idx_to_token: A List containing the vocabulary tokens corresponding to their indices
- target_start_token_idx: A start token index in the target vocabulary
- target_end_token_idx: An end token index in the target vocabulary
- """
- self.batch = batch
- self.target_start_token_idx = target_start_token_idx
- self.target_end_token_idx = target_end_token_idx
- self.idx_to_char = idx_to_token
-
- def on_epoch_end(self, epoch, logs=None):
- if epoch % 5 != 0:
- return
- source = self.batch["source"]
- target = self.batch["target"].numpy()
- bs = tf.shape(source)[0]
- preds = self.model.generate(source, self.target_start_token_idx)
- preds = preds.numpy()
- for i in range(bs):
- target_text = "".join([self.idx_to_char[_] for _ in target[i, :]])
- prediction = ""
- for idx in preds[i, :]:
- prediction += self.idx_to_char[idx]
- if idx == self.target_end_token_idx:
- break
- print(f"target: {target_text.replace('-','')}")
- print(f"prediction: {prediction}\n")
-
-```
-
----
-## Learning rate schedule
-
-
-```python
-
-class CustomSchedule(keras.optimizers.schedules.LearningRateSchedule):
- def __init__(
- self,
- init_lr=0.00001,
- lr_after_warmup=0.001,
- final_lr=0.00001,
- warmup_epochs=15,
- decay_epochs=85,
- steps_per_epoch=203,
- ):
- super().__init__()
- self.init_lr = init_lr
- self.lr_after_warmup = lr_after_warmup
- self.final_lr = final_lr
- self.warmup_epochs = warmup_epochs
- self.decay_epochs = decay_epochs
- self.steps_per_epoch = steps_per_epoch
-
- def calculate_lr(self, epoch):
- """linear warm up - linear decay"""
- warmup_lr = (
- self.init_lr
- + ((self.lr_after_warmup - self.init_lr) / (self.warmup_epochs - 1)) * epoch
- )
- decay_lr = tf.math.maximum(
- self.final_lr,
- self.lr_after_warmup
- - (epoch - self.warmup_epochs)
- * (self.lr_after_warmup - self.final_lr)
- / self.decay_epochs,
- )
- return tf.math.minimum(warmup_lr, decay_lr)
-
- def __call__(self, step):
- epoch = step // self.steps_per_epoch
- epoch = tf.cast(epoch, "float32")
- return self.calculate_lr(epoch)
-
-```
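-
-To see the warmup/decay shape, a quick evaluation of the schedule at a few epochs
-(a sketch using the default arguments, with 203 steps per epoch):
-
-```python
-schedule = CustomSchedule()
-for epoch in [0, 5, 15, 50, 100]:
-    print(epoch, float(schedule(epoch * 203)))
-```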
-
----
-## Create & train the end-to-end model
-
-
-```python
-batch = next(iter(val_ds))
-
-# The vocabulary to convert predicted indices into characters
-idx_to_char = vectorizer.get_vocabulary()
-display_cb = DisplayOutputs(
- batch, idx_to_char, target_start_token_idx=2, target_end_token_idx=3
-) # set the arguments as per vocabulary index for '<' and '>'
-
-model = Transformer(
- num_hid=200,
- num_head=2,
- num_feed_forward=400,
- target_maxlen=max_target_len,
- num_layers_enc=4,
- num_layers_dec=1,
- num_classes=34,
-)
-loss_fn = keras.losses.CategoricalCrossentropy(
- from_logits=True,
- label_smoothing=0.1,
-)
-
-learning_rate = CustomSchedule(
- init_lr=0.00001,
- lr_after_warmup=0.001,
- final_lr=0.00001,
- warmup_epochs=15,
- decay_epochs=85,
- steps_per_epoch=len(ds),
-)
-optimizer = keras.optimizers.Adam(learning_rate)
-model.compile(optimizer=optimizer, loss=loss_fn)
-
-history = model.fit(ds, validation_data=val_ds, callbacks=[display_cb], epochs=1)
-```
-
-
-```
-  1/203 ━━━━━━━━━━━━━━━━━━━━ 9:20:11 166s/step - loss: 2.2387
-
-WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
-I0000 00:00:1700071380.331418 678094 device_compiler.h:187] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
-
- 203/203 ━━━━━━━━━━━━━━━━━━━━ 0s 947ms/step - loss: 1.8285target:
-prediction:
-
-```
-
-In practice, you should train for around 100 epochs or more.
-
-Some of the predicted text at or around epoch 35 may look as follows:
-```
-target:
-prediction:
-
-target:
-prediction:
-```
-
diff --git a/templates/examples/audio/uk_ireland_accent_recognition.md b/templates/examples/audio/uk_ireland_accent_recognition.md
deleted file mode 100644
index 4c1b889250..0000000000
--- a/templates/examples/audio/uk_ireland_accent_recognition.md
+++ /dev/null
@@ -1,1203 +0,0 @@
-# English speaker accent recognition using Transfer Learning
-
-**Author:** [Fadi Badine](https://twitter.com/fadibadine)
-**Date created:** 2022/04/16
-**Last modified:** 2022/04/16
-**Description:** Training a model to classify UK & Ireland accents using feature extraction from Yamnet.
-
-
-
-ⓘ This example uses Keras 2
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/uk_ireland_accent_recognition.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/uk_ireland_accent_recognition.py)
-
-
-
----
-## Introduction
-
-The following example shows how to use feature extraction in order to
-train a model to classify the English accent spoken in an audio wave.
-
-Instead of training a model from scratch, transfer learning enables us to
-take advantage of existing state-of-the-art deep learning models and use them as feature extractors.
-
-Our process:
-
-* Use a TF Hub pre-trained model (Yamnet) and apply it as part of the tf.data pipeline which transforms
-the audio files into feature vectors.
-* Train a dense model on the feature vectors.
-* Use the trained model for inference on a new audio file.
-
-Note:
-
-* We need to install TensorFlow IO in order to resample audio files to 16 kHz, as required by the Yamnet model.
-* In the test section, ffmpeg is used to convert the mp3 file to wav.
-
-You can install TensorFlow IO with the following command:
-
-
-```python
-!pip install -U -q tensorflow_io
-```
-
----
-## Configuration
-
-
-```python
-SEED = 1337
-EPOCHS = 100
-BATCH_SIZE = 64
-VALIDATION_RATIO = 0.1
-MODEL_NAME = "uk_irish_accent_recognition"
-
-# Location where the dataset will be downloaded.
-# By default (None), keras.utils.get_file will use ~/.keras/ as the CACHE_DIR
-CACHE_DIR = None
-
-# The location of the dataset
-URL_PATH = "https://www.openslr.org/resources/83/"
-
-# List of datasets compressed files that contain the audio files
-zip_files = {
- 0: "irish_english_male.zip",
- 1: "midlands_english_female.zip",
- 2: "midlands_english_male.zip",
- 3: "northern_english_female.zip",
- 4: "northern_english_male.zip",
- 5: "scottish_english_female.zip",
- 6: "scottish_english_male.zip",
- 7: "southern_english_female.zip",
- 8: "southern_english_male.zip",
- 9: "welsh_english_female.zip",
- 10: "welsh_english_male.zip",
-}
-
-# We see that there are 2 compressed files for each accent (except Irish):
-# - One for male speakers
-# - One for female speakers
-# However, we will be using a gender agnostic dataset.
-
-# List of gender agnostic categories
-gender_agnostic_categories = [
- "ir", # Irish
- "mi", # Midlands
- "no", # Northern
- "sc", # Scottish
- "so", # Southern
- "we", # Welsh
-]
-
-class_names = [
- "Irish",
- "Midlands",
- "Northern",
- "Scottish",
- "Southern",
- "Welsh",
- "Not a speech",
-]
-```
-
----
-## Imports
-
-
-```python
-import os
-import io
-import csv
-import numpy as np
-import pandas as pd
-import tensorflow as tf
-import tensorflow_hub as hub
-import tensorflow_io as tfio
-from tensorflow import keras
-import matplotlib.pyplot as plt
-import seaborn as sns
-from scipy import stats
-from IPython.display import Audio
-
-
-# Set all random seeds in order to get reproducible results
-keras.utils.set_random_seed(SEED)
-
-# Where to download the dataset
-DATASET_DESTINATION = os.path.join(CACHE_DIR if CACHE_DIR else "~/.keras/", "datasets")
-```
-
----
-## Yamnet Model
-
-Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio
-events from the AudioSet ontology. It is available on TensorFlow Hub.
-
-Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
-As output, the model returns a 3-tuple:
-
-* Scores of shape `(N, 521)` representing the scores of the 521 classes.
-* Embeddings of shape `(N, 1024)`.
-* The log-mel spectrogram of the entire audio frame.
-
-We will use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.
-
-For more detailed information about Yamnet, please refer to its [TensorFlow Hub](https://tfhub.dev/google/yamnet/1) page.
-
-
-```python
-yamnet_model = hub.load("https://tfhub.dev/google/yamnet/1")
-```
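-
-A sketch of calling the model directly (the waveform below is a hypothetical
-one-second placeholder):
-
-```python
-waveform = tf.zeros([16000], dtype=tf.float32)  # 1 second of silence at 16 kHz
-scores, embeddings, spectrogram = yamnet_model(waveform)
-print(scores.shape, embeddings.shape)  # (N, 521), (N, 1024)
-```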
-
----
-## Dataset
-
-The dataset used is the
-[Crowdsourced high-quality UK and Ireland English Dialect speech data set](https://openslr.org/83/)
-which consists of a total of 17,877 high-quality audio wav files.
-
-This dataset includes over 31 hours of recording from 120 volunteers who self-identify as
-native speakers of Southern England, Midlands, Northern England, Wales, Scotland and Ireland.
-
-For more info, please refer to the above link or to the following paper:
-[Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://aclanthology.org/2020.lrec-1.804.pdf)
-
----
-## Download the data
-
-
-```python
-# CSV file that contains information about the dataset. For each entry, we have:
-# - ID
-# - wav file name
-# - transcript
-line_index_file = keras.utils.get_file(
- fname="line_index_file", origin=URL_PATH + "line_index_all.csv"
-)
-
-# Download the list of compressed files that contain the audio wav files
-for i in zip_files:
- fname = zip_files[i].split(".")[0]
- url = URL_PATH + zip_files[i]
-
- zip_file = keras.utils.get_file(fname=fname, origin=url, extract=True)
- os.remove(zip_file)
-```
-
-
-```
-Downloading data from https://www.openslr.org/resources/83/line_index_all.csv
-1990656/1986139 [==============================] - 1s 0us/step
-1998848/1986139 [==============================] - 1s 0us/step
-Downloading data from https://www.openslr.org/resources/83/irish_english_male.zip
-164536320/164531638 [==============================] - 9s 0us/step
-164544512/164531638 [==============================] - 9s 0us/step
-Downloading data from https://www.openslr.org/resources/83/midlands_english_female.zip
-103088128/103085118 [==============================] - 6s 0us/step
-103096320/103085118 [==============================] - 6s 0us/step
-Downloading data from https://www.openslr.org/resources/83/midlands_english_male.zip
-166838272/166833961 [==============================] - 9s 0us/step
-166846464/166833961 [==============================] - 9s 0us/step
-Downloading data from https://www.openslr.org/resources/83/northern_english_female.zip
-314990592/314983063 [==============================] - 15s 0us/step
-314998784/314983063 [==============================] - 15s 0us/step
-Downloading data from https://www.openslr.org/resources/83/northern_english_male.zip
-817774592/817772034 [==============================] - 39s 0us/step
-817782784/817772034 [==============================] - 39s 0us/step
-Downloading data from https://www.openslr.org/resources/83/scottish_english_female.zip
-351444992/351443880 [==============================] - 17s 0us/step
-351453184/351443880 [==============================] - 17s 0us/step
-Downloading data from https://www.openslr.org/resources/83/scottish_english_male.zip
-620257280/620254118 [==============================] - 30s 0us/step
-620265472/620254118 [==============================] - 30s 0us/step
-Downloading data from https://www.openslr.org/resources/83/southern_english_female.zip
-1636704256/1636701939 [==============================] - 77s 0us/step
-1636712448/1636701939 [==============================] - 77s 0us/step
-Downloading data from https://www.openslr.org/resources/83/southern_english_male.zip
-1700962304/1700955740 [==============================] - 79s 0us/step
-1700970496/1700955740 [==============================] - 79s 0us/step
-Downloading data from https://www.openslr.org/resources/83/welsh_english_female.zip
-595689472/595683538 [==============================] - 29s 0us/step
-595697664/595683538 [==============================] - 29s 0us/step
-Downloading data from https://www.openslr.org/resources/83/welsh_english_male.zip
-757653504/757645790 [==============================] - 37s 0us/step
-757661696/757645790 [==============================] - 37s 0us/step
-
-```
-
----
-## Load the data in a Dataframe
-
-Of the 3 columns (ID, filename and transcript), we are only interested in the filename column in order to read the audio file.
-We will ignore the other two.
-
-
-```python
-dataframe = pd.read_csv(
- line_index_file, names=["id", "filename", "transcript"], usecols=["filename"]
-)
-dataframe.head()
-```
-
-| | filename |
-|---|---|
-| 0 | wef_12484_01482829612 |
-| 1 | wef_12484_01345932698 |
-| 2 | wef_12484_00999757777 |
-| 3 | wef_12484_00036278823 |
-| 4 | wef_12484_00458512623 |
-
-Let's now preprocess the dataset by:
-
-* Adjusting the filename (removing a leading space & adding ".wav" extension to the
-filename).
-* Creating a label using the first 2 characters of the filename which indicate the
-accent.
-* Shuffling the samples.
-
-
-```python
-# The purpose of this function is to preprocess the dataframe by applying the following:
-# - Cleaning the filename from a leading space
-# - Generating a label column that is gender agnostic i.e.
-# welsh english male and welsh english female for example are both labeled as
-# welsh english
-# - Add extension .wav to the filename
-# - Shuffle samples
-def preprocess_dataframe(dataframe):
- # Remove leading space in filename column
- dataframe["filename"] = dataframe.apply(lambda row: row["filename"].strip(), axis=1)
-
- # Create gender agnostic labels based on the filename first 2 letters
- dataframe["label"] = dataframe.apply(
- lambda row: gender_agnostic_categories.index(row["filename"][:2]), axis=1
- )
-
- # Add the file path to the name
- dataframe["filename"] = dataframe.apply(
- lambda row: os.path.join(DATASET_DESTINATION, row["filename"] + ".wav"), axis=1
- )
-
- # Shuffle the samples
- dataframe = dataframe.sample(frac=1, random_state=SEED).reset_index(drop=True)
-
- return dataframe
-
-
-dataframe = preprocess_dataframe(dataframe)
-dataframe.head()
-```
-
-| | filename | label |
-|---|---|---|
-| 0 | /root/.keras/datasets/som_03853_01027933689.wav | 4 |
-| 1 | /root/.keras/datasets/som_04310_01833253760.wav | 4 |
-| 2 | /root/.keras/datasets/sof_06136_01210700905.wav | 4 |
-| 3 | /root/.keras/datasets/som_02484_00261230384.wav | 4 |
-| 4 | /root/.keras/datasets/nom_06136_00616878975.wav | 2 |
-
----
-## Prepare training & validation sets
-
-Let's split the samples creating training and validation sets.
-
-
-```python
-split = int(len(dataframe) * (1 - VALIDATION_RATIO))
-train_df = dataframe[:split]
-valid_df = dataframe[split:]
-
-print(
- f"We have {train_df.shape[0]} training samples & {valid_df.shape[0]} validation ones"
-)
-```
-
-
-```
-We have 16089 training samples & 1788 validation ones
-
-```
-
----
-## Prepare a TensorFlow Dataset
-
-Next, we need to create a `tf.data.Dataset`.
-This is done by creating a `dataframe_to_dataset` function that does the following:
-
-* Create a dataset using filenames and labels.
-* Get the Yamnet embeddings by calling another function `filepath_to_embeddings`.
-* Apply caching, reshuffling and setting batch size.
-
-The `filepath_to_embeddings` function does the following:
-
-* Load the audio file.
-* Resample the audio to 16 kHz.
-* Generate scores and embeddings from the Yamnet model.
-* Since Yamnet generates multiple frames for each audio file, it also duplicates
-the label for every generated frame: frames whose top Yamnet score is class 0
-(speech) keep the accent label, while all other frames are labeled with the last
-category ('Not a speech'), since we won't attribute a non-speech segment to any accent.
-
-The `load_16k_audio_wav` function below is copied from the following tutorial:
-[Transfer learning with YAMNet for environmental sound classification](https://www.tensorflow.org/tutorials/audio/transfer_learning_audio)
-
-
-```python
-
-@tf.function
-def load_16k_audio_wav(filename):
- # Read file content
- file_content = tf.io.read_file(filename)
-
- # Decode audio wave
- audio_wav, sample_rate = tf.audio.decode_wav(file_content, desired_channels=1)
- audio_wav = tf.squeeze(audio_wav, axis=-1)
- sample_rate = tf.cast(sample_rate, dtype=tf.int64)
-
- # Resample to 16k
- audio_wav = tfio.audio.resample(audio_wav, rate_in=sample_rate, rate_out=16000)
-
- return audio_wav
-
-
-def filepath_to_embeddings(filename, label):
- # Load 16k audio wave
- audio_wav = load_16k_audio_wav(filename)
-
- # Get audio embeddings & scores.
- # The embeddings are the audio features extracted using transfer learning
- # while scores will be used to identify time slots that are not speech
- # which will then be gathered into a specific new category 'other'
- scores, embeddings, _ = yamnet_model(audio_wav)
-
- # Number of embeddings in order to know how many times to repeat the label
- embeddings_num = tf.shape(embeddings)[0]
- labels = tf.repeat(label, embeddings_num)
-
-    # Change labels for time-slots that are not speech into the 'Not a speech' category
-    labels = tf.where(tf.argmax(scores, axis=1) == 0, labels, len(class_names) - 1)
-
- # Using one-hot in order to use AUC
- return (embeddings, tf.one_hot(labels, len(class_names)))
-
-
-def dataframe_to_dataset(dataframe, batch_size=64):
- dataset = tf.data.Dataset.from_tensor_slices(
- (dataframe["filename"], dataframe["label"])
- )
-
- dataset = dataset.map(
- lambda x, y: filepath_to_embeddings(x, y),
- num_parallel_calls=tf.data.experimental.AUTOTUNE,
- ).unbatch()
-
- return dataset.cache().batch(batch_size).prefetch(tf.data.AUTOTUNE)
-
-
-train_ds = dataframe_to_dataset(train_df)
-valid_ds = dataframe_to_dataset(valid_df)
-```
-
----
-## Build the model
-
-The model that we use consists of:
-
-* An input layer which is the embedding output of the Yamnet classifier.
-* 4 dense hidden layers and 4 dropout layers.
-* An output dense layer.
-
-The model's hyperparameters were selected using
-[KerasTuner](https://keras.io/keras_tuner/).
-
-
-```python
-keras.backend.clear_session()
-
-
-def build_and_compile_model():
-    inputs = keras.layers.Input(shape=(1024,), name="embedding")
-
- x = keras.layers.Dense(256, activation="relu", name="dense_1")(inputs)
- x = keras.layers.Dropout(0.15, name="dropout_1")(x)
-
- x = keras.layers.Dense(384, activation="relu", name="dense_2")(x)
- x = keras.layers.Dropout(0.2, name="dropout_2")(x)
-
- x = keras.layers.Dense(192, activation="relu", name="dense_3")(x)
- x = keras.layers.Dropout(0.25, name="dropout_3")(x)
-
- x = keras.layers.Dense(384, activation="relu", name="dense_4")(x)
- x = keras.layers.Dropout(0.2, name="dropout_4")(x)
-
-    outputs = keras.layers.Dense(len(class_names), activation="softmax", name="output")(
- x
- )
-
- model = keras.Model(inputs=inputs, outputs=outputs, name="accent_recognition")
-
- model.compile(
- optimizer=keras.optimizers.Adam(learning_rate=1.9644e-5),
- loss=keras.losses.CategoricalCrossentropy(),
- metrics=["accuracy", keras.metrics.AUC(name="auc")],
- )
-
- return model
-
-
-model = build_and_compile_model()
-model.summary()
-```
-
-
----
-## Class weights calculation
-
-Since the dataset is quite unbalanced, we will use the `class_weight` argument during training.
-
-Getting the class weights is a little tricky because even though we know the number of
-audio files for each class, it does not represent the number of samples for that class
-since Yamnet transforms each audio file into multiple audio samples of 0.96 seconds each.
-So every audio file will be split into a number of samples that is proportional to its length.
-
-Therefore, to get those weights, we have to calculate the number of samples for each class
-after preprocessing through Yamnet.
-
-
-```python
-class_counts = tf.zeros(shape=(len(class_names),), dtype=tf.int32)
-
-for x, y in iter(train_ds):
- class_counts = class_counts + tf.math.bincount(
- tf.cast(tf.math.argmax(y, axis=1), tf.int32), minlength=len(class_names)
- )
-
-class_weight = {
- i: tf.math.reduce_sum(class_counts).numpy() / class_counts[i].numpy()
- for i in range(len(class_counts))
-}
-
-print(class_weight)
-```
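-
-These weights can then be passed to `model.fit()`; a minimal sketch of the training
-call (with callbacks such as checkpointing or early stopping omitted) looks like this:
-
-
-```python
-history = model.fit(
-    train_ds,
-    epochs=EPOCHS,
-    validation_data=valid_ds,
-    class_weight=class_weight,
-)
-```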
-
-
-After training, we can see that the model achieves the following results:
-
-Results | Training | Validation
------------|-----------|------------
-Accuracy | 54% | 51%
-AUC | 0.91 | 0.89
-d-prime | 1.882 | 1.740
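-
-Note that d-prime can be derived from AUC via the standard relation
-`d' = sqrt(2) * Phi^-1(AUC)`, where `Phi^-1` is the inverse of the standard normal
-CDF. A quick check (the results differ slightly from the table because the AUC
-values above are rounded):
-
-
-```python
-def d_prime_from_auc(auc):
-    # `ppf` is scipy's name for the normal quantile function (inverse CDF).
-    return np.sqrt(2) * stats.norm.ppf(auc)
-
-
-print(d_prime_from_auc(0.91), d_prime_from_auc(0.89))  # ~1.90 and ~1.73
-```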
-
----
-## Confusion Matrix
-
-Let's now plot the confusion matrix for the validation dataset.
-
-The confusion matrix lets us see, for every class, not only how many samples were correctly classified,
-but also which other classes the samples were confused with.
-
-It allows us to calculate the precision and recall for every class.
-
-
-```python
-# Create x and y tensors
-x_valid = None
-y_valid = None
-
-for x, y in iter(valid_ds):
- if x_valid is None:
- x_valid = x.numpy()
- y_valid = y.numpy()
- else:
- x_valid = np.concatenate((x_valid, x.numpy()), axis=0)
- y_valid = np.concatenate((y_valid, y.numpy()), axis=0)
-
-# Generate predictions
-y_pred = model.predict(x_valid)
-
-# Calculate confusion matrix
-confusion_mtx = tf.math.confusion_matrix(
- np.argmax(y_valid, axis=1), np.argmax(y_pred, axis=1)
-)
-
-# Plot the confusion matrix
-plt.figure(figsize=(10, 8))
-sns.heatmap(
- confusion_mtx, xticklabels=class_names, yticklabels=class_names, annot=True, fmt="g"
-)
-plt.xlabel("Prediction")
-plt.ylabel("Label")
-plt.title("Validation Confusion Matrix")
-plt.show()
-```
-
----
-## Precision & recall
-
-For every class:
-
-* Recall is the ratio of correctly classified samples for a class, i.e. it shows how many
-samples of this specific class the model is able to detect.
-It is the ratio of the diagonal element to the sum of all elements in the row.
-* Precision shows the accuracy of the classifier for a class, i.e. the ratio of correctly
-predicted samples among all the samples classified as belonging to this class.
-It is the ratio of the diagonal element to the sum of all elements in the column.
-
-
-```python
-for i, label in enumerate(class_names):
- precision = confusion_mtx[i, i] / np.sum(confusion_mtx[:, i])
- recall = confusion_mtx[i, i] / np.sum(confusion_mtx[i, :])
- print(
- "{0:15} Precision:{1:.2f}%; Recall:{2:.2f}%".format(
- label, precision * 100, recall * 100
- )
- )
-```
-
-
----
-## Run inference on test data
-
-Let's now run a test on a single audio file, using this example from
-[The Scottish Voice](https://www.thescottishvoice.org.uk/home/).
-
-We will:
-
-* Download the mp3 file.
-* Convert it to a 16k wav file.
-* Run the model on the wav file.
-* Plot the results.
-
-
-```python
-filename = "audio-sample-Stuart"
-url = "https://www.thescottishvoice.org.uk/files/cm/files/"
-
-if not os.path.exists(filename + ".wav"):
- print(f"Downloading {filename}.mp3 from {url}")
- command = f"wget {url}{filename}.mp3"
- os.system(command)
-
-    print("Converting mp3 to wav and resampling to 16 kHz")
- command = (
- f"ffmpeg -hide_banner -loglevel panic -y -i {filename}.mp3 -acodec "
- f"pcm_s16le -ac 1 -ar 16000 {filename}.wav"
- )
- os.system(command)
-
-filename = filename + ".wav"
-
-```
-
-
-```
-Downloading audio-sample-Stuart.mp3 from https://www.thescottishvoice.org.uk/files/cm/files/
-Converting mp3 to wav and resampling to 16 kHz
-
-```
-
-The below function `yamnet_class_names_from_csv` was copied and very slightly changed
-from this [Yamnet Notebook](https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/yamnet.ipynb).
-
-
-```python
-
-def yamnet_class_names_from_csv(yamnet_class_map_csv_text):
- """Returns list of class names corresponding to score vector."""
- yamnet_class_map_csv = io.StringIO(yamnet_class_map_csv_text)
- yamnet_class_names = [
- name for (class_index, mid, name) in csv.reader(yamnet_class_map_csv)
- ]
- yamnet_class_names = yamnet_class_names[1:] # Skip CSV header
- return yamnet_class_names
-
-
-yamnet_class_map_path = yamnet_model.class_map_path().numpy()
-yamnet_class_names = yamnet_class_names_from_csv(
- tf.io.read_file(yamnet_class_map_path).numpy().decode("utf-8")
-)
-
-
-def calculate_number_of_non_speech(scores):
- number_of_non_speech = tf.math.reduce_sum(
- tf.where(tf.math.argmax(scores, axis=1, output_type=tf.int32) != 0, 1, 0)
- )
-
- return number_of_non_speech
-
-
-def filename_to_predictions(filename):
- # Load 16k audio wave
- audio_wav = load_16k_audio_wav(filename)
-
- # Get audio embeddings & scores.
- scores, embeddings, mel_spectrogram = yamnet_model(audio_wav)
-
- print(
- "Out of {} samples, {} are not speech".format(
- scores.shape[0], calculate_number_of_non_speech(scores)
- )
- )
-
- # Predict the output of the accent recognition model with embeddings as input
- predictions = model.predict(embeddings)
-
- return audio_wav, predictions, mel_spectrogram
-
-```
-
-Let's run the model on the audio file:
-
-
-```python
-audio_wav, predictions, mel_spectrogram = filename_to_predictions(filename)
-
-inferred_class = class_names[predictions.mean(axis=0).argmax()]
-print(f"The main accent is: {inferred_class} English")
-```
-
-
-```
-Out of 66 samples, 0 are not speech
-The main accent is: Scottish English
-
-```
-
-Let's listen to the audio:
-
-
-```python
-Audio(audio_wav, rate=16000)
-```
-
-The plotting code below was copied from this [Yamnet notebook](https://tinyurl.com/4a8xn7at) and adjusted to our needs.
-
-It plots the following:
-
-* Audio waveform
-* Mel spectrogram
-* Predictions for every time step
-
-
-```python
-plt.figure(figsize=(10, 6))
-
-# Plot the waveform.
-plt.subplot(3, 1, 1)
-plt.plot(audio_wav)
-plt.xlim([0, len(audio_wav)])
-
-# Plot the log-mel spectrogram (returned by the model).
-plt.subplot(3, 1, 2)
-plt.imshow(
- mel_spectrogram.numpy().T, aspect="auto", interpolation="nearest", origin="lower"
-)
-
-# Plot and label the model output scores for the top-scoring classes.
-mean_predictions = np.mean(predictions, axis=0)
-
-top_class_indices = np.argsort(mean_predictions)[::-1]
-plt.subplot(3, 1, 3)
-plt.imshow(
- predictions[:, top_class_indices].T,
- aspect="auto",
- interpolation="nearest",
- cmap="gray_r",
-)
-
-# patch_padding = (PATCH_WINDOW_SECONDS / 2) / PATCH_HOP_SECONDS
-# values from the model documentation
-patch_padding = (0.025 / 2) / 0.01
-plt.xlim([-patch_padding - 0.5, predictions.shape[0] + patch_padding - 0.5])
-# Label the top_N classes.
-yticks = range(0, len(class_names), 1)
-plt.yticks(yticks, [class_names[top_class_indices[x]] for x in yticks])
-_ = plt.ylim(-0.5 + np.array([len(class_names), 0]))
-```
-
diff --git a/templates/examples/audio/wav2vec2_audiocls.md b/templates/examples/audio/wav2vec2_audiocls.md
deleted file mode 100644
index b8817bac69..0000000000
--- a/templates/examples/audio/wav2vec2_audiocls.md
+++ /dev/null
@@ -1,482 +0,0 @@
-# Audio Classification with Hugging Face Transformers
-
-**Author:** Sreyan Ghosh
-**Date created:** 2022/07/01
-**Last modified:** 2022/08/27
-**Description:** Training Wav2Vec 2.0 using Hugging Face Transformers for Audio Classification.
-
-
-
-ⓘ This example uses Keras 2
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/audio/ipynb/wav2vec2_audiocls.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/audio/wav2vec2_audiocls.py)
-
-
-
----
-## Introduction
-
-Identification of speech commands, also known as *keyword spotting* (KWS),
-is important from an engineering perspective for a wide range of applications,
-from indexing audio databases and spotting keywords, to running speech models locally
-on microcontrollers. Currently, many human-computer interfaces (HCIs) like Google
-Assistant, Microsoft Cortana, Amazon Alexa, Apple Siri and others rely on keyword
-spotting. There is a significant amount of research in the field by all major companies,
-notably Google and Baidu.
-
-In the past decade, deep learning has led to significant performance
-gains on this task. Though low-level audio features extracted from raw audio, like MFCCs or
-mel-filterbanks, have been used for decades, the design of these low-level features
-is [flawed by biases](https://arxiv.org/abs/2101.08596). Moreover, deep learning models
-trained on these low-level features can easily overfit to noise or signals irrelevant to the
-task. This makes it essential for any system to learn speech representations that make
-high-level information, such as acoustic and linguistic content, including phonemes,
-words, semantic meanings, tone and speaker characteristics, available from the speech signal
-for solving the downstream task. [Wav2Vec 2.0](https://arxiv.org/abs/2006.11477), which solves a
-self-supervised contrastive learning task to learn high-level speech representations,
-provides a great alternative to traditional low-level features for training deep learning
-models for KWS.
-
-In this notebook, we train the Wav2Vec 2.0 (base) model, built on the
-Hugging Face Transformers library, in an end-to-end fashion on the keyword spotting task and
-achieve state-of-the-art results on the Google Speech Commands Dataset.
-
----
-## Setup
-
-### Installing the requirements
-
-
-```python
-pip install git+https://github.com/huggingface/transformers.git
-pip install datasets
-pip install huggingface-hub
-pip install joblib
-pip install librosa
-```
-
-### Importing the necessary libraries
-
-
-```python
-import random
-import logging
-
-import numpy as np
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-
-# Only log error messages
-tf.get_logger().setLevel(logging.ERROR)
-# Set random seed
-tf.keras.utils.set_random_seed(42)
-```
-
-### Define certain variables
-
-
-```python
-# Maximum duration of the input audio file we feed to our Wav2Vec 2.0 model.
-MAX_DURATION = 1
-# Sampling rate is the number of samples of audio recorded every second
-SAMPLING_RATE = 16000
-BATCH_SIZE = 32 # Batch-size for training and evaluating our model.
-NUM_CLASSES = 10  # Number of classes our model will predict (the ten keyword classes).
-HIDDEN_DIM = 768 # Dimension of our model output (768 in case of Wav2Vec 2.0 - Base).
-MAX_SEQ_LENGTH = MAX_DURATION * SAMPLING_RATE # Maximum length of the input audio file.
-# Wav2Vec 2.0 produces one output frame per ~20 ms of audio (stride of 320 samples).
-MAX_FRAMES = 49
-MAX_EPOCHS = 2 # Maximum number of training epochs.
-
-MODEL_CHECKPOINT = "facebook/wav2vec2-base" # Name of pretrained model from Hugging Face Model Hub
-```
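-
-As a sanity check on `MAX_FRAMES`: the Wav2Vec 2.0 (base) feature encoder is a stack
-of seven 1D convolutions with kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides
-(5, 2, 2, 2, 2, 2, 2), i.e. an overall stride of 320 samples (20 ms at 16 kHz), so a
-one-second input yields 49 output frames. An illustrative computation:
-
-
-```python
-def wav2vec2_output_frames(num_samples):
-    for kernel, stride in zip((10, 3, 3, 3, 3, 2, 2), (5, 2, 2, 2, 2, 2, 2)):
-        num_samples = (num_samples - kernel) // stride + 1
-    return num_samples
-
-
-print(wav2vec2_output_frames(16000))  # 49 -> matches MAX_FRAMES
-```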
-
----
-## Load the Google Speech Commands Dataset
-
-We now download the [Google Speech Commands V1 Dataset](https://arxiv.org/abs/1804.03209),
-a popular benchmark for training and evaluating deep learning models built for solving the KWS task.
-The dataset consists of a total of 60,973 audio files, each of 1 second duration,
-divided into ten classes of keywords ("Yes", "No", "Up", "Down", "Left", "Right", "On",
-"Off", "Stop", and "Go"), a class for silence, and an unknown class to include false
-positives. We load the dataset from [Hugging Face Datasets](https://github.com/huggingface/datasets).
-This can be easily done with the `load_dataset` function.
-
-
-```python
-from datasets import load_dataset
-
-speech_commands_v1 = load_dataset("superb", "ks")
-```
-
-The dataset has the following fields:
-
-- **file**: the path to the raw .wav file of the audio
-- **audio**: the audio file sampled at 16kHz
-- **label**: label ID of the audio utterance
-
-
-```python
-print(speech_commands_v1)
-```
-
-
----
-## Data Pre-processing
-
-For the sake of demonstrating the workflow, in this notebook we only take
-small stratified, balanced splits (50%) of the training set as our training and test sets.
-We can easily split the dataset using the `train_test_split` method, which expects
-the split size and the name of the column to stratify on.
-
-After splitting the dataset, we remove the `unknown` and `silence` classes and only
-focus on the ten main classes. The `filter` method does that easily for you.
-
-Next we trim our train and test splits to a multiple of `BATCH_SIZE` to
-facilitate smooth training and inference. You can achieve that using the `select`
-method, which expects the indices of the samples you want to keep. The rest are
-discarded.
-
-
-```python
-speech_commands_v1 = speech_commands_v1["train"].train_test_split(
- train_size=0.5, test_size=0.5, stratify_by_column="label"
-)
-
-# Drop the `_unknown_` and `_silence_` classes, keeping only the ten keywords.
-unknown_index = speech_commands_v1["train"].features["label"].names.index("_unknown_")
-silence_index = speech_commands_v1["train"].features["label"].names.index("_silence_")
-speech_commands_v1 = speech_commands_v1.filter(
-    lambda x: x["label"] not in (unknown_index, silence_index)
-)
-
-speech_commands_v1["train"] = speech_commands_v1["train"].select(
- [i for i in range((len(speech_commands_v1["train"]) // BATCH_SIZE) * BATCH_SIZE)]
-)
-speech_commands_v1["test"] = speech_commands_v1["test"].select(
- [i for i in range((len(speech_commands_v1["test"]) // BATCH_SIZE) * BATCH_SIZE)]
-)
-
-print(speech_commands_v1)
-```
-
-
-Before we can feed the audio utterance samples to our model, we need to
-pre-process them. This is done by a Hugging Face Transformers "Feature Extractor"
-which will (as the name indicates) re-sample your inputs to the sampling rate
-the model expects (in case they have a different sampling rate), as well
-as generate the other inputs that the model requires.
-
-To do all of this, we instantiate our `Feature Extractor` with the
-`AutoFeatureExtractor.from_pretrained` method, which ensures that:
-
-- We get a `Feature Extractor` that corresponds to the model architecture we want to use.
-- We download the config that was used when pretraining this specific checkpoint.
-  This will be cached so that it's not downloaded again the next time we run the cell.
-
-The `from_pretrained()` method expects the name of a model on the Hugging Face Hub, which
-is exactly what `MODEL_CHECKPOINT` contains, so we just pass that.
-
-We write a simple function that helps us in the pre-processing that is compatible
-with Hugging Face Datasets. To summarize, our pre-processing function should:
-
-- Call the audio column to load and, if necessary, resample the audio file.
-- Check that the sampling rate of the audio file matches the sampling rate of the audio
-data the model was pretrained with. You can find this information on the Wav2Vec 2.0 model card.
-- Set a maximum input length so that longer inputs are batched without being truncated.
-
-
-```python
-from transformers import AutoFeatureExtractor
-
-feature_extractor = AutoFeatureExtractor.from_pretrained(
- MODEL_CHECKPOINT, return_attention_mask=True
-)
-
-
-def preprocess_function(examples):
- audio_arrays = [x["array"] for x in examples["audio"]]
- inputs = feature_extractor(
- audio_arrays,
- sampling_rate=feature_extractor.sampling_rate,
- max_length=MAX_SEQ_LENGTH,
- truncation=True,
- padding=True,
- )
- return inputs
-
-
-# This line will pre-process our speech_commands_v1 dataset. We also remove the "audio"
-# and "file" columns as they will be of no use to us while training.
-processed_speech_commands_v1 = speech_commands_v1.map(
- preprocess_function, remove_columns=["audio", "file"], batched=True
-)
-
-# Load the whole dataset splits as a dict of numpy arrays
-train = processed_speech_commands_v1["train"].shuffle(seed=42).with_format("numpy")[:]
-test = processed_speech_commands_v1["test"].shuffle(seed=42).with_format("numpy")[:]
-```
-
----
-## Defining the Wav2Vec 2.0 with Classification-Head
-
-We now define our model. To be precise, we define a Wav2Vec 2.0 model and add a
-Classification-Head on top to output a probability distribution over all classes for each
-input audio sample. Since the model might get complex, we first define the Wav2Vec
-2.0 model with the Classification-Head as a Keras layer and then build the model using that.
-
-We instantiate our main Wav2Vec 2.0 model using the `TFWav2Vec2Model` class. This will
-instantiate a model which will output 768- or 1024-dimensional embeddings according to
-the config you choose (BASE or LARGE). The `from_pretrained()` method additionally helps you
-load pre-trained weights from the Hugging Face Model Hub. It will download the pre-trained weights
-together with the config corresponding to the name of the model you have mentioned when
-calling the method. For our task, we choose the BASE variant of the model that has
-only been pre-trained (not fine-tuned), since we fine-tune it ourselves.
-
-
-```python
-from transformers import TFWav2Vec2Model
-
-
-def mean_pool(hidden_states, feature_lengths):
-    attention_mask = tf.sequence_mask(
-        feature_lengths, maxlen=MAX_FRAMES, dtype=tf.dtypes.int64
-    )
-    padding_mask = tf.cast(
-        tf.reverse(tf.cumsum(tf.reverse(attention_mask, [-1]), -1), [-1]),
- dtype=tf.dtypes.bool,
- )
- hidden_states = tf.where(
- tf.broadcast_to(
- tf.expand_dims(~padding_mask, -1), (BATCH_SIZE, MAX_FRAMES, HIDDEN_DIM)
- ),
- 0.0,
- hidden_states,
- )
- pooled_state = tf.math.reduce_sum(hidden_states, axis=1) / tf.reshape(
- tf.math.reduce_sum(tf.cast(padding_mask, dtype=tf.dtypes.float32), axis=1),
- [-1, 1],
- )
- return pooled_state
-
-
-class TFWav2Vec2ForAudioClassification(layers.Layer):
-    """Combines Wav2Vec 2.0 with a Classification-Head for end-to-end training."""
-
- def __init__(self, model_checkpoint, num_classes):
- super().__init__()
- # Instantiate the Wav2Vec 2.0 model without the Classification-Head
- self.wav2vec2 = TFWav2Vec2Model.from_pretrained(
- model_checkpoint, apply_spec_augment=False, from_pt=True
- )
- self.pooling = layers.GlobalAveragePooling1D()
- # Drop-out layer before the final Classification-Head
- self.intermediate_layer_dropout = layers.Dropout(0.5)
- # Classification-Head
- self.final_layer = layers.Dense(num_classes, activation="softmax")
-
- def call(self, inputs):
- # We take only the first output in the returned dictionary corresponding to the
- # output of the last layer of Wav2vec 2.0
- hidden_states = self.wav2vec2(inputs["input_values"])[0]
-
- # If attention mask does exist then mean-pool only un-masked output frames
- if tf.is_tensor(inputs["attention_mask"]):
- # Get the length of each audio input by summing up the attention_mask
- # (attention_mask = (BATCH_SIZE x MAX_SEQ_LENGTH) ∈ {1,0})
- audio_lengths = tf.cumsum(inputs["attention_mask"], -1)[:, -1]
- # Get the number of Wav2Vec 2.0 output frames for each corresponding audio input
- # length
- feature_lengths = self.wav2vec2.wav2vec2._get_feat_extract_output_lengths(
- audio_lengths
- )
- pooled_state = mean_pool(hidden_states, feature_lengths)
-        # If the attention mask does not exist, mean-pool all output frames
- else:
- pooled_state = self.pooling(hidden_states)
-
- intermediate_state = self.intermediate_layer_dropout(pooled_state)
- final_state = self.final_layer(intermediate_state)
-
- return final_state
-
-```
-
----
-## Building and Compiling the model
-
-We now build and compile our model. We use the `SparseCategoricalCrossentropy` loss
-to train our model since it is a classification task. Following much of the literature,
-we evaluate our model on the `accuracy` metric.
-
-
-```python
-
-def build_model():
- # Model's input
- inputs = {
- "input_values": tf.keras.Input(shape=(MAX_SEQ_LENGTH,), dtype="float32"),
- "attention_mask": tf.keras.Input(shape=(MAX_SEQ_LENGTH,), dtype="int32"),
- }
- # Instantiate the Wav2Vec 2.0 model with Classification-Head using the desired
- # pre-trained checkpoint
- wav2vec2_model = TFWav2Vec2ForAudioClassification(MODEL_CHECKPOINT, NUM_CLASSES)(
- inputs
- )
- # Model
- model = tf.keras.Model(inputs, wav2vec2_model)
- # Loss
- loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
- # Optimizer
- optimizer = keras.optimizers.Adam(learning_rate=1e-5)
- # Compile and return
- model.compile(loss=loss, optimizer=optimizer, metrics=["accuracy"])
- return model
-
-
-model = build_model()
-```
-
----
-## Training the model
-
-Before we start training our model, we split the inputs into
-dependent and independent variables.
-
-
-```python
-# Remove targets from training dictionaries
-train_x = {x: y for x, y in train.items() if x != "label"}
-test_x = {x: y for x, y in test.items() if x != "label"}
-```
-
-And now we can finally start training our model.
-
-
-```python
-model.fit(
- train_x,
- train["label"],
- validation_data=(test_x, test["label"]),
- batch_size=BATCH_SIZE,
- epochs=MAX_EPOCHS,
-)
-```
-
-
-Great! Now that we have trained our model, we predict the classes
-for audio samples in the test set using the `model.predict()` method! We see that
-the model predictions are not that great, as it has been trained on a very small
-number of samples for just a couple of epochs. For best results, we recommend training on
-the complete dataset for at least 5 epochs!
-
-
-```python
-preds = model.predict(test_x)
-```
-
-
-Now we run inference with the trained model on a randomly sampled audio file.
-We listen to the audio file and then see how well the model was able to predict!
-
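-The snippet below uses an `id2label` mapping from label IDs to keyword names, which
-can be reconstructed from the dataset's label metadata (a sketch assuming
-`speech_commands_v1` is still in scope):
-
-```python
-labels = speech_commands_v1["train"].features["label"].names
-id2label = {str(i): label for i, label in enumerate(labels)}
-```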
-
-```python
-import IPython.display as ipd
-
-rand_int = random.randint(0, len(test_x["input_values"]) - 1)
-
-ipd.Audio(data=np.asarray(test_x["input_values"][rand_int]), autoplay=True, rate=16000)
-
-print("Original Label is ", id2label[str(test["label"][rand_int])])
-print("Predicted Label is ", id2label[str(np.argmax((preds[rand_int])))])
-```
-
-
-```
-Original Label is up
-Predicted Label is on
-
-```
-
-Now you can push this model to Hugging Face Model Hub and also share it with all your friends,
-family, favorite pets: they can all load it with the identifier
-`"your-username/the-name-you-picked"`, for instance:
-
-```python
-model.push_to_hub("wav2vec2-ks", organization="keras-io")
-feature_extractor.push_to_hub("wav2vec2-ks", organization="keras-io")
-```
-And after you push your model, this is how you can load it in the future:
-
-```python
-from transformers import TFWav2Vec2Model
-
-model = TFWav2Vec2Model.from_pretrained("your-username/my-awesome-model", from_pt=True)
-```
-
diff --git a/templates/examples/keras_rs/basic_ranking.md b/templates/examples/keras_rs/basic_ranking.md
deleted file mode 100644
index 2f11e2d353..0000000000
--- a/templates/examples/keras_rs/basic_ranking.md
+++ /dev/null
@@ -1,614 +0,0 @@
-# Recommending movies: ranking
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using a two tower model.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/basic_ranking.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/basic_ranking.py)
-
-
-
----
-## Introduction
-
-Recommender systems are often composed of two stages:
-
-1. The retrieval stage is responsible for selecting an initial set of hundreds
- of candidates from all possible candidates. The main objective of this model
- is to efficiently weed out all candidates that the user is not interested in.
- Because the retrieval model may be dealing with millions of candidates, it
- has to be computationally efficient.
-2. The ranking stage takes the outputs of the retrieval model and fine-tunes
- them to select the best possible handful of recommendations. Its task is to
- narrow down the set of items the user may be interested in to a shortlist of
- likely candidates.
-
-In this tutorial, we're going to focus on the second stage, ranking. If you are
-interested in the retrieval stage, have a look at our
-[retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial.
-
-In this tutorial, we're going to:
-
-1. Get our data and split it into a training and test set.
-2. Implement a ranking model.
-3. Fit and evaluate it.
-4. Test running predictions with the model.
-
-Let's begin by choosing JAX as the backend we want to run on, and import all
-the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-```
-
----
-## Preparing the dataset
-
-We're going to use the same data as the
-[retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial. The ratings are the objectives we are trying to predict.
-
-
-```python
-# Ratings data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-
-In the Movielens dataset, user IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map user IDs to integers from 0 to N-1. But as a simplification, we'll use the
-user id directly as an index in our model, in particular to look up the user
-embedding from the user embedding table. So we need to know the number of users.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-In the Movielens dataset, movie IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use the
-movie id directly as an index in our model, in particular to look up the movie
-embedding from the movie embedding table. So we need to know the number of
-movies.
-
-
-```python
-movies_count = movies.cardinality().numpy()
-```
-
-The inputs to the model are the user IDs and movie IDs, and the labels are the
-ratings.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- # Inputs are user IDs and movie IDs
- {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- },
- # Labels are ratings between 0 and 1.
- (x["user_rating"] - 1.0) / 4.0,
- )
-
-```
-
-We'll split the data by putting 80% of the ratings in the train set, and 20% in
-the test set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-### Architecture
-
-Ranking models do not face the same efficiency constraints as retrieval models
-do, and so we have a little bit more freedom in our choice of architectures.
-
-A model composed of multiple stacked dense layers is a relatively common
-architecture for ranking tasks. We can implement it as follows:
-
-
-```python
-
-class RankingModel(keras.Model):
- """Create the ranking model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Embedding table for users.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Embedding table for candidates.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # Predictions.
- self.ratings = keras.Sequential(
- [
- # Learn multiple dense layers.
- keras.layers.Dense(256, activation="relu"),
- keras.layers.Dense(64, activation="relu"),
- # Make rating predictions in the final layer.
- keras.layers.Dense(1),
- ]
- )
-
- def call(self, inputs):
- user_id, movie_id = inputs["user_id"], inputs["movie_id"]
- user_embeddings = self.user_embedding(user_id)
- candidate_embeddings = self.candidate_embedding(movie_id)
- return self.ratings(
- keras.ops.concatenate([user_embeddings, candidate_embeddings], axis=1)
- )
-
-```
-
-Let's first instantiate the model. Note that we add `+ 1` to the number of users
-and movies to account for the fact that id zero is not used for either (IDs
-start at 1), but still takes a row in the embedding tables.
-
-
-```python
-model = RankingModel(users_count + 1, movies_count + 1)
-```
-
-### Loss and metrics
-
-The next component is the loss used to train our model. Keras has several losses
-to make this easy. In this instance, we'll make use of the `MeanSquaredError`
-loss in order to predict the ratings. We'll also look at the
-`RootMeanSquaredError` metric.
-
-
-```python
-model.compile(
- loss=keras.losses.MeanSquaredError(),
- metrics=[keras.metrics.RootMeanSquaredError()],
- optimizer=keras.optimizers.Adagrad(learning_rate=0.1),
-)
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-the model.
-
-
-```python
-model.fit(train_ratings, epochs=5)
-```
-
-
-As the model trains, the loss is falling and the RMSE metric is improving.
-
-Finally, we can evaluate our model on the test set. The lower the RMSE metric,
-the more accurate our model is at predicting ratings.
-
-
-```python
-model.evaluate(test_ratings, return_dict=True)
-```
-
-
- 1/20 ━━━━━━━━━━━━━━━━━━━━ 36s 2s/step - loss: 0.0732 - root_mean_squared_error: 0.2705
-
-
----
-## Testing the ranking model
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-Now we can test the ranking model by computing predictions for a set of movies
-and then rank these movies based on the predictions:
-
-
-```python
-user_id = 42
-movie_ids = [204, 141, 131]
-predictions = model.predict(
- {
- "user_id": keras.ops.array([user_id] * len(movie_ids)),
- "movie_id": keras.ops.array(movie_ids),
- }
-)
-predictions = keras.ops.convert_to_numpy(keras.ops.squeeze(predictions, axis=1))
-
-for movie_id, prediction in zip(movie_ids, predictions):
- print(f"{movie_id_to_movie_title[movie_id]}: {5.0 * prediction:,.2f}")
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 271ms/step
-
-
-```
-b'Back to the Future (1985)': 3.86
-b'20,000 Leagues Under the Sea (1954)': 3.93
-b"Breakfast at Tiffany's (1961)": 3.72
-
-```
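-
-Note that the movies above are printed in the order in which they were passed in. To
-actually rank them, we can sort by predicted score (a small extension of the example,
-reusing the same rescaling):
-
-
-```python
-ranked = sorted(zip(movie_ids, predictions), key=lambda pair: pair[1], reverse=True)
-for movie_id, score in ranked:
-    print(f"{movie_id_to_movie_title[movie_id]}: {5.0 * score:,.2f}")
-```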
-
diff --git a/templates/examples/keras_rs/basic_retrieval.md b/templates/examples/keras_rs/basic_retrieval.md
deleted file mode 100644
index f8e96c8393..0000000000
--- a/templates/examples/keras_rs/basic_retrieval.md
+++ /dev/null
@@ -1,2170 +0,0 @@
-# Recommending movies: retrieval
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Retrieve movies using a two tower model.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/basic_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/basic_retrieval.py)
-
-
-
----
-## Introduction
-
-Recommender systems are often composed of two stages:
-
-1. The retrieval stage is responsible for selecting an initial set of hundreds
- of candidates from all possible candidates. The main objective of this model
- is to efficiently weed out all candidates that the user is not interested in.
- Because the retrieval model may be dealing with millions of candidates, it
- has to be computationally efficient.
-2. The ranking stage takes the outputs of the retrieval model and fine-tunes
- them to select the best possible handful of recommendations. Its task is to
- narrow down the set of items the user may be interested in to a shortlist of
- likely candidates.
-
-In this tutorial, we're going to focus on the first stage, retrieval. If you are
-interested in the ranking stage, have a look at our
-[ranking](/keras_rs/examples/basic_ranking/) tutorial.
-
-Retrieval models are often composed of two sub-models:
-
-1. A query tower computing the query representation (normally a
- fixed-dimensionality embedding vector) using query features.
-2. A candidate tower computing the candidate representation (an equally-sized
- vector) using the candidate features. The outputs of the two models are then
- multiplied together to give a query-candidate affinity score, with higher
- scores expressing a better match between the candidate and the query.
-
-In this tutorial, we're going to build and train such a two-tower model using
-the Movielens dataset.
-
-We're going to:
-
-1. Get our data and split it into a training and test set.
-2. Implement a retrieval model.
-3. Fit and evaluate it.
-4. Test running predictions with the model.
-
-### The dataset
-
-The Movielens dataset is a classic dataset from the
-[GroupLens](https://grouplens.org/datasets/movielens/) research group at the
-University of Minnesota. It contains a set of ratings given to movies by a set
-of users, and is a standard for recommender systems research.
-
-The data can be treated in two ways:
-
-1. It can be interpreted as expressing which movies the users watched (and
-   rated), and which they did not. This is a form of implicit feedback, where
-   users' watches tell us which things they prefer to see and which they'd
-   rather not see.
-2. It can also be seen as expressing how much the users liked the movies they
-   did watch. This is a form of explicit feedback: given that a user watched a
-   movie, we can tell how much they liked it by looking at the rating they have
-   given.
-
-In this tutorial, we are focusing on a retrieval system: a model that predicts a
-set of movies from the catalogue that the user is likely to watch. For this, the
-model will try to predict the rating users would give to all the movies in the
-catalogue. We will therefore use the explicit rating data.
-
-Let's begin by choosing JAX as the backend we want to run on, and import all
-the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Preparing the dataset
-
-Let's first have a look at the data.
-
-We use the MovieLens dataset from
-[Tensorflow Datasets](https://www.tensorflow.org/datasets). Loading
-`movielens/100k_ratings` yields a `tf.data.Dataset` object containing the
-ratings alongside user and movie data. Loading `movielens/100k_movies` yields a
-`tf.data.Dataset` object containing only the movies data.
-
-Note that since the MovieLens dataset does not have predefined splits, all of its
-data are under the `train` split.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-The ratings dataset returns a dictionary of movie id, user id, the assigned
-rating, timestamp, movie information, and user information:
-
-
-```python
-for data in ratings.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the Movielens dataset, user IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map user IDs to integers from 0 to N-1. But as a simplification, we'll use the
-user id directly as an index in our model, in particular to look up the user
-embedding from the user embedding table. So we need to know the number of users.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-The movies dataset contains the movie id, movie title, and the genres it belongs
-to. Note that the genres are encoded with integer labels.
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the Movielens dataset, movie IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use the
-movie id directly as an index in our model, in particular to look up the movie
-embedding from the movie embedding table. So we need to know the number of
-movies.
-
-
-```python
-movies_count = movies.cardinality().numpy()
-```
-
-In this example, we're going to focus on the ratings data. Other tutorials
-explore how to use the movie information data as well as the user information to
-improve the model quality.
-
-We keep only the `user_id`, `movie_id` and `rating` fields in the dataset. Our
-input is the `user_id`. The labels are the `movie_id` alongside the `rating` for
-the given movie and user.
-
-The `rating` is a number between 1 and 5; we rescale it to be between 0 and 1.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-```
-
-To fit and evaluate the model, we need to split the data into a training and an
-evaluation set. In a real recommender system, this would most likely be done by
-time: the data up to time *T* would be used to predict interactions after *T*.
-
-In this simple example, however, let's use a random split, putting 80% of the
-ratings in the train set, and 20% in the test set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-Choosing the architecture of our model is a key part of modelling.
-
-We are building a two-tower retrieval model, therefore we need to combine a
-query tower for users and a candidate tower for movies.
-
-The first step is to decide on the dimensionality of the query and candidate
-representations. This is the `embedding_dimension` argument in our model
-constructor. We'll test with a value of `32`. Higher values will correspond to
-models that may be more accurate, but will also be slower to fit and more prone
-to overfitting.
-
-### Query and Candidate Towers
-
-The second step is to define the model itself. In this simple example, the query
-tower and candidate tower are simply embeddings with nothing else. We'll use
-Keras' `Embedding` layer.
-
-We can easily extend the towers to make them arbitrarily complex using standard
-Keras components, as long as we return an `embedding_dimension`-wide output at
-the end.
-
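-For instance, a deeper query tower might look like the following (a sketch to
-illustrate the point; this tutorial itself sticks to plain embeddings):
-
-```python
-deeper_user_tower = keras.Sequential(
-    [
-        keras.layers.Embedding(users_count + 1, 64),
-        keras.layers.Dense(64, activation="relu"),
-        # The final layer must still be `embedding_dimension`-wide.
-        keras.layers.Dense(32),
-    ]
-)
-```
-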
-### Retrieval
-
-The retrieval itself will be performed by the `BruteForceRetrieval` layer from Keras
-Recommenders. This layer computes the affinity scores for the given users and
-all the candidate movies, then returns the top K in order.
-
-Note that during training, we don't actually need to perform any retrieval since
-the only affinity scores we need are the ones for the users and movies in the
-batch. As an optimization, we skip the retrieval entirely in the `call` method.
-
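-Conceptually, brute-force retrieval is just a dot product of each query embedding
-with every candidate embedding, followed by a top-K selection (a sketch of the
-idea, not the layer's actual implementation):
-
-```python
-def brute_force_top_k(user_embeddings, candidate_embeddings, k=10):
-    # Affinity of each user with every candidate: (num_users, num_candidates).
-    scores = keras.ops.matmul(
-        user_embeddings, keras.ops.transpose(candidate_embeddings)
-    )
-    # Indices of the k highest-scoring candidates, best first.
-    _, top_indices = keras.ops.top_k(scores, k=k)
-    return top_indices
-```
-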
-### Loss
-
-The next component is the loss used to train our model. In this case, we use a
-mean squared error loss to measure the difference between the predicted movie
-ratings and the actual ratings from users.
-
-Note that we override `compute_loss` from the `keras.Model` class. This allows
-us to compute the query-candidate affinity score, which is obtained by
-multiplying the outputs of the two towers together. That affinity score can then
-be passed to the loss function.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Create the retrieval model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-and evaluate the model.
-
-Let's first instantiate the model. Note that we add `+ 1` to the number of users
-and movies to account for the fact that id zero is not used for either (IDs
-start at 1), but still takes a row in the embedding tables.
-
-
-```python
-model = RetrievalModel(users_count + 1, movies_count + 1)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1))
-```
-
-Then train the model. Evaluation takes a bit of time, so we only evaluate the
-model every 5 epochs.
-
-
-```python
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.4667 - val_loss: 0.4739
-
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable.
-
-
-```python
-user_id = 42
-predictions = model.predict(keras.ops.convert_to_tensor([user_id]))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-print(f"Recommended movies for user {user_id}:")
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 103ms/step
-
-
-```
-Recommended movies for user 42:
-b'Raiders of the Lost Ark (1981)'
-b'Godfather, The (1972)'
-b'Star Trek: The Wrath of Khan (1982)'
-b'Indiana Jones and the Last Crusade (1989)'
-b'Birdcage, The (1996)'
-b'Silence of the Lambs, The (1991)'
-b'Blade Runner (1982)'
-b'Aliens (1986)'
-b'Contact (1997)'
-b'Star Wars (1977)'
-
-```
-
----
-## Item-to-item recommendation
-
-In this model, we created a user-movie model. However, for some applications
-(for example, product detail pages) it's common to perform item-to-item (for
-example, movie-to-movie or product-to-product) recommendations.
-
-Training models like this would follow the same pattern as shown in this
-tutorial, but with different training data. Here, we had a user and a movie
-tower, and used (user, movie) pairs to train them. In an item-to-item model, we
-would have two item towers (for the query and candidate item), and train the
-model using (query item, candidate item) pairs. These could be constructed from
-clicks on product detail pages.
-
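-As a hypothetical sketch, such pairs could be preprocessed much like the ratings
-above (here the example fields `query_movie_id` and `clicked_movie_id` are
-assumptions for illustration, not part of the MovieLens data):
-
-```python
-def preprocess_co_click(x):
-    return (
-        # Input: the item the user is currently viewing.
-        x["query_movie_id"],
-        # Label: a co-clicked item, treated as an implicit positive.
-        {"movie_id": x["clicked_movie_id"], "rating": 1.0},
-    )
-```
-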
diff --git a/templates/examples/keras_rs/data_parallel_retrieval.md b/templates/examples/keras_rs/data_parallel_retrieval.md
deleted file mode 100644
index 36ecb2d692..0000000000
--- a/templates/examples/keras_rs/data_parallel_retrieval.md
+++ /dev/null
@@ -1,4222 +0,0 @@
-# Retrieval with data parallel training
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Retrieve movies using a two-tower model (data parallel training).
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/data_parallel_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/data_parallel_retrieval.py)
-
-
-
----
-## Introduction
-
-In this tutorial, we are going to train the exact same retrieval model as we
-did in our
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial, but in a distributed way.
-
-Distributed training is used to train models on multiple devices or machines
-simultaneously, thereby reducing training time. Here, we focus on synchronous
-data parallel training. Each accelerator (GPU/TPU) holds a complete replica
-of the model, and sees a different mini-batch of the input data. Local gradients
-are computed on each device, aggregated and used to compute a global gradient
-update.
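-
-Framework-free, the aggregation step amounts to averaging the per-replica
-gradients before every replica applies the same update. A toy illustration:
-
-```python
-import numpy as np
-
-# Each replica computes a local gradient on its own mini-batch...
-local_grads = [np.array([0.2, -0.4]), np.array([0.4, 0.0])]
-
-# ...and the local gradients are aggregated (here, averaged) into a single
-# global update that every replica applies identically.
-global_grad = np.mean(local_grads, axis=0)  # -> [0.3, -0.2]
-```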
-
-Before we begin, let's note down a few things:
-
-1. The number of accelerators should be greater than 1 (if you only have one
-   device, see the tip below this list for simulating several).
-2. The `keras.distribution` API works only with JAX. So, make sure you select
- JAX as your backend!
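-
-A tip for local experimentation: if you only have a single CPU, you can ask XLA
-to simulate several host devices via the
-`--xla_force_host_platform_device_count` flag. It must be set before JAX
-initializes; the device count below is arbitrary:
-
-```python
-import os
-
-# Simulate 8 CPU devices for local experimentation.
-os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"
-```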
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax"
-
-import random
-
-import jax
-import keras
-import tensorflow as tf # Needed only for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Data Parallel
-
-For the synchronous data parallelism strategy in distributed training,
-we will use the `DataParallel` class present in the `keras.distribution`
-API.
-
-
-```python
-devices = jax.devices()  # Assumes there is more than one local device.
-data_parallel = keras.distribution.DataParallel(devices=devices)
-```
-
-Alternatively, you can choose to create the `DataParallel` object
-using a 1D `DeviceMesh` object, like so:
-
-```
-mesh_1d = keras.distribution.DeviceMesh(
- shape=(len(devices),), axis_names=["data"], devices=devices
-)
-data_parallel = keras.distribution.DataParallel(device_mesh=mesh_1d)
-```
-
-
-```python
-# Set the global distribution strategy.
-keras.distribution.set_distribution(data_parallel)
-```
-
----
-## Preparing the dataset
-
-Now that we are done defining the global distribution
-strategy, the rest of the guide looks exactly the same
-as the previous basic retrieval guide.
-
-Let's load and prepare the dataset. Here too, we use the
-MovieLens dataset.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-# User, movie counts for defining vocabularies.
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-movies_count = movies.cardinality().numpy()
-
-
-# Preprocess dataset, and split it into train-test datasets.
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
-
----
-## Implementing the Model
-
-We build the same two-tower retrieval model as before, combining a query tower
-for users with a candidate tower for movies. Note that we don't have to change
-anything here compared to the previous basic retrieval tutorial.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Create the retrieval model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-and evaluate the model.
-
-
-```python
-model = RetrievalModel(users_count + 1, movies_count + 1)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.2))
-```
-
-Let's train the model. Evaluation takes a bit of time, so we only evaluate the
-model every 5 epochs.
-
-
-```python
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 0.1620 - val_loss: 0.1660
-
-
----
-## Making predictions
-
-Now that we have a model, let's run inference and make predictions.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-
-```python
-user_ids = random.sample(range(1, 1001), len(devices))
-predictions = model.predict(keras.ops.convert_to_tensor(user_ids))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-for i, user_id in enumerate(user_ids):
- print(f"\n==Recommended movies for user {user_id}==")
- for movie_id in predictions[i]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 204ms/step
-
-
-
-
-
-```
-==Recommended movies for user 449==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Silence of the Lambs, The (1991)'
-b'Shawshank Redemption, The (1994)'
-b'Pulp Fiction (1994)'
-b'Raiders of the Lost Ark (1981)'
-b"Schindler's List (1993)"
-b'Blade Runner (1982)'
-b"One Flew Over the Cuckoo's Nest (1975)"
-b'Casablanca (1942)'
-```
-
-
-
-```
-==Recommended movies for user 681==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 151==
-b'Princess Bride, The (1987)'
-b'Pulp Fiction (1994)'
-b'English Patient, The (1996)'
-b'Alien (1979)'
-b'Raiders of the Lost Ark (1981)'
-b'Willy Wonka and the Chocolate Factory (1971)'
-b'Amadeus (1984)'
-b'Liar Liar (1997)'
-b'Psycho (1960)'
-b"It's a Wonderful Life (1946)"
-```
-
-
-
-```
-==Recommended movies for user 442==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b'Empire Strikes Back, The (1980)'
-b"Schindler's List (1993)"
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 134==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Pulp Fiction (1994)'
-b'Return of the Jedi (1983)'
-b'Empire Strikes Back, The (1980)'
-b'Twelve Monkeys (1995)'
-b'Contact (1997)'
-```
-
-
-
-```
-==Recommended movies for user 853==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Raiders of the Lost Ark (1981)'
-b'Silence of the Lambs, The (1991)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 707==
-b'Star Wars (1977)'
-b'Raiders of the Lost Ark (1981)'
-b'Toy Story (1995)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Return of the Jedi (1983)'
-b'Terminator, The (1984)'
-b'Princess Bride, The (1987)'
-```
-
-
-
-```
-==Recommended movies for user 511==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Raiders of the Lost Ark (1981)'
-b'Silence of the Lambs, The (1991)'
-b'Return of the Jedi (1983)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Pulp Fiction (1994)'
-b'Shawshank Redemption, The (1994)'
-
-```
-
-And we're done! For data parallel training, all we had to do was add a few
-lines of code. The rest is exactly the same.
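-
-For reference, those lines are the ones we added at the top of this guide:
-
-```python
-import jax
-import keras
-
-devices = jax.devices()
-data_parallel = keras.distribution.DataParallel(devices=devices)
-keras.distribution.set_distribution(data_parallel)
-```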
-
diff --git a/templates/examples/keras_rs/dcn.md b/templates/examples/keras_rs/dcn.md
deleted file mode 100644
index 6f53f2160e..0000000000
--- a/templates/examples/keras_rs/dcn.md
+++ /dev/null
@@ -1,678 +0,0 @@
-# Ranking with Deep and Cross Networks
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using Deep and Cross Networks (DCN).
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/dcn.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/dcn.py)
-
-
-
----
-## Introduction
-
-This tutorial demonstrates how to use Deep & Cross Networks (DCN) to effectively
-learn feature crosses. Before diving into the example, let's briefly discuss
-feature crosses.
-
-Imagine that we are building a recommender system for blenders. Individual
-features might include a customer's past purchase history (e.g.,
-`purchased_bananas`, `purchased_cooking_books`) or geographic location. However,
-a customer who has purchased both bananas and cooking books is more likely to be
-interested in a blender than someone who purchased only one or the other. The
-combination of `purchased_bananas` and `purchased_cooking_books` is a feature
-cross. Feature crosses capture interaction information between individual
-features, providing richer context than the individual features alone.
-
-
-
-Learning effective feature crosses presents several challenges. In web-scale
-applications, data is often categorical, resulting in high-dimensional and
-sparse feature spaces. Identifying impactful feature crosses in such
-environments typically relies on manual feature engineering or computationally
-expensive exhaustive searches. While traditional feed-forward multilayer
-perceptrons (MLPs) are universal function approximators, they often struggle to
-efficiently learn even second- or third-order feature interactions.
-
-The Deep & Cross Network (DCN) architecture is designed for more effective
-learning of explicit and bounded-degree feature crosses. It comprises three main
-components: an input layer (typically an embedding layer), a cross network for
-modeling explicit feature interactions, and a deep network for capturing
-implicit interactions.
-
-The cross network is the core of the DCN. It explicitly performs feature
-crossing at each layer, with the highest polynomial degree of feature
-interaction increasing with depth. The following figure shows the `(i+1)`-th
-cross layer.
-
-
-
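-Concretely, the `(i+1)`-th cross layer computes
-`x_{i+1} = x0 * (W x_i + b) + x_i`, where `x0` is the input to the whole cross
-network. Below is a minimal, illustrative sketch of that computation; the
-example itself uses the ready-made `keras_rs.layers.FeatureCross` layer.
-
-```python
-import keras
-
-
-class SimpleCross(keras.layers.Layer):
-    """Illustrative single cross layer: x_{i+1} = x0 * (W @ x_i + b) + x_i."""
-
-    def build(self, input_shape):
-        dim = input_shape[-1]
-        self.w = self.add_weight(shape=(dim, dim), initializer="glorot_uniform")
-        self.b = self.add_weight(shape=(dim,), initializer="zeros")
-
-    def call(self, x0, x_i=None):
-        # On the first cross layer, x_i is x0 itself.
-        x_i = x0 if x_i is None else x_i
-        return x0 * (keras.ops.matmul(x_i, self.w) + self.b) + x_i
-```
-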
-The deep network is a standard feedforward multilayer perceptron
-(MLP). These two networks are then combined to form the DCN. Two common
-combination strategies exist: a stacked structure, where the deep network is
-placed on top of the cross network, and a parallel structure, where they
-operate in parallel.
-
-*Figure captions: "Parallel layers" and "Stacked layers".*
-
-Now that we know a little bit about DCN, let's start writing some code. We will
-first train a DCN on a toy dataset, and demonstrate that the model has indeed
-learnt important feature crosses.
-
-Let's set the backend to JAX, and get our imports sorted.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import numpy as np
-import tensorflow as tf
-import tensorflow_datasets as tfds
-from mpl_toolkits.axes_grid1 import make_axes_locatable
-
-import keras_rs
-```
-
-Let's also define variables which will be reused throughout the example.
-
-
-```python
-TOY_CONFIG = {
- "learning_rate": 0.01,
- "num_epochs": 100,
- "batch_size": 1024,
-}
-
-MOVIELENS_CONFIG = {
- # features
- "int_features": [
- "movie_id",
- "user_id",
- "user_gender",
- "bucketized_user_age",
- ],
- "str_features": [
- "user_zip_code",
- "user_occupation_text",
- ],
- # model
- "embedding_dim": 32,
- "deep_net_num_units": [192, 192, 192],
- "projection_dim": 20,
- "dcn_num_units": [192, 192],
- # training
- "learning_rate": 0.01,
- "num_epochs": 10,
- "batch_size": 1024,
-}
-
-LOOKUP_LAYERS = {
- "int": keras.layers.IntegerLookup,
- "str": keras.layers.StringLookup,
-}
-```
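-
-`LOOKUP_LAYERS` maps a feature's dtype to the Keras lookup layer that converts
-raw values into contiguous integer indices. A small, hypothetical usage sketch
-(the real vocabularies are computed from the dataset later in the example):
-
-```python
-zip_code_lookup = LOOKUP_LAYERS["str"](
-    vocabulary=["90210", "10001"], num_oov_indices=0
-)
-print(zip_code_lookup(["10001"]))  # -> [1], the index in the vocabulary
-```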
-
-Here, we define a helper function for visualising the weights of the cross
-layer in order to better understand its functioning. We also define functions
-for compiling, training and evaluating a given model, and for reporting the
-resulting metrics.
-
-
-```python
-
-def visualize_layer(matrix, features):
- plt.figure(figsize=(9, 9))
-
- im = plt.matshow(np.abs(matrix), cmap=plt.cm.Blues)
-
- ax = plt.gca()
- divider = make_axes_locatable(plt.gca())
- cax = divider.append_axes("right", size="5%", pad=0.05)
- plt.colorbar(im, cax=cax)
- cax.tick_params(labelsize=10)
-    # Set the tick locations explicitly before setting the labels, to avoid
-    # matplotlib's fixed-locator warning.
-    ax.set_xticks(np.arange(len(features)))
-    ax.set_yticks(np.arange(len(features)))
-    ax.set_xticklabels(features, rotation=45, fontsize=10)
-    ax.set_yticklabels(features, fontsize=10)
-
-
-def train_and_evaluate(
- learning_rate,
- epochs,
- train_data,
- test_data,
- model,
-):
- optimizer = keras.optimizers.AdamW(learning_rate=learning_rate)
- loss = keras.losses.MeanSquaredError()
- rmse = keras.metrics.RootMeanSquaredError()
-
- model.compile(
- optimizer=optimizer,
- loss=loss,
- metrics=[rmse],
- )
-
- model.fit(
- train_data,
- epochs=epochs,
- verbose=0,
- )
-
- results = model.evaluate(test_data, return_dict=True, verbose=0)
- rmse_value = results["root_mean_squared_error"]
-
- return rmse_value, model.count_params()
-
-
-def print_stats(rmse_list, num_params, model_name):
- # Report metrics.
- num_trials = len(rmse_list)
- avg_rmse = np.mean(rmse_list)
- std_rmse = np.std(rmse_list)
-
- if num_trials == 1:
- print(f"{model_name}: RMSE = {avg_rmse}; #params = {num_params}")
- else:
-        print(
-            f"{model_name}: RMSE = {avg_rmse} ± {std_rmse}; #params = {num_params}"
-        )
-
-```
-
----
-## Toy Example
-
-To illustrate the benefits of DCNs, let's consider a simple example. Suppose we
-have a dataset for modeling the likelihood of a customer clicking on a blender
-advertisement. The features and label are defined as follows:
-
-| **Features / Label** | **Description** | **Range**|
-|:--------------------:|:------------------------------:|:--------:|
-| `x1` = country | Customer's resident country | [0, 199] |
-| `x2` = bananas | # bananas purchased | [0, 23] |
-| `x3` = cookbooks | # cooking books purchased | [0, 5] |
-| `y` | Blender ad click likelihood | - |
-
-Then, we generate data following the underlying distribution
-`y = f(x1, x2, x3) = 0.1x1 + 0.4x2 + 0.7x3 + 0.1x1x2 + 3.1x2x3 + 0.1x3^2`.
-
-This distribution shows that the click likelihood (`y`) depends linearly on
-individual features (`xi`) and on multiplicative interactions between them. In
-this scenario, the likelihood of purchasing a blender (`y`) is influenced not
-only by purchasing bananas (`x2`) or cookbooks (`x3`) individually, but also
-significantly by the interaction of purchasing both bananas and cookbooks
-(`x2x3`).
-
-### Preparing the dataset
-
-Let's create synthetic data based on the above equation, and form the train-test
-splits.
-
-
-```python
-
-def get_mixer_data(data_size=100_000):
- country = np.random.randint(200, size=[data_size, 1]) / 200.0
- bananas = np.random.randint(24, size=[data_size, 1]) / 24.0
- cookbooks = np.random.randint(6, size=[data_size, 1]) / 6.0
-
- x = np.concatenate([country, bananas, cookbooks], axis=1)
-
- # Create 1st-order terms.
- y = 0.1 * country + 0.4 * bananas + 0.7 * cookbooks
-
- # Create 2nd-order cross terms.
- y += (
- 0.1 * country * bananas
- + 3.1 * bananas * cookbooks
- + (0.1 * cookbooks * cookbooks)
- )
-
- return x, y
-
-
-x, y = get_mixer_data(data_size=100_000)
-num_train = 90_000
-train_x = x[:num_train]
-train_y = y[:num_train]
-test_x = x[num_train:]
-test_y = y[num_train:]
-```
-
-### Building the model
-
-To demonstrate the advantages of a cross network in recommender systems, we'll
-compare its performance with a deep network. Since our example data only
-contains second-order feature interactions, a single-layered cross network will
-suffice. For datasets with higher-order interactions, multiple cross layers can
-be stacked to form a multi-layered cross network. We will build two models:
-
-1. A cross network with a single cross layer.
-2. A deep network with wider and deeper feedforward layers.
-
-
-```python
-cross_network = keras.Sequential(
- [
- keras_rs.layers.FeatureCross(),
- keras.layers.Dense(1),
- ]
-)
-
-deep_network = keras.Sequential(
-    [
-        keras.layers.Dense(512, activation="relu"),
-        keras.layers.Dense(256, activation="relu"),
-        keras.layers.Dense(128, activation="relu"),
-        # Project down to a single prediction so the output shape matches
-        # the cross network's.
-        keras.layers.Dense(1),
-    ]
-)
-```
-
-### Model training
-
-Before we train the model, we need to batch our datasets.
-
-
-```python
-train_ds = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(
- TOY_CONFIG["batch_size"]
-)
-test_ds = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(
- TOY_CONFIG["batch_size"]
-)
-```
-
-Let's train both models. Remember that we set `verbose=0` for brevity's sake,
-so do not be alarmed if you do not see any output for a while.
-
-After training, we evaluate the models on the unseen dataset. We will report
-the Root Mean Squared Error (RMSE) here.
-
-We observe that the cross network achieved significantly lower RMSE compared to
-a ReLU-based DNN, while also using fewer parameters. This points to the
-efficiency of the cross network in learning feature interactions.
-
-
-```python
-cross_network_rmse, cross_network_num_params = train_and_evaluate(
- learning_rate=TOY_CONFIG["learning_rate"],
- epochs=TOY_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=cross_network,
-)
-print_stats(
- rmse_list=[cross_network_rmse],
- num_params=cross_network_num_params,
- model_name="Cross Network",
-)
-
-deep_network_rmse, deep_network_num_params = train_and_evaluate(
- learning_rate=TOY_CONFIG["learning_rate"],
- epochs=TOY_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=deep_network,
-)
-print_stats(
- rmse_list=[deep_network_rmse],
- num_params=deep_network_num_params,
- model_name="Deep Network",
-)
-```
-
-
-### Visualizing feature interactions
-
-Since we already know which feature crosses are important in our data, it would
-be interesting to verify whether our model has indeed learned these key feature
-interactions. This can be done by visualizing the learned weight matrix in the
-cross network, where the weight `Wij` represents the learned importance of
-the interaction between features `xi` and `xj`.
-
-
-```python
-visualize_layer(
- matrix=cross_network.weights[0].numpy(),
- features=["country", "purchased_bananas", "purchased_cookbooks"],
-)
-```
-
-
-
-
-
-
-
-
----
-## Real-world example
-
-Let's use the MovieLens 100K dataset. This dataset is used to train models to
-predict users' movie ratings, based on user-related features and movie-related
-features.
-
-### Preparing the dataset
-
-The dataset processing steps here are similar to what's given in the
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorial. Let's load the dataset, and keep only the useful columns.
-
-
-```python
-ratings_ds = tfds.load("movielens/100k-ratings", split="train")
-ratings_ds = ratings_ds.map(
- lambda x: (
- {
- "movie_id": int(x["movie_id"]),
- "user_id": int(x["user_id"]),
- "user_gender": int(x["user_gender"]),
- "user_zip_code": x["user_zip_code"],
- "user_occupation_text": x["user_occupation_text"],
- "bucketized_user_age": int(x["bucketized_user_age"]),
- },
- x["user_rating"], # label
- )
-)
-```
-
-
-DCN outperforms a similarly sized DNN with ReLU layers, while the low-rank DCN
-effectively reduces the number of parameters without compromising accuracy.
-
-### Visualizing feature interactions
-
-Like we did for the toy example, we will plot the weight matrix of the cross
-layer to see which feature crosses are important. In the previous example,
-the importance of the interaction between the `i`-th and `j`-th features was
-captured by the `(i, j)`-th element of the weight matrix.
-
-In this case, the feature embeddings are of size 32 rather than 1. Therefore,
-the importance of feature interactions is represented by the `(i, j)`-th
-block of the weight matrix, which has dimensions `32 x 32`. To quantify the
-significance of these interactions, we use the Frobenius norm of each block. A
-larger value implies higher importance.
-
-
-```python
-features = list(vocabularies.keys())
-mat = cross_network.weights[len(features)].numpy()
-embedding_dim = MOVIELENS_CONFIG["embedding_dim"]
-
-block_norm = np.zeros([len(features), len(features)])
-
-# Compute the norms of the blocks.
-for i in range(len(features)):
- for j in range(len(features)):
- block = mat[
- i * embedding_dim : (i + 1) * embedding_dim,
- j * embedding_dim : (j + 1) * embedding_dim,
- ]
- block_norm[i, j] = np.linalg.norm(block, ord="fro")
-
-visualize_layer(
- matrix=block_norm,
- features=features,
-)
-```
-
-
-
-
-
-
-
-
-And we are all done!
-
diff --git a/templates/examples/keras_rs/deep_recommender.md b/templates/examples/keras_rs/deep_recommender.md
deleted file mode 100644
index b643e8d25b..0000000000
--- a/templates/examples/keras_rs/deep_recommender.md
+++ /dev/null
@@ -1,5441 +0,0 @@
-# Deep Recommenders
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Building a deep retrieval model with multiple stacked layers.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/deep_recommender.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/deep_recommender.py)
-
-
-
----
-## Introduction
-
-One of the great advantages of using Keras to build recommender models is the
-freedom to build rich, flexible feature representations.
-
-The first step in doing so is preparing the features, as raw features will
-usually not be immediately usable in a model.
-
-For example:
-- User and item IDs may be strings (titles, usernames) or large, non-contiguous
- integers (database IDs).
-- Item descriptions could be raw text.
-- Interaction timestamps could be raw Unix timestamps.
-
-These need to be appropriately transformed in order to be useful in building
-models:
-- User and item IDs have to be translated into embedding vectors,
- high-dimensional numerical representations that are adjusted during training
- to help the model predict its objective better.
-- Raw text needs to be tokenized (split into smaller parts such as individual
- words) and translated into embeddings.
-- Numerical features need to be normalized so that their values lie in a small
- interval around 0.
-
-Fortunately, the Keras
-[`FeatureSpace`](/api/utils/feature_space/) utility makes this
-preprocessing easy.
-
-In this tutorial, we are going to incorporate multiple features in our models.
-These features will come from preprocessing the MovieLens dataset.
-
-In the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial, the models consist of only an embedding layer. In this tutorial, we
-add more dense layers to our models to increase their expressive power.
-
-In general, deeper models are capable of learning more complex patterns than
-shallower models. For example, our user model incorporates user IDs and user
-features such as age, gender and occupation. A shallow model (say, a single
-embedding layer) may only be able to learn the simplest relationships between
-those features and movies: a given user generally prefers horror movies to
-comedies. To capture more complex relationships, such as user preferences
-evolving with their age, we may need a deeper model with multiple stacked dense
-layers.
-
-Of course, complex models also have their disadvantages. The first is
-computational cost, as larger models require both more memory and more
-computation to train and serve. The second is the requirement for more data. In
-general, more training data is needed to take advantage of deeper models. With
-more parameters, deep models might overfit or even simply memorize the training
-examples instead of learning a function that can generalize. Finally, training
-deeper models may be harder, and more care needs to be taken in choosing
-settings like regularization and learning rate.
-
-Finding a good architecture for a real-world recommender system is a complex
-art, requiring good intuition and careful hyperparameter tuning. For example,
-factors such as the depth and width of the model, activation function, learning
-rate, and optimizer can radically change the performance of the model. Modelling
-choices are further complicated by the fact that good offline evaluation metrics
-may not correspond to good online performance, and that the choice of what to
-optimize for is often more critical than the choice of model itself.
-
-Nevertheless, effort put into building and fine-tuning larger models often pays
-off. In this tutorial, we will illustrate how to build a deep retrieval model.
-We'll do this by building progressively more complex models to see how this
-affects model performance.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## The MovieLens dataset
-
-Let's first have a look at what features we can use from the MovieLens dataset.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-The ratings dataset returns a dictionary of movie id, user id, the assigned
-rating, timestamp, movie information, and user information:
-
-
-```python
-for data in ratings.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the MovieLens dataset, user IDs are integers (represented as strings)
-starting at 1 and with no gaps. Normally, you would need to create a lookup
-table to map user IDs to integers from 0 to N-1. As a simplification, we'll
-use the user ID directly as an index into the user embedding table of our
-model. So we need to know the number of users.
-
-
-```python
-USERS_COUNT = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-The movies dataset contains the movie id, movie title, and the genres it belongs
-to. Note that the genres are encoded with integer labels.
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the MovieLens dataset, movie IDs are integers (represented as strings)
-starting at 1 and with no gaps. Normally, you would need to create a lookup
-table to map movie IDs to integers from 0 to N-1. As a simplification, we'll
-use the movie ID directly as an index into the movie embedding table of our
-model. So we need to know the number of movies.
-
-
-```python
-MOVIES_COUNT = movies.cardinality().numpy()
-```
-
----
-## Preprocessing the dataset
-
-### Normalizing continuous features
-
-Continuous features may need normalization so that they fall within an
-acceptable range for the model. We will give two examples of such normalization.
-
-#### Discretization
-
-A common transformation is to turn a continuous feature into a number of
-categorical features. This makes good sense if we have reasons to suspect that a
-feature's effect is non-continuous.
-
-We need to decide on the number of buckets to use for discretization. Then, we
-will use the Keras `FeatureSpace` utility to automatically find the minimum
-and maximum value, and divide that range by the number of buckets to perform
-the discretization.
-
-In this example, we will discretize the user age.
-
-
-```python
-AGE_BINS_COUNT = 10
-user_age_feature = keras.utils.FeatureSpace.float_discretized(
- num_bins=AGE_BINS_COUNT, output_mode="int"
-)
-```
-
-#### Rescaling
-
-Often, we want continuous features to be between 0 and 1, or between -1 and 1.
-To achieve this, we can rescale features that have a different range.
-
-In this example, we will rescale the rating, which is an integer between 1 and
-5, to be a float between 0 and 1. We need to rescale it and offset it: with a
-scale of 1/4 and an offset of -1/4, a rating of 1 maps to 0.0 and a rating of
-5 maps to 1.0.
-
-
-```python
-user_rating_feature = keras.utils.FeatureSpace.float_rescaled(
- scale=1.0 / 4.0, offset=-1.0 / 4.0
-)
-```
-
-### Turning categorical features into embeddings
-
-A categorical feature is a feature that does not express a continuous quantity,
-but rather takes on one of a set of fixed values.
-
-Most deep learning models express these features by turning them into
-high-dimensional vectors. During model training, the values of these vectors
-are adjusted to help the model predict its objective better.
-
-For example, suppose that our goal is to predict which user is going to watch
-which movie. To do that, we represent each user and each movie by an embedding
-vector. Initially, these embeddings will take on random values. During training,
-we adjust them so that embeddings of users and the movies they watch end up
-closer together.
-
-Taking raw categorical features and turning them into embeddings is normally a
-two-step process:
-1. First, we need to translate the raw values into a range of contiguous
- integers, normally by building a mapping (called a "vocabulary") that maps
- raw values to integers.
-2. Second, we need to take these integers and turn them into embeddings.
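-
-As a concrete illustration of these two steps, here is a minimal sketch using
-hypothetical raw user IDs; the example itself delegates step 1 to the
-`FeatureSpace` utility introduced below:
-
-```python
-import keras
-import numpy as np
-
-# Hypothetical raw user IDs, for illustration only.
-raw_ids = np.array(["user_42", "user_7", "user_42"])
-
-# Step 1: translate raw values into contiguous integer indices.
-lookup = keras.layers.StringLookup(vocabulary=["user_7", "user_42"])
-indices = lookup(raw_ids)  # -> [2, 1, 2]; index 0 is reserved for OOV
-
-# Step 2: turn these integers into trainable embedding vectors.
-embedding = keras.layers.Embedding(
-    input_dim=lookup.vocabulary_size(), output_dim=8
-)
-vectors = embedding(indices)  # shape: (3, 8)
-```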
-
-#### Defining categorical features
-
-We will use the Keras `FeatureSpace` utility for the first step. Its `adapt`
-method automatically discovers the vocabulary for categorical features.
-
-
-```python
-user_gender_feature = keras.utils.FeatureSpace.integer_categorical(
- num_oov_indices=0, output_mode="int"
-)
-user_occupation_feature = keras.utils.FeatureSpace.integer_categorical(
- num_oov_indices=0, output_mode="int"
-)
-```
-
-#### Using feature crosses
-
-With crosses, we can model feature interactions between multiple categorical
-features. This can be powerful for expressing that a combination of features
-represents a specific taste in movies.
-
-Note that combining multiple features can result in a very large feature
-space, which is why the `crossing_dim` parameter is important: it limits the
-output dimension of the cross feature.
-
-In this example, we will cross age and gender with the Keras `FeatureSpace`
-utility.
-
-
-```python
-USER_GENDER_CROSS_COUNT = 20
-user_gender_age_cross = keras.utils.FeatureSpace.cross(
- feature_names=("user_gender", "raw_user_age"),
- crossing_dim=USER_GENDER_CROSS_COUNT,
- output_mode="int",
-)
-```
-
-### Processing text features
-
-We may also want to add text features to our model. Usually, things like product
-descriptions are free form text, and we can hope that our model can learn to use
-the information they contain to make better recommendations, especially in a
-cold-start or long tail scenario.
-
-While the MovieLens dataset does not give us rich textual features, we can still
-use movie titles. This may help us capture the fact that movies with very
-similar titles are likely to belong to the same series.
-
-The first transformation we need to apply to text is tokenization (splitting
-into constituent words or word-pieces), followed by vocabulary learning,
-followed by an embedding.
-
-
-The
-[`keras.layers.TextVectorization`](/api/layers/preprocessing_layers/text/text_vectorization/)
-layer can do the first two steps for us.
-
-
-```python
-title_vectorizer = keras.layers.TextVectorization(
- max_tokens=10_000, output_sequence_length=16, dtype="int32"
-)
-title_vectorizer.adapt(movies.map(lambda x: x["movie_title"]))
-```
-
-Let's try it out:
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(title_vectorizer(data["movie_title"]))
-```
-
-
-Each title is translated into a sequence of tokens, one for each piece we've
-tokenized.
-
-We can check the learned vocabulary to verify that the layer is using the
-correct tokenization:
-
-
-```python
-print(title_vectorizer.get_vocabulary()[40:50])
-```
-
-
-This looks correct: the layer tokenizes titles into individual words. Later,
-we will see how to embed this tokenized text. For now, we turn this vectorizer
-into a Keras `FeatureSpace` feature.
-
-
-```python
-title_feature = keras.utils.FeatureSpace.feature(
- preprocessor=title_vectorizer, dtype="string", output_mode="float"
-)
-TITLE_TOKEN_COUNT = title_vectorizer.vocabulary_size()
-```
-
-### Putting the FeatureSpace features together
-
-We're now ready to assemble the features with preprocessors in a `FeatureSpace`
-object. We then use `adapt` to go through the dataset and learn what needs
-to be learned, such as the vocabulary size for categorical features or the
-minimum and maximum values for bucketized features.
-
-
-```python
-feature_space = keras.utils.FeatureSpace(
- features={
- # Numerical features to discretize.
- "raw_user_age": user_age_feature,
- # Categorical features encoded as integers.
- "user_gender": user_gender_feature,
- "user_occupation_label": user_occupation_feature,
- # Labels are ratings between 0 and 1.
- "user_rating": user_rating_feature,
- "movie_title": title_feature,
- },
- crosses=[user_gender_age_cross],
- output_mode="dict",
-)
-
-feature_space.adapt(ratings)
-GENDERS_COUNT = feature_space.preprocessors["user_gender"].vocabulary_size()
-OCCUPATIONS_COUNT = feature_space.preprocessors[
- "user_occupation_label"
-].vocabulary_size()
-```
-
----
-## Pre-building the candidate set
-
-Our model is going to be based on a retrieval layer, which provides a set of
-the best candidates from the full set of candidates. To do this, the retrieval
-layer needs to know all the candidates and their features. In this section, we
-assemble the full set of movies with the associated features.
-
-### Extract raw candidate features
-
-First, we gather all the raw features from the dataset in lists. That is the
-titles of the movies and the genres. Note that one or more genres are
-associated with each movie, and the number of genres varies per movie.
-
-
-```python
-movie_titles = [""] * (MOVIES_COUNT + 1)
-movie_genres = [[]] * (MOVIES_COUNT + 1)
-for x in movies.as_numpy_iterator():
- movie_id = int(x["movie_id"])
- movie_titles[movie_id] = x["movie_title"]
- movie_genres[movie_id] = x["movie_genres"].tolist()
-```
-
-### Preprocess candidate features
-
-Genres are already in the form of category numbers starting at zero. However, we
-do need to figure out two things:
-- The maximum number of genres a single movie can have; this will determine the
- dimension for this feature.
-- The maximum value for the genre, which will give us the total number of genres
- and determine the size of our embedding table for genres.
-
-
-```python
-MAX_GENRES_PER_MOVIE = 0
-max_genre_id = 0
-for one_movie_genres in movie_genres:
- MAX_GENRES_PER_MOVIE = max(MAX_GENRES_PER_MOVIE, len(one_movie_genres))
- if one_movie_genres:
- max_genre_id = max(max_genre_id, max(one_movie_genres))
-
-GENRES_COUNT = max_genre_id + 1
-```
-
-Now we need to pad genres with an out-of-vocabulary value to be able to
-represent genres as fixed-size vectors. We'll pad with zeros for simplicity,
-so we add one to each genre so as not to conflict with genre zero, which is a
-valid genre.
-
-
-```python
-movie_genres = [
- [g + 1 for g in genres] + [0] * (MAX_GENRES_PER_MOVIE - len(genres))
- for genres in movie_genres
-]
-```
-
-Then, we vectorize all the movie titles.
-
-
-```python
-movie_titles_vectors = title_vectorizer(movie_titles)
-```
-
-### Convert candidate set to native tensors
-
-We're now ready to combine these in a dataset. The last step is to make sure
-everything is a native tensor that can be consumed by the retrieval layer.
-As a reminder, movie ID zero does not exist.
-
-
-```python
-MOVIES_DATASET = {
- "movie_id": keras.ops.arange(0, MOVIES_COUNT + 1, dtype="int32"),
- "movie_title_vector": movie_titles_vectors,
- "movie_genres": keras.ops.convert_to_tensor(movie_genres, dtype="int32"),
-}
-```
-
----
-## Preparing the data
-
-We can now define our preprocessing function. Most features will be handled
-by the `FeatureSpace`. User IDs and Movie IDs need to be extracted. Movie genres
-need to be padded. Then everything is packaged as a tuple with a dict of input
-features and a float for the rating, which is used as a label.
-
-
-```python
-
-def preprocess_rating(x):
- features = feature_space(
- {
- "raw_user_age": x["raw_user_age"],
- "user_gender": x["user_gender"],
- "user_occupation_label": x["user_occupation_label"],
- "user_rating": x["user_rating"],
- "movie_title": x["movie_title"],
- }
- )
- features = {k: tf.squeeze(v, axis=0) for k, v in features.items()}
- movie_genres = x["movie_genres"]
-
- return (
- {
- # User inputs are user ID and user features
- "user_id": int(x["user_id"]),
- "raw_user_age": features["raw_user_age"],
- "user_gender": features["user_gender"],
- "user_occupation_label": features["user_occupation_label"],
- "user_gender_X_raw_user_age": tf.squeeze(
- features["user_gender_X_raw_user_age"], axis=-1
- ),
- # Movie inputs are movie ID, vectorized title and genres
- "movie_id": int(x["movie_id"]),
- "movie_title_vector": features["movie_title"],
- "movie_genres": tf.pad(
- movie_genres + 1,
- [[0, MAX_GENRES_PER_MOVIE - tf.shape(movie_genres)[0]]],
- ),
- },
- # Label is user rating between 0 and 1
- features["user_rating"],
- )
-
-```
-
-We shuffle and then split the data into a training set and a testing set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Model definition
-
-### Query model
-
-The query model is first tasked with converting user features to embeddings. The
-embeddings are then concatenated into a single vector.
-
-Defining deeper models will require us to stack more layers on top of this first
-set of embeddings. A progressively narrower stack of layers, separated by an
-activation function, is a common pattern:
-
-```
- +----------------------+
- | 64 x 32 |
- +----------------------+
- | relu
- +--------------------------+
- | 128 x 64 |
- +--------------------------+
- | relu
- +------------------------------+
- | ... x 128 |
- +------------------------------+
-```
-
-Since the expressive power of deep linear models is no greater than that of
-shallow linear models, we use ReLU activations for all but the last hidden
-layer. The final hidden layer does not use any activation function: using an
-activation function would limit the output space of the final embeddings and
-might negatively impact the performance of the model. For instance, if ReLUs are
-used in the projection layer, all components in the output embedding would be
-non-negative.
-
-We're going to try this here. To make experimentation with different depths
-easy, let's define a model whose depth (and width) is set by a constructor
-parameter. The `layer_sizes` parameter gives us the depth and width of the
-model. We can vary it to experiment with shallower or deeper models.
-
-
-```python
-
-class QueryModel(keras.Model):
- """Model for encoding user queries."""
-
- def __init__(self, layer_sizes, embedding_dimension=32):
- """Construct a model for encoding user queries.
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- """
- super().__init__()
-
- # We first generate embeddings.
- self.user_embedding = keras.layers.Embedding(
- # +1 for user ID zero, which does not exist
- USERS_COUNT + 1,
- embedding_dimension,
- )
- self.gender_embedding = keras.layers.Embedding(
- GENDERS_COUNT, embedding_dimension
- )
- self.age_embedding = keras.layers.Embedding(AGE_BINS_COUNT, embedding_dimension)
- self.gender_x_age_embedding = keras.layers.Embedding(
- USER_GENDER_CROSS_COUNT, embedding_dimension
- )
- self.occupation_embedding = keras.layers.Embedding(
- OCCUPATIONS_COUNT, embedding_dimension
- )
-
- # Then construct the layers.
- self.dense_layers = keras.Sequential()
-
- # Use the ReLU activation for all but the last layer.
- for layer_size in layer_sizes[:-1]:
- self.dense_layers.add(keras.layers.Dense(layer_size, activation="relu"))
-
- # No activation for the last layer.
- self.dense_layers.add(keras.layers.Dense(layer_sizes[-1]))
-
- def call(self, inputs):
- # Take the inputs, pass each through its embedding layer, concatenate.
- feature_embedding = keras.ops.concatenate(
- [
- self.user_embedding(inputs["user_id"]),
- self.gender_embedding(inputs["user_gender"]),
- self.age_embedding(inputs["raw_user_age"]),
- self.gender_x_age_embedding(inputs["user_gender_X_raw_user_age"]),
- self.occupation_embedding(inputs["user_occupation_label"]),
- ],
- axis=1,
- )
- return self.dense_layers(feature_embedding)
-
-```
-
----
-## Candidate model
-
-We can adopt the same approach for the candidate model. Again, we start with
-converting movie features to embeddings, concatenate them and then expand it
-with hidden layers:
-
-
-```python
-
-class CandidateModel(keras.Model):
- """Model for encoding candidates (movies)."""
-
- def __init__(self, layer_sizes, embedding_dimension=32):
- """Construct a model for encoding candidates (movies).
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- """
- super().__init__()
-
- # We first generate embeddings.
- self.movie_embedding = keras.layers.Embedding(
- # +1 for movie ID zero, which does not exist
- MOVIES_COUNT + 1,
- embedding_dimension,
- )
- # Take all the title tokens for the title of the movie, embed each
- # token, and then take the mean of all token embeddings.
- self.movie_title_embedding = keras.Sequential(
- [
- keras.layers.Embedding(
- # +1 for OOV token, which is used for padding
- TITLE_TOKEN_COUNT + 1,
- embedding_dimension,
- mask_zero=True,
- ),
- keras.layers.GlobalAveragePooling1D(),
- ]
- )
- # Take all the genres for the movie, embed each genre, and then take the
- # mean of all genre embeddings.
- self.movie_genres_embedding = keras.Sequential(
- [
- keras.layers.Embedding(
- # +1 for OOV genre, which is used for padding
- GENRES_COUNT + 1,
- embedding_dimension,
- mask_zero=True,
- ),
- keras.layers.GlobalAveragePooling1D(),
- ]
- )
-
- # Then construct the layers.
- self.dense_layers = keras.Sequential()
-
- # Use the ReLU activation for all but the last layer.
- for layer_size in layer_sizes[:-1]:
- self.dense_layers.add(keras.layers.Dense(layer_size, activation="relu"))
-
- # No activation for the last layer.
- self.dense_layers.add(keras.layers.Dense(layer_sizes[-1]))
-
- def call(self, inputs):
- movie_id = inputs["movie_id"]
- movie_title_vector = inputs["movie_title_vector"]
- movie_genres = inputs["movie_genres"]
- feature_embedding = keras.ops.concatenate(
- [
- self.movie_embedding(movie_id),
- self.movie_title_embedding(movie_title_vector),
- self.movie_genres_embedding(movie_genres),
- ],
- axis=1,
- )
- return self.dense_layers(feature_embedding)
-
-```
-
----
-## Combined model
-
-With both QueryModel and CandidateModel defined, we can put together a combined
-model and implement our loss and metrics logic. To make things simple, we'll
-enforce that the model structure is the same across the query and candidate
-models.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Combined model."""
-
- def __init__(
- self,
- layer_sizes=(32,),
- embedding_dimension=32,
- retrieval_k=100,
- ):
- """Construct a combined model.
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- retrieval_k: How many candidate movies to retrieve.
- """
- super().__init__()
- self.query_model = QueryModel(layer_sizes, embedding_dimension)
- self.candidate_model = CandidateModel(layer_sizes, embedding_dimension)
- self.retrieval = keras_rs.layers.BruteForceRetrieval(
- k=retrieval_k, return_scores=False
- )
- self.update_candidates() # Provide an initial set of candidates
- self.loss_fn = keras.losses.MeanSquaredError()
- self.top_k_metric = keras.metrics.SparseTopKCategoricalAccuracy(
- k=100, from_sorted_ids=True
- )
-
- def update_candidates(self):
- self.retrieval.update_candidates(
- self.candidate_model.predict(MOVIES_DATASET, verbose=0)
- )
-
- def call(self, inputs, training=False):
- query_embeddings = self.query_model(
- {
- "user_id": inputs["user_id"],
- "raw_user_age": inputs["raw_user_age"],
- "user_gender": inputs["user_gender"],
- "user_occupation_label": inputs["user_occupation_label"],
- "user_gender_X_raw_user_age": inputs["user_gender_X_raw_user_age"],
- }
- )
- candidate_embeddings = self.candidate_model(
- {
- "movie_id": inputs["movie_id"],
- "movie_title_vector": inputs["movie_title_vector"],
- "movie_genres": inputs["movie_genres"],
- }
- )
-
- result = {
- "query_embeddings": query_embeddings,
- "candidate_embeddings": candidate_embeddings,
- }
- if not training:
- # No need to spend time extracting top predicted movies during
- # training, they are not used.
- result["predictions"] = self.retrieval(query_embeddings)
- return result
-
- def evaluate(
- self,
- x=None,
- y=None,
- batch_size=None,
- verbose="auto",
- sample_weight=None,
- steps=None,
- callbacks=None,
- return_dict=False,
- **kwargs,
- ):
- """Overridden to update the candidate set.
-
- Before evaluating the model, we need to update our retrieval layer by
- re-computing the values predicted by the candidate model for all the
- candidates.
- """
- self.update_candidates()
- return super().evaluate(
- x,
- y,
- batch_size=batch_size,
- verbose=verbose,
- sample_weight=sample_weight,
- steps=steps,
- callbacks=callbacks,
- return_dict=return_dict,
- **kwargs,
- )
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- query_embeddings = y_pred["query_embeddings"]
- candidate_embeddings = y_pred["candidate_embeddings"]
-
- labels = keras.ops.expand_dims(y, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(query_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
- def compute_metrics(self, x, y, y_pred, sample_weight=None):
- if "predictions" in y_pred:
- # We are evaluating or predicting. Update `top_k_metric`.
- movie_ids = x["movie_id"]
- predictions = y_pred["predictions"]
- # For `top_k_metric`, which is a `SparseTopKCategoricalAccuracy`, we
- # only take top rated movies, and we put a weight of 0 for the rest.
- rating_weight = keras.ops.cast(keras.ops.greater(y, 0.9), "float32")
- sample_weight = (
- rating_weight
- if sample_weight is None
- else keras.ops.multiply(rating_weight, sample_weight)
- )
- self.top_k_metric.update_state(
- movie_ids, predictions, sample_weight=sample_weight
- )
- return self.get_metrics_result()
- else:
- # We are training. `top_k_metric` is not updated and is zero, so
- # don't report it.
- result = self.get_metrics_result()
- result.pop(self.top_k_metric.name)
- return result
-
-```
-
----
-## Training the model
-
-### Shallow model
-
-We're ready to try out our first, shallow, model!
-
-
-```python
-NUM_EPOCHS = 30
-
-one_layer_model = RetrievalModel((32,))
-one_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-
-one_layer_history = one_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0554 - val_loss: 0.0574 - val_sparse_top_k_categorical_accuracy: 0.3216
-
-
-This gives us a top-100 accuracy of around 0.30. We can use this as a reference
-point for evaluating deeper models.
-
-### Deeper model
-
-What about a deeper model with two layers?
-
-
-```python
-two_layer_model = RetrievalModel((64, 32))
-two_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-two_layer_history = two_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0548 - val_loss: 0.0570 - val_sparse_top_k_categorical_accuracy: 0.2964
-
-
-While the deeper model seems to learn a bit better than the shallow model at
-first, the difference becomes minimal towards the end of training. We can
-plot the validation accuracy curves to illustrate this:
-
-
-```python
-METRIC = "val_sparse_top_k_categorical_accuracy"
-num_validation_runs = len(one_layer_history.history[METRIC])
-epochs = [(x + 1) * 5 for x in range(num_validation_runs)]
-
-plt.plot(epochs, one_layer_history.history[METRIC], label="1 layer")
-plt.plot(epochs, two_layer_history.history[METRIC], label="2 layers")
-plt.title("Accuracy vs epoch")
-plt.xlabel("epoch")
-plt.ylabel("Top-100 accuracy")
-plt.legend()
-plt.show()
-```
-
-
-
-
-
-
-
-Deeper models are not necessarily better. The following model extends the depth
-to three layers:
-
-
-```python
-three_layer_model = RetrievalModel((128, 64, 32))
-three_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-three_layer_history = three_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0550 - val_loss: 0.0569 - val_sparse_top_k_categorical_accuracy: 0.3072
-
-
-We don't really see an improvement over the shallow model:
-
-
-```python
-plt.plot(epochs, one_layer_history.history[METRIC], label="1 layer")
-plt.plot(epochs, two_layer_history.history[METRIC], label="2 layers")
-plt.plot(epochs, three_layer_history.history[METRIC], label="3 layers")
-plt.title("Accuracy vs epoch")
-plt.xlabel("epoch")
-plt.ylabel("Top-100 accuracy")
-plt.legend()
-plt.show()
-```
-
-
-
-
-
-
-
-This is a good illustration of the fact that deeper and larger models, while
-capable of superior performance, often require very careful tuning. For example,
-throughout this tutorial we used a single, fixed learning rate. Alternative
-choices may give very different results and are worth exploring.
-
-With appropriate tuning and sufficient data, the effort put into building larger
-and deeper models is in many cases well worth it: larger models can lead to
-substantial improvements in prediction accuracy.
-
----
-## Next Steps
-
-In this tutorial we expanded our retrieval model with dense layers and
-activation functions. To see how to create a model that can perform not only
-retrieval tasks but also rating tasks, take a look at the multitask tutorial.
-
diff --git a/templates/examples/keras_rs/dlrm.md b/templates/examples/keras_rs/dlrm.md
deleted file mode 100644
index 50ad0f7ef0..0000000000
--- a/templates/examples/keras_rs/dlrm.md
+++ /dev/null
@@ -1,522 +0,0 @@
-# Ranking with Deep Learning Recommendation Model
-
-**Author:** [Harshith Kulkarni](https://github.com/kharshith-k)
-**Date created:** 2025/06/02
-**Last modified:** 2025/09/04
-**Description:** Rank movies with DLRM using KerasRS.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/dlrm.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/dlrm.py)
-
-
-
----
-## Introduction
-
-This tutorial demonstrates how to use the Deep Learning Recommendation Model (DLRM) to
-effectively learn the relationships between items and user preferences using a
-dot-product interaction mechanism. For more details, please refer to the
-[DLRM](https://arxiv.org/abs/1906.00091) paper.
-
-DLRM is designed to excel at capturing explicit, bounded-degree feature interactions and
-is particularly effective at processing both categorical and continuous (sparse/dense)
-input features. The architecture consists of three main components: dedicated input
-layers to handle diverse features (typically embedding layers for categorical features),
-a dot-product interaction layer to explicitly model feature interactions, and a
-Multi-Layer Perceptron (MLP) to capture implicit feature relationships.
-
-The dot-product interaction layer lies at the heart of DLRM, efficiently computing
-pairwise interactions between different feature embeddings. This contrasts with models
-like Deep & Cross Network (DCN), which can treat elements within a feature vector as
-independent units, potentially leading to a higher-dimensional space and increased
-computational cost. The MLP is a standard feedforward network. The DLRM is formed by
-combining the interaction layer and MLP.
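-
-A minimal sketch of the pairwise dot-product interaction described above,
-assuming a tensor of stacked feature embeddings; the model in this example
-relies on KerasRS's built-in interaction layer rather than a helper like this:
-
-```python
-import keras
-
-
-def dot_interaction(embeddings):
-    """Pairwise dot products between feature embeddings.
-
-    `embeddings` is assumed to have shape (batch, num_features, dim);
-    the result has shape (batch, num_features, num_features).
-    """
-    return keras.ops.matmul(
-        embeddings, keras.ops.transpose(embeddings, axes=(0, 2, 1))
-    )
-```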
-
-The following image illustrates the DLRM architecture:
-
-
-
-
-Now that we have a foundational understanding of DLRM's architecture and key
-characteristics, let's dive into the code. We will train a DLRM on a real-world dataset
-to demonstrate its capability to learn meaningful feature interactions. Let's begin by
-setting the backend to JAX and organizing our imports.
-
-
-```python
-!pip install -q keras-rs
-```
-
-
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax"  # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import numpy as np
-import tensorflow as tf
-import tensorflow_datasets as tfds
-from mpl_toolkits.axes_grid1 import make_axes_locatable
-
-import keras_rs
-```
-
-Let's also define variables which will be reused throughout the example.
-
-
-```python
-MOVIELENS_CONFIG = {
- # features
- "continuous_features": [
- "raw_user_age",
- "hour_of_day_sin",
- "hour_of_day_cos",
- "hour_of_week_sin",
- "hour_of_week_cos",
- ],
- "categorical_int_features": [
- "user_gender",
- ],
- "categorical_str_features": [
- "user_zip_code",
- "user_occupation_text",
- "movie_id",
- "user_id",
- ],
- # model
- "embedding_dim": 8,
- "mlp_dim": 8,
- "deep_net_num_units": [192, 192, 192],
- # training
- "learning_rate": 1e-4,
- "num_epochs": 30,
- "batch_size": 8192,
-}
-```
-
-Here, we define a helper function for plotting training metrics, and another
-for visualizing a matrix of feature interaction strengths, so that we can
-better understand what the model learns. We also define a function for
-compiling, training and evaluating a given model, and one for reporting the
-results.
-
-
-```python
-
-def plot_training_metrics(history):
- """Graphs all metrics tracked in the history object."""
- plt.figure(figsize=(12, 6))
-
- for metric_name, metric_values in history.history.items():
- plt.plot(metric_values, label=metric_name.replace("_", " ").title())
-
- plt.title("Metrics over Epochs")
- plt.xlabel("Epoch")
- plt.ylabel("Metric Value")
- plt.legend()
- plt.grid(True)
-
-
-def visualize_layer(matrix, features, cmap=plt.cm.Blues):
-
- im = plt.matshow(
- matrix, cmap=cmap, extent=[-0.5, len(features) - 0.5, len(features) - 0.5, -0.5]
- )
-
- ax = plt.gca()
- divider = make_axes_locatable(plt.gca())
- cax = divider.append_axes("right", size="5%", pad=0.05)
- plt.colorbar(im, cax=cax)
- cax.tick_params(labelsize=10)
-
- # Set tick locations explicitly before setting labels
- ax.set_xticks(np.arange(len(features)))
- ax.set_yticks(np.arange(len(features)))
-
- ax.set_xticklabels(features, rotation=45, fontsize=5)
- ax.set_yticklabels(features, fontsize=5)
-
- plt.show()
-
-
-def train_and_evaluate(
- learning_rate,
- epochs,
- train_data,
- test_data,
- model,
- plot_metrics=False,
-):
- optimizer = keras.optimizers.AdamW(learning_rate=learning_rate, clipnorm=1.0)
- loss = keras.losses.MeanSquaredError()
- rmse = keras.metrics.RootMeanSquaredError()
-
- model.compile(
- optimizer=optimizer,
- loss=loss,
- metrics=[rmse],
- )
-
- history = model.fit(
- train_data,
- epochs=epochs,
- verbose=1,
- )
- if plot_metrics:
- plot_training_metrics(history)
-
- results = model.evaluate(test_data, return_dict=True, verbose=1)
- rmse_value = results["root_mean_squared_error"]
-
- return rmse_value, model.count_params()
-
-
-def print_stats(rmse_list, num_params, model_name):
- # Report metrics.
- num_trials = len(rmse_list)
- avg_rmse = np.mean(rmse_list)
- std_rmse = np.std(rmse_list)
-
- if num_trials == 1:
- print(f"{model_name}: RMSE = {avg_rmse}; #params = {num_params}")
- else:
- print(f"{model_name}: RMSE = {avg_rmse} ± {std_rmse}; #params = {num_params}")
-
-```
-
----
-## Real-world example
-
-Let's use the MovieLens 100K dataset. This dataset is used to train models to
-predict users' movie ratings, based on user-related features and movie-related
-features.
-
-### Preparing the dataset
-
-The dataset processing steps here are similar to what's given in the
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorial. Let's load the dataset, and keep only the useful columns.
-
-
-```python
-ratings_ds = tfds.load("movielens/100k-ratings", split="train")
-
-
-def preprocess_features(x):
- """Extracts and cyclically encodes timestamp features."""
- features = {
- "movie_id": x["movie_id"],
- "user_id": x["user_id"],
- "user_gender": tf.cast(x["user_gender"], dtype=tf.int32),
- "user_zip_code": x["user_zip_code"],
- "user_occupation_text": x["user_occupation_text"],
- "raw_user_age": tf.cast(x["raw_user_age"], dtype=tf.float32),
- }
- label = tf.cast(x["user_rating"], dtype=tf.float32)
-
- # The timestamp is in seconds since the epoch.
- timestamp = tf.cast(x["timestamp"], dtype=tf.float32)
-
- # Constants for time periods
- SECONDS_IN_HOUR = 3600.0
- HOURS_IN_DAY = 24.0
- HOURS_IN_WEEK = 168.0
-
- # Calculate hour of day and encode it
- hour_of_day = (timestamp / SECONDS_IN_HOUR) % HOURS_IN_DAY
- features["hour_of_day_sin"] = tf.sin(2 * np.pi * hour_of_day / HOURS_IN_DAY)
- features["hour_of_day_cos"] = tf.cos(2 * np.pi * hour_of_day / HOURS_IN_DAY)
-
- # Calculate hour of week and encode it
- hour_of_week = (timestamp / SECONDS_IN_HOUR) % HOURS_IN_WEEK
- features["hour_of_week_sin"] = tf.sin(2 * np.pi * hour_of_week / HOURS_IN_WEEK)
- features["hour_of_week_cos"] = tf.cos(2 * np.pi * hour_of_week / HOURS_IN_WEEK)
-
- return features, label
-
-
-# Apply the new preprocessing function
-ratings_ds = ratings_ds.map(preprocess_features)
-```
-
-For every categorical feature, let's get the list of unique values, i.e., vocabulary, so
-that we can use that for the embedding layer.
-
-
-```python
-vocabularies = {}
-for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
-):
- vocabulary = ratings_ds.batch(10_000).map(lambda x, y: x[feature_name])
- vocabularies[feature_name] = np.unique(np.concatenate(list(vocabulary)))
-```
-
-We use `keras.layers.StringLookup` and `keras.layers.IntegerLookup` to convert
-all the categorical features into indices, which can then be fed into
-embedding layers.
-
-
-```python
-lookup_layers = {}
-lookup_layers.update(
- {
- feature: keras.layers.IntegerLookup(vocabulary=vocabularies[feature])
- for feature in MOVIELENS_CONFIG["categorical_int_features"]
- }
-)
-lookup_layers.update(
- {
- feature: keras.layers.StringLookup(vocabulary=vocabularies[feature])
- for feature in MOVIELENS_CONFIG["categorical_str_features"]
- }
-)
-```
-
-Let's normalize all the continuous features, so that they can be fed to the MLP layers.
-
-
-```python
-normalization_layers = {}
-for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- normalization_layers[feature_name] = keras.layers.Normalization(axis=-1)
-
-training_data_for_adaptation = ratings_ds.take(80_000).map(lambda x, y: x)
-
-for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- feature_ds = training_data_for_adaptation.map(
- lambda x: tf.expand_dims(x[feature_name], axis=-1)
- )
- normalization_layers[feature_name].adapt(feature_ds)
-
-ratings_ds = ratings_ds.map(
- lambda x, y: (
- {
- **{
- feature_name: lookup_layers[feature_name](x[feature_name])
- for feature_name in vocabularies
- },
- # Apply the adapted normalization layers to the continuous features.
- **{
- feature_name: tf.squeeze(
- normalization_layers[feature_name](
- tf.expand_dims(x[feature_name], axis=-1)
- ),
- axis=-1,
- )
- for feature_name in MOVIELENS_CONFIG["continuous_features"]
- },
- },
- y,
- )
-)
-```
-
-Let's split our data into train and test sets. We also use `cache()` and
-`prefetch()` for better performance.
-
-
-```python
-# Use a fixed seed so that the train/test split below is consistent.
-ratings_ds = ratings_ds.shuffle(100_000, seed=42, reshuffle_each_iteration=False)
-
-train_ds = (
- ratings_ds.take(80_000)
- .batch(MOVIELENS_CONFIG["batch_size"])
- .cache()
- .prefetch(tf.data.AUTOTUNE)
-)
-test_ds = (
-    ratings_ds.skip(80_000)
-    .take(20_000)
-    .batch(MOVIELENS_CONFIG["batch_size"])
-    .cache()
-    .prefetch(tf.data.AUTOTUNE)
-)
-```
-
-### Building the model
-
-The model will have embedding layers, followed by DotInteraction and feedforward
-layers.
-
-
-```python
-
-class DLRM(keras.Model):
- def __init__(
- self,
- dense_num_units_lst,
- embedding_dim=MOVIELENS_CONFIG["embedding_dim"],
- mlp_dim=MOVIELENS_CONFIG["mlp_dim"],
- **kwargs,
- ):
- super().__init__(**kwargs)
-
- self.embedding_layers = {}
- for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
- ):
- vocab_size = len(vocabularies[feature_name]) + 1 # +1 for OOV token
- self.embedding_layers[feature_name] = keras.layers.Embedding(
- input_dim=vocab_size,
- output_dim=embedding_dim,
- )
-
- self.bottom_mlp = keras.Sequential(
- [
- keras.layers.Dense(mlp_dim, activation="relu"),
- keras.layers.Dense(embedding_dim), # Output must match embedding_dim
- ]
- )
-
- self.dot_layer = keras_rs.layers.DotInteraction()
-
- self.top_mlp = []
- for num_units in dense_num_units_lst:
- self.top_mlp.append(keras.layers.Dense(num_units, activation="relu"))
-
- self.output_layer = keras.layers.Dense(1)
-
- self.dense_num_units_lst = dense_num_units_lst
- self.embedding_dim = embedding_dim
-
- def call(self, inputs):
- embeddings = []
- for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
- ):
- embedding = self.embedding_layers[feature_name](inputs[feature_name])
- embeddings.append(embedding)
-
- # Process all continuous features together.
- continuous_inputs = []
- for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- # Reshape each feature to (batch_size, 1)
- feature = keras.ops.reshape(
- keras.ops.cast(inputs[feature_name], dtype="float32"), (-1, 1)
- )
- continuous_inputs.append(feature)
-
- # Concatenate into a single tensor: (batch_size, num_continuous_features)
- concatenated_continuous = keras.ops.concatenate(continuous_inputs, axis=1)
-
- # Pass through the Bottom MLP to get one combined vector.
- processed_continuous = self.bottom_mlp(concatenated_continuous)
-
- # Combine with categorical embeddings. Note: we add a list containing the
- # single tensor.
- combined_features = embeddings + [processed_continuous]
-
- # Pass the list of features to the DotInteraction layer.
- x = self.dot_layer(combined_features)
-
- for layer in self.top_mlp:
- x = layer(x)
-
- x = self.output_layer(x)
-
- return x
-
-
-dot_network = DLRM(
- dense_num_units_lst=MOVIELENS_CONFIG["deep_net_num_units"],
- embedding_dim=MOVIELENS_CONFIG["embedding_dim"],
- mlp_dim=MOVIELENS_CONFIG["mlp_dim"],
-)
-
-rmse, dot_network_num_params = train_and_evaluate(
- learning_rate=MOVIELENS_CONFIG["learning_rate"],
- epochs=MOVIELENS_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=dot_network,
- plot_metrics=True,
-)
-print_stats(
- rmse_list=[rmse],
- num_params=dot_network_num_params,
- model_name="Dot Network",
-)
-```
-
-
-
-
-### Visualizing feature interactions
-
-The `DotInteraction` layer itself doesn't have a conventional "weight" matrix
-like a `Dense` layer. Instead, its function is to compute the dot product
-between the embedding vectors of the features.
-
-To visualize the strength of these interactions, we can calculate a matrix representing
-the pairwise interaction strength between all feature embeddings. A common way to do this
-is to take the dot product of the embedding matrices for each pair of features and then
-aggregate the result into a single value (like the mean of the absolute values) that
-represents the overall interaction strength.
-
-
-```python
-
-def get_dot_interaction_matrix(model, categorical_features, continuous_features):
- # The new feature list for the plot labels
- all_feature_names = categorical_features + ["all_continuous_features"]
- num_features = len(all_feature_names)
-
- # Store all feature outputs in the correct order.
- all_feature_outputs = []
-
- # Get outputs for categorical features from embedding layers (unchanged).
- for feature_name in categorical_features:
- embedding = model.embedding_layers[feature_name](keras.ops.array([0]))
- all_feature_outputs.append(embedding)
-
- # Get a single output for ALL continuous features from the shared MLP.
- num_continuous_features = len(continuous_features)
- # Create a dummy input of zeros for the MLP
- dummy_continuous_input = keras.ops.zeros((1, num_continuous_features))
- processed_continuous = model.bottom_mlp(dummy_continuous_input)
- all_feature_outputs.append(processed_continuous)
-
- interaction_matrix = np.zeros((num_features, num_features))
-
- # Iterate through each pair to calculate interaction strength.
- for i in range(num_features):
- for j in range(num_features):
- interaction = keras.ops.dot(
- all_feature_outputs[i], keras.ops.transpose(all_feature_outputs[j])
- )
-            interaction_strength = np.abs(keras.ops.convert_to_numpy(interaction))[0][0]
- interaction_matrix[i, j] = interaction_strength
-
- return interaction_matrix, all_feature_names
-
-
-# Get the list of categorical feature names.
-categorical_feature_names = (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
-)
-
-# Calculate the interaction matrix.
-interaction_matrix, feature_names = get_dot_interaction_matrix(
- model=dot_network,
- categorical_features=categorical_feature_names,
- continuous_features=MOVIELENS_CONFIG["continuous_features"],
-)
-
-# Visualize the matrix as a heatmap.
-print("\nVisualizing the feature interaction strengths:")
-visualize_layer(interaction_matrix, feature_names)
-```
-
-
-
-
-
diff --git a/templates/examples/keras_rs/listwise_ranking.md b/templates/examples/keras_rs/listwise_ranking.md
deleted file mode 100644
index 7143859333..0000000000
--- a/templates/examples/keras_rs/listwise_ranking.md
+++ /dev/null
@@ -1,669 +0,0 @@
-# List-wise ranking
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using pairwise losses instead of pointwise losses.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/listwise_ranking.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/listwise_ranking.py)
-
-
-
----
-## Introduction
-
-In our
-[basic ranking tutorial](/keras_rs/examples/basic_ranking/), we explored a model
-that learned to predict ratings for specific user-movie combinations. This model
-took (user, movie) pairs as input and was trained using mean-squared error to
-precisely predict the rating a user might give to a movie.
-
-However, solely optimizing a model's accuracy in predicting individual movie
-scores isn't always the most effective strategy for developing ranking systems.
-For ranking models, pinpoint accuracy in predicting scores is less critical than
-the model's capability to generate an ordered list of items that aligns with a
-user's preferences. In essence, the relative order of items matters more than
-the exact predicted values.
-
-Instead of focusing on the model's predictions for individual query-item pairs
-(a pointwise approach), we can optimize the model based on its ability to
-correctly order items. One common method for this is pairwise ranking. In this
-approach, the model learns by comparing pairs of items (e.g., item A and item B)
-and determining which one should be ranked higher for a given user or query. The
-goal is to minimize the number of incorrectly ordered pairs.
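-
-To make this concrete, here is a toy, self-contained computation of a pairwise
-hinge loss for a single list of three items (illustrative only; in this
-tutorial we use `keras_rs.losses.PairwiseHingeLoss`):
-
-```python
-import itertools
-
-import numpy as np
-
-y_true = np.array([3.0, 1.0, 2.0])  # relevance labels
-y_pred = np.array([0.2, 0.9, 0.4])  # model scores
-
-# For every ordered pair (i, j) where item i is truly more relevant than
-# item j, penalize the model when it does not score i above j by a margin.
-loss = sum(
-    max(0.0, 1.0 - (y_pred[i] - y_pred[j]))
-    for i, j in itertools.permutations(range(3), 2)
-    if y_true[i] > y_true[j]
-)
-print(loss)  # grows as more pairs are ordered incorrectly
-```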
-
-Let's begin by importing all the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-
-import keras
-import numpy as np
-import tensorflow as tf # Needed only for the dataset
-import tensorflow_datasets as tfds
-from keras import ops
-
-import keras_rs
-```
-
-Let's define some hyperparameters here.
-
-
-```python
-# Data args
-TRAIN_NUM_LIST_PER_USER = 50
-TEST_NUM_LIST_PER_USER = 1
-NUM_EXAMPLES_PER_LIST = 5
-
-# Model args
-EMBEDDING_DIM = 32
-
-# Train args
-BATCH_SIZE = 1024
-EPOCHS = 5
-LEARNING_RATE = 0.1
-```
-
----
-## Preparing the dataset
-
-We use the MovieLens dataset. The data loading and processing steps are similar
-to previous tutorials, so we will only discuss the differences here.
-
-
-```python
-# Ratings data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-movies_count = movies.cardinality().numpy()
-
-
-def preprocess_rating(x):
- return {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- # Normalise ratings between 0 and 1.
- "user_rating": (x["user_rating"] - 1.0) / 4.0,
- }
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(70_000)
-val_ratings = shuffled_ratings.skip(70_000).take(15_000)
-test_ratings = shuffled_ratings.skip(85_000).take(15_000)
-```
-
-So far, we've replicated what we have in the basic ranking tutorial.
-
-However, this existing dataset is not directly applicable to list-wise
-optimization. List-wise optimization requires, for each user, a list of movies
-they have rated, allowing the model to learn from the relative orderings within
-that list. The MovieLens 100K dataset, in its original form, provides individual
-rating instances (one user, one movie, one rating per example), rather than
-these aggregated user-specific lists.
-
-To enable listwise optimization, we need to restructure the dataset. This
-involves transforming it so that each data point or example represents a single
-user ID accompanied by a list of movies that user has rated. Within these lists,
-some movies will naturally be ranked higher by the user (as evidenced by their
-ratings) than others. The primary objective for our model will then be to learn
-to predict item orderings that correspond to these observed user preferences.
-
-Let's start by getting the entire list of movies and corresponding ratings for
-every user. We remove `user_ids` corresponding to users who have rated fewer
-than `NUM_EXAMPLES_PER_LIST` movies.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings, min_examples_per_list):
- """Gets movieID sequences and ratings for every user."""
- sequences = collections.defaultdict(list)
-
- for sample in ratings:
- user_id = sample["user_id"]
- movie_id = sample["movie_id"]
- user_rating = sample["user_rating"]
-
- sequences[int(user_id.numpy())].append(
- {
- "movie_id": int(movie_id.numpy()),
- "user_rating": float(user_rating.numpy()),
- }
- )
-
- # Remove lists with < `min_examples_per_list` number of elements.
- sequences = {
- user_id: sequence
- for user_id, sequence in sequences.items()
- if len(sequence) >= min_examples_per_list
- }
-
- return sequences
-
-```
-
-We now sample `TRAIN_NUM_LIST_PER_USER = 50` lists for each user for the
-training data. For each list, we randomly sample `NUM_EXAMPLES_PER_LIST = 5`
-movies from the movies the user rated.
-
-
-```python
-
-def sample_sublist_from_list(
- lst,
- num_examples_per_list,
-):
- """Random selects `num_examples_per_list` number of elements from list."""
-
- indices = np.random.choice(
- range(len(lst)),
- size=num_examples_per_list,
- replace=False,
- )
-
- samples = [lst[i] for i in indices]
- return samples
-
-
-def get_examples(
- sequences,
- num_list_per_user,
- num_examples_per_list,
-):
- inputs = {
- "user_id": [],
- "movie_id": [],
- }
- labels = []
-    for user_id, user_list in sequences.items():
-        for _ in range(num_list_per_user):
-            sampled_list = sample_sublist_from_list(
-                user_list,
-                num_examples_per_list,
-            )
-
-            inputs["user_id"].append(user_id)
-            inputs["movie_id"].append(
-                tf.convert_to_tensor([f["movie_id"] for f in sampled_list])
-            )
-            labels.append(
-                tf.convert_to_tensor([f["user_rating"] for f in sampled_list])
-            )
-
- return (
- {"user_id": inputs["user_id"], "movie_id": inputs["movie_id"]},
- labels,
- )
-
-
-train_sequences = get_movie_sequence_per_user(
- ratings=train_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-train_examples = get_examples(
- train_sequences,
- num_list_per_user=TRAIN_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-train_ds = tf.data.Dataset.from_tensor_slices(train_examples)
-
-val_sequences = get_movie_sequence_per_user(
-    ratings=val_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-val_examples = get_examples(
- val_sequences,
- num_list_per_user=TEST_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-val_ds = tf.data.Dataset.from_tensor_slices(val_examples)
-
-test_sequences = get_movie_sequence_per_user(
-    ratings=test_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-test_examples = get_examples(
- test_sequences,
- num_list_per_user=TEST_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-test_ds = tf.data.Dataset.from_tensor_slices(test_examples)
-```
-
-Batch up the dataset, and cache it.
-
-
-```python
-train_ds = train_ds.batch(BATCH_SIZE).cache()
-val_ds = val_ds.batch(BATCH_SIZE).cache()
-test_ds = test_ds.batch(BATCH_SIZE).cache()
-```
-
----
-## Building the model
-
-We build a typical two-tower ranking model, similar to the
-[basic ranking tutorial](/keras_rs/examples/basic_ranking/).
-We have separate embedding layers for user ID and movie IDs. After obtaining
-these embeddings, we concatenate them and pass them through a network of dense
-layers.
-
-The only point of difference is that for movie IDs, we take a list of IDs
-rather than just one movie ID. So, when we concatenate the user ID embedding
-with the movie IDs' embeddings, we "repeat" the user ID embedding
-`NUM_EXAMPLES_PER_LIST` times so as to get the same shape as the movie IDs'
-embeddings.
-
-
-```python
-
-class RankingModel(keras.Model):
- """Create the ranking model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Embedding table for users.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Embedding table for candidates.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # Predictions.
- self.ratings = keras.Sequential(
- [
- # Learn multiple dense layers.
- keras.layers.Dense(256, activation="relu"),
- keras.layers.Dense(64, activation="relu"),
- # Make rating predictions in the final layer.
- keras.layers.Dense(1),
- ]
- )
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape["user_id"])
- self.candidate_embedding.build(input_shape["movie_id"])
-
- output_shape = self.candidate_embedding.compute_output_shape(
- input_shape["movie_id"]
- )
-
- self.ratings.build(list(output_shape[:-1]) + [2 * output_shape[-1]])
-
- def call(self, inputs):
- user_id, movie_id = inputs["user_id"], inputs["movie_id"]
- user_embeddings = self.user_embedding(user_id)
- candidate_embeddings = self.candidate_embedding(movie_id)
-
- list_length = ops.shape(movie_id)[-1]
- user_embeddings_repeated = ops.repeat(
- ops.expand_dims(user_embeddings, axis=1),
- repeats=list_length,
- axis=1,
- )
- concatenated_embeddings = ops.concatenate(
- [user_embeddings_repeated, candidate_embeddings], axis=-1
- )
-
- scores = self.ratings(concatenated_embeddings)
- scores = ops.squeeze(scores, axis=-1)
-
- return scores
-
- def compute_output_shape(self, input_shape):
- return (input_shape[0], input_shape[1])
-
-```
-
-Let's instantiate, compile and train our models. We will train two:
-one with vanilla mean-squared error, and the other with pairwise hinge loss.
-For the latter, we will use `keras_rs.losses.PairwiseHingeLoss`.
-
-Pairwise losses compare pairs of items within each list, penalizing cases where
-an item with a higher true label has a lower predicted score than an item with a
-lower true label. This is why they are more suited for ranking tasks than
-pointwise losses.
-
-To quantify these results, we compute nDCG. nDCG is a measure of ranking quality
-that evaluates how well a system orders items based on relevance, giving more
-importance to highly relevant items appearing at the top of the list and
-normalizing the score against an ideal ranking.
-To compute it, we just need to pass `keras_rs.metrics.NDCG()` as a metric to
-`model.compile`.
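-
-For intuition, nDCG for a single ranked list can be computed by hand. The
-sketch below uses one common formulation (exponential gain with a logarithmic
-position discount); it is a toy illustration, not part of the pipeline:
-
-```python
-import numpy as np
-
-# Relevance of the items in the order the model ranked them.
-rels_predicted = np.array([1.0, 3.0, 2.0])
-# Relevance in the best possible order.
-rels_ideal = np.sort(rels_predicted)[::-1]
-
-# Discount 1 / log2(rank + 1) for ranks 1..3.
-discounts = 1.0 / np.log2(np.arange(2, 5))
-dcg = ((2.0**rels_predicted - 1.0) * discounts).sum()
-idcg = ((2.0**rels_ideal - 1.0) * discounts).sum()
-print(dcg / idcg)  # nDCG in [0, 1]; 1.0 means a perfect ordering
-```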
-
-
-```python
-model_mse = RankingModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- embedding_dimension=EMBEDDING_DIM,
-)
-model_mse.compile(
- loss=keras.losses.MeanSquaredError(),
- metrics=[keras_rs.metrics.NDCG(k=NUM_EXAMPLES_PER_LIST, name="ndcg")],
- optimizer=keras.optimizers.Adagrad(learning_rate=LEARNING_RATE),
-)
-model_mse.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
-```
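-
-The pairwise hinge variant, which the evaluation below compares against, is
-built the same way; a minimal sketch mirroring the MSE model above, using
-`keras_rs.losses.PairwiseHingeLoss`:
-
-```python
-model_hinge = RankingModel(
-    num_users=users_count + 1,
-    num_candidates=movies_count + 1,
-    embedding_dimension=EMBEDDING_DIM,
-)
-model_hinge.compile(
-    loss=keras_rs.losses.PairwiseHingeLoss(),
-    metrics=[keras_rs.metrics.NDCG(k=NUM_EXAMPLES_PER_LIST, name="ndcg")],
-    optimizer=keras.optimizers.Adagrad(learning_rate=LEARNING_RATE),
-)
-model_hinge.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
-```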
-
-
----
-## Evaluation
-
-Comparing the validation nDCG values, it is clear that the model trained with
-the pairwise hinge loss outperforms the other one. Let's make this observation
-more concrete by comparing results on the test set.
-
-
-```python
-ndcg_mse = model_mse.evaluate(test_ds, return_dict=True)["ndcg"]
-ndcg_hinge = model_hinge.evaluate(test_ds, return_dict=True)["ndcg"]
-print(ndcg_mse, ndcg_hinge)
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 1s/step - loss: 0.0805 - ndcg: 0.8886
-
-
----
-## Prediction
-
-Now, let's rank some lists!
-
-Let's create a mapping from movie ID to title so that we can surface the titles
-for the ranked list.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-
-user_id = 42
-movie_ids = [409, 237, 131, 941, 543]
-predictions = model_hinge.predict(
- {
- "user_id": keras.ops.array([user_id]),
- "movie_id": keras.ops.array([movie_ids]),
- }
-)
-predictions = keras.ops.convert_to_numpy(keras.ops.squeeze(predictions, axis=0))
-# Sort in descending order of predicted score, best movie first.
-sorted_indices = np.argsort(predictions)[::-1]
-sorted_movies = [movie_ids[i] for i in sorted_indices]
-
-for i, movie_id in enumerate(sorted_movies):
- print(f"{i + 1}. ", movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 261ms/step
-
-
-And we're all done!
-
diff --git a/templates/examples/keras_rs/multi_task.md b/templates/examples/keras_rs/multi_task.md
deleted file mode 100644
index 6b764b1f22..0000000000
--- a/templates/examples/keras_rs/multi_task.md
+++ /dev/null
@@ -1,1464 +0,0 @@
-# Multi-task recommenders: retrieval + ranking
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Using one model for both retrieval and ranking.
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/multi_task.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/multi_task.py)
-
-
-
----
-## Introduction
-
-In the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-and
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorials, we created separate models for retrieval and ranking tasks,
-respectively. However, in many cases, building a single, joint model for
-multiple tasks can lead to better performance than creating distinct models for
-each task. This is especially true when dealing with data that is unevenly
-distributed — such as abundant data (e.g., clicks) versus sparse data
-(e.g., purchases, returns, or manual reviews). In such scenarios, a joint model
-can leverage representations learned from the abundant data to improve
-predictions on the sparse data, a technique known as transfer learning.
-For instance, [research](https://openreview.net/forum?id=SJxPVcSonN) shows that
-a model trained to predict user ratings from sparse survey data can be
-significantly enhanced by incorporating an auxiliary task using abundant click
-log data.
-
-In this example, we develop a multi-objective recommender system using the
-MovieLens dataset. We incorporate both implicit feedback (e.g., movie watches)
-and explicit feedback (e.g., ratings) to create a more robust and effective
-recommendation model. For the former, we predict "movie watches", i.e., whether
-a user has watched a movie, and for the latter, we predict the rating given by a
-user to a movie.
-
-Let's start by importing the necessary packages.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Prepare the dataset
-
-We use the MovieLens dataset. The data loading and processing steps are similar
-to previous tutorials, so we will not discuss them in detail here.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-Get user and movie counts so that we can define embedding layers.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-
-movies_count = movies.cardinality().numpy()
-```
-
-Our inputs are `"user_id"` and `"movie_id"`. Our label for the ranking task is
-`"user_rating"`, an integer between 1 and 5, which we rescale to the range
-`[0, 1]`.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- },
- (x["user_rating"] - 1.0) / 4.0,
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-
-```
-
-Split the dataset into train-test sets.
-
-
-```python
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Building the model
-
-We build the model in a similar way to the basic retrieval and basic ranking
-guides.
-
-For the retrieval task (i.e., predicting whether a user watched a movie),
-we compute the similarity of the corresponding user and movie embeddings, and
-use cross entropy loss, where the positive pairs are labelled one, and all other
-samples in the batch are considered "negatives". We report top-k accuracy for
-this task.
-
-For the ranking task (i.e., given a user-movie pair, predict rating), we
-concatenate user and movie embeddings and pass it to a dense module. We use
-MSE loss here, and report the Root Mean Squared Error (RMSE).
-
-The final loss is a weighted combination of the two losses mentioned above,
-where the weights are `"retrieval_loss_wt"` and `"ranking_loss_wt"`. These
-weights decide which task the model will focus on.
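-
-Before looking at the full model, here is a toy, self-contained sketch of the
-in-batch negatives trick used for the retrieval loss (illustrative only; the
-model's `compute_loss` below does the same with Keras ops):
-
-```python
-import numpy as np
-
-rng = np.random.default_rng(0)
-
-# A batch of 4 positive (user, movie) pairs, embedding dimension 8.
-user_emb = rng.normal(size=(4, 8))
-movie_emb = rng.normal(size=(4, 8))
-
-# scores[i, j] = affinity of user i with movie j. The diagonal holds the
-# true pairs; every off-diagonal entry acts as a negative.
-scores = user_emb @ movie_emb.T
-labels = np.eye(4)
-
-# Softmax cross entropy with one-hot labels, summed over the batch.
-log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
-loss = -(labels * log_probs).sum()
-print(loss)
-```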
-
-
-```python
-
-class MultiTaskModel(keras.Model):
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- layer_sizes=(256, 128),
- retrieval_loss_wt=1.0,
- ranking_loss_wt=1.0,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
-
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
-
- # Rating model.
-        self.rating_model = keras.Sequential(
- [
- keras.layers.Dense(layer_size, activation="relu")
- for layer_size in layer_sizes
- ]
- + [keras.layers.Dense(1)]
- )
-
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
-
- self.retrieval_loss_fn = keras.losses.CategoricalCrossentropy(
- from_logits=True,
- reduction="sum",
- )
- self.ranking_loss_fn = keras.losses.MeanSquaredError()
-
- # Top-k accuracy for retrieval
- self.top_k_metric = keras.metrics.SparseTopKCategoricalAccuracy(
- k=100, from_sorted_ids=True
- )
- # RMSE for ranking
- self.rmse_metric = keras.metrics.RootMeanSquaredError()
-
- # Attributes.
- self.num_users = num_users
- self.num_candidates = num_candidates
- self.embedding_dimension = embedding_dimension
- self.layer_sizes = layer_sizes
- self.retrieval_loss_wt = retrieval_loss_wt
- self.ranking_loss_wt = ranking_loss_wt
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
-
- self.rating_model.build((None, 2 * self.embedding_dimension))
-
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- # Unpack inputs. Note that we have the if condition throughout this
- # `call()` method so that we can do a `.predict()` for the retrieval
- # task.
- user_id = inputs["user_id"]
- if "movie_id" in inputs:
- movie_id = inputs["movie_id"]
-
- result = {}
-
- # Get user, movie embeddings.
- user_embeddings = self.user_embedding(user_id)
- result["user_embeddings"] = user_embeddings
-
- if "movie_id" in inputs:
- candidate_embeddings = self.candidate_embedding(movie_id)
- result["candidate_embeddings"] = candidate_embeddings
-
- # Pass both embeddings through the rating block of the model.
- rating = self.rating_model(
- keras.ops.concatenate([user_embeddings, candidate_embeddings], axis=1)
- )
- result["rating"] = rating
-
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
-
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = y_pred["candidate_embeddings"]
-
- # 1. Retrieval
-
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.matmul(
- user_embeddings,
- keras.ops.transpose(candidate_embeddings),
- )
-
- # Retrieval labels: One-hot vectors
- num_users = keras.ops.shape(user_embeddings)[0]
- num_candidates = keras.ops.shape(candidate_embeddings)[0]
- retrieval_labels = keras.ops.eye(num_users, num_candidates)
- # Retrieval loss
- retrieval_loss = self.retrieval_loss_fn(retrieval_labels, scores, sample_weight)
-
- # 2. Ranking
- ratings = y
- pred_rating = y_pred["rating"]
-
- # Ranking labels are just ratings.
- ranking_labels = keras.ops.expand_dims(ratings, -1)
- # Ranking loss
- ranking_loss = self.ranking_loss_fn(ranking_labels, pred_rating, sample_weight)
-
- # Total loss is a weighted combination of the two losses.
- total_loss = (
- self.retrieval_loss_wt * retrieval_loss
- + self.ranking_loss_wt * ranking_loss
- )
-
- return total_loss
-
- def compute_metrics(self, x, y, y_pred, sample_weight=None):
- # RMSE can be computed irrespective of whether we are
- # training/evaluating.
- self.rmse_metric.update_state(
- y,
- y_pred["rating"],
- sample_weight=sample_weight,
- )
-
- if "predictions" in y_pred:
- # We are evaluating or predicting. Update `top_k_metric`.
- movie_ids = x["movie_id"]
- predictions = y_pred["predictions"]
- # For `top_k_metric`, which is a `SparseTopKCategoricalAccuracy`, we
- # only take top rated movies, and we put a weight of 0 for the rest.
- rating_weight = keras.ops.cast(keras.ops.greater(y, 0.9), "float32")
- sample_weight = (
- rating_weight
- if sample_weight is None
- else keras.ops.multiply(rating_weight, sample_weight)
- )
- self.top_k_metric.update_state(
- movie_ids, predictions, sample_weight=sample_weight
- )
-
- return self.get_metrics_result()
- else:
- # We are training. `top_k_metric` is not updated and is zero, so
- # don't report it.
- result = self.get_metrics_result()
- result.pop(self.top_k_metric.name)
- return result
-
-```
-
----
-## Training and evaluating
-
-We will train three different models here. This can be done easily by passing
-the correct loss weights:
-
-1. Rating-specialised model
-2. Retrieval-specialised model
-3. Multi-task model
-
-
-```python
-# Rating-specialised model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=1.0,
- retrieval_loss_wt=0.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-
-# Retrieval-specialised model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=0.0,
- retrieval_loss_wt=1.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-
-# Multi-task model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=1.0,
- retrieval_loss_wt=1.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-```
-
-
-Let's put the metrics in a table and note our observations:
-
-| Model | Top-K Accuracy (↑) | RMSE (↓) |
-|-----------------------|--------------------|----------|
-| rating-specialised | 0.005 | 0.26 |
-| retrieval-specialised | 0.020 | 0.78 |
-| multi-task | 0.022 | 0.25 |
-
-As expected, the rating-specialised model has good RMSE, but poor top-k
-accuracy. For the retrieval-specialised model, it's the opposite.
-
-For the multi-task model, we notice that the model does well (or even slightly
-better than the two specialised models) on both tasks. In general, we can expect
-multi-task learning to bring about better results, especially when one task has
-a data-abundant source, and the other task is trained on sparse data.
-
-Now, let's make a prediction! We will first do a retrieval, and then for the
-retrieved list of movies, we will predict the rating using the same model.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-
-user_id = 5
-retrieved_movie_ids = model.predict(
- {
- "user_id": keras.ops.array([user_id]),
- }
-)
-retrieved_movie_ids = keras.ops.convert_to_numpy(retrieved_movie_ids["predictions"][0])
-retrieved_movies = [movie_id_to_movie_title[x] for x in retrieved_movie_ids]
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 109ms/step
-
-
-For these retrieved movies, we can now get the corresponding ratings.
-
-
-```python
-pred_ratings = model.predict(
- {
- "user_id": keras.ops.array([user_id] * len(retrieved_movie_ids)),
- "movie_id": keras.ops.array(retrieved_movie_ids),
- }
-)["rating"]
-pred_ratings = keras.ops.convert_to_numpy(keras.ops.squeeze(pred_ratings, axis=1))
-
-for movie_id, prediction in zip(retrieved_movie_ids, pred_ratings):
- print(f"{movie_id_to_movie_title[movie_id]}: {5.0 * prediction:,.2f}")
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 273ms/step
-
-
-
-
-```
-b'Blob, The (1958)': 2.01
-b'Mighty Morphin Power Rangers: The Movie (1995)': 2.03
-b'Flintstones, The (1994)': 2.18
-b'Beverly Hillbillies, The (1993)': 1.89
-b'Lawnmower Man, The (1992)': 2.57
-b'Hot Shots! Part Deux (1993)': 2.28
-b'Street Fighter (1994)': 1.84
-b'Cabin Boy (1994)': 1.94
-b'Little Rascals, The (1994)': 2.12
-b'Jaws 3-D (1983)': 2.27
-
-```
-
diff --git a/templates/examples/keras_rs/sas_rec.md b/templates/examples/keras_rs/sas_rec.md
deleted file mode 100644
index 40305906d3..0000000000
--- a/templates/examples/keras_rs/sas_rec.md
+++ /dev/null
@@ -1,2972 +0,0 @@
-# Sequential retrieval using SASRec
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Recommend movies using a Transformer-based retrieval model (SASRec).
-
-
-
-ⓘ This example uses Keras 3
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/sas_rec.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/sas_rec.py)
-
-
-
----
-## Introduction
-
-Sequential recommendation is a popular task in which the model looks at a
-sequence of items that users have interacted with previously, and then
-predicts the next item.
-Here, the order of the items within each sequence matters. Previously, in the
-[Recommending movies: retrieval using a sequential model](/keras_rs/examples/sequential_retrieval/)
-example, we built a GRU-based sequential retrieval model. In this example, we
-will build a popular Transformer decoder-based model named
-[Self-Attentive Sequential Recommendation (SASRec)](https://arxiv.org/abs/1808.09781)
-for the same sequential recommendation task.
-
-Let's begin by importing all the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-
-import keras
-import keras_hub
-import numpy as np
-import pandas as pd
-import tensorflow as tf # Needed only for the dataset
-from keras import ops
-
-import keras_rs
-```
-
-Let's also define all important variables/hyperparameters below.
-
-
-```python
-DATA_DIR = "./raw/data/"
-
-# MovieLens-specific variables
-MOVIELENS_1M_URL = "https://files.grouplens.org/datasets/movielens/ml-1m.zip"
-MOVIELENS_ZIP_HASH = "a6898adb50b9ca05aa231689da44c217cb524e7ebd39d264c56e2832f2c54e20"
-
-RATINGS_FILE_NAME = "ratings.dat"
-MOVIES_FILE_NAME = "movies.dat"
-
-# Data processing args
-MAX_CONTEXT_LENGTH = 200
-MIN_SEQUENCE_LENGTH = 3
-PAD_ITEM_ID = 0
-
-RATINGS_DATA_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
-MOVIES_DATA_COLUMNS = ["MovieID", "Title", "Genres"]
-MIN_RATING = 2
-
-# Training/model args picked from SASRec paper
-BATCH_SIZE = 128
-NUM_EPOCHS = 10
-LEARNING_RATE = 0.001
-
-NUM_LAYERS = 2
-NUM_HEADS = 1
-HIDDEN_DIM = 50
-DROPOUT = 0.2
-```
-
----
-## Dataset
-
-Next, we need to prepare our dataset. Like we did in the
-[sequential retrieval](/keras_rs/examples/sequential_retrieval/)
-example, we are going to use the MovieLens dataset.
-
-The dataset preparation step is fairly involved. The original ratings dataset
-contains `(user, movie ID, rating, timestamp)` tuples (among other columns,
-which are not important for this example). Since we are dealing with sequential
-retrieval, we need to create movie sequences for every user, where the sequences
-are ordered by timestamp.
-
-Let's start by downloading and reading the dataset.
-
-
-```python
-# Download the MovieLens dataset.
-if not os.path.exists(DATA_DIR):
- os.makedirs(DATA_DIR)
-
-path_to_zip = keras.utils.get_file(
- fname="ml-1m.zip",
- origin=MOVIELENS_1M_URL,
- file_hash=MOVIELENS_ZIP_HASH,
- hash_algorithm="sha256",
- extract=True,
- cache_dir=DATA_DIR,
-)
-movielens_extracted_dir = os.path.join(
- os.path.dirname(path_to_zip),
- "ml-1m_extracted",
- "ml-1m",
-)
-
-
-# Read the dataset.
-def read_data(data_directory, min_rating=None):
- """Read movielens ratings.dat and movies.dat file
- into dataframe.
- """
-
- ratings_df = pd.read_csv(
- os.path.join(data_directory, RATINGS_FILE_NAME),
- sep="::",
- names=RATINGS_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- ratings_df["Timestamp"] = ratings_df["Timestamp"].apply(int)
-
- # Remove movies with `rating < min_rating`.
- if min_rating is not None:
- ratings_df = ratings_df[ratings_df["Rating"] >= min_rating]
-
- movies_df = pd.read_csv(
- os.path.join(data_directory, MOVIES_FILE_NAME),
- sep="::",
- names=MOVIES_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- return ratings_df, movies_df
-
-
-ratings_df, movies_df = read_data(
- data_directory=movielens_extracted_dir, min_rating=MIN_RATING
-)
-
-# Need to know #movies so as to define embedding layers.
-movies_count = movies_df["MovieID"].max()
-```
-
-
-```
-Downloading data from https://files.grouplens.org/datasets/movielens/ml-1m.zip
-
-```
-
-```
-:26: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- ratings_df = pd.read_csv(
-
-:38: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- movies_df = pd.read_csv(
-
-```
-
-Now that we have read the dataset, let's create sequences of movies
-for every user. Here is the function for doing just that.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings_df):
- """Get movieID sequences for every user."""
- sequences = collections.defaultdict(list)
-
- for user_id, movie_id, rating, timestamp in ratings_df.values:
- sequences[user_id].append(
- {
- "movie_id": movie_id,
- "timestamp": timestamp,
- "rating": rating,
- }
- )
-
- # Sort movie sequences by timestamp for every user.
- for user_id, context in sequences.items():
- context.sort(key=lambda x: x["timestamp"])
- sequences[user_id] = context
-
- return sequences
-
-
-sequences = get_movie_sequence_per_user(ratings_df)
-```
-
-So far, we have essentially replicated what we did in the sequential retrieval
-example. We have a sequence of movies for every user.
-
-SASRec is trained contrastively, which means the model learns to distinguish
-between sequences of movies a user has actually interacted with (positive
-examples) and sequences they have not interacted with (negative examples).
-
-The following function, `format_data`, prepares the data in this specific
-format. For each user's movie sequence, it generates a corresponding
-"negative sequence". This negative sequence consists of randomly
-selected movies that the user has *not* interacted with, but are of the same
-length as the original sequence.
-
-
-```python
-
-def format_data(sequences):
- examples = {
- "sequence": [],
- "negative_sequence": [],
- }
-
- for user_id in sequences:
- sequence = [int(d["movie_id"]) for d in sequences[user_id]]
-
- # Get negative sequence.
- def random_negative_item_id(low, high, positive_lst):
- sampled = np.random.randint(low=low, high=high)
- while sampled in positive_lst:
- sampled = np.random.randint(low=low, high=high)
- return sampled
-
- negative_sequence = [
- random_negative_item_id(1, movies_count + 1, sequence)
- for _ in range(len(sequence))
- ]
-
- examples["sequence"].append(np.array(sequence))
- examples["negative_sequence"].append(np.array(negative_sequence))
-
- examples["sequence"] = tf.ragged.constant(examples["sequence"])
- examples["negative_sequence"] = tf.ragged.constant(examples["negative_sequence"])
-
- return examples
-
-
-examples = format_data(sequences)
-ds = tf.data.Dataset.from_tensor_slices(examples).batch(BATCH_SIZE)
-```
-
-Now that we have the original movie interaction sequences for each user (from
-`format_data`, stored in `examples["sequence"]`) and their corresponding
-random negative sequences (in `examples["negative_sequence"]`), the next step is
-to prepare this data for input to the model. The primary goals of this
-preprocessing are:
-
-1. Creating Input Features and Target Labels: For sequential
- recommendation, the model learns to predict the next item in a sequence
- given the preceding items. This is achieved by:
- - taking the original `example["sequence"]` and creating the model's
- input features (`item_ids`) from all items *except the last one*
- (`example["sequence"][..., :-1]`);
- - creating the target "positive sequence" (what the model tries to predict
- as the actual next items) by taking the original `example["sequence"]`
- and shifting it, using all items *except the first one*
- (`example["sequence"][..., 1:]`);
-    - similarly shifting `example["negative_sequence"]` (from `format_data`)
-      to create the target "negative sequence" for the contrastive loss
-      (`example["negative_sequence"][..., 1:]`).
-
-2. Handling Variable Length Sequences: Neural networks typically require
- fixed-size inputs. Therefore, both the input feature sequences and the
- target sequences are padded (with a special `PAD_ITEM_ID`) or truncated
- to a predefined `MAX_CONTEXT_LENGTH`. A `padding_mask` is also generated
- from the input features to ensure the model ignores these padded tokens
-    during attention calculations, i.e., these tokens will be masked.
-
-3. Differentiating Training and Validation/Testing:
- - During training:
- - Input features (`item_ids`) and context for negative sequences
- are prepared as described above (all but the last item of the
- original sequences).
- - Target positive and negative sequences are the shifted versions of
- the original sequences.
- - `sample_weight` is created based on the input features to ensure
- that loss is calculated only on actual items, not on padding tokens
- in the targets.
- - During validation/testing:
- - Input features are prepared similarly.
- - The model's performance is typically evaluated on its ability to
- predict the actual last item of the original sequence. Thus,
- `sample_weight` is configured to focus the loss calculation
- only on this final prediction in the target sequences.
-
-Note: the original SASRec setup does the same thing we've done above, except
-that it uses `item_ids[:-2]` for the validation set and `item_ids[:-1]` for
-the test set. We skip that here for brevity.
-
-
-```python
-
-def _preprocess(example, train=False):
- sequence = example["sequence"]
- negative_sequence = example["negative_sequence"]
-
- if train:
- sequence = example["sequence"][..., :-1]
- negative_sequence = example["negative_sequence"][..., :-1]
-
- batch_size = tf.shape(sequence)[0]
-
- if not train:
- # Loss computed only on last token.
- sample_weight = tf.zeros_like(sequence, dtype="float32")[..., :-1]
- sample_weight = tf.concat(
- [sample_weight, tf.ones((batch_size, 1), dtype="float32")], axis=1
- )
-
- # Truncate/pad sequence. +1 to account for truncation later.
- sequence = sequence.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=PAD_ITEM_ID
- )
- negative_sequence = negative_sequence.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=PAD_ITEM_ID
- )
- if train:
- sample_weight = tf.cast(sequence != PAD_ITEM_ID, dtype="float32")
- else:
- sample_weight = sample_weight.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=0
- )
-
- example = (
- {
- # last token does not have a next token
- "item_ids": sequence[..., :-1],
- # padding mask for controlling attention mask
- "padding_mask": (sequence != PAD_ITEM_ID)[..., :-1],
- },
- {
- "positive_sequence": sequence[
- ..., 1:
- ], # 0th token's label will be 1st token, and so on
- "negative_sequence": negative_sequence[..., 1:],
- },
- sample_weight[..., 1:], # loss will not be computed on pad tokens
- )
- return example
-
-
-def preprocess_train(examples):
- return _preprocess(examples, train=True)
-
-
-def preprocess_val(examples):
- return _preprocess(examples, train=False)
-
-
-train_ds = ds.map(preprocess_train)
-val_ds = ds.map(preprocess_val)
-```
-
-Let's look at a batch from each dataset.
-
-
-```python
-for batch in train_ds.take(1):
- print(batch)
-
-for batch in val_ds.take(1):
- print(batch)
-
-```
-
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = dict(zip(movies_df["MovieID"], movies_df["Title"]))
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable (see the sketch after the
-outputs below).
-
-
-```python
-for ele in val_ds.unbatch().take(1):
- test_sample = ele[0]
- test_sample["item_ids"] = tf.expand_dims(test_sample["item_ids"], axis=0)
- test_sample["padding_mask"] = tf.expand_dims(test_sample["padding_mask"], axis=0)
-
-movie_sequence = np.array(test_sample["item_ids"])[0]
-for movie_id in movie_sequence:
- if movie_id == 0:
- continue
- print(movie_id_to_movie_title[movie_id], end="; ")
-print()
-
-predictions = model.predict(test_sample)["predictions"]
-predictions = keras.ops.convert_to_numpy(predictions)
-
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
-```
-Girl, Interrupted (1999); Back to the Future (1985); Titanic (1997); Cinderella (1950); Meet Joe Black (1998); Last Days of Disco, The (1998); Erin Brockovich (2000); Christmas Story, A (1983); To Kill a Mockingbird (1962); One Flew Over the Cuckoo's Nest (1975); Wallace & Gromit: The Best of Aardman Animation (1996); Star Wars: Episode IV - A New Hope (1977); Wizard of Oz, The (1939); Fargo (1996); Run Lola Run (Lola rennt) (1998); Rain Man (1988); Saving Private Ryan (1998); Awakenings (1990); Gigi (1958); Sound of Music, The (1965); Driving Miss Daisy (1989); Bambi (1942); Apollo 13 (1995); Mary Poppins (1964); E.T. the Extra-Terrestrial (1982); My Fair Lady (1964); Ben-Hur (1959); Big (1988); Sixth Sense, The (1999); Dead Poets Society (1989); James and the Giant Peach (1996); Ferris Bueller's Day Off (1986); Secret Garden, The (1993); Toy Story 2 (1999); Airplane! (1980); Pleasantville (1998); Dumbo (1941); Princess Bride, The (1987); Snow White and the Seven Dwarfs (1937); Miracle on 34th Street (1947); Ponette (1996); Schindler's List (1993); Beauty and the Beast (1991); Tarzan (1999); Close Shave, A (1995); Aladdin (1992); Toy Story (1995); Bug's Life, A (1998); Antz (1998); Hunchback of Notre Dame, The (1996); Hercules (1997); Mulan (1998); Pocahontas (1995);
-
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 790ms/step
-
-
-
-```
-Groundhog Day (1993)
-Aladdin (1992)
-Toy Story (1995)
-Forrest Gump (1994)
-Bug's Life, A (1998)
-Lion King, The (1994)
-Shakespeare in Love (1998)
-American Beauty (1999)
-Sixth Sense, The (1999)
-Ghostbusters (1984)
-
-```
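-
-As mentioned above, the retrieved list may include movies the user has already
-watched. Here is a minimal sketch of filtering those out, assuming the
-`movie_sequence` and `predictions` arrays from the prediction cell above:
-
-```python
-# Drop already-watched movies from the retrieved list.
-watched = set(int(m) for m in movie_sequence if m != 0)
-filtered = [int(m) for m in predictions[0] if int(m) not in watched]
-for movie_id in filtered:
-    print(movie_id_to_movie_title[movie_id])
-```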
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/scann.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/scann.py)
-
-
-
----
-## Introduction
-
-Retrieval models are designed to quickly identify a small set of highly relevant
-candidates from vast pools of data, often comprising millions or even hundreds
-of millions of items. To effectively respond to the user's context and behavior
-in real time, these models must perform this task in just milliseconds.
-
-Approximate nearest neighbor (ANN) search is the key technology that enables
-this level of efficiency. In this tutorial, we'll demonstrate how to leverage
-ScANN—a cutting-edge nearest neighbor retrieval library—to effortlessly scale
-retrieval for millions of items.
-
-[ScANN](https://research.google/blog/announcing-scann-efficient-vector-similarity-search/),
-developed by Google Research, is a high-performance library designed for
-dense vector similarity search at scale. It efficiently indexes a database of
-candidate embeddings, enabling rapid search during inference. By leveraging
-advanced vector compression techniques and finely tuned algorithms, ScaNN
-strikes an optimal balance between speed and accuracy. As a result, it can
-significantly outperform brute-force search methods, delivering fast retrieval
-with minimal loss in accuracy.
-
-We will start with the same code as the
-[basic retrieval example](/keras_rs/examples/basic_retrieval/).
-Data processing, model building, and training remain exactly the same. Feel free
-to skip this part if you have gone over the basic retrieval example before.
-
-Note: ScANN does not have its own separate layer in KerasRS because the ScANN
-library is TensorFlow-only. In this example, we use the ScANN library directly
-and demonstrate its usage with KerasRS.
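-
-For reference, a ScANN searcher is typically built with the library's builder
-API, along the lines of the sketch below (adapted from the ScaNN README; the
-toy data and tuning parameters shown are illustrative, not recommendations):
-
-```python
-import numpy as np
-import scann
-
-# A toy database of 10,000 candidate embeddings of dimension 32.
-db = np.random.rand(10_000, 32).astype(np.float32)
-
-searcher = (
-    scann.scann_ops_pybind.builder(db, 10, "dot_product")
-    .tree(num_leaves=100, num_leaves_to_search=10, training_sample_size=10_000)
-    .score_ah(2, anisotropic_quantization_threshold=0.2)
-    .reorder(100)
-    .build()
-)
-
-query = np.random.rand(32).astype(np.float32)
-neighbors, distances = searcher.search(query)
-```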
-
----
-## Imports
-
-Let's install the `scann` library and import all necessary packages. We will
-also set the backend to JAX.
-
-
-```python
-# ruff: noqa: E402
-```
-
-
-```python
-!pip install -q scann
-```
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import time
-import uuid
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-from scann import scann_ops
-
-import keras_rs
-```
-
----
-## Preparing the dataset
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-# Get user and movie counts so that we can define embedding layers for both.
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-
-movies_count = movies.cardinality().numpy()
-
-
-# Preprocess the dataset, by selecting only the relevant columns.
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-# Train-test split.
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-
-```python
-
-class RetrievalModel(keras.Model):
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
-
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
-
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Training the model
-
-
-```python
-model = RetrievalModel(users_count + 1000, movies_count + 1000)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1))
-
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.4679 - val_loss: 0.4753
-
-
----
-## Making predictions
-
-Before we try out ScaNN, let's start with the brute-force method: for a given
-user, scores are computed for all movies, sorted, and then the top-k movies
-are picked. This is, of course, not very scalable when we have a huge number
-of movies.
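-
-Under the hood, this brute-force step is just a dot product followed by a
-top-k selection. Here is a minimal, self-contained sketch of the idea (with
-hypothetical toy shapes; this is not the layer's actual implementation):
-
-```python
-import numpy as np
-
-# Toy example: 4 users, 1000 candidate movies, 32-dim embeddings.
-rng = np.random.default_rng(0)
-users = rng.normal(size=(4, 32)).astype("float32")
-candidates = rng.normal(size=(1000, 32)).astype("float32")
-
-# Affinity scores for every (user, movie) pair, shape (4, 1000).
-scores = users @ candidates.T
-# Indices of the 10 highest-scoring movies per user, shape (4, 10).
-top_k = np.argsort(-scores, axis=1)[:, :10]
-```
-
-The `BruteForceRetrieval` layer below performs this same computation in a
-batched, backend-agnostic way.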
-
-
-```python
-candidate_embeddings = keras.ops.array(model.candidate_embedding.embeddings.numpy())
-# Artificially duplicate candidate embeddings to simulate a large number of
-# movies.
-candidate_embeddings = keras.ops.concatenate(
- [candidate_embeddings]
- + [
- candidate_embeddings
- * keras.random.uniform(keras.ops.shape(candidate_embeddings))
- for _ in range(100)
- ],
- axis=0,
-)
-
-user_embedding = model.user_embedding(keras.ops.array([10, 5, 42, 345]))
-
-# Define the brute force retrieval layer.
-brute_force_layer = keras_rs.layers.BruteForceRetrieval(
- candidate_embeddings=candidate_embeddings,
- k=10,
- return_scores=False,
-)
-```
-
-Now, let's do a forward pass on the layer. Note that in previous tutorials, we
-kept the above layer as an attribute of the model class and called
-`.predict()`, which is faster since it runs compiled XLA code. But because we
-cannot do the same for ScaNN, we do a plain, uncompiled forward pass here to
-ensure a fair comparison.
-
-
-```python
-t0 = time.time()
-pred_movie_ids = brute_force_layer(user_embedding)
-print("Time taken by brute force layer (sec):", time.time() - t0)
-```
-
-
-```
-Time taken by brute force layer (sec): 0.22817683219909668
-
-```
-
-Now, let's retrieve movies using ScaNN. We will use the ScaNN library from
-Google Research to build the layer and then call it. To fully understand all
-the arguments, please refer to the
-[ScaNN README file](https://github.com/google-research/google-research/tree/master/scann#readme).
-
-
-```python
-
-def build_scann(
- candidates,
- k=10,
- distance_measure="dot_product",
- dimensions_per_block=2,
- num_reordering_candidates=500,
- num_leaves=100,
- num_leaves_to_search=30,
- training_iterations=12,
-):
- builder = scann_ops.builder(
- db=candidates,
- num_neighbors=k,
- distance_measure=distance_measure,
- )
-
- builder = builder.tree(
- num_leaves=num_leaves,
- num_leaves_to_search=num_leaves_to_search,
- training_iterations=training_iterations,
- )
- builder = builder.score_ah(dimensions_per_block=dimensions_per_block)
-
- if num_reordering_candidates is not None:
- builder = builder.reorder(num_reordering_candidates)
-
- # Set a unique name to prevent unintentional sharing between
- # ScaNN instances.
- searcher = builder.build(shared_name=str(uuid.uuid4()))
- return searcher
-
-
-def run_scann(searcher):
- pred_movie_ids = searcher.search_batched_parallel(
- user_embedding,
- final_num_neighbors=10,
- ).indices
- return pred_movie_ids
-
-
-searcher = build_scann(candidates=candidate_embeddings)
-
-t0 = time.time()
-pred_movie_ids = run_scann(searcher)
-print("Time taken by ScaNN (sec):", time.time() - t0)
-```
-
-
-```
-Time taken by ScaNN (sec): 0.0032587051391601562
-
-```
-
-You can clearly see the performance improvement in terms of latency. ScaNN
-(~0.003 seconds) takes roughly one-seventieth of the time the brute-force
-layer (~0.23 seconds) takes to run!
-
diff --git a/templates/examples/keras_rs/sequential_retrieval.md b/templates/examples/keras_rs/sequential_retrieval.md
deleted file mode 100644
index 5341b55d85..0000000000
--- a/templates/examples/keras_rs/sequential_retrieval.md
+++ /dev/null
@@ -1,2334 +0,0 @@
-# Sequential retrieval [GRU4Rec]
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Recommend movies using a GRU-based sequential retrieval model.
-
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/sequential_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/sequential_retrieval.py)
-
-
-
----
-## Introduction
-
-In this example, we are going to build a sequential retrieval model. Sequential
-recommendation is a popular model that looks at a sequence of items that users
-have interacted with previously and then predicts the next item. Here, the order
-of the items within each sequence matters. So, we are going to use a recurrent
-neural network to model the sequential relationship. For more details,
-please refer to the [GRU4Rec](https://arxiv.org/abs/1511.06939) paper.
-
-Let's begin by choosing JAX as the backend we want to run on, and import all
-the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-import os
-import random
-
-import keras
-import pandas as pd
-import tensorflow as tf # Needed only for the dataset
-
-import keras_rs
-```
-
-Let's also define all important variables/hyperparameters below.
-
-
-```python
-DATA_DIR = "./raw/data/"
-
-# MovieLens-specific variables
-MOVIELENS_1M_URL = "https://files.grouplens.org/datasets/movielens/ml-1m.zip"
-MOVIELENS_ZIP_HASH = "a6898adb50b9ca05aa231689da44c217cb524e7ebd39d264c56e2832f2c54e20"
-
-RATINGS_FILE_NAME = "ratings.dat"
-MOVIES_FILE_NAME = "movies.dat"
-
-# Data processing args
-MAX_CONTEXT_LENGTH = 10
-MIN_SEQUENCE_LENGTH = 3
-TRAIN_DATA_FRACTION = 0.9
-
-RATINGS_DATA_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
-MOVIES_DATA_COLUMNS = ["MovieID", "Title", "Genres"]
-MIN_RATING = 2
-
-# Training/model args
-BATCH_SIZE = 4096
-TEST_BATCH_SIZE = 2048
-EMBEDDING_DIM = 32
-NUM_EPOCHS = 5
-LEARNING_RATE = 0.05
-```
-
----
-## Dataset
-
-Next, we need to prepare our dataset. Like we did in the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-example, we are going to use the MovieLens dataset.
-
-The dataset preparation step is fairly involved. The original ratings dataset
-contains `(user, movie ID, rating, timestamp)` tuples (among other columns,
-which are not important for this example). Since we are dealing with sequential
-retrieval, we need to create movie sequences for every user, where the sequences
-are ordered by timestamp.
-
-Let's start by downloading and reading the dataset.
-
-
-```python
-# Download the MovieLens dataset.
-if not os.path.exists(DATA_DIR):
- os.makedirs(DATA_DIR)
-
-path_to_zip = keras.utils.get_file(
- fname="ml-1m.zip",
- origin=MOVIELENS_1M_URL,
- file_hash=MOVIELENS_ZIP_HASH,
- hash_algorithm="sha256",
- extract=True,
- cache_dir=DATA_DIR,
-)
-movielens_extracted_dir = os.path.join(
- os.path.dirname(path_to_zip),
- "ml-1m_extracted",
- "ml-1m",
-)
-
-
-# Read the dataset.
-def read_data(data_directory, min_rating=None):
- """Read movielens ratings.dat and movies.dat file
- into dataframe.
- """
-
- ratings_df = pd.read_csv(
- os.path.join(data_directory, RATINGS_FILE_NAME),
- sep="::",
- names=RATINGS_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- ratings_df["Timestamp"] = ratings_df["Timestamp"].apply(int)
-
- # Remove movies with `rating < min_rating`.
- if min_rating is not None:
- ratings_df = ratings_df[ratings_df["Rating"] >= min_rating]
-
- movies_df = pd.read_csv(
- os.path.join(data_directory, MOVIES_FILE_NAME),
- sep="::",
- names=MOVIES_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- return ratings_df, movies_df
-
-
-ratings_df, movies_df = read_data(
- data_directory=movielens_extracted_dir, min_rating=MIN_RATING
-)
-
-# We need to know the number of movies to define the embedding layers.
-movies_count = movies_df["MovieID"].max()
-```
-
-
-```
-Downloading data from https://files.grouplens.org/datasets/movielens/ml-1m.zip
-
-```
-
-```
-:26: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- ratings_df = pd.read_csv(
-
-:38: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- movies_df = pd.read_csv(
-
-```
-
-Now that we have read the dataset, let's create sequences of movies
-for every user. Here is the function for doing just that.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings_df):
- """Get movieID sequences for every user."""
- sequences = collections.defaultdict(list)
-
- for user_id, movie_id, rating, timestamp in ratings_df.values:
- sequences[user_id].append(
- {
- "movie_id": movie_id,
- "timestamp": timestamp,
- "rating": rating,
- }
- )
-
- # Sort movie sequences by timestamp for every user.
- for user_id, context in sequences.items():
- context.sort(key=lambda x: x["timestamp"])
- sequences[user_id] = context
-
- return sequences
-
-```
-
-We need to do some filtering and processing before we proceed
-with training the model:
-
-1. Form sequences of all lengths up to
- `min(user_sequence_length, MAX_CONTEXT_LENGTH)`. So, every user
- will have multiple sequences corresponding to it.
-2. Get labels, i.e., given a sequence of length `n`, the first
-   `n-1` tokens will be fed to the model as input, and the label
-   will be the last token.
-3. Remove all user sequences with fewer than `MIN_SEQUENCE_LENGTH`
-   movies.
-4. Pad all sequences to `MAX_CONTEXT_LENGTH`.
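-
-As a concrete illustration (with hypothetical movie IDs, and a context length
-of 4 instead of 10 for brevity), a user whose time-sorted history is
-`[1, 2, 3]` would produce two examples:
-
-```
-context = [1, 0, 0, 0] -> label = 2
-context = [1, 2, 0, 0] -> label = 3
-```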
-
-
-```python
-
-def generate_examples_from_user_sequences(sequences):
- """Generates sequences for all users, with padding, truncation, etc."""
-
- def generate_examples_from_user_sequence(sequence):
- """Generates examples for a single user sequence."""
-
- examples = []
- for label_idx in range(1, len(sequence)):
- start_idx = max(0, label_idx - MAX_CONTEXT_LENGTH)
- context = sequence[start_idx:label_idx]
-
- # Padding
- while len(context) < MAX_CONTEXT_LENGTH:
- context.append(
- {
- "movie_id": 0,
- "timestamp": 0,
- "rating": 0.0,
- }
- )
-
- label_movie_id = int(sequence[label_idx]["movie_id"])
- context_movie_id = [int(movie["movie_id"]) for movie in context]
-
- examples.append(
- {
- "context_movie_id": context_movie_id,
- "label_movie_id": label_movie_id,
- },
- )
- return examples
-
- all_examples = []
- for sequence in sequences.values():
- if len(sequence) < MIN_SEQUENCE_LENGTH:
- continue
-
- user_examples = generate_examples_from_user_sequence(sequence)
-
- all_examples.extend(user_examples)
-
- return all_examples
-
-```
-
-Let's split the dataset into train and test sets. Also, we need to
-change the format of the dataset dictionary so as to enable conversion
-to a `tf.data.Dataset` object.
-
-
-```python
-sequences = get_movie_sequence_per_user(ratings_df)
-examples = generate_examples_from_user_sequences(sequences)
-
-# Train-test split.
-random.shuffle(examples)
-split_index = int(TRAIN_DATA_FRACTION * len(examples))
-train_examples = examples[:split_index]
-test_examples = examples[split_index:]
-
-
-def list_of_dicts_to_dict_of_lists(list_of_dicts):
- """Convert list of dictionaries to dictionary of lists for
- `tf.data` conversion.
- """
- dict_of_lists = collections.defaultdict(list)
- for dictionary in list_of_dicts:
- for key, value in dictionary.items():
- dict_of_lists[key].append(value)
- return dict_of_lists
-
-
-train_examples = list_of_dicts_to_dict_of_lists(train_examples)
-test_examples = list_of_dicts_to_dict_of_lists(test_examples)
-
-train_ds = tf.data.Dataset.from_tensor_slices(train_examples).map(
- lambda x: (x["context_movie_id"], x["label_movie_id"])
-)
-test_ds = tf.data.Dataset.from_tensor_slices(test_examples).map(
- lambda x: (x["context_movie_id"], x["label_movie_id"])
-)
-```
-
-We need to batch our datasets. We also use `cache()` and `prefetch()`
-for better performance.
-
-
-```python
-train_ds = train_ds.batch(BATCH_SIZE).cache().prefetch(tf.data.AUTOTUNE)
-test_ds = test_ds.batch(TEST_BATCH_SIZE).cache().prefetch(tf.data.AUTOTUNE)
-```
-
-Let's print out one batch.
-
-
-```python
-for sample in train_ds.take(1):
- print(sample)
-```
-
-
-```
-(<tf.Tensor: shape=(4096, 10), dtype=int32, numpy=...>, <tf.Tensor: shape=(4096,), dtype=int32, numpy=...>)
-
-```
-
----
-## Model and Training
-
-In the basic retrieval example, we used one query tower for the
-user, and the candidate tower for the candidate movie. We are
-going to use a two-tower architecture here as well. However,
-we use the query tower with a Gated Recurrent Unit (GRU) layer
-to encode the sequence of historical movies, and keep the same
-candidate tower for the candidate movie.
-
-Note: Take a look at how the labels are defined. The label tensor
-(of shape `(batch_size, batch_size)`) contains one-hot vectors. The idea
-is: for every sample, consider movie IDs corresponding to other samples in
-the batch as negatives.
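-
-To make this concrete, here is a toy sketch (not part of the tutorial code) of
-the label matrix for a batch of 3, where sample `i`'s positive candidate is
-candidate `i` and every other candidate in the batch acts as a negative:
-
-```python
-import keras
-
-batch_size = 3
-labels = keras.ops.eye(batch_size, batch_size)
-# [[1. 0. 0.]
-#  [0. 1. 0.]
-#  [0. 0. 1.]]
-```
-
-The `compute_loss` method below builds this matrix with `keras.ops.eye` and
-scores it against the query-candidate affinity matrix.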
-
-
-```python
-
-class SequentialRetrievalModel(keras.Model):
- """Create the sequential retrieval model.
-
- Args:
- movies_count: Total number of unique movies in the dataset.
- embedding_dimension: Output dimension for movie embedding tables.
- """
-
- def __init__(
- self,
- movies_count,
- embedding_dimension=128,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table followed by
- # a GRU unit. This encodes sequence of historical movies.
- self.query_model = keras.Sequential(
- [
- keras.layers.Embedding(movies_count + 1, embedding_dimension),
- keras.layers.GRU(embedding_dimension),
- ]
- )
-
- # Our candidate tower, simply an embedding table.
- self.candidate_model = keras.layers.Embedding(
- movies_count + 1, embedding_dimension
- )
-
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.CategoricalCrossentropy(
- from_logits=True,
- )
-
- def build(self, input_shape):
- self.query_model.build(input_shape)
- self.candidate_model.build(input_shape)
-
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_model.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- query_embeddings = self.query_model(inputs)
- result = {
- "query_embeddings": query_embeddings,
- }
-
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(query_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id = y
- query_embeddings = y_pred["query_embeddings"]
- candidate_embeddings = self.candidate_model(candidate_id)
-
- num_queries = keras.ops.shape(query_embeddings)[0]
- num_candidates = keras.ops.shape(candidate_embeddings)[0]
-
- # One-hot vectors for labels.
- labels = keras.ops.eye(num_queries, num_candidates)
-
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.matmul(
- query_embeddings, keras.ops.transpose(candidate_embeddings)
- )
-
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
-Let's instantiate, compile and train our model.
-
-
-```python
-model = SequentialRetrievalModel(
- movies_count=movies_count + 1, embedding_dimension=EMBEDDING_DIM
-)
-
-# Compile.
-model.compile(optimizer=keras.optimizers.AdamW(learning_rate=LEARNING_RATE))
-
-# Train.
-model.fit(
- train_ds,
- validation_data=test_ds,
- epochs=NUM_EPOCHS,
-)
-```
-
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = dict(zip(movies_df["MovieID"], movies_df["Title"]))
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable.
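-
-For instance, a post-processing step along these lines would work (a sketch
-with hypothetical names, not part of this tutorial):
-
-```python
-
-def filter_watched(predicted_ids, watched_ids, k=10):
-    """Drop already-watched movie IDs, keeping the top `k` remaining ones."""
-    watched = set(int(i) for i in watched_ids)
-    return [int(i) for i in predicted_ids if int(i) not in watched][:k]
-
-```
-
-In practice, you would retrieve more than `k` candidates upfront so that `k`
-recommendations remain after filtering.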
-
-
-```python
-print("\n==> Movies the user has watched:")
-movie_sequence = test_ds.unbatch().take(1)
-for element in movie_sequence:
- for movie_id in element[0][:-1]:
- print(movie_id_to_movie_title[movie_id.numpy()], end=", ")
- print(movie_id_to_movie_title[element[0][-1].numpy()])
-
-predictions = model.predict(movie_sequence.batch(1))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-print("\n==> Recommended movies for the above sequence:")
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
-
-```
-==> Movies the user has watched:
-10 Things I Hate About You (1999), American Beauty (1999), Bachelor, The (1999), Austin Powers: The Spy Who Shagged Me (1999), Arachnophobia (1990), Big Daddy (1999), Bone Collector, The (1999), Bug's Life, A (1998), Bowfinger (1999), Dead Calm (1989)
-
-```
-
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 302ms/step
-
-
-
-
-```
-==> Recommended movies for the above sequence:
-Creepshow (1982)
-Bringing Out the Dead (1999)
-Civil Action, A (1998)
-Doors, The (1991)
-Cruel Intentions (1999)
-Brokedown Palace (1999)
-Dead Calm (1989)
-Condorman (1981)
-Clan of the Cave Bear, The (1986)
-Clerks (1994)
-
-/usr/local/lib/python3.11/dist-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
- self._interrupted_warning()
-
-```
-
diff --git a/templates/keras_rs/examples/basic_ranking.md b/templates/keras_rs/examples/basic_ranking.md
deleted file mode 100644
index 87c557733b..0000000000
--- a/templates/keras_rs/examples/basic_ranking.md
+++ /dev/null
@@ -1,613 +0,0 @@
-# Recommending movies: ranking
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using a two tower model.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/basic_ranking.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/basic_ranking.py)
-
-
-
----
-## Introduction
-
-Recommender systems are often composed of two stages:
-
-1. The retrieval stage is responsible for selecting an initial set of hundreds
- of candidates from all possible candidates. The main objective of this model
- is to efficiently weed out all candidates that the user is not interested in.
- Because the retrieval model may be dealing with millions of candidates, it
- has to be computationally efficient.
-2. The ranking stage takes the outputs of the retrieval model and fine-tunes
- them to select the best possible handful of recommendations. Its task is to
- narrow down the set of items the user may be interested in to a shortlist of
- likely candidates.
-
-In this tutorial, we're going to focus on the second stage, ranking. If you are
-interested in the retrieval stage, have a look at our
-[retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial.
-
-In this tutorial, we're going to:
-
-1. Get our data and split it into a training and test set.
-2. Implement a ranking model.
-3. Fit and evaluate it.
-4. Test running predictions with the model.
-
-Let's begin by choosing JAX as the backend we want to run on, and import all
-the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-```
-
----
-## Preparing the dataset
-
-We're going to use the same data as the
-[retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial. The ratings are the objectives we are trying to predict.
-
-
-```python
-# Ratings data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-
-In the Movielens dataset, user IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map user IDs to integers from 0 to N-1. But as a simplification, we'll use
-the user ID directly as an index in our model, in particular to look up the
-user embedding from the user embedding table. So we need to know the number of
-users.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-In the Movielens dataset, movie IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use
-the movie ID directly as an index in our model, in particular to look up the
-movie embedding from the movie embedding table. So we need to know the number
-of movies.
-
-
-```python
-movies_count = movies.cardinality().numpy()
-```
-
-The inputs to the model are the user IDs and movie IDs and the labels are the
-ratings.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- # Inputs are user IDs and movie IDs
- {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- },
- # Labels are ratings between 0 and 1.
- (x["user_rating"] - 1.0) / 4.0,
- )
-
-```
-
-We'll split the data by putting 80% of the ratings in the train set, and 20% in
-the test set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-### Architecture
-
-Ranking models do not face the same efficiency constraints as retrieval models
-do, and so we have a little bit more freedom in our choice of architectures.
-
-A model composed of multiple stacked dense layers is a relatively common
-architecture for ranking tasks. We can implement it as follows:
-
-
-```python
-
-class RankingModel(keras.Model):
- """Create the ranking model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Embedding table for users.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Embedding table for candidates.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # Predictions.
- self.ratings = keras.Sequential(
- [
- # Learn multiple dense layers.
- keras.layers.Dense(256, activation="relu"),
- keras.layers.Dense(64, activation="relu"),
- # Make rating predictions in the final layer.
- keras.layers.Dense(1),
- ]
- )
-
- def call(self, inputs):
- user_id, movie_id = inputs["user_id"], inputs["movie_id"]
- user_embeddings = self.user_embedding(user_id)
- candidate_embeddings = self.candidate_embedding(movie_id)
- return self.ratings(
- keras.ops.concatenate([user_embeddings, candidate_embeddings], axis=1)
- )
-
-```
-
-Let's first instantiate the model. Note that we add `+ 1` to the number of users
-and movies to account for the fact that id zero is not used for either (IDs
-start at 1), but still takes a row in the embedding tables.
-
-
-```python
-model = RankingModel(users_count + 1, movies_count + 1)
-```
-
-### Loss and metrics
-
-The next component is the loss used to train our model. Keras has several losses
-to make this easy. In this instance, we'll make use of the `MeanSquaredError`
-loss in order to predict the ratings. We'll also look at the
-`RootMeanSquaredError` metric.
-
-
-```python
-model.compile(
- loss=keras.losses.MeanSquaredError(),
- metrics=[keras.metrics.RootMeanSquaredError()],
- optimizer=keras.optimizers.Adagrad(learning_rate=0.1),
-)
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-the model.
-
-
-```python
-model.fit(train_ratings, epochs=5)
-```
-
-
-As the model trains, the loss is falling and the RMSE metric is improving.
-
-Finally, we can evaluate our model on the test set. The lower the RMSE metric,
-the more accurate our model is at predicting ratings.
-
-
-```python
-model.evaluate(test_ratings, return_dict=True)
-```
-
-
- 1/20 ━━━━━━━━━━━━━━━━━━━━ 36s 2s/step - loss: 0.0732 - root_mean_squared_error: 0.2705
-
-
----
-## Testing the ranking model
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-Now we can test the ranking model by computing predictions for a set of movies
-and then rank these movies based on the predictions:
-
-
-```python
-user_id = 42
-movie_ids = [204, 141, 131]
-predictions = model.predict(
- {
- "user_id": keras.ops.array([user_id] * len(movie_ids)),
- "movie_id": keras.ops.array(movie_ids),
- }
-)
-predictions = keras.ops.convert_to_numpy(keras.ops.squeeze(predictions, axis=1))
-
-for movie_id, prediction in zip(movie_ids, predictions):
- print(f"{movie_id_to_movie_title[movie_id]}: {5.0 * prediction:,.2f}")
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 273ms/step
-
-
-
-```
-b'Back to the Future (1985)': 3.86
-b'20,000 Leagues Under the Sea (1954)': 3.93
-b"Breakfast at Tiffany's (1961)": 3.72
-
-```
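-
-A note on the rescaling: `preprocess_rating` mapped ratings from `[1, 5]` to
-`[0, 1]` via `(rating - 1) / 4`, so the exact inverse is
-`4 * prediction + 1`; the `5.0 * prediction` above is an approximation. To
-actually rank the candidates, we can simply sort them by predicted score, as
-in this sketch:
-
-```python
-ranked = sorted(
-    zip(movie_ids, predictions), key=lambda pair: pair[1], reverse=True
-)
-for movie_id, score in ranked:
-    print(f"{movie_id_to_movie_title[movie_id]}: {4.0 * score + 1.0:,.2f}")
-```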
-
\ No newline at end of file
diff --git a/templates/keras_rs/examples/basic_retrieval.md b/templates/keras_rs/examples/basic_retrieval.md
deleted file mode 100644
index 06eb46818c..0000000000
--- a/templates/keras_rs/examples/basic_retrieval.md
+++ /dev/null
@@ -1,2168 +0,0 @@
-# Recommending movies: retrieval
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Retrieve movies using a two tower model.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/basic_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/basic_retrieval.py)
-
-
-
----
-## Introduction
-
-Recommender systems are often composed of two stages:
-
-1. The retrieval stage is responsible for selecting an initial set of hundreds
- of candidates from all possible candidates. The main objective of this model
- is to efficiently weed out all candidates that the user is not interested in.
- Because the retrieval model may be dealing with millions of candidates, it
- has to be computationally efficient.
-2. The ranking stage takes the outputs of the retrieval model and fine-tunes
- them to select the best possible handful of recommendations. Its task is to
- narrow down the set of items the user may be interested in to a shortlist of
- likely candidates.
-
-In this tutorial, we're going to focus on the first stage, retrieval. If you are
-interested in the ranking stage, have a look at our
-[ranking](/keras_rs/examples/basic_ranking/) tutorial.
-
-Retrieval models are often composed of two sub-models:
-
-1. A query tower computing the query representation (normally a
- fixed-dimensionality embedding vector) using query features.
-2. A candidate tower computing the candidate representation (an equally-sized
- vector) using the candidate features. The outputs of the two models are then
- multiplied together to give a query-candidate affinity score, with higher
- scores expressing a better match between the candidate and the query.
-
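-Concretely, the affinity score is just a dot product between the two towers'
-outputs. A minimal sketch (with hypothetical random embeddings):
-
-```python
-import keras
-
-query = keras.random.normal((1, 32))
-candidate = keras.random.normal((1, 32))
-# Higher score = better query-candidate match.
-score = keras.ops.sum(query * candidate, axis=-1)
-```
-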
-In this tutorial, we're going to build and train such a two-tower model using
-the Movielens dataset.
-
-We're going to:
-
-1. Get our data and split it into a training and test set.
-2. Implement a retrieval model.
-3. Fit and evaluate it.
-4. Test running predictions with the model.
-
-### The dataset
-
-The Movielens dataset is a classic dataset from the
-[GroupLens](https://grouplens.org/datasets/movielens/) research group at the
-University of Minnesota. It contains a set of ratings given to movies by a set
-of users, and is a standard for recommender systems research.
-
-The data can be treated in two ways:
-
-1. It can be interpreted as expressing which movies the users watched (and
- rated), and which they did not. This is a form of implicit feedback, where
- users' watches tell us which things they prefer to see and which they'd
- rather not see.
-2. It can also be seen as expressing how much the users liked the movies they
-   did watch. This is a form of explicit feedback: given that a user watched a
-   movie, we can tell how much they liked it by looking at the rating they
-   gave.
-
-In this tutorial, we are focusing on a retrieval system: a model that predicts a
-set of movies from the catalogue that the user is likely to watch. For this, the
-model will try to predict the rating users would give to all the movies in the
-catalogue. We will therefore use the explicit rating data.
-
-Let's begin by choosing JAX as the backend we want to run on, and import all
-the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Preparing the dataset
-
-Let's first have a look at the data.
-
-We use the MovieLens dataset from
-[TensorFlow Datasets](https://www.tensorflow.org/datasets). Loading
-`movielens/100k-ratings` yields a `tf.data.Dataset` object containing the
-ratings alongside user and movie data. Loading `movielens/100k-movies` yields a
-`tf.data.Dataset` object containing only the movie data.
-
-Note that since the MovieLens dataset does not have predefined splits, all the
-data is under the `train` split.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-The ratings dataset returns a dictionary of movie id, user id, the assigned
-rating, timestamp, movie information, and user information:
-
-
-```python
-for data in ratings.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the Movielens dataset, user IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map user IDs to integers from 0 to N-1. But as a simplification, we'll use
-the user ID directly as an index in our model, in particular to look up the
-user embedding from the user embedding table. So we need to know the number of
-users.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-The movies dataset contains the movie id, movie title, and the genres it belongs
-to. Note that the genres are encoded with integer labels.
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the Movielens dataset, movie IDs are integers (represented as strings)
-starting at 1 and with no gap. Normally, you would need to create a lookup table
-to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use
-the movie ID directly as an index in our model, in particular to look up the
-movie embedding from the movie embedding table. So we need to know the number
-of movies.
-
-
-```python
-movies_count = movies.cardinality().numpy()
-```
-
-In this example, we're going to focus on the ratings data. Other tutorials
-explore how to use the movie information data as well as the user information to
-improve the model quality.
-
-We keep only the `user_id`, `movie_id` and `rating` fields in the dataset. Our
-input is the `user_id`. The labels are the `movie_id` alongside the `rating` for
-the given movie and user.
-
-The `rating` is a number between 1 and 5; we rescale it to be between 0 and 1.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-```
-
-To fit and evaluate the model, we need to split it into a training and
-evaluation set. In a real recommender system, this would most likely be done by
-time: the data up to time *T* would be used to predict interactions after *T*.
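-
-For reference, a time-based split might look like the following sketch, using
-the dataset's `timestamp` field (the cutoff value is hypothetical; this is not
-used in this tutorial):
-
-```python
-# Hypothetical cutoff timestamp T within the dataset's time range.
-T = 885_000_000
-train = ratings.filter(lambda x: x["timestamp"] < T).map(preprocess_rating)
-test = ratings.filter(lambda x: x["timestamp"] >= T).map(preprocess_rating)
-```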
-
-In this simple example, however, let's use a random split, putting 80% of the
-ratings in the train set, and 20% in the test set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-Choosing the architecture of our model is a key part of modelling.
-
-We are building a two-tower retrieval model, therefore we need to combine a
-query tower for users and a candidate tower for movies.
-
-The first step is to decide on the dimensionality of the query and candidate
-representations. This is the `embedding_dimension` argument in our model
-constructor. We'll test with a value of `32`. Higher values will correspond to
-models that may be more accurate, but will also be slower to fit and more prone
-to overfitting.
-
-### Query and Candidate Towers
-
-The second step is to define the model itself. In this simple example, the query
-tower and candidate tower are simply embeddings with nothing else. We'll use
-Keras' `Embedding` layer.
-
-We can easily extend the towers to make them arbitrarily complex using standard
-Keras components, as long as we return an `embedding_dimension`-wide output at
-the end.
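-
-For example, a slightly deeper query tower could look like the following
-sketch (not used in this tutorial):
-
-```python
-# A hypothetical deeper query tower: an embedding followed by dense layers,
-# still producing an `embedding_dimension`-wide output (here, 32).
-query_tower = keras.Sequential(
-    [
-        keras.layers.Embedding(users_count + 1, 64),
-        keras.layers.Dense(64, activation="relu"),
-        keras.layers.Dense(32),
-    ]
-)
-```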
-
-### Retrieval
-
-The retrieval itself will be performed by `BruteForceRetrieval` layer from Keras
-Recommenders. This layer computes the affinity scores for the given users and
-all the candidate movies, then returns the top K in order.
-
-Note that during training, we don't actually need to perform any retrieval since
-the only affinity scores we need are the ones for the users and movies in the
-batch. As an optimization, we skip the retrieval entirely in the `call` method.
-
-### Loss
-
-The next component is the loss used to train our model. In this case, we use a
-mean squared error loss to measure the difference between the predicted movie
-ratings and the actual ratings from users.
-
-Note that we override `compute_loss` from the `keras.Model` class. This allows
-us to compute the query-candidate affinity score, which is obtained by
-multiplying the outputs of the two towers together. That affinity score can then
-be passed to the loss function.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Create the retrieval model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-and evaluate the model.
-
-Let's first instantiate the model. Note that we add `+ 1` to the number of users
-and movies to account for the fact that id zero is not used for either (IDs
-start at 1), but still takes a row in the embedding tables.
-
-
-```python
-model = RetrievalModel(users_count + 1, movies_count + 1)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1))
-```
-
-Then train the model. Evaluation takes a bit of time, so we only evaluate the
-model every 5 epochs.
-
-
-```python
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.4667 - val_loss: 0.4739
-
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable.
-
-
-```python
-user_id = 42
-predictions = model.predict(keras.ops.convert_to_tensor([user_id]))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-print(f"Recommended movies for user {user_id}:")
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 105ms/step
-
-
-
-```
-Recommended movies for user 42:
-b'Raiders of the Lost Ark (1981)'
-b'Godfather, The (1972)'
-b'Star Trek: The Wrath of Khan (1982)'
-b'Indiana Jones and the Last Crusade (1989)'
-b'Birdcage, The (1996)'
-b'Silence of the Lambs, The (1991)'
-b'Blade Runner (1982)'
-b'Aliens (1986)'
-b'Contact (1997)'
-b'Star Wars (1977)'
-
-```
-
----
-## Item-to-item recommendation
-
-In this model, we created a user-movie model. However, for some applications
-(for example, product detail pages) it's common to perform item-to-item (for
-example, movie-to-movie or product-to-product) recommendations.
-
-Training models like this would follow the same pattern as shown in this
-tutorial, but with different training data. Here, we had a user and a movie
-tower, and used (user, movie) pairs to train them. In an item-to-item model, we
-would have two item towers (for the query and candidate item), and train the
-model using (query item, candidate item) pairs. These could be constructed from
-clicks on product detail pages.
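-
-As an illustration, (query item, candidate item) pairs could be derived from
-co-occurrence within user histories, along the lines of this hypothetical
-sketch:
-
-```python
-
-def make_item_pairs(user_histories):
-    """Build (query item, candidate item) pairs from consecutive views."""
-    pairs = []
-    for history in user_histories:
-        for query_item, candidate_item in zip(history, history[1:]):
-            pairs.append((query_item, candidate_item))
-    return pairs
-
-
-print(make_item_pairs([[1, 2, 3]]))  # [(1, 2), (2, 3)]
-```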
diff --git a/templates/keras_rs/examples/data_parallel_retrieval.md b/templates/keras_rs/examples/data_parallel_retrieval.md
deleted file mode 100644
index f7fe3a7df7..0000000000
--- a/templates/keras_rs/examples/data_parallel_retrieval.md
+++ /dev/null
@@ -1,4220 +0,0 @@
-# Retrieval with data parallel training
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Retrieve movies using a two tower model (data parallel training).
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/data_parallel_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/data_parallel_retrieval.py)
-
-
-
----
-## Introduction
-
-In this tutorial, we are going to train the exact same retrieval model as we
-did in our
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial, but in a distributed way.
-
-Distributed training is used to train models on multiple devices or machines
-simultaneously, thereby reducing training time. Here, we focus on synchronous
-data parallel training. Each accelerator (GPU/TPU) holds a complete replica
-of the model, and sees a different mini-batch of the input data. Local gradients
-are computed on each device, aggregated and used to compute a global gradient
-update.
-
-Before we begin, let's note down a few things:
-
-1. The number of accelerators should be greater than 1.
-2. The `keras.distribution` API works only with JAX. So, make sure you select
- JAX as your backend!
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax"
-
-import random
-
-import jax
-import keras
-import tensorflow as tf # Needed only for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Data Parallel
-
-For the synchronous data parallelism strategy in distributed training,
-we will use the `DataParallel` class present in the `keras.distribution`
-API.
-
-
-```python
-devices = jax.devices() # Assume it has >1 local devices.
-data_parallel = keras.distribution.DataParallel(devices=devices)
-```
-
-Alternatively, you can choose to create the `DataParallel` object
-using a 1D `DeviceMesh` object, like so:
-
-```
-mesh_1d = keras.distribution.DeviceMesh(
- shape=(len(devices),), axis_names=["data"], devices=devices
-)
-data_parallel = keras.distribution.DataParallel(device_mesh=mesh_1d)
-```
-
-
-```python
-# Set the global distribution strategy.
-keras.distribution.set_distribution(data_parallel)
-```
-
----
-## Preparing the dataset
-
-Now that we are done defining the global distribution
-strategy, the rest of the guide looks exactly the same
-as the previous basic retrieval guide.
-
-Let's load and prepare the dataset. Here too, we use the
-MovieLens dataset.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-# User, movie counts for defining vocabularies.
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-movies_count = movies.cardinality().numpy()
-
-
-# Preprocess dataset, and split it into train-test datasets.
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
-
----
-## Implementing the Model
-
-We build a two-tower retrieval model. Therefore, we need to combine a
-query tower for users and a candidate tower for movies. Note that we don't
-have to change anything here from the previous basic retrieval tutorial.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Create the retrieval model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Fitting and evaluating
-
-After defining the model, we can use the standard Keras `model.fit()` to train
-and evaluate the model.
-
-
-```python
-model = RetrievalModel(users_count + 1, movies_count + 1)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.2))
-```
-
-Let's train the model. Evaluation takes a bit of time, so we only evaluate the
-model every 5 epochs.
-
-
-```python
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 0.1620 - val_loss: 0.1660
-
-
----
-## Making predictions
-
-Now that we have a model, let's run inference and make predictions.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-
-```python
-user_ids = random.sample(range(1, 1001), len(devices))
-predictions = model.predict(keras.ops.convert_to_tensor(user_ids))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-for i, user_id in enumerate(user_ids):
- print(f"\n==Recommended movies for user {user_id}==")
- for movie_id in predictions[i]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 205ms/step
-
-
-
-
-```
-==Recommended movies for user 449==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Silence of the Lambs, The (1991)'
-b'Shawshank Redemption, The (1994)'
-b'Pulp Fiction (1994)'
-b'Raiders of the Lost Ark (1981)'
-b"Schindler's List (1993)"
-b'Blade Runner (1982)'
-b"One Flew Over the Cuckoo's Nest (1975)"
-b'Casablanca (1942)'
-```
-
-
-
-```
-==Recommended movies for user 681==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 151==
-b'Princess Bride, The (1987)'
-b'Pulp Fiction (1994)'
-b'English Patient, The (1996)'
-b'Alien (1979)'
-b'Raiders of the Lost Ark (1981)'
-b'Willy Wonka and the Chocolate Factory (1971)'
-b'Amadeus (1984)'
-b'Liar Liar (1997)'
-b'Psycho (1960)'
-b"It's a Wonderful Life (1946)"
-```
-
-
-
-```
-==Recommended movies for user 442==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b'Empire Strikes Back, The (1980)'
-b"Schindler's List (1993)"
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 134==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Silence of the Lambs, The (1991)'
-b'Raiders of the Lost Ark (1981)'
-b'Pulp Fiction (1994)'
-b'Return of the Jedi (1983)'
-b'Empire Strikes Back, The (1980)'
-b'Twelve Monkeys (1995)'
-b'Contact (1997)'
-```
-
-
-
-```
-==Recommended movies for user 853==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Raiders of the Lost Ark (1981)'
-b'Silence of the Lambs, The (1991)'
-b'Return of the Jedi (1983)'
-b'Pulp Fiction (1994)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Shawshank Redemption, The (1994)'
-```
-
-
-
-```
-==Recommended movies for user 707==
-b'Star Wars (1977)'
-b'Raiders of the Lost Ark (1981)'
-b'Toy Story (1995)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Return of the Jedi (1983)'
-b'Terminator, The (1984)'
-b'Princess Bride, The (1987)'
-```
-
-
-
-```
-==Recommended movies for user 511==
-b'Star Wars (1977)'
-b'Fargo (1996)'
-b'Godfather, The (1972)'
-b'Raiders of the Lost Ark (1981)'
-b'Silence of the Lambs, The (1991)'
-b'Return of the Jedi (1983)'
-b"Schindler's List (1993)"
-b'Empire Strikes Back, The (1980)'
-b'Pulp Fiction (1994)'
-b'Shawshank Redemption, The (1994)'
-
-```
-
-And we're done! For data parallel training, all we had to do was add around
-3-5 lines of code. The rest is exactly the same.
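-
-Here is a minimal sketch of those lines (assuming the JAX backend; `devices`
-is the list of accelerators obtained earlier in this example):
-
-```python
-import jax
-import keras
-
-devices = jax.devices()
-# Shard each batch of input data across all available devices.
-keras.distribution.set_distribution(
-    keras.distribution.DataParallel(devices=devices)
-)
-```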
diff --git a/templates/keras_rs/examples/dcn.md b/templates/keras_rs/examples/dcn.md
deleted file mode 100644
index 3999ef35db..0000000000
--- a/templates/keras_rs/examples/dcn.md
+++ /dev/null
@@ -1,676 +0,0 @@
-# Ranking with Deep and Cross Networks
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using Deep and Cross Networks (DCN).
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/dcn.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/dcn.py)
-
-
-
----
-## Introduction
-
-This tutorial demonstrates how to use Deep & Cross Networks (DCN) to effectively
-learn feature crosses. Before diving into the example, let's briefly discuss
-feature crosses.
-
-Imagine that we are building a recommender system for blenders. Individual
-features might include a customer's past purchase history (e.g.,
-`purchased_bananas`, `purchased_cooking_books`) or geographic location. However,
-a customer who has purchased both bananas and cooking books is more likely to be
-interested in a blender than someone who purchased only one or the other. The
-combination of `purchased_bananas` and `purchased_cooking_books` is a feature
-cross. Feature crosses capture interaction information between individual
-features, providing richer context than the individual features alone.
-
-
-
-Learning effective feature crosses presents several challenges. In web-scale
-applications, data is often categorical, resulting in high-dimensional and
-sparse feature spaces. Identifying impactful feature crosses in such
-environments typically relies on manual feature engineering or computationally
-expensive exhaustive searches. While traditional feed-forward multilayer
-perceptrons (MLPs) are universal function approximators, they often struggle to
-efficiently learn even second- or third-order feature interactions.
-
-The Deep & Cross Network (DCN) architecture is designed for more effective
-learning of explicit and bounded-degree feature crosses. It comprises three main
-components: an input layer (typically an embedding layer), a cross network for
-modeling explicit feature interactions, and a deep network for capturing
-implicit interactions.
-
-The cross network is the core of the DCN. It explicitly performs feature
-crossing at each layer, with the highest polynomial degree of feature
-interaction increasing with depth. The following figure shows the `(i+1)`-th
-cross layer.
-
-
-
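-Concretely, one cross layer computes
-`x_{l+1} = x_0 * (W_l x_l + b_l) + x_l`, where `x_0` is the base input and
-`x_l` is the output of the previous layer. Below is a minimal NumPy sketch of
-this computation (illustrative only, not the actual
-`keras_rs.layers.FeatureCross` implementation):
-
-```python
-import numpy as np
-
-
-def cross_layer(x0, xl, W, b):
-    # Element-wise product of the base input with a learned projection of
-    # the current layer's input, plus a residual connection.
-    return x0 * (xl @ W + b) + xl
-```
-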
-The deep network is a standard feedforward multilayer perceptron
-(MLP). These two networks are then combined to form the DCN. Two common
-combination strategies exist: a stacked structure, where the deep network is
-placed on top of the cross network, and a parallel structure, where they
-operate in parallel.
-
-
-
-
-
-
- Parallel layers
-
-
-
-
-
- Stacked layers
-
-
-
-
-
-Now that we know a little bit about DCN, let's start writing some code. We will
-first train a DCN on a toy dataset, and demonstrate that the model has indeed
-learnt important feature crosses.
-
-Let's set the backend to JAX, and get our imports sorted.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import numpy as np
-import tensorflow as tf
-import tensorflow_datasets as tfds
-from mpl_toolkits.axes_grid1 import make_axes_locatable
-
-import keras_rs
-```
-
-Let's also define variables which will be reused throughout the example.
-
-
-```python
-TOY_CONFIG = {
- "learning_rate": 0.01,
- "num_epochs": 100,
- "batch_size": 1024,
-}
-
-MOVIELENS_CONFIG = {
- # features
- "int_features": [
- "movie_id",
- "user_id",
- "user_gender",
- "bucketized_user_age",
- ],
- "str_features": [
- "user_zip_code",
- "user_occupation_text",
- ],
- # model
- "embedding_dim": 32,
- "deep_net_num_units": [192, 192, 192],
- "projection_dim": 20,
- "dcn_num_units": [192, 192],
- # training
- "learning_rate": 0.01,
- "num_epochs": 10,
- "batch_size": 1024,
-}
-
-LOOKUP_LAYERS = {
- "int": keras.layers.IntegerLookup,
- "str": keras.layers.StringLookup,
-}
-```
-
-Here, we define a helper function for visualising weights of the cross layer in
-order to better understand its functioning. Also, we define a function for
-compiling, training and evaluating a given model.
-
-
-```python
-
-def visualize_layer(matrix, features):
- plt.figure(figsize=(9, 9))
-
-    im = plt.matshow(np.abs(matrix), cmap=plt.cm.Blues)
-
-    ax = plt.gca()
-    divider = make_axes_locatable(ax)
-    cax = divider.append_axes("right", size="5%", pad=0.05)
-    plt.colorbar(im, cax=cax)
-    cax.tick_params(labelsize=10)
-    # Set tick locations explicitly before setting the labels, to avoid
-    # Matplotlib's `set_ticklabels()` warning.
-    ax.set_xticks(np.arange(len(features)))
-    ax.set_yticks(np.arange(len(features)))
-    ax.set_xticklabels(features, rotation=45, fontsize=10)
-    ax.set_yticklabels(features, fontsize=10)
-
-
-def train_and_evaluate(
- learning_rate,
- epochs,
- train_data,
- test_data,
- model,
-):
- optimizer = keras.optimizers.AdamW(learning_rate=learning_rate)
- loss = keras.losses.MeanSquaredError()
- rmse = keras.metrics.RootMeanSquaredError()
-
- model.compile(
- optimizer=optimizer,
- loss=loss,
- metrics=[rmse],
- )
-
- model.fit(
- train_data,
- epochs=epochs,
- verbose=0,
- )
-
- results = model.evaluate(test_data, return_dict=True, verbose=0)
- rmse_value = results["root_mean_squared_error"]
-
- return rmse_value, model.count_params()
-
-
-def print_stats(rmse_list, num_params, model_name):
- # Report metrics.
- num_trials = len(rmse_list)
- avg_rmse = np.mean(rmse_list)
- std_rmse = np.std(rmse_list)
-
- if num_trials == 1:
- print(f"{model_name}: RMSE = {avg_rmse}; #params = {num_params}")
- else:
-        print(
-            f"{model_name}: RMSE = {avg_rmse} ± {std_rmse}; #params = {num_params}"
-        )
-
-```
-
----
-## Toy Example
-
-To illustrate the benefits of DCNs, let's consider a simple example. Suppose we
-have a dataset for modeling the likelihood of a customer clicking on a blender
-advertisement. The features and label are defined as follows:
-
-| **Features / Label** | **Description** | **Range**|
-|:--------------------:|:------------------------------:|:--------:|
-| `x1` = country | Customer's resident country | [0, 199] |
-| `x2` = bananas | # bananas purchased | [0, 23] |
-| `x3` = cookbooks | # cooking books purchased | [0, 5] |
-| `y` | Blender ad click likelihood | - |
-
-Then, we let the data follow the underlying distribution
-`y = f(x1, x2, x3) = 0.1x1 + 0.4x2 + 0.7x3 + 0.1x1x2 + 3.1x2x3 + 0.1x3^2`.
-
-This distribution shows that the click likelihood (`y`) depends linearly on
-individual features (`xi`) and on multiplicative interactions between them. In
-this scenario, the likelihood of purchasing a blender (`y`) is influenced not
-only by purchasing bananas (`x2`) or cookbooks (`x3`) individually, but also
-significantly by the interaction of purchasing both bananas and cookbooks
-(`x2x3`).
-
-### Preparing the dataset
-
-Let's create synthetic data based on the above equation, and form the train-test
-splits.
-
-
-```python
-
-def get_mixer_data(data_size=100_000):
- country = np.random.randint(200, size=[data_size, 1]) / 200.0
- bananas = np.random.randint(24, size=[data_size, 1]) / 24.0
- cookbooks = np.random.randint(6, size=[data_size, 1]) / 6.0
-
- x = np.concatenate([country, bananas, cookbooks], axis=1)
-
- # Create 1st-order terms.
- y = 0.1 * country + 0.4 * bananas + 0.7 * cookbooks
-
- # Create 2nd-order cross terms.
- y += (
- 0.1 * country * bananas
- + 3.1 * bananas * cookbooks
- + (0.1 * cookbooks * cookbooks)
- )
-
- return x, y
-
-
-x, y = get_mixer_data(data_size=100_000)
-num_train = 90_000
-train_x = x[:num_train]
-train_y = y[:num_train]
-test_x = x[num_train:]
-test_y = y[num_train:]
-```
-
-### Building the model
-
-To demonstrate the advantages of a cross network in recommender systems, we'll
-compare its performance with a deep network. Since our example data only
-contains second-order feature interactions, a single-layered cross network will
-suffice. For datasets with higher-order interactions, multiple cross layers can
-be stacked to form a multi-layered cross network. We will build two models:
-
-1. A cross network with a single cross layer.
-2. A deep network with wider and deeper feedforward layers.
-
-
-```python
-cross_network = keras.Sequential(
- [
- keras_rs.layers.FeatureCross(),
- keras.layers.Dense(1),
- ]
-)
-
-deep_network = keras.Sequential(
-    [
-        keras.layers.Dense(512, activation="relu"),
-        keras.layers.Dense(256, activation="relu"),
-        keras.layers.Dense(128, activation="relu"),
-        # Final scalar output so the deep network predicts a single rating,
-        # matching the cross network above.
-        keras.layers.Dense(1),
-    ]
-)
-```
-
-### Model training
-
-Before we train the model, we need to batch our datasets.
-
-
-```python
-train_ds = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(
- TOY_CONFIG["batch_size"]
-)
-test_ds = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(
- TOY_CONFIG["batch_size"]
-)
-```
-
-Let's train both models. Remember we have set `verbose=0` for brevity's
-sake, so do not be alarmed if you do not see any output for a while.
-
-After training, we evaluate the models on the unseen dataset. We will report
-the Root Mean Squared Error (RMSE) here.
-
-We observe that the cross network achieved significantly lower RMSE compared to
-a ReLU-based DNN, while also using fewer parameters. This points to the
-efficiency of the cross network in learning feature interactions.
-
-
-```python
-cross_network_rmse, cross_network_num_params = train_and_evaluate(
- learning_rate=TOY_CONFIG["learning_rate"],
- epochs=TOY_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=cross_network,
-)
-print_stats(
- rmse_list=[cross_network_rmse],
- num_params=cross_network_num_params,
- model_name="Cross Network",
-)
-
-deep_network_rmse, deep_network_num_params = train_and_evaluate(
- learning_rate=TOY_CONFIG["learning_rate"],
- epochs=TOY_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=deep_network,
-)
-print_stats(
- rmse_list=[deep_network_rmse],
- num_params=deep_network_num_params,
- model_name="Deep Network",
-)
-```
-
-
-### Visualizing feature interactions
-
-Since we already know which feature crosses are important in our data, it would
-be interesting to verify whether our model has indeed learned these key feature
-interactions. This can be done by visualizing the learned weight matrix in the
-cross network, where the weight `Wij` represents the learned importance of
-the interaction between features `xi` and `xj`.
-
-
-```python
-visualize_layer(
- matrix=cross_network.weights[0].numpy(),
- features=["country", "purchased_bananas", "purchased_cookbooks"],
-)
-```
-
-
-
-
-
-
-
-
----
-## Real-world example
-
-Let's use the MovieLens 100K dataset. This dataset is used to train models to
-predict users' movie ratings, based on user-related features and movie-related
-features.
-
-### Preparing the dataset
-
-The dataset processing steps here are similar to what's given in the
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorial. Let's load the dataset, and keep only the useful columns.
-
-
-```python
-ratings_ds = tfds.load("movielens/100k-ratings", split="train")
-ratings_ds = ratings_ds.map(
- lambda x: (
- {
- "movie_id": int(x["movie_id"]),
- "user_id": int(x["user_id"]),
- "user_gender": int(x["user_gender"]),
- "user_zip_code": x["user_zip_code"],
- "user_occupation_text": x["user_occupation_text"],
- "bucketized_user_age": int(x["bucketized_user_age"]),
- },
- x["user_rating"], # label
- )
-)
-```
-
-
-DCN outperforms a similarly sized DNN with ReLU layers. Furthermore, the
-low-rank DCN effectively reduces the number of parameters without
-compromising accuracy.
-
-### Visualizing feature interactions
-
-Like we did for the toy example, we will plot the weight matrix of the cross
-layer to see which feature crosses are important. In the previous example,
-the importance of interactions between the `i`-th and `j`-th features is
-captured by the `(i, j)`-th element of the weight matrix.
-
-In this case, the feature embeddings are of size 32 rather than 1. Therefore,
-the importance of feature interactions is represented by the `(i, j)`-th
-block of the weight matrix, which has dimensions `32 x 32`. To quantify the
-significance of these interactions, we use the Frobenius norm of each block. A
-larger value implies higher importance.
-
-
-```python
-features = list(vocabularies.keys())
-mat = cross_network.weights[len(features)].numpy()
-embedding_dim = MOVIELENS_CONFIG["embedding_dim"]
-
-block_norm = np.zeros([len(features), len(features)])
-
-# Compute the norms of the blocks.
-for i in range(len(features)):
- for j in range(len(features)):
- block = mat[
- i * embedding_dim : (i + 1) * embedding_dim,
- j * embedding_dim : (j + 1) * embedding_dim,
- ]
- block_norm[i, j] = np.linalg.norm(block, ord="fro")
-
-visualize_layer(
- matrix=block_norm,
- features=features,
-)
-```
-
-
-
-
-
-
-
-
-And we are all done!
diff --git a/templates/keras_rs/examples/deep_recommender.md b/templates/keras_rs/examples/deep_recommender.md
deleted file mode 100644
index ad154675c7..0000000000
--- a/templates/keras_rs/examples/deep_recommender.md
+++ /dev/null
@@ -1,5439 +0,0 @@
-# Deep Recommenders
-
-**Author:** [Fabien Hertschuh](https://github.com/hertschuh/), [Abheesht Sharma](https://github.com/abheesht17/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Building a deep retrieval model with multiple stacked layers.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/deep_recommender.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/deep_recommender.py)
-
-
-
----
-## Introduction
-
-One of the great advantages of using Keras to build recommender models is the
-freedom to build rich, flexible feature representations.
-
-The first step in doing so is preparing the features, as raw features will
-usually not be immediately usable in a model.
-
-For example:
-- User and item IDs may be strings (titles, usernames) or large, non-contiguous
- integers (database IDs).
-- Item descriptions could be raw text.
-- Interaction timestamps could be raw Unix timestamps.
-
-These need to be appropriately transformed in order to be useful in building
-models:
-- User and item IDs have to be translated into embedding vectors,
- high-dimensional numerical representations that are adjusted during training
- to help the model predict its objective better.
-- Raw text needs to be tokenized (split into smaller parts such as individual
- words) and translated into embeddings.
-- Numerical features need to be normalized so that their values lie in a small
- interval around 0.
-
-Fortunately, the Keras
-[`FeatureSpace`](/api/utils/feature_space/) utility makes this
-preprocessing easy.
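-
-As a quick preview of the pattern used throughout this tutorial, a
-`FeatureSpace` is configured with one preprocessor per feature, then adapted
-on the dataset (a minimal sketch with hypothetical feature names):
-
-```python
-import keras
-
-feature_space = keras.utils.FeatureSpace(
-    features={
-        "age": keras.utils.FeatureSpace.float_discretized(num_bins=10),
-        "occupation": keras.utils.FeatureSpace.integer_categorical(),
-    },
-    output_mode="dict",
-)
-# `adapt()` scans the dataset once to learn vocabularies and value ranges:
-# feature_space.adapt(dataset)
-```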
-
-In this tutorial, we are going to incorporate multiple features in our models.
-These features will come from preprocessing the MovieLens dataset.
-
-In the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-tutorial, the models consist of only an embedding layer. In this tutorial, we
-add more dense layers to our models to increase their expressive power.
-
-In general, deeper models are capable of learning more complex patterns than
-shallower models. For example, our user model incorporates user IDs and user
-features such as age, gender and occupation. A shallow model (say, a single
-embedding layer) may only be able to learn the simplest relationships between
-those features and movies: a given user generally prefers horror movies to
-comedies. To capture more complex relationships, such as user preferences
-evolving with their age, we may need a deeper model with multiple stacked dense
-layers.
-
-Of course, complex models also have their disadvantages. The first is
-computational cost, as larger models require both more memory and more
-computation to train and serve. The second is the requirement for more data. In
-general, more training data is needed to take advantage of deeper models. With
-more parameters, deep models might overfit or even simply memorize the training
-examples instead of learning a function that can generalize. Finally, training
-deeper models may be harder, and more care needs to be taken in choosing
-settings like regularization and learning rate.
-
-Finding a good architecture for a real-world recommender system is a complex
-art, requiring good intuition and careful hyperparameter tuning. For example,
-factors such as the depth and width of the model, activation function, learning
-rate, and optimizer can radically change the performance of the model. Modelling
-choices are further complicated by the fact that good offline evaluation metrics
-may not correspond to good online performance, and that the choice of what to
-optimize for is often more critical than the choice of model itself.
-
-Nevertheless, effort put into building and fine-tuning larger models often pays
-off. In this tutorial, we will illustrate how to build a deep retrieval model.
-We'll do this by building progressively more complex models to see how this
-affects model performance.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## The MovieLens dataset
-
-Let's first have a look at what features we can use from the MovieLens dataset.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-The ratings dataset returns a dictionary of movie id, user id, the assigned
-rating, timestamp, movie information, and user information:
-
-
-```python
-for data in ratings.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the MovieLens dataset, user IDs are integers (represented as strings)
-starting at 1, with no gaps. Normally, you would need to create a lookup table
-to map user IDs to integers from 0 to N-1. But as a simplification, we'll use
-the user ID directly as an index in our model, in particular to look up the
-user embedding from the user embedding table. So we need to know the number of
-users.
-
-
-```python
-USERS_COUNT = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-```
-
-The movies dataset contains the movie id, movie title, and the genres it belongs
-to. Note that the genres are encoded with integer labels.
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(str(data).replace(", '", ",\n '"))
-```
-
-
-In the MovieLens dataset, movie IDs are integers (represented as strings)
-starting at 1, with no gaps. Normally, you would need to create a lookup table
-to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use
-the movie ID directly as an index in our model, in particular to look up the
-movie embedding from the movie embedding table. So we need to know the number
-of movies.
-
-
-```python
-MOVIES_COUNT = movies.cardinality().numpy()
-```
-
----
-## Preprocessing the dataset
-
-### Normalizing continuous features
-
-Continuous features may need normalization so that they fall within an
-acceptable range for the model. We will give two examples of such normalization.
-
-#### Discretization
-
-A common transformation is to turn a continuous feature into a number of
-categorical features. This makes good sense if we have reasons to suspect that a
-feature's effect is non-continuous.
-
-We need to decide on the number of buckets we will use for discretization.
-Then, we will use the Keras `FeatureSpace` utility to automatically find the
-minimum and maximum values, and divide that range by the number of buckets to
-perform the discretization.
-
-In this example, we will discretize the user age.
-
-
-```python
-AGE_BINS_COUNT = 10
-user_age_feature = keras.utils.FeatureSpace.float_discretized(
- num_bins=AGE_BINS_COUNT, output_mode="int"
-)
-```
-
-#### Rescaling
-
-Often, we want continuous features to be between 0 and 1, or between -1 and 1.
-To achieve this, we can rescale features that have a different range.
-
-In this example, we will standardize the rating, which is an integer between 1
-and 5, to be a float between 0 and 1. We need to rescale and offset it: a
-rating of 1 maps to 1/4 - 1/4 = 0, and a rating of 5 maps to 5/4 - 1/4 = 1.
-
-
-```python
-user_rating_feature = keras.utils.FeatureSpace.float_rescaled(
- scale=1.0 / 4.0, offset=-1.0 / 4.0
-)
-```
-
-### Turning categorical features into embeddings
-
-A categorical feature is a feature that does not express a continuous quantity,
-but rather takes on one of a set of fixed values.
-
-Most deep learning models express these features by turning them into
-high-dimensional vectors. During model training, the value of that vector is
-adjusted to help the model predict its objective better.
-
-For example, suppose that our goal is to predict which user is going to watch
-which movie. To do that, we represent each user and each movie by an embedding
-vector. Initially, these embeddings will take on random values. During training,
-we adjust them so that embeddings of users and the movies they watch end up
-closer together.
-
-Taking raw categorical features and turning them into embeddings is normally a
-two-step process:
-1. First, we need to translate the raw values into a range of contiguous
- integers, normally by building a mapping (called a "vocabulary") that maps
- raw values to integers.
-2. Second, we need to take these integers and turn them into embeddings.
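-
-In isolation, these two steps look as follows (a standalone sketch with
-hypothetical values; the tutorial itself relies on `FeatureSpace` for the
-first step):
-
-```python
-import keras
-import numpy as np
-
-# Step 1: map raw values to a contiguous range of integers.
-lookup = keras.layers.StringLookup(vocabulary=["comedy", "drama", "horror"])
-indices = lookup(np.array([["horror"], ["comedy"]]))
-
-# Step 2: turn the integers into trainable embedding vectors.
-embedding = keras.layers.Embedding(
-    input_dim=lookup.vocabulary_size(), output_dim=4
-)
-vectors = embedding(indices)
-```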
-
-#### Defining categorical features
-
-We will use the Keras `FeatureSpace` utility for the first step. Its `adapt`
-method automatically discovers the vocabulary for categorical features.
-
-
-```python
-user_gender_feature = keras.utils.FeatureSpace.integer_categorical(
- num_oov_indices=0, output_mode="int"
-)
-user_occupation_feature = keras.utils.FeatureSpace.integer_categorical(
- num_oov_indices=0, output_mode="int"
-)
-```
-
-#### Using feature crosses
-
-With crosses we can do feature interactions between multiple categorical
-features. This can be powerful to express that the combination of features
-represents a specific taste for movies.
-
-Note that the combination of multiple features can result in a very large
-feature space, which is why the `crossing_dim` parameter is important: it
-limits the output dimension of the cross feature.
-
-In this example, we will cross age and gender with the Keras `FeatureSpace`
-utility.
-
-
-```python
-USER_GENDER_CROSS_COUNT = 20
-user_gender_age_cross = keras.utils.FeatureSpace.cross(
- feature_names=("user_gender", "raw_user_age"),
- crossing_dim=USER_GENDER_CROSS_COUNT,
- output_mode="int",
-)
-```
-
-### Processing text features
-
-We may also want to add text features to our model. Usually, things like product
-descriptions are free form text, and we can hope that our model can learn to use
-the information they contain to make better recommendations, especially in a
-cold-start or long tail scenario.
-
-While the MovieLens dataset does not give us rich textual features, we can still
-use movie titles. This may help us capture the fact that movies with very
-similar titles are likely to belong to the same series.
-
-The first transformation we need to apply to text is tokenization (splitting
-into constituent words or word-pieces), followed by vocabulary learning,
-followed by an embedding.
-
-
-The
-[`keras.layers.TextVectorization`](/api/layers/preprocessing_layers/text/text_vectorization/)
-layer can do the first two steps for us.
-
-
-```python
-title_vectorizer = keras.layers.TextVectorization(
- max_tokens=10_000, output_sequence_length=16, dtype="int32"
-)
-title_vectorizer.adapt(movies.map(lambda x: x["movie_title"]))
-```
-
-Let's try it out:
-
-
-```python
-for data in movies.take(1).as_numpy_iterator():
- print(title_vectorizer(data["movie_title"]))
-```
-
-
-Each title is translated into a sequence of tokens, one for each piece we've
-tokenized.
-
-We can check the learned vocabulary to verify that the layer is using the
-correct tokenization:
-
-
-```python
-print(title_vectorizer.get_vocabulary()[40:50])
-```
-
-
-This looks correct: the layer is tokenizing titles into individual words.
-Later, we will see how to embed this tokenized text. For now, we turn this
-vectorizer into a Keras `FeatureSpace` feature.
-
-
-```python
-title_feature = keras.utils.FeatureSpace.feature(
- preprocessor=title_vectorizer, dtype="string", output_mode="float"
-)
-TITLE_TOKEN_COUNT = title_vectorizer.vocabulary_size()
-```
-
-### Putting the FeatureSpace features together
-
-We're now ready to assemble the features with preprocessors in a `FeatureSpace`
-object. We then use `adapt` to go through the dataset and learn what needs
-to be learned, such as the vocabulary size for categorical features or the
-minimum and maximum values for bucketized features.
-
-
-```python
-feature_space = keras.utils.FeatureSpace(
- features={
- # Numerical features to discretize.
- "raw_user_age": user_age_feature,
- # Categorical features encoded as integers.
- "user_gender": user_gender_feature,
- "user_occupation_label": user_occupation_feature,
- # Labels are ratings between 0 and 1.
- "user_rating": user_rating_feature,
- "movie_title": title_feature,
- },
- crosses=[user_gender_age_cross],
- output_mode="dict",
-)
-
-feature_space.adapt(ratings)
-GENDERS_COUNT = feature_space.preprocessors["user_gender"].vocabulary_size()
-OCCUPATIONS_COUNT = feature_space.preprocessors[
- "user_occupation_label"
-].vocabulary_size()
-```
-
----
-## Pre-building the candidate set
-
-Our model is going to be based on a `Retrieval` layer, which provides the best
-candidates among the full set of candidates. To do this, the retrieval layer
-needs to know all the candidates and their features. In this section, we
-assemble the full set of movies with the associated features.
-
-### Extract raw candidate features
-
-First, we gather all the raw features from the dataset in lists: the titles
-of the movies and their genres. Note that one or more genres are associated
-with each movie, and the number of genres varies per movie.
-
-
-```python
-movie_titles = [""] * (MOVIES_COUNT + 1)
-movie_genres = [[]] * (MOVIES_COUNT + 1)
-for x in movies.as_numpy_iterator():
- movie_id = int(x["movie_id"])
- movie_titles[movie_id] = x["movie_title"]
- movie_genres[movie_id] = x["movie_genres"].tolist()
-```
-
-### Preprocess candidate features
-
-Genres are already in the form of category numbers starting at zero. However, we
-do need to figure out two things:
-- The maximum number of genres a single movie can have; this will determine the
- dimension for this feature.
-- The maximum value for the genre, which will give us the total number of genres
- and determine the size of our embedding table for genres.
-
-
-```python
-MAX_GENRES_PER_MOVIE = 0
-max_genre_id = 0
-for one_movie_genres in movie_genres:
- MAX_GENRES_PER_MOVIE = max(MAX_GENRES_PER_MOVIE, len(one_movie_genres))
- if one_movie_genres:
- max_genre_id = max(max_genre_id, max(one_movie_genres))
-
-GENRES_COUNT = max_genre_id + 1
-```
-
-Now we need to pad genres with an Out Of Vocabulary value to be able to
-represent genres as a fixed size vector. We'll pad with zeros for simplicity, so
-we're adding one to the genres to not conflict with genre zero, which is a valid
-genre.
-
-
-```python
-movie_genres = [
- [g + 1 for g in genres] + [0] * (MAX_GENRES_PER_MOVIE - len(genres))
- for genres in movie_genres
-]
-```
-
-Then, we vectorize all the movie titles.
-
-
-```python
-movie_titles_vectors = title_vectorizer(movie_titles)
-```
-
-### Convert candidate set to native tensors
-
-We're now ready to combine these in a dataset. The last step is to make sure
-everything is a native tensor that can be consumed by the retrieval layer.
-As a reminder, movie ID zero does not exist.
-
-
-```python
-MOVIES_DATASET = {
- "movie_id": keras.ops.arange(0, MOVIES_COUNT + 1, dtype="int32"),
- "movie_title_vector": movie_titles_vectors,
- "movie_genres": keras.ops.convert_to_tensor(movie_genres, dtype="int32"),
-}
-```
-
----
-## Preparing the data
-
-We can now define our preprocessing function. Most features will be handled
-by the `FeatureSpace`. User IDs and Movie IDs need to be extracted. Movie genres
-need to be padded. Then everything is packaged as a tuple with a dict of input
-features and a float for the rating, which is used as a label.
-
-
-```python
-
-def preprocess_rating(x):
- features = feature_space(
- {
- "raw_user_age": x["raw_user_age"],
- "user_gender": x["user_gender"],
- "user_occupation_label": x["user_occupation_label"],
- "user_rating": x["user_rating"],
- "movie_title": x["movie_title"],
- }
- )
- features = {k: tf.squeeze(v, axis=0) for k, v in features.items()}
- movie_genres = x["movie_genres"]
-
- return (
- {
- # User inputs are user ID and user features
- "user_id": int(x["user_id"]),
- "raw_user_age": features["raw_user_age"],
- "user_gender": features["user_gender"],
- "user_occupation_label": features["user_occupation_label"],
- "user_gender_X_raw_user_age": tf.squeeze(
- features["user_gender_X_raw_user_age"], axis=-1
- ),
- # Movie inputs are movie ID, vectorized title and genres
- "movie_id": int(x["movie_id"]),
- "movie_title_vector": features["movie_title"],
- "movie_genres": tf.pad(
- movie_genres + 1,
- [[0, MAX_GENRES_PER_MOVIE - tf.shape(movie_genres)[0]]],
- ),
- },
- # Label is user rating between 0 and 1
- features["user_rating"],
- )
-
-```
-
-We shuffle and then split the data into a training set and a testing set.
-
-
-```python
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Model definition
-
-### Query model
-
-The query model is first tasked with converting user features to embeddings. The
-embeddings are then concatenated into a single vector.
-
-Defining deeper models will require us to stack more layers on top of this first
-set of embeddings. A progressively narrower stack of layers, separated by an
-activation function, is a common pattern:
-
-```
- +----------------------+
- | 64 x 32 |
- +----------------------+
- | relu
- +--------------------------+
- | 128 x 64 |
- +--------------------------+
- | relu
- +------------------------------+
- | ... x 128 |
- +------------------------------+
-```
-
-Since the expressive power of deep linear models is no greater than that of
-shallow linear models, we use ReLU activations for all but the last hidden
-layer. The final hidden layer does not use any activation function: using an
-activation function would limit the output space of the final embeddings and
-might negatively impact the performance of the model. For instance, if ReLUs are
-used in the projection layer, all components in the output embedding would be
-non-negative.
-
-We're going to try this here. To make experimentation with different depths
-easy, let's define a model whose depth (and width) is set by constructor
-parameters. The `layer_sizes` parameter gives us the depth and width of the
-model. We can vary it to experiment with shallower or deeper models.
-
-
-```python
-
-class QueryModel(keras.Model):
- """Model for encoding user queries."""
-
- def __init__(self, layer_sizes, embedding_dimension=32):
- """Construct a model for encoding user queries.
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- """
- super().__init__()
-
- # We first generate embeddings.
- self.user_embedding = keras.layers.Embedding(
- # +1 for user ID zero, which does not exist
- USERS_COUNT + 1,
- embedding_dimension,
- )
- self.gender_embedding = keras.layers.Embedding(
- GENDERS_COUNT, embedding_dimension
- )
- self.age_embedding = keras.layers.Embedding(AGE_BINS_COUNT, embedding_dimension)
- self.gender_x_age_embedding = keras.layers.Embedding(
- USER_GENDER_CROSS_COUNT, embedding_dimension
- )
- self.occupation_embedding = keras.layers.Embedding(
- OCCUPATIONS_COUNT, embedding_dimension
- )
-
- # Then construct the layers.
- self.dense_layers = keras.Sequential()
-
- # Use the ReLU activation for all but the last layer.
- for layer_size in layer_sizes[:-1]:
- self.dense_layers.add(keras.layers.Dense(layer_size, activation="relu"))
-
- # No activation for the last layer.
- self.dense_layers.add(keras.layers.Dense(layer_sizes[-1]))
-
- def call(self, inputs):
- # Take the inputs, pass each through its embedding layer, concatenate.
- feature_embedding = keras.ops.concatenate(
- [
- self.user_embedding(inputs["user_id"]),
- self.gender_embedding(inputs["user_gender"]),
- self.age_embedding(inputs["raw_user_age"]),
- self.gender_x_age_embedding(inputs["user_gender_X_raw_user_age"]),
- self.occupation_embedding(inputs["user_occupation_label"]),
- ],
- axis=1,
- )
- return self.dense_layers(feature_embedding)
-
-```
-
----
-## Candidate model
-
-We can adopt the same approach for the candidate model. Again, we start by
-converting movie features to embeddings, concatenating them, and then
-expanding them with hidden layers:
-
-
-```python
-
-class CandidateModel(keras.Model):
- """Model for encoding candidates (movies)."""
-
- def __init__(self, layer_sizes, embedding_dimension=32):
- """Construct a model for encoding candidates (movies).
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- """
- super().__init__()
-
- # We first generate embeddings.
- self.movie_embedding = keras.layers.Embedding(
- # +1 for movie ID zero, which does not exist
- MOVIES_COUNT + 1,
- embedding_dimension,
- )
- # Take all the title tokens for the title of the movie, embed each
- # token, and then take the mean of all token embeddings.
- self.movie_title_embedding = keras.Sequential(
- [
- keras.layers.Embedding(
- # +1 for OOV token, which is used for padding
- TITLE_TOKEN_COUNT + 1,
- embedding_dimension,
- mask_zero=True,
- ),
- keras.layers.GlobalAveragePooling1D(),
- ]
- )
- # Take all the genres for the movie, embed each genre, and then take the
- # mean of all genre embeddings.
- self.movie_genres_embedding = keras.Sequential(
- [
- keras.layers.Embedding(
- # +1 for OOV genre, which is used for padding
- GENRES_COUNT + 1,
- embedding_dimension,
- mask_zero=True,
- ),
- keras.layers.GlobalAveragePooling1D(),
- ]
- )
-
- # Then construct the layers.
- self.dense_layers = keras.Sequential()
-
- # Use the ReLU activation for all but the last layer.
- for layer_size in layer_sizes[:-1]:
- self.dense_layers.add(keras.layers.Dense(layer_size, activation="relu"))
-
- # No activation for the last layer.
- self.dense_layers.add(keras.layers.Dense(layer_sizes[-1]))
-
- def call(self, inputs):
- movie_id = inputs["movie_id"]
- movie_title_vector = inputs["movie_title_vector"]
- movie_genres = inputs["movie_genres"]
- feature_embedding = keras.ops.concatenate(
- [
- self.movie_embedding(movie_id),
- self.movie_title_embedding(movie_title_vector),
- self.movie_genres_embedding(movie_genres),
- ],
- axis=1,
- )
- return self.dense_layers(feature_embedding)
-
-```
-
----
-## Combined model
-
-With both QueryModel and CandidateModel defined, we can put together a combined
-model and implement our loss and metrics logic. To make things simple, we'll
-enforce that the model structure is the same across the query and candidate
-models.
-
-
-```python
-
-class RetrievalModel(keras.Model):
- """Combined model."""
-
- def __init__(
- self,
- layer_sizes=(32,),
- embedding_dimension=32,
- retrieval_k=100,
- ):
- """Construct a combined model.
-
- Args:
- layer_sizes: A list of integers where the i-th entry represents the
- number of units the i-th layer contains.
- embedding_dimension: Output dimension for all embedding tables.
- retrieval_k: How many candidate movies to retrieve.
- """
- super().__init__()
- self.query_model = QueryModel(layer_sizes, embedding_dimension)
- self.candidate_model = CandidateModel(layer_sizes, embedding_dimension)
- self.retrieval = keras_rs.layers.BruteForceRetrieval(
- k=retrieval_k, return_scores=False
- )
- self.update_candidates() # Provide an initial set of candidates
- self.loss_fn = keras.losses.MeanSquaredError()
- self.top_k_metric = keras.metrics.SparseTopKCategoricalAccuracy(
- k=100, from_sorted_ids=True
- )
-
- def update_candidates(self):
- self.retrieval.update_candidates(
- self.candidate_model.predict(MOVIES_DATASET, verbose=0)
- )
-
- def call(self, inputs, training=False):
- query_embeddings = self.query_model(
- {
- "user_id": inputs["user_id"],
- "raw_user_age": inputs["raw_user_age"],
- "user_gender": inputs["user_gender"],
- "user_occupation_label": inputs["user_occupation_label"],
- "user_gender_X_raw_user_age": inputs["user_gender_X_raw_user_age"],
- }
- )
- candidate_embeddings = self.candidate_model(
- {
- "movie_id": inputs["movie_id"],
- "movie_title_vector": inputs["movie_title_vector"],
- "movie_genres": inputs["movie_genres"],
- }
- )
-
- result = {
- "query_embeddings": query_embeddings,
- "candidate_embeddings": candidate_embeddings,
- }
- if not training:
- # No need to spend time extracting top predicted movies during
- # training, they are not used.
- result["predictions"] = self.retrieval(query_embeddings)
- return result
-
- def evaluate(
- self,
- x=None,
- y=None,
- batch_size=None,
- verbose="auto",
- sample_weight=None,
- steps=None,
- callbacks=None,
- return_dict=False,
- **kwargs,
- ):
- """Overridden to update the candidate set.
-
- Before evaluating the model, we need to update our retrieval layer by
- re-computing the values predicted by the candidate model for all the
- candidates.
- """
- self.update_candidates()
- return super().evaluate(
- x,
- y,
- batch_size=batch_size,
- verbose=verbose,
- sample_weight=sample_weight,
- steps=steps,
- callbacks=callbacks,
- return_dict=return_dict,
- **kwargs,
- )
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- query_embeddings = y_pred["query_embeddings"]
- candidate_embeddings = y_pred["candidate_embeddings"]
-
- labels = keras.ops.expand_dims(y, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(query_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
- def compute_metrics(self, x, y, y_pred, sample_weight=None):
- if "predictions" in y_pred:
- # We are evaluating or predicting. Update `top_k_metric`.
- movie_ids = x["movie_id"]
- predictions = y_pred["predictions"]
- # For `top_k_metric`, which is a `SparseTopKCategoricalAccuracy`, we
- # only take top rated movies, and we put a weight of 0 for the rest.
- rating_weight = keras.ops.cast(keras.ops.greater(y, 0.9), "float32")
- sample_weight = (
- rating_weight
- if sample_weight is None
- else keras.ops.multiply(rating_weight, sample_weight)
- )
- self.top_k_metric.update_state(
- movie_ids, predictions, sample_weight=sample_weight
- )
- return self.get_metrics_result()
- else:
- # We are training. `top_k_metric` is not updated and is zero, so
- # don't report it.
- result = self.get_metrics_result()
- result.pop(self.top_k_metric.name)
- return result
-
-```
-
----
-## Training the model
-
-### Shallow model
-
-We're ready to try out our first, shallow, model!
-
-
-```python
-NUM_EPOCHS = 30
-
-one_layer_model = RetrievalModel((32,))
-one_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-
-one_layer_history = one_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0554 - val_loss: 0.0574 - val_sparse_top_k_categorical_accuracy: 0.3216
-
-
-This gives us a top-100 accuracy of around 0.30. We can use this as a reference
-point for evaluating deeper models.
-
-### Deeper model
-
-What about a deeper model with two layers?
-
-
-```python
-two_layer_model = RetrievalModel((64, 32))
-two_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-two_layer_history = two_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0548 - val_loss: 0.0570 - val_sparse_top_k_categorical_accuracy: 0.2964
-
-
-While the deeper model seems to learn a bit better than the shallow model at
-first, the difference becomes minimal towards the end of training. We can
-plot the validation accuracy curves to illustrate this:
-
-
-```python
-METRIC = "val_sparse_top_k_categorical_accuracy"
-num_validation_runs = len(one_layer_history.history[METRIC])
-epochs = [(x + 1) * 5 for x in range(num_validation_runs)]
-
-plt.plot(epochs, one_layer_history.history[METRIC], label="1 layer")
-plt.plot(epochs, two_layer_history.history[METRIC], label="2 layers")
-plt.title("Accuracy vs epoch")
-plt.xlabel("epoch")
-plt.ylabel("Top-100 accuracy")
-plt.legend()
-plt.show()
-```
-
-
-
-
-
-
-
-Deeper models are not necessarily better. The following model extends the depth
-to three layers:
-
-
-```python
-three_layer_model = RetrievalModel((128, 64, 32))
-three_layer_model.compile(optimizer=keras.optimizers.Adagrad(0.05))
-three_layer_history = three_layer_model.fit(
- train_ratings,
- validation_data=test_ratings,
- validation_freq=5,
- epochs=NUM_EPOCHS,
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.0550 - val_loss: 0.0569 - val_sparse_top_k_categorical_accuracy: 0.3072
-
-
-We don't really see an improvement over the shallow model:
-
-
-```python
-plt.plot(epochs, one_layer_history.history[METRIC], label="1 layer")
-plt.plot(epochs, two_layer_history.history[METRIC], label="2 layers")
-plt.plot(epochs, three_layer_history.history[METRIC], label="3 layers")
-plt.title("Accuracy vs epoch")
-plt.xlabel("epoch")
-plt.ylabel("Top-100 accuracy")
-plt.legend()
-plt.show()
-```
-
-
-
-
-
-
-
-This is a good illustration of the fact that deeper and larger models, while
-capable of superior performance, often require very careful tuning. For example,
-throughout this tutorial we used a single, fixed learning rate. Alternative
-choices may give very different results and are worth exploring.
-
-With appropriate tuning and sufficient data, the effort put into building larger
-and deeper models is in many cases well worth it: larger models can lead to
-substantial improvements in prediction accuracy.
-
----
-## Next Steps
-
-In this tutorial we expanded our retrieval model with dense layers and
-activation functions. To see how to create a model that can perform not only
-retrieval tasks but also rating tasks, take a look at the multitask tutorial.
diff --git a/templates/keras_rs/examples/dlrm.md b/templates/keras_rs/examples/dlrm.md
deleted file mode 100644
index 46131c0bad..0000000000
--- a/templates/keras_rs/examples/dlrm.md
+++ /dev/null
@@ -1,520 +0,0 @@
-# Ranking with Deep Learning Recommendation Model
-
-**Author:** [Harshith Kulkarni](https://github.com/kharshith-k)
-**Date created:** 2025/06/02
-**Last modified:** 2025/09/04
-**Description:** Rank movies with DLRM using KerasRS.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/dlrm.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/dlrm.py)
-
-
-
----
-## Introduction
-
-This tutorial demonstrates how to use the Deep Learning Recommendation Model (DLRM) to
-effectively learn the relationships between items and user preferences using a
-dot-product interaction mechanism. For more details, please refer to the
-[DLRM](https://arxiv.org/abs/1906.00091) paper.
-
-DLRM is designed to excel at capturing explicit, bounded-degree feature interactions and
-is particularly effective at processing both categorical and continuous (sparse/dense)
-input features. The architecture consists of three main components: dedicated input
-layers to handle diverse features (typically embedding layers for categorical features),
-a dot-product interaction layer to explicitly model feature interactions, and a
-Multi-Layer Perceptron (MLP) to capture implicit feature relationships.
-
-The dot-product interaction layer lies at the heart of DLRM, efficiently computing
-pairwise interactions between different feature embeddings. This contrasts with models
-like Deep & Cross Network (DCN), which can treat elements within a feature vector as
-independent units, potentially leading to a higher-dimensional space and increased
-computational cost. The MLP is a standard feedforward network. The DLRM is formed by
-combining the interaction layer and MLP.
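-
-As a rough sketch of what a dot-product interaction computes (illustrative
-only, not the actual `keras_rs.layers.DotInteraction` implementation):
-
-```python
-import numpy as np
-
-
-def dot_interaction(embeddings):
-    # embeddings: list of F arrays, each of shape (batch_size, dim).
-    stacked = np.stack(embeddings, axis=1)  # (batch_size, F, dim)
-    # Pairwise dot products between all pairs of feature embeddings.
-    pairwise = stacked @ np.transpose(stacked, (0, 2, 1))  # (batch_size, F, F)
-    # Keep one value per unordered feature pair (strictly lower triangle).
-    i, j = np.tril_indices(pairwise.shape[1], k=-1)
-    return pairwise[:, i, j]  # (batch_size, F * (F - 1) / 2)
-```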
-
-The following image illustrates the DLRM architecture:
-
-
-
-
-Now that we have a foundational understanding of DLRM's architecture and key
-characteristics, let's dive into the code. We will train a DLRM on a real-world dataset
-to demonstrate its capability to learn meaningful feature interactions. Let's begin by
-setting the backend to TensorFlow and organizing our imports.
-
-
-```python
-!pip install -q keras-rs
-```
-
-
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "tensorflow" # `"tensorflow"`/`"torch"`
-
-import keras
-import matplotlib.pyplot as plt
-import numpy as np
-import tensorflow as tf
-import tensorflow_datasets as tfds
-from mpl_toolkits.axes_grid1 import make_axes_locatable
-
-import keras_rs
-```
-
-Let's also define variables which will be reused throughout the example.
-
-
-```python
-MOVIELENS_CONFIG = {
- # features
- "continuous_features": [
- "raw_user_age",
- "hour_of_day_sin",
- "hour_of_day_cos",
- "hour_of_week_sin",
- "hour_of_week_cos",
- ],
- "categorical_int_features": [
- "user_gender",
- ],
- "categorical_str_features": [
- "user_zip_code",
- "user_occupation_text",
- "movie_id",
- "user_id",
- ],
- # model
- "embedding_dim": 8,
- "mlp_dim": 8,
- "deep_net_num_units": [192, 192, 192],
- # training
- "learning_rate": 1e-4,
- "num_epochs": 30,
- "batch_size": 8192,
-}
-```
-
-Here, we define a helper function for visualizing the strength of the learned
-feature interactions, in order to better understand the model. Also, we define
-a function for compiling, training and evaluating a given model.
-
-
-```python
-
-def plot_training_metrics(history):
- """Graphs all metrics tracked in the history object."""
- plt.figure(figsize=(12, 6))
-
- for metric_name, metric_values in history.history.items():
- plt.plot(metric_values, label=metric_name.replace("_", " ").title())
-
- plt.title("Metrics over Epochs")
- plt.xlabel("Epoch")
- plt.ylabel("Metric Value")
- plt.legend()
- plt.grid(True)
-
-
-def visualize_layer(matrix, features, cmap=plt.cm.Blues):
-
- im = plt.matshow(
- matrix, cmap=cmap, extent=[-0.5, len(features) - 0.5, len(features) - 0.5, -0.5]
- )
-
- ax = plt.gca()
- divider = make_axes_locatable(plt.gca())
- cax = divider.append_axes("right", size="5%", pad=0.05)
- plt.colorbar(im, cax=cax)
- cax.tick_params(labelsize=10)
-
- # Set tick locations explicitly before setting labels
- ax.set_xticks(np.arange(len(features)))
- ax.set_yticks(np.arange(len(features)))
-
- ax.set_xticklabels(features, rotation=45, fontsize=5)
- ax.set_yticklabels(features, fontsize=5)
-
- plt.show()
-
-
-def train_and_evaluate(
- learning_rate,
- epochs,
- train_data,
- test_data,
- model,
- plot_metrics=False,
-):
- optimizer = keras.optimizers.AdamW(learning_rate=learning_rate, clipnorm=1.0)
- loss = keras.losses.MeanSquaredError()
- rmse = keras.metrics.RootMeanSquaredError()
-
- model.compile(
- optimizer=optimizer,
- loss=loss,
- metrics=[rmse],
- )
-
- history = model.fit(
- train_data,
- epochs=epochs,
- verbose=1,
- )
- if plot_metrics:
- plot_training_metrics(history)
-
- results = model.evaluate(test_data, return_dict=True, verbose=1)
- rmse_value = results["root_mean_squared_error"]
-
- return rmse_value, model.count_params()
-
-
-def print_stats(rmse_list, num_params, model_name):
- # Report metrics.
- num_trials = len(rmse_list)
- avg_rmse = np.mean(rmse_list)
- std_rmse = np.std(rmse_list)
-
- if num_trials == 1:
- print(f"{model_name}: RMSE = {avg_rmse}; #params = {num_params}")
- else:
- print(f"{model_name}: RMSE = {avg_rmse} ± {std_rmse}; #params = {num_params}")
-
-```
-
----
-## Real-world example
-
-Let's use the MovieLens 100K dataset. This dataset is used to train models to
-predict users' movie ratings, based on user-related features and movie-related
-features.
-
-### Preparing the dataset
-
-The dataset processing steps here are similar to what's given in the
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorial. Let's load the dataset, and keep only the useful columns.
-
-
-```python
-ratings_ds = tfds.load("movielens/100k-ratings", split="train")
-
-
-def preprocess_features(x):
- """Extracts and cyclically encodes timestamp features."""
- features = {
- "movie_id": x["movie_id"],
- "user_id": x["user_id"],
- "user_gender": tf.cast(x["user_gender"], dtype=tf.int32),
- "user_zip_code": x["user_zip_code"],
- "user_occupation_text": x["user_occupation_text"],
- "raw_user_age": tf.cast(x["raw_user_age"], dtype=tf.float32),
- }
- label = tf.cast(x["user_rating"], dtype=tf.float32)
-
- # The timestamp is in seconds since the epoch.
- timestamp = tf.cast(x["timestamp"], dtype=tf.float32)
-
- # Constants for time periods
- SECONDS_IN_HOUR = 3600.0
- HOURS_IN_DAY = 24.0
- HOURS_IN_WEEK = 168.0
-
- # Calculate hour of day and encode it
- hour_of_day = (timestamp / SECONDS_IN_HOUR) % HOURS_IN_DAY
- features["hour_of_day_sin"] = tf.sin(2 * np.pi * hour_of_day / HOURS_IN_DAY)
- features["hour_of_day_cos"] = tf.cos(2 * np.pi * hour_of_day / HOURS_IN_DAY)
-
- # Calculate hour of week and encode it
- hour_of_week = (timestamp / SECONDS_IN_HOUR) % HOURS_IN_WEEK
- features["hour_of_week_sin"] = tf.sin(2 * np.pi * hour_of_week / HOURS_IN_WEEK)
- features["hour_of_week_cos"] = tf.cos(2 * np.pi * hour_of_week / HOURS_IN_WEEK)
-
- return features, label
-
-
-# Apply the new preprocessing function
-ratings_ds = ratings_ds.map(preprocess_features)
-```
-
-For every categorical feature, let's get the list of unique values, i.e., vocabulary, so
-that we can use that for the embedding layer.
-
-
-```python
-vocabularies = {}
-for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
-):
- vocabulary = ratings_ds.batch(10_000).map(lambda x, y: x[feature_name])
- vocabularies[feature_name] = np.unique(np.concatenate(list(vocabulary)))
-```
-
-One thing we need to do is to use `keras.layers.StringLookup` and
-`keras.layers.IntegerLookup` to convert all the categorical features into
-indices, which can then be fed into embedding layers.
-
-
-```python
-lookup_layers = {}
-lookup_layers.update(
- {
- feature: keras.layers.IntegerLookup(vocabulary=vocabularies[feature])
- for feature in MOVIELENS_CONFIG["categorical_int_features"]
- }
-)
-lookup_layers.update(
- {
- feature: keras.layers.StringLookup(vocabulary=vocabularies[feature])
- for feature in MOVIELENS_CONFIG["categorical_str_features"]
- }
-)
-```
-
-Let's normalize all the continuous features, so that we can feed them to the MLP layers.
-
-
-```python
-normalization_layers = {}
-for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- normalization_layers[feature_name] = keras.layers.Normalization(axis=-1)
-
-training_data_for_adaptation = ratings_ds.take(80_000).map(lambda x, y: x)
-
-for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- feature_ds = training_data_for_adaptation.map(
- lambda x: tf.expand_dims(x[feature_name], axis=-1)
- )
- normalization_layers[feature_name].adapt(feature_ds)
-
-ratings_ds = ratings_ds.map(
- lambda x, y: (
- {
- **{
- feature_name: lookup_layers[feature_name](x[feature_name])
- for feature_name in vocabularies
- },
- # Apply the adapted normalization layers to the continuous features.
- **{
- feature_name: tf.squeeze(
- normalization_layers[feature_name](
- tf.expand_dims(x[feature_name], axis=-1)
- ),
- axis=-1,
- )
- for feature_name in MOVIELENS_CONFIG["continuous_features"]
- },
- },
- y,
- )
-)
-```
-
-Let's split our data into train and test sets. We also use `cache()` and
-`prefetch()` for better performance.
-
-
-```python
-ratings_ds = ratings_ds.shuffle(100_000)
-
-train_ds = (
- ratings_ds.take(80_000)
- .batch(MOVIELENS_CONFIG["batch_size"])
- .cache()
- .prefetch(tf.data.AUTOTUNE)
-)
-test_ds = (
-    ratings_ds.skip(80_000)
-    # Take 20,000 examples; `take()` must come before `batch()` so that it
-    # counts examples rather than batches.
-    .take(20_000)
-    .batch(MOVIELENS_CONFIG["batch_size"])
-    .cache()
-    .prefetch(tf.data.AUTOTUNE)
-)
-```
-
-### Building the model
-
-The model will have embedding layers, followed by DotInteraction and feedforward
-layers.
-
-
-```python
-
-class DLRM(keras.Model):
- def __init__(
- self,
- dense_num_units_lst,
- embedding_dim=MOVIELENS_CONFIG["embedding_dim"],
- mlp_dim=MOVIELENS_CONFIG["mlp_dim"],
- **kwargs,
- ):
- super().__init__(**kwargs)
-
- self.embedding_layers = {}
- for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
- ):
- vocab_size = len(vocabularies[feature_name]) + 1 # +1 for OOV token
- self.embedding_layers[feature_name] = keras.layers.Embedding(
- input_dim=vocab_size,
- output_dim=embedding_dim,
- )
-
- self.bottom_mlp = keras.Sequential(
- [
- keras.layers.Dense(mlp_dim, activation="relu"),
- keras.layers.Dense(embedding_dim), # Output must match embedding_dim
- ]
- )
-
- self.dot_layer = keras_rs.layers.DotInteraction()
-
- self.top_mlp = []
- for num_units in dense_num_units_lst:
- self.top_mlp.append(keras.layers.Dense(num_units, activation="relu"))
-
- self.output_layer = keras.layers.Dense(1)
-
- self.dense_num_units_lst = dense_num_units_lst
- self.embedding_dim = embedding_dim
-
- def call(self, inputs):
- embeddings = []
- for feature_name in (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
- ):
- embedding = self.embedding_layers[feature_name](inputs[feature_name])
- embeddings.append(embedding)
-
- # Process all continuous features together.
- continuous_inputs = []
- for feature_name in MOVIELENS_CONFIG["continuous_features"]:
- # Reshape each feature to (batch_size, 1)
- feature = keras.ops.reshape(
- keras.ops.cast(inputs[feature_name], dtype="float32"), (-1, 1)
- )
- continuous_inputs.append(feature)
-
- # Concatenate into a single tensor: (batch_size, num_continuous_features)
- concatenated_continuous = keras.ops.concatenate(continuous_inputs, axis=1)
-
- # Pass through the Bottom MLP to get one combined vector.
- processed_continuous = self.bottom_mlp(concatenated_continuous)
-
- # Combine with categorical embeddings. Note: we add a list containing the
- # single tensor.
- combined_features = embeddings + [processed_continuous]
-
- # Pass the list of features to the DotInteraction layer.
- x = self.dot_layer(combined_features)
-
- for layer in self.top_mlp:
- x = layer(x)
-
- x = self.output_layer(x)
-
- return x
-
-
-dot_network = DLRM(
- dense_num_units_lst=MOVIELENS_CONFIG["deep_net_num_units"],
- embedding_dim=MOVIELENS_CONFIG["embedding_dim"],
- mlp_dim=MOVIELENS_CONFIG["mlp_dim"],
-)
-
-rmse, dot_network_num_params = train_and_evaluate(
- learning_rate=MOVIELENS_CONFIG["learning_rate"],
- epochs=MOVIELENS_CONFIG["num_epochs"],
- train_data=train_ds,
- test_data=test_ds,
- model=dot_network,
- plot_metrics=True,
-)
-print_stats(
- rmse_list=[rmse],
- num_params=dot_network_num_params,
- model_name="Dot Network",
-)
-```
-
-
-
-
-### Visualizing feature interactions
-
-The DotInteraction layer itself doesn't have a conventional "weight" matrix like a Dense
-layer. Instead, its function is to compute the dot product between the embedding vectors
-of your features.
-
-To visualize the strength of these interactions, we can calculate a matrix representing
-the pairwise interaction strength between all feature embeddings. A common way to do this
-is to take the dot product of the embedding matrices for each pair of features and then
-aggregate the result into a single value (like the mean of the absolute values) that
-represents the overall interaction strength.
-
-
-```python
-
-def get_dot_interaction_matrix(model, categorical_features, continuous_features):
- # The new feature list for the plot labels
- all_feature_names = categorical_features + ["all_continuous_features"]
- num_features = len(all_feature_names)
-
- # Store all feature outputs in the correct order.
- all_feature_outputs = []
-
- # Get outputs for categorical features from embedding layers (unchanged).
- for feature_name in categorical_features:
- embedding = model.embedding_layers[feature_name](keras.ops.array([0]))
- all_feature_outputs.append(embedding)
-
- # Get a single output for ALL continuous features from the shared MLP.
- num_continuous_features = len(continuous_features)
- # Create a dummy input of zeros for the MLP
- dummy_continuous_input = keras.ops.zeros((1, num_continuous_features))
- processed_continuous = model.bottom_mlp(dummy_continuous_input)
- all_feature_outputs.append(processed_continuous)
-
- interaction_matrix = np.zeros((num_features, num_features))
-
- # Iterate through each pair to calculate interaction strength.
- for i in range(num_features):
- for j in range(num_features):
- interaction = keras.ops.dot(
- all_feature_outputs[i], keras.ops.transpose(all_feature_outputs[j])
- )
- interaction_strength = keras.ops.convert_to_numpy(np.abs(interaction))[0][0]
- interaction_matrix[i, j] = interaction_strength
-
- return interaction_matrix, all_feature_names
-
-
-# Get the list of categorical feature names.
-categorical_feature_names = (
- MOVIELENS_CONFIG["categorical_int_features"]
- + MOVIELENS_CONFIG["categorical_str_features"]
-)
-
-# Calculate the interaction matrix.
-interaction_matrix, feature_names = get_dot_interaction_matrix(
- model=dot_network,
- categorical_features=categorical_feature_names,
- continuous_features=MOVIELENS_CONFIG["continuous_features"],
-)
-
-# Visualize the matrix as a heatmap.
-print("\nVisualizing the feature interaction strengths:")
-visualize_layer(interaction_matrix, feature_names)
-```
-
-
-
-
diff --git a/templates/keras_rs/examples/listwise_ranking.md b/templates/keras_rs/examples/listwise_ranking.md
deleted file mode 100644
index 89f302b3fd..0000000000
--- a/templates/keras_rs/examples/listwise_ranking.md
+++ /dev/null
@@ -1,667 +0,0 @@
-# List-wise ranking
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Rank movies using pairwise losses instead of pointwise losses.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/listwise_ranking.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/listwise_ranking.py)
-
-
-
----
-## Introduction
-
-In our
-[basic ranking tutorial](/keras_rs/examples/basic_ranking/), we explored a model
-that learned to predict ratings for specific user-movie combinations. This model
-took (user, movie) pairs as input and was trained using mean-squared error to
-precisely predict the rating a user might give to a movie.
-
-However, solely optimizing a model's accuracy in predicting individual movie
-scores isn't always the most effective strategy for developing ranking systems.
-For ranking models, pinpoint accuracy in predicting scores is less critical than
-the model's capability to generate an ordered list of items that aligns with a
-user's preferences. In essence, the relative order of items matters more than
-the exact predicted values.
-
-Instead of focusing on the model's predictions for individual query-item pairs
-(a pointwise approach), we can optimize the model based on its ability to
-correctly order items. One common method for this is pairwise ranking. In this
-approach, the model learns by comparing pairs of items (e.g., item A and item B)
-and determining which one should be ranked higher for a given user or query. The
-goal is to minimize the number of incorrectly ordered pairs.
-
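-To make this concrete, here is a tiny, self-contained illustration of a
-pairwise hinge penalty on a single list of three items. This is plain NumPy,
-not the KerasRS API, and the margin value is an arbitrary assumption:
-
-
-```python
-# Toy illustration (not KerasRS code) of a pairwise hinge penalty.
-import numpy as np
-
-true_labels = np.array([3.0, 1.0, 2.0])  # Ground-truth relevance.
-pred_scores = np.array([0.2, 0.8, 0.5])  # Model scores.
-
-margin = 1.0  # Assumed margin value.
-loss = 0.0
-for i in range(len(true_labels)):
-    for j in range(len(true_labels)):
-        if true_labels[i] > true_labels[j]:
-            # Penalize pairs where the more relevant item is not scored at
-            # least `margin` above the less relevant one.
-            loss += max(0.0, margin - (pred_scores[i] - pred_scores[j]))
-
-print(loss)  # 4.2: all three ordered pairs are mis-ranked in this example.
-```
-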
-Let's begin by importing all the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-
-import keras
-import numpy as np
-import tensorflow as tf # Needed only for the dataset
-import tensorflow_datasets as tfds
-from keras import ops
-
-import keras_rs
-```
-
-Let's define some hyperparameters here.
-
-
-```python
-# Data args
-TRAIN_NUM_LIST_PER_USER = 50
-TEST_NUM_LIST_PER_USER = 1
-NUM_EXAMPLES_PER_LIST = 5
-
-# Model args
-EMBEDDING_DIM = 32
-
-# Train args
-BATCH_SIZE = 1024
-EPOCHS = 5
-LEARNING_RATE = 0.1
-```
-
----
-## Preparing the dataset
-
-We use the MovieLens dataset. The data loading and processing steps are similar
-to previous tutorials, so we will only discuss the differences here.
-
-
-```python
-# Ratings data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-movies_count = movies.cardinality().numpy()
-
-
-def preprocess_rating(x):
- return {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- # Normalise ratings between 0 and 1.
- "user_rating": (x["user_rating"] - 1.0) / 4.0,
- }
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-train_ratings = shuffled_ratings.take(70_000)
-val_ratings = shuffled_ratings.skip(70_000).take(15_000)
-test_ratings = shuffled_ratings.skip(85_000).take(15_000)
-```
-
-So far, we've replicated what we have in the basic ranking tutorial.
-
-However, this existing dataset is not directly applicable to list-wise
-optimization. List-wise optimization requires, for each user, a list of movies
-they have rated, allowing the model to learn from the relative orderings within
-that list. The MovieLens 100K dataset, in its original form, provides individual
-rating instances (one user, one movie, one rating per example), rather than
-these aggregated user-specific lists.
-
-To enable listwise optimization, we need to restructure the dataset. This
-involves transforming it so that each data point or example represents a single
-user ID accompanied by a list of movies that user has rated. Within these lists,
-some movies will naturally be ranked higher by the user (as evidenced by their
-ratings) than others. The primary objective for our model will then be to learn
-to predict item orderings that correspond to these observed user preferences.
-
-Let's start by getting the entire list of movies and corresponding ratings for
-every user. We remove users who have rated fewer than
-`NUM_EXAMPLES_PER_LIST` movies.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings, min_examples_per_list):
- """Gets movieID sequences and ratings for every user."""
- sequences = collections.defaultdict(list)
-
- for sample in ratings:
- user_id = sample["user_id"]
- movie_id = sample["movie_id"]
- user_rating = sample["user_rating"]
-
- sequences[int(user_id.numpy())].append(
- {
- "movie_id": int(movie_id.numpy()),
- "user_rating": float(user_rating.numpy()),
- }
- )
-
- # Remove lists with < `min_examples_per_list` number of elements.
- sequences = {
- user_id: sequence
- for user_id, sequence in sequences.items()
- if len(sequence) >= min_examples_per_list
- }
-
- return sequences
-
-```
-
-We now sample 50 lists for each user for the training data. For each list, we
-randomly sample 5 movies from the movies the user rated.
-
-
-```python
-
-def sample_sublist_from_list(
- lst,
- num_examples_per_list,
-):
- """Random selects `num_examples_per_list` number of elements from list."""
-
- indices = np.random.choice(
- range(len(lst)),
- size=num_examples_per_list,
- replace=False,
- )
-
- samples = [lst[i] for i in indices]
- return samples
-
-
-def get_examples(
- sequences,
- num_list_per_user,
- num_examples_per_list,
-):
- inputs = {
- "user_id": [],
- "movie_id": [],
- }
- labels = []
-    for user_id, user_list in sequences.items():
-        for _ in range(num_list_per_user):
-            sampled_list = sample_sublist_from_list(
-                user_list,
-                num_examples_per_list,
-            )
-
-            inputs["user_id"].append(user_id)
-            inputs["movie_id"].append(
-                tf.convert_to_tensor([f["movie_id"] for f in sampled_list])
-            )
-            labels.append(
-                tf.convert_to_tensor([f["user_rating"] for f in sampled_list])
-            )
-
- return (
- {"user_id": inputs["user_id"], "movie_id": inputs["movie_id"]},
- labels,
- )
-
-
-train_sequences = get_movie_sequence_per_user(
- ratings=train_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-train_examples = get_examples(
- train_sequences,
- num_list_per_user=TRAIN_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-train_ds = tf.data.Dataset.from_tensor_slices(train_examples)
-
-val_sequences = get_movie_sequence_per_user(
-    ratings=val_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-val_examples = get_examples(
- val_sequences,
- num_list_per_user=TEST_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-val_ds = tf.data.Dataset.from_tensor_slices(val_examples)
-
-test_sequences = get_movie_sequence_per_user(
-    ratings=test_ratings, min_examples_per_list=NUM_EXAMPLES_PER_LIST
-)
-test_examples = get_examples(
- test_sequences,
- num_list_per_user=TEST_NUM_LIST_PER_USER,
- num_examples_per_list=NUM_EXAMPLES_PER_LIST,
-)
-test_ds = tf.data.Dataset.from_tensor_slices(test_examples)
-```
-
-Batch up the dataset, and cache it.
-
-
-```python
-train_ds = train_ds.batch(BATCH_SIZE).cache()
-val_ds = val_ds.batch(BATCH_SIZE).cache()
-test_ds = test_ds.batch(BATCH_SIZE).cache()
-```
-
----
-## Building the model
-
-We build a typical two-tower ranking model, similar to the
-[basic ranking tutorial](/keras_rs/examples/basic_ranking/).
-We have separate embedding layers for user ID and movie IDs. After obtaining
-these embeddings, we concatenate them and pass them through a network of dense
-layers.
-
-The only difference is that for movies, we take a list of IDs rather than a
-single movie ID. So, when we concatenate the user ID embedding with the movie
-embeddings, we "repeat" the user embedding `NUM_EXAMPLES_PER_LIST` times to get
-the same shape as the movie embeddings.
-
-
-```python
-
-class RankingModel(keras.Model):
- """Create the ranking model with the provided parameters.
-
- Args:
- num_users: Number of entries in the user embedding table.
- num_candidates: Number of entries in the candidate embedding table.
- embedding_dimension: Output dimension for user and movie embedding tables.
- """
-
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Embedding table for users.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Embedding table for candidates.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
- # Predictions.
- self.ratings = keras.Sequential(
- [
- # Learn multiple dense layers.
- keras.layers.Dense(256, activation="relu"),
- keras.layers.Dense(64, activation="relu"),
- # Make rating predictions in the final layer.
- keras.layers.Dense(1),
- ]
- )
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape["user_id"])
- self.candidate_embedding.build(input_shape["movie_id"])
-
- output_shape = self.candidate_embedding.compute_output_shape(
- input_shape["movie_id"]
- )
-
- self.ratings.build(list(output_shape[:-1]) + [2 * output_shape[-1]])
-
- def call(self, inputs):
- user_id, movie_id = inputs["user_id"], inputs["movie_id"]
- user_embeddings = self.user_embedding(user_id)
- candidate_embeddings = self.candidate_embedding(movie_id)
-
- list_length = ops.shape(movie_id)[-1]
- user_embeddings_repeated = ops.repeat(
- ops.expand_dims(user_embeddings, axis=1),
- repeats=list_length,
- axis=1,
- )
- concatenated_embeddings = ops.concatenate(
- [user_embeddings_repeated, candidate_embeddings], axis=-1
- )
-
- scores = self.ratings(concatenated_embeddings)
- scores = ops.squeeze(scores, axis=-1)
-
- return scores
-
- def compute_output_shape(self, input_shape):
- return (input_shape[0], input_shape[1])
-
-```
-
-Let's instantiate, compile and train our model. We will train two models:
-one with vanilla mean-squared error, and the other with pairwise hinge loss.
-For the latter, we will use `keras_rs.losses.PairwiseHingeLoss`.
-
-Pairwise losses compare pairs of items within each list, penalizing cases where
-an item with a higher true label has a lower predicted score than an item with a
-lower true label. This is why they are more suited for ranking tasks than
-pointwise losses.
-
-To quantify these results, we compute nDCG. nDCG is a measure of ranking quality
-that evaluates how well a system orders items based on relevance, giving more
-importance to highly relevant items appearing at the top of the list and
-normalizing the score against an ideal ranking.
-To compute it, we just need to pass `keras_rs.metrics.NDCG()` as a metric to
-`model.compile`.
-
-
-```python
-model_mse = RankingModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- embedding_dimension=EMBEDDING_DIM,
-)
-model_mse.compile(
- loss=keras.losses.MeanSquaredError(),
- metrics=[keras_rs.metrics.NDCG(k=NUM_EXAMPLES_PER_LIST, name="ndcg")],
- optimizer=keras.optimizers.Adagrad(learning_rate=LEARNING_RATE),
-)
-model_mse.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
-```
-
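-We do the same for the pairwise hinge loss model. The following mirrors the
-configuration above (the hyperparameters are assumed to be identical), with
-`keras_rs.losses.PairwiseHingeLoss` swapped in as the loss:
-
-
-```python
-# Sketch: hyperparameters assumed to mirror the MSE model above.
-model_hinge = RankingModel(
-    num_users=users_count + 1,
-    num_candidates=movies_count + 1,
-    embedding_dimension=EMBEDDING_DIM,
-)
-model_hinge.compile(
-    loss=keras_rs.losses.PairwiseHingeLoss(),
-    metrics=[keras_rs.metrics.NDCG(k=NUM_EXAMPLES_PER_LIST, name="ndcg")],
-    optimizer=keras.optimizers.Adagrad(learning_rate=LEARNING_RATE),
-)
-model_hinge.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
-```
-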
-
----
-## Evaluation
-
-Comparing the validation nDCG values, it is clear that the model trained with
-the pairwise hinge loss outperforms the other one. Let's make this observation
-more concrete by comparing results on the test set.
-
-
-```python
-ndcg_mse = model_mse.evaluate(test_ds, return_dict=True)["ndcg"]
-ndcg_hinge = model_hinge.evaluate(test_ds, return_dict=True)["ndcg"]
-print(ndcg_mse, ndcg_hinge)
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 1s/step - loss: 0.0805 - ndcg: 0.8886
-
-
----
-## Prediction
-
-Now, let's rank some lists!
-
-Let's create a mapping from movie ID to title so that we can surface the titles
-for the ranked list.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-
-user_id = 42
-movie_ids = [409, 237, 131, 941, 543]
-predictions = model_hinge.predict(
- {
- "user_id": keras.ops.array([user_id]),
- "movie_id": keras.ops.array([movie_ids]),
- }
-)
-predictions = keras.ops.convert_to_numpy(keras.ops.squeeze(predictions, axis=0))
-sorted_indices = np.argsort(predictions)[::-1]  # Highest predicted score first.
-sorted_movies = [movie_ids[i] for i in sorted_indices]
-
-for i, movie_id in enumerate(sorted_movies):
- print(f"{i + 1}. ", movie_id_to_movie_title[movie_id])
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 261ms/step
-
-
-And we're all done!
diff --git a/templates/keras_rs/examples/multi_task.md b/templates/keras_rs/examples/multi_task.md
deleted file mode 100644
index 68d00a60e6..0000000000
--- a/templates/keras_rs/examples/multi_task.md
+++ /dev/null
@@ -1,1463 +0,0 @@
-# Multi-task recommenders: retrieval + ranking
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Using one model for both retrieval and ranking.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/multi_task.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/multi_task.py)
-
-
-
----
-## Introduction
-
-In the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-and
-[basic ranking](/keras_rs/examples/basic_ranking/)
-tutorials, we created separate models for retrieval and ranking tasks,
-respectively. However, in many cases, building a single, joint model for
-multiple tasks can lead to better performance than creating distinct models for
-each task. This is especially true when dealing with data that is unevenly
-distributed — such as abundant data (e.g., clicks) versus sparse data
-(e.g., purchases, returns, or manual reviews). In such scenarios, a joint model
-can leverage representations learned from the abundant data to improve
-predictions on the sparse data, a technique known as transfer learning.
-For instance, [research](https://openreview.net/forum?id=SJxPVcSonN) shows that
-a model trained to predict user ratings from sparse survey data can be
-significantly enhanced by incorporating an auxiliary task using abundant click
-log data.
-
-In this example, we develop a multi-objective recommender system using the
-MovieLens dataset. We incorporate both implicit feedback (e.g., movie watches)
-and explicit feedback (e.g., ratings) to create a more robust and effective
-recommendation model. For the former, we predict "movie watches", i.e., whether
-a user has watched a movie, and for the latter, we predict the rating given by a
-user to a movie.
-
-Let's start by importing the necessary packages.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-
-import keras_rs
-```
-
----
-## Prepare the dataset
-
-We use the MovieLens dataset. The data loading and processing steps are similar
-to previous tutorials, so we will not discuss them in detail here.
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-```
-
-Get user and movie counts so that we can define embedding layers.
-
-
-```python
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-
-movies_count = movies.cardinality().numpy()
-```
-
-Our inputs are `"user_id"` and `"movie_id"`. Our label for the ranking task is
-`"user_rating"`, an integer between 1 and 5, which we rescale to `[0, 1]`.
-
-
-```python
-
-def preprocess_rating(x):
- return (
- {
- "user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- },
- (x["user_rating"] - 1.0) / 4.0,
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-
-```
-
-Split the dataset into train-test sets.
-
-
-```python
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Building the model
-
-We build the model in a similar way to the basic retrieval and basic ranking
-guides.
-
-For the retrieval task (i.e., predicting whether a user watched a movie),
-we compute the similarity of the corresponding user and movie embeddings, and
-use cross entropy loss, where the positive pairs are labelled one, and all other
-samples in the batch are considered "negatives". We report top-k accuracy for
-this task.
-
-For the ranking task (i.e., given a user-movie pair, predict rating), we
-concatenate user and movie embeddings and pass it to a dense module. We use
-MSE loss here, and report the Root Mean Squared Error (RMSE).
-
-The final loss is a weighted combination of the two losses mentioned above,
-where the weights are `"retrieval_loss_wt"` and `"ranking_loss_wt"`. These
-weights decide which task the model will focus on.
-
-
-```python
-
-class MultiTaskModel(keras.Model):
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- layer_sizes=(256, 128),
- retrieval_loss_wt=1.0,
- ranking_loss_wt=1.0,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
-
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
-
- # Rating model.
-        self.rating_model = keras.Sequential(
- [
- keras.layers.Dense(layer_size, activation="relu")
- for layer_size in layer_sizes
- ]
- + [keras.layers.Dense(1)]
- )
-
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
-
- self.retrieval_loss_fn = keras.losses.CategoricalCrossentropy(
- from_logits=True,
- reduction="sum",
- )
- self.ranking_loss_fn = keras.losses.MeanSquaredError()
-
- # Top-k accuracy for retrieval
- self.top_k_metric = keras.metrics.SparseTopKCategoricalAccuracy(
- k=100, from_sorted_ids=True
- )
- # RMSE for ranking
- self.rmse_metric = keras.metrics.RootMeanSquaredError()
-
- # Attributes.
- self.num_users = num_users
- self.num_candidates = num_candidates
- self.embedding_dimension = embedding_dimension
- self.layer_sizes = layer_sizes
- self.retrieval_loss_wt = retrieval_loss_wt
- self.ranking_loss_wt = ranking_loss_wt
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
- self.retrieval.build(input_shape)
-
- self.rating_model.build((None, 2 * self.embedding_dimension))
-
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- # Unpack inputs. Note that we have the if condition throughout this
- # `call()` method so that we can do a `.predict()` for the retrieval
- # task.
- user_id = inputs["user_id"]
- if "movie_id" in inputs:
- movie_id = inputs["movie_id"]
-
- result = {}
-
- # Get user, movie embeddings.
- user_embeddings = self.user_embedding(user_id)
- result["user_embeddings"] = user_embeddings
-
- if "movie_id" in inputs:
- candidate_embeddings = self.candidate_embedding(movie_id)
- result["candidate_embeddings"] = candidate_embeddings
-
- # Pass both embeddings through the rating block of the model.
- rating = self.rating_model(
- keras.ops.concatenate([user_embeddings, candidate_embeddings], axis=1)
- )
- result["rating"] = rating
-
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(user_embeddings)
-
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = y_pred["candidate_embeddings"]
-
- # 1. Retrieval
-
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.matmul(
- user_embeddings,
- keras.ops.transpose(candidate_embeddings),
- )
-
- # Retrieval labels: One-hot vectors
- num_users = keras.ops.shape(user_embeddings)[0]
- num_candidates = keras.ops.shape(candidate_embeddings)[0]
- retrieval_labels = keras.ops.eye(num_users, num_candidates)
- # Retrieval loss
- retrieval_loss = self.retrieval_loss_fn(retrieval_labels, scores, sample_weight)
-
- # 2. Ranking
- ratings = y
- pred_rating = y_pred["rating"]
-
- # Ranking labels are just ratings.
- ranking_labels = keras.ops.expand_dims(ratings, -1)
- # Ranking loss
- ranking_loss = self.ranking_loss_fn(ranking_labels, pred_rating, sample_weight)
-
- # Total loss is a weighted combination of the two losses.
- total_loss = (
- self.retrieval_loss_wt * retrieval_loss
- + self.ranking_loss_wt * ranking_loss
- )
-
- return total_loss
-
- def compute_metrics(self, x, y, y_pred, sample_weight=None):
- # RMSE can be computed irrespective of whether we are
- # training/evaluating.
- self.rmse_metric.update_state(
- y,
- y_pred["rating"],
- sample_weight=sample_weight,
- )
-
- if "predictions" in y_pred:
- # We are evaluating or predicting. Update `top_k_metric`.
- movie_ids = x["movie_id"]
- predictions = y_pred["predictions"]
- # For `top_k_metric`, which is a `SparseTopKCategoricalAccuracy`, we
- # only take top rated movies, and we put a weight of 0 for the rest.
- rating_weight = keras.ops.cast(keras.ops.greater(y, 0.9), "float32")
- sample_weight = (
- rating_weight
- if sample_weight is None
- else keras.ops.multiply(rating_weight, sample_weight)
- )
- self.top_k_metric.update_state(
- movie_ids, predictions, sample_weight=sample_weight
- )
-
- return self.get_metrics_result()
- else:
- # We are training. `top_k_metric` is not updated and is zero, so
- # don't report it.
- result = self.get_metrics_result()
- result.pop(self.top_k_metric.name)
- return result
-
-```
-
----
-## Training and evaluating
-
-We will train three different models here. This can be done easily by passing
-the correct loss weights:
-
-1. Rating-specialised model
-2. Retrieval-specialised model
-3. Multi-task model
-
-
-```python
-# Rating-specialised model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=1.0,
- retrieval_loss_wt=0.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-
-# Retrieval-specialised model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=0.0,
- retrieval_loss_wt=1.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-
-# Multi-task model
-model = MultiTaskModel(
- num_users=users_count + 1,
- num_candidates=movies_count + 1,
- ranking_loss_wt=1.0,
- retrieval_loss_wt=1.0,
-)
-model.compile(optimizer=keras.optimizers.Adagrad(0.1))
-model.fit(train_ratings, epochs=5)
-
-model.evaluate(test_ratings)
-```
-
-
-Let's put the metrics in a table and note our observations:
-
-| Model | Top-K Accuracy (↑) | RMSE (↓) |
-|-----------------------|--------------------|----------|
-| rating-specialised | 0.005 | 0.26 |
-| retrieval-specialised | 0.020 | 0.78 |
-| multi-task | 0.022 | 0.25 |
-
-As expected, the rating-specialised model has good RMSE, but poor top-k
-accuracy. For the retrieval-specialised model, it's the opposite.
-
-For the multi-task model, we notice that the model does well (or even slightly
-better than the two specialised models) on both tasks. In general, we can expect
-multi-task learning to bring about better results, especially when one task has
-a data-abundant source, and the other task is trained on sparse data.
-
-Now, let's make a prediction! We will first do a retrieval, and then for the
-retrieved list of movies, we will predict the rating using the same model.
-
-
-```python
-movie_id_to_movie_title = {
- int(x["movie_id"]): x["movie_title"] for x in movies.as_numpy_iterator()
-}
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-
-user_id = 5
-retrieved_movie_ids = model.predict(
- {
- "user_id": keras.ops.array([user_id]),
- }
-)
-retrieved_movie_ids = keras.ops.convert_to_numpy(retrieved_movie_ids["predictions"][0])
-retrieved_movies = [movie_id_to_movie_title[x] for x in retrieved_movie_ids]
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 109ms/step
-
-
-For these retrieved movies, we can now get the corresponding ratings.
-
-
-```python
-pred_ratings = model.predict(
- {
- "user_id": keras.ops.array([user_id] * len(retrieved_movie_ids)),
- "movie_id": keras.ops.array(retrieved_movie_ids),
- }
-)["rating"]
-pred_ratings = keras.ops.convert_to_numpy(keras.ops.squeeze(pred_ratings, axis=1))
-
-for movie_id, prediction in zip(retrieved_movie_ids, pred_ratings):
- print(f"{movie_id_to_movie_title[movie_id]}: {5.0 * prediction:,.2f}")
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 273ms/step
-
-
-
-```
-b'Blob, The (1958)': 2.01
-b'Mighty Morphin Power Rangers: The Movie (1995)': 2.03
-b'Flintstones, The (1994)': 2.18
-b'Beverly Hillbillies, The (1993)': 1.89
-b'Lawnmower Man, The (1992)': 2.57
-b'Hot Shots! Part Deux (1993)': 2.28
-b'Street Fighter (1994)': 1.84
-b'Cabin Boy (1994)': 1.94
-b'Little Rascals, The (1994)': 2.12
-b'Jaws 3-D (1983)': 2.27
-
-```
-
\ No newline at end of file
diff --git a/templates/keras_rs/examples/sas_rec.md b/templates/keras_rs/examples/sas_rec.md
deleted file mode 100644
index 54399cd2e5..0000000000
--- a/templates/keras_rs/examples/sas_rec.md
+++ /dev/null
@@ -1,2970 +0,0 @@
-# Sequential retrieval using SASRec
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Recommend movies using a Transformer-based retrieval model (SASRec).
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/sas_rec.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/sas_rec.py)
-
-
-
----
-## Introduction
-
-Sequential recommendation is a popular model that looks at a sequence of items
-that users have interacted with previously and then predicts the next item.
-Here, the order of the items within each sequence matters. Previously, in the
-[Recommending movies: retrieval using a sequential model](/keras_rs/examples/sequential_retrieval/)
-example, we built a GRU-based sequential retrieval model. In this example, we
-will build a popular Transformer decoder-based model named
-[Self-Attentive Sequential Recommendation (SASRec)](https://arxiv.org/abs/1808.09781)
-for the same sequential recommendation task.
-
-Let's begin by importing all the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-
-import keras
-import keras_hub
-import numpy as np
-import pandas as pd
-import tensorflow as tf # Needed only for the dataset
-from keras import ops
-
-import keras_rs
-```
-
-Let's also define all important variables/hyperparameters below.
-
-
-```python
-DATA_DIR = "./raw/data/"
-
-# MovieLens-specific variables
-MOVIELENS_1M_URL = "https://files.grouplens.org/datasets/movielens/ml-1m.zip"
-MOVIELENS_ZIP_HASH = "a6898adb50b9ca05aa231689da44c217cb524e7ebd39d264c56e2832f2c54e20"
-
-RATINGS_FILE_NAME = "ratings.dat"
-MOVIES_FILE_NAME = "movies.dat"
-
-# Data processing args
-MAX_CONTEXT_LENGTH = 200
-MIN_SEQUENCE_LENGTH = 3
-PAD_ITEM_ID = 0
-
-RATINGS_DATA_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
-MOVIES_DATA_COLUMNS = ["MovieID", "Title", "Genres"]
-MIN_RATING = 2
-
-# Training/model args picked from SASRec paper
-BATCH_SIZE = 128
-NUM_EPOCHS = 10
-LEARNING_RATE = 0.001
-
-NUM_LAYERS = 2
-NUM_HEADS = 1
-HIDDEN_DIM = 50
-DROPOUT = 0.2
-```
-
----
-## Dataset
-
-Next, we need to prepare our dataset. Like we did in the
-[sequential retrieval](/keras_rs/examples/sequential_retrieval/)
-example, we are going to use the MovieLens dataset.
-
-The dataset preparation step is fairly involved. The original ratings dataset
-contains `(user, movie ID, rating, timestamp)` tuples (among other columns,
-which are not important for this example). Since we are dealing with sequential
-retrieval, we need to create movie sequences for every user, where the sequences
-are ordered by timestamp.
-
-Let's start by downloading and reading the dataset.
-
-
-```python
-# Download the MovieLens dataset.
-if not os.path.exists(DATA_DIR):
- os.makedirs(DATA_DIR)
-
-path_to_zip = keras.utils.get_file(
- fname="ml-1m.zip",
- origin=MOVIELENS_1M_URL,
- file_hash=MOVIELENS_ZIP_HASH,
- hash_algorithm="sha256",
- extract=True,
- cache_dir=DATA_DIR,
-)
-movielens_extracted_dir = os.path.join(
- os.path.dirname(path_to_zip),
- "ml-1m_extracted",
- "ml-1m",
-)
-
-
-# Read the dataset.
-def read_data(data_directory, min_rating=None):
- """Read movielens ratings.dat and movies.dat file
- into dataframe.
- """
-
- ratings_df = pd.read_csv(
- os.path.join(data_directory, RATINGS_FILE_NAME),
- sep="::",
- names=RATINGS_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- ratings_df["Timestamp"] = ratings_df["Timestamp"].apply(int)
-
- # Remove movies with `rating < min_rating`.
- if min_rating is not None:
- ratings_df = ratings_df[ratings_df["Rating"] >= min_rating]
-
- movies_df = pd.read_csv(
- os.path.join(data_directory, MOVIES_FILE_NAME),
- sep="::",
- names=MOVIES_DATA_COLUMNS,
- encoding="unicode_escape",
- )
- return ratings_df, movies_df
-
-
-ratings_df, movies_df = read_data(
- data_directory=movielens_extracted_dir, min_rating=MIN_RATING
-)
-
-# Need to know #movies so as to define embedding layers.
-movies_count = movies_df["MovieID"].max()
-```
-
-
-```
-Downloading data from https://files.grouplens.org/datasets/movielens/ml-1m.zip
-
-```
-
-```
-:26: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- ratings_df = pd.read_csv(
-
-:38: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
- movies_df = pd.read_csv(
-
-```
-
-Now that we have read the dataset, let's create sequences of movies
-for every user. Here is the function for doing just that.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings_df):
- """Get movieID sequences for every user."""
- sequences = collections.defaultdict(list)
-
- for user_id, movie_id, rating, timestamp in ratings_df.values:
- sequences[user_id].append(
- {
- "movie_id": movie_id,
- "timestamp": timestamp,
- "rating": rating,
- }
- )
-
- # Sort movie sequences by timestamp for every user.
- for user_id, context in sequences.items():
- context.sort(key=lambda x: x["timestamp"])
- sequences[user_id] = context
-
- return sequences
-
-
-sequences = get_movie_sequence_per_user(ratings_df)
-```
-
-So far, we have essentially replicated what we did in the sequential retrieval
-example. We have a sequence of movies for every user.
-
-SASRec is trained contrastively, which means the model learns to distinguish
-between sequences of movies a user has actually interacted with (positive
-examples) and sequences they have not interacted with (negative examples).
-
-The following function, `format_data`, prepares the data in this specific
-format. For each user's movie sequence, it generates a corresponding
-"negative sequence". This negative sequence consists of randomly
-selected movies that the user has *not* interacted with, and has the same
-length as the original sequence.
-
-
-```python
-
-def format_data(sequences):
- examples = {
- "sequence": [],
- "negative_sequence": [],
- }
-
- for user_id in sequences:
- sequence = [int(d["movie_id"]) for d in sequences[user_id]]
-
- # Get negative sequence.
- def random_negative_item_id(low, high, positive_lst):
- sampled = np.random.randint(low=low, high=high)
- while sampled in positive_lst:
- sampled = np.random.randint(low=low, high=high)
- return sampled
-
- negative_sequence = [
- random_negative_item_id(1, movies_count + 1, sequence)
- for _ in range(len(sequence))
- ]
-
- examples["sequence"].append(np.array(sequence))
- examples["negative_sequence"].append(np.array(negative_sequence))
-
- examples["sequence"] = tf.ragged.constant(examples["sequence"])
- examples["negative_sequence"] = tf.ragged.constant(examples["negative_sequence"])
-
- return examples
-
-
-examples = format_data(sequences)
-ds = tf.data.Dataset.from_tensor_slices(examples).batch(BATCH_SIZE)
-```
-
-Now that we have the original movie interaction sequences for each user (from
-`format_data`, stored in `examples["sequence"]`) and their corresponding
-random negative sequences (in `examples["negative_sequence"]`), the next step is
-to prepare this data for input to the model. The primary goals of this
-preprocessing are:
-
-1. Creating Input Features and Target Labels: For sequential
- recommendation, the model learns to predict the next item in a sequence
- given the preceding items. This is achieved by:
- - taking the original `example["sequence"]` and creating the model's
- input features (`item_ids`) from all items *except the last one*
- (`example["sequence"][..., :-1]`);
- - creating the target "positive sequence" (what the model tries to predict
- as the actual next items) by taking the original `example["sequence"]`
- and shifting it, using all items *except the first one*
- (`example["sequence"][..., 1:]`);
-    - shifting `example["negative_sequence"]` (from `format_data`) in the
-      same way to create the target "negative sequence" for the contrastive
-      loss (`example["negative_sequence"][..., 1:]`).
-
-2. Handling Variable Length Sequences: Neural networks typically require
- fixed-size inputs. Therefore, both the input feature sequences and the
- target sequences are padded (with a special `PAD_ITEM_ID`) or truncated
- to a predefined `MAX_CONTEXT_LENGTH`. A `padding_mask` is also generated
- from the input features to ensure the model ignores these padded tokens
-    during attention calculations, i.e., these tokens will be masked.
-
-3. Differentiating Training and Validation/Testing:
- - During training:
- - Input features (`item_ids`) and context for negative sequences
- are prepared as described above (all but the last item of the
- original sequences).
- - Target positive and negative sequences are the shifted versions of
- the original sequences.
- - `sample_weight` is created based on the input features to ensure
- that loss is calculated only on actual items, not on padding tokens
- in the targets.
- - During validation/testing:
- - Input features are prepared similarly.
- - The model's performance is typically evaluated on its ability to
- predict the actual last item of the original sequence. Thus,
- `sample_weight` is configured to focus the loss calculation
- only on this final prediction in the target sequences.
-
-Note: the SASRec paper does the same thing we've done above, except that it
-uses `item_ids[:-2]` for the validation set and `item_ids[:-1]` for the test
-set. We skip that here for brevity.
-
-
-```python
-
-def _preprocess(example, train=False):
- sequence = example["sequence"]
- negative_sequence = example["negative_sequence"]
-
- if train:
- sequence = example["sequence"][..., :-1]
- negative_sequence = example["negative_sequence"][..., :-1]
-
- batch_size = tf.shape(sequence)[0]
-
- if not train:
- # Loss computed only on last token.
- sample_weight = tf.zeros_like(sequence, dtype="float32")[..., :-1]
- sample_weight = tf.concat(
- [sample_weight, tf.ones((batch_size, 1), dtype="float32")], axis=1
- )
-
- # Truncate/pad sequence. +1 to account for truncation later.
- sequence = sequence.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=PAD_ITEM_ID
- )
- negative_sequence = negative_sequence.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=PAD_ITEM_ID
- )
- if train:
- sample_weight = tf.cast(sequence != PAD_ITEM_ID, dtype="float32")
- else:
- sample_weight = sample_weight.to_tensor(
- shape=[batch_size, MAX_CONTEXT_LENGTH + 1], default_value=0
- )
-
- example = (
- {
- # last token does not have a next token
- "item_ids": sequence[..., :-1],
- # padding mask for controlling attention mask
- "padding_mask": (sequence != PAD_ITEM_ID)[..., :-1],
- },
- {
- "positive_sequence": sequence[
- ..., 1:
- ], # 0th token's label will be 1st token, and so on
- "negative_sequence": negative_sequence[..., 1:],
- },
- sample_weight[..., 1:], # loss will not be computed on pad tokens
- )
- return example
-
-
-def preprocess_train(examples):
- return _preprocess(examples, train=True)
-
-
-def preprocess_val(examples):
- return _preprocess(examples, train=False)
-
-
-train_ds = ds.map(preprocess_train)
-val_ds = ds.map(preprocess_val)
-```
-
-Let's look at one batch from each dataset.
-
-
-```python
-for batch in train_ds.take(1):
- print(batch)
-
-for batch in val_ds.take(1):
- print(batch)
-
-```
-
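-
----
-## Building and training the model
-
-SASRec stacks causal Transformer decoder blocks on top of item embeddings and
-trains them with the contrastive objective described above. The following is a
-minimal sketch of such a model, wired to the hyperparameters defined earlier.
-The layer choices (`keras_hub.layers.PositionEmbedding`,
-`keras_hub.layers.TransformerDecoder`, `keras_rs.layers.BruteForceRetrieval`)
-and the optimizer settings are illustrative assumptions, not the verbatim
-original implementation.
-
-
-```python
-
-# NOTE: illustrative sketch; layer choices and training settings are
-# assumptions, not the exact original implementation.
-class SasRec(keras.Model):
-    """Minimal SASRec-style sequential retrieval model."""
-
-    def __init__(
-        self,
-        vocabulary_size,
-        num_layers=NUM_LAYERS,
-        num_heads=NUM_HEADS,
-        hidden_dim=HIDDEN_DIM,
-        dropout=DROPOUT,
-        max_sequence_length=MAX_CONTEXT_LENGTH,
-        **kwargs,
-    ):
-        super().__init__(**kwargs)
-        # Item embedding table, also used to score candidates at inference.
-        self.item_embedding = keras.layers.Embedding(vocabulary_size, hidden_dim)
-        self.position_embedding = keras_hub.layers.PositionEmbedding(
-            sequence_length=max_sequence_length
-        )
-        self.dropout_layer = keras.layers.Dropout(dropout)
-        # Without cross-attention inputs, `TransformerDecoder` applies a
-        # causal mask: each position attends only to earlier items.
-        self.transformer_layers = [
-            keras_hub.layers.TransformerDecoder(
-                intermediate_dim=hidden_dim, num_heads=num_heads, dropout=dropout
-            )
-            for _ in range(num_layers)
-        ]
-        self.retrieval = keras_rs.layers.BruteForceRetrieval(
-            k=10, return_scores=False
-        )
-        self.loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
-
-    def call(self, inputs, training=False):
-        item_ids, padding_mask = inputs["item_ids"], inputs["padding_mask"]
-        x = self.item_embedding(item_ids)
-        x = x + self.position_embedding(x)
-        x = self.dropout_layer(x, training=training)
-        for transformer_layer in self.transformer_layers:
-            x = transformer_layer(x, decoder_padding_mask=padding_mask)
-        result = {"item_sequence_embedding": x}
-        if not training:
-            # Retrieve top movies from the representation of the last
-            # non-padded position, scoring against the item embedding table.
-            last_index = ops.maximum(
-                ops.sum(ops.cast(padding_mask, "int32"), axis=1) - 1, 0
-            )
-            last_embedding = ops.take_along_axis(
-                x, last_index[:, None, None], axis=1
-            )[:, 0, :]
-            self.retrieval.candidate_embeddings = self.item_embedding.embeddings
-            result["predictions"] = self.retrieval(last_embedding)
-        return result
-
-    def compute_loss(self, x, y, y_pred, sample_weight, training=True):
-        sequence_embedding = y_pred["item_sequence_embedding"]
-        # Dot products between each position's output and the embeddings of
-        # the positive (actual next) and negative (random) items.
-        positive_logits = ops.sum(
-            self.item_embedding(y["positive_sequence"]) * sequence_embedding,
-            axis=-1,
-        )
-        negative_logits = ops.sum(
-            self.item_embedding(y["negative_sequence"]) * sequence_embedding,
-            axis=-1,
-        )
-        logits = ops.concatenate([positive_logits, negative_logits], axis=1)
-        labels = ops.concatenate(
-            [ops.ones_like(positive_logits), ops.zeros_like(negative_logits)],
-            axis=1,
-        )
-        weights = ops.concatenate([sample_weight, sample_weight], axis=1)
-        return self.loss_fn(labels, logits, weights)
-
-
-model = SasRec(vocabulary_size=movies_count + 1)
-model.compile(optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE))
-model.fit(train_ds, validation_data=val_ds, epochs=NUM_EPOCHS)
-```
-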
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by id. Now is the time to create a mapping
-keyed by movie IDs to be able to surface the titles.
-
-
-```python
-movie_id_to_movie_title = dict(zip(movies_df["MovieID"], movies_df["Title"]))
-movie_id_to_movie_title[0] = "" # Because id 0 is not in the dataset.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable; one way to do so is
-sketched at the end of this section.
-
-
-```python
-for ele in val_ds.unbatch().take(1):
- test_sample = ele[0]
- test_sample["item_ids"] = tf.expand_dims(test_sample["item_ids"], axis=0)
- test_sample["padding_mask"] = tf.expand_dims(test_sample["padding_mask"], axis=0)
-
-movie_sequence = np.array(test_sample["item_ids"])[0]
-for movie_id in movie_sequence:
- if movie_id == 0:
- continue
- print(movie_id_to_movie_title[movie_id], end="; ")
-print()
-
-predictions = model.predict(test_sample)["predictions"]
-predictions = keras.ops.convert_to_numpy(predictions)
-
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
-```
-Girl, Interrupted (1999); Back to the Future (1985); Titanic (1997); Cinderella (1950); Meet Joe Black (1998); Last Days of Disco, The (1998); Erin Brockovich (2000); Christmas Story, A (1983); To Kill a Mockingbird (1962); One Flew Over the Cuckoo's Nest (1975); Wallace & Gromit: The Best of Aardman Animation (1996); Star Wars: Episode IV - A New Hope (1977); Wizard of Oz, The (1939); Fargo (1996); Run Lola Run (Lola rennt) (1998); Rain Man (1988); Saving Private Ryan (1998); Awakenings (1990); Gigi (1958); Sound of Music, The (1965); Driving Miss Daisy (1989); Bambi (1942); Apollo 13 (1995); Mary Poppins (1964); E.T. the Extra-Terrestrial (1982); My Fair Lady (1964); Ben-Hur (1959); Big (1988); Sixth Sense, The (1999); Dead Poets Society (1989); James and the Giant Peach (1996); Ferris Bueller's Day Off (1986); Secret Garden, The (1993); Toy Story 2 (1999); Airplane! (1980); Pleasantville (1998); Dumbo (1941); Princess Bride, The (1987); Snow White and the Seven Dwarfs (1937); Miracle on 34th Street (1947); Ponette (1996); Schindler's List (1993); Beauty and the Beast (1991); Tarzan (1999); Close Shave, A (1995); Aladdin (1992); Toy Story (1995); Bug's Life, A (1998); Antz (1998); Hunchback of Notre Dame, The (1996); Hercules (1997); Mulan (1998); Pocahontas (1995);
-
-```
-
-
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 790ms/step
-
-
-
-```
-Groundhog Day (1993)
-Aladdin (1992)
-Toy Story (1995)
-Forrest Gump (1994)
-Bug's Life, A (1998)
-Lion King, The (1994)
-Shakespeare in Love (1998)
-American Beauty (1999)
-Sixth Sense, The (1999)
-Ghostbusters (1984)
-
-```
-
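-As noted above, here is one minimal, hypothetical post-processing step (not
-part of the original example) that drops already-watched movies from the
-retrieved list, reusing the variables defined above:
-
-
-```python
-# Hypothetical post-processing: drop movies already in the user's history.
-watched = set(int(i) for i in movie_sequence if i != 0)
-filtered_predictions = [
-    int(movie_id) for movie_id in predictions[0] if int(movie_id) not in watched
-]
-
-for movie_id in filtered_predictions:
-    print(movie_id_to_movie_title[movie_id])
-```
-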
-And that's all!
diff --git a/templates/keras_rs/examples/scann.md b/templates/keras_rs/examples/scann.md
deleted file mode 100644
index 17c5c70bb3..0000000000
--- a/templates/keras_rs/examples/scann.md
+++ /dev/null
@@ -1,2159 +0,0 @@
-# Faster retrieval with Scalable Nearest Neighbors (ScaNN)
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Using ScaNN for faster retrieval.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/scann.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/scann.py)
-
-
-
----
-## Introduction
-
-Retrieval models are designed to quickly identify a small set of highly relevant
-candidates from vast pools of data, often comprising millions or even hundreds
-of millions of items. To effectively respond to the user's context and behavior
-in real time, these models must perform this task in just milliseconds.
-
-Approximate nearest neighbor (ANN) search is the key technology that enables
-this level of efficiency. In this tutorial, we'll demonstrate how to leverage
-ScaNN, a cutting-edge nearest neighbor retrieval library, to effortlessly
-scale retrieval for millions of items.
-
-[ScaNN](https://research.google/blog/announcing-scann-efficient-vector-similarity-search/),
-developed by Google Research, is a high-performance library designed for
-dense vector similarity search at scale. It efficiently indexes a database of
-candidate embeddings, enabling rapid search during inference. By leveraging
-advanced vector compression techniques and finely tuned algorithms, ScaNN
-strikes an optimal balance between speed and accuracy. As a result, it can
-significantly outperform brute-force search methods, delivering fast retrieval
-with minimal loss in accuracy.
-
-We will start with the same code as the
-[basic retrieval example](/keras_rs/examples/basic_retrieval/).
-Data processing, model building, and training remain exactly the same. Feel free
-to skip this part if you have gone over the basic retrieval example before.
-
-Note: ScaNN does not have its own layer in KerasRS because the ScaNN library
-is TensorFlow-only. In this example, we use the ScaNN library directly and
-demonstrate its usage with KerasRS.
-
----
-## Imports
-
-Let's install the `scann` library and import all necessary packages. We will
-also set the backend to JAX.
-
-
-```python
-# ruff: noqa: E402
-```
-
-
-```python
-!pip install -q scann
-```
-
-
-```
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 16.4 MB/s eta 0:00:00
-```
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import time
-import uuid
-
-import keras
-import tensorflow as tf # Needed for the dataset
-import tensorflow_datasets as tfds
-from scann import scann_ops
-
-import keras_rs
-```
-
----
-## Preparing the dataset
-
-
-```python
-# Ratings data with user and movie data.
-ratings = tfds.load("movielens/100k-ratings", split="train")
-# Features of all the available movies.
-movies = tfds.load("movielens/100k-movies", split="train")
-
-# Get user and movie counts so that we can define embedding layers for both.
-users_count = (
- ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
- .reduce(tf.constant(0, tf.int32), tf.maximum)
- .numpy()
-)
-
-movies_count = movies.cardinality().numpy()
-
-
-# Preprocess the dataset, by selecting only the relevant columns.
-def preprocess_rating(x):
- return (
- # Input is the user IDs
- tf.strings.to_number(x["user_id"], out_type=tf.int32),
- # Labels are movie IDs + ratings between 0 and 1.
- {
- "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
- "rating": (x["user_rating"] - 1.0) / 4.0,
- },
- )
-
-
-shuffled_ratings = ratings.map(preprocess_rating).shuffle(
- 100_000, seed=42, reshuffle_each_iteration=False
-)
-# Train-test split.
-train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
-test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
-```
-
----
-## Implementing the Model
-
-
-```python
-
-class RetrievalModel(keras.Model):
- def __init__(
- self,
- num_users,
- num_candidates,
- embedding_dimension=32,
- **kwargs,
- ):
- super().__init__(**kwargs)
- # Our query tower, simply an embedding table.
- self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
- # Our candidate tower, simply an embedding table.
- self.candidate_embedding = keras.layers.Embedding(
- num_candidates, embedding_dimension
- )
-
- self.loss_fn = keras.losses.MeanSquaredError()
-
- def build(self, input_shape):
- self.user_embedding.build(input_shape)
- self.candidate_embedding.build(input_shape)
-
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- user_embeddings = self.user_embedding(inputs)
- result = {
- "user_embeddings": user_embeddings,
- }
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id, rating = y["movie_id"], y["rating"]
- user_embeddings = y_pred["user_embeddings"]
- candidate_embeddings = self.candidate_embedding(candidate_id)
-
- labels = keras.ops.expand_dims(rating, -1)
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.sum(
- keras.ops.multiply(user_embeddings, candidate_embeddings),
- axis=1,
- keepdims=True,
- )
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
----
-## Training the model
-
-
-```python
-model = RetrievalModel(users_count + 1000, movies_count + 1000)
-model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1))
-
-history = model.fit(
- train_ratings, validation_data=test_ratings, validation_freq=5, epochs=50
-)
-```
-
-
- 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.4679 - val_loss: 0.4753
-
-
----
-## Making predictions
-
-Before we try out ScANN, let's go with the brute force method, i.e., for a given
-user, scores are computed for all movies, sorted and then the top-k
-movies are picked. This is, of course, not very scalable when we have a huge
-number of movies.
-
-
-```python
-candidate_embeddings = keras.ops.array(model.candidate_embedding.embeddings.numpy())
-# Artificially duplicate candidate embeddings to simulate a large number of
-# movies.
-candidate_embeddings = keras.ops.concatenate(
- [candidate_embeddings]
- + [
- candidate_embeddings
- * keras.random.uniform(keras.ops.shape(candidate_embeddings))
- for _ in range(100)
- ],
- axis=0,
-)
-
-user_embedding = model.user_embedding(keras.ops.array([10, 5, 42, 345]))
-
-# Define the brute force retrieval layer.
-brute_force_layer = keras_rs.layers.BruteForceRetrieval(
- candidate_embeddings=candidate_embeddings,
- k=10,
- return_scores=False,
-)
-```
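-
-Under the hood, brute-force retrieval is just a dot product against every
-candidate followed by a top-k. As a minimal sketch (with hypothetical small
-tensors, not the variables above), the layer computes something like:
-
-```python
-# Hypothetical embeddings, just to illustrate the shapes involved.
-queries = keras.random.normal((4, 32))  # 4 user embeddings.
-candidates = keras.random.normal((1000, 32))  # 1000 movie embeddings.
-
-# Affinity of every query with every candidate: shape (4, 1000).
-scores = keras.ops.matmul(queries, keras.ops.transpose(candidates))
-
-# Indices of the 10 highest-scoring candidates per query: shape (4, 10).
-_, top_indices = keras.ops.top_k(scores, k=10)
-```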
-
-Now, let's do a forward pass on the layer. In previous tutorials, we attached
-this layer to the model class and then called `.predict()`, which is faster
-because the code is compiled with XLA. Since we cannot do the same for ScaNN,
-we do a plain forward pass here, without compilation, to ensure a fair
-comparison.
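-
-For reference, the pattern from the previous tutorials looks roughly like the
-hypothetical sketch below (`RetrievalModelWithSearch` is illustrative only and
-is not defined elsewhere in this example):
-
-```python
-
-class RetrievalModelWithSearch(keras.Model):
-    """Sketch: retrieval attached to the model, so `predict()` compiles it."""
-
-    def __init__(self, num_users, candidate_embeddings, embedding_dimension=32):
-        super().__init__()
-        self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
-        self.retrieval = keras_rs.layers.BruteForceRetrieval(
-            candidate_embeddings=candidate_embeddings, k=10, return_scores=False
-        )
-
-    def call(self, user_ids):
-        # `predict()` runs this entire function as compiled code.
-        return self.retrieval(self.user_embedding(user_ids))
-
-```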
-
-
-```python
-t0 = time.time()
-pred_movie_ids = brute_force_layer(user_embedding)
-print("Time taken by brute force layer (sec):", time.time() - t0)
-```
-
-
-```
-Time taken by brute force layer (sec): 0.22817683219909668
-
-```
-
-Now, let's retrieve movies using ScaNN. We will use the ScaNN library from
-Google Research to build the searcher and then call it. To fully understand
-all the arguments, please refer to the
-[ScaNN README file](https://github.com/google-research/google-research/tree/master/scann#readme).
-
-
-```python
-
-def build_scann(
- candidates,
- k=10,
- distance_measure="dot_product",
- dimensions_per_block=2,
- num_reordering_candidates=500,
- num_leaves=100,
- num_leaves_to_search=30,
- training_iterations=12,
-):
- builder = scann_ops.builder(
- db=candidates,
- num_neighbors=k,
- distance_measure=distance_measure,
- )
-
- builder = builder.tree(
- num_leaves=num_leaves,
- num_leaves_to_search=num_leaves_to_search,
- training_iterations=training_iterations,
- )
- builder = builder.score_ah(dimensions_per_block=dimensions_per_block)
-
- if num_reordering_candidates is not None:
- builder = builder.reorder(num_reordering_candidates)
-
- # Set a unique name to prevent unintentional sharing between
- # ScaNN instances.
- searcher = builder.build(shared_name=str(uuid.uuid4()))
- return searcher
-
-
-def run_scann(searcher):
- pred_movie_ids = searcher.search_batched_parallel(
- user_embedding,
- final_num_neighbors=10,
- ).indices
- return pred_movie_ids
-
-
-searcher = build_scann(candidates=candidate_embeddings)
-
-t0 = time.time()
-pred_movie_ids = run_scann(searcher)
-print("Time taken by ScANN (sec):", time.time() - t0)
-```
-
-
-```
-Time taken by ScaNN (sec): 0.0032587051391601562
-
-```
-
-You can clearly see the performance improvement in terms of latency: ScaNN
-(about 0.003 seconds) runs roughly 70 times faster than the brute-force layer
-(about 0.23 seconds)!
diff --git a/templates/keras_rs/examples/sequential_retrieval.md b/templates/keras_rs/examples/sequential_retrieval.md
deleted file mode 100644
index 54fd7fbe3a..0000000000
--- a/templates/keras_rs/examples/sequential_retrieval.md
+++ /dev/null
@@ -1,2333 +0,0 @@
-# Sequential retrieval [GRU4Rec]
-
-**Author:** [Abheesht Sharma](https://github.com/abheesht17/), [Fabien Hertschuh](https://github.com/hertschuh/)
-**Date created:** 2025/04/28
-**Last modified:** 2025/04/28
-**Description:** Recommend movies using a GRU-based sequential retrieval model.
-
-
- [**View in Colab**](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_rs/ipynb/sequential_retrieval.ipynb) • [**GitHub source**](https://github.com/keras-team/keras-io/blob/master/examples/keras_rs/sequential_retrieval.py)
-
-
-
----
-## Introduction
-
-In this example, we are going to build a sequential retrieval model. Sequential
-recommendation is a popular model that looks at a sequence of items that users
-have interacted with previously and then predicts the next item. Here, the order
-of the items within each sequence matters. So, we are going to use a recurrent
-neural network to model the sequential relationship. For more details,
-please refer to the [GRU4Rec](https://arxiv.org/abs/1511.06939) paper.
-
-Let's begin by choosing JAX as the backend we want to run on, and importing
-all the necessary libraries.
-
-
-```python
-import os
-
-os.environ["KERAS_BACKEND"] = "jax" # `"tensorflow"`/`"torch"`
-
-import collections
-import random
-
-import keras
-import pandas as pd
-import tensorflow as tf # Needed only for the dataset
-
-import keras_rs
-```
-
-Let's also define all important variables/hyperparameters below.
-
-
-```python
-DATA_DIR = "./raw/data/"
-
-# MovieLens-specific variables
-MOVIELENS_1M_URL = "https://files.grouplens.org/datasets/movielens/ml-1m.zip"
-MOVIELENS_ZIP_HASH = "a6898adb50b9ca05aa231689da44c217cb524e7ebd39d264c56e2832f2c54e20"
-
-RATINGS_FILE_NAME = "ratings.dat"
-MOVIES_FILE_NAME = "movies.dat"
-
-# Data processing args
-MAX_CONTEXT_LENGTH = 10
-MIN_SEQUENCE_LENGTH = 3
-TRAIN_DATA_FRACTION = 0.9
-
-RATINGS_DATA_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
-MOVIES_DATA_COLUMNS = ["MovieID", "Title", "Genres"]
-MIN_RATING = 2
-
-# Training/model args
-BATCH_SIZE = 4096
-TEST_BATCH_SIZE = 2048
-EMBEDDING_DIM = 32
-NUM_EPOCHS = 5
-LEARNING_RATE = 0.05
-```
-
----
-## Dataset
-
-Next, we need to prepare our dataset. As in the
-[basic retrieval](/keras_rs/examples/basic_retrieval/)
-example, we are going to use the MovieLens dataset.
-
-The dataset preparation step is fairly involved. The original ratings dataset
-contains `(user, movie ID, rating, timestamp)` tuples (among other columns,
-which are not important for this example). Since we are dealing with sequential
-retrieval, we need to create movie sequences for every user, where the sequences
-are ordered by timestamp.
-
-Let's start by downloading and reading the dataset.
-
-
-```python
-# Download the MovieLens dataset.
-if not os.path.exists(DATA_DIR):
- os.makedirs(DATA_DIR)
-
-path_to_zip = keras.utils.get_file(
- fname="ml-1m.zip",
- origin=MOVIELENS_1M_URL,
- file_hash=MOVIELENS_ZIP_HASH,
- hash_algorithm="sha256",
- extract=True,
- cache_dir=DATA_DIR,
-)
-movielens_extracted_dir = os.path.join(
- os.path.dirname(path_to_zip),
- "ml-1m_extracted",
- "ml-1m",
-)
-
-
-# Read the dataset.
-def read_data(data_directory, min_rating=None):
- """Read movielens ratings.dat and movies.dat file
- into dataframe.
- """
-
-    ratings_df = pd.read_csv(
-        os.path.join(data_directory, RATINGS_FILE_NAME),
-        sep="::",
-        names=RATINGS_DATA_COLUMNS,
-        encoding="unicode_escape",
-        # The "::" separator is treated as a regex, which requires the
-        # Python engine; this also avoids a ParserWarning.
-        engine="python",
-    )
- ratings_df["Timestamp"] = ratings_df["Timestamp"].apply(int)
-
- # Remove movies with `rating < min_rating`.
- if min_rating is not None:
- ratings_df = ratings_df[ratings_df["Rating"] >= min_rating]
-
-    movies_df = pd.read_csv(
-        os.path.join(data_directory, MOVIES_FILE_NAME),
-        sep="::",
-        names=MOVIES_DATA_COLUMNS,
-        encoding="unicode_escape",
-        engine="python",  # Same regex separator as above.
-    )
- return ratings_df, movies_df
-
-
-ratings_df, movies_df = read_data(
- data_directory=movielens_extracted_dir, min_rating=MIN_RATING
-)
-
-# We need the number of movies to size the embedding layers.
-movies_count = movies_df["MovieID"].max()
-```
-
-
-```
-Downloading data from https://files.grouplens.org/datasets/movielens/ml-1m.zip
-
-```
-
-
-Now that we have read the dataset, let's create sequences of movies
-for every user. Here is the function for doing just that.
-
-
-```python
-
-def get_movie_sequence_per_user(ratings_df):
- """Get movieID sequences for every user."""
- sequences = collections.defaultdict(list)
-
- for user_id, movie_id, rating, timestamp in ratings_df.values:
- sequences[user_id].append(
- {
- "movie_id": movie_id,
- "timestamp": timestamp,
- "rating": rating,
- }
- )
-
- # Sort movie sequences by timestamp for every user.
- for user_id, context in sequences.items():
- context.sort(key=lambda x: x["timestamp"])
- sequences[user_id] = context
-
- return sequences
-
-```
-
-We need to do some filtering and processing before we proceed
-with training the model:
-
-1. Form sequences of all lengths up to
- `min(user_sequence_length, MAX_CONTEXT_LENGTH)`. So, every user
- will have multiple sequences corresponding to it.
-2. Get labels, i.e., given a sequence of length `n`, the first
-   `n-1` tokens are fed to the model as input, and the label
-   is the last token.
-3. Remove all user sequences with less than `MIN_SEQUENCE_LENGTH`
- movies.
-4. Pad all sequences to `MAX_CONTEXT_LENGTH`.
-
-
-```python
-
-def generate_examples_from_user_sequences(sequences):
- """Generates sequences for all users, with padding, truncation, etc."""
-
- def generate_examples_from_user_sequence(sequence):
- """Generates examples for a single user sequence."""
-
- examples = []
- for label_idx in range(1, len(sequence)):
- start_idx = max(0, label_idx - MAX_CONTEXT_LENGTH)
- context = sequence[start_idx:label_idx]
-
- # Padding
- while len(context) < MAX_CONTEXT_LENGTH:
- context.append(
- {
- "movie_id": 0,
- "timestamp": 0,
- "rating": 0.0,
- }
- )
-
- label_movie_id = int(sequence[label_idx]["movie_id"])
- context_movie_id = [int(movie["movie_id"]) for movie in context]
-
- examples.append(
- {
- "context_movie_id": context_movie_id,
- "label_movie_id": label_movie_id,
- },
- )
- return examples
-
- all_examples = []
- for sequence in sequences.values():
- if len(sequence) < MIN_SEQUENCE_LENGTH:
- continue
-
- user_examples = generate_examples_from_user_sequence(sequence)
-
- all_examples.extend(user_examples)
-
- return all_examples
-
-```
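-
-To see what this produces, here is a quick sanity check on a single
-hypothetical three-movie history. With `MAX_CONTEXT_LENGTH = 10`, contexts are
-padded to length 10 with the ID 0:
-
-```python
-toy_sequences = {
-    1: [
-        {"movie_id": 11, "timestamp": 1, "rating": 4.0},
-        {"movie_id": 22, "timestamp": 2, "rating": 5.0},
-        {"movie_id": 33, "timestamp": 3, "rating": 3.0},
-    ]
-}
-for example in generate_examples_from_user_sequences(toy_sequences):
-    print(example)
-# {'context_movie_id': [11, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'label_movie_id': 22}
-# {'context_movie_id': [11, 22, 0, 0, 0, 0, 0, 0, 0, 0], 'label_movie_id': 33}
-```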
-
-Let's split the dataset into train and test sets. Also, we need to convert
-our list of example dictionaries into a dictionary of lists, so that it can
-be turned into a `tf.data.Dataset` object.
-
-
-```python
-sequences = get_movie_sequence_per_user(ratings_df)
-examples = generate_examples_from_user_sequences(sequences)
-
-# Train-test split.
-random.shuffle(examples)
-split_index = int(TRAIN_DATA_FRACTION * len(examples))
-train_examples = examples[:split_index]
-test_examples = examples[split_index:]
-
-
-def list_of_dicts_to_dict_of_lists(list_of_dicts):
- """Convert list of dictionaries to dictionary of lists for
- `tf.data` conversion.
- """
- dict_of_lists = collections.defaultdict(list)
- for dictionary in list_of_dicts:
- for key, value in dictionary.items():
- dict_of_lists[key].append(value)
- return dict_of_lists
-
-
-train_examples = list_of_dicts_to_dict_of_lists(train_examples)
-test_examples = list_of_dicts_to_dict_of_lists(test_examples)
-
-train_ds = tf.data.Dataset.from_tensor_slices(train_examples).map(
- lambda x: (x["context_movie_id"], x["label_movie_id"])
-)
-test_ds = tf.data.Dataset.from_tensor_slices(test_examples).map(
- lambda x: (x["context_movie_id"], x["label_movie_id"])
-)
-```
-
-We need to batch our datasets. We also use `cache()` and `prefetch()`
-for better performance.
-
-
-```python
-train_ds = train_ds.batch(BATCH_SIZE).cache().prefetch(tf.data.AUTOTUNE)
-test_ds = test_ds.batch(TEST_BATCH_SIZE).cache().prefetch(tf.data.AUTOTUNE)
-```
-
-Let's print out one batch.
-
-
-```python
-for sample in train_ds.take(1):
- print(sample)
-```
-
-
-```
-(<tf.Tensor: shape=(4096, 10), dtype=int32, numpy=...>, <tf.Tensor: shape=(4096,), dtype=int32, numpy=...>)
-
-```
-
----
-## Model and Training
-
-In the basic retrieval example, we used a query tower for the
-user, and a candidate tower for the candidate movie. We are
-going to use the same two-tower architecture here. However, the
-query tower now consists of a Gated Recurrent Unit (GRU) layer
-that encodes the sequence of historical movies, while the
-candidate tower remains the same.
-
-Note: Take a look at how the labels are defined. The label tensor
-(of shape `(batch_size, batch_size)`) contains one-hot vectors. The idea
-is: for every sample, consider movie IDs corresponding to other samples in
-the batch as negatives.
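-
-To make this concrete: for a hypothetical batch of 3 samples, the label matrix
-is the 3x3 identity, so each sample's own movie is its positive and the other
-two movies in the batch act as its negatives.
-
-```python
-labels = keras.ops.eye(3, 3)
-# [[1. 0. 0.]
-#  [0. 1. 0.]
-#  [0. 0. 1.]]
-```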
-
-
-```python
-
-class SequentialRetrievalModel(keras.Model):
- """Create the sequential retrieval model.
-
- Args:
- movies_count: Total number of unique movies in the dataset.
- embedding_dimension: Output dimension for movie embedding tables.
- """
-
- def __init__(
- self,
- movies_count,
- embedding_dimension=128,
- **kwargs,
- ):
- super().__init__(**kwargs)
-        # Our query tower: an embedding table followed by a GRU unit.
-        # This encodes the sequence of historical movies.
- self.query_model = keras.Sequential(
- [
- keras.layers.Embedding(movies_count + 1, embedding_dimension),
- keras.layers.GRU(embedding_dimension),
- ]
- )
-
- # Our candidate tower, simply an embedding table.
- self.candidate_model = keras.layers.Embedding(
- movies_count + 1, embedding_dimension
- )
-
- # The layer that performs the retrieval.
- self.retrieval = keras_rs.layers.BruteForceRetrieval(k=10, return_scores=False)
- self.loss_fn = keras.losses.CategoricalCrossentropy(
- from_logits=True,
- )
-
- def build(self, input_shape):
- self.query_model.build(input_shape)
- self.candidate_model.build(input_shape)
-
- # In this case, the candidates are directly the movie embeddings.
- # We take a shortcut and directly reuse the variable.
- self.retrieval.candidate_embeddings = self.candidate_model.embeddings
- self.retrieval.build(input_shape)
- super().build(input_shape)
-
- def call(self, inputs, training=False):
- query_embeddings = self.query_model(inputs)
- result = {
- "query_embeddings": query_embeddings,
- }
-
- if not training:
- # Skip the retrieval of top movies during training as the
- # predictions are not used.
- result["predictions"] = self.retrieval(query_embeddings)
- return result
-
- def compute_loss(self, x, y, y_pred, sample_weight, training=True):
- candidate_id = y
- query_embeddings = y_pred["query_embeddings"]
- candidate_embeddings = self.candidate_model(candidate_id)
-
- num_queries = keras.ops.shape(query_embeddings)[0]
- num_candidates = keras.ops.shape(candidate_embeddings)[0]
-
- # One-hot vectors for labels.
- labels = keras.ops.eye(num_queries, num_candidates)
-
- # Compute the affinity score by multiplying the two embeddings.
- scores = keras.ops.matmul(
- query_embeddings, keras.ops.transpose(candidate_embeddings)
- )
-
- return self.loss_fn(labels, scores, sample_weight)
-
-```
-
-Let's instantiate, compile and train our model.
-
-
-```python
-model = SequentialRetrievalModel(
- movies_count=movies_count + 1, embedding_dimension=EMBEDDING_DIM
-)
-
-# Compile.
-model.compile(optimizer=keras.optimizers.AdamW(learning_rate=LEARNING_RATE))
-
-# Train.
-model.fit(
- train_ds,
- validation_data=test_ds,
- epochs=NUM_EPOCHS,
-)
-```
-
-
----
-## Making predictions
-
-Now that we have a model, we would like to be able to make predictions.
-
-So far, we have only handled movies by ID. Now is the time to create a mapping
-from movie IDs to titles so that we can surface them.
-
-
-```python
-movie_id_to_movie_title = dict(zip(movies_df["MovieID"], movies_df["Title"]))
-movie_id_to_movie_title[0] = ""  # ID 0 is the padding ID, not a real movie.
-```
-
-We then simply use the Keras `model.predict()` method. Under the hood, it calls
-the `BruteForceRetrieval` layer to perform the actual retrieval.
-
-Note that this model can retrieve movies already watched by the user. We could
-easily add logic to remove them if that is desirable.
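-
-For instance, a simple post-processing step (the `filter_watched` helper below
-is hypothetical, not part of the library) could drop already-watched IDs from
-the retrieved list:
-
-```python
-
-def filter_watched(predicted_ids, watched_ids, k=10):
-    """Keep the top-k predicted movies the user has not watched yet."""
-    watched = {int(movie_id) for movie_id in watched_ids}
-    return [
-        int(movie_id) for movie_id in predicted_ids if int(movie_id) not in watched
-    ][:k]
-
-```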
-
-
-```python
-print("\n==> Movies the user has watched:")
-movie_sequence = test_ds.unbatch().take(1)
-for element in movie_sequence:
- for movie_id in element[0][:-1]:
- print(movie_id_to_movie_title[movie_id.numpy()], end=", ")
- print(movie_id_to_movie_title[element[0][-1].numpy()])
-
-predictions = model.predict(movie_sequence.batch(1))
-predictions = keras.ops.convert_to_numpy(predictions["predictions"])
-
-print("\n==> Recommended movies for the above sequence:")
-for movie_id in predictions[0]:
- print(movie_id_to_movie_title[movie_id])
-```
-
-
-
-```
-==> Movies the user has watched:
-10 Things I Hate About You (1999), American Beauty (1999), Bachelor, The (1999), Austin Powers: The Spy Who Shagged Me (1999), Arachnophobia (1990), Big Daddy (1999), Bone Collector, The (1999), Bug's Life, A (1998), Bowfinger (1999), Dead Calm (1989)
-
-```
-
-
-
-```
- 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 302ms/step
-
-```
-
-```
-==> Recommended movies for the above sequence:
-Creepshow (1982)
-Bringing Out the Dead (1999)
-Civil Action, A (1998)
-Doors, The (1991)
-Cruel Intentions (1999)
-Brokedown Palace (1999)
-Dead Calm (1989)
-Condorman (1981)
-Clan of the Cave Bear, The (1986)
-Clerks (1994)
-
-/usr/local/lib/python3.11/dist-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
- self._interrupted_warning()
-
-```
-