diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..917c1db
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+**/.venv/
+logs/
+mnist.npz
+*.zip
diff --git a/.venv/pyvenv.cfg b/.venv/pyvenv.cfg
new file mode 100644
index 0000000..e129fd0
--- /dev/null
+++ b/.venv/pyvenv.cfg
@@ -0,0 +1,3 @@
+home = C:\Python310
+include-system-site-packages = false
+version = 3.10.7
diff --git a/.vscode/settings.json b/.vscode/settings.json
new file mode 100644
index 0000000..dc3f727
--- /dev/null
+++ b/.vscode/settings.json
@@ -0,0 +1,3 @@
+{
+  "python.analysis.typeCheckingMode": "basic"
+}
diff --git a/exam/questions.md b/exam/questions.md
index 06452e5..2fcf384 100644
--- a/exam/questions.md
+++ b/exam/questions.md
@@ -108,6 +108,8 @@
 
 - Compare Cutout and DropBlock. [5]
 
+- Describe in detail how CutMix is performed. [5]
+
 - Describe Squeeze and Excitation applied to a ResNet block. [5]
 
 - Draw the Mobile inverted bottleneck block (including explanation of separable
@@ -119,3 +121,32 @@
   channels. Write down (or derive) the equation of transposed convolution
   (or equivalently backpropagation through a convolution to its inputs). [5]
 
+#### Questions@:, Lecture 6 Questions
+- Describe the differences among semantic segmentation, image classification,
+  object detection, and instance segmentation, and write down which metrics
+  are used for these tasks. [5]
+
+- Write down how $\mathit{AP}_{50}$ is computed. [5]
+
+- Considering a Fast-RCNN architecture, draw the overall network architecture,
+  explain what a RoI-pooling layer is, show how the network parametrizes
+  bounding boxes and write down the loss. Finally, describe non-maximum
+  suppression and how the Fast-RCNN prediction is performed. [10]
+
+- Considering a Faster-RCNN architecture, describe the region proposal network
+  (what are anchors, architecture including both heads, how are the coordinates
+  of proposals parametrized, what does the loss look like). [10]
+
+- Considering a Mask-RCNN architecture, describe the additions to a Faster-RCNN
+  architecture (the RoI-Align layer, the new mask-producing head). [5]
+
+- Write down the focal loss with class weighting, including the commonly used
+  hyperparameter values. [5]
+
+- Draw the overall architecture of RetinaNet (the computation of
+  $C_1, \ldots, C_7$, the FPN architecture computing $P_1, \ldots, P_7$
+  including the block combining feature maps of different resolutions; the
+  classification and bounding box generation heads, including their output
+  size). Write down the losses for both heads. [10]
+
+- Describe GroupNorm, and compare it to BatchNorm and LayerNorm. [5]
diff --git a/labs/.gitignore b/labs/.gitignore
index 6319f80..acfd147 100644
--- a/labs/.gitignore
+++ b/labs/.gitignore
@@ -3,5 +3,5 @@ logs/
 *.h5
 *.keras
 *.npz
-*.pickle
+*.tfrecord
 *.zip
diff --git a/labs/01/expected.txt b/labs/01/expected.txt
new file mode 100644
index 0000000..fdaf786
--- /dev/null
+++ b/labs/01/expected.txt
@@ -0,0 +1,39 @@
+python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
+Epoch  1/10 accuracy: 0.7801 - loss: 0.8405 - val_accuracy: 0.9300 - val_loss: 0.2716
+Epoch  5/10 accuracy: 0.9222 - loss: 0.2792 - val_accuracy: 0.9406 - val_loss: 0.2203
+Epoch 10/10 accuracy: 0.9304 - loss: 0.2515 - val_accuracy: 0.9432 - val_loss: 0.2159
+
+python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
+Epoch  1/10 accuracy: 0.8483 - loss: 0.5230 - val_accuracy: 0.9352 - val_loss: 0.2422
+Epoch  5/10 accuracy: 0.9236 - loss: 0.2758 - val_accuracy: 0.9360 - val_loss: 0.2325
+Epoch 10/10 accuracy: 0.9298 - loss: 0.2517 - val_accuracy: 0.9354 - val_loss: 0.2439
+
+python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
+Epoch  1/10 accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432
+Epoch  5/10 accuracy: 0.9824 - loss: 0.0613 - val_accuracy: 0.9808 - val_loss: 0.0740
+Epoch 10/10 accuracy: 0.9948 - loss: 0.0202 - val_accuracy: 0.9788 - val_loss: 0.0821
+
+python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
+Epoch  1/10 accuracy: 0.8529 - loss: 0.5183 - val_accuracy: 0.9564 - val_loss: 0.1632
+Epoch  5/10 accuracy: 0.9800 - loss: 0.0728 - val_accuracy: 0.9740 - val_loss: 0.0853
+Epoch 10/10 accuracy: 0.9948 - loss: 0.0244 - val_accuracy: 0.9782 - val_loss: 0.0772
+
+python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
+Epoch  1/10 accuracy: 0.7851 - loss: 0.8650 - val_accuracy: 0.9414 - val_loss: 0.2196
+Epoch  5/10 accuracy: 0.9647 - loss: 0.1270 - val_accuracy: 0.9704 - val_loss: 0.1079
+Epoch 10/10 accuracy: 0.9852 - loss: 0.0583 - val_accuracy: 0.9756 - val_loss: 0.0837
+
+python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
+Epoch  1/10 accuracy: 0.8497 - loss: 0.5011 - val_accuracy: 0.9664 - val_loss: 0.1225
+Epoch  5/10 accuracy: 0.9862 - loss: 0.0438 - val_accuracy: 0.9734 - val_loss: 0.1026
+Epoch 10/10 accuracy: 0.9932 - loss: 0.0202 - val_accuracy: 0.9818 - val_loss: 0.0865
+
+python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
+Epoch  1/10 accuracy: 0.7710 - loss: 0.6793 - val_accuracy: 0.9570 - val_loss: 0.1479
+Epoch  5/10 accuracy: 0.9780 - loss: 0.0783 - val_accuracy: 0.9786 - val_loss: 0.0808
+Epoch 10/10 accuracy: 0.9869 - loss: 0.0481 - val_accuracy: 0.9724 - val_loss: 0.1163
+
+python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
+Epoch  1/10 accuracy: 0.1072 - loss: 2.3068 - val_accuracy: 0.1784 - val_loss: 2.1247
+Epoch  5/10 accuracy: 0.8825 - loss: 0.4776 - val_accuracy: 0.9164 - val_loss: 0.3686
+Epoch 10/10 accuracy: 0.9294 - loss: 0.2994 - val_accuracy: 0.9386 - val_loss: 0.2671
diff --git a/labs/01/mnist.ps1 b/labs/01/mnist.ps1
new file mode 100644
index 0000000..a274269
--- /dev/null
+++ b/labs/01/mnist.ps1
@@ -0,0 +1,24 @@
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=0 --activation=none"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=0 --activation=none
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=none"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=none
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=relu
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=tanh
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=3 --activation=relu
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=10 --activation=relu
+# Write-Output ""
+# Write-Output "python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid"
+..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
+# Write-Output ""
diff --git a/labs/01/mnist_layers_activations.py b/labs/01/mnist_layers_activations.py
index d58b796..bf78be2 100644
--- a/labs/01/mnist_layers_activations.py
+++ b/labs/01/mnist_layers_activations.py
@@ -10,6 +10,11 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--activation", default="none", choices=["none", "relu", "tanh", "sigmoid"], help="Activation.")
@@ -68,7 +73,7 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     # Create the model
     model = keras.Sequential()
     model.add(keras.Input([MNIST.H, MNIST.W, MNIST.C]))
-    # TODO: Finish the model. Namely:
+    # Finish the model. Namely:
     # - start by adding a `keras.layers.Rescaling(1 / 255)` layer;
     # - then add a `keras.layers.Flatten()` layer;
     # - add `args.hidden_layers` number of fully connected hidden layers
@@ -76,6 +81,14 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     #   from `args.activation`, allowing "none", "relu", "tanh", "sigmoid";
     # - finally, add an output fully connected layer with  `MNIST.LABELS` units
     #   and `softmax` activation.
+    model.add(keras.layers.Rescaling(1 / 255))
+    model.add(keras.layers.Flatten())
+
+    for _ in range(args.hidden_layers):
+        activation = None if args.activation == "none" else args.activation
+        model.add(keras.layers.Dense(args.hidden_layer, activation=activation))
+
+    model.add(keras.layers.Dense(MNIST.LABELS, activation="softmax"))
 
     model.compile(
         optimizer=keras.optimizers.Adam(),
diff --git a/labs/01/numpy_entropy.py b/labs/01/numpy_entropy.py
index 8e86bff..819b6b0 100644
--- a/labs/01/numpy_entropy.py
+++ b/labs/01/numpy_entropy.py
@@ -1,4 +1,10 @@
 #!/usr/bin/env python3
+
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 import argparse
 
 import numpy as np
@@ -12,42 +18,51 @@
 
 
 def main(args: argparse.Namespace) -> tuple[float, float, float]:
-    # TODO: Load data distribution, each line containing a datapoint -- a string.
-    with open(args.data_path, "r") as data:
+    # Load data distribution, each line containing a datapoint -- a string.
+    data_map = {}
+
+    # Count how many times each datapoint occurs; the counts are normalized below.
+    with open(args.data_path, "r", encoding="utf-8") as data:
         for line in data:
             line = line.rstrip("\n")
-            # TODO: Process the line, aggregating data with built-in Python
+
+            # Process the line, aggregating data with built-in Python
             # data structures (not NumPy, which is not suitable for incremental
             # addition and string mapping).
+            if line in data_map:
+                data_map[line] += 1
+            else:
+                data_map[line] = 1
 
-    # TODO: Create a NumPy array containing the data distribution. The
+    # Create a NumPy array containing the data distribution. The
     # NumPy array should contain only data, not any mapping. Alternatively,
     # the NumPy array might be created after loading the model distribution.
+    data_dist = np.array(list(data_map.values())) / sum(data_map.values())
+
+    # Load model distribution, each line `string \t probability`.
+    model_map = {}
 
-    # TODO: Load model distribution, each line `string \t probability`.
     with open(args.model_path, "r") as model:
         for line in model:
             line = line.rstrip("\n")
-            # TODO: Process the line, aggregating using Python data structures.
+            key, value = line.split("\t")
+            model_map[key] = float(value)
 
-    # TODO: Create a NumPy array containing the model distribution.
+    # Create a NumPy array containing the model distribution.
+    model_dist = np.array([model_map[key] if key in model_map else np.inf for key in data_map.keys()])
 
-    # TODO: Compute the entropy H(data distribution). You should not use
-    # manual for/while cycles, but instead use the fact that most NumPy methods
-    # operate on all elements (for example `*` is vector element-wise multiplication).
-    entropy = ...
+    # Compute the entropy H(data distribution).
+    entropy = -np.sum(data_dist * np.log(data_dist))
 
-    # TODO: Compute cross-entropy H(data distribution, model distribution).
-    # When some data distribution elements are missing in the model distribution,
-    # return `np.inf`.
-    crossentropy = ...
+    # Compute cross-entropy H(data distribution, model distribution).
+    crossentropy = -np.sum(data_dist * np.log(model_dist))
 
-    # TODO: Compute KL-divergence D_KL(data distribution, model_distribution),
-    # again using `np.inf` when needed.
-    kl_divergence = ...
+    # Compute KL-divergence D_KL(data distribution, model_distribution).
+    kl_divergence = crossentropy - entropy
+    # kl_divergence = np.where(np.isinf(kl_divergence), np.inf, kl_divergence)
 
     # Return the computed values for ReCodEx to validate.
-    return entropy, crossentropy, kl_divergence
+    return entropy, crossentropy if np.isfinite(crossentropy) else np.inf, kl_divergence if np.isfinite(kl_divergence) else np.inf
 
 
 if __name__ == "__main__":
diff --git a/labs/01/output.txt b/labs/01/output.txt
new file mode 100644
index 0000000..916c534
--- /dev/null
+++ b/labs/01/output.txt
@@ -0,0 +1,167 @@
+Epoch 1/10
+1100/1100 14s 12ms/step - accuracy: 0.7761 - loss: 0.8442 - val_accuracy: 0.9298 - val_loss: 0.2730
+Epoch 2/10
+1100/1100 12s 11ms/step - accuracy: 0.9057 - loss: 0.3428 - val_accuracy: 0.9336 - val_loss: 0.2418
+Epoch 3/10
+1100/1100 11s 10ms/step - accuracy: 0.9177 - loss: 0.2945 - val_accuracy: 0.9366 - val_loss: 0.2284
+Epoch 4/10
+1100/1100 12s 10ms/step - accuracy: 0.9193 - loss: 0.2839 - val_accuracy: 0.9384 - val_loss: 0.2267
+Epoch 5/10
+1100/1100 11s 10ms/step - accuracy: 0.9228 - loss: 0.2790 - val_accuracy: 0.9392 - val_loss: 0.2208
+Epoch 6/10
+1100/1100 12s 11ms/step - accuracy: 0.9244 - loss: 0.2713 - val_accuracy: 0.9440 - val_loss: 0.2162
+Epoch 7/10
+1100/1100 13s 12ms/step - accuracy: 0.9252 - loss: 0.2662 - val_accuracy: 0.9398 - val_loss: 0.2178
+Epoch 8/10
+1100/1100 14s 12ms/step - accuracy: 0.9269 - loss: 0.2626 - val_accuracy: 0.9398 - val_loss: 0.2169
+Epoch 9/10
+1100/1100 13s 12ms/step - accuracy: 0.9286 - loss: 0.2612 - val_accuracy: 0.9458 - val_loss: 0.2128
+Epoch 10/10
+1100/1100 13s 12ms/step - accuracy: 0.9307 - loss: 0.2515 - val_accuracy: 0.9438 - val_loss: 0.2161
+
+Epoch 1/10
+1100/1100 15s 13ms/step - accuracy: 0.8422 - loss: 0.5383 - val_accuracy: 0.9346 - val_loss: 0.2400
+Epoch 2/10
+1100/1100 18s 17ms/step - accuracy: 0.9120 - loss: 0.3102 - val_accuracy: 0.9364 - val_loss: 0.2372
+Epoch 3/10
+1100/1100 16s 15ms/step - accuracy: 0.9233 - loss: 0.2774 - val_accuracy: 0.9352 - val_loss: 0.2342
+Epoch 4/10
+1100/1100 16s 14ms/step - accuracy: 0.9225 - loss: 0.2736 - val_accuracy: 0.9366 - val_loss: 0.2336
+Epoch 5/10
+1100/1100 15s 13ms/step - accuracy: 0.9233 - loss: 0.2760 - val_accuracy: 0.9344 - val_loss: 0.2331
+Epoch 6/10
+1100/1100 22s 20ms/step - accuracy: 0.9251 - loss: 0.2683 - val_accuracy: 0.9382 - val_loss: 0.2247
+Epoch 7/10
+1100/1100 15s 14ms/step - accuracy: 0.9261 - loss: 0.2658 - val_accuracy: 0.9356 - val_loss: 0.2367
+Epoch 8/10
+1100/1100 15s 14ms/step - accuracy: 0.9256 - loss: 0.2635 - val_accuracy: 0.9364 - val_loss: 0.2308
+Epoch 9/10
+1100/1100 15s 13ms/step - accuracy: 0.9253 - loss: 0.2625 - val_accuracy: 0.9386 - val_loss: 0.2277
+Epoch 10/10
+1100/1100 15s 13ms/step - accuracy: 0.9301 - loss: 0.2515 - val_accuracy: 0.9358 - val_loss: 0.2441
+
+Epoch 1/10
+1100/1100 16s 13ms/step - accuracy: 0.8499 - loss: 0.5317 - val_accuracy: 0.9618 - val_loss: 0.1400
+Epoch 2/10
+1100/1100 15s 13ms/step - accuracy: 0.9517 - loss: 0.1637 - val_accuracy: 0.9682 - val_loss: 0.1153
+Epoch 3/10
+1100/1100 14s 13ms/step - accuracy: 0.9700 - loss: 0.1021 - val_accuracy: 0.9730 - val_loss: 0.0897
+Epoch 4/10
+1100/1100 13s 12ms/step - accuracy: 0.9774 - loss: 0.0757 - val_accuracy: 0.9754 - val_loss: 0.0835
+Epoch 5/10
+1100/1100 13s 12ms/step - accuracy: 0.9824 - loss: 0.0603 - val_accuracy: 0.9772 - val_loss: 0.0766
+Epoch 6/10
+1100/1100 14s 12ms/step - accuracy: 0.9855 - loss: 0.0486 - val_accuracy: 0.9762 - val_loss: 0.0850
+Epoch 7/10
+1100/1100 14s 13ms/step - accuracy: 0.9889 - loss: 0.0374 - val_accuracy: 0.9776 - val_loss: 0.0774
+Epoch 8/10
+1100/1100 13s 12ms/step - accuracy: 0.9901 - loss: 0.0318 - val_accuracy: 0.9786 - val_loss: 0.0765
+Epoch 9/10
+1100/1100 13s 12ms/step - accuracy: 0.9928 - loss: 0.0267 - val_accuracy: 0.9804 - val_loss: 0.0766
+Epoch 10/10
+1100/1100 14s 12ms/step - accuracy: 0.9944 - loss: 0.0208 - val_accuracy: 0.9792 - val_loss: 0.0801
+
+Epoch 1/10
+1100/1100 14s 12ms/step - accuracy: 0.8468 - loss: 0.5308 - val_accuracy: 0.9594 - val_loss: 0.1591
+Epoch 2/10
+1100/1100 13s 12ms/step - accuracy: 0.9433 - loss: 0.1909 - val_accuracy: 0.9646 - val_loss: 0.1300
+Epoch 3/10
+1100/1100 13s 12ms/step - accuracy: 0.9658 - loss: 0.1235 - val_accuracy: 0.9726 - val_loss: 0.0973
+Epoch 4/10
+1100/1100 13s 12ms/step - accuracy: 0.9744 - loss: 0.0909 - val_accuracy: 0.9732 - val_loss: 0.0876
+Epoch 5/10
+1100/1100 13s 12ms/step - accuracy: 0.9798 - loss: 0.0747 - val_accuracy: 0.9788 - val_loss: 0.0770
+Epoch 6/10
+1100/1100 13s 12ms/step - accuracy: 0.9832 - loss: 0.0606 - val_accuracy: 0.9766 - val_loss: 0.0801
+Epoch 7/10
+1100/1100 13s 12ms/step - accuracy: 0.9881 - loss: 0.0460 - val_accuracy: 0.9792 - val_loss: 0.0714
+Epoch 8/10
+1100/1100 13s 12ms/step - accuracy: 0.9894 - loss: 0.0397 - val_accuracy: 0.9768 - val_loss: 0.0741
+Epoch 9/10
+1100/1100 13s 12ms/step - accuracy: 0.9923 - loss: 0.0312 - val_accuracy: 0.9796 - val_loss: 0.0709
+Epoch 10/10
+1100/1100 14s 12ms/step - accuracy: 0.9940 - loss: 0.0257 - val_accuracy: 0.9802 - val_loss: 0.0720
+
+Epoch 1/10
+1100/1100 15s 13ms/step - accuracy: 0.8072 - loss: 0.8138 - val_accuracy: 0.9452 - val_loss: 0.2121
+Epoch 2/10
+1100/1100 15s 14ms/step - accuracy: 0.9241 - loss: 0.2602 - val_accuracy: 0.9570 - val_loss: 0.1663
+Epoch 3/10
+1100/1100 15s 14ms/step - accuracy: 0.9476 - loss: 0.1863 - val_accuracy: 0.9648 - val_loss: 0.1322
+Epoch 4/10
+1100/1100 14s 13ms/step - accuracy: 0.9583 - loss: 0.1490 - val_accuracy: 0.9670 - val_loss: 0.1168
+Epoch 5/10
+1100/1100 14s 13ms/step - accuracy: 0.9658 - loss: 0.1243 - val_accuracy: 0.9696 - val_loss: 0.1047
+Epoch 6/10
+1100/1100 14s 12ms/step - accuracy: 0.9706 - loss: 0.1065 - val_accuracy: 0.9718 - val_loss: 0.0975
+Epoch 7/10
+1100/1100 13s 12ms/step - accuracy: 0.9758 - loss: 0.0891 - val_accuracy: 0.9740 - val_loss: 0.0918
+Epoch 8/10
+1100/1100 13s 12ms/step - accuracy: 0.9779 - loss: 0.0792 - val_accuracy: 0.9758 - val_loss: 0.0885
+Epoch 9/10
+1100/1100 14s 13ms/step - accuracy: 0.9816 - loss: 0.0681 - val_accuracy: 0.9776 - val_loss: 0.0825
+Epoch 10/10
+1100/1100 14s 12ms/step - accuracy: 0.9852 - loss: 0.0583 - val_accuracy: 0.9766 - val_loss: 0.0831
+
+Epoch 1/10
+1100/1100 16s 14ms/step - accuracy: 0.8483 - loss: 0.5002 - val_accuracy: 0.9650 - val_loss: 0.1189
+Epoch 2/10
+1100/1100 16s 14ms/step - accuracy: 0.9609 - loss: 0.1262 - val_accuracy: 0.9718 - val_loss: 0.0971
+Epoch 3/10
+1100/1100 16s 14ms/step - accuracy: 0.9759 - loss: 0.0783 - val_accuracy: 0.9772 - val_loss: 0.0690
+Epoch 4/10
+1100/1100 16s 14ms/step - accuracy: 0.9810 - loss: 0.0597 - val_accuracy: 0.9788 - val_loss: 0.0752
+Epoch 5/10
+1100/1100 15s 14ms/step - accuracy: 0.9855 - loss: 0.0468 - val_accuracy: 0.9748 - val_loss: 0.0817
+Epoch 6/10
+1100/1100 16s 14ms/step - accuracy: 0.9884 - loss: 0.0398 - val_accuracy: 0.9758 - val_loss: 0.0909
+Epoch 7/10
+1100/1100 15s 14ms/step - accuracy: 0.9898 - loss: 0.0318 - val_accuracy: 0.9724 - val_loss: 0.0998
+Epoch 8/10
+1100/1100 16s 14ms/step - accuracy: 0.9892 - loss: 0.0305 - val_accuracy: 0.9778 - val_loss: 0.0952
+Epoch 9/10
+1100/1100 16s 14ms/step - accuracy: 0.9914 - loss: 0.0267 - val_accuracy: 0.9756 - val_loss: 0.0878
+Epoch 10/10
+1100/1100 16s 15ms/step - accuracy: 0.9935 - loss: 0.0203 - val_accuracy: 0.9770 - val_loss: 0.0974
+
+Epoch 1/10
+1100/1100 24s 21ms/step - accuracy: 0.7772 - loss: 0.6657 - val_accuracy: 0.9524 - val_loss: 0.1752
+Epoch 2/10
+1100/1100 24s 22ms/step - accuracy: 0.9525 - loss: 0.1705 - val_accuracy: 0.9682 - val_loss: 0.1261
+Epoch 3/10
+1100/1100 22s 20ms/step - accuracy: 0.9675 - loss: 0.1162 - val_accuracy: 0.9750 - val_loss: 0.0945
+Epoch 4/10
+1100/1100 22s 20ms/step - accuracy: 0.9735 - loss: 0.0929 - val_accuracy: 0.9720 - val_loss: 0.1018
+Epoch 5/10
+1100/1100 22s 20ms/step - accuracy: 0.9789 - loss: 0.0794 - val_accuracy: 0.9762 - val_loss: 0.0888
+Epoch 6/10
+1100/1100 22s 20ms/step - accuracy: 0.9806 - loss: 0.0729 - val_accuracy: 0.9760 - val_loss: 0.0961
+Epoch 7/10
+1100/1100 22s 20ms/step - accuracy: 0.9847 - loss: 0.0578 - val_accuracy: 0.9810 - val_loss: 0.0932
+Epoch 8/10
+1100/1100 22s 20ms/step - accuracy: 0.9824 - loss: 0.0643 - val_accuracy: 0.9786 - val_loss: 0.0854
+Epoch 9/10
+1100/1100 22s 20ms/step - accuracy: 0.9864 - loss: 0.0487 - val_accuracy: 0.9764 - val_loss: 0.1054
+Epoch 10/10
+1100/1100 22s 20ms/step - accuracy: 0.9864 - loss: 0.0493 - val_accuracy: 0.9780 - val_loss: 0.1108
+
+Epoch 1/10
+1100/1100 23s 20ms/step - accuracy: 0.1052 - loss: 2.3130 - val_accuracy: 0.1808 - val_loss: 1.9383
+Epoch 2/10
+1100/1100 22s 20ms/step - accuracy: 0.2002 - loss: 1.9364 - val_accuracy: 0.2168 - val_loss: 1.8587
+Epoch 3/10
+1100/1100 23s 20ms/step - accuracy: 0.2161 - loss: 1.8392 - val_accuracy: 0.5588 - val_loss: 1.2106
+Epoch 4/10
+1100/1100 22s 20ms/step - accuracy: 0.5594 - loss: 1.1159 - val_accuracy: 0.8168 - val_loss: 0.7119
+Epoch 5/10
+1100/1100 22s 20ms/step - accuracy: 0.8359 - loss: 0.6312 - val_accuracy: 0.8994 - val_loss: 0.4360
+Epoch 6/10
+1100/1100 22s 20ms/step - accuracy: 0.8827 - loss: 0.4854 - val_accuracy: 0.9066 - val_loss: 0.4053
+Epoch 7/10
+1100/1100 22s 20ms/step - accuracy: 0.9007 - loss: 0.4218 - val_accuracy: 0.9166 - val_loss: 0.3660
+Epoch 8/10
+1100/1100 22s 20ms/step - accuracy: 0.9075 - loss: 0.3940 - val_accuracy: 0.9204 - val_loss: 0.3552
+Epoch 9/10
+1100/1100 22s 20ms/step - accuracy: 0.9090 - loss: 0.3922 - val_accuracy: 0.9242 - val_loss: 0.3356
+Epoch 10/10
+1100/1100 24s 22ms/step - accuracy: 0.9191 - loss: 0.3534 - val_accuracy: 0.9270 - val_loss: 0.3286
diff --git a/labs/01/pca_first.keras.py b/labs/01/pca_first.keras.py
index 1f99e21..0632b22 100644
--- a/labs/01/pca_first.keras.py
+++ b/labs/01/pca_first.keras.py
@@ -9,6 +9,11 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--examples", default=256, type=int, help="MNIST examples to use.")
@@ -32,39 +37,43 @@ def main(args: argparse.Namespace) -> tuple[float, float]:
     data_indices = np.random.choice(mnist.train.size, size=args.examples, replace=False)
     data = keras.ops.convert_to_tensor(mnist.train.data["images"][data_indices] / 255, dtype="float32")
 
-    # TODO: Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C].
+    # Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C].
     # We want to reshape it to [args.examples, MNIST.H * MNIST.W * MNIST.C].
     # We can do so using `keras.ops.reshape(data, new_shape)` with new shape
     # `[data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]]`.
-    data = ...
+    data = keras.ops.reshape(data, [data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]])
 
-    # TODO: Now compute mean of every feature. Use `keras.ops.mean`, and set
+    # Now compute mean of every feature. Use `keras.ops.mean`, and set
     # `axis` to zero -- therefore, the mean will be computed across the first
     # dimension, so across examples.
-    mean = ...
+    mean = keras.ops.mean(data, axis=0)
 
-    # TODO: Compute the covariance matrix. The covariance matrix is
+    # Compute the covariance matrix. The covariance matrix is
     #   (data - mean)^T * (data - mean) / data.shape[0]
     # where transpose can be computed using `keras.ops.transpose` and
     # matrix multiplication using either Python operator @ or `keras.ops.matmul`.
-    cov = ...
+    cov = keras.ops.transpose(data - mean) @ (data - mean) / data.shape[0]
 
-    # TODO: Compute the total variance, which is the sum of the diagonal
+    # Compute the total variance, which is the sum of the diagonal
     # of the covariance matrix. To extract the diagonal use `keras.ops.diagonal`,
     # and to sum a tensor use `keras.ops.sum`.
-    total_variance = ...
+    total_variance = keras.ops.sum(keras.ops.diagonal(cov))
 
-    # TODO: Now run `args.iterations` of the power iteration algorithm.
+    # Now run `args.iterations` of the power iteration algorithm.
     # Start with a vector of `cov.shape[0]` ones of type `"float32"` using `keras.ops.ones`.
-    v = ...
+    v = keras.ops.ones(cov.shape[0], dtype="float32")
     for i in range(args.iterations):
-        # TODO: In the power iteration algorithm, we compute
+        # In the power iteration algorithm, we compute
         # 1. v = cov v
         #    The matrix-vector multiplication can be computed as regular matrix multiplication.
+        v = keras.ops.matmul(cov, v)
+
         # 2. s = l2_norm(v)
         #    The l2_norm can be computed using for example `keras.ops.norm`.
+        s = keras.ops.norm(v, 2)
+
         # 3. v = v / s
-        pass
+        v = v / s
 
     # The `v` is now approximately the eigenvector of the largest eigenvalue, `s`.
     # We now compute the explained variance, which is the ratio of `s` and `total_variance`.
diff --git a/labs/01/pca_first.py b/labs/01/pca_first.py
index 2e4ef10..deecf06 100644
--- a/labs/01/pca_first.py
+++ b/labs/01/pca_first.py
@@ -7,6 +7,11 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--examples", default=256, type=int, help="MNIST examples to use.")
@@ -30,43 +35,46 @@ def main(args: argparse.Namespace) -> tuple[float, float]:
     data_indices = np.random.choice(mnist.train.size, size=args.examples, replace=False)
     data = torch.tensor(mnist.train.data["images"][data_indices] / 255, dtype=torch.float32)
 
-    # TODO: Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C].
+    # Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C].
     # We want to reshape it to [args.examples, MNIST.H * MNIST.W * MNIST.C].
     # We can do so using `torch.reshape(data, new_shape)` with new shape
     # `[data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]]`.
-    data = ...
+    data = torch.reshape(data, (data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]))
 
-    # TODO: Now compute mean of every feature. Use `torch.mean`, and set
+    # Now compute mean of every feature. Use `torch.mean`, and set
     # `dim` (or `axis`) argument to zero -- therefore, the mean will be
     # computed across the first dimension, so across examples.
     #
     # Note that for compatibility with Numpy/TF/Keras, all `dim` arguments
     # in PyTorch can be also called `axis`.
-    mean = ...
+    mean = torch.mean(data, axis=0)
 
-    # TODO: Compute the covariance matrix. The covariance matrix is
+    # Compute the covariance matrix. The covariance matrix is
     #   (data - mean)^T * (data - mean) / data.shape[0]
     # where transpose can be computed using `torch.transpose` or `torch.t` and
     # matrix multiplication using either Python operator @ or `torch.matmul`.
-    cov = ...
+    cov = torch.matmul(torch.t(data - mean), data - mean) / data.shape[0]
 
     # TODO: Compute the total variance, which is the sum of the diagonal
     # of the covariance matrix. To extract the diagonal use `torch.diagonal`,
     # and to sum a tensor use `torch.sum`.
-    total_variance = ...
+    total_variance = torch.sum(torch.diagonal(cov)).item()
 
     # TODO: Now run `args.iterations` of the power iteration algorithm.
     # Start with a vector of `cov.shape[0]` ones of type `torch.float32` using `torch.ones`.
-    v = ...
+    v = torch.ones(cov.shape[0], dtype=torch.float32)
+
     for i in range(args.iterations):
-        # TODO: In the power iteration algorithm, we compute
-        # 1. v = cov v
-        #    The matrix-vector multiplication can be computed as regular matrix multiplication
-        #    or using `torch.mv`.
-        # 2. s = l2_norm(v)
-        #    The l2_norm can be computed using for example `torch.linalg.vector_norm`.
-        # 3. v = v / s
-        pass
+        # In the power iteration algorithm, we compute
+        # 1. v = cov v
+        #    The matrix-vector multiplication can be computed as regular matrix multiplication
+        #    or using `torch.mv`.
+        # 2. s = l2_norm(v)
+        #    The l2_norm can be computed using for example `torch.linalg.vector_norm`.
+        # 3. v = v / s
+        v = cov @ v
+        s = torch.linalg.vector_norm(v)
+        v = v / s
 
     # The `v` is now approximately the eigenvector of the largest eigenvalue, `s`.
     # We now compute the explained variance, which is the ratio of `s` and `total_variance`.
diff --git a/labs/01/run.ps1 b/labs/01/run.ps1
new file mode 100644
index 0000000..a68f5e8
--- /dev/null
+++ b/labs/01/run.ps1
@@ -0,0 +1 @@
+..\..\.venv\Scripts\python .\pca_first.keras.py
diff --git a/labs/01/test.ps1 b/labs/01/test.ps1
new file mode 100644
index 0000000..75ddf37
--- /dev/null
+++ b/labs/01/test.ps1
@@ -0,0 +1,4 @@
+python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt
+python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt
+python3 numpy_entropy.py --data_path numpy_entropy_data_3.txt --model_path numpy_entropy_model_3.txt
+python3 numpy_entropy.py --data_path numpy_entropy_data_4.txt --model_path numpy_entropy_model_4.txt
diff --git a/labs/02/gym_cartpole.py b/labs/02/gym_cartpole.py
index 7befc72..b708b63 100644
--- a/labs/02/gym_cartpole.py
+++ b/labs/02/gym_cartpole.py
@@ -8,6 +8,12 @@
 import keras
 import numpy as np
 import torch
+from collections import Counter
+
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
 
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
@@ -17,8 +23,8 @@
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
 parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
 # If you add more arguments, ReCodEx will keep them with your default values.
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
-parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--batch_size", default=10, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=100, type=int, help="Number of epochs.")
 parser.add_argument("--model", default="gym_cartpole_model.keras", type=str, help="Output model path.")
 
 
@@ -49,7 +55,7 @@ def on_epoch_end(self, epoch, logs=None):
 
 def evaluate_model(
     model: keras.Model, seed: int = 42, episodes: int = 100, render: bool = False, report_per_episode: bool = False
-) -> float:
+    ) -> float:
     """Evaluate the given model on CartPole-v1 environment.
 
     Returns the average score achieved on the given number of episodes.
@@ -86,16 +92,10 @@ def evaluate_model(
 def main(args: argparse.Namespace) -> keras.Model | None:
     # Set the random seed and the number of threads.
     keras.utils.set_random_seed(args.seed)
-    if args.threads:
-        torch.set_num_threads(args.threads)
-        torch.set_num_interop_threads(args.threads)
+    torch.set_num_threads(args.threads)
+    torch.set_num_interop_threads(args.threads)
 
     if not args.evaluate:
-        if args.batch_size is ...:
-            raise ValueError("You must specify the batch size, either in the defaults or on the command line.")
-        if args.epochs is ...:
-            raise ValueError("You must specify the number of epochs, either in the defaults or on the command line.")
-
         # Create logdir name
         args.logdir = os.path.join("logs", "{}-{}-{}".format(
             os.path.basename(globals().get("__file__", "notebook")),
@@ -107,16 +107,37 @@ def main(args: argparse.Namespace) -> keras.Model | None:
         data = np.loadtxt("gym_cartpole_data.txt")
         observations, labels = data[:, :-1], data[:, -1].astype(np.int32)
 
+
+
         # TODO: Create the model in the `model` variable. Note that
         # the model can perform any of:
         # - binary classification with 1 output and sigmoid activation;
         # - two-class classification with 2 outputs and softmax activation.
-        model = ...
+
+        # Convert the labels to one-hot encoding
+        labels = keras.ops.one_hot(labels, num_classes=2)
+
+        model = keras.Sequential(name="gym_model", layers=[
+            # Input layer
+            keras.layers.Input(shape=(observations.shape[1],)),
+            # Hidden layers
+            keras.layers.Dense(8, activation="tanh"),
+            # Output layer
+            keras.layers.Dense(2, activation="softmax"),  # 2 outputs because we have 2 actions in the cart pole problem
+        ])
+
+
+        model.summary()
 
         # TODO: Prepare the model for training using the `model.compile` method.
-        model.compile(...)
+        model.compile(
+            loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
+            optimizer=keras.optimizers.Adam(learning_rate=0.009),
+            metrics=["accuracy"],
+        )
 
         tb_callback = TorchTensorBoardCallback(args.logdir)
+        # The labels were already one-hot encoded above, so they are not converted again here.
         model.fit(observations, labels, batch_size=args.batch_size, epochs=args.epochs, callbacks=[tb_callback])
 
         # Save the model, without the optimizer state.
diff --git a/labs/02/mnist_training.py b/labs/02/mnist_training.py
index 6655133..116ae98 100644
--- a/labs/02/mnist_training.py
+++ b/labs/02/mnist_training.py
@@ -11,6 +11,11 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
@@ -107,8 +112,34 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     #   in `model.optimizer._learning_rate` if needed), so after training, the learning rate
     #   should be `args.learning_rate_final`.
 
+    optimizer = None
+    lr, momen, decay, final_lr, epochs = args.learning_rate, args.momentum, args.decay, args.learning_rate_final, args.epochs
+    if decay:
+        if not final_lr:
+            print("Please define a final learning rate!")
+        else:    
+            steps = mnist.train.size/args.batch_size*epochs
+            init_lr = args.learning_rate
+            if decay == "linear":
+                lr = keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=init_lr, decay_steps=steps, end_learning_rate=final_lr)
+            elif decay == "exponential":
+                decay_rate = final_lr/init_lr
+                lr = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=init_lr, decay_steps=steps, decay_rate=decay_rate)
+            elif decay == "cosine":
+                alpha = final_lr/init_lr
+                lr = keras.optimizers.schedules.CosineDecay(initial_learning_rate=init_lr, decay_steps=steps, alpha=alpha)
+
+    if args.optimizer == 'SGD':
+        if momen:
+            optimizer = keras.optimizers.SGD(learning_rate=lr, momentum=momen, nesterov=True)
+        else:
+            optimizer = keras.optimizers.SGD(learning_rate=lr)
+    elif args.optimizer == "Adam":
+        optimizer = keras.optimizers.Adam(learning_rate=lr)
+
+
     model.compile(
-        optimizer=...,
+        optimizer=optimizer,
         loss=keras.losses.SparseCategoricalCrossentropy(),
         metrics=[keras.metrics.SparseCategoricalAccuracy("accuracy")],
     )
@@ -121,6 +152,10 @@ def main(args: argparse.Namespace) -> dict[str, float]:
         validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]),
         callbacks=[tb_callback],
     )
+    model.summary()
+
+    if decay:        
+        print("Next learning rate to be used:", model.optimizer.learning_rate.item())
 
     # Return development metrics for ReCodEx to validate.
     return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")}
diff --git a/labs/02/sgd_backpropagation.ps1 b/labs/02/sgd_backpropagation.ps1
new file mode 100644
index 0000000..f613710
--- /dev/null
+++ b/labs/02/sgd_backpropagation.ps1
@@ -0,0 +1,50 @@
+# Examples:
+# ../../.venv/Scripts/python sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
+# Dev accuracy after epoch 1 is 93.30
+# Dev accuracy after epoch 2 is 94.38
+# Dev accuracy after epoch 3 is 95.16
+# Dev accuracy after epoch 4 is 95.50
+# Dev accuracy after epoch 5 is 95.96
+# Dev accuracy after epoch 6 is 96.04
+# Dev accuracy after epoch 7 is 95.82
+# Dev accuracy after epoch 8 is 95.92
+# Dev accuracy after epoch 9 is 95.96
+# Dev accuracy after epoch 10 is 96.16
+# Test accuracy after epoch 10 is 95.26
+
+# ../../.venv/Scripts/python sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
+# Dev accuracy after epoch 1 is 93.64
+# Dev accuracy after epoch 2 is 94.80
+# Dev accuracy after epoch 3 is 95.56
+# Dev accuracy after epoch 4 is 95.98
+# Dev accuracy after epoch 5 is 96.24
+# Dev accuracy after epoch 6 is 96.74
+# Dev accuracy after epoch 7 is 96.52
+# Dev accuracy after epoch 8 is 96.54
+# Dev accuracy after epoch 9 is 97.04
+# Dev accuracy after epoch 10 is 97.02
+# Test accuracy after epoch 10 is 96.16
+
+# Tests:
+../../.venv/Scripts/python sgd_backpropagation.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1
+# Expected:
+# Dev accuracy after epoch 1 is 93.30
+# Dev accuracy after epoch 2 is 94.38
+# Test accuracy after epoch 2 is 93.15
+
+# Actual:
+# Dev accuracy after epoch 1 is 92.98
+# Dev accuracy after epoch 2 is 93.98
+# Test accuracy after epoch 2 is 92.73
+
+
+../../.venv/Scripts/python sgd_backpropagation.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2
+# Expected:
+# Dev accuracy after epoch 1 is 93.64
+# Dev accuracy after epoch 2 is 94.80
+# Test accuracy after epoch 2 is 93.54
+
+# Actual:
+# Dev accuracy after epoch 1 is 94.16
+# Dev accuracy after epoch 2 is 94.98
+# Test accuracy after epoch 2 is 93.56
diff --git a/labs/02/sgd_backpropagation.py b/labs/02/sgd_backpropagation.py
index cff312a..e3cfacf 100644
--- a/labs/02/sgd_backpropagation.py
+++ b/labs/02/sgd_backpropagation.py
@@ -3,7 +3,10 @@
 import datetime
 import os
 import re
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
 import numpy as np
@@ -12,15 +15,26 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
 parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
-parser.add_argument("--hidden_layer", default=100, type=int, help="Size of the hidden layer.")
+parser.add_argument(
+    "--hidden_layer", default=100, type=int, help="Size of the hidden layer."
+)
 parser.add_argument("--learning_rate", default=0.1, type=float, help="Learning rate.")
-parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.")
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
 # If you add more arguments, ReCodEx will keep them with your default values.
 
 
@@ -30,29 +44,57 @@ def __init__(self, args: argparse.Namespace) -> None:
         self._args = args
 
         self._W1 = keras.Variable(
-            keras.random.normal([MNIST.W * MNIST.H * MNIST.C, args.hidden_layer], stddev=0.1, seed=args.seed),
+            keras.random.normal(
+                [MNIST.W * MNIST.H * MNIST.C, args.hidden_layer],
+                stddev=0.1,
+                seed=args.seed,
+            ),
             trainable=True,
         )
         self._b1 = keras.Variable(keras.ops.zeros([args.hidden_layer]), trainable=True)
 
-        # TODO: Create variables:
+        # Create variables:
         # - _W2, which is a trainable variable of size `[args.hidden_layer, MNIST.LABELS]`,
         #   initialized to `keras.random.normal` value `with stddev=0.1` and `seed=args.seed`,
         # - _b2, which is a trainable variable of size `[MNIST.LABELS]` initialized to zeros
-        ...
+        self._W2 = keras.Variable(
+            keras.random.normal(
+                [args.hidden_layer, MNIST.LABELS], stddev=0.1, seed=args.seed
+            ),
+            trainable=True,
+        )
+
+        self._b2 = keras.Variable(keras.ops.zeros([MNIST.LABELS]), trainable=True)
 
     def predict(self, inputs: torch.Tensor) -> torch.Tensor:
-        # TODO: Define the computation of the network. Notably:
+        # Define the computation of the network. Notably:
         # - start by casting the input byte image to `float32` with `keras.ops.cast`
+
+        cast_inputs = keras.ops.cast(inputs, dtype="float32")
+
         # - then divide the tensor by 255 to normalize it to the `[0, 1]` range
+
+        normalized_inputs = cast_inputs / 255
+
         # - then reshape it to the shape `[inputs.shape[0], -1]`.
         #   The -1 is a wildcard which is computed so that the number
         #   of elements before and after the reshape is preserved.
+
+        reshaped_inputs = keras.ops.reshape(normalized_inputs, [inputs.shape[0], -1])
+
         # - then multiply it by `self._W1` and then add `self._b1`
         # - apply `keras.ops.tanh`
+
+        hidden_layer_output = keras.ops.tanh(
+            keras.ops.matmul(reshaped_inputs, self._W1) + self._b1
+        )
+
         # - multiply the result by `self._W2` and then add `self._b2`
+
+        logits = keras.ops.matmul(hidden_layer_output, self._W2) + self._b2
+
         # - finally apply `keras.ops.softmax` and return the result
-        return ...
+        return keras.ops.softmax(logits)
 
     def train_epoch(self, dataset: MNIST.Dataset) -> None:
         for batch in dataset.batches(self._args.batch_size):
@@ -62,49 +104,54 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None:
             # Size of the batch is `self._args.batch_size`, except for the last, which
             # might be smaller.
 
-            # TODO: Compute the predicted probabilities of the batch images using `self.predict`
-            probabilities = ...
+            # Compute the predicted probabilities of the batch images using `self.predict`
+            probabilities = self.predict(batch["images"])
 
-            # TODO: Manually compute the loss:
+            # Manually compute the loss:
             # - For every batch example, the loss is the categorical crossentropy of the
             #   predicted probabilities and the gold label. To compute the crossentropy, you can
             #   - either use `keras.ops.one_hot` to obtain one-hot encoded gold labels,
             #   - or suitably use `keras.ops.take_along_axis` to "index" the predicted probabilities.
             # - Finally, compute the average across the batch examples.
-            loss = ...
-
+            loss = keras.ops.mean(
+                keras.ops.categorical_crossentropy(
+                    keras.ops.one_hot(batch["labels"], MNIST.LABELS), probabilities
+                )
+            )
             # We create a list of all variables. Note that a `keras.Model/Layer` automatically
             # tracks owned variables, so we could also use `self.trainable_variables`
             # (or even `self.variables`, which is useful for loading/saving).
             variables = [self._W1, self._b1, self._W2, self._b2]
+            # print("w1, b1, w2, b2:", self._W1.shape, self._b1.shape, self._W2.shape, self._b2.shape)
 
-            # TODO: Compute the gradient of the loss with respect to variables using
+            # Compute the gradient of the loss with respect to variables using
             # backpropagation algorithm by
             # - first resetting the gradients of all variables to zero with `self.zero_grad()`,
             # - then calling `loss.backward()`.
-            ...
+            self.zero_grad()
+            loss.backward()
 
             gradients = [variable.value.grad for variable in variables]
+            # print("gradients:", gradients)
             with torch.no_grad():
                 for variable, gradient in zip(variables, gradients):
-                    # TODO: Perform the SGD update with learning rate `self._args.learning_rate`
+                    # Perform the SGD update with learning rate `self._args.learning_rate`
                     # for the variable and computed gradient. You can modify the
                     # variable value with `variable.assign` or in this case the more
                     # efficient `variable.assign_sub`.
-                    ...
+                    variable.assign_sub(self._args.learning_rate * gradient)
 
     def evaluate(self, dataset: MNIST.Dataset) -> float:
         # Compute the accuracy of the model prediction
         correct = 0
         for batch in dataset.batches(self._args.batch_size):
-            # TODO: Compute the probabilities of the batch images using `self.predict`
+            # Compute the probabilities of the batch images using `self.predict`
             # and convert them to Numpy with `keras.ops.convert_to_numpy`.
-            probabilities = ...
+            probabilities = keras.ops.convert_to_numpy(self.predict(batch["images"]))
 
-            # TODO: Evaluate how many batch examples were predicted
+            # Evaluate how many batch examples were predicted
             # correctly and increase `correct` variable accordingly.
-            correct += ...
-
+            correct += np.sum(np.argmax(probabilities, axis=-1) == batch["labels"])
         return correct / dataset.size
 
 
@@ -116,11 +163,19 @@ def main(args: argparse.Namespace) -> tuple[float, float]:
         torch.set_num_interop_threads(args.threads)
 
     # Create logdir name
-    args.logdir = os.path.join("logs", "{}-{}-{}".format(
-        os.path.basename(globals().get("__file__", "notebook")),
-        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
-        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
-    ))
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
 
     # Load data
     mnist = MNIST()
@@ -132,16 +187,23 @@ def main(args: argparse.Namespace) -> tuple[float, float]:
     model = Model(args)
 
     for epoch in range(args.epochs):
-        # TODO: Run the `train_epoch` with `mnist.train` dataset
-
-        # TODO: Evaluate the dev data using `evaluate` on `mnist.dev` dataset
-        accuracy = ...
-        print("Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy), flush=True)
+        # Run the `train_epoch` with `mnist.train` dataset
+        model.train_epoch(mnist.train)
+
+        # Evaluate the dev data using `evaluate` on `mnist.dev` dataset
+        accuracy = model.evaluate(mnist.dev)
+        print(
+            "Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy),
+            flush=True,
+        )
         writer.add_scalar("dev/accuracy", 100 * accuracy, epoch + 1)
 
-    # TODO: Evaluate the test data using `evaluate` on `mnist.test` dataset
-    test_accuracy = ...
-    print("Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy), flush=True)
+    # Evaluate the test data using `evaluate` on `mnist.test` dataset
+    test_accuracy = model.evaluate(mnist.test)
+    print(
+        "Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy),
+        flush=True,
+    )
     writer.add_scalar("test/accuracy", 100 * test_accuracy, epoch + 1)
 
     # Return dev and test accuracies for ReCodEx to validate.
diff --git a/labs/02/sgd_manual.py b/labs/02/sgd_manual.py
index 422d3e9..f023328 100644
--- a/labs/02/sgd_manual.py
+++ b/labs/02/sgd_manual.py
@@ -12,6 +12,11 @@
 
 from mnist import MNIST
 
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
@@ -39,7 +44,9 @@ def __init__(self, args: argparse.Namespace) -> None:
         # - _W2, which is a trainable variable of size `[args.hidden_layer, MNIST.LABELS]`,
         #   initialized to `keras.random.normal` value `with stddev=0.1` and `seed=args.seed`,
         # - _b2, which is a trainable variable of size `[MNIST.LABELS]` initialized to zeros
-        ...
+        self._W2 = keras.Variable(keras.random.normal([args.hidden_layer, MNIST.LABELS], stddev=0.1, seed=args.seed),
+            trainable=True)
+        self._b2 = keras.Variable(keras.ops.zeros([MNIST.LABELS]), trainable=True)
 
     def predict(self, inputs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
         # TODO(sgd_backpropagation): Define the computation of the network. Notably:
@@ -56,7 +63,14 @@ def predict(self, inputs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, tor
         # TODO: In order to support manual gradient computation, you should
         # return not only the output layer, but also the hidden layer after applying
         # tanh, and the input layer after reshaping.
-        return ..., ..., ...
+        input = keras.ops.cast(inputs, dtype="float32")
+        input = torch.div(input, 255)
+        input = input.reshape([input.shape[0], -1])
+        hidden_input = keras.ops.matmul(input,self._W1) + self._b1
+        hidden_output = keras.ops.tanh(hidden_input)
+        sm_input = keras.ops.matmul(hidden_output,self._W2) + self._b2
+        output = keras.ops.softmax(sm_input)
+        return input, hidden_output, output
 
     def train_epoch(self, dataset: MNIST.Dataset) -> None:
         for batch in dataset.batches(self._args.batch_size):
@@ -72,7 +86,7 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None:
             #
             # Compute the input layer, hidden layer and output layer
             # of the batch images using `self.predict`.
-
+            input_layer, hidden_layer, probabilities = self.predict(torch.tensor(batch['images']))
             # TODO: Compute the gradient of the loss with respect to all
             # variables. Note that the loss is computed as in `sgd_backpropagation`:
             # - For every batch example, the loss is the categorical crossentropy of the
@@ -80,7 +94,6 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None:
             #   - either use `keras.ops.one_hot` to obtain one-hot encoded gold labels,
             #   - or suitably use `keras.ops.take_along_axis` to "index" the predicted probabilities.
             # - Finally, compute the average across the batch examples.
-            #
             # During the gradient computation, you will need to compute
             # a batched version of a so-called outer product
             #   `C[a, i, j] = A[a, i] * B[a, j]`,
@@ -88,12 +101,30 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None:
             #   `A[:, :, np.newaxis] * B[:, np.newaxis, :]`
             # or with
             #   `keras.ops.einsum("ai,aj->aij", A, B)`.
+            gold_labels = keras.ops.one_hot(batch['labels'], num_classes=MNIST.LABELS)
+            loss = torch.mean(keras.ops.categorical_crossentropy(gold_labels, probabilities))
+            
+            gd_loss = probabilities - gold_labels
+            gd_b2 = gd_loss
+            #print("loss gradient, hidden_layer, input", gd_b2.shape, hidden_layer.shape, input_layer.shape)
+            gd_w2 = keras.ops.einsum("ai,aj->aij", hidden_layer, gd_loss)
+            gd_h = keras.ops.matmul(gd_loss, keras.ops.transpose(self._W2))
+            hidden_input = keras.ops.matmul(input_layer,self._W1) + self._b1
+            gd_h_i = gd_h*(1-keras.ops.power(keras.ops.tanh(hidden_input), 2))
+            gd_b1 = gd_h_i
+            gd_w1 = keras.ops.einsum("ai,aj->aij", input_layer, gd_h_i)
+            #print("gd_w2, gd_w1, gd_b2, gd_b1:", gd_w2.shape, gd_w1.shape, gd_b2.shape, gd_b1.shape)
 
             # TODO(sgd_backpropagation): Perform the SGD update with learning rate `self._args.learning_rate`
             # for the variable and computed gradient. You can modify the
             # variable value with `variable.assign` or in this case the more
             # efficient `variable.assign_sub`.
-            ...
+            variables = [self._W1, self._b1, self._W2, self._b2]
+            gradients = [gd_w1, gd_b1, gd_w2, gd_b2]
+            with torch.no_grad():
+                for variable, gradient in zip(variables, gradients):
+                    variable.assign_sub(self._args.learning_rate*keras.ops.mean(gradient, axis=0))
+
 
     def evaluate(self, dataset: MNIST.Dataset) -> float:
         # Compute the accuracy of the model prediction
@@ -101,11 +132,11 @@ def evaluate(self, dataset: MNIST.Dataset) -> float:
         for batch in dataset.batches(self._args.batch_size):
             # TODO: Compute the probabilities of the batch images using `self.predict`
             # and convert them to Numpy with `keras.ops.convert_to_numpy`.
-            probabilities = ...
+            probabilities = keras.ops.convert_to_numpy(self.predict(torch.tensor(batch['images']))[2])
 
             # TODO(sgd_backpropagation): Evaluate how many batch examples were predicted
             # correctly and increase `correct` variable accordingly.
-            correct += ...
+            correct += np.sum(np.argmax(probabilities, axis=-1) == batch["labels"])
 
         return correct / dataset.size
 
@@ -135,14 +166,14 @@ def main(args: argparse.Namespace) -> tuple[float, float]:
 
     for epoch in range(args.epochs):
         # TODO: Run the `train_epoch` with `mnist.train` dataset
-
+        model.train_epoch(mnist.train)
         # TODO: Evaluate the dev data using `evaluate` on `mnist.dev` dataset
-        accuracy = ...
+        accuracy = model.evaluate(mnist.dev)
         print("Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy), flush=True)
         writer.add_scalar("dev/accuracy", 100 * accuracy, epoch + 1)
 
     # TODO: Evaluate the test data using `evaluate` on `mnist.test` dataset
-    test_accuracy = ...
+    test_accuracy = model.evaluate(mnist.test)
     print("Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy), flush=True)
     writer.add_scalar("test/accuracy", 100 * test_accuracy, epoch + 1)
 
diff --git a/labs/02/test.ps1 b/labs/02/test.ps1
new file mode 100644
index 0000000..fa38f74
--- /dev/null
+++ b/labs/02/test.ps1
@@ -0,0 +1 @@
+../../.venv/Scripts/python .\gym_cartpole.py  && ../../.venv/Scripts/python .\gym_cartpole.py --evaluate
diff --git a/labs/03/mnist_ensemble.ps1 b/labs/03/mnist_ensemble.ps1
new file mode 100644
index 0000000..526a6bd
--- /dev/null
+++ b/labs/03/mnist_ensemble.ps1
@@ -0,0 +1,2 @@
+python3 mnist_ensemble.py --epochs=1 --models=5
+python3 mnist_ensemble.py --epochs=1 --models=5 --hidden_layers=200
diff --git a/labs/03/mnist_ensemble.py b/labs/03/mnist_ensemble.py
index ebffcf9..93bb2eb 100644
--- a/labs/03/mnist_ensemble.py
+++ b/labs/03/mnist_ensemble.py
@@ -7,6 +7,7 @@
 import torch
 
 from mnist import MNIST
+import numpy as np
 
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
@@ -54,11 +55,13 @@ def main(args: argparse.Namespace) -> tuple[list[float], list[float]]:
         print("Done")
 
     individual_accuracies, ensemble_accuracies = [], []
+    model_predictions = []
     for model in range(args.models):
-        # TODO: Compute the accuracy on the dev set for the individual `models[model]`.
-        individual_accuracy = ...
+        # Compute the accuracy on the dev set for the individual `models[model]`.
+        individual_accuracy = models[model].evaluate(mnist.dev.data["images"], mnist.dev.data["labels"])[1]
+        print(individual_accuracy)
 
-        # TODO: Compute the accuracy on the dev set for the ensemble `models[0:model+1]`.
+        # Compute the accuracy on the dev set for the ensemble `models[0:model+1]`.
         #
         # Generally you can choose one of the following approaches:
         # 1) Use Keras Functional API and construct a `keras.Model` averaging the models
@@ -69,7 +72,17 @@ def main(args: argparse.Namespace) -> tuple[list[float], list[float]]:
         #    need to construct Keras ensemble model at all, and instead call `model.predict`
         #    on the individual models and average the results. To measure accuracy,
         #    either do it completely manually or use `keras.metrics.SparseCategoricalAccuracy`.
-        ensemble_accuracy = ...
+        inputs = keras.Input(shape=(MNIST.W, MNIST.H, MNIST.C))
+        ensemble_output = keras.layers.Average()([m(inputs) for m in models[0:model+1]])
+        ensemble_model = keras.Model(inputs=inputs, outputs=ensemble_output)
+
+        ensemble_model.compile(
+            optimizer=keras.optimizers.Adam(),
+            loss=keras.losses.SparseCategoricalCrossentropy(),
+            metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
+        )
+
+        ensemble_accuracy = ensemble_model.evaluate(mnist.dev.data["images"], mnist.dev.data["labels"])[1]
 
         # Store the accuracies
         individual_accuracies.append(individual_accuracy)
diff --git a/labs/03/mnist_regularization.ps1 b/labs/03/mnist_regularization.ps1
new file mode 100644
index 0000000..2a61e88
--- /dev/null
+++ b/labs/03/mnist_regularization.ps1
@@ -0,0 +1,24 @@
+# Run script from root repo directory
+
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --dropout=0.3
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --dropout=0.5 --hidden_layers 300 300
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --weight_decay=0.1
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --weight_decay=0.3
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --label_smoothing=0.1
+.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --label_smoothing=0.3
+
+# Expected
+# accuracy: 0.5981 - loss: 1.2688 - val_accuracy: 0.9174 - val_loss: 0.3051
+# accuracy: 0.3429 - loss: 1.9163 - val_accuracy: 0.8826 - val_loss: 0.4937
+# accuracy: 0.7014 - loss: 1.0412 - val_accuracy: 0.9236 - val_loss: 0.2776
+# accuracy: 0.7006 - loss: 1.0429 - val_accuracy: 0.9232 - val_loss: 0.2801
+# accuracy: 0.7102 - loss: 1.3015 - val_accuracy: 0.9276 - val_loss: 0.7656
+# accuracy: 0.7113 - loss: 1.6854 - val_accuracy: 0.9332 - val_loss: 1.3709
+
+# Actual
+# accuracy: 0.6178 - loss: 1.2374 - val_accuracy: 0.9164 - val_loss: 0.3045
+# accuracy: 0.3412 - loss: 1.8919 - val_accuracy: 0.8818 - val_loss: 0.4794
+# accuracy: 0.6948 - loss: 1.0394 - val_accuracy: 0.9186 - val_loss: 0.2859
+# accuracy: 0.6947 - loss: 1.0410 - val_accuracy: 0.9184 - val_loss: 0.2885
+# accuracy: 0.6996 - loss: 1.3013 - val_accuracy: 0.9228 - val_loss: 0.7735
+# accuracy: 0.7102 - loss: 1.6879 - val_accuracy: 0.9284 - val_loss: 1.3739
diff --git a/labs/03/mnist_regularization.py b/labs/03/mnist_regularization.py
index cd78fcf..0b2e5a2 100644
--- a/labs/03/mnist_regularization.py
+++ b/labs/03/mnist_regularization.py
@@ -3,7 +3,10 @@
 import datetime
 import os
 import re
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
 import torch
@@ -15,12 +18,20 @@
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
 parser.add_argument("--dropout", default=0, type=float, help="Dropout regularization.")
 parser.add_argument("--epochs", default=30, type=int, help="Number of epochs.")
-parser.add_argument("--hidden_layers", default=[400], nargs="*", type=int, help="Hidden layer sizes.")
+parser.add_argument(
+    "--hidden_layers", default=[400], nargs="*", type=int, help="Hidden layer sizes."
+)
 parser.add_argument("--label_smoothing", default=0, type=float, help="Label smoothing.")
-parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.")
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
-parser.add_argument("--weight_decay", default=0, type=float, help="Weight decay strength.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
+parser.add_argument(
+    "--weight_decay", default=0, type=float, help="Weight decay strength."
+)
 # If you add more arguments, ReCodEx will keep them with your default values.
 
 
@@ -32,7 +43,10 @@ def __init__(self, path):
     def writer(self, writer):
         if writer not in self._writers:
             import torch.utils.tensorboard
-            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(
+                os.path.join(self._path, writer)
+            )
         return self._writers[writer]
 
     def add_logs(self, writer, logs, step):
@@ -43,10 +57,24 @@ def add_logs(self, writer, logs, step):
 
     def on_epoch_end(self, epoch, logs=None):
         if logs:
-            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
-                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
-            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
-            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+            if isinstance(
+                getattr(self.model, "optimizer", None), keras.optimizers.Optimizer
+            ):
+                logs = logs | {
+                    "learning_rate": keras.ops.convert_to_numpy(
+                        self.model.optimizer.learning_rate
+                    )
+                }
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("val_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "val",
+                {k[4:]: v for k, v in logs.items() if k.startswith("val_")},
+                epoch + 1,
+            )
 
 
 def main(args: argparse.Namespace) -> dict[str, float]:
@@ -57,16 +85,24 @@ def main(args: argparse.Namespace) -> dict[str, float]:
         torch.set_num_interop_threads(args.threads)
 
     # Create logdir name
-    args.logdir = os.path.join("logs", "{}-{}-{}".format(
-        os.path.basename(globals().get("__file__", "notebook")),
-        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
-        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
-    ))
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
 
     # Load data
     mnist = MNIST(size={"train": 5_000})
 
-    # TODO: Incorporate dropout to the model below. Namely, add
+    # Incorporate dropout into the model below. Namely, add
     #   a `keras.layers.Dropout` layer with `args.dropout` rate after
     #   the `Flatten` layer and after each `Dense` hidden layer (but not after
     #   the output `Dense` layer).
@@ -74,11 +110,15 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     model = keras.Sequential()
     model.add(keras.layers.Rescaling(1 / 255))
     model.add(keras.layers.Flatten())
+    model.add(keras.layers.Dropout(args.dropout))
+
     for hidden_layer in args.hidden_layers:
         model.add(keras.layers.Dense(hidden_layer, activation="relu"))
+        model.add(keras.layers.Dropout(rate=args.dropout))
+
     model.add(keras.layers.Dense(MNIST.LABELS, activation="softmax"))
 
-    # TODO: Implement label smoothing with the given `args.label_smoothing` strength.
+    # Implement label smoothing with the given `args.label_smoothing` strength.
     # You need to change the `SparseCategorical{Crossentropy,Accuracy}` to
     # `Categorical{Crossentropy,Accuracy}`, because `label_smoothing` is supported
     # only by the `CategoricalCrossentropy`. That means you also need to modify
@@ -86,29 +126,52 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     # of the gold class to a full categorical distribution (you can use either NumPy,
     # or there is a helper method also in the `keras.utils` module).
 
-    # TODO: Create a `keras.optimizers.AdamW`, using the default learning
+    # Create a `keras.optimizers.AdamW`, using the default learning
     # rate and a weight decay of strength `args.weight_decay`. Then call the
     # `exclude_from_weight_decay` method to specify that all variables with "bias"
     # in their name should not be decayed.
-    optimizer = ...
-
-    model.compile(
-        optimizer=optimizer,
-        loss=keras.losses.SparseCategoricalCrossentropy(),
-        metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
-    )
+    optimizer = keras.optimizers.AdamW(weight_decay=args.weight_decay)
+    optimizer.exclude_from_weight_decay(var_names=["bias"])
+
+    s = args.label_smoothing != 0
+
+    if s:
+        model.compile(
+            optimizer=optimizer,
+            loss=keras.losses.CategoricalCrossentropy(label_smoothing=args.label_smoothing),
+            metrics=[keras.metrics.CategoricalAccuracy(name="accuracy")],
+        )
+    else:
+        model.compile(
+            optimizer=optimizer,
+            loss=keras.losses.SparseCategoricalCrossentropy(),
+            metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
+        )
 
     tb_callback = TorchTensorBoardCallback(args.logdir)
 
     logs = model.fit(
-        mnist.train.data["images"], mnist.train.data["labels"],
-        batch_size=args.batch_size, epochs=args.epochs,
-        validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]),
+        mnist.train.data["images"],
+        keras.utils.to_categorical(
+            mnist.train.data["labels"], num_classes=mnist.LABELS
+        ) if s else mnist.train.data["labels"],
+        batch_size=args.batch_size,
+        epochs=args.epochs,
+        validation_data=(
+            mnist.dev.data["images"],
+            keras.utils.to_categorical(
+                mnist.dev.data["labels"], num_classes=mnist.LABELS
+            ) if s else mnist.dev.data["labels"],
+        ),
         callbacks=[tb_callback],
     )
 
     # Return development metrics for ReCodEx to validate.
-    return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")}
+    return {
+        metric: values[-1]
+        for metric, values in logs.history.items()
+        if metric.startswith("val_")
+    }
 
 
 if __name__ == "__main__":
diff --git a/labs/03/uppercase.py b/labs/03/uppercase.py
index c975e3f..c83d5c5 100644
--- a/labs/03/uppercase.py
+++ b/labs/03/uppercase.py
@@ -10,16 +10,16 @@
 
 from uppercase_data import UppercaseData
 
-# TODO: Set reasonable values for the hyperparameters, especially for
+# Set reasonable values for the hyperparameters, especially for
 # `alphabet_size`, `batch_size`, `epochs`, and `window`.
 # Also, you can set the number of threads to 0 to use all your CPU cores.
 parser = argparse.ArgumentParser()
-parser.add_argument("--alphabet_size", default=..., type=int, help="If given, use this many most frequent chars.")
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
-parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--alphabet_size", default=70, type=int, help="If given, use this many most frequent chars.")
+parser.add_argument("--batch_size", default=1024, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=2, type=int, help="Number of epochs.")
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
-parser.add_argument("--window", default=..., type=int, help="Window size to use.")
+parser.add_argument("--threads", default=0, type=int, help="Maximum number of threads to use.")
+parser.add_argument("--window", default=4, type=int, help="Window size to use.")
 
 
 class TorchTensorBoardCallback(keras.callbacks.Callback):
@@ -64,7 +64,7 @@ def main(args: argparse.Namespace) -> None:
     # Load data
     uppercase_data = UppercaseData(args.window, args.alphabet_size)
 
-    # TODO: Implement a suitable model, optionally including regularization, select
+    # Implement a suitable model, optionally including regularization, select
     # good hyperparameters and train the model.
     #
     # The inputs are _windows_ of fixed size (`args.window` characters on the left,
@@ -79,16 +79,34 @@ def main(args: argparse.Namespace) -> None:
     #   You can then flatten the one-hot encoded windows and follow with a dense layer.
     # - Alternatively, you can use `keras.layers.Embedding` (which is an efficient
     #   implementation of one-hot encoding followed by a Dense layer) and flatten afterwards.
-    model = ...
+    model = keras.Sequential([
+        keras.layers.InputLayer(shape=[2 * args.window + 1], dtype="int32"),
+        # `Embedding` looks up integer character ids directly, so no one-hot encoding precedes it.
+        keras.layers.Embedding(len(uppercase_data.train.alphabet), 8),
+
+        keras.layers.Flatten(),
+        keras.layers.Dense(64, activation="relu"),
+        keras.layers.Dropout(rate=0.5),
+        keras.layers.Dense(1, activation="sigmoid"),  # Sigmoid output for binary classification.
+    ])
+
+    # Generate correctly capitalized test set.
+
+    predictions = model.predict(uppercase_data.test.data["windows"], batch_size=args.batch_size)
 
-    # TODO: Generate correctly capitalized test set.
     # Use `uppercase_data.test.text` as input, capitalize suitable characters,
     # and write the result to predictions_file (which is
     # `uppercase_test.txt` in the `args.logdir` directory).
     os.makedirs(args.logdir, exist_ok=True)
     with open(os.path.join(args.logdir, "uppercase_test.txt"), "w", encoding="utf-8") as predictions_file:
-        ...
-
+        new_text = ""
+        for pred, word in zip(predictions, uppercase_data.test.text):
+            if pred > .5:
+                new_word = word.upper()
+                new_text += new_word
+            else:
+                new_text += word  # Keep characters predicted as lowercase unchanged.
+        predictions_file.write(new_text)
 
 if __name__ == "__main__":
     args = parser.parse_args([] if "__file__" not in globals() else None)
diff --git a/labs/04/cifar10.py b/labs/04/cifar10.py
index 0ed0533..ec06755 100644
--- a/labs/04/cifar10.py
+++ b/labs/04/cifar10.py
@@ -33,7 +33,8 @@ def dataset(self, transform: Callable[[dict[str, np.ndarray]], Any] | None = Non
             return CIFAR10.TorchDataset(self, transform)
 
     class TorchDataset(torch.utils.data.Dataset):
-        def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None:
+        def __init__(self, dataset: "CIFAR10.Dataset",
+                     transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None:
             self._dataset = dataset
             self._transform = transform
 
diff --git a/labs/04/cifar_competition.ps1 b/labs/04/cifar_competition.ps1
new file mode 100644
index 0000000..0d919fe
--- /dev/null
+++ b/labs/04/cifar_competition.ps1
@@ -0,0 +1 @@
+clear; python .\cifar_competition.py
diff --git a/labs/04/cifar_competition.py b/labs/04/cifar_competition.py
index 0541de8..be29019 100644
--- a/labs/04/cifar_competition.py
+++ b/labs/04/cifar_competition.py
@@ -3,7 +3,10 @@
 import datetime
 import os
 import re
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
 import numpy as np
@@ -11,13 +14,23 @@
 
 from cifar10 import CIFAR10
 
-# TODO: Define reasonable defaults and optionally more parameters.
+# Define reasonable defaults and optionally more parameters.
 # Also, you can set the number of threads to 0 to use all your CPU cores.
 parser = argparse.ArgumentParser()
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
-parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--batch_size", default=128, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=30, type=int, help="Number of epochs.")
+# parser.add_argument("--epochs", default=200, type=int, help="Number of epochs.")
+parser.add_argument("--learning_rate", default=0.001, type=float, help="Initial learning rate.")
+parser.add_argument(
+    "--weight_decay", default=1e-4, type=float, help="L2 regularization weight decay."
+)
+parser.add_argument(
+    "--label_smoothing", default=0.1, type=float, help="Label smoothing."
+)
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
 
 
 class TorchTensorBoardCallback(keras.callbacks.Callback):
@@ -28,7 +41,10 @@ def __init__(self, path):
     def writer(self, writer):
         if writer not in self._writers:
             import torch.utils.tensorboard
-            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(
+                os.path.join(self._path, writer)
+            )
         return self._writers[writer]
 
     def add_logs(self, writer, logs, step):
@@ -39,13 +55,51 @@ def add_logs(self, writer, logs, step):
 
     def on_epoch_end(self, epoch, logs=None):
         if logs:
-            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
-                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
-            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
-            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
-
+            if isinstance(
+                getattr(self.model, "optimizer", None), keras.optimizers.Optimizer
+            ):
+                logs = logs | {
+                    "learning_rate": keras.ops.convert_to_numpy(
+                        self.model.optimizer.learning_rate
+                    )
+                }
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("val_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "val",
+                {k[4:]: v for k, v in logs.items() if k.startswith("val_")},
+                epoch + 1,
+            )
+
+def create_res(input_layer, filters, kernel_size, strides):
+    h = keras.layers.Conv2D(
+        filters=filters,
+        kernel_size=kernel_size,
+        strides=strides,
+        padding="same",
+        activation=None,
+    )(input_layer)
+
+    h = keras.layers.BatchNormalization()(h)
+    h = keras.layers.Activation("relu")(h)
+    h = keras.layers.Conv2D(
+        filters=filters,
+        kernel_size=kernel_size,
+        strides=1,
+        padding="same",
+        activation=None,
+        use_bias=False,
+    )(h)
+    h = keras.layers.BatchNormalization()(h)
+    h = keras.layers.Add()([input_layer, h])
+    h = keras.layers.Activation("relu")(h)
+    return h
 
 def main(args: argparse.Namespace) -> None:
+
     # Set the random seed and the number of threads.
     keras.utils.set_random_seed(args.seed)
     if args.threads:
@@ -53,23 +107,75 @@ def main(args: argparse.Namespace) -> None:
         torch.set_num_interop_threads(args.threads)
 
     # Create logdir name
-    args.logdir = os.path.join("logs", "{}-{}-{}".format(
-        os.path.basename(globals().get("__file__", "notebook")),
-        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
-        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
-    ))
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
 
     # Load data
     cifar = CIFAR10()
 
-    # TODO: Create the model and train it
-    model = ...
+    # Create the model and train it
+    inputs = keras.Input(shape=cifar.train.data["images"][0].shape)
+    h = keras.layers.Rescaling(1 / 255)(inputs)
+    h = keras.layers.Conv2D(64, 3, 1, "same", activation="relu")(h)
+    h = create_res(h, 64, 3, 1)
+    h = keras.layers.MaxPool2D(2)(h)
+    h = create_res(h, 64, 3, 1)
+    h = keras.layers.MaxPool2D(2)(h)
+    h = keras.layers.Dropout(0.2)(h)
+    h = create_res(h, 64, 3, 1)
+    h = keras.layers.Flatten()(h)
+    h = keras.layers.Dropout(0.2)(h)
+    h = keras.layers.Dense(200, activation="relu")(h)
+    outputs = keras.layers.Dense(len(CIFAR10.LABELS), activation="softmax")(h)
+
+    model = keras.Model(inputs=inputs, outputs=outputs)
+
+    model.summary()
+
+
+    lr_optimizer = keras.optimizers.schedules.CosineDecay(
+        initial_learning_rate=args.learning_rate,
+        decay_steps=(len(cifar.train.data["images"]) + args.batch_size - 1) // args.batch_size * args.epochs,
+    )
+
+    model.compile(
+        optimizer=keras.optimizers.Adam(
+            learning_rate=lr_optimizer,
+            weight_decay=args.weight_decay),
+        loss=keras.losses.SparseCategoricalCrossentropy(),
+        metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
+    )
+
+    model.fit(
+        cifar.train.data["images"],
+        cifar.train.data["labels"],
+        batch_size=args.batch_size,
+        epochs=args.epochs,
+
+    )
+
+    model.save(os.path.join(args.logdir, "cifar.h5"), include_optimizer=False)
 
     # Generate test set annotations, but in `args.logdir` to allow parallel execution.
     os.makedirs(args.logdir, exist_ok=True)
-    with open(os.path.join(args.logdir, "cifar_competition_test.txt"), "w", encoding="utf-8") as predictions_file:
-        # TODO: Perform the prediction on the test data.
-        for probs in model.predict(...):
+    with open(
+        os.path.join(args.logdir, "cifar_competition_test.txt"), "w", encoding="utf-8"
+    ) as predictions_file:
+        # Perform the prediction on the test data.
+        for probs in model.predict(
+            cifar.test.data["images"], batch_size=args.batch_size
+        ):
             print(np.argmax(probs), file=predictions_file)
 
 
diff --git a/labs/04/mnist_cnn results.txt b/labs/04/mnist_cnn results.txt
new file mode 100644
index 0000000..63271eb
--- /dev/null
+++ b/labs/04/mnist_cnn results.txt	
@@ -0,0 +1,29 @@
+👉 TEST 1
+python3 mnist_cnn.py --epochs=1 --cnn=F,H-100
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 12ms/step - accuracy: 0.8499 - loss: 0.5317 - val_accuracy: 0.9618 - val_loss: 0.1400
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432
+
+👉 TEST 2
+python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 13ms/step - accuracy: 0.7662 - loss: 0.7543 - val_accuracy: 0.9576 - val_loss: 0.1612
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7706 - loss: 0.7444 - val_accuracy: 0.9572 - val_loss: 0.1606
+
+👉 TEST 3
+python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 13ms/step - accuracy: 0.6706 - loss: 1.0717 - val_accuracy: 0.8814 - val_loss: 0.3802
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6630 - loss: 1.0703 - val_accuracy: 0.8798 - val_loss: 0.3894
+
+👉 TEST 4
+python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 18s 16ms/step - accuracy: 0.5799 - loss: 1.2751 - val_accuracy: 0.8898 - val_loss: 0.3616
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.5898 - loss: 1.2535 - val_accuracy: 0.8774 - val_loss: 0.4079
+
+👉 TEST 5
+python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 20s 17ms/step - accuracy: 0.6976 - loss: 0.9518 - val_accuracy: 0.9228 - val_loss: 0.2614
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6822 - loss: 1.0011 - val_accuracy: 0.9284 - val_loss: 0.2537
+
+👉 TEST 6
+python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 31s 27ms/step - accuracy: 0.7476 - loss: 0.7841 - val_accuracy: 0.9370 - val_loss: 0.2037
+1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7562 - loss: 0.7717 - val_accuracy: 0.9486 - val_loss: 0.1734
diff --git a/labs/04/mnist_cnn.ps1 b/labs/04/mnist_cnn.ps1
new file mode 100644
index 0000000..bf78797
--- /dev/null
+++ b/labs/04/mnist_cnn.ps1
@@ -0,0 +1,30 @@
+""
+"👉 TEST 1"
+"python3 mnist_cnn.py --epochs=1 --cnn=F,H-100"
+python3 mnist_cnn.py --epochs=1 --cnn=F,H-100
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432"
+""
+"👉 TEST 2"
+"python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5"
+python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7706 - loss: 0.7444 - val_accuracy: 0.9572 - val_loss: 0.1606"
+""
+"👉 TEST 3"
+"python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50"
+python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6630 - loss: 1.0703 - val_accuracy: 0.8798 - val_loss: 0.3894"
+""
+"👉 TEST 4"
+"python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50"
+python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.5898 - loss: 1.2535 - val_accuracy: 0.8774 - val_loss: 0.4079"
+""
+"👉 TEST 5"
+"python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32"
+python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6822 - loss: 1.0011 - val_accuracy: 0.9284 - val_loss: 0.2537"
+""
+"👉 TEST 6"
+"python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50"
+python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50
+"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7562 - loss: 0.7717 - val_accuracy: 0.9486 - val_loss: 0.1734"
diff --git a/labs/04/mnist_cnn.py b/labs/04/mnist_cnn.py
index a3a91cd..b3c5727 100644
--- a/labs/04/mnist_cnn.py
+++ b/labs/04/mnist_cnn.py
@@ -1,9 +1,14 @@
 #!/usr/bin/env python3
 import argparse
 import os
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+import re
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
+from keras.layers import add
 import torch
 
 from mnist import MNIST
@@ -11,42 +16,115 @@
 parser = argparse.ArgumentParser()
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
-parser.add_argument("--cnn", default=None, type=str, help="CNN architecture.")
+parser.add_argument(
+    "--cnn",
+    default="CB-16-5-2-same,M-3-2,F,H-100,D-0.5",
+    type=str,
+    help="CNN architecture.",
+)
 parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
-parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.")
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
 # If you add more arguments, ReCodEx will keep them with your default values.
 
 
+def create_layer(layer_type, layer_args, hidden):
+    if layer_type == "C":
+        filters, kernel_size, stride, padding = layer_args
+        hidden = keras.layers.Conv2D(
+            filters=int(filters),
+            kernel_size=(int(kernel_size)),
+            strides=(int(stride)),
+            padding=padding,
+            activation="relu",
+        )(hidden)
+
+        return hidden
+
+    # - `CB-filters-kernel_size-stride-padding`: Same as `C`, but use batch normalization.
+    #   In detail, start with a convolutional layer **without bias** and activation,
+    #   then add a batch normalization layer, and finally the ReLU activation.
+    if layer_type == "CB":
+        filters, kernel_size, stride, padding = layer_args
+        hidden = keras.layers.Conv2D(
+            filters=int(filters),
+            kernel_size=(int(kernel_size)),
+            strides=(int(stride)),
+            padding=padding,
+            use_bias=False,
+        )(hidden)
+        hidden = keras.layers.BatchNormalization()(hidden)
+        hidden = keras.layers.ReLU()(hidden)
+        return hidden
+
+    # - `M-pool_size-stride`: Add max pooling with specified size and stride, using
+    #   the default "valid" padding.
+    if layer_type == "M":
+        pool_size, stride = layer_args
+        hidden = keras.layers.MaxPooling2D(
+            pool_size=int(pool_size),
+            strides=(int(stride)),
+        )(hidden)
+        return hidden
+
+    # - `R-[layers]`: Add a residual connection. The `layers` contain a specification
+    #   of at least one convolutional layer (but not a recursive residual connection `R`).
+    #   The input to the `R` layer should be processed sequentially by `layers`, and the
+    #   produced output (after the ReLU nonlinearity of the last layer) should be added
+    #   to the input (of this `R` layer).
+    if layer_type == "R":
+        input_layer = hidden
+        layers = "-".join(layer_args)[1:-1].split(",")
+
+        for layer in layers:
+            layer_type, *layer_args = layer.split("-")
+
+            hidden = create_layer(layer_type, layer_args, hidden)
+
+        hidden = keras.layers.Add()([input_layer, hidden])
+
+        return hidden
+
+    # - `F`: Flatten inputs. Must appear exactly once in the architecture.
+    if layer_type == "F":
+        hidden = keras.layers.Flatten()(hidden)
+        return hidden
+
+    # - `H-hidden_layer_size`: Add a dense layer with ReLU activation and the specified size.
+    if layer_type == "H":
+        hidden_layer_size,  = layer_args
+        hidden = keras.layers.Dense(units=int(hidden_layer_size), activation="relu")(hidden)
+        return hidden
+
+    # - `D-dropout_rate`: Apply dropout with the given dropout rate.
+    if layer_type == "D":
+        dropout_rate, = layer_args
+        hidden = keras.layers.Dropout(rate=float(dropout_rate))(hidden)
+        return hidden
+
+
 class Model(keras.Model):
     def __init__(self, args: argparse.Namespace) -> None:
-        # TODO: Create the model. The template uses the functional API, but
+        # Create the model. The template uses the functional API, but
         # feel free to use subclassing if you want.
         inputs = keras.Input(shape=[MNIST.H, MNIST.W, MNIST.C])
         hidden = keras.layers.Rescaling(1 / 255)(inputs)
 
-        # TODO: Add CNN layers specified by `args.cnn`, which contains
-        # a comma-separated list of the following layers:
-        # - `C-filters-kernel_size-stride-padding`: Add a convolutional layer with ReLU
-        #   activation and specified number of filters, kernel size, stride and padding.
-        # - `CB-filters-kernel_size-stride-padding`: Same as `C`, but use batch normalization.
-        #   In detail, start with a convolutional layer **without bias** and activation,
-        #   then add a batch normalization layer, and finally the ReLU activation.
-        # - `M-pool_size-stride`: Add max pooling with specified size and stride, using
-        #   the default "valid" padding.
-        # - `R-[layers]`: Add a residual connection. The `layers` contain a specification
-        #   of at least one convolutional layer (but not a recursive residual connection `R`).
-        #   The input to the `R` layer should be processed sequentially by `layers`, and the
-        #   produced output (after the ReLU nonlinearity of the last layer) should be added
-        #   to the input (of this `R` layer).
-        # - `F`: Flatten inputs. Must appear exactly once in the architecture.
-        # - `H-hidden_layer_size`: Add a dense layer with ReLU activation and the specified size.
-        # - `D-dropout_rate`: Apply dropout with the given dropout rate.
+        cnn_args = re.split(r",(?![^\[]*\])", args.cnn)
+
+        for layer in cnn_args:
+            layer_type, *layer_args = layer.split("-")
+
+            hidden = create_layer(layer_type, layer_args, hidden)
+
         # You can assume the resulting network is valid; it is fine to crash if it is not.
         #
         # Produce the results in the variable `hidden`.
-        hidden = ...
 
         # Add the final output layer
         outputs = keras.layers.Dense(MNIST.LABELS, activation="softmax")(hidden)
@@ -73,13 +151,19 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     model = Model(args)
 
     logs = model.fit(
-        mnist.train.data["images"], mnist.train.data["labels"],
-        batch_size=args.batch_size, epochs=args.epochs,
+        mnist.train.data["images"],
+        mnist.train.data["labels"],
+        batch_size=args.batch_size,
+        epochs=args.epochs,
         validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]),
     )
 
     # Return development metrics for ReCodEx to validate.
-    return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")}
+    return {
+        metric: values[-1]
+        for metric, values in logs.history.items()
+        if metric.startswith("val_")
+    }
 
 
 if __name__ == "__main__":
diff --git a/labs/04/mnist_multiple.ps1 b/labs/04/mnist_multiple.ps1
new file mode 100644
index 0000000..3416b36
--- /dev/null
+++ b/labs/04/mnist_multiple.ps1
@@ -0,0 +1,11 @@
+""
+"👉 TEST 1"
+"python3 mnist_multiple.py --epochs=1 --batch_size=50"
+python3 mnist_multiple.py --epochs=1 --batch_size=50
+"275/275 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - direct_comparison_accuracy: 0.7993 - indirect_comparison_accuracy: 0.8930 - loss: 1.6710 - val_direct_comparison_accuracy: 0.9508 - val_indirect_comparison_accuracy: 0.9836 - val_loss: 0.2984"
+""
+"👉 TEST 2"
+"python3 mnist_multiple.py --epochs=1 --batch_size=100"
+python3 mnist_multiple.py --epochs=1 --batch_size=100
+"275/275 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - direct_comparison_accuracy: 0.7680 - indirect_comparison_accuracy: 0.8637 - loss: 2.1429 - val_direct_comparison_accuracy: 0.9288 - val_indirect_comparison_accuracy: 0.9772 - val_loss: 0.4157"
+""
diff --git a/labs/04/mnist_multiple.py b/labs/04/mnist_multiple.py
index 06b9d9e..def13ab 100644
--- a/labs/04/mnist_multiple.py
+++ b/labs/04/mnist_multiple.py
@@ -1,7 +1,10 @@
 #!/usr/bin/env python3
 import argparse
 import os
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import numpy as np
 import keras
@@ -13,9 +16,13 @@
 # These arguments will be set appropriately by ReCodEx, even if you change them.
 parser.add_argument("--batch_size", default=50, type=int, help="Batch size.")
 parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.")
-parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.")
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
 # If you add more arguments, ReCodEx will keep them with your default values.
 
 
@@ -27,7 +34,7 @@ def __init__(self, args: argparse.Namespace) -> None:
             keras.Input(shape=[MNIST.H, MNIST.W, MNIST.C]),
         )
 
-        # TODO: The model starts by passing each input image through the same
+        # The model starts by passing each input image through the same
         # subnetwork (with shared weights), which should perform
         # - keras.layers.Rescaling(1 / 255) to convert images to floats in [0, 1] range,
         # - convolution with 10 filters, 3x3 kernel size, stride 2, "valid" padding, ReLU activation,
@@ -36,24 +43,49 @@ def __init__(self, args: argparse.Namespace) -> None:
         # - fully connected layer with 200 neurons and ReLU activation,
         # obtaining a 200-dimensional feature vector FV of each image.
 
-        # TODO: Using the computed representations, the model should produce four outputs:
+        rescale = keras.layers.Rescaling(1 / 255)
+        c1 = keras.layers.Conv2D(
+            filters=10, kernel_size=3, strides=2, padding="valid", activation="relu"
+        )
+        c2 = keras.layers.Conv2D(
+            filters=20, kernel_size=3, strides=2, padding="valid", activation="relu"
+        )
+        flat = keras.layers.Flatten()
+        hidden = keras.layers.Dense(200, activation="relu")
+
+        fv1 = hidden(flat(c2(c1(rescale(images[0])))))
+        fv2 = hidden(flat(c2(c1(rescale(images[1])))))
+
+        # Using the computed representations, the model should produce four outputs:
         # - first, compute _direct comparison_ whether the first digit is
         #   greater than the second, by
         #   - concatenating the two 200-dimensional image representations FV,
         #   - processing them using another 200-neuron ReLU dense layer
         #   - computing one output using a dense layer with "sigmoid" activation
+        concatenation = keras.layers.Concatenate()([fv1, fv2])
+        hidden2 = keras.layers.Dense(200, activation="relu")
+        pred_layer = keras.layers.Dense(1, activation="sigmoid")
+        direct_comparison = pred_layer(hidden2(concatenation))
         # - then, classify the computed representation FV of the first image using
         #   a densely connected softmax layer into 10 classes;
         # - then, classify the computed representation FV of the second image using
         #   the same layer (identical, i.e., with shared weights) into 10 classes;
+        classification_layer = keras.layers.Dense(10, activation="softmax")
+        d1 = classification_layer(fv1)
+        d2 = classification_layer(fv2)
         # - finally, compute _indirect comparison_ whether the first digit
         #   is greater than second, by comparing the predictions from the above
         #   two outputs; convert the comparison to "float32" using `keras.ops.cast`.
         outputs = {
-            "direct_comparison": ...,
-            "digit_1": ...,
-            "digit_2": ...,
-            "indirect_comparison": ...,
+            "direct_comparison": direct_comparison,
+            "digit_1": d1,
+            "digit_2": d2,
+            "indirect_comparison": keras.ops.cast(
+                keras.ops.greater(
+                    keras.ops.argmax(d1, axis=1), keras.ops.argmax(d2, axis=1)
+                ),
+                "float32",
+            ),
         }
 
         # Finally, construct the model.
@@ -65,7 +97,7 @@ def __init__(self, args: argparse.Namespace) -> None:
         # the keys of the `outputs` dictionary.
         self.output_names = sorted(outputs.keys())
 
-        # TODO: Define the appropriate losses for the model outputs
+        # Define the appropriate losses for the model outputs
         # "direct_comparison", "digit_1", "digit_2". Regarding metrics,
         # the accuracy of both the direct and indirect comparisons should be
         # computed; name both metrics "accuracy" (i.e., pass "accuracy" as the
@@ -73,19 +105,25 @@ def __init__(self, args: argparse.Namespace) -> None:
         self.compile(
             optimizer=keras.optimizers.Adam(),
             loss={
-                "direct_comparison": ...,
-                "digit_1": ...,
-                "digit_2": ...,
+                "direct_comparison": keras.losses.BinaryCrossentropy(),
+                "digit_1": keras.losses.SparseCategoricalCrossentropy(),
+                "digit_2": keras.losses.SparseCategoricalCrossentropy(),
             },
             metrics={
-                "direct_comparison": [...],
-                "indirect_comparison": [...],
+                "direct_comparison": [
+                    keras.metrics.BinaryAccuracy(name="accuracy"),
+                ],
+                "indirect_comparison": [
+                    keras.metrics.BinaryAccuracy(name="accuracy"),
+                ],
             },
         )
 
     # Create an appropriate dataset using the MNIST data.
     def create_dataset(
-        self, mnist_dataset: MNIST.Dataset, args: argparse.Namespace,
+        self,
+        mnist_dataset: MNIST.Dataset,
+        args: argparse.Namespace,
     ) -> torch.utils.data.Dataset:
         # Original MNIST dataset.
         images, labels = mnist_dataset.data["images"], mnist_dataset.data["labels"]
@@ -94,16 +132,27 @@ def create_dataset(
         # You can assume that the size of the original dataset is even.
         class TorchDataset(torch.utils.data.Dataset):
             def __len__(self) -> int:
-                # TODO: The new dataset has half the size of the original one.
-                return ...
+                # The new dataset has half the size of the original one.
+                return len(images) // 2
 
-            def __getitem__(self, index: int) -> tuple[tuple[np.ndarray, np.ndarray], dict[str, np.ndarray]]:
-                # TODO: Given an `index`, generate a dataset element suitable for our model.
+            def __getitem__(
+                self, index: int
+            ) -> tuple[tuple[np.ndarray, np.ndarray], dict[str, np.ndarray]]:
+                # Given an `index`, generate a dataset element suitable for our model.
                 # Notably, the element should be a pair `(input, output)`, with
                 # - `input` being a pair of images `(images[2 * index], images[2 * index + 1])`,
                 # - `output` being a dictionary with keys "digit_1", "digit_2", "direct_comparison",
                 #   and "indirect_comparison".
-                return ...
+                return (
+                    (images[2 * index], images[2 * index + 1]),
+                    {
+                        "digit_1": labels[2 * index],
+                        "digit_2": labels[2 * index + 1],
+                        "direct_comparison": labels[2 * index] > labels[2 * index + 1],
+                        "indirect_comparison": labels[2 * index]
+                        > labels[2 * index + 1],
+                    },
+                )
 
         return TorchDataset()
 
@@ -122,14 +171,22 @@ def main(args: argparse.Namespace) -> dict[str, float]:
     model = Model(args)
 
     # Construct suitable dataloaders from the MNIST data.
-    train = torch.utils.data.DataLoader(model.create_dataset(mnist.train, args), args.batch_size, shuffle=True)
-    dev = torch.utils.data.DataLoader(model.create_dataset(mnist.dev, args), args.batch_size)
+    train = torch.utils.data.DataLoader(
+        model.create_dataset(mnist.train, args), args.batch_size, shuffle=True
+    )
+    dev = torch.utils.data.DataLoader(
+        model.create_dataset(mnist.dev, args), args.batch_size
+    )
 
     # Train
     logs = model.fit(train, epochs=args.epochs, validation_data=dev)
 
     # Return development metrics for ReCodEx to validate.
-    return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")}
+    return {
+        metric: values[-1]
+        for metric, values in logs.history.items()
+        if metric.startswith("val_")
+    }
 
 
 if __name__ == "__main__":
diff --git a/labs/04/torch_dataset.ps1 b/labs/04/torch_dataset.ps1
new file mode 100644
index 0000000..46fa378
--- /dev/null
+++ b/labs/04/torch_dataset.ps1
@@ -0,0 +1,11 @@
+# ""
+# "👉 TEST 1"
+# "python3 torch_dataset.py --epochs=1 --batch_size=100"
+# python3 torch_dataset.py --epochs=1 --batch_size=100
+# "50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 33ms/step - accuracy: 0.1297 - loss: 2.2519 - val_accuracy: 0.2710 - val_loss: 1.9796"
+""
+"👉 TEST 2"
+"python3 torch_dataset.py --epochs=1 --batch_size=50 --augment"
+python3 torch_dataset.py --epochs=1 --batch_size=50 --augment
+"100/100 ━━━━━━━━━━━━━━━━━━━━ 4s 34ms/step - accuracy: 0.1354 - loss: 2.2565 - val_accuracy: 0.2690 - val_loss: 1.9889"
+""
diff --git a/labs/04/torch_dataset.py b/labs/04/torch_dataset.py
index 5e0c330..f689e54 100644
--- a/labs/04/torch_dataset.py
+++ b/labs/04/torch_dataset.py
@@ -53,54 +53,67 @@ def main(args: argparse.Namespace) -> dict[str, float]:
         metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
     )
 
-    # TODO: Create a Torch dataset constructible from the given `CIFAR10.Dataset`.
+    # Create a Torch dataset constructible from the given `CIFAR10.Dataset`.
     # You should use only the first `size` examples of the dataset, and optional
     # augmentation function `augmentation_fn` may be applied to the images.
     class TorchDataset(torch.utils.data.Dataset):
+        images: np.ndarray
+        labels: np.ndarray
+        augmentation_fn: callable
+
         def __init__(self, cifar: CIFAR10.Dataset, size: int, augmentation_fn=None) -> None:
-            # TODO: Note that the images and labels are available in `cifar.data["images"]`
+            # Note that the images and labels are available in `cifar.data["images"]`
             # and `cifar.data["labels"]`.
-            ...
+            self.images = cifar.data["images"][:size]
+            self.labels = cifar.data["labels"][:size]
+            self.augmentation_fn = augmentation_fn
 
         def __len__(self) -> int:
-            # TODO: Return the appropriate size.
-            ...
+            # Return the appropriate size.
+            size = len(self.images)
+            return size
+
 
         def __getitem__(self, index: int) -> tuple[np.ndarray | torch.Tensor, int]:
-            # TODO: Return the `index`-th example from the dataset, with the image optionally
+            # Return the `index`-th example from the dataset, with the image optionally
             # passed through the `augmentation_fn` if it is not `None`.
-            ...
+            return (self.augmentation_fn(self.images[index]) if self.augmentation_fn else self.images[index]), self.labels[index]
 
     if args.augment:
         # Construct a sequence of augmentation transformations from `torchvision.transforms.v2`.
         transformation = v2.Compose([
-            # TODO: Add the following transformations:
+            # Add the following transformations:
             # - first create a `v2.RandomResize` that scales the image to
             #   random size in range [28, 36],
             # - then add `v2.Pad` that pads the image with 4 pixels on each side,
             # - then add `v2.RandomCrop` that chooses a random crop of size 32x32,
             # - and finally add `v2.RandomHorizontalFlip` that uniformly
             #   randomly flips the image horizontally.
-            ...
+            v2.RandomResize(28, 36),
+            v2.Pad(4),
+            v2.RandomCrop(32),
+            v2.RandomHorizontalFlip(),
         ])
 
         def augmentation_fn(image: np.ndarray) -> torch.Tensor:
-            # TODO: First, convert the numpy `images` to a PyTorch tensor of uint8s,
+            # First, convert the numpy `images` to a PyTorch tensor of uint8s,
             # preferably by using `torch.from_numpy` or `torch.as_tensor` to avoid copying.
             # Then, because of the channels-position mismatch, permute the axes
             # in the image to change the order of the axes from HWC to CHW.
             # Next, apply the `transformation` to the image (by calling it with
             # the image as an argument), and finally permute the axes back to
             # the original order.
-            return ...
+
+            return transformation(torch.as_tensor(image).permute(2, 0, 1)).permute(1, 2, 0)
+
     else:
         augmentation_fn = None
 
-    # TODO: Create `train` and `dev` instances of `TorchDataset` from the corresponding
+    # Create `train` and `dev` instances of `TorchDataset` from the corresponding
     # `cifar` datasets. Limit their sizes to 5_000 and 1_000 examples, respectively,
     # and use the `augmentation_fn` for the training dataset.
-    train = ...
-    dev = ...
+    train = TorchDataset(cifar.train, 5_000, augmentation_fn)
+    dev = TorchDataset(cifar.dev, 1_000)
 
     if args.show_images:
         from torch.utils import tensorboard
@@ -114,10 +127,10 @@ def augmentation_fn(image: np.ndarray) -> torch.Tensor:
         tb_writer.close()
         print("Saved first {} training imaged to logs/{}".format(GRID * GRID, TAG))
 
-    # TODO: Create `train` and `dev` instances of `torch.utils.data.DataLoader` from
+    # Create `train` and `dev` instances of `torch.utils.data.DataLoader` from
     # the datasets, using the given `args.batch_size` and shuffling the training dataset.
-    train = ...
-    dev = ...
+    train = torch.utils.data.DataLoader(train, args.batch_size, shuffle=True)
+    dev = torch.utils.data.DataLoader(dev, args.batch_size)
 
     # Train
     logs = model.fit(train, epochs=args.epochs, validation_data=dev)
diff --git a/labs/05/cags_classification.py b/labs/05/cags_classification.py
index 6381238..e12a9eb 100644
--- a/labs/05/cags_classification.py
+++ b/labs/05/cags_classification.py
@@ -3,7 +3,10 @@
 import datetime
 import os
 import re
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
 import numpy as np
@@ -11,13 +14,19 @@
 
 from cags_dataset import CAGS
 
-# TODO: Define reasonable defaults and optionally more parameters.
+# Define reasonable defaults and optionally more parameters.
 # Also, you can set the number of threads to 0 to use all your CPU cores.
 parser = argparse.ArgumentParser()
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
-parser.add_argument("--epochs", default=None, type=int, help="Number of epochs.")
+parser.add_argument("--learning_rate", default=0.0001, type=float, help="Initial learning rate.")
+parser.add_argument(
+    "--weight_decay", default=1e-4, type=float, help="L2 regularization weight decay."
+)
+parser.add_argument("--batch_size", default=10, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=35, type=int, help="Number of epochs.")
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
 
 
 class TorchTensorBoardCallback(keras.callbacks.Callback):
@@ -28,7 +37,10 @@ def __init__(self, path):
     def writer(self, writer):
         if writer not in self._writers:
             import torch.utils.tensorboard
-            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(
+                os.path.join(self._path, writer)
+            )
         return self._writers[writer]
 
     def add_logs(self, writer, logs, step):
@@ -39,10 +51,24 @@ def add_logs(self, writer, logs, step):
 
     def on_epoch_end(self, epoch, logs=None):
         if logs:
-            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
-                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
-            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
-            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+            if isinstance(
+                getattr(self.model, "optimizer", None), keras.optimizers.Optimizer
+            ):
+                logs = logs | {
+                    "learning_rate": keras.ops.convert_to_numpy(
+                        self.model.optimizer.learning_rate
+                    )
+                }
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("val_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "val",
+                {k[4:]: v for k, v in logs.items() if k.startswith("val_")},
+                epoch + 1,
+            )
 
 
 def main(args: argparse.Namespace) -> None:
@@ -53,11 +79,19 @@ def main(args: argparse.Namespace) -> None:
         torch.set_num_interop_threads(args.threads)
 
     # Create logdir name
-    args.logdir = os.path.join("logs", "{}-{}-{}".format(
-        os.path.basename(globals().get("__file__", "notebook")),
-        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
-        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
-    ))
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
 
     # Load the data. The individual examples are dictionaries with the keys:
     # - "image", a `[224, 224, 3]` tensor of `torch.uint8` values in [0-255] range,
@@ -65,18 +99,70 @@ def main(args: argparse.Namespace) -> None:
     # - "label", a scalar of the correct class in `range(len(CAGS.LABELS))`.
     cags = CAGS()
 
+    train_images = np.array([entry["image"] for entry in cags.train])
+    train_labels = np.array([entry["label"] for entry in cags.train])
+
+    dev_images = np.array([entry["image"] for entry in cags.dev])
+    dev_labels = np.array([entry["label"] for entry in cags.dev])
+
+    test_images = np.array([entry["image"] for entry in cags.test])
+    test_labels = np.array([entry["label"] for entry in cags.test])
+
     # Load the EfficientNetV2-B0 model. It assumes the input images are
     # represented in the [0-255] range.
     backbone = keras.applications.EfficientNetV2B0(include_top=False, pooling="avg")
 
-    # TODO: Create the model and train it
-    model = ...
+    # Create the model and train it
+    model = keras.models.Sequential(
+        [
+            backbone,
+            keras.layers.Dense(len(CAGS.LABELS), activation="softmax"),
+        ]
+    )
+
+    model.compile(
+        optimizer=keras.optimizers.Adam(
+            learning_rate=keras.optimizers.schedules.CosineDecay(
+                initial_learning_rate=args.learning_rate,
+                decay_steps=(len(train_images) + args.batch_size - 1) // args.batch_size * args.epochs,
+            ),
+            weight_decay=args.weight_decay,
+        ),
+        loss=keras.losses.SparseCategoricalCrossentropy(),
+        metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
+    )
+
+    model.summary()
+
+    earlyStopping = keras.callbacks.EarlyStopping(
+
+
+    )
+
+    model.fit(
+        train_images,
+        train_labels,
+        batch_size=args.batch_size,
+        epochs=args.epochs,
+        validation_data=(dev_images, dev_labels),
+        callbacks=[
+            keras.callbacks.TensorBoard(args.logdir, update_freq="epoch"),
+            TorchTensorBoardCallback(args.logdir),
+            earlyStopping,
+        ],
+    )
 
     # Generate test set annotations, but in `args.logdir` to allow parallel execution.
     os.makedirs(args.logdir, exist_ok=True)
-    with open(os.path.join(args.logdir, "cags_classification.txt"), "w", encoding="utf-8") as predictions_file:
-        # TODO: Predict the probabilities on the test set
-        test_probabilities = model.predict(...)
+    with open(
+        os.path.join(args.logdir, "cags_classification.txt"), "w", encoding="utf-8"
+    ) as predictions_file:
+        # Predict the probabilities on the test set
+        test_probabilities = model.predict(
+            test_images,
+            batch_size=args.batch_size,
+            verbose=1,
+        )
 
         for probs in test_probabilities:
             print(np.argmax(probs), file=predictions_file)
diff --git a/labs/05/cags_dataset.py b/labs/05/cags_dataset.py
index 782d19f..698a88c 100644
--- a/labs/05/cags_dataset.py
+++ b/labs/05/cags_dataset.py
@@ -56,7 +56,7 @@ def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torc
             return CAGS.TransformedDataset(self, transform)
 
     class TransformedDataset(torch.utils.data.Dataset):
-        def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None:
+        def __init__(self, dataset: "CAGS.Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None:
             self._dataset = dataset
             self._transform = transform
 
@@ -66,6 +66,9 @@ def __len__(self) -> int:
         def __getitem__(self, index: int) -> Any:
             return self._transform(self._dataset[index])
 
+        def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset:
+            return CAGS.TransformedDataset(self, transform)
+
     def __init__(self) -> None:
         for dataset, size in [("train", 2_142), ("dev", 306), ("test", 612)]:
             path = "cags.{}.tfrecord".format(dataset)
@@ -116,19 +119,16 @@ def get_value_of_kind(kind: int) -> int:
 
                     get_value_of_kind(0x12)
                     if data[offset] == 0x0A:
-                        get_value_of_kind(0x0A)
-                        length = get_value_of_kind(0x0A)
+                        get_value_of_kind(0x0A); length = get_value_of_kind(0x0A)
                         entries[-1][key] = np.frombuffer(data, np.uint8, length, offset).copy(); offset += length
                     elif data[offset] == 0x1A:
-                        get_value_of_kind(0x1A)
-                        length = get_value_of_kind(0x0A)
+                        get_value_of_kind(0x1A); length = get_value_of_kind(0x0A)
                         values, target_offset = [], offset + length
                         while offset < target_offset:
                             values.append(get_value())
                         entries[-1][key] = np.array(values, dtype=np.int64)
                     elif data[offset] == 0x12:
-                        get_value_of_kind(0x12)
-                        length = get_value_of_kind(0x0A)
+                        get_value_of_kind(0x12); length = get_value_of_kind(0x0A)
                         entries[-1][key] = np.frombuffer(
                             data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).copy(); offset += length
                     else:
diff --git a/labs/05/cags_segmentation.py b/labs/05/cags_segmentation.py
index f81402b..e2ed25e 100644
--- a/labs/05/cags_segmentation.py
+++ b/labs/05/cags_segmentation.py
@@ -3,7 +3,10 @@
 import datetime
 import os
 import re
-os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
 
 import keras
 import numpy as np
@@ -11,13 +14,22 @@
 
 from cags_dataset import CAGS
 
-# TODO: Define reasonable defaults and optionally more parameters.
+# Define reasonable defaults and optionally more parameters.
 # Also, you can set the number of threads to 0 to use all your CPU cores.
 parser = argparse.ArgumentParser()
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
-parser.add_argument("--epochs", default=None, type=int, help="Number of epochs.")
+parser.add_argument("--batch_size", default=64, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=50, type=int, help="Number of epochs.")
 parser.add_argument("--seed", default=42, type=int, help="Random seed.")
-parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument(
+    "--threads", default=0, type=int, help="Maximum number of threads to use."
+)
+
+parser.add_argument(
+    "--learning_rate", default=0.001, type=float, help="(Initial) Learning rate."
+)
+parser.add_argument(
+    "--final_learning_rate", default=0.0001, type=float, help="Final learning rate."
+)
 
 
 class TorchTensorBoardCallback(keras.callbacks.Callback):
@@ -28,7 +40,10 @@ def __init__(self, path):
     def writer(self, writer):
         if writer not in self._writers:
             import torch.utils.tensorboard
-            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(
+                os.path.join(self._path, writer)
+            )
         return self._writers[writer]
 
     def add_logs(self, writer, logs, step):
@@ -39,10 +54,24 @@ def add_logs(self, writer, logs, step):
 
     def on_epoch_end(self, epoch, logs=None):
         if logs:
-            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
-                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
-            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
-            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+            if isinstance(
+                getattr(self.model, "optimizer", None), keras.optimizers.Optimizer
+            ):
+                logs = logs | {
+                    "learning_rate": keras.ops.convert_to_numpy(
+                        self.model.optimizer.learning_rate
+                    )
+                }
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("val_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "val",
+                {k[4:]: v for k, v in logs.items() if k.startswith("val_")},
+                epoch + 1,
+            )
 
 
 def main(args: argparse.Namespace) -> None:
@@ -53,11 +82,19 @@ def main(args: argparse.Namespace) -> None:
         torch.set_num_interop_threads(args.threads)
 
     # Create logdir name
-    args.logdir = os.path.join("logs", "{}-{}-{}".format(
-        os.path.basename(globals().get("__file__", "notebook")),
-        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
-        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
-    ))
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
 
     # Load the data. The individual examples are dictionaries with the keys:
     # - "image", a `[224, 224, 3]` tensor of `torch.uint8` values in [0-255] range,
@@ -65,27 +102,108 @@ def main(args: argparse.Namespace) -> None:
     # - "label", a scalar of the correct class in `range(len(CAGS.LABELS))`.
     cags = CAGS()
 
+    train_images = np.array([entry["image"] for entry in cags.train])
+    train_masks = np.array([entry["mask"] for entry in cags.train])
+
+    dev_images = np.array([entry["image"] for entry in cags.dev])
+    dev_masks = np.array([entry["mask"] for entry in cags.dev])
+
+    test_images = np.array([entry["image"] for entry in cags.test])
+    test_masks = np.array([entry["mask"] for entry in cags.test])
+
     # Load the EfficientNetV2-B0 model. It assumes the input images are
     # represented in the [0-255] range.
-    backbone = keras.applications.EfficientNetV2B0(include_top=False)
+    backbone = keras.applications.EfficientNetV2B0(include_top=False, input_shape=[cags.H, cags.W, cags.C])
+    backbone.trainable = False
 
     # Extract features of different resolution. Assuming 224x224 input images
     # (you can set this explicitly via `input_shape` of the above constructor),
     # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
     backbone = keras.Model(
         inputs=backbone.input,
-        outputs=[backbone.get_layer(layer).output for layer in [
-            "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]]
+        outputs=[
+            backbone.get_layer(layer).output
+            for layer in [
+                "top_activation",
+                "block5e_add",
+                "block3b_add",
+                "block2b_add",
+                "block1a_project_activation",
+            ]
+        ],
+    )
+
+    def rlbn(inputs):
+        return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs))
+    input_layer = keras.layers.Input(shape=[cags.H, cags.W, cags.C])
+
+    x7_top_act, x14_5e_add, x28_3b_add, x56_2b_add, x112_1a_project_act = backbone(input_layer)
+
+
+
+    c1 = rlbn(keras.layers.Conv2D(x7_top_act.shape[-1], 3, 1, "same")(x7_top_act))
+    c1 = rlbn(keras.layers.Conv2D(x7_top_act.shape[-1], 3, 1, "same")(c1))
+    c1 = rlbn(keras.layers.Conv2DTranspose(x14_5e_add.shape[-1], 3, 2, "same")(c1)) #shape: 14, 14, 112
+
+    c2 = keras.layers.Concatenate()([c1, x14_5e_add]) # shape 14, 14, 224
+    c2 = rlbn(keras.layers.Conv2D(x14_5e_add.shape[-1], 3, 1, "same")(c2)) # shape 14, 14, 112
+    c2 = rlbn(keras.layers.Conv2D(c2.shape[-1], 3, 1, "same")(c2))
+    c2 = rlbn(keras.layers.Conv2DTranspose(x28_3b_add.shape[-1], 3, 2, "same")(c2)) # filters = 40 shape: 28, 28, 40
+
+    c3 = keras.layers.Concatenate()([c2, x28_3b_add]) # shape 28, 28, 80
+    c3 = rlbn(keras.layers.Conv2D(x28_3b_add.shape[-1], 3, 1, "same")(c3)) # shape 28, 28, 40
+    c3 = rlbn(keras.layers.Conv2D(c3.shape[-1], 3, 1, "same")(c3))
+    c3 = rlbn(keras.layers.Conv2DTranspose(x56_2b_add.shape[-1], 3, 2, "same")(c3)) # shape 56, 56, 24
+    # layer 4
+    c4 = keras.layers.Concatenate()([c3, x56_2b_add]) # shape 56, 56, 48
+    c4 = rlbn(keras.layers.Conv2D(x56_2b_add.shape[-1], 3, 1, "same")(c4)) # shape 56, 56, 24
+    c4 = rlbn(keras.layers.Conv2D(c4.shape[-1], 3, 1, "same")(c4))
+    c4 = rlbn(keras.layers.Conv2DTranspose(x112_1a_project_act.shape[-1], 3, 2, "same")(c4)) # shape 112, 112, 16
+    # layer 5
+    c5 = keras.layers.Concatenate()([c4, x112_1a_project_act]) # shape 112, 112, 32
+    c5 = rlbn(keras.layers.Conv2D(x112_1a_project_act.shape[-1], 3, 1, "same")(c5)) # shape 112, 112, 16
+    c5 = rlbn(keras.layers.Conv2D(c5.shape[-1], 3, 1, "same")(c5))
+    c5 = rlbn(keras.layers.Conv2DTranspose(input_layer.shape[-1], 3, 2, "same")(c5)) # shape 224, 224, 3
+    # outputs
+    output_layer = keras.layers.Concatenate()([c5, input_layer]) # shape 224, 224, 6
+    output_layer = rlbn(keras.layers.Conv2D(output_layer.shape[-1], 3, 1, "same")(output_layer)) # shape 224, 224, 6
+    output_layer = rlbn(keras.layers.Conv2D(output_layer.shape[-1], 3, 1, "same")(output_layer))
+    output_layer = keras.layers.Conv2D(1, 1, 1, "same", activation="sigmoid")(output_layer)
+
+
+    # Create the model and train it
+    model = keras.Model(input_layer, output_layer, name="cags_segmentation")
+    lr = args.learning_rate
+    if args.final_learning_rate:
+        steps = len(train_images)/args.batch_size*args.epochs
+        lr = keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=args.learning_rate,
+                                                    decay_steps=steps, end_learning_rate=args.final_learning_rate)
+
+    model.compile(
+        optimizer=keras.optimizers.Adam(learning_rate=lr),
+        loss=keras.losses.BinaryCrossentropy(),
+        metrics=[
+            cags.MaskIoUMetric(name="iou"),
+        ],
     )
 
-    # TODO: Create the model and train it
-    model = ...
+    model.summary()
+
+    model.fit(
+        train_images,
+        train_masks,
+        epochs=args.epochs,
+        batch_size=args.batch_size,
+        validation_data=(dev_images, dev_masks),
+    )
 
     # Generate test set annotations, but in `args.logdir` to allow parallel execution.
     os.makedirs(args.logdir, exist_ok=True)
-    with open(os.path.join(args.logdir, "cags_segmentation.txt"), "w", encoding="utf-8") as predictions_file:
-        # TODO: Predict the masks on the test set
-        test_masks = model.predict(...)
+    with open(
+        os.path.join(args.logdir, "cags_segmentation.txt"), "w", encoding="utf-8"
+    ) as predictions_file:
+        # Predict the masks on the test set
+        test_masks = model.predict(test_images)
 
         for mask in test_masks:
             zeros, ones, runs = 0, 0, []
diff --git a/labs/06/bboxes_utils.py b/labs/06/bboxes_utils.py
new file mode 100644
index 0000000..13f4dc8
--- /dev/null
+++ b/labs/06/bboxes_utils.py
@@ -0,0 +1,181 @@
+#!/usr/bin/env python3
+import argparse
+from typing import Callable
+import unittest
+
+import numpy as np
+
+# Bounding boxes and anchors are expected to be Numpy tensors,
+# where the last dimension has size 4.
+
+# For bounding boxes in pixel coordinates, the 4 values correspond to:
+TOP: int = 0
+LEFT: int = 1
+BOTTOM: int = 2
+RIGHT: int = 3
+
+
+def bboxes_area(bboxes: np.ndarray) -> np.ndarray:
+    """ Compute area of given set of bboxes.
+
+    Each bbox is parametrized as a four-tuple (top, left, bottom, right).
+
+    If the bboxes.shape is [..., 4], the output shape is bboxes.shape[:-1].
+    """
+    return np.maximum(bboxes[..., BOTTOM] - bboxes[..., TOP], 0) \
+        * np.maximum(bboxes[..., RIGHT] - bboxes[..., LEFT], 0)
+
+
+def bboxes_iou(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
+    """ Compute IoU of corresponding pairs from two sets of bboxes `xs` and `ys`.
+
+    Each bbox is parametrized as a four-tuple (top, left, bottom, right).
+
+    Note that broadcasting is supported, so passing inputs with
+    `xs.shape=[num_xs, 1, 4]` and `ys.shape=[1, num_ys, 4]` produces an output
+    with shape `[num_xs, num_ys]`, computing IoU for all pairs of bboxes from
+    `xs` and `ys`. Formally, the output shape is `np.broadcast(xs, ys).shape[:-1]`.
+    """
+    intersections = np.stack([
+        np.maximum(xs[..., TOP], ys[..., TOP]),
+        np.maximum(xs[..., LEFT], ys[..., LEFT]),
+        np.minimum(xs[..., BOTTOM], ys[..., BOTTOM]),
+        np.minimum(xs[..., RIGHT], ys[..., RIGHT]),
+    ], axis=-1)
+
+    xs_area, ys_area, intersections_area = bboxes_area(xs), bboxes_area(ys), bboxes_area(intersections)
+
+    return intersections_area / (xs_area + ys_area - intersections_area)
+
+
+def bboxes_to_rcnn(anchors: np.ndarray, bboxes: np.ndarray) -> np.ndarray:
+    """ Convert `bboxes` to a R-CNN-like representation relative to `anchors`.
+
+    The `anchors` and `bboxes` are arrays of four-tuples (top, left, bottom, right);
+    you can use the TOP, LEFT, BOTTOM, RIGHT constants as indices of the
+    respective coordinates.
+
+    The resulting representation of a single bbox is a four-tuple with:
+    - (bbox_y_center - anchor_y_center) / anchor_height
+    - (bbox_x_center - anchor_x_center) / anchor_width
+    - log(bbox_height / anchor_height)
+    - log(bbox_width / anchor_width)
+
+    If the `anchors.shape` is `[anchors_len, 4]` and `bboxes.shape` is `[anchors_len, 4]`,
+    the output shape is `[anchors_len, 4]`.
+    """
+
+    # TODO: Implement according to the docstring.
+    raise NotImplementedError()
+
+
+def bboxes_from_rcnn(anchors: np.ndarray, rcnns: np.ndarray) -> np.ndarray:
+    """ Convert R-CNN-like representation relative to `anchor` to a `bbox`.
+
+    If the `anchors.shape` is `[anchors_len, 4]` and `rcnns.shape` is `[anchors_len, 4]`,
+    the output shape is `[anchors_len, 4]`.
+    """
+
+    # TODO: Implement according to the docstring.
+    raise NotImplementedError()
+
+
+def bboxes_training(
+    anchors: np.ndarray, gold_classes: np.ndarray, gold_bboxes: np.ndarray, iou_threshold: float
+) -> tuple[np.ndarray, np.ndarray]:
+    """ Compute training data for object detection.
+
+    Arguments:
+    - `anchors` is an array of four-tuples (top, left, bottom, right)
+    - `gold_classes` is an array of zero-based classes of the gold objects
+    - `gold_bboxes` is an array of four-tuples (top, left, bottom, right)
+      of the gold objects
+    - `iou_threshold` is a given threshold
+
+    Returns:
+    - `anchor_classes` contains for every anchor either 0 for background
+      (if no gold object is assigned) or `1 + gold_class` if a gold object
+      with `gold_class` is assigned to it
+    - `anchor_bboxes` contains for every anchor a four-tuple
+      `(center_y, center_x, height, width)` representing the gold bbox of
+      a chosen object using parametrization of R-CNN; zeros if no gold object
+      was assigned to the anchor
+    If the `anchors` shape is `[anchors_len, 4]`, the `anchor_classes` shape
+    is `[anchors_len]` and the `anchor_bboxes` shape is `[anchors_len, 4]`.
+
+    Algorithm:
+    - First, for each gold object, assign it to an anchor with the largest IoU
+      (the one with smaller index if there are several). In case several gold
+      objects are assigned to a single anchor, use the gold object with smaller
+      index.
+    - For each unused anchor, find the gold object with the largest IoU
+      (again the one with smaller index if there are several), and if the IoU
+      is >= iou_threshold, assign the object to the anchor.
+    """
+
+    # TODO: First, for each gold object, assign it to an anchor with the
+    # largest IoU (the one with smaller index if there are several). In case
+    # several gold objects are assigned to a single anchor, use the gold object
+    # with smaller index.
+
+    # TODO: For each unused anchor, find the gold object with the largest IoU
+    # (again the one with smaller index if there are several), and if the IoU
+    # is >= threshold, assign the object to the anchor.
+
+    anchor_classes, anchor_bboxes = ..., ...
+
+    return anchor_classes, anchor_bboxes
+
+
+def main(args: argparse.Namespace) -> tuple[Callable, Callable, Callable]:
+    return bboxes_to_rcnn, bboxes_from_rcnn, bboxes_training
+
+
+class Tests(unittest.TestCase):
+    def test_bboxes_to_from_rcnn(self):
+        data = [
+            [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 0, 0]],
+            [[0, 0, 10, 10], [5, 0, 15, 10], [.5, 0, 0, 0]],
+            [[0, 0, 10, 10], [0, 5, 10, 15], [0, .5, 0, 0]],
+            [[0, 0, 10, 10], [0, 0, 20, 30], [.5, 1, np.log(2), np.log(3)]],
+            [[0, 9, 10, 19], [2, 10, 5, 16], [-0.15, -0.1, -1.20397, -0.51083]],
+            [[5, 3, 15, 13], [7, 7, 10, 9], [-0.15, 0, -1.20397, -1.60944]],
+            [[7, 6, 17, 16], [9, 10, 12, 13], [-0.15, 0.05, -1.20397, -1.20397]],
+            [[5, 6, 15, 16], [7, 7, 10, 10], [-0.15, -0.25, -1.20397, -1.20397]],
+            [[6, 3, 16, 13], [8, 5, 12, 8], [-0.1, -0.15, -0.91629, -1.20397]],
+            [[5, 2, 15, 12], [9, 6, 12, 8], [0.05, 0, -1.20397, -1.60944]],
+            [[2, 10, 12, 20], [6, 11, 8, 17], [0, -0.1, -1.60944, -0.51083]],
+            [[10, 9, 20, 19], [12, 13, 17, 16], [-0.05, 0.05, -0.69315, -1.20397]],
+            [[6, 7, 16, 17], [10, 11, 12, 14], [0, 0.05, -1.60944, -1.20397]],
+            [[2, 2, 12, 12], [3, 5, 8, 8], [-0.15, -0.05, -0.69315, -1.20397]],
+        ]
+        # First run on individual anchors, and then on all together
+        for anchors, bboxes, rcnns in [map(lambda x: [x], row) for row in data] + [zip(*data)]:
+            anchors, bboxes, rcnns = [np.array(data, np.float32) for data in [anchors, bboxes, rcnns]]
+            np.testing.assert_almost_equal(bboxes_to_rcnn(anchors, bboxes), rcnns, decimal=3)
+            np.testing.assert_almost_equal(bboxes_from_rcnn(anchors, rcnns), bboxes, decimal=3)
+
+    def test_bboxes_training(self):
+        anchors = np.array([[0, 0, 10, 10], [0, 10, 10, 20], [10, 0, 20, 10], [10, 10, 20, 20]], np.float32)
+        for gold_classes, gold_bboxes, anchor_classes, anchor_bboxes, iou in [
+                [[1], [[14., 14, 16, 16]], [0, 0, 0, 2], [[0, 0, 0, 0]] * 3 + [[0, 0, np.log(.2), np.log(.2)]], 0.5],
+                [[2], [[0., 0, 20, 20]], [3, 0, 0, 0], [[.5, .5, np.log(2), np.log(2)]] + [[0, 0, 0, 0]] * 3, 0.26],
+                [[2], [[0., 0, 20, 20]], [3, 3, 3, 3],
+                 [[y, x, np.log(2), np.log(2)] for y in [.5, -.5] for x in [.5, -.5]], 0.24],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 0, 1],
+                 [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [-0.35, -0.45, 0.53062, 0.40546]], 0.5],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 2, 1],
+                 [[0, 0, 0, 0], [0, 0, 0, 0], [-0.1, 0.6, -0.22314, 0.69314], [-0.35, -0.45, 0.53062, 0.40546]], 0.3],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 1, 2, 1],
+                 [[0, 0, 0, 0], [0.65, -0.45, 0.53062, 0.40546], [-0.1, 0.6, -0.22314, 0.69314],
+                  [-0.35, -0.45, 0.53062, 0.40546]], 0.17],
+        ]:
+            gold_classes, anchor_classes = np.array(gold_classes, np.int32), np.array(anchor_classes, np.int32)
+            gold_bboxes, anchor_bboxes = np.array(gold_bboxes, np.float32), np.array(anchor_bboxes, np.float32)
+            computed_classes, computed_bboxes = bboxes_training(anchors, gold_classes, gold_bboxes, iou)
+            np.testing.assert_almost_equal(computed_classes, anchor_classes, decimal=3)
+            np.testing.assert_almost_equal(computed_bboxes, anchor_bboxes, decimal=3)
+
+
+if __name__ == '__main__':
+    unittest.main()
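The parametrization described in the `bboxes_to_rcnn` docstring can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not the reference solution of the assignment: the names `to_rcnn_sketch`, `from_rcnn_sketch`, and `_centers_and_sizes` are hypothetical, and the code assumes that every anchor and bbox has strictly positive height and width (otherwise the division and the logarithm are undefined).

```python
import numpy as np

# Same index convention as in bboxes_utils.py.
TOP, LEFT, BOTTOM, RIGHT = 0, 1, 2, 3


def _centers_and_sizes(boxes: np.ndarray):
    """Return (center_y, center_x, height, width) of (top, left, bottom, right) boxes."""
    height = boxes[..., BOTTOM] - boxes[..., TOP]
    width = boxes[..., RIGHT] - boxes[..., LEFT]
    center_y = (boxes[..., TOP] + boxes[..., BOTTOM]) / 2
    center_x = (boxes[..., LEFT] + boxes[..., RIGHT]) / 2
    return center_y, center_x, height, width


def to_rcnn_sketch(anchors: np.ndarray, bboxes: np.ndarray) -> np.ndarray:
    """Hypothetical helper: encode `bboxes` relative to `anchors`."""
    a_cy, a_cx, a_h, a_w = _centers_and_sizes(np.asarray(anchors, dtype=float))
    b_cy, b_cx, b_h, b_w = _centers_and_sizes(np.asarray(bboxes, dtype=float))
    return np.stack([
        (b_cy - a_cy) / a_h,   # center shift in y, in units of anchor height
        (b_cx - a_cx) / a_w,   # center shift in x, in units of anchor width
        np.log(b_h / a_h),     # log of the height ratio
        np.log(b_w / a_w),     # log of the width ratio
    ], axis=-1)


def from_rcnn_sketch(anchors: np.ndarray, rcnns: np.ndarray) -> np.ndarray:
    """Hypothetical helper: invert `to_rcnn_sketch`."""
    rcnns = np.asarray(rcnns, dtype=float)
    a_cy, a_cx, a_h, a_w = _centers_and_sizes(np.asarray(anchors, dtype=float))
    cy = rcnns[..., 0] * a_h + a_cy
    cx = rcnns[..., 1] * a_w + a_cx
    h = np.exp(rcnns[..., 2]) * a_h
    w = np.exp(rcnns[..., 3]) * a_w
    return np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=-1)
```

For the anchor `[0, 0, 10, 10]` and the bbox `[0, 0, 20, 30]`, the encoding is `[0.5, 1, log 2, log 3]`, which agrees with the corresponding row of `test_bboxes_to_from_rcnn`; decoding that tuple returns the original bbox.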
diff --git a/labs/06/svhn_competition.py b/labs/06/svhn_competition.py
new file mode 100644
index 0000000..ef3e6d0
--- /dev/null
+++ b/labs/06/svhn_competition.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch
+
+import bboxes_utils
+from svhn_dataset import SVHN
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
+parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
+                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
+            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    # Load the data. The individual examples are dictionaries with the keys:
+    # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,
+    # - "classes", a `[num_digits]` vector with classes of image digits,
+    # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits.
+    svhn = SVHN()
+
+    # Load the EfficientNetV2-B0 model. It assumes the input images are
+    # represented in the [0-255] range.
+    backbone = keras.applications.EfficientNetV2B0(include_top=False)
+
+    # Extract features of different resolution. Assuming 224x224 input images
+    # (you can set this explicitly via `input_shape` of the above constructor),
+    # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
+    backbone = keras.Model(
+        inputs=backbone.input,
+        outputs=[backbone.get_layer(layer).output for layer in [
+            "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]]
+    )
+
+    # TODO: Create the model and train it
+    model = ...
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.
+    os.makedirs(args.logdir, exist_ok=True)
+    with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the digits and their bounding boxes on the test set.
+        # Assume that for a single test image we get
+        # - `predicted_classes`: a 1D array with the predicted digits,
+        # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;
+        for predicted_classes, predicted_bboxes in ...:
+            output = []
+            for label, bbox in zip(predicted_classes, predicted_bboxes):
+                output += [label] + list(bbox)
+            print(*output, file=predictions_file)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
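The prediction loop in `svhn_competition.py` writes one line per test image, consisting of groups of five numbers: the label followed by the four bbox coordinates (top, left, bottom, right). The sketch below illustrates that format and how `SVHN.evaluate_file` reads it back. The function names `format_prediction` and `parse_prediction` are hypothetical and exist only for this illustration.

```python
def format_prediction(classes, bboxes) -> str:
    """Hypothetical helper: build one line of the predictions file."""
    output = []
    for label, bbox in zip(classes, bboxes):
        output += [label] + list(bbox)
    # Mirrors `print(*output, file=...)`: values separated by single spaces.
    return " ".join(str(value) for value in output)


def parse_prediction(line: str):
    """Hypothetical helper: parse one line the way `evaluate_file` does."""
    values = line.split()
    if len(values) % 5:
        raise ValueError("Expected a multiple of 5 numbers, found {}".format(len(values)))
    classes, bboxes = [], []
    for i in range(0, len(values), 5):
        classes.append(int(values[i]))
        bboxes.append([float(value) for value in values[i + 1:i + 5]])
    return classes, bboxes


line = format_prediction([3, 7], [[1, 2, 3, 4], [5, 6, 7, 8]])
classes, bboxes = parse_prediction(line)
```

Note that an image with no predicted digits produces an empty line, which still parses to empty lists because zero is a multiple of five.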
diff --git a/labs/06/svhn_dataset.py b/labs/06/svhn_dataset.py
new file mode 100644
index 0000000..bd368b8
--- /dev/null
+++ b/labs/06/svhn_dataset.py
@@ -0,0 +1,238 @@
+import os
+import sys
+import struct
+from typing import Any, Callable, Sequence, TextIO
+import urllib.request
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch
+import torchvision
+
+
+class SVHN:
+    LABELS: int = 10
+
+    # Type alias for a bounding box -- a list of floats.
+    BBox = list[float]
+
+    # The indices of the bounding box coordinates.
+    TOP: int = 0
+    LEFT: int = 1
+    BOTTOM: int = 2
+    RIGHT: int = 3
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, path: str, size: int) -> None:
+            self._path = path
+            self._data = None
+            self._size = size
+
+        def __len__(self) -> int:
+            return self._size
+
+        def __getitem__(self, index: int) -> dict[str, torch.Tensor]:
+            if self._data is None:
+                self._data = []
+                for entry in SVHN._load_data(self._path, self._size):
+                    entry["image"] = torchvision.io.decode_image(
+                        torch.from_numpy(entry["image"]), torchvision.io.ImageReadMode.RGB).permute(1, 2, 0)
+                    entry["classes"] = np.asarray(entry["classes"], np.int64)
+                    entry["bboxes"] = np.asarray(entry["bboxes"], np.int64).reshape(-1, 4)
+                    self._data.append(entry)
+            return self._data[index]
+
+        def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset:
+            return SVHN.TransformedDataset(self, transform)
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: "SVHN.Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return self._dataset._size
+
+        def __getitem__(self, index: int) -> Any:
+            return self._transform(self._dataset[index])
+
+        def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset:
+            return SVHN.TransformedDataset(self, transform)
+
+    def __init__(self) -> None:
+        for dataset, size in [("train", 10_000), ("dev", 1_267), ("test", 4_535)]:
+            path = "svhn.{}.tfrecord".format(dataset)
+            if not os.path.exists(path):
+                print("Downloading file {}...".format(path), file=sys.stderr)
+                urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
+                os.rename("{}.tmp".format(path), path)
+
+            setattr(self, dataset, self.Dataset(path, size))
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
+    # TFRecord loading
+    @staticmethod
+    def _load_data(path: str, items: int) -> list[dict[str, Any]]:
+        def get_value() -> int:
+            nonlocal data, offset
+            value = np.int64(data[offset] & 0x7F); start = offset; offset += 1
+            while data[offset - 1] & 0x80:
+                value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1
+            return value
+
+        def get_value_of_kind(kind: int) -> int:
+            nonlocal data, offset
+            assert data[offset] == kind; offset += 1
+            return get_value()
+
+        entries = []
+        with open(path, "rb") as file:
+            while len(entries) < items:
+                entries.append({})
+
+                length = file.read(8); assert len(length) == 8
+                length, = struct.unpack("<Q", length)
+                assert len(file.read(4)) == 4
+                data = file.read(length); assert len(data) == length
+                assert len(file.read(4)) == 4
+
+                offset = 0
+                length = get_value_of_kind(0x0A)
+                assert len(data) - offset == length
+                while offset < len(data):
+                    get_value_of_kind(0x0A)
+                    length = get_value_of_kind(0x0A)
+                    key = data[offset:offset + length].decode("utf-8"); offset += length
+
+                    get_value_of_kind(0x12)
+                    if data[offset] == 0x0A:
+                        length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A)
+                        entries[-1][key] = np.frombuffer(data, np.uint8, length, offset).copy(); offset += length
+                    elif data[offset] == 0x1A:
+                        length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A)
+                        values, target_offset = [], offset + length
+                        while offset < target_offset:
+                            values.append(get_value())
+                        entries[-1][key] = np.array(values, dtype=np.int64)
+                    elif data[offset] == 0x12:
+                        length = get_value_of_kind(0x12) and get_value_of_kind(0x0A)
+                        entries[-1][key] = np.frombuffer(
+                            data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).copy(); offset += length
+                    else:
+                        raise ValueError("Unsupported data tag {}".format(data[offset]))
+        return entries
+
+    # Evaluation infrastructure.
+    @staticmethod
+    def evaluate(
+        gold_dataset: "SVHN.Dataset", predictions: Sequence[tuple[list[int], list[BBox]]], iou_threshold: float = 0.5,
+    ) -> float:
+        def bbox_iou(x: SVHN.BBox, y: SVHN.BBox) -> float:
+            def area(bbox: SVHN.BBox) -> float:
+                return max(bbox[SVHN.BOTTOM] - bbox[SVHN.TOP], 0) * max(bbox[SVHN.RIGHT] - bbox[SVHN.LEFT], 0)
+            intersection = [max(x[SVHN.TOP], y[SVHN.TOP]), max(x[SVHN.LEFT], y[SVHN.LEFT]),
+                            min(x[SVHN.BOTTOM], y[SVHN.BOTTOM]), min(x[SVHN.RIGHT], y[SVHN.RIGHT])]
+            x_area, y_area, intersection_area = area(x), area(y), area(intersection)
+            return intersection_area / (x_area + y_area - intersection_area)
+
+        gold = [(np.array(example["classes"]), np.array(example["bboxes"])) for example in gold_dataset]
+
+        if len(predictions) != len(gold):
+            raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format(
+                len(predictions), len(gold)))
+
+        correct = 0
+        for (gold_classes, gold_bboxes), (prediction_classes, prediction_bboxes) in zip(gold, predictions):
+            if len(gold_classes) != len(prediction_classes):
+                continue
+
+            used = [False] * len(gold_classes)
+            for cls, bbox in zip(prediction_classes, prediction_bboxes):
+                best = None
+                for i in range(len(gold_classes)):
+                    if used[i] or gold_classes[i] != cls:
+                        continue
+                    iou = bbox_iou(bbox, gold_bboxes[i])
+                    if iou >= iou_threshold and (best is None or iou > best_iou):
+                        best, best_iou = i, iou
+                if best is None:
+                    break
+                used[best] = True
+            correct += all(used)
+
+        return 100 * correct / len(gold)
+
+    @staticmethod
+    def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float:
+        predictions = []
+        for line in predictions_file:
+            values = line.split()
+            if len(values) % 5:
+                raise RuntimeError("Each prediction must contain multiple of 5 numbers, found {}".format(len(values)))
+
+            predictions.append(([], []))
+            for i in range(0, len(values), 5):
+                predictions[-1][0].append(int(values[i]))
+                predictions[-1][1].append([float(value) for value in values[i + 1:i + 5]])
+
+        return SVHN.evaluate(gold_dataset, predictions)
+
+    # Visualization infrastructure.
+    @staticmethod
+    def visualize(image: np.ndarray, labels: list[Any], bboxes: list[BBox], show: bool):
+        """Visualize the given image plus recognized objects.
+
+        Arguments:
+        - `image` is NumPy input image with pixels in range [0-255];
+        - `labels` is a list of labels to be shown using the `str` method;
+        - `bboxes` is a list of `BBox`es (four-tuples TOP, LEFT, BOTTOM, RIGHT);
+        - `show` controls whether to show the figure or return it:
+          - if `True`, the figure is shown using `plt.show()`;
+          - if `False`, the `plt.Figure` instance is returned; it can be saved
+            to TensorBoard using the `add_figure` method of a `SummaryWriter`.
+        """
+        import matplotlib.pyplot as plt
+
+        figure = plt.figure(figsize=(4, 4))
+        plt.axis("off")
+        plt.imshow(np.asarray(image, np.uint8))
+        for label, (top, left, bottom, right) in zip(labels, bboxes):
+            plt.gca().add_patch(plt.Rectangle(
+                [left, top], right - left, bottom - top, fill=False, edgecolor=[1, 0, 1], linewidth=2))
+            plt.gca().text(left, top, str(label), bbox={"facecolor": [1, 0, 1], "alpha": 0.5},
+                           clip_box=plt.gca().clipbox, clip_on=True, ha="left", va="top")
+
+        if show:
+            plt.show()
+        else:
+            return figure
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--visualize", default=None, type=str, help="Prediction file to visualize")
+    parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            accuracy = SVHN.evaluate_file(getattr(SVHN(), args.dataset), predictions_file)
+        print("SVHN accuracy: {:.2f}%".format(accuracy))
+
+    if args.visualize:
+        with open(args.visualize, "r", encoding="utf-8-sig") as predictions_file:
+            for line, example in zip(predictions_file, getattr(SVHN(), args.dataset)):
+                values = line.split()
+                classes, bboxes = [], []
+                for i in range(0, len(values), 5):
+                    classes.append(values[i])
+                    bboxes.append([float(value) for value in values[i + 1:i + 5]])
+                SVHN.visualize(example["image"], classes, bboxes, show=True)
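The nested `get_value` helper in `SVHN._load_data` reads a variable-length integer in the base-128 "varint" encoding used by Protocol Buffers: each byte contributes its low 7 bits, least-significant group first, and the high bit signals that another byte follows. A standalone sketch of the same decoding is shown below. The name `decode_varint` is hypothetical, and unlike the original helper it returns the new offset instead of updating a `nonlocal` variable.

```python
def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Hypothetical helper: decode one varint, returning (value, next_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if not byte & 0x80:              # a clear high bit marks the last byte
            return value, offset


# The bytes 0xAC 0x02 encode 300: 0x2C + (0x02 << 7) = 44 + 256.
value, next_offset = decode_varint(bytes([0xAC, 0x02]))
```

Values below 128 occupy a single byte, which is why the short field lengths in the records above usually take one byte each.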
diff --git a/labs/team_description.py b/labs/team_description.py
index 14ed5e1..1d232bc 100644
--- a/labs/team_description.py
+++ b/labs/team_description.py
@@ -6,4 +6,7 @@
 #
 # You can find out ReCodEx ID in the URL bar after navigating
 # to your User profile page. The ID has the following format:
-# 01234567-89ab-cdef-0123-456789abcdef.
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
diff --git a/lectures/lecture06.md b/lectures/lecture06.md
new file mode 100644
index 0000000..3631cee
--- /dev/null
+++ b/lectures/lecture06.md
@@ -0,0 +1,20 @@
+### Lecture: 6. Object Detection
+#### Date: Mar 25
+#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?06
+#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-06.pdf, PDF Slides
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.mp4, CZ Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.practicals.mp4, CZ Practicals
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.mp4, EN Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.practicals.mp4, EN Practicals
+#### Questions: #lecture_6_questions
+#### Lecture assignment: bboxes_utils
+#### Lecture assignment: svhn_competition
+
+- R-CNN [[R-CNN](https://arxiv.org/abs/1311.2524)]
+- Fast R-CNN [[Fast R-CNN](https://arxiv.org/abs/1504.08083)]
+- Proposing RoIs using Faster R-CNN [[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)]
+- Mask R-CNN [[Mask R-CNN](https://arxiv.org/abs/1703.06870)]
+- Feature Pyramid Networks [[Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)]
+- Focal Loss, RetinaNet [[Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)]
+- _EfficientDet [[EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)]_
+- Group Normalization [[Group Normalization](https://arxiv.org/abs/1803.08494)]
diff --git a/lectures/lecture07.md b/lectures/lecture07.md
new file mode 100644
index 0000000..e53fb70
--- /dev/null
+++ b/lectures/lecture07.md
@@ -0,0 +1,2 @@
+### Lecture: 7. Easter Monday
+#### Date: Apr 01
diff --git a/pull.ps1 b/pull.ps1
new file mode 100644
index 0000000..9cadfe4
--- /dev/null
+++ b/pull.ps1
@@ -0,0 +1 @@
+git pull upstream master
diff --git a/setup.ps1 b/setup.ps1
new file mode 100644
index 0000000..f1f7bbe
--- /dev/null
+++ b/setup.ps1
@@ -0,0 +1,6 @@
+git remote rename origin upstream
+git remote add origin git@github.com:joglr/npfl138.git
+git fetch
+git checkout master
+python -m venv .venv
+.venv/Scripts/pip install -r .\labs\requirements.txt
diff --git a/slides/06/06.md b/slides/06/06.md
new file mode 100644
index 0000000..730d29e
--- /dev/null
+++ b/slides/06/06.md
@@ -0,0 +1,711 @@
+title: NPFL138, Lecture 6
+class: title, langtech, cc-by-sa
+
+# Object Detection
+
+## Milan Straka
+
+### March 25, 2024
+
+---
+section: FastR-CNN
+class: middle, center
+# Beyond Image Classification
+
+# Beyond Image Classification
+
+---
+# Beyond Image Classification
+
+![w=70%,f=right](../01/object_detection.svgz)
+
+- Object detection (including location)
+<br clear="both">
+
+~~~
+![w=70%,f=right](../01/image_segmentation.svgz)
+
+- Image segmentation
+<br clear="both">
+
+~~~
+![w=70%,f=right](../01/human_pose_estimation.jpg)
+
+- Human pose estimation
+
+---
+# Beyond Image Classification
+
+![w=100%,v=middle](cv_tasks.jpg)
+
+---
+# Object Localization
+
+![w=100%](object_localization.png)
+
+We can perform object localization by jointly predicting the bounding box
+coordinates using regression.
+
+---
+# R-CNN
+
+![w=42%,f=right](roi_generation.jpg)
+
+To be able to recognize and localize _several_ objects, assume we were given
+multiple interesting regions of the image, called **regions of interest** (RoI).
+For each of them, we decide:
+- whether it contains an object;
+- the location of the object relative to the RoI.
+
+~~~
+![w=45%,f=right](rcnn_architecture.svgz)
+
+In R-CNN, we start with a network pre-trained on ImageNet (VGG-16 is used in the
+original paper), and we use it to process _every RoI_, rescaling every one of
+them to the size of $224×224$.
+
+~~~
+For every RoI, two sibling heads are added:
+- _classification head_ predicts either _background_ or one of $K$ object types
+  ($K+1$ in total),
+~~~
+- _bounding box regression head_ predicts 4 bounding box parameters relative
+  to RoI.
+
+---
+# R-CNN – Bounding Boxes
+
+A bounding box is parametrized as follows. Let $x_r, y_r, w_r, h_r$ be
+center coordinates and width and height of the RoI respectively, and let $x, y, w, h$ be
+parameters of the bounding box. We represent the bounding box relative
+to the RoI as follows:
+$$\begin{aligned}
+t_x &= (x - x_r)/w_r, & t_y &= (y - y_r)/h_r, \\
+t_w &= \log (w/w_r), & t_h &= \log (h/h_r).
+\end{aligned}$$
+
+~~~
+In Fast R-CNN, the $\textrm{smooth}_{L_1}$ loss, or **Huber loss**, is employed for bounding box parameters:
+
+![w=19.5%,f=right](huber_loss.svgz)
+
+$$\textrm{smooth}_{L_1}(x) = \begin{cases}
+  0.5x^2    & \textrm{if }|x| < 1, \\
+  |x| - 0.5 & \textrm{otherwise}.
+\end{cases}$$
+
+~~~
+The complete loss is then ($λ=1$ is used in the Fast R-CNN paper)
+$$L(ĉ, t̂, c, t) = L_\textrm{cls}(ĉ, c) + λ ⋅ [c ≥ 1] ⋅
+  ∑\nolimits_{i ∈ \lbrace \mathrm{x, y, w, h}\rbrace} \textrm{smooth}_{L_1}(t̂_i - t_i).$$
+
+---
+# R-CNN – Bounding Boxes
+
+The described bounding box representation is usually called `CXCYWH`:
+
+![w=60%,h=center](bbox_representation_cxcywh.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+In the datasets, the bounding boxes are usually represented using `XYXY` format:
+
+![w=60%,h=center](bbox_representation_xyxy.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+Finally, you could also come across the `XYWH` format:
+
+![w=60%,h=center](bbox_representation_xywh.webp)
+
+---
+# Fast R-CNN Architecture
+
+The R-CNN is slow, because it needs to process every RoI by the convolutional
+backbone. To speed it up, we might want to first process the whole image by the
+backbone and only then extract a fixed-size representation for every RoI.
+
+~~~
+
+We achieve that using **RoI pooling**, replacing the last max-pool $14×14 → 7×7$
+VGG layer.
+
+![w=50%](roi_projection.svgz)![w=50%,mw=50%,h=center](roi_pooling.svgz)
+
+During RoI pooling, we obtain a $7×7$ RoI representation by first projecting the
+RoI to the $14×14$ resolution and then computing each of the $7×7$ values by
+**max-pooling** the corresponding “pixels” of the convolutional image features.
+
+---
+# Fast R-CNN
+
+![w=85%,h=center](fast_rcnn_rumcajs.svgz)
+
+~~~
+![w=85%,h=center](fast_rcnn_vgg.png)
+
+---
+# Fast R-CNN and R-CNN Comparison
+
+![w=100%](fast_rcnn_architecture.svgz)
+
+---
+# Fast R-CNN Architecture
+
+![w=100%,v=middle](fast_rcnn.jpg)
+
+---
+# Fast R-CNN Training and Inference
+
+## Intersection over Union
+For two bounding boxes (or two masks), the _intersection over union_ (_IoU_)
+is the ratio of the area of their intersection to the area of their union.
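+
+~~~
+Written as a formula for two regions $A$ and $B$ (the symbols and the example
+boxes are only illustrative):
+$$\mathit{IoU}(A, B) = \frac{|A ∩ B|}{|A ∪ B|} = \frac{|A ∩ B|}{|A| + |B| - |A ∩ B|}.$$
+For example, for boxes $(0, 0, 2, 2)$ and $(1, 0, 3, 2)$ given as corner
+coordinates $(x_1, y_1, x_2, y_2)$, the intersection has area $1 ⋅ 2 = 2$, the
+union has area $4 + 4 - 2 = 6$, and the IoU is $1/3$.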
+
+~~~
+## Choosing RoIs for Training
+During training, we use 2 images with 64 RoIs each. The RoIs are selected
+so that 25% have intersection over union (IoU) overlap with ground-truth
+boxes at least 0.5; the others are chosen to have the IoU in range $[0.1, 0.5)$,
+the so-called _hard examples_.
+
+~~~
+## Running Inference
+During inference, we utilize all RoIs, but a single object can be found in
+several of them. To choose the most salient prediction, we perform **non-maximum
+suppression** – we ignore predictions which have an overlap with a higher
+scoring prediction of the _same class_, where the overlap is computed using IoU
+(0.3 threshold is used in the paper). Higher-scoring predictions are the ones
+with higher probability from the _classification head_.
+
+---
+# Object Detection Evaluation
+
+## Average Precision
+Evaluation is performed using _Average Precision_ ($\mathit{AP}$ or $\mathit{AP}_{50}$).
+
+We assume all bounding boxes (or masks) produced by a system have confidence
+values which can be used to rank them. Then, for a single class, we take the
+boxes (or masks) in the order of the ranks and generate a precision/recall curve,
+considering a bounding box correct if it has IoU at least 50% with any
+ground-truth box.
+
+![w=60%,mw=50%,h=center](precision_recall_person.svgz)![w=60%,mw=50%,h=center](precision_recall_bottle.svgz)
+
+---
+# Object Detection Evaluation – Average Precision
+
+The general idea of AP is to compute the area under the precision/recall curve.
+
+![w=80%,mw=49%,h=center](precision_recall_curve.png)
+
+~~~
+![w=80%,mw=49%,h=center](precision_recall_curve_interpolated.jpg)
+
+We start by interpolating the precision/recall curve, so that it is always
+nonincreasing.
+
+~~~
+![w=80%,mw=49%,h=center,f=right](average_precision.jpg)
+
+Finally, the average precision for a single class is an average of precision at
+recall $0.0, 0.1, 0.2, …, 1.0$.
+
+~~~
+The final AP is a mean of average precision of all classes.
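+
+~~~
+In symbols (a sketch of the 11-point scheme described above; $p(r)$ denoting
+the precision at recall $r$ and the name $p_\textrm{interp}$ are notation
+introduced here):
+$$p_\textrm{interp}(r) = \max_{r' ≥ r} p(r'),~~~~
+  \mathit{AP}_\textrm{class} = \frac{1}{11} ∑\nolimits_{r ∈ \lbrace 0.0, 0.1, …, 1.0\rbrace} p_\textrm{interp}(r).$$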
+
+---
+class: tablewide
+style: table {line-height: 1}
+# Object Detection Evaluation – Average Precision
+
+For the COCO dataset, the AP is computed slightly differently. First, it is an
+average over 101 recall points $0.00, 0.01, 0.02, …, 1.00$.
+
+~~~
+In the original metric, IoU of 50% is enough to consider a prediction valid.
+We can generalize the definition to $\mathit{AP}_{t}$, where an object
+prediction is considered valid if IoU is at least $t$%.
+
+~~~
+The main COCO metric, denoted just $\mathit{AP}$, is the mean of
+$\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, …, \mathit{AP}_{95}$.
+
+~~~
+| Metric | Description |
+|:------:|:------------|
+| $\mathit{AP}$ | Mean of $\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, \mathit{AP}_{65}, …, \mathit{AP}_{95}$ |
+| $\mathit{AP}_{50}$ | AP at IoU 50% |
+| $\mathit{AP}_{75}$ | AP at IoU 75% |
+~~~
+| $\mathit{AP}_{S}$ | AP for small objects: $\textit{area} < 32^2$ |
+| $\mathit{AP}_{M}$ | AP for medium objects: $32^2 < \textit{area} < 96^2$ |
+| $\mathit{AP}_{L}$ | AP for large objects: $96^2 < \textit{area}$ |
+
+
+---
+section: FasterR-CNN
+# Faster R-CNN
+
+![w=40%,f=right](fast_rcnn_speed.svgz)
+
+Even if Fast R-CNN is much faster than R-CNN, it can still be improved,
+considering that the most problematic and time-consuming part is generating the RoIs.
+<br clear="both">
+
+~~~
+![w=30%,f=right](faster_rcnn_architecture.png)
+
+Faster R-CNN extends Fast R-CNN by including a **region proposal
+network (RPN)**, whose goal is to generate the RoIs automatically.
+
+~~~
+The region proposal network produces the so-called **region proposals**,
+which then play the role of RoIs in the rest of the pipeline (i.e.,
+the Fast R-CNN).
+
+~~~
+The region proposals are generated similarly to how predictions are generated
+in Fast R-CNN. We start with several **anchors** and from each anchor
+we generate either a single region proposal or nothing.
+
+---
+# Faster R-CNN – Anchors
+
+If we consider the $14×14$ VGG backbone output, each “pixel” corresponds
+to a region of size $16×16$ in the original image.
+
+![w=45%,h=center](anchor_net.svgz)
+
+~~~
+We can therefore interpret each value in the $14×14$ output as a representation
+of a part of the image _centered_ in the corresponding image region, and try
+predicting a region proposal from **every one** of them.
+
+~~~
+We call the dense grid of image regions from which we are predicting the
+proposals the **anchors**. They have fixed size, and in practice we use
+_several_ anchors per position.
+
+---
+# Faster R-CNN
+
+For every anchor, we classify it into two classes (background, object)
+and also predict the region proposal bounding box relative to the anchor,
+exactly as in (Fast) R-CNN.
+
+~~~
+![w=58%,f=right](faster_rcnn_rpn.svgz)
+
+We perform the classification and the bounding box regression by first
+running a $3×3$ convolution followed by ReLU on the $14×14$ VGG output,
+and then attaching the two heads.
+~~~
+Assuming there are $A$ anchors on every position:
+- the classification head generates $2A$ outputs, performing $\softmax$ on every
+  2 of them;
+- the regression head generates $4A$ region proposal coordinates.
+
+~~~
+The authors consider 3 scales $(128^2, 256^2, 512^2)$ and 3 aspect ratios
+$(1:1, 1:2, 2:1)$.
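+
+~~~
+As a worked example (assuming the aspect ratio is read as width:height, which
+the text above does not specify), an anchor of area $s^2$ and aspect ratio
+$a = w/h$ has
+$$w = s ⋅ \sqrt{a},~~~~h = s / \sqrt{a},$$
+so the $128^2$ anchor with ratio $2:1$ is roughly $181×90.5$ pixels. With
+3 scales and 3 aspect ratios, $A = 9$, giving $2A = 18$ classification outputs
+and $4A = 36$ regression outputs per position, and $14 ⋅ 14 ⋅ 9 = 1764$
+anchors for the $14×14$ output.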
+
+---
+# Faster R-CNN
+
+During training, we generate
+- positive training examples for every anchor that has the highest IoU with
+  a ground-truth box;
+~~~
+- furthermore, a positive example is also any anchor with
+  IoU at least 0.7 for any ground-truth box;
+~~~
+- negative training examples for every anchor that has IoU at most 0.3 with all
+  ground-truth boxes;
+~~~
+- the positive and negative examples are generated with a ratio _up to_ 1:1
+  (less, if there are not enough positive examples; each minibatch consists of
+  a single image and 256 anchors).
+
+~~~
+During inference, we consider all predicted non-background regions, run
+non-maximum suppression on them using a 0.7 IoU threshold, and then take $N$
+top-scored regions (i.e., the ones with the highest probability from the
+classification head) – the paper uses 300 proposals, compared to 2000 in the Fast
+R-CNN.
+
+---
+# Faster R-CNN
+
+![w=94%,h=center](faster_rcnn_performance.svgz)
+
+---
+# Two-stage Detectors
+
+The Faster R-CNN is a so-called **two-stage** detector, where the regions are
+refined twice – once in the region proposal network, and then in the final
+bounding box regressor.
+
+~~~
+Several **single-stage** detector architectures have been proposed, mainly
+because they are faster and smaller, but until circa 2017 the two-stage
+detectors achieved better results.
+
+---
+section: MaskR-CNN
+# Mask R-CNN
+
+Straightforward extension of Faster R-CNN able to produce image segmentation
+(i.e., masks for every object).
+
+![w=100%,mh=80%,v=middle](../01/image_segmentation.svgz)
+
+---
+# Mask R-CNN – Architecture
+
+![w=100%,v=middle](mask_rcnn_architecture.png)
+
+---
+# Mask R-CNN – RoIAlign
+
+More precise alignment is required for the RoI in order to predict the masks.
+Instead of quantization and max-pooling in RoI pooling, **RoIAlign** uses bilinear
+interpolation of features at four regularly sampled locations in each RoI bin
+and averages them.
+
+![w=68%,mw=50%,h=center](roi_pooling.svgz)![w=68%,mw=50%,h=center](mask_rcnn_roialign.svgz)
+
+~~~
+TorchVision provides `torchvision.ops.roi_align` and `torchvision.ops.roi_pool`.
+
+---
+# Mask R-CNN
+
+Masks are predicted in a third branch of the object detector.
+
+- Higher resolution of the mask is usually needed (at least $14×14$, or even more).
+- The masks are predicted for each class separately.
+- The masks are predicted using convolutions instead of fully connected layers
+  (the upscaling convolutions are $2×2$ with stride 2).
+
+![w=79%,h=center](mask_rcnn_heads.svgz)
+
+~~~
+Improvements from Nov 2021: all convs (except for the output layer) are followed
+by BN, the _class&bbox_ head uses 4 convs instead of 2 MLPs, RPN contains
+two convs instead of one.
+
+---
+# Mask R-CNN
+
+![w=100%,v=middle](mask_rcnn_ablation.svgz)
+
+---
+# Mask R-CNN – Human Pose Estimation
+
+![w=80%,h=center](../01/human_pose_estimation.jpg)
+
+~~~
+- Testing applicability of Mask R-CNN architecture.
+
+- Keypoints (e.g., left shoulder, right elbow, …) are detected
+  as independent one-hot masks of size $56×56$ with $\softmax$ output function.
+
+~~~
+![w=70%,h=center](mask_rcnn_hpe_performance.svgz)
+
+---
+section: FPN
+# Feature Pyramid Networks
+
+![w=85%,h=center](fpn_overview.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=62%,h=center](fpn_architecture.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=56%,h=center](fpn_architecture_detailed.svgz)
+
+---
+# Feature Pyramid Networks
+
+We employ FPN as a backbone in Faster R-CNN.
+
+~~~
+Assuming a ResNet-like network with $224×224$ input, we denote by $C_2, C_3, …, C_5$
+the image features of the last convolutional layer at each resolution, of size
+$56×56, 28×28, …, 7×7$ (i.e., $C_i$ indicates downscaling by a factor of $2^i$).
+~~~
+The FPN representations incorporating the smaller resolution features are
+denoted as $P_2, …, P_5$, each consisting of 256 channels; the classification
+heads are shared.
+
+~~~
+In both the RPN and the Fast R-CNN, authors utilize the $P_2, …, P_5$
+representations, considering single-size anchors for every $P_i$ (of size
+$32^2, 64^2, 128^2, 256^2$, respectively). However, three aspect ratios
+$(1:1, 1:2, 2:1)$ are still used.
+
+~~~
+![w=100%](fpn_results.svgz)
+
+---
+section: FocalLoss
+# Focal Loss
+
+![w=46%,f=right](fast_rcnn_rumcajs.svgz)
+
+For single-stage object detection architectures, _class imbalance_ has been
+identified as the main issue preventing them from reaching performance
+comparable to two-stage detectors. In a single-stage detector, there can be tens
+of thousands of anchors, with only dozens of useful training examples.
+
+~~~
+![w=46%,f=right](focal_loss_graph.svgz)
+
+Cross-entropy loss is computed as
+$$𝓛_\textrm{cross-entropy} = -\log p_\textrm{model}(y | x).$$
+
+~~~
+Focal loss (a loss focused on hard examples) is proposed as
+$$𝓛_\textrm{focal-loss} = -(1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
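+
+~~~
+To see the effect numerically (an illustrative calculation, assuming $γ=2$ and
+the natural logarithm), compare an easy example with
+$p_\textrm{model}(y | x) = 0.9$ and a hard one with $p_\textrm{model}(y | x) = 0.1$:
+$$\begin{aligned}
+𝓛_\textrm{cross-entropy}: && -\log 0.9 &≈ 0.105, & -\log 0.1 &≈ 2.303, \\
+𝓛_\textrm{focal-loss}: && 0.1^2 ⋅ 0.105 &≈ 0.00105, & 0.9^2 ⋅ 2.303 &≈ 1.865.
+\end{aligned}$$
+The hard example thus weighs about $22×$ more than the easy one under
+cross-entropy, but roughly $1800×$ more under focal loss.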
+
+---
+# Focal Loss
+
+For $γ=0$, focal loss is equal to cross-entropy loss.
+
+~~~
+Authors reported that $γ=2$ worked best for them for training a single-stage
+detector.
+
+~~~
+![w=100%,mh=75%,v=bottom](focal_loss_cdf.svgz)
+
+---
+# Focal Loss and Class Imbalance
+
+Focal loss is connected to another solution to class imbalance – we might
+introduce a weighting factor $α ∈ (0, 1)$ for one class and $1 - α$ for the other
+class, arriving at
+$$ -α_y ⋅ \log p_\textrm{model}(y | x).$$
+
+~~~
+The weight $α$ might be set to the inverse class frequency or treated as
+a hyperparameter.
+
+~~~
+Even if weighting puts more emphasis on the less frequent class, it does not
+distinguish between easy and hard examples, contrary to focal loss.
+
+~~~
+In practice, the focal loss is usually used together with class weighting:
+$$ -α_y ⋅ (1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
+For example, authors report that $α=0.25$ (weight of the rare class) works best with $γ=2$.
+
+---
+section: RetinaNet
+# RetinaNet
+
+RetinaNet is a single-stage detector, using feature pyramid network
+architecture. Built on top of ResNet architecture, the feature pyramid
+contains levels $P_3$ through $P_7$, with each $P_l$ having 256 channels
+and resolution $2^l$ times lower than the input. On each pyramid level $P_l$,
+we consider 9 anchors for every position, with 3 different aspect ratios ($1:1$, $1:2$, $2:1$)
+and with 3 different sizes $(\{2^0, 2^{1/3}, 2^{2/3}\} ⋅ 4 ⋅ 2^l)^2$.
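+
+~~~
+For instance (plugging into the formula above, and quoting the square roots of
+the anchor areas), on $P_3$ the base side is $4 ⋅ 2^3 = 32$, so the three
+anchor sizes have sides
+$$32,~~~~32 ⋅ 2^{1/3} ≈ 40.3,~~~~32 ⋅ 2^{2/3} ≈ 50.8,$$
+while on $P_7$ the base side is $4 ⋅ 2^7 = 512$ and the largest anchor has side
+$512 ⋅ 2^{2/3} ≈ 813$. The anchors therefore cover sides from 32 to roughly
+813 pixels.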
+
+~~~
+Note that ResNet provides only $C_3$ to $C_5$ features. $C_6$ is computed
+using a $3×3$ convolution with stride 2 on $C_5$, and $C_7$ is obtained
+by applying ReLU followed by another $3×3$ stride-2 convolution. The $C_6$ and
+$C_7$ are included to improve large object detection.
+
+---
+# RetinaNet – Architecture
+
+The classification head and the bounding box regression head are fully
+convolutional and do not share parameters (but classification heads are shared
+across levels, and so are the bounding box regression heads), generating
+$\mathit{anchors} ⋅ \mathit{classes}$ sigmoids and $\mathit{anchors}$ bounding
+boxes per position.
+
+![w=100%](retinanet.svgz)
+
+---
+# RetinaNet
+
+During training, anchors are assigned to ground-truth object boxes if IoU is at
+least 0.5; to background if IoU with every ground-truth box is at most 0.4
+(the remaining anchors are ignored during training).
+~~~
+The classification head is trained using focal loss with $γ=2$ and $α=0.25$ (but
+according to the paper, all values of $γ$ in the $[0.5, 5]$ range work well); the
+bounding box regression head is trained using $\textrm{smooth}_{L_1}$ loss as in
+Fast(er) R-CNN.
+
+~~~
+During inference, at most 1000 objects with at least 5% probability from all
+pyramid levels are considered, and all of them are combined using non-maximum
+suppression with a threshold of 0.5. Fixed-size training and testing is used,
+with sizes 400, 500, …, 800 pixels.
+
+~~~
+![w=68%](retinanet_results.svgz)![w=32%](retinanet_graph.svgz)
+
+---
+# RetinaNet – Ablations
+
+Ablations use ResNet-50-FPN backbone trained and tested with 600-pixel images.
+
+![w=80%,h=center](retinanet_ablations.svgz)
+
+---
+section: EfficientDet
+# EfficientDet – Architecture
+
+EfficientDet builds upon EfficientNet and delivered state-of-the-art performance
+in Nov 2019 with minimum time and space requirements (however, its performance
+has already been surpassed significantly). It is a single-stage detector similar
+to RetinaNet, which:
+
+~~~
+- uses EfficientNet as a backbone;
+~~~
+- employs compound scaling;
+~~~
+- uses a newly proposed BiFPN, “efficient bidirectional cross-scale connections
+  and weighted feature fusion”.
+
+~~~
+![w=78%,h=center](efficientdet_architecture.svgz)
+
+---
+# EfficientDet – BiFPN
+
+In multi-scale fusion in FPN, information flows only from the pyramid levels
+with smaller resolution to the levels with higher resolution.
+
+![w=80%,h=center](efficientdet_bifpn.svgz)
+
+~~~
+BiFPN consists of several rounds of bidirectional flows. Each bidirectional flow
+employs residual connections and does not include nodes that have only one input
+edge with no feature fusion. All operations are $3×3$ separable convolutions with
+batch normalization and ReLU; upsampling is done by repeating rows and columns
+and downsampling by max-pooling.
+
+---
+# EfficientDet – Weighted BiFPN
+
+When combining features with different resolutions, it is common to resize them
+to the same resolution and sum them – therefore, all sets of features are
+considered to be of the same importance. The authors however argue that features
+from different resolutions contribute to the final result _unequally_ and propose
+to combine them with trainable weights.
+
+~~~
+- **Softmax-based fusion**: In each BiFPN node, we create a trainable weight
+  $w_i$ for every input $⇶I_i$ and the final combination (after resize, before
+  a convolution) is
+  $$∑_i \frac{e^{w_i}}{∑\nolimits_j e^{w_j}} ⇶I_i.$$
+
+~~~
+- **Fast normalized fusion**: Authors propose a simpler alternative of
+  weighting:
+  $$∑_i \frac{\ReLU(w_i)}{ε + ∑\nolimits_j \ReLU(w_j)} ⇶I_i.$$
+  It uses $ε=0.0001$ for stability and is up to 30% faster on a GPU.
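+
+~~~
+As a small numeric illustration (the weights are made up for this example), for
+two inputs with $w = (1, 2)$ the softmax-based fusion uses coefficients
+$$\frac{e^1}{e^1 + e^2} ≈ 0.269,~~~~\frac{e^2}{e^1 + e^2} ≈ 0.731,$$
+while the fast normalized fusion uses
+$$\frac{1}{0.0001 + 3} ≈ 0.333,~~~~\frac{2}{0.0001 + 3} ≈ 0.667.$$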
+
+
+---
+# EfficientDet – Compound Scaling
+
+Similar to EfficientNet, authors propose to scale various dimensions of the
+network, using a single compound coefficient $ϕ$.
+
+~~~
+After performing a grid search:
+- the width of BiFPN is scaled as $W_\mathit{BiFPN} = 64 ⋅ 1.35^ϕ,$
+- the depth of BiFPN is scaled as $D_\mathit{BiFPN} = 3 + ϕ,$
+- the box/class predictor has the same width as BiFPN and depth $D_\mathit{class} = 3 + \lfloor ϕ/3 \rfloor,$
+- input image resolution increases according to $R_\mathit{image} = 512 + 128 ⋅ ϕ.$
+
+![w=45%,h=center](efficientdet_scaling.svgz)
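+
+~~~
+Plugging into the formulas above (a worked example only; the configurations
+actually published may round these values), $ϕ=0$ gives
+$W_\mathit{BiFPN} = 64$, $D_\mathit{BiFPN} = 3$, $D_\mathit{class} = 3$, and
+$R_\mathit{image} = 512$, while $ϕ=3$ gives
+$$W_\mathit{BiFPN} = 64 ⋅ 1.35^3 ≈ 157,~~~~D_\mathit{BiFPN} = 6,~~~~
+  D_\mathit{class} = 3 + \lfloor 3/3 \rfloor = 4,~~~~R_\mathit{image} = 512 + 128 ⋅ 3 = 896.$$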
+
+---
+# EfficientDet – Results
+
+![w=50%](efficientdet_flops.svgz)![w=50%](efficientdet_size.svgz)
+
+---
+# EfficientDet – Results
+
+![w=83%,h=center](efficientdet_results.svgz)
+
+---
+# EfficientDet – Inference Latencies
+
+![w=100%](efficientdet_latency.svgz)
+
+---
+# EfficientDet – Ablations
+
+Given that EfficientDet employs both a powerful backbone and the new BiFPN, the
+authors quantify the improvement of the individual components.
+
+![w=49%,h=center](efficientdet_ablations_backbone.svgz)
+
+~~~
+The comparison with previously used cross-scale fusion architectures is also
+provided:
+
+![w=49%,h=center](efficientdet_ablations_fpn.svgz)
+
+---
+class: wide
+# EfficientDet-D0 Example
+
+![w=98%,h=center](efficientdet_example.jpg)
+
+---
+section: GroupNorm
+# Normalization
+
+## Batch Normalization
+
+Neuron value is normalized across the minibatch, and in case of CNN also across
+all positions.
+
+~~~
+## Layer Normalization
+
+Neuron value is normalized across the layer.
+
+~~~
+![w=100%](normalizations.svgz)
+
+---
+# Group Normalization
+
+Group Normalization is analogous to Layer Normalization, but the channels are
+normalized in groups (by default, $G=32$).
+
+![w=40%,h=center](normalizations.svgz)
+
+~~~
+![w=40%,h=center](group_norm.svgz)
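+
+~~~
+In symbols (a sketch; the index set $S_g$ and the constant $ε$ are notation
+introduced here), for a single example and a group $g$, let $S_g$ contain all
+positions of all channels in the group. Then
+$$μ_g = \frac{1}{|S_g|} ∑\nolimits_{i ∈ S_g} x_i,~~~~
+  σ_g^2 = \frac{1}{|S_g|} ∑\nolimits_{i ∈ S_g} (x_i - μ_g)^2,~~~~
+  x̂_i = \frac{x_i - μ_g}{\sqrt{σ_g^2 + ε}},$$
+followed by a trainable per-channel scale and shift, as in batch normalization.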
+
+---
+# Group Normalization
+
+![w=78%,h=center](group_norm_vs_batch_norm.svgz)
+
+---
+# Group Normalization
+
+![w=65%,h=center](group_norm_coco.svgz)
diff --git a/slides/06/anchor_net.svgz b/slides/06/anchor_net.svgz
new file mode 100644
index 0000000..a78b80f
Binary files /dev/null and b/slides/06/anchor_net.svgz differ
diff --git a/slides/06/anchor_net.svgz.ref b/slides/06/anchor_net.svgz.ref
new file mode 100644
index 0000000..8473ea0
--- /dev/null
+++ b/slides/06/anchor_net.svgz.ref
@@ -0,0 +1 @@
+Adapted from slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/average_precision.jpg b/slides/06/average_precision.jpg
new file mode 100644
index 0000000..aa92c3a
Binary files /dev/null and b/slides/06/average_precision.jpg differ
diff --git a/slides/06/average_precision.jpg.ref b/slides/06/average_precision.jpg.ref
new file mode 100644
index 0000000..0bdfae7
--- /dev/null
+++ b/slides/06/average_precision.jpg.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*naz02wO-XMywlwAdFzF-GA.jpeg
diff --git a/slides/06/bbox_representation_cxcywh.webp b/slides/06/bbox_representation_cxcywh.webp
new file mode 100644
index 0000000..745ad04
Binary files /dev/null and b/slides/06/bbox_representation_cxcywh.webp differ
diff --git a/slides/06/bbox_representation_cxcywh.webp.ref b/slides/06/bbox_representation_cxcywh.webp.ref
new file mode 100644
index 0000000..91b33ac
--- /dev/null
+++ b/slides/06/bbox_representation_cxcywh.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*Z80D7vwD-3UwP16asY-k6A.jpeg
diff --git a/slides/06/bbox_representation_xywh.webp b/slides/06/bbox_representation_xywh.webp
new file mode 100644
index 0000000..f82925e
Binary files /dev/null and b/slides/06/bbox_representation_xywh.webp differ
diff --git a/slides/06/bbox_representation_xywh.webp.ref b/slides/06/bbox_representation_xywh.webp.ref
new file mode 100644
index 0000000..0e2a026
--- /dev/null
+++ b/slides/06/bbox_representation_xywh.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*JLeFS2KIOzSTk6lUp1Ou2w.jpeg
diff --git a/slides/06/bbox_representation_xyxy.webp b/slides/06/bbox_representation_xyxy.webp
new file mode 100644
index 0000000..2f7d93b
Binary files /dev/null and b/slides/06/bbox_representation_xyxy.webp differ
diff --git a/slides/06/bbox_representation_xyxy.webp.ref b/slides/06/bbox_representation_xyxy.webp.ref
new file mode 100644
index 0000000..7399ff7
--- /dev/null
+++ b/slides/06/bbox_representation_xyxy.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*oZcZhzOWKb3kvBHPOHYfow.jpeg
diff --git a/slides/06/cv_tasks.jpg b/slides/06/cv_tasks.jpg
new file mode 100644
index 0000000..de4459b
Binary files /dev/null and b/slides/06/cv_tasks.jpg differ
diff --git a/slides/06/cv_tasks.jpg.ref b/slides/06/cv_tasks.jpg.ref
new file mode 100644
index 0000000..1f5753a
--- /dev/null
+++ b/slides/06/cv_tasks.jpg.ref
@@ -0,0 +1 @@
+https://www.implantology.or.kr/articles/xml/RvNO/
diff --git a/slides/06/efficientdet_ablations_backbone.svgz b/slides/06/efficientdet_ablations_backbone.svgz
new file mode 100644
index 0000000..a73b0d0
Binary files /dev/null and b/slides/06/efficientdet_ablations_backbone.svgz differ
diff --git a/slides/06/efficientdet_ablations_backbone.svgz.ref b/slides/06/efficientdet_ablations_backbone.svgz.ref
new file mode 100644
index 0000000..8ea6795
--- /dev/null
+++ b/slides/06/efficientdet_ablations_backbone.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_ablations_fpn.svgz b/slides/06/efficientdet_ablations_fpn.svgz
new file mode 100644
index 0000000..ac3affa
Binary files /dev/null and b/slides/06/efficientdet_ablations_fpn.svgz differ
diff --git a/slides/06/efficientdet_ablations_fpn.svgz.ref b/slides/06/efficientdet_ablations_fpn.svgz.ref
new file mode 100644
index 0000000..dd61bd6
--- /dev/null
+++ b/slides/06/efficientdet_ablations_fpn.svgz.ref
@@ -0,0 +1 @@
+Table 5 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_architecture.svgz b/slides/06/efficientdet_architecture.svgz
new file mode 100644
index 0000000..dd376f1
Binary files /dev/null and b/slides/06/efficientdet_architecture.svgz differ
diff --git a/slides/06/efficientdet_architecture.svgz.ref b/slides/06/efficientdet_architecture.svgz.ref
new file mode 100644
index 0000000..66db1af
--- /dev/null
+++ b/slides/06/efficientdet_architecture.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_bifpn.svgz b/slides/06/efficientdet_bifpn.svgz
new file mode 100644
index 0000000..bc694d3
Binary files /dev/null and b/slides/06/efficientdet_bifpn.svgz differ
diff --git a/slides/06/efficientdet_bifpn.svgz.ref b/slides/06/efficientdet_bifpn.svgz.ref
new file mode 100644
index 0000000..86130e9
--- /dev/null
+++ b/slides/06/efficientdet_bifpn.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_example.jpg b/slides/06/efficientdet_example.jpg
new file mode 100644
index 0000000..1f1aa1b
Binary files /dev/null and b/slides/06/efficientdet_example.jpg differ
diff --git a/slides/06/efficientdet_example.jpg.ref b/slides/06/efficientdet_example.jpg.ref
new file mode 100644
index 0000000..2e9aaab
--- /dev/null
+++ b/slides/06/efficientdet_example.jpg.ref
@@ -0,0 +1 @@
+https://github.com/google/automl/blob/master/efficientdet/g3doc/street.jpg
diff --git a/slides/06/efficientdet_flops.svgz b/slides/06/efficientdet_flops.svgz
new file mode 100644
index 0000000..24d9e8c
Binary files /dev/null and b/slides/06/efficientdet_flops.svgz differ
diff --git a/slides/06/efficientdet_flops.svgz.ref b/slides/06/efficientdet_flops.svgz.ref
new file mode 100644
index 0000000..186b61d
--- /dev/null
+++ b/slides/06/efficientdet_flops.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_latency.svgz b/slides/06/efficientdet_latency.svgz
new file mode 100644
index 0000000..0a5dd99
Binary files /dev/null and b/slides/06/efficientdet_latency.svgz differ
diff --git a/slides/06/efficientdet_latency.svgz.ref b/slides/06/efficientdet_latency.svgz.ref
new file mode 100644
index 0000000..bb23a56
--- /dev/null
+++ b/slides/06/efficientdet_latency.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_results.svgz b/slides/06/efficientdet_results.svgz
new file mode 100644
index 0000000..b2e4058
Binary files /dev/null and b/slides/06/efficientdet_results.svgz differ
diff --git a/slides/06/efficientdet_results.svgz.ref b/slides/06/efficientdet_results.svgz.ref
new file mode 100644
index 0000000..c4f6073
--- /dev/null
+++ b/slides/06/efficientdet_results.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_scaling.svgz b/slides/06/efficientdet_scaling.svgz
new file mode 100644
index 0000000..675dbb8
Binary files /dev/null and b/slides/06/efficientdet_scaling.svgz differ
diff --git a/slides/06/efficientdet_scaling.svgz.ref b/slides/06/efficientdet_scaling.svgz.ref
new file mode 100644
index 0000000..5f14bba
--- /dev/null
+++ b/slides/06/efficientdet_scaling.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_size.svgz b/slides/06/efficientdet_size.svgz
new file mode 100644
index 0000000..f42947b
Binary files /dev/null and b/slides/06/efficientdet_size.svgz differ
diff --git a/slides/06/efficientdet_size.svgz.ref b/slides/06/efficientdet_size.svgz.ref
new file mode 100644
index 0000000..bb23a56
--- /dev/null
+++ b/slides/06/efficientdet_size.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/fast_rcnn.jpg b/slides/06/fast_rcnn.jpg
new file mode 100644
index 0000000..1803bb5
Binary files /dev/null and b/slides/06/fast_rcnn.jpg differ
diff --git a/slides/06/fast_rcnn.jpg.ref b/slides/06/fast_rcnn.jpg.ref
new file mode 100644
index 0000000..fbecdb1
--- /dev/null
+++ b/slides/06/fast_rcnn.jpg.ref
@@ -0,0 +1 @@
+Figure 1 of "Fast R-CNN", https://arxiv.org/abs/1504.08083
diff --git a/slides/06/fast_rcnn_architecture.svgz b/slides/06/fast_rcnn_architecture.svgz
new file mode 100644
index 0000000..b7bda19
Binary files /dev/null and b/slides/06/fast_rcnn_architecture.svgz differ
diff --git a/slides/06/fast_rcnn_architecture.svgz.ref b/slides/06/fast_rcnn_architecture.svgz.ref
new file mode 100644
index 0000000..6efa2ff
--- /dev/null
+++ b/slides/06/fast_rcnn_architecture.svgz.ref
@@ -0,0 +1 @@
+Slide 61 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/fast_rcnn_rumcajs.svgz b/slides/06/fast_rcnn_rumcajs.svgz
new file mode 100644
index 0000000..c774a93
Binary files /dev/null and b/slides/06/fast_rcnn_rumcajs.svgz differ
diff --git a/slides/06/fast_rcnn_rumcajs.svgz.ref b/slides/06/fast_rcnn_rumcajs.svgz.ref
new file mode 100644
index 0000000..3ebdb63
--- /dev/null
+++ b/slides/06/fast_rcnn_rumcajs.svgz.ref
@@ -0,0 +1 @@
+https://commons.wikimedia.org/wiki/File:Tišnov,_Hajánky,_garážová_ozdoba_(6597).jpg
diff --git a/slides/06/fast_rcnn_speed.svgz b/slides/06/fast_rcnn_speed.svgz
new file mode 100644
index 0000000..9f24720
Binary files /dev/null and b/slides/06/fast_rcnn_speed.svgz differ
diff --git a/slides/06/fast_rcnn_speed.svgz.ref b/slides/06/fast_rcnn_speed.svgz.ref
new file mode 100644
index 0000000..436c3bf
--- /dev/null
+++ b/slides/06/fast_rcnn_speed.svgz.ref
@@ -0,0 +1 @@
+Slide 76 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/fast_rcnn_vgg.png b/slides/06/fast_rcnn_vgg.png
new file mode 100644
index 0000000..07cfbf0
Binary files /dev/null and b/slides/06/fast_rcnn_vgg.png differ
diff --git a/slides/06/fast_rcnn_vgg.png.ref b/slides/06/fast_rcnn_vgg.png.ref
new file mode 100644
index 0000000..62ac59b
--- /dev/null
+++ b/slides/06/fast_rcnn_vgg.png.ref
@@ -0,0 +1 @@
+https://en.wikipedia.org/wiki/File:VGG_neural_network.png
diff --git a/slides/06/faster_rcnn_architecture.png b/slides/06/faster_rcnn_architecture.png
new file mode 100644
index 0000000..8464540
Binary files /dev/null and b/slides/06/faster_rcnn_architecture.png differ
diff --git a/slides/06/faster_rcnn_architecture.png.ref b/slides/06/faster_rcnn_architecture.png.ref
new file mode 100644
index 0000000..657ebdd
--- /dev/null
+++ b/slides/06/faster_rcnn_architecture.png.ref
@@ -0,0 +1 @@
+Figure 2 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/faster_rcnn_performance.svgz b/slides/06/faster_rcnn_performance.svgz
new file mode 100644
index 0000000..f2ccc58
Binary files /dev/null and b/slides/06/faster_rcnn_performance.svgz differ
diff --git a/slides/06/faster_rcnn_performance.svgz.ref b/slides/06/faster_rcnn_performance.svgz.ref
new file mode 100644
index 0000000..8796742
--- /dev/null
+++ b/slides/06/faster_rcnn_performance.svgz.ref
@@ -0,0 +1 @@
+Tables 3 and 4 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/faster_rcnn_rpn.svgz b/slides/06/faster_rcnn_rpn.svgz
new file mode 100644
index 0000000..b493b07
Binary files /dev/null and b/slides/06/faster_rcnn_rpn.svgz differ
diff --git a/slides/06/faster_rcnn_rpn.svgz.ref b/slides/06/faster_rcnn_rpn.svgz.ref
new file mode 100644
index 0000000..1fac88c
--- /dev/null
+++ b/slides/06/faster_rcnn_rpn.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/focal_loss_cdf.svgz b/slides/06/focal_loss_cdf.svgz
new file mode 100644
index 0000000..403d6d5
Binary files /dev/null and b/slides/06/focal_loss_cdf.svgz differ
diff --git a/slides/06/focal_loss_cdf.svgz.ref b/slides/06/focal_loss_cdf.svgz.ref
new file mode 100644
index 0000000..0dd7c12
--- /dev/null
+++ b/slides/06/focal_loss_cdf.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/focal_loss_graph.svgz b/slides/06/focal_loss_graph.svgz
new file mode 100644
index 0000000..44ebdf2
Binary files /dev/null and b/slides/06/focal_loss_graph.svgz differ
diff --git a/slides/06/focal_loss_graph.svgz.ref b/slides/06/focal_loss_graph.svgz.ref
new file mode 100644
index 0000000..ccc201a
--- /dev/null
+++ b/slides/06/focal_loss_graph.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/fpn_architecture.svgz b/slides/06/fpn_architecture.svgz
new file mode 100644
index 0000000..af04b27
Binary files /dev/null and b/slides/06/fpn_architecture.svgz differ
diff --git a/slides/06/fpn_architecture.svgz.ref b/slides/06/fpn_architecture.svgz.ref
new file mode 100644
index 0000000..96d788c
--- /dev/null
+++ b/slides/06/fpn_architecture.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_architecture_detailed.svgz b/slides/06/fpn_architecture_detailed.svgz
new file mode 100644
index 0000000..ff42dd0
Binary files /dev/null and b/slides/06/fpn_architecture_detailed.svgz differ
diff --git a/slides/06/fpn_architecture_detailed.svgz.ref b/slides/06/fpn_architecture_detailed.svgz.ref
new file mode 100644
index 0000000..bfb0bc8
--- /dev/null
+++ b/slides/06/fpn_architecture_detailed.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_overview.svgz b/slides/06/fpn_overview.svgz
new file mode 100644
index 0000000..c6c1574
Binary files /dev/null and b/slides/06/fpn_overview.svgz differ
diff --git a/slides/06/fpn_overview.svgz.ref b/slides/06/fpn_overview.svgz.ref
new file mode 100644
index 0000000..c00542b
--- /dev/null
+++ b/slides/06/fpn_overview.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_results.svgz b/slides/06/fpn_results.svgz
new file mode 100644
index 0000000..02db310
Binary files /dev/null and b/slides/06/fpn_results.svgz differ
diff --git a/slides/06/fpn_results.svgz.ref b/slides/06/fpn_results.svgz.ref
new file mode 100644
index 0000000..8ced9a5
--- /dev/null
+++ b/slides/06/fpn_results.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/group_norm.svgz b/slides/06/group_norm.svgz
new file mode 100644
index 0000000..0be782b
Binary files /dev/null and b/slides/06/group_norm.svgz differ
diff --git a/slides/06/group_norm.svgz.ref b/slides/06/group_norm.svgz.ref
new file mode 100644
index 0000000..6e47f02
--- /dev/null
+++ b/slides/06/group_norm.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/group_norm_coco.svgz b/slides/06/group_norm_coco.svgz
new file mode 100644
index 0000000..fe964af
Binary files /dev/null and b/slides/06/group_norm_coco.svgz differ
diff --git a/slides/06/group_norm_coco.svgz.ref b/slides/06/group_norm_coco.svgz.ref
new file mode 100644
index 0000000..86ea266
--- /dev/null
+++ b/slides/06/group_norm_coco.svgz.ref
@@ -0,0 +1 @@
+Tables 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/group_norm_vs_batch_norm.svgz b/slides/06/group_norm_vs_batch_norm.svgz
new file mode 100644
index 0000000..2c017ac
Binary files /dev/null and b/slides/06/group_norm_vs_batch_norm.svgz differ
diff --git a/slides/06/group_norm_vs_batch_norm.svgz.ref b/slides/06/group_norm_vs_batch_norm.svgz.ref
new file mode 100644
index 0000000..e6c9431
--- /dev/null
+++ b/slides/06/group_norm_vs_batch_norm.svgz.ref
@@ -0,0 +1 @@
+Figures 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/huber_loss.py b/slides/06/huber_loss.py
new file mode 100644
index 0000000..f6f93d2
--- /dev/null
+++ b/slides/06/huber_loss.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python3
+import os
+
+import matplotlib
+import matplotlib.pyplot as plt
+import numpy as np
+
+matplotlib.rcParams["mathtext.fontset"] = "cm"
+
+xs = np.linspace(-3, 3, 51)
+l2 = xs * xs / 2
+huber = np.where(np.abs(xs) <= 1, xs * xs / 2, np.abs(xs) - 0.5)
+d_huber = np.where(np.abs(xs) <= 1, xs, np.sign(xs))
+
+plt.figure(figsize=(5, 3.5))
+plt.plot(xs, l2, label="L2 loss $\\frac{1}{2} x^2$")
+plt.plot(xs, huber, label="Huber loss")
+plt.plot(xs, d_huber, label="Huber loss derivative")
+plt.gca().set_aspect(1)
+plt.grid(True)
+plt.legend(loc="upper center")
+plt.savefig("huber_loss.svg", bbox_inches="tight", transparent=True)
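Review note on `slides/06/huber_loss.py`: the two `np.where` expressions encode the Huber loss with threshold 1 and its derivative. The sketch below restates the same piecewise definition for a single scalar, without NumPy. The function names and the `delta` parameter are illustrative assumptions and do not appear in the script, which hardcodes the threshold to 1.

```python
def huber(x: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |x| <= delta, linear beyond that point."""
    if abs(x) <= delta:
        return x * x / 2
    # The linear branch is shifted so that both branches meet at |x| == delta.
    return delta * (abs(x) - delta / 2)


def d_huber(x: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss: x clipped to the interval [-delta, delta]."""
    return max(-delta, min(delta, x))
```

With the default `delta`, `huber` matches the script's `xs * xs / 2` and `np.abs(xs) - 0.5` branches, and `d_huber` matches `xs` and `np.sign(xs)`.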
diff --git a/slides/06/huber_loss.svgz b/slides/06/huber_loss.svgz
new file mode 100644
index 0000000..a3362fa
Binary files /dev/null and b/slides/06/huber_loss.svgz differ
diff --git a/slides/06/huber_loss.svgz.ref b/slides/06/huber_loss.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/06/mask_rcnn_ablation.svgz b/slides/06/mask_rcnn_ablation.svgz
new file mode 100644
index 0000000..1b6b8e2
Binary files /dev/null and b/slides/06/mask_rcnn_ablation.svgz differ
diff --git a/slides/06/mask_rcnn_ablation.svgz.ref b/slides/06/mask_rcnn_ablation.svgz.ref
new file mode 100644
index 0000000..8877b9d
--- /dev/null
+++ b/slides/06/mask_rcnn_ablation.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_architecture.png b/slides/06/mask_rcnn_architecture.png
new file mode 100644
index 0000000..5b9e6ed
Binary files /dev/null and b/slides/06/mask_rcnn_architecture.png differ
diff --git a/slides/06/mask_rcnn_architecture.png.ref b/slides/06/mask_rcnn_architecture.png.ref
new file mode 100644
index 0000000..2d5bd13
--- /dev/null
+++ b/slides/06/mask_rcnn_architecture.png.ref
@@ -0,0 +1 @@
+Figure 1 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_heads.svgz b/slides/06/mask_rcnn_heads.svgz
new file mode 100644
index 0000000..f5c90b1
Binary files /dev/null and b/slides/06/mask_rcnn_heads.svgz differ
diff --git a/slides/06/mask_rcnn_heads.svgz.ref b/slides/06/mask_rcnn_heads.svgz.ref
new file mode 100644
index 0000000..5e303ff
--- /dev/null
+++ b/slides/06/mask_rcnn_heads.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_hpe_performance.svgz b/slides/06/mask_rcnn_hpe_performance.svgz
new file mode 100644
index 0000000..b79f401
Binary files /dev/null and b/slides/06/mask_rcnn_hpe_performance.svgz differ
diff --git a/slides/06/mask_rcnn_hpe_performance.svgz.ref b/slides/06/mask_rcnn_hpe_performance.svgz.ref
new file mode 100644
index 0000000..19c0665
--- /dev/null
+++ b/slides/06/mask_rcnn_hpe_performance.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_roialign.svgz b/slides/06/mask_rcnn_roialign.svgz
new file mode 100644
index 0000000..0cefb39
Binary files /dev/null and b/slides/06/mask_rcnn_roialign.svgz differ
diff --git a/slides/06/mask_rcnn_roialign.svgz.ref b/slides/06/mask_rcnn_roialign.svgz.ref
new file mode 100644
index 0000000..b4070e5
--- /dev/null
+++ b/slides/06/mask_rcnn_roialign.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/normalizations.svgz b/slides/06/normalizations.svgz
new file mode 100644
index 0000000..6230387
Binary files /dev/null and b/slides/06/normalizations.svgz differ
diff --git a/slides/06/normalizations.svgz.ref b/slides/06/normalizations.svgz.ref
new file mode 100644
index 0000000..7b89167
--- /dev/null
+++ b/slides/06/normalizations.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/object_localization.png b/slides/06/object_localization.png
new file mode 100644
index 0000000..a6d3c85
Binary files /dev/null and b/slides/06/object_localization.png differ
diff --git a/slides/06/object_localization.png.ref b/slides/06/object_localization.png.ref
new file mode 100644
index 0000000..b84eac5
--- /dev/null
+++ b/slides/06/object_localization.png.ref
@@ -0,0 +1 @@
+Slide 38 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/precision_recall_bottle.svgz b/slides/06/precision_recall_bottle.svgz
new file mode 100644
index 0000000..41de99d
Binary files /dev/null and b/slides/06/precision_recall_bottle.svgz differ
diff --git a/slides/06/precision_recall_bottle.svgz.ref b/slides/06/precision_recall_bottle.svgz.ref
new file mode 100644
index 0000000..5a828ee
--- /dev/null
+++ b/slides/06/precision_recall_bottle.svgz.ref
@@ -0,0 +1 @@
+Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
diff --git a/slides/06/precision_recall_curve.png b/slides/06/precision_recall_curve.png
new file mode 100644
index 0000000..13f8fb9
Binary files /dev/null and b/slides/06/precision_recall_curve.png differ
diff --git a/slides/06/precision_recall_curve.png.ref b/slides/06/precision_recall_curve.png.ref
new file mode 100644
index 0000000..fc537f8
--- /dev/null
+++ b/slides/06/precision_recall_curve.png.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*VenTq4IgxjmIpOXWdFb-jg.png
diff --git a/slides/06/precision_recall_curve_interpolated.jpg b/slides/06/precision_recall_curve_interpolated.jpg
new file mode 100644
index 0000000..817eae0
Binary files /dev/null and b/slides/06/precision_recall_curve_interpolated.jpg differ
diff --git a/slides/06/precision_recall_curve_interpolated.jpg.ref b/slides/06/precision_recall_curve_interpolated.jpg.ref
new file mode 100644
index 0000000..9a840d2
--- /dev/null
+++ b/slides/06/precision_recall_curve_interpolated.jpg.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*pmSxeb4EfdGnzT6Xa68GEQ.jpeg
diff --git a/slides/06/precision_recall_person.svgz b/slides/06/precision_recall_person.svgz
new file mode 100644
index 0000000..808dd55
Binary files /dev/null and b/slides/06/precision_recall_person.svgz differ
diff --git a/slides/06/precision_recall_person.svgz.ref b/slides/06/precision_recall_person.svgz.ref
new file mode 100644
index 0000000..5a828ee
--- /dev/null
+++ b/slides/06/precision_recall_person.svgz.ref
@@ -0,0 +1 @@
+Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
diff --git a/slides/06/pyramidnet_architecture.svgz b/slides/06/pyramidnet_architecture.svgz
new file mode 100644
index 0000000..d773f10
Binary files /dev/null and b/slides/06/pyramidnet_architecture.svgz differ
diff --git a/slides/06/pyramidnet_architecture.svgz.ref b/slides/06/pyramidnet_architecture.svgz.ref
new file mode 100644
index 0000000..321784e
--- /dev/null
+++ b/slides/06/pyramidnet_architecture.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_blocks.svgz b/slides/06/pyramidnet_blocks.svgz
new file mode 100644
index 0000000..077785f
Binary files /dev/null and b/slides/06/pyramidnet_blocks.svgz differ
diff --git a/slides/06/pyramidnet_blocks.svgz.ref b/slides/06/pyramidnet_blocks.svgz.ref
new file mode 100644
index 0000000..2fde23d
--- /dev/null
+++ b/slides/06/pyramidnet_blocks.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_cifar.svgz b/slides/06/pyramidnet_cifar.svgz
new file mode 100644
index 0000000..4f2b985
Binary files /dev/null and b/slides/06/pyramidnet_cifar.svgz differ
diff --git a/slides/06/pyramidnet_cifar.svgz.ref b/slides/06/pyramidnet_cifar.svgz.ref
new file mode 100644
index 0000000..bc183f0
--- /dev/null
+++ b/slides/06/pyramidnet_cifar.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_growth_rate.svgz b/slides/06/pyramidnet_growth_rate.svgz
new file mode 100644
index 0000000..5474788
Binary files /dev/null and b/slides/06/pyramidnet_growth_rate.svgz differ
diff --git a/slides/06/pyramidnet_growth_rate.svgz.ref b/slides/06/pyramidnet_growth_rate.svgz.ref
new file mode 100644
index 0000000..12ee550
--- /dev/null
+++ b/slides/06/pyramidnet_growth_rate.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_residuals.svgz b/slides/06/pyramidnet_residuals.svgz
new file mode 100644
index 0000000..c4290c1
Binary files /dev/null and b/slides/06/pyramidnet_residuals.svgz differ
diff --git a/slides/06/pyramidnet_residuals.svgz.ref b/slides/06/pyramidnet_residuals.svgz.ref
new file mode 100644
index 0000000..b53108d
--- /dev/null
+++ b/slides/06/pyramidnet_residuals.svgz.ref
@@ -0,0 +1 @@
+Figure 5 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/rcnn_architecture.svgz b/slides/06/rcnn_architecture.svgz
new file mode 100644
index 0000000..0a7cf0e
Binary files /dev/null and b/slides/06/rcnn_architecture.svgz differ
diff --git a/slides/06/rcnn_architecture.svgz.ref b/slides/06/rcnn_architecture.svgz.ref
new file mode 100644
index 0000000..1a20f30
--- /dev/null
+++ b/slides/06/rcnn_architecture.svgz.ref
@@ -0,0 +1 @@
+Slide 54 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/retinanet.svgz b/slides/06/retinanet.svgz
new file mode 100644
index 0000000..60fe4c1
Binary files /dev/null and b/slides/06/retinanet.svgz differ
diff --git a/slides/06/retinanet.svgz.ref b/slides/06/retinanet.svgz.ref
new file mode 100644
index 0000000..aab04d0
--- /dev/null
+++ b/slides/06/retinanet.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_ablations.svgz b/slides/06/retinanet_ablations.svgz
new file mode 100644
index 0000000..aec5956
Binary files /dev/null and b/slides/06/retinanet_ablations.svgz differ
diff --git a/slides/06/retinanet_ablations.svgz.ref b/slides/06/retinanet_ablations.svgz.ref
new file mode 100644
index 0000000..1e51d14
--- /dev/null
+++ b/slides/06/retinanet_ablations.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_graph.svgz b/slides/06/retinanet_graph.svgz
new file mode 100644
index 0000000..299a928
Binary files /dev/null and b/slides/06/retinanet_graph.svgz differ
diff --git a/slides/06/retinanet_graph.svgz.ref b/slides/06/retinanet_graph.svgz.ref
new file mode 100644
index 0000000..b54356d
--- /dev/null
+++ b/slides/06/retinanet_graph.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_results.svgz b/slides/06/retinanet_results.svgz
new file mode 100644
index 0000000..80a5c4d
Binary files /dev/null and b/slides/06/retinanet_results.svgz differ
diff --git a/slides/06/retinanet_results.svgz.ref b/slides/06/retinanet_results.svgz.ref
new file mode 100644
index 0000000..38a2dcf
--- /dev/null
+++ b/slides/06/retinanet_results.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/roi_generation.jpg b/slides/06/roi_generation.jpg
new file mode 100644
index 0000000..18f7350
Binary files /dev/null and b/slides/06/roi_generation.jpg differ
diff --git a/slides/06/roi_generation.jpg.ref b/slides/06/roi_generation.jpg.ref
new file mode 100644
index 0000000..fbb2b02
--- /dev/null
+++ b/slides/06/roi_generation.jpg.ref
@@ -0,0 +1 @@
+Slide 48 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/roi_pooling.svgz b/slides/06/roi_pooling.svgz
new file mode 100644
index 0000000..b5d6c0d
Binary files /dev/null and b/slides/06/roi_pooling.svgz differ
diff --git a/slides/06/roi_pooling.svgz.ref b/slides/06/roi_pooling.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/06/roi_projection.svgz b/slides/06/roi_projection.svgz
new file mode 100644
index 0000000..a6aee2e
Binary files /dev/null and b/slides/06/roi_projection.svgz differ
diff --git a/slides/06/roi_projection.svgz.ref b/slides/06/roi_projection.svgz.ref
new file mode 100644
index 0000000..1cc5acc
--- /dev/null
+++ b/slides/06/roi_projection.svgz.ref
@@ -0,0 +1 @@
+Slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/tasks/bboxes_utils.md b/tasks/bboxes_utils.md
new file mode 100644
index 0000000..64c58c8
--- /dev/null
+++ b/tasks/bboxes_utils.md
@@ -0,0 +1,26 @@
+### Assignment: bboxes_utils
+#### Date: Deadline: Apr 09, 22:00
+#### Points: 2 points
+
+This is a preparatory assignment for `svhn_competition`. The goal is to
+implement several bounding box manipulation routines in the
+[bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py)
+module. Notably, you need to implement the following methods:
+- `bboxes_to_rcnn`: convert given bounding boxes to an R-CNN-like
+  representation relative to the given anchors;
+- `bboxes_from_rcnn`: convert R-CNN-like representations relative to
+  given anchors back to bounding boxes;
+- `bboxes_training`: given a list of anchors and gold objects, assign gold
+  objects to anchors and generate suitable training data (the exact algorithm
+  is described in the template).
+
+The [bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py)
+module contains simple unit tests, which run when the module is executed
+and which you can use to check the validity of your implementation. Note that
+the template does not contain type annotations because the Python typing system
+is not flexible enough to describe the tensor shape changes.
+
+When submitting to ReCodEx, the method `main` is executed, returning the
+implemented `bboxes_to_rcnn`, `bboxes_from_rcnn` and `bboxes_training`
+methods. These methods are then executed and compared to the reference
+implementation.
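Review note on `tasks/bboxes_utils.md`: the task text does not spell out the R-CNN-like representation, so the following is only a sketch of the usual parametrization (center offsets scaled by the anchor size, plus log size ratios). It assumes boxes are `[TOP, LEFT, BOTTOM, RIGHT]` as in `svhn_competition.md`, handles a single anchor/box pair, and uses hypothetical function names; the order of the four outputs is also an assumption, and the template in the repository remains authoritative.

```python
import math


def to_rcnn(anchor, bbox):
    """Encode one bbox relative to one anchor (both [top, left, bottom, right])."""
    a_top, a_left, a_bottom, a_right = anchor
    top, left, bottom, right = bbox
    a_h, a_w = a_bottom - a_top, a_right - a_left
    h, w = bottom - top, right - left
    return [
        ((top + bottom) / 2 - (a_top + a_bottom) / 2) / a_h,  # center y offset
        ((left + right) / 2 - (a_left + a_right) / 2) / a_w,  # center x offset
        math.log(h / a_h),                                    # log height ratio
        math.log(w / a_w),                                    # log width ratio
    ]


def from_rcnn(anchor, rcnn):
    """Invert to_rcnn: recover [top, left, bottom, right] from the encoding."""
    a_top, a_left, a_bottom, a_right = anchor
    a_h, a_w = a_bottom - a_top, a_right - a_left
    dy, dx, log_h, log_w = rcnn
    center_y = (a_top + a_bottom) / 2 + dy * a_h
    center_x = (a_left + a_right) / 2 + dx * a_w
    h, w = a_h * math.exp(log_h), a_w * math.exp(log_w)
    return [center_y - h / 2, center_x - w / 2, center_y + h / 2, center_x + w / 2]
```

A box identical to its anchor encodes to all zeros, and `from_rcnn(anchor, to_rcnn(anchor, bbox))` recovers `bbox` up to floating-point error.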
diff --git a/tasks/cags_classification.md b/tasks/cags_classification.md
index eedad3a..42009a4 100644
--- a/tasks/cags_classification.md
+++ b/tasks/cags_classification.md
@@ -31,8 +31,8 @@ estimates on the batch) or in inference regime. There is one exception though
 inference regime even when `training == True`._
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _93%_ test set accuracy will get 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _93%_ test set accuracy gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
 
 You may want to start with the
 [cags_classification.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_classification.py)
diff --git a/tasks/cags_segmentation.md b/tasks/cags_segmentation.md
index 7669d70..d9677e5 100644
--- a/tasks/cags_segmentation.md
+++ b/tasks/cags_segmentation.md
@@ -18,8 +18,8 @@ module, which can also evaluate your predictions (either by running with
 `evaluate_segmentation_file` method).
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _87%_ test set IoU gets 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _87%_ test set IoU gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
 
 You may want to start with the
 [cags_segmentation.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_segmentation.py)
diff --git a/tasks/cifar_competition.md b/tasks/cifar_competition.md
index ae455fb..9505c18 100644
--- a/tasks/cifar_competition.md
+++ b/tasks/cifar_competition.md
@@ -8,8 +8,9 @@ You can load the data using the
 module. Note that the test set is different than that of official CIFAR-10.
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _70%_ test set accuracy will get 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _70%_ test set accuracy gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
+
 Note that my solutions usually need to achieve around ~85% on the development
 set to score 70% on the test set.
 
diff --git a/tasks/cnn_manual.md b/tasks/cnn_manual.md
index 6d1483a..06f3a83 100644
--- a/tasks/cnn_manual.md
+++ b/tasks/cnn_manual.md
@@ -12,9 +12,9 @@ activation and `valid` padding, specified in the `args.cnn` option.
 The `args.cnn` contains comma-separated layer specifications in the format
 `filters-kernel_size-stride`.
 
-Of course, you cannot use any TensorFlow convolutional operation (instead,
+Of course, you cannot use any PyTorch convolutional operation (instead,
 implement the forward and backward pass using matrix multiplication and other
-operations), nor the `tf.GradientTape` for gradient computation.
+operations), nor `.backward()` for gradient computation.
 
 To make debugging easier, the template supports a `--verify` option, which
 allows comparing the forward pass and the three gradients you compute in the
diff --git a/tasks/mnist_ensemble.md b/tasks/mnist_ensemble.md
index 9deb3d5..7bee385 100644
--- a/tasks/mnist_ensemble.md
+++ b/tasks/mnist_ensemble.md
@@ -8,7 +8,7 @@ Your goal in this assignment is to implement model ensembling.
 The [mnist_ensemble.py](https://github.com/ufal/npfl138/tree/master/labs/03/mnist_ensemble.py)
 template trains `args.models` individual models, and your goal is to perform
 an ensemble of the first model, first two models, first three models, …, all
-models, and evaluate their accuracy on the test set.
+models, and evaluate their accuracy on the development set.
 
 #### Tests Start: mnist_ensemble_tests
 _Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
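Review note on `tasks/mnist_ensemble.md`: the task text asks for ensembles of the first 1, 2, …, all models but does not say how predictions are combined. The sketch below assumes the common choice of averaging the predicted class probabilities, and uses plain Python lists instead of the template's tensors. The function name and the input layout are hypothetical.

```python
def ensemble_accuracies(model_probs, labels):
    """Return accuracies of ensembles of the first 1, 2, ..., all models.

    model_probs[m][i][c] is the probability of class c for example i
    predicted by model m; labels[i] is the gold class of example i.
    """
    num_examples = len(labels)
    num_classes = len(model_probs[0][0])
    running_sum = [[0.0] * num_classes for _ in range(num_examples)]
    accuracies = []
    for probs in model_probs:
        # Add the next model; the argmax of the sum equals the argmax of the mean.
        for i in range(num_examples):
            for c in range(num_classes):
                running_sum[i][c] += probs[i][c]
        correct = sum(
            max(range(num_classes), key=row.__getitem__) == label
            for row, label in zip(running_sum, labels)
        )
        accuracies.append(correct / num_examples)
    return accuracies
```

Keeping a running sum avoids recomputing the average from scratch for every ensemble size.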
diff --git a/tasks/sgd_manual.md b/tasks/sgd_manual.md
index fd268e7..d054d94 100644
--- a/tasks/sgd_manual.md
+++ b/tasks/sgd_manual.md
@@ -17,7 +17,7 @@ Start with the
 [sgd_manual.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_manual.py)
 template, which is based on
 [sgd_backpropagation.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_backpropagation.py)
-one. Be aware that these templates generates each a different output file.
+one.
 
 Note that ReCodEx disables the PyTorch automatic differentiation during
 evaluation.
diff --git a/tasks/svhn_competition.md b/tasks/svhn_competition.md
new file mode 100644
index 0000000..902484f
--- /dev/null
+++ b/tasks/svhn_competition.md
@@ -0,0 +1,44 @@
+### Assignment: svhn_competition
+#### Date: Deadline: Apr 09, 22:00
+#### Points: 5 points+5 bonus
+
+The goal of this assignment is to implement a system performing object
+recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone
+(or any other model from `keras.applications`).
+
+The [Street View House Numbers (SVHN) dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/svhn_train.html)
+annotates every photo with all digits appearing in it, including their bounding
+boxes. The dataset can be loaded using the [svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py)
+module. Similarly to the `CAGS` dataset, the `train/dev/test` are PyTorch
+`torch.utils.data.Dataset`s, and every element is a dictionary with the following keys:
+- `"image"`: a square 3-channel image stored as a PyTorch tensor of type `torch.uint8`,
+- `"classes"`: a 1D `np.ndarray` with all digit labels appearing in the image,
+- `"bboxes"`: a `[num_digits, 4]` 2D `np.ndarray` with bounding boxes of every
+  digit in the image, each represented as `[TOP, LEFT, BOTTOM, RIGHT]`.
+
+Each test set image annotation consists of a sequence of space-separated
+five-tuples _label top left bottom right_, and the annotation is considered
+correct if exactly the gold digits are predicted, each with IoU at least 0.5.
+The whole test set score is then the prediction accuracy of individual images.
+You can again evaluate your predictions using the
+[svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py)
+module, either by running it with the `--evaluate=path` argument, or using its
+`evaluate_file` method.
+
+The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
+Everyone who submits a solution achieving at least _20%_ test set accuracy gets
+5 points; the remaining 5 bonus points are distributed depending on relative ordering
+of your solutions. Note that I usually need at least _35%_ development set
+accuracy to achieve the required test set performance.
+
+You should start with the
+[svhn_competition.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_competition.py)
+template, which generates the test set annotation in the required format.
+
+_A baseline solution can use a RetinaNet-like single-stage detector,
+using only a single level of convolutional features (no FPN)
+with single-scale and single-aspect anchors. Focal loss is available
+as [keras.losses.BinaryFocalCrossentropy](https://keras.io/api/losses/probabilistic_losses/#binaryfocalcrossentropy-class)
+and non-maximum suppression as
+[torchvision.ops.nms](https://pytorch.org/vision/main/generated/torchvision.ops.nms.html#nms) or
+[torchvision.ops.batched_nms](https://pytorch.org/vision/main/generated/torchvision.ops.batched_nms.html#batched-nms)._
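Review note on `tasks/svhn_competition.md`: the correctness criterion ("exactly the gold digits are predicted, each with IoU at least 0.5") can be sketched as below. This is not the code from `svhn_dataset.py`; the greedy matching strategy, the function names, and the `(label, box)` input layout are assumptions made for illustration, with boxes given as `[TOP, LEFT, BOTTOM, RIGHT]`.

```python
def iou(a, b):
    """Intersection over union of two [top, left, bottom, right] boxes."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def image_correct(gold, pred, threshold=0.5):
    """Check one image; gold and pred are lists of (label, box) pairs."""
    if len(gold) != len(pred):
        return False  # a missing or an extra digit makes the image wrong
    unmatched = list(pred)
    for gold_label, gold_box in gold:
        # Greedily take the first unused prediction with the same label
        # and a sufficient overlap.
        match = next(
            (p for p in unmatched
             if p[0] == gold_label and iou(gold_box, p[1]) >= threshold),
            None,
        )
        if match is None:
            return False
        unmatched.remove(match)
    return True
```

The test set score described in the task would then be the fraction of images for which such a check succeeds.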
diff --git a/tasks/uppercase.md b/tasks/uppercase.md
index 66288d4..5c9a9e4 100644
--- a/tasks/uppercase.md
+++ b/tasks/uppercase.md
@@ -15,8 +15,8 @@ only used to understand the approach you took, and to indicate teams).
 Explicitly, submit **exactly one .txt file** and **at least one .py/ipynb file**.
 
 The task is also a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits
-a solution which achieves at least _98.5%_ accuracy will get 4 basic points; the
-5 bonus points will be distributed depending on relative ordering of your
+a solution achieving at least _98.5%_ accuracy gets 4 basic points; the
+remaining 5 bonus points are distributed depending on relative ordering of your
 solutions. The accuracy is computed per-character and can be evaluated
 by running [uppercase_data.py](https://github.com/ufal/npfl138/tree/master/labs/03/uppercase_data.py)
 with `--evaluate` argument, or using its `evaluate_file` method.