diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..917c1db --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +**/.venv/ +logs/ +mnist.npz +*.zip diff --git a/.venv/pyvenv.cfg b/.venv/pyvenv.cfg new file mode 100644 index 0000000..e129fd0 --- /dev/null +++ b/.venv/pyvenv.cfg @@ -0,0 +1,3 @@ +home = C:\Python310 +include-system-site-packages = false +version = 3.10.7 diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000..dc3f727 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,3 @@ +{ + "python.analysis.typeCheckingMode": "basic" +} diff --git a/exam/questions.md b/exam/questions.md index 06452e5..2fcf384 100644 --- a/exam/questions.md +++ b/exam/questions.md @@ -108,6 +108,8 @@ - Compare Cutout and DropBlock. [5] +- Describe in detail how is CutMix performed. [5] + - Describe Squeeze and Excitation applied to a ResNet block. [5] - Draw the Mobile inverted bottleneck block (including explanation of separable @@ -119,3 +121,32 @@ channels. Write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5] +#### Questions@:, Lecture 6 Questions +- Describe the differences among semantic segmentation, image classification, + object detection, and instance segmentation, and write down which metrics + are used for these tasks. [5] + +- Write down how is $\mathit{AP}_{50}$ computed. [5] + +- Considering a Fast-RCNN architecture, draw overall network architecture, + explain what a RoI-pooling layer is, show how the network parametrizes + bounding boxes and write down the loss. Finally, describe non-maximum + suppression and how the Fast-RCNN prediction is performed. [10] + +- Considering a Faster-RCNN architecture, describe the region proposal network + (what are anchors, architecture including both heads, how are the coordinates + of proposals parametrized, what does the loss look like). 
[10] + +- Considering Mask-RCNN architecture, describe the additions to a Faster-RCNN + architecture (the RoI-Align layer, the new mask-producing head). [5] + +- Write down the focal loss with class weighting, including the commonly used + hyperparameter values. [5] + +- Draw the overall architecture of a RetinaNet architecture (the computation of + $C_1, \ldots, C_7$, the FPN architecture computing $P_1, \ldots, P_7$ + including the block combining feature maps of different resolutions; the + classification and bounding box generation heads, including their output + size). Write down the losses for both heads. [10] + +- Describe GroupNorm, and compare it to BatchNorm and LayerNorm. [5] diff --git a/labs/.gitignore b/labs/.gitignore index 6319f80..acfd147 100644 --- a/labs/.gitignore +++ b/labs/.gitignore @@ -3,5 +3,5 @@ logs/ *.h5 *.keras *.npz -*.pickle +*.tfrecord *.zip diff --git a/labs/01/expected.txt b/labs/01/expected.txt new file mode 100644 index 0000000..fdaf786 --- /dev/null +++ b/labs/01/expected.txt @@ -0,0 +1,39 @@ +python3 mnist_layers_activations.py --hidden_layers=0 --activation=none +Epoch 1/10 accuracy: 0.7801 - loss: 0.8405 - val_accuracy: 0.9300 - val_loss: 0.2716 +Epoch 5/10 accuracy: 0.9222 - loss: 0.2792 - val_accuracy: 0.9406 - val_loss: 0.2203 +Epoch 10/10 accuracy: 0.9304 - loss: 0.2515 - val_accuracy: 0.9432 - val_loss: 0.2159 + +python3 mnist_layers_activations.py --hidden_layers=1 --activation=none +Epoch 1/10 accuracy: 0.8483 - loss: 0.5230 - val_accuracy: 0.9352 - val_loss: 0.2422 +Epoch 5/10 accuracy: 0.9236 - loss: 0.2758 - val_accuracy: 0.9360 - val_loss: 0.2325 +Epoch 10/10 accuracy: 0.9298 - loss: 0.2517 - val_accuracy: 0.9354 - val_loss: 0.2439 + +python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu +Epoch 1/10 accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432 +Epoch 5/10 accuracy: 0.9824 - loss: 0.0613 - val_accuracy: 0.9808 - val_loss: 0.0740 +Epoch 10/10 accuracy: 0.9948 - loss: 
0.0202 - val_accuracy: 0.9788 - val_loss: 0.0821 + +python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh +Epoch 1/10 accuracy: 0.8529 - loss: 0.5183 - val_accuracy: 0.9564 - val_loss: 0.1632 +Epoch 5/10 accuracy: 0.9800 - loss: 0.0728 - val_accuracy: 0.9740 - val_loss: 0.0853 +Epoch 10/10 accuracy: 0.9948 - loss: 0.0244 - val_accuracy: 0.9782 - val_loss: 0.0772 + +python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid +Epoch 1/10 accuracy: 0.7851 - loss: 0.8650 - val_accuracy: 0.9414 - val_loss: 0.2196 +Epoch 5/10 accuracy: 0.9647 - loss: 0.1270 - val_accuracy: 0.9704 - val_loss: 0.1079 +Epoch 10/10 accuracy: 0.9852 - loss: 0.0583 - val_accuracy: 0.9756 - val_loss: 0.0837 + +python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu +Epoch 1/10 accuracy: 0.8497 - loss: 0.5011 - val_accuracy: 0.9664 - val_loss: 0.1225 +Epoch 5/10 accuracy: 0.9862 - loss: 0.0438 - val_accuracy: 0.9734 - val_loss: 0.1026 +Epoch 10/10 accuracy: 0.9932 - loss: 0.0202 - val_accuracy: 0.9818 - val_loss: 0.0865 + +python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu +Epoch 1/10 accuracy: 0.7710 - loss: 0.6793 - val_accuracy: 0.9570 - val_loss: 0.1479 +Epoch 5/10 accuracy: 0.9780 - loss: 0.0783 - val_accuracy: 0.9786 - val_loss: 0.0808 +Epoch 10/10 accuracy: 0.9869 - loss: 0.0481 - val_accuracy: 0.9724 - val_loss: 0.1163 + +python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid +Epoch 1/10 accuracy: 0.1072 - loss: 2.3068 - val_accuracy: 0.1784 - val_loss: 2.1247 +Epoch 5/10 accuracy: 0.8825 - loss: 0.4776 - val_accuracy: 0.9164 - val_loss: 0.3686 +Epoch 10/10 accuracy: 0.9294 - loss: 0.2994 - val_accuracy: 0.9386 - val_loss: 0.2671 diff --git a/labs/01/mnist.ps1 b/labs/01/mnist.ps1 new file mode 100644 index 0000000..a274269 --- /dev/null +++ b/labs/01/mnist.ps1 @@ -0,0 +1,24 @@ +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=0 --activation=none" +..\..\.venv\Scripts\python 
mnist_layers_activations.py --hidden_layers=0 --activation=none +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=none" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=none +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=relu +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=tanh +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=3 --activation=relu +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=10 --activation=relu +# Write-Output "" +# Write-Output "python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid" +..\..\.venv\Scripts\python mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid +# Write-Output "" diff --git a/labs/01/mnist_layers_activations.py b/labs/01/mnist_layers_activations.py index d58b796..bf78be2 100644 --- a/labs/01/mnist_layers_activations.py +++ b/labs/01/mnist_layers_activations.py @@ -10,6 +10,11 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. 
parser.add_argument("--activation", default="none", choices=["none", "relu", "tanh", "sigmoid"], help="Activation.") @@ -68,7 +73,7 @@ def main(args: argparse.Namespace) -> dict[str, float]: # Create the model model = keras.Sequential() model.add(keras.Input([MNIST.H, MNIST.W, MNIST.C])) - # TODO: Finish the model. Namely: + # Finish the model. Namely: # - start by adding a `keras.layers.Rescaling(1 / 255)` layer; # - then add a `keras.layers.Flatten()` layer; # - add `args.hidden_layers` number of fully connected hidden layers @@ -76,6 +81,14 @@ def main(args: argparse.Namespace) -> dict[str, float]: # from `args.activation`, allowing "none", "relu", "tanh", "sigmoid"; # - finally, add an output fully connected layer with `MNIST.LABELS` units # and `softmax` activation. + model.add(keras.layers.Rescaling(1 / 255)) + model.add(keras.layers.Flatten()) + + for _ in range(args.hidden_layers): + activation = None if args.activation == "none" else args.activation + model.add(keras.layers.Dense(args.hidden_layer, activation=activation)) + + model.add(keras.layers.Dense(MNIST.LABELS, activation="softmax")) model.compile( optimizer=keras.optimizers.Adam(), diff --git a/labs/01/numpy_entropy.py b/labs/01/numpy_entropy.py index 8e86bff..819b6b0 100644 --- a/labs/01/numpy_entropy.py +++ b/labs/01/numpy_entropy.py @@ -1,4 +1,10 @@ #!/usr/bin/env python3 + +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + import argparse import numpy as np @@ -12,42 +18,51 @@ def main(args: argparse.Namespace) -> tuple[float, float, float]: - # TODO: Load data distribution, each line containing a datapoint -- a string. - with open(args.data_path, "r") as data: + # Load data distribution, each line containing a datapoint -- a string. + data_map = {} + + # Load data distribution, each line containing a datapoint -- a string. 
+ with open(args.data_path, "r", encoding="utf-8") as data: for line in data: line = line.rstrip("\n") - # TODO: Process the line, aggregating data with built-in Python + + # Process the line, aggregating data with built-in Python # data structures (not NumPy, which is not suitable for incremental # addition and string mapping). + if line in data_map: + data_map[line] += 1 + else: + data_map[line] = 1 - # TODO: Create a NumPy array containing the data distribution. The + # Create a NumPy array containing the data distribution. The # NumPy array should contain only data, not any mapping. Alternatively, # the NumPy array might be created after loading the model distribution. + data_dist = np.array(list(data_map.values())) / sum(data_map.values()) + + # Load model distribution, each line `string \t probability`. + model_map = {} - # TODO: Load model distribution, each line `string \t probability`. with open(args.model_path, "r") as model: for line in model: line = line.rstrip("\n") - # TODO: Process the line, aggregating using Python data structures. + key, value = line.split("\t") + model_map[key] = float(value) - # TODO: Create a NumPy array containing the model distribution. + # Create a NumPy array containing the model distribution. + model_dist = np.array([model_map[key] if key in model_map else np.inf for key in data_map.keys()]) - # TODO: Compute the entropy H(data distribution). You should not use - # manual for/while cycles, but instead use the fact that most NumPy methods - # operate on all elements (for example `*` is vector element-wise multiplication). - entropy = ... + # Compute the entropy H(data distribution). + entropy = -np.sum(data_dist * np.log(data_dist)) - # TODO: Compute cross-entropy H(data distribution, model distribution). - # When some data distribution elements are missing in the model distribution, - # return `np.inf`. - crossentropy = ... + # Compute cross-entropy H(data distribution, model distribution). 
+ crossentropy = -np.sum(data_dist * np.log(model_dist)) - # TODO: Compute KL-divergence D_KL(data distribution, model_distribution), - # again using `np.inf` when needed. - kl_divergence = ... + # Compute KL-divergence D_KL(data distribution, model_distribution). + kl_divergence = crossentropy - entropy + # kl_divergence = np.where(np.isinf(kl_divergence), np.inf, kl_divergence) # Return the computed values for ReCodEx to validate. - return entropy, crossentropy, kl_divergence + return entropy, crossentropy if np.isfinite(crossentropy) else np.inf, kl_divergence if np.isfinite(kl_divergence) else np.inf if __name__ == "__main__": diff --git a/labs/01/output.txt b/labs/01/output.txt new file mode 100644 index 0000000..916c534 --- /dev/null +++ b/labs/01/output.txt @@ -0,0 +1,167 @@ +Epoch 1/10 +1100/1100 14s 12ms/step - accuracy: 0.7761 - loss: 0.8442 - val_accuracy: 0.9298 - val_loss: 0.2730 +Epoch 2/10 +1100/1100 12s 11ms/step - accuracy: 0.9057 - loss: 0.3428 - val_accuracy: 0.9336 - val_loss: 0.2418 +Epoch 3/10 +1100/1100 11s 10ms/step - accuracy: 0.9177 - loss: 0.2945 - val_accuracy: 0.9366 - val_loss: 0.2284 +Epoch 4/10 +1100/1100 12s 10ms/step - accuracy: 0.9193 - loss: 0.2839 - val_accuracy: 0.9384 - val_loss: 0.2267 +Epoch 5/10 +1100/1100 11s 10ms/step - accuracy: 0.9228 - loss: 0.2790 - val_accuracy: 0.9392 - val_loss: 0.2208 +Epoch 6/10 +1100/1100 12s 11ms/step - accuracy: 0.9244 - loss: 0.2713 - val_accuracy: 0.9440 - val_loss: 0.2162 +Epoch 7/10 +1100/1100 13s 12ms/step - accuracy: 0.9252 - loss: 0.2662 - val_accuracy: 0.9398 - val_loss: 0.2178 +Epoch 8/10 +1100/1100 14s 12ms/step - accuracy: 0.9269 - loss: 0.2626 - val_accuracy: 0.9398 - val_loss: 0.2169 +Epoch 9/10 +1100/1100 13s 12ms/step - accuracy: 0.9286 - loss: 0.2612 - val_accuracy: 0.9458 - val_loss: 0.2128 +Epoch 10/10 +1100/1100 13s 12ms/step - accuracy: 0.9307 - loss: 0.2515 - val_accuracy: 0.9438 - val_loss: 0.2161 + +Epoch 1/10 +1100/1100 15s 13ms/step - accuracy: 0.8422 - loss: 0.5383 - 
val_accuracy: 0.9346 - val_loss: 0.2400 +Epoch 2/10 +1100/1100 18s 17ms/step - accuracy: 0.9120 - loss: 0.3102 - val_accuracy: 0.9364 - val_loss: 0.2372 +Epoch 3/10 +1100/1100 16s 15ms/step - accuracy: 0.9233 - loss: 0.2774 - val_accuracy: 0.9352 - val_loss: 0.2342 +Epoch 4/10 +1100/1100 16s 14ms/step - accuracy: 0.9225 - loss: 0.2736 - val_accuracy: 0.9366 - val_loss: 0.2336 +Epoch 5/10 +1100/1100 15s 13ms/step - accuracy: 0.9233 - loss: 0.2760 - val_accuracy: 0.9344 - val_loss: 0.2331 +Epoch 6/10 +1100/1100 22s 20ms/step - accuracy: 0.9251 - loss: 0.2683 - val_accuracy: 0.9382 - val_loss: 0.2247 +Epoch 7/10 +1100/1100 15s 14ms/step - accuracy: 0.9261 - loss: 0.2658 - val_accuracy: 0.9356 - val_loss: 0.2367 +Epoch 8/10 +1100/1100 15s 14ms/step - accuracy: 0.9256 - loss: 0.2635 - val_accuracy: 0.9364 - val_loss: 0.2308 +Epoch 9/10 +1100/1100 15s 13ms/step - accuracy: 0.9253 - loss: 0.2625 - val_accuracy: 0.9386 - val_loss: 0.2277 +Epoch 10/10 +1100/1100 15s 13ms/step - accuracy: 0.9301 - loss: 0.2515 - val_accuracy: 0.9358 - val_loss: 0.2441 + +Epoch 1/10 +1100/1100 16s 13ms/step - accuracy: 0.8499 - loss: 0.5317 - val_accuracy: 0.9618 - val_loss: 0.1400 +Epoch 2/10 +1100/1100 15s 13ms/step - accuracy: 0.9517 - loss: 0.1637 - val_accuracy: 0.9682 - val_loss: 0.1153 +Epoch 3/10 +1100/1100 14s 13ms/step - accuracy: 0.9700 - loss: 0.1021 - val_accuracy: 0.9730 - val_loss: 0.0897 +Epoch 4/10 +1100/1100 13s 12ms/step - accuracy: 0.9774 - loss: 0.0757 - val_accuracy: 0.9754 - val_loss: 0.0835 +Epoch 5/10 +1100/1100 13s 12ms/step - accuracy: 0.9824 - loss: 0.0603 - val_accuracy: 0.9772 - val_loss: 0.0766 +Epoch 6/10 +1100/1100 14s 12ms/step - accuracy: 0.9855 - loss: 0.0486 - val_accuracy: 0.9762 - val_loss: 0.0850 +Epoch 7/10 +1100/1100 14s 13ms/step - accuracy: 0.9889 - loss: 0.0374 - val_accuracy: 0.9776 - val_loss: 0.0774 +Epoch 8/10 +1100/1100 13s 12ms/step - accuracy: 0.9901 - loss: 0.0318 - val_accuracy: 0.9786 - val_loss: 0.0765 +Epoch 9/10 +1100/1100 13s 
12ms/step - accuracy: 0.9928 - loss: 0.0267 - val_accuracy: 0.9804 - val_loss: 0.0766 +Epoch 10/10 +1100/1100 14s 12ms/step - accuracy: 0.9944 - loss: 0.0208 - val_accuracy: 0.9792 - val_loss: 0.0801 + +Epoch 1/10 +1100/1100 14s 12ms/step - accuracy: 0.8468 - loss: 0.5308 - val_accuracy: 0.9594 - val_loss: 0.1591 +Epoch 2/10 +1100/1100 13s 12ms/step - accuracy: 0.9433 - loss: 0.1909 - val_accuracy: 0.9646 - val_loss: 0.1300 +Epoch 3/10 +1100/1100 13s 12ms/step - accuracy: 0.9658 - loss: 0.1235 - val_accuracy: 0.9726 - val_loss: 0.0973 +Epoch 4/10 +1100/1100 13s 12ms/step - accuracy: 0.9744 - loss: 0.0909 - val_accuracy: 0.9732 - val_loss: 0.0876 +Epoch 5/10 +1100/1100 13s 12ms/step - accuracy: 0.9798 - loss: 0.0747 - val_accuracy: 0.9788 - val_loss: 0.0770 +Epoch 6/10 +1100/1100 13s 12ms/step - accuracy: 0.9832 - loss: 0.0606 - val_accuracy: 0.9766 - val_loss: 0.0801 +Epoch 7/10 +1100/1100 13s 12ms/step - accuracy: 0.9881 - loss: 0.0460 - val_accuracy: 0.9792 - val_loss: 0.0714 +Epoch 8/10 +1100/1100 13s 12ms/step - accuracy: 0.9894 - loss: 0.0397 - val_accuracy: 0.9768 - val_loss: 0.0741 +Epoch 9/10 +1100/1100 13s 12ms/step - accuracy: 0.9923 - loss: 0.0312 - val_accuracy: 0.9796 - val_loss: 0.0709 +Epoch 10/10 +1100/1100 14s 12ms/step - accuracy: 0.9940 - loss: 0.0257 - val_accuracy: 0.9802 - val_loss: 0.0720 + +Epoch 1/10 +1100/1100 15s 13ms/step - accuracy: 0.8072 - loss: 0.8138 - val_accuracy: 0.9452 - val_loss: 0.2121 +Epoch 2/10 +1100/1100 15s 14ms/step - accuracy: 0.9241 - loss: 0.2602 - val_accuracy: 0.9570 - val_loss: 0.1663 +Epoch 3/10 +1100/1100 15s 14ms/step - accuracy: 0.9476 - loss: 0.1863 - val_accuracy: 0.9648 - val_loss: 0.1322 +Epoch 4/10 +1100/1100 14s 13ms/step - accuracy: 0.9583 - loss: 0.1490 - val_accuracy: 0.9670 - val_loss: 0.1168 +Epoch 5/10 +1100/1100 14s 13ms/step - accuracy: 0.9658 - loss: 0.1243 - val_accuracy: 0.9696 - val_loss: 0.1047 +Epoch 6/10 +1100/1100 14s 12ms/step - accuracy: 0.9706 - loss: 0.1065 - val_accuracy: 0.9718 - 
val_loss: 0.0975 +Epoch 7/10 +1100/1100 13s 12ms/step - accuracy: 0.9758 - loss: 0.0891 - val_accuracy: 0.9740 - val_loss: 0.0918 +Epoch 8/10 +1100/1100 13s 12ms/step - accuracy: 0.9779 - loss: 0.0792 - val_accuracy: 0.9758 - val_loss: 0.0885 +Epoch 9/10 +1100/1100 14s 13ms/step - accuracy: 0.9816 - loss: 0.0681 - val_accuracy: 0.9776 - val_loss: 0.0825 +Epoch 10/10 +1100/1100 14s 12ms/step - accuracy: 0.9852 - loss: 0.0583 - val_accuracy: 0.9766 - val_loss: 0.0831 + +Epoch 1/10 +1100/1100 16s 14ms/step - accuracy: 0.8483 - loss: 0.5002 - val_accuracy: 0.9650 - val_loss: 0.1189 +Epoch 2/10 +1100/1100 16s 14ms/step - accuracy: 0.9609 - loss: 0.1262 - val_accuracy: 0.9718 - val_loss: 0.0971 +Epoch 3/10 +1100/1100 16s 14ms/step - accuracy: 0.9759 - loss: 0.0783 - val_accuracy: 0.9772 - val_loss: 0.0690 +Epoch 4/10 +1100/1100 16s 14ms/step - accuracy: 0.9810 - loss: 0.0597 - val_accuracy: 0.9788 - val_loss: 0.0752 +Epoch 5/10 +1100/1100 15s 14ms/step - accuracy: 0.9855 - loss: 0.0468 - val_accuracy: 0.9748 - val_loss: 0.0817 +Epoch 6/10 +1100/1100 16s 14ms/step - accuracy: 0.9884 - loss: 0.0398 - val_accuracy: 0.9758 - val_loss: 0.0909 +Epoch 7/10 +1100/1100 15s 14ms/step - accuracy: 0.9898 - loss: 0.0318 - val_accuracy: 0.9724 - val_loss: 0.0998 +Epoch 8/10 +1100/1100 16s 14ms/step - accuracy: 0.9892 - loss: 0.0305 - val_accuracy: 0.9778 - val_loss: 0.0952 +Epoch 9/10 +1100/1100 16s 14ms/step - accuracy: 0.9914 - loss: 0.0267 - val_accuracy: 0.9756 - val_loss: 0.0878 +Epoch 10/10 +1100/1100 16s 15ms/step - accuracy: 0.9935 - loss: 0.0203 - val_accuracy: 0.9770 - val_loss: 0.0974 + +Epoch 1/10 +1100/1100 24s 21ms/step - accuracy: 0.7772 - loss: 0.6657 - val_accuracy: 0.9524 - val_loss: 0.1752 +Epoch 2/10 +1100/1100 24s 22ms/step - accuracy: 0.9525 - loss: 0.1705 - val_accuracy: 0.9682 - val_loss: 0.1261 +Epoch 3/10 +1100/1100 22s 20ms/step - accuracy: 0.9675 - loss: 0.1162 - val_accuracy: 0.9750 - val_loss: 0.0945 +Epoch 4/10 +1100/1100 22s 20ms/step - accuracy: 0.9735 
- loss: 0.0929 - val_accuracy: 0.9720 - val_loss: 0.1018 +Epoch 5/10 +1100/1100 22s 20ms/step - accuracy: 0.9789 - loss: 0.0794 - val_accuracy: 0.9762 - val_loss: 0.0888 +Epoch 6/10 +1100/1100 22s 20ms/step - accuracy: 0.9806 - loss: 0.0729 - val_accuracy: 0.9760 - val_loss: 0.0961 +Epoch 7/10 +1100/1100 22s 20ms/step - accuracy: 0.9847 - loss: 0.0578 - val_accuracy: 0.9810 - val_loss: 0.0932 +Epoch 8/10 +1100/1100 22s 20ms/step - accuracy: 0.9824 - loss: 0.0643 - val_accuracy: 0.9786 - val_loss: 0.0854 +Epoch 9/10 +1100/1100 22s 20ms/step - accuracy: 0.9864 - loss: 0.0487 - val_accuracy: 0.9764 - val_loss: 0.1054 +Epoch 10/10 +1100/1100 22s 20ms/step - accuracy: 0.9864 - loss: 0.0493 - val_accuracy: 0.9780 - val_loss: 0.1108 + +Epoch 1/10 +1100/1100 23s 20ms/step - accuracy: 0.1052 - loss: 2.3130 - val_accuracy: 0.1808 - val_loss: 1.9383 +Epoch 2/10 +1100/1100 22s 20ms/step - accuracy: 0.2002 - loss: 1.9364 - val_accuracy: 0.2168 - val_loss: 1.8587 +Epoch 3/10 +1100/1100 23s 20ms/step - accuracy: 0.2161 - loss: 1.8392 - val_accuracy: 0.5588 - val_loss: 1.2106 +Epoch 4/10 +1100/1100 22s 20ms/step - accuracy: 0.5594 - loss: 1.1159 - val_accuracy: 0.8168 - val_loss: 0.7119 +Epoch 5/10 +1100/1100 22s 20ms/step - accuracy: 0.8359 - loss: 0.6312 - val_accuracy: 0.8994 - val_loss: 0.4360 +Epoch 6/10 +1100/1100 22s 20ms/step - accuracy: 0.8827 - loss: 0.4854 - val_accuracy: 0.9066 - val_loss: 0.4053 +Epoch 7/10 +1100/1100 22s 20ms/step - accuracy: 0.9007 - loss: 0.4218 - val_accuracy: 0.9166 - val_loss: 0.3660 +Epoch 8/10 +1100/1100 22s 20ms/step - accuracy: 0.9075 - loss: 0.3940 - val_accuracy: 0.9204 - val_loss: 0.3552 +Epoch 9/10 +1100/1100 22s 20ms/step - accuracy: 0.9090 - loss: 0.3922 - val_accuracy: 0.9242 - val_loss: 0.3356 +Epoch 10/10 +1100/1100 24s 22ms/step - accuracy: 0.9191 - loss: 0.3534 - val_accuracy: 0.9270 - val_loss: 0.3286 diff --git a/labs/01/pca_first.keras.py b/labs/01/pca_first.keras.py index 1f99e21..0632b22 100644 --- 
a/labs/01/pca_first.keras.py +++ b/labs/01/pca_first.keras.py @@ -9,6 +9,11 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. parser.add_argument("--examples", default=256, type=int, help="MNIST examples to use.") @@ -32,39 +37,43 @@ def main(args: argparse.Namespace) -> tuple[float, float]: data_indices = np.random.choice(mnist.train.size, size=args.examples, replace=False) data = keras.ops.convert_to_tensor(mnist.train.data["images"][data_indices] / 255, dtype="float32") - # TODO: Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C]. + # Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C]. # We want to reshape it to [args.examples, MNIST.H * MNIST.W * MNIST.C]. # We can do so using `keras.ops.reshape(data, new_shape)` with new shape # `[data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]]`. - data = ... + data = keras.ops.reshape(data, [data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]]) - # TODO: Now compute mean of every feature. Use `keras.ops.mean`, and set + # Now compute mean of every feature. Use `keras.ops.mean`, and set # `axis` to zero -- therefore, the mean will be computed across the first # dimension, so across examples. - mean = ... + mean = keras.ops.mean(data, axis=0) - # TODO: Compute the covariance matrix. The covariance matrix is + # Compute the covariance matrix. The covariance matrix is # (data - mean)^T * (data - mean) / data.shape[0] # where transpose can be computed using `keras.ops.transpose` and # matrix multiplication using either Python operator @ or `keras.ops.matmul`. - cov = ... 
+ cov = keras.ops.transpose(data-mean) @ (data-mean) / data.shape[0] - # TODO: Compute the total variance, which is the sum of the diagonal + # Compute the total variance, which is the sum of the diagonal # of the covariance matrix. To extract the diagonal use `keras.ops.diagonal`, # and to sum a tensor use `keras.ops.sum`. - total_variance = ... + total_variance = keras.ops.sum(keras.ops.diagonal(cov)) - # TODO: Now run `args.iterations` of the power iteration algorithm. + # Now run `args.iterations` of the power iteration algorithm. # Start with a vector of `cov.shape[0]` ones of type `"float32"` using `keras.ops.ones`. - v = ... + v = keras.ops.ones(cov.shape[0], dtype="float32") for i in range(args.iterations): - # TODO: In the power iteration algorithm, we compute + # In the power iteration algorithm, we compute # 1. v = cov v # The matrix-vector multiplication can be computed as regular matrix multiplication. + v = keras.ops.matmul(cov, v) + # 2. s = l2_norm(v) # The l2_norm can be computed using for example `keras.ops.norm`. + s = keras.ops.norm(v, 2) + # 3. v = v / s - pass + v = v / s # The `v` is now approximately the eigenvector of the largest eigenvalue, `s`. # We now compute the explained variance, which is the ratio of `s` and `total_variance`. diff --git a/labs/01/pca_first.py b/labs/01/pca_first.py index 2e4ef10..deecf06 100644 --- a/labs/01/pca_first.py +++ b/labs/01/pca_first.py @@ -7,6 +7,11 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. 
parser.add_argument("--examples", default=256, type=int, help="MNIST examples to use.") @@ -30,43 +35,46 @@ def main(args: argparse.Namespace) -> tuple[float, float]: data_indices = np.random.choice(mnist.train.size, size=args.examples, replace=False) data = torch.tensor(mnist.train.data["images"][data_indices] / 255, dtype=torch.float32) - # TODO: Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C]. + # Data has shape [args.examples, MNIST.H, MNIST.W, MNIST.C]. # We want to reshape it to [args.examples, MNIST.H * MNIST.W * MNIST.C]. # We can do so using `torch.reshape(data, new_shape)` with new shape # `[data.shape[0], data.shape[1] * data.shape[2] * data.shape[3]]`. - data = ... + data = torch.reshape(data, (data.shape[0], data.shape[1] * data.shape[2] * data.shape[3])) - # TODO: Now compute mean of every feature. Use `torch.mean`, and set + # Now compute mean of every feature. Use `torch.mean`, and set # `dim` (or `axis`) argument to zero -- therefore, the mean will be # computed across the first dimension, so across examples. # # Note that for compatibility with Numpy/TF/Keras, all `dim` arguments # in PyTorch can be also called `axis`. - mean = ... + mean = torch.mean(data, axis=0) - # TODO: Compute the covariance matrix. The covariance matrix is + # Compute the covariance matrix. The covariance matrix is # (data - mean)^T * (data - mean) / data.shape[0] # where transpose can be computed using `torch.transpose` or `torch.t` and # matrix multiplication using either Python operator @ or `torch.matmul`. - cov = ... + cov = torch.matmul(torch.t(data-mean), data-mean)/data.shape[0] # TODO: Compute the total variance, which is the sum of the diagonal # of the covariance matrix. To extract the diagonal use `torch.diagonal`, # and to sum a tensor use `torch.sum`. - total_variance = ... + total_variance = torch.sum(torch.diagonal(cov)).item() # TODO: Now run `args.iterations` of the power iteration algorithm. 
# Start with a vector of `cov.shape[0]` ones of type `torch.float32` using `torch.ones`. - v = ... + v = torch.ones(cov.shape[0], dtype=torch.float32) + for i in range(args.iterations): - # TODO: In the power iteration algorithm, we compute - # 1. v = cov v - # The matrix-vector multiplication can be computed as regular matrix multiplication - # or using `torch.mv`. - # 2. s = l2_norm(v) - # The l2_norm can be computed using for example `torch.linalg.vector_norm`. - # 3. v = v / s - pass + # TODO: In the power iteration algorithm, we compute + # 1. v = cov v + # The matrix-vector multiplication can be computed as regular matrix multiplication + # or using `torch.mv`. + # 2. s = l2_norm(v) + # The l2_norm can be computed using for example `torch.linalg.vector_norm`. + # 3. v = v / s + v = cov @ v + s = torch.linalg.vector_norm(v) + v = v/s # The `v` is now approximately the eigenvector of the largest eigenvalue, `s`. # We now compute the explained variance, which is the ratio of `s` and `total_variance`. 
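The power-iteration steps filled in by the two `pca_first` hunks above can be checked outside Keras and PyTorch. The following is a minimal NumPy sketch under stated assumptions: it uses synthetic Gaussian data instead of MNIST, and `first_component_variance` is a hypothetical helper name, not part of the lab templates. It compares the estimated largest eigenvalue with the one returned by `np.linalg.eigvalsh`.

```python
import numpy as np


def first_component_variance(data, iterations=50):
    """Return (total_variance, explained_ratio) for the first principal component.

    Hypothetical helper mirroring the steps in the diff: centre the data,
    build the covariance matrix, then run power iteration from a vector of ones.
    """
    centered = data - data.mean(axis=0)
    # Covariance matrix: (data - mean)^T (data - mean) / number of examples.
    cov = centered.T @ centered / data.shape[0]
    # Total variance is the sum of the diagonal of the covariance matrix.
    total_variance = np.trace(cov)

    v = np.ones(cov.shape[0])
    s = 0.0
    for _ in range(iterations):
        v = cov @ v             # 1. v = cov v
        s = np.linalg.norm(v)   # 2. s = l2_norm(v)
        v = v / s               # 3. v = v / s
    # After convergence, s approximates the largest eigenvalue of cov.
    return total_variance, s / total_variance


# Synthetic data (an assumption of this sketch): one feature dominates the variance.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])

total, ratio = first_component_variance(data)
# Reference value: largest eigenvalue of the same (biased) covariance matrix.
largest = np.linalg.eigvalsh(np.cov(data, rowvar=False, bias=True))[-1]
```

Because `v` has unit norm after each step, the norm of `cov @ v` converges to the largest eigenvalue, so `ratio * total` should agree with `largest` up to floating-point error.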
diff --git a/labs/01/run.ps1 b/labs/01/run.ps1 new file mode 100644 index 0000000..a68f5e8 --- /dev/null +++ b/labs/01/run.ps1 @@ -0,0 +1 @@ +..\..\.venv\Scripts\python .\pca_first.keras.py diff --git a/labs/01/test.ps1 b/labs/01/test.ps1 new file mode 100644 index 0000000..75ddf37 --- /dev/null +++ b/labs/01/test.ps1 @@ -0,0 +1,4 @@ +python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt +python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt +python3 numpy_entropy.py --data_path numpy_entropy_data_3.txt --model_path numpy_entropy_model_3.txt +python3 numpy_entropy.py --data_path numpy_entropy_data_4.txt --model_path numpy_entropy_model_4.txt diff --git a/labs/02/gym_cartpole.py b/labs/02/gym_cartpole.py index 7befc72..b708b63 100644 --- a/labs/02/gym_cartpole.py +++ b/labs/02/gym_cartpole.py @@ -8,6 +8,12 @@ import keras import numpy as np import torch +from collections import Counter + +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. @@ -17,8 +23,8 @@ parser.add_argument("--seed", default=42, type=int, help="Random seed.") parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") # If you add more arguments, ReCodEx will keep them with your default values. 
-parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") -parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--batch_size", default=10, type=int, help="Batch size.") +parser.add_argument("--epochs", default=100, type=int, help="Number of epochs.") parser.add_argument("--model", default="gym_cartpole_model.keras", type=str, help="Output model path.") @@ -49,7 +55,7 @@ def on_epoch_end(self, epoch, logs=None): def evaluate_model( model: keras.Model, seed: int = 42, episodes: int = 100, render: bool = False, report_per_episode: bool = False -) -> float: + ) -> float: """Evaluate the given model on CartPole-v1 environment. Returns the average score achieved on the given number of episodes. @@ -86,16 +92,10 @@ def evaluate_model( def main(args: argparse.Namespace) -> keras.Model | None: # Set the random seed and the number of threads. keras.utils.set_random_seed(args.seed) - if args.threads: - torch.set_num_threads(args.threads) - torch.set_num_interop_threads(args.threads) + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) if not args.evaluate: - if args.batch_size is ...: - raise ValueError("You must specify the batch size, either in the defaults or on the command line.") - if args.epochs is ...: - raise ValueError("You must specify the number of epochs, either in the defaults or on the command line.") - # Create logdir name args.logdir = os.path.join("logs", "{}-{}-{}".format( os.path.basename(globals().get("__file__", "notebook")), @@ -107,16 +107,37 @@ def main(args: argparse.Namespace) -> keras.Model | None: data = np.loadtxt("gym_cartpole_data.txt") observations, labels = data[:, :-1], data[:, -1].astype(np.int32) + + # TODO: Create the model in the `model` variable. Note that # the model can perform any of: # - binary classification with 1 output and sigmoid activation; # - two-class classification with 2 outputs and softmax activation. - model = ... 
+ + # Convert the labels to one-hot encoding + labels = keras.ops.one_hot(labels, num_classes=2) + + model = keras.Sequential(name="gym_model", layers=[ + # Input layer + keras.layers.Input(shape=(observations.shape[1],)), + # Hidden layers + keras.layers.Dense(8, activation="tanh"), + # Output layer + keras.layers.Dense(2, activation="softmax"), # 2 outputs because we have 2 actions in the cart pole problem + ]) + + + model.summary() # TODO: Prepare the model for training using the `model.compile` method. - model.compile(...) + model.compile( + loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.1), + optimizer=keras.optimizers.Adam(learning_rate=0.009), + metrics=["accuracy"], + ) tb_callback = TorchTensorBoardCallback(args.logdir) + # The labels are already one-hot encoded above; do not encode them a second time. model.fit(observations, labels, batch_size=args.batch_size, epochs=args.epochs, callbacks=[tb_callback]) # Save the model, without the optimizer state. diff --git a/labs/02/mnist_training.py b/labs/02/mnist_training.py index 6655133..116ae98 100644 --- a/labs/02/mnist_training.py +++ b/labs/02/mnist_training.py @@ -11,6 +11,11 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") @@ -107,8 +112,34 @@ def main(args: argparse.Namespace) -> dict[str, float]: # in `model.optimizer._learning_rate` if needed), so after training, the learning rate # should be `args.learning_rate_final`. 
+ optimizer = None + lr, momen, decay, final_lr, epochs = args.learning_rate, args.momentum, args.decay, args.learning_rate_final, args.epochs + if decay: + if not final_lr: + print("Please define a final learning rate!") + else: + steps = mnist.train.size/args.batch_size*epochs + init_lr = args.learning_rate + if decay == "linear": + lr = keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=init_lr, decay_steps=steps, end_learning_rate=final_lr) + elif decay == "exponential": + decay_rate = final_lr/init_lr + lr = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=init_lr, decay_steps=steps, decay_rate=decay_rate) + elif decay == "cosine": + alpha = final_lr/init_lr + lr = keras.optimizers.schedules.CosineDecay(initial_learning_rate=init_lr, decay_steps=steps, alpha=alpha) + + if args.optimizer == 'SGD': + if momen: + optimizer = keras.optimizers.SGD(learning_rate=lr, momentum=momen, nesterov=True) + else: + optimizer = keras.optimizers.SGD(learning_rate=lr) + elif args.optimizer =="Adam": + optimizer = keras.optimizers.Adam(learning_rate=lr) + + model.compile( - optimizer=..., + optimizer=optimizer, loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[keras.metrics.SparseCategoricalAccuracy("accuracy")], ) @@ -121,6 +152,10 @@ def main(args: argparse.Namespace) -> dict[str, float]: validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]), callbacks=[tb_callback], ) + model.summary() + + if decay: + print("Next learning rate to be used:", model.optimizer.learning_rate.item()) # Return development metrics for ReCodEx to validate. 
return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")} diff --git a/labs/02/sgd_backpropagation.ps1 b/labs/02/sgd_backpropagation.ps1 new file mode 100644 index 0000000..f613710 --- /dev/null +++ b/labs/02/sgd_backpropagation.ps1 @@ -0,0 +1,50 @@ +# Examples: +# ../../.venv/Scripts/python sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1 +# Dev accuracy after epoch 1 is 93.30 +# Dev accuracy after epoch 2 is 94.38 +# Dev accuracy after epoch 3 is 95.16 +# Dev accuracy after epoch 4 is 95.50 +# Dev accuracy after epoch 5 is 95.96 +# Dev accuracy after epoch 6 is 96.04 +# Dev accuracy after epoch 7 is 95.82 +# Dev accuracy after epoch 8 is 95.92 +# Dev accuracy after epoch 9 is 95.96 +# Dev accuracy after epoch 10 is 96.16 +# Test accuracy after epoch 10 is 95.26 + +# ../../.venv/Scripts/python sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2 +# Dev accuracy after epoch 1 is 93.64 +# Dev accuracy after epoch 2 is 94.80 +# Dev accuracy after epoch 3 is 95.56 +# Dev accuracy after epoch 4 is 95.98 +# Dev accuracy after epoch 5 is 96.24 +# Dev accuracy after epoch 6 is 96.74 +# Dev accuracy after epoch 7 is 96.52 +# Dev accuracy after epoch 8 is 96.54 +# Dev accuracy after epoch 9 is 97.04 +# Dev accuracy after epoch 10 is 97.02 +# Test accuracy after epoch 10 is 96.16 + +# Tests: +../../.venv/Scripts/python sgd_backpropagation.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1 +# Expected +# Dev accuracy after epoch 1 is 93.30 +# Dev accuracy after epoch 2 is 94.38 +# Test accuracy after epoch 2 is 93.15 + +# Actual +# Dev accuracy after epoch 1 is 92.98 +# Dev accuracy after epoch 2 is 93.98 +# Test accuracy after epoch 2 is 92.73 + + +../../.venv/Scripts/python sgd_backpropagation.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2 +# Expected: +# Dev accuracy after epoch 1 is 93.64 +# Dev accuracy after epoch 2 is 94.80 +# Test 
accuracy after epoch 2 is 93.54 + +# Actual: +# Dev accuracy after epoch 1 is 94.16 +# Dev accuracy after epoch 2 is 94.98 +# Test accuracy after epoch 2 is 93.56 diff --git a/labs/02/sgd_backpropagation.py b/labs/02/sgd_backpropagation.py index cff312a..e3cfacf 100644 --- a/labs/02/sgd_backpropagation.py +++ b/labs/02/sgd_backpropagation.py @@ -3,7 +3,10 @@ import datetime import os import re -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras import numpy as np @@ -12,15 +15,26 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.") -parser.add_argument("--hidden_layer", default=100, type=int, help="Size of the hidden layer.") +parser.add_argument( + "--hidden_layer", default=100, type=int, help="Size of the hidden layer." +) parser.add_argument("--learning_rate", default=0.1, type=float, help="Learning rate.") -parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.") +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) # If you add more arguments, ReCodEx will keep them with your default values. 
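Reviewer note: the hunks that follow fill in the forward pass, the crossentropy loss, and the SGD update of `sgd_backpropagation.py`. As a rough illustration only (pure Python on a single example, not the Keras code from this patch), the sketch below shows the same three pieces. The helper names `softmax`, `cross_entropy`, `logit_gradient`, and `sgd_step` are hypothetical and chosen for this note; the gradient with respect to the logits is the standard `probabilities - one_hot(gold)` identity for softmax followed by crossentropy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, gold):
    """Categorical crossentropy of one example with gold class index `gold`."""
    return -math.log(probs[gold])


def logit_gradient(probs, gold):
    """Gradient of the crossentropy w.r.t. the logits: probs - one_hot(gold)."""
    return [p - (1.0 if i == gold else 0.0) for i, p in enumerate(probs)]


def sgd_step(params, grads, learning_rate):
    """Plain SGD update, the scalar analogue of `variable.assign_sub(lr * grad)`."""
    return [w - learning_rate * g for w, g in zip(params, grads)]


# Hypothetical logits for a three-class example whose gold class is 0.
logits = [2.0, 0.5, -1.0]
gold = 0
loss_before = cross_entropy(softmax(logits), gold)
logits = sgd_step(logits, logit_gradient(softmax(logits), gold), learning_rate=0.1)
loss_after = cross_entropy(softmax(logits), gold)
print(loss_before, loss_after)
```

In the patch itself the gradient is applied to the weights and biases rather than to the logits directly; the sketch updates the logits only to keep the example small.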
@@ -30,29 +44,57 @@ def __init__(self, args: argparse.Namespace) -> None: self._args = args self._W1 = keras.Variable( - keras.random.normal([MNIST.W * MNIST.H * MNIST.C, args.hidden_layer], stddev=0.1, seed=args.seed), + keras.random.normal( + [MNIST.W * MNIST.H * MNIST.C, args.hidden_layer], + stddev=0.1, + seed=args.seed, + ), trainable=True, ) self._b1 = keras.Variable(keras.ops.zeros([args.hidden_layer]), trainable=True) - # TODO: Create variables: + # Create variables: # - _W2, which is a trainable variable of size `[args.hidden_layer, MNIST.LABELS]`, # initialized to `keras.random.normal` value `with stddev=0.1` and `seed=args.seed`, # - _b2, which is a trainable variable of size `[MNIST.LABELS]` initialized to zeros - ... + self._W2 = keras.Variable( + keras.random.normal( + [args.hidden_layer, MNIST.LABELS], stddev=0.1, seed=args.seed + ), + trainable=True, + ) + + self._b2 = keras.Variable(keras.ops.zeros([MNIST.LABELS]), trainable=True) def predict(self, inputs: torch.Tensor) -> torch.Tensor: - # TODO: Define the computation of the network. Notably: + # Define the computation of the network. Notably: # - start by casting the input byte image to `float32` with `keras.ops.cast` + + cast_inputs = keras.ops.cast(inputs, dtype="float32") + # - then divide the tensor by 255 to normalize it to the `[0, 1]` range + + normalized_inputs = cast_inputs / 255 + # - then reshape it to the shape `[inputs.shape[0], -1]`. # The -1 is a wildcard which is computed so that the number # of elements before and after the reshape is preserved. 
+ + reshaped_inputs = keras.ops.reshape(normalized_inputs, [inputs.shape[0], -1]) + # - then multiply it by `self._W1` and then add `self._b1` # - apply `keras.ops.tanh` + + hidden_layer_output = keras.ops.tanh( + keras.ops.matmul(reshaped_inputs, self._W1) + self._b1 + ) + # - multiply the result by `self._W2` and then add `self._b2` + + hidden_layer_output = keras.ops.matmul(hidden_layer_output, self._W2) + self._b2 + # - finally apply `keras.ops.softmax` and return the result - return ... + return keras.ops.softmax(hidden_layer_output) def train_epoch(self, dataset: MNIST.Dataset) -> None: for batch in dataset.batches(self._args.batch_size): @@ -62,49 +104,54 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None: # Size of the batch is `self._args.batch_size`, except for the last, which # might be smaller. - # TODO: Compute the predicted probabilities of the batch images using `self.predict` - probabilities = ... + # Compute the predicted probabilities of the batch images using `self.predict` + probabilities = self.predict(batch["images"]) - # TODO: Manually compute the loss: + # Manually compute the loss: # - For every batch example, the loss is the categorical crossentropy of the # predicted probabilities and the gold label. To compute the crossentropy, you can # - either use `keras.ops.one_hot` to obtain one-hot encoded gold labels, # - or suitably use `keras.ops.take_along_axis` to "index" the predicted probabilities. # - Finally, compute the average across the batch examples. - loss = ... - + loss = keras.ops.mean( + keras.ops.categorical_crossentropy( + keras.ops.one_hot(batch["labels"], MNIST.LABELS), probabilities + ) + ) # We create a list of all variables. Note that a `keras.Model/Layer` automatically # tracks owned variables, so we could also use `self.trainable_variables` # (or even `self.variables`, which is useful for loading/saving). 
variables = [self._W1, self._b1, self._W2, self._b2] + # print("w1, b1, w2, b2:", self._W1.shape, self._b1.shape, self._W2.shape, self._b2.shape) - # TODO: Compute the gradient of the loss with respect to variables using + # Compute the gradient of the loss with respect to variables using # backpropagation algorithm by # - first resetting the gradients of all variables to zero with `self.zero_grad()`, # - then calling `loss.backward()`. - ... + self.zero_grad() + loss.backward() gradients = [variable.value.grad for variable in variables] + # print("gradients:", gradients) with torch.no_grad(): for variable, gradient in zip(variables, gradients): - # TODO: Perform the SGD update with learning rate `self._args.learning_rate` + # Perform the SGD update with learning rate `self._args.learning_rate` # for the variable and computed gradient. You can modify the # variable value with `variable.assign` or in this case the more # efficient `variable.assign_sub`. - ... + variable.assign_sub(self._args.learning_rate * gradient) def evaluate(self, dataset: MNIST.Dataset) -> float: # Compute the accuracy of the model prediction correct = 0 for batch in dataset.batches(self._args.batch_size): - # TODO: Compute the probabilities of the batch images using `self.predict` + # Compute the probabilities of the batch images using `self.predict` # and convert them to Numpy with `keras.ops.convert_to_numpy`. - probabilities = ... + probabilities = keras.ops.convert_to_numpy(self.predict(batch["images"])) - # TODO: Evaluate how many batch examples were predicted + # Evaluate how many batch examples were predicted # correctly and increase `correct` variable accordingly. - correct += ... 
- + correct += np.sum(np.argmax(probabilities, axis=-1) == batch["labels"]) return correct / dataset.size @@ -116,11 +163,19 @@ def main(args: argparse.Namespace) -> tuple[float, float]: torch.set_num_interop_threads(args.threads) # Create logdir name - args.logdir = os.path.join("logs", "{}-{}-{}".format( - os.path.basename(globals().get("__file__", "notebook")), - datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), - ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) - )) + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) # Load data mnist = MNIST() @@ -132,16 +187,23 @@ def main(args: argparse.Namespace) -> tuple[float, float]: model = Model(args) for epoch in range(args.epochs): - # TODO: Run the `train_epoch` with `mnist.train` dataset - - # TODO: Evaluate the dev data using `evaluate` on `mnist.dev` dataset - accuracy = ... - print("Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy), flush=True) + # Run the `train_epoch` with `mnist.train` dataset + model.train_epoch(mnist.train) + + # Evaluate the dev data using `evaluate` on `mnist.dev` dataset + accuracy = model.evaluate(mnist.dev) + print( + "Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy), + flush=True, + ) writer.add_scalar("dev/accuracy", 100 * accuracy, epoch + 1) - # TODO: Evaluate the test data using `evaluate` on `mnist.test` dataset - test_accuracy = ... 
- print("Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy), flush=True) + # Evaluate the test data using `evaluate` on `mnist.test` dataset + test_accuracy = model.evaluate(mnist.test) + print( + "Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy), + flush=True, + ) writer.add_scalar("test/accuracy", 100 * test_accuracy, epoch + 1) # Return dev and test accuracies for ReCodEx to validate. diff --git a/labs/02/sgd_manual.py b/labs/02/sgd_manual.py index 422d3e9..f023328 100644 --- a/labs/02/sgd_manual.py +++ b/labs/02/sgd_manual.py @@ -12,6 +12,11 @@ from mnist import MNIST +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") @@ -39,7 +44,9 @@ def __init__(self, args: argparse.Namespace) -> None: # - _W2, which is a trainable variable of size `[args.hidden_layer, MNIST.LABELS]`, # initialized to `keras.random.normal` value `with stddev=0.1` and `seed=args.seed`, # - _b2, which is a trainable variable of size `[MNIST.LABELS]` initialized to zeros - ... + self._W2 = keras.Variable(keras.random.normal([args.hidden_layer, MNIST.LABELS], stddev=0.1, seed=args.seed), + trainable=True) + self._b2 = keras.Variable(keras.ops.zeros([MNIST.LABELS]), trainable=True) def predict(self, inputs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]: # TODO(sgd_backpropagation): Define the computation of the network. Notably: @@ -56,7 +63,14 @@ def predict(self, inputs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, tor # TODO: In order to support manual gradient computation, you should # return not only the output layer, but also the hidden layer after applying # tanh, and the input layer after reshaping. - return ..., ..., ... 
+ input = keras.ops.cast(inputs, dtype="float32") + input = torch.div(input, 255) + input = input.reshape([input.shape[0], -1]) + hidden_input = keras.ops.matmul(input,self._W1) + self._b1 + hidden_output = keras.ops.tanh(hidden_input) + sm_input = keras.ops.matmul(hidden_output,self._W2) + self._b2 + output = keras.ops.softmax(sm_input) + return input, hidden_output, output def train_epoch(self, dataset: MNIST.Dataset) -> None: for batch in dataset.batches(self._args.batch_size): @@ -72,7 +86,7 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None: # # Compute the input layer, hidden layer and output layer # of the batch images using `self.predict`. - + input_layer, hidden_layer, probabilities = self.predict(torch.tensor(batch['images'])) # TODO: Compute the gradient of the loss with respect to all # variables. Note that the loss is computed as in `sgd_backpropagation`: # - For every batch example, the loss is the categorical crossentropy of the @@ -80,7 +94,6 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None: # - either use `keras.ops.one_hot` to obtain one-hot encoded gold labels, # - or suitably use `keras.ops.take_along_axis` to "index" the predicted probabilities. # - Finally, compute the average across the batch examples. - # # During the gradient computation, you will need to compute # a batched version of a so-called outer product # `C[a, i, j] = A[a, i] * B[a, j]`, @@ -88,12 +101,30 @@ def train_epoch(self, dataset: MNIST.Dataset) -> None: # `A[:, :, np.newaxis] * B[:, np.newaxis, :]` # or with # `keras.ops.einsum("ai,aj->aij", A, B)`. 
+ gold_labels = keras.ops.one_hot(batch['labels'], num_classes=MNIST.LABELS) + loss = torch.mean(keras.ops.categorical_crossentropy(gold_labels, probabilities)) + + gd_loss = probabilities - gold_labels + gd_b2 = gd_loss + #print("loss gradient, hidden_layer, input", gd_b2.shape, hidden_layer.shape, input_layer.shape) + gd_w2 = keras.ops.einsum("ai,aj->aij", hidden_layer, gd_loss) + gd_h = keras.ops.matmul(gd_loss, keras.ops.transpose(self._W2)) + hidden_input = keras.ops.matmul(input_layer,self._W1) + self._b1 + gd_h_i = gd_h*(1-keras.ops.power(keras.ops.tanh(hidden_input), 2)) + gd_b1 = gd_h_i + gd_w1 = keras.ops.einsum("ai,aj->aij", input_layer, gd_h_i) + #print("gd_w2, gd_w1, gd_b2, gd_b1:", gd_w2.shape, gd_w1.shape, gd_b2.shape, gd_b1.shape) # TODO(sgd_backpropagation): Perform the SGD update with learning rate `self._args.learning_rate` # for the variable and computed gradient. You can modify the # variable value with `variable.assign` or in this case the more # efficient `variable.assign_sub`. - ... + variables = [self._W1, self._b1, self._W2, self._b2] + gradients = [gd_w1, gd_b1, gd_w2, gd_b2] + with torch.no_grad(): + for variable, gradient in zip(variables, gradients): + variable.assign_sub(self._args.learning_rate*keras.ops.mean(gradient, axis=0)) + def evaluate(self, dataset: MNIST.Dataset) -> float: # Compute the accuracy of the model prediction @@ -101,11 +132,11 @@ def evaluate(self, dataset: MNIST.Dataset) -> float: for batch in dataset.batches(self._args.batch_size): # TODO: Compute the probabilities of the batch images using `self.predict` # and convert them to Numpy with `keras.ops.convert_to_numpy`. - probabilities = ... + probabilities = keras.ops.convert_to_numpy(self.predict(torch.tensor(batch['images']))[2]) # TODO(sgd_backpropagation): Evaluate how many batch examples were predicted # correctly and increase `correct` variable accordingly. - correct += ... 
+ correct += np.sum(np.argmax(probabilities, axis=-1) == batch["labels"]) return correct / dataset.size @@ -135,14 +166,14 @@ def main(args: argparse.Namespace) -> tuple[float, float]: for epoch in range(args.epochs): # TODO: Run the `train_epoch` with `mnist.train` dataset - + model.train_epoch(mnist.train) # TODO: Evaluate the dev data using `evaluate` on `mnist.dev` dataset - accuracy = ... + accuracy = model.evaluate(mnist.dev) print("Dev accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * accuracy), flush=True) writer.add_scalar("dev/accuracy", 100 * accuracy, epoch + 1) # TODO: Evaluate the test data using `evaluate` on `mnist.test` dataset - test_accuracy = ... + test_accuracy = model.evaluate(mnist.test) print("Test accuracy after epoch {} is {:.2f}".format(epoch + 1, 100 * test_accuracy), flush=True) writer.add_scalar("test/accuracy", 100 * test_accuracy, epoch + 1) diff --git a/labs/02/test.ps1 b/labs/02/test.ps1 new file mode 100644 index 0000000..fa38f74 --- /dev/null +++ b/labs/02/test.ps1 @@ -0,0 +1 @@ +../../.venv/Scripts/python .\gym_cartpole.py && ../../.venv/Scripts/python .\gym_cartpole.py --evaluate diff --git a/labs/03/mnist_ensemble.ps1 b/labs/03/mnist_ensemble.ps1 new file mode 100644 index 0000000..526a6bd --- /dev/null +++ b/labs/03/mnist_ensemble.ps1 @@ -0,0 +1,2 @@ +python3 mnist_ensemble.py --epochs=1 --models=5 +python3 mnist_ensemble.py --epochs=1 --models=5 --hidden_layers=200 diff --git a/labs/03/mnist_ensemble.py b/labs/03/mnist_ensemble.py index ebffcf9..93bb2eb 100644 --- a/labs/03/mnist_ensemble.py +++ b/labs/03/mnist_ensemble.py @@ -7,6 +7,7 @@ import torch from mnist import MNIST +import numpy as np parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. 
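Reviewer note: the next hunk evaluates an ensemble of the first `model + 1` networks by averaging their outputs with `keras.layers.Average`. A minimal pure-Python sketch of that averaging, assuming every model returns a probability distribution per example, is given below. The functions `average_probabilities` and `accuracy` and the sample numbers are hypothetical and only illustrate the idea; they are not part of the patch.

```python
def average_probabilities(per_model_probs):
    """Average class distributions: per_model_probs[m][n][c] -> result[n][c]."""
    num_models = len(per_model_probs)
    num_examples = len(per_model_probs[0])
    num_classes = len(per_model_probs[0][0])
    return [
        [sum(model[n][c] for model in per_model_probs) / num_models for c in range(num_classes)]
        for n in range(num_examples)
    ]


def accuracy(probs, labels):
    """Fraction of examples whose argmax class equals the gold label."""
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(probs, labels)
    )
    return correct / len(labels)


# Hypothetical predictions of three models on two examples with two classes.
model_probs = [
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.4, 0.6], [0.3, 0.7]],
    [[0.7, 0.3], [0.2, 0.8]],
]
labels = [0, 1]

# Accuracy of the ensemble built from the first k models, for k = 1, 2, 3.
for k in range(1, len(model_probs) + 1):
    ensemble = average_probabilities(model_probs[:k])
    print(k, accuracy(ensemble, labels))
```

Averaging probabilities outside of Keras avoids building and compiling a new `keras.Model` for every prefix of the model list, at the cost of keeping all individual predictions in memory.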
@@ -54,11 +55,13 @@ def main(args: argparse.Namespace) -> tuple[list[float], list[float]]: print("Done") individual_accuracies, ensemble_accuracies = [], [] + model_predictions = [] for model in range(args.models): - # TODO: Compute the accuracy on the dev set for the individual `models[model]`. - individual_accuracy = ... + # Compute the accuracy on the dev set for the individual `models[model]`. + individual_accuracy = models[model].evaluate(mnist.dev.data["images"], mnist.dev.data["labels"])[1] + print(individual_accuracy) - # TODO: Compute the accuracy on the dev set for the ensemble `models[0:model+1]`. + # Compute the accuracy on the dev set for the ensemble `models[0:model+1]`. # # Generally you can choose one of the following approaches: # 1) Use Keras Functional API and construct a `keras.Model` averaging the models @@ -69,7 +72,17 @@ def main(args: argparse.Namespace) -> tuple[list[float], list[float]]: # need to construct Keras ensemble model at all, and instead call `model.predict` # on the individual models and average the results. To measure accuracy, # either do it completely manually or use `keras.metrics.SparseCategoricalAccuracy`. - ensemble_accuracy = ... 
+ inputs = keras.Input(shape=(MNIST.W, MNIST.H, MNIST.C)) + ensemble_output = keras.layers.Average()([model(inputs) for model in models[0:model+1]]) + ensemble_model = keras.Model(inputs=inputs, outputs=ensemble_output) + + ensemble_model.compile( + optimizer=keras.optimizers.Adam(), + loss=keras.losses.SparseCategoricalCrossentropy(), + metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], + ) + + ensemble_accuracy = ensemble_model.evaluate(mnist.dev.data["images"], mnist.dev.data["labels"])[1] # Store the accuracies individual_accuracies.append(individual_accuracy) diff --git a/labs/03/mnist_regularization.ps1 b/labs/03/mnist_regularization.ps1 new file mode 100644 index 0000000..2a61e88 --- /dev/null +++ b/labs/03/mnist_regularization.ps1 @@ -0,0 +1,24 @@ +# Run script from root repo directory + +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --dropout=0.3 +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --dropout=0.5 --hidden_layers 300 300 +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --weight_decay=0.1 +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --weight_decay=0.3 +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --label_smoothing=0.1 +.\.venv\Scripts\python labs\03\mnist_regularization.py --epochs=1 --label_smoothing=0.3 + +# Expected +# accuracy: 0.5981 - loss: 1.2688 - val_accuracy: 0.9174 - val_loss: 0.3051 +# accuracy: 0.3429 - loss: 1.9163 - val_accuracy: 0.8826 - val_loss: 0.4937 +# accuracy: 0.7014 - loss: 1.0412 - val_accuracy: 0.9236 - val_loss: 0.2776 +# accuracy: 0.7006 - loss: 1.0429 - val_accuracy: 0.9232 - val_loss: 0.2801 +# accuracy: 0.7102 - loss: 1.3015 - val_accuracy: 0.9276 - val_loss: 0.7656 +# accuracy: 0.7113 - loss: 1.6854 - val_accuracy: 0.9332 - val_loss: 1.3709 + +# Actual +# accuracy: 0.6178 - loss: 1.2374 - val_accuracy: 0.9164 - val_loss: 0.3045 +# accuracy: 0.3412 - loss: 1.8919 - val_accuracy: 0.8818 - val_loss: 
0.4794 +# accuracy: 0.6948 - loss: 1.0394 - val_accuracy: 0.9186 - val_loss: 0.2859 +# accuracy: 0.6947 - loss: 1.0410 - val_accuracy: 0.9184 - val_loss: 0.2885 +# accuracy: 0.6996 - loss: 1.3013 - val_accuracy: 0.9228 - val_loss: 0.7735 +# accuracy: 0.7102 - loss: 1.6879 - val_accuracy: 0.9284 - val_loss: 1.3739 diff --git a/labs/03/mnist_regularization.py b/labs/03/mnist_regularization.py index cd78fcf..0b2e5a2 100644 --- a/labs/03/mnist_regularization.py +++ b/labs/03/mnist_regularization.py @@ -3,7 +3,10 @@ import datetime import os import re -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras import torch @@ -15,12 +18,20 @@ parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") parser.add_argument("--dropout", default=0, type=float, help="Dropout regularization.") parser.add_argument("--epochs", default=30, type=int, help="Number of epochs.") -parser.add_argument("--hidden_layers", default=[400], nargs="*", type=int, help="Hidden layer sizes.") +parser.add_argument( + "--hidden_layers", default=[400], nargs="*", type=int, help="Hidden layer sizes." +) parser.add_argument("--label_smoothing", default=0, type=float, help="Label smoothing.") -parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.") +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") -parser.add_argument("--weight_decay", default=0, type=float, help="Weight decay strength.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." 
+) +parser.add_argument( + "--weight_decay", default=0, type=float, help="Weight decay strength." +) # If you add more arguments, ReCodEx will keep them with your default values. @@ -32,7 +43,10 @@ def __init__(self, path): def writer(self, writer): if writer not in self._writers: import torch.utils.tensorboard - self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + + self._writers[writer] = torch.utils.tensorboard.SummaryWriter( + os.path.join(self._path, writer) + ) return self._writers[writer] def add_logs(self, writer, logs, step): @@ -43,10 +57,24 @@ def add_logs(self, writer, logs, step): def on_epoch_end(self, epoch, logs=None): if logs: - if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): - logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} - self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) - self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + if isinstance( + getattr(self.model, "optimizer", None), keras.optimizers.Optimizer + ): + logs = logs | { + "learning_rate": keras.ops.convert_to_numpy( + self.model.optimizer.learning_rate + ) + } + self.add_logs( + "train", + {k: v for k, v in logs.items() if not k.startswith("val_")}, + epoch + 1, + ) + self.add_logs( + "val", + {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, + epoch + 1, + ) def main(args: argparse.Namespace) -> dict[str, float]: @@ -57,16 +85,24 @@ def main(args: argparse.Namespace) -> dict[str, float]: torch.set_num_interop_threads(args.threads) # Create logdir name - args.logdir = os.path.join("logs", "{}-{}-{}".format( - os.path.basename(globals().get("__file__", "notebook")), - datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), - ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) - )) + args.logdir = os.path.join( + 
"logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) # Load data mnist = MNIST(size={"train": 5_000}) - # TODO: Incorporate dropout to the model below. Namely, add + # Incorporate dropout to the model below. Namely, add # a `keras.layers.Dropout` layer with `args.dropout` rate after # the `Flatten` layer and after each `Dense` hidden layer (but not after # the output `Dense` layer). @@ -74,11 +110,15 @@ def main(args: argparse.Namespace) -> dict[str, float]: model = keras.Sequential() model.add(keras.layers.Rescaling(1 / 255)) model.add(keras.layers.Flatten()) + model.add(keras.layers.Dropout(args.dropout)) + for hidden_layer in args.hidden_layers: model.add(keras.layers.Dense(hidden_layer, activation="relu")) + model.add(keras.layers.Dropout(rate=args.dropout)) + model.add(keras.layers.Dense(MNIST.LABELS, activation="softmax")) - # TODO: Implement label smoothing with the given `args.label_smoothing` strength. + # Implement label smoothing with the given `args.label_smoothing` strength. # You need to change the `SparseCategorical{Crossentropy,Accuracy}` to # `Categorical{Crossentropy,Accuracy}`, because `label_smoothing` is supported # only by the `CategoricalCrossentropy`. That means you also need to modify @@ -86,29 +126,52 @@ def main(args: argparse.Namespace) -> dict[str, float]: # of the gold class to a full categorical distribution (you can use either NumPy, # or there is a helper method also in the `keras.utils` module). - # TODO: Create a `keras.optimizers.AdamW`, using the default learning + # Create a `keras.optimizers.AdamW`, using the default learning # rate and a weight decay of strength `args.weight_decay`. Then call the # `exclude_from_weight_decay` method to specify that all variables with "bias" # in their name should not be decayed. 
- optimizer = ... - - model.compile( - optimizer=optimizer, - loss=keras.losses.SparseCategoricalCrossentropy(), - metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], - ) + optimizer = keras.optimizers.AdamW(weight_decay=args.weight_decay) + optimizer.exclude_from_weight_decay(var_names=["bias"]) + + s = args.label_smoothing != 0 + + if s: + model.compile( + optimizer=optimizer, + loss=keras.losses.CategoricalCrossentropy(label_smoothing=args.label_smoothing), + metrics=[keras.metrics.CategoricalAccuracy(name="accuracy")], + ) + else: + model.compile( + optimizer=optimizer, + loss=keras.losses.SparseCategoricalCrossentropy(), + metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], + ) tb_callback = TorchTensorBoardCallback(args.logdir) logs = model.fit( - mnist.train.data["images"], mnist.train.data["labels"], - batch_size=args.batch_size, epochs=args.epochs, - validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]), + mnist.train.data["images"], + keras.utils.to_categorical( + mnist.train.data["labels"], num_classes=mnist.LABELS + ) if s else mnist.train.data["labels"], + batch_size=args.batch_size, + epochs=args.epochs, + validation_data=( + mnist.dev.data["images"], + keras.utils.to_categorical( + mnist.dev.data["labels"], num_classes=mnist.LABELS + ) if s else mnist.dev.data["labels"], + ), callbacks=[tb_callback], ) # Return development metrics for ReCodEx to validate. 
- return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")} + return { + metric: values[-1] + for metric, values in logs.history.items() + if metric.startswith("val_") + } if __name__ == "__main__": diff --git a/labs/03/uppercase.py b/labs/03/uppercase.py index c975e3f..c83d5c5 100644 --- a/labs/03/uppercase.py +++ b/labs/03/uppercase.py @@ -10,16 +10,16 @@ from uppercase_data import UppercaseData -# TODO: Set reasonable values for the hyperparameters, especially for +# Set reasonable values for the hyperparameters, especially for # `alphabet_size`, `batch_size`, `epochs`, and `window`. # Also, you can set the number of threads to 0 to use all your CPU cores. parser = argparse.ArgumentParser() -parser.add_argument("--alphabet_size", default=..., type=int, help="If given, use this many most frequent chars.") -parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") -parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--alphabet_size", default=70, type=int, help="If given, use this many most frequent chars.") +parser.add_argument("--batch_size", default=1024, type=int, help="Batch size.") +parser.add_argument("--epochs", default=2, type=int, help="Number of epochs.") parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") -parser.add_argument("--window", default=..., type=int, help="Window size to use.") +parser.add_argument("--threads", default=0, type=int, help="Maximum number of threads to use.") +parser.add_argument("--window", default=4, type=int, help="Window size to use.") class TorchTensorBoardCallback(keras.callbacks.Callback): @@ -64,7 +64,7 @@ def main(args: argparse.Namespace) -> None: # Load data uppercase_data = UppercaseData(args.window, args.alphabet_size) - # TODO: Implement a suitable model, optionally including 
regularization, select + # Implement a suitable model, optionally including regularization, select # good hyperparameters and train the model. # # The inputs are _windows_ of fixed size (`args.window` characters on the left, @@ -79,16 +79,34 @@ def main(args: argparse.Namespace) -> None: # You can then flatten the one-hot encoded windows and follow with a dense layer. # - Alternatively, you can use `keras.layers.Embedding` (which is an efficient # implementation of one-hot encoding followed by a Dense layer) and flatten afterwards. - model = ... + model = keras.Sequential([ + keras.layers.InputLayer(shape=[2 * args.window + 1], dtype="int32"), + keras.layers.CategoryEncoding(len(uppercase_data.train.alphabet)), + keras.layers.Embedding(len(uppercase_data.train.alphabet), 8), + + keras.layers.Flatten(), + keras.layers.Dense(64, activation='relu'), + keras.layers.Dropout(rate=0.5), + keras.layers.Dense(1, activation='sigmoid') # Sigmoid activation function for binary classification + ]) + + # Generate correctly capitalized test set. + + predictions = model.predict(uppercase_data.test.data, batch_size=args.batch_size) - # TODO: Generate correctly capitalized test set. # Use `uppercase_data.test.text` as input, capitalize suitable characters, # and write the result to predictions_file (which is # `uppercase_test.txt` in the `args.logdir` directory). os.makedirs(args.logdir, exist_ok=True) with open(os.path.join(args.logdir, "uppercase_test.txt"), "w", encoding="utf-8") as predictions_file: - ... 
- + new_text = "" + for pred, word in zip(predictions, uppercase_data.test.text): + if pred > .5: + new_word = word.upper() + new_text += new_word + else: + new_text += word + predictions_file.write(new_text) if __name__ == "__main__": args = parser.parse_args([] if "__file__" not in globals() else None) diff --git a/labs/04/cifar10.py b/labs/04/cifar10.py index 0ed0533..ec06755 100644 --- a/labs/04/cifar10.py +++ b/labs/04/cifar10.py @@ -33,7 +33,8 @@ def dataset(self, transform: Callable[[dict[str, np.ndarray]], Any] | None = Non return CIFAR10.TorchDataset(self, transform) class TorchDataset(torch.utils.data.Dataset): - def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None: + def __init__(self, dataset: "CIFAR10.Dataset", + transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None: self._dataset = dataset self._transform = transform diff --git a/labs/04/cifar_competition.ps1 b/labs/04/cifar_competition.ps1 new file mode 100644 index 0000000..0d919fe --- /dev/null +++ b/labs/04/cifar_competition.ps1 @@ -0,0 +1 @@ +clear && python .\cifar_competition.py diff --git a/labs/04/cifar_competition.py b/labs/04/cifar_competition.py index 0541de8..be29019 100644 --- a/labs/04/cifar_competition.py +++ b/labs/04/cifar_competition.py @@ -3,7 +3,10 @@ import datetime import os import re -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras import numpy as np @@ -11,13 +14,23 @@ from cifar10 import CIFAR10 -# TODO: Define reasonable defaults and optionally more parameters. +# Define reasonable defaults and optionally more parameters. # Also, you can set the number of threads to 0 to use all your CPU cores.
parser = argparse.ArgumentParser() -parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") -parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--batch_size", default=128, type=int, help="Batch size.") +parser.add_argument("--epochs", default=30, type=int, help="Number of epochs.") +# parser.add_argument("--epochs", default=200, type=int, help="Number of epochs.") +parser.add_argument("--learning_rate", default=0.001, type=float, help="Initial learning rate.") +parser.add_argument( + "--weight_decay", default=1e-4, type=float, help="L2 regularization weight decay." +) +parser.add_argument( + "--label_smoothing", default=0.1, type=float, help="Label smoothing." +) parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." 
+) class TorchTensorBoardCallback(keras.callbacks.Callback): @@ -28,7 +41,10 @@ def __init__(self, path): def writer(self, writer): if writer not in self._writers: import torch.utils.tensorboard - self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + + self._writers[writer] = torch.utils.tensorboard.SummaryWriter( + os.path.join(self._path, writer) + ) return self._writers[writer] def add_logs(self, writer, logs, step): @@ -39,13 +55,51 @@ def add_logs(self, writer, logs, step): def on_epoch_end(self, epoch, logs=None): if logs: - if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): - logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} - self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) - self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) - + if isinstance( + getattr(self.model, "optimizer", None), keras.optimizers.Optimizer + ): + logs = logs | { + "learning_rate": keras.ops.convert_to_numpy( + self.model.optimizer.learning_rate + ) + } + self.add_logs( + "train", + {k: v for k, v in logs.items() if not k.startswith("val_")}, + epoch + 1, + ) + self.add_logs( + "val", + {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, + epoch + 1, + ) + +def create_res(input_layer, filters, kernel_size, strides): + h = keras.layers.Conv2D( + filters=filters, + kernel_size=kernel_size, + strides=strides, + padding="same", + activation=None, + )(input_layer) + + h = keras.layers.BatchNormalization()(h) + h = keras.layers.Activation("relu")(h) + h = keras.layers.Conv2D( + filters=filters, + kernel_size=kernel_size, + strides=1, + padding="same", + activation=None, + use_bias=False, + )(h) + h = keras.layers.BatchNormalization()(h) + h = keras.layers.Add()([input_layer, h]) + h = keras.layers.Activation("relu")(h) + return h def main(args: argparse.Namespace) -> None: + # 
Set the random seed and the number of threads. keras.utils.set_random_seed(args.seed) if args.threads: @@ -53,23 +107,75 @@ def main(args: argparse.Namespace) -> None: torch.set_num_interop_threads(args.threads) # Create logdir name - args.logdir = os.path.join("logs", "{}-{}-{}".format( - os.path.basename(globals().get("__file__", "notebook")), - datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), - ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) - )) + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) # Load data cifar = CIFAR10() - # TODO: Create the model and train it - model = ... + # Create the model and train it + inputs = keras.Input(shape=cifar.train.data["images"][0].shape) + h = keras.layers.Rescaling(1 / 255)(inputs) + h = keras.layers.Conv2D(64, 3, 1, "same", activation="relu")(h) + h = create_res(h, 64, 3, 1) + h = keras.layers.MaxPool2D(2)(h) + h = create_res(h, 64, 3, 1) + h = keras.layers.MaxPool2D(2)(h) + h = keras.layers.Dropout(0.2)(h) + h = create_res(h, 64, 3, 1) + h = keras.layers.Flatten()(h) + h = keras.layers.Dropout(0.2)(h) + h = keras.layers.Dense(200, activation="relu")(h) + outputs = keras.layers.Dense(len(CIFAR10.LABELS), activation="softmax")(h) + + model = keras.Model(inputs=inputs, outputs=outputs) + + model.summary() + + + lr_optimizer = keras.optimizers.schedules.CosineDecay( + initial_learning_rate=args.learning_rate, + decay_steps=len(cifar.train.data["images"]) // args.batch_size * args.epochs + ) + + model.compile( + optimizer=keras.optimizers.Adam( + learning_rate=lr_optimizer, + weight_decay=args.weight_decay), + loss=keras.losses.SparseCategoricalCrossentropy(), + 
metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], + ) + + model.fit( + cifar.train.data["images"], + cifar.train.data["labels"], + batch_size=args.batch_size, + epochs=args.epochs, + + ) + + model.save(os.path.join(args.logdir, "cifar.h5"), include_optimizer=False) # Generate test set annotations, but in `args.logdir` to allow parallel execution. os.makedirs(args.logdir, exist_ok=True) - with open(os.path.join(args.logdir, "cifar_competition_test.txt"), "w", encoding="utf-8") as predictions_file: - # TODO: Perform the prediction on the test data. - for probs in model.predict(...): + with open( + os.path.join(args.logdir, "cifar_competition_test.txt"), "w", encoding="utf-8" + ) as predictions_file: + # Perform the prediction on the test data. + for probs in model.predict( + cifar.test.data["images"], batch_size=args.batch_size + ): print(np.argmax(probs), file=predictions_file) diff --git a/labs/04/mnist_cnn results.txt b/labs/04/mnist_cnn results.txt new file mode 100644 index 0000000..63271eb --- /dev/null +++ b/labs/04/mnist_cnn results.txt @@ -0,0 +1,29 @@ +👉 TEST 1 +python3 mnist_cnn.py --epochs=1 --cnn=F,H-100 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 12ms/step - accuracy: 0.8499 - loss: 0.5317 - val_accuracy: 0.9618 - val_loss: 0.1400 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432 + +👉 TEST 2 +python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 13ms/step - accuracy: 0.7662 - loss: 0.7543 - val_accuracy: 0.9576 - val_loss: 0.1612 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7706 - loss: 0.7444 - val_accuracy: 0.9572 - val_loss: 0.1606 + +👉 TEST 3 +python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 14s 13ms/step - accuracy: 0.6706 - loss: 1.0717 - val_accuracy: 0.8814 - val_loss: 0.3802 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6630 - loss: 1.0703 - val_accuracy: 0.8798 - 
val_loss: 0.3894 + +👉 TEST 4 +python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 18s 16ms/step - accuracy: 0.5799 - loss: 1.2751 - val_accuracy: 0.8898 - val_loss: 0.3616 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.5898 - loss: 1.2535 - val_accuracy: 0.8774 - val_loss: 0.4079 + +👉 TEST 5 +python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 20s 17ms/step - accuracy: 0.6976 - loss: 0.9518 - val_accuracy: 0.9228 - val_loss: 0.2614 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6822 - loss: 1.0011 - val_accuracy: 0.9284 - val_loss: 0.2537 + +👉 TEST 6 +python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 31s 27ms/step - accuracy: 0.7476 - loss: 0.7841 - val_accuracy: 0.9370 - val_loss: 0.2037 +1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7562 - loss: 0.7717 - val_accuracy: 0.9486 - val_loss: 0.1734 diff --git a/labs/04/mnist_cnn.ps1 b/labs/04/mnist_cnn.ps1 new file mode 100644 index 0000000..bf78797 --- /dev/null +++ b/labs/04/mnist_cnn.ps1 @@ -0,0 +1,30 @@ +"" +"👉 TEST 1" +"python3 mnist_cnn.py --epochs=1 --cnn=F,H-100" +python3 mnist_cnn.py --epochs=1 --cnn=F,H-100 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432" +"" +"👉 TEST 2" +"python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5" +python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7706 - loss: 0.7444 - val_accuracy: 0.9572 - val_loss: 0.1606" +"" +"👉 TEST 3" +"python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50" +python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6630 - loss: 1.0703 - val_accuracy: 0.8798 - val_loss: 0.3894" +"" +"👉 TEST 4" +"python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50" 
+python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.5898 - loss: 1.2535 - val_accuracy: 0.8774 - val_loss: 0.4079" +"" +"👉 TEST 5" +"python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32" +python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.6822 - loss: 1.0011 - val_accuracy: 0.9284 - val_loss: 0.2537" +"" +"👉 TEST 6" +"python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50" +python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50 +"1100/1100 ━━━━━━━━━━━━━━━━━━━━ 15s 14ms/step - accuracy: 0.7562 - loss: 0.7717 - val_accuracy: 0.9486 - val_loss: 0.1734" diff --git a/labs/04/mnist_cnn.py b/labs/04/mnist_cnn.py index a3a91cd..b3c5727 100644 --- a/labs/04/mnist_cnn.py +++ b/labs/04/mnist_cnn.py @@ -1,9 +1,14 @@ #!/usr/bin/env python3 import argparse import os -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise +import re + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras +from keras.layers import add import torch from mnist import MNIST @@ -11,42 +16,115 @@ parser = argparse.ArgumentParser() # These arguments will be set appropriately by ReCodEx, even if you change them. parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") -parser.add_argument("--cnn", default=None, type=str, help="CNN architecture.") +parser.add_argument( + "--cnn", + default="CB-16-5-2-same,M-3-2,F,H-100,D-0.5", + type=str, + help="CNN architecture.", +) parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.") -parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.") +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." 
+) parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) # If you add more arguments, ReCodEx will keep them with your default values. +def create_layer(layer_type, layer_args, hidden): + if layer_type == "C": + filters, kernel_size, stride, padding = layer_args + hidden = keras.layers.Conv2D( + filters=int(filters), + kernel_size=(int(kernel_size)), + strides=(int(stride)), + padding=padding, + activation="relu", + )(hidden) + + return hidden + + # - `CB-filters-kernel_size-stride-padding`: Same as `C`, but use batch normalization. + # In detail, start with a convolutional layer **without bias** and activation, + # then add a batch normalization layer, and finally the ReLU activation. + if layer_type == "CB": + filters, kernel_size, stride, padding = layer_args + hidden = keras.layers.Conv2D( + filters=int(filters), + kernel_size=(int(kernel_size)), + strides=(int(stride)), + padding=padding, + use_bias=False, + )(hidden) + hidden = keras.layers.BatchNormalization()(hidden) + hidden = keras.layers.ReLU()(hidden) + return hidden + + # - `M-pool_size-stride`: Add max pooling with specified size and stride, using + # the default "valid" padding. + if layer_type == "M": + pool_size, stride = layer_args + hidden = keras.layers.MaxPooling2D( + pool_size=int(pool_size), + strides=(int(stride)), + )(hidden) + return hidden + + # - `R-[layers]`: Add a residual connection. The `layers` contain a specification + # of at least one convolutional layer (but not a recursive residual connection `R`). + # The input to the `R` layer should be processed sequentially by `layers`, and the + # produced output (after the ReLU nonlinearity of the last layer) should be added + # to the input (of this `R` layer). 
+ if layer_type == "R": + input_layer = hidden + layers = "-".join(layer_args)[1:-1].split(",") + + for layer in layers: + layer_type, *layer_args = layer.split("-") + + hidden = create_layer(layer_type, layer_args, hidden) + + hidden = keras.layers.Add()([input_layer, hidden]) + + return hidden + + # - `F`: Flatten inputs. Must appear exactly once in the architecture. + if layer_type == "F": + hidden = keras.layers.Flatten()(hidden) + return hidden + + # - `H-hidden_layer_size`: Add a dense layer with ReLU activation and the specified size. + if layer_type == "H": + hidden_layer_size, = layer_args + hidden = keras.layers.Dense(units=int(hidden_layer_size), activation="relu")(hidden) + return hidden + + # - `D-dropout_rate`: Apply dropout with the given dropout rate. + if layer_type == "D": + dropout_rate, = layer_args + hidden = keras.layers.Dropout(rate=float(dropout_rate))(hidden) + return hidden + + class Model(keras.Model): def __init__(self, args: argparse.Namespace) -> None: - # TODO: Create the model. The template uses the functional API, but + # Create the model. The template uses the functional API, but # feel free to use subclassing if you want. inputs = keras.Input(shape=[MNIST.H, MNIST.W, MNIST.C]) hidden = keras.layers.Rescaling(1 / 255)(inputs) - # TODO: Add CNN layers specified by `args.cnn`, which contains - # a comma-separated list of the following layers: - # - `C-filters-kernel_size-stride-padding`: Add a convolutional layer with ReLU - # activation and specified number of filters, kernel size, stride and padding. - # - `CB-filters-kernel_size-stride-padding`: Same as `C`, but use batch normalization. - # In detail, start with a convolutional layer **without bias** and activation, - # then add a batch normalization layer, and finally the ReLU activation. - # - `M-pool_size-stride`: Add max pooling with specified size and stride, using - # the default "valid" padding. - # - `R-[layers]`: Add a residual connection. 
The `layers` contain a specification - # of at least one convolutional layer (but not a recursive residual connection `R`). - # The input to the `R` layer should be processed sequentially by `layers`, and the - # produced output (after the ReLU nonlinearity of the last layer) should be added - # to the input (of this `R` layer). - # - `F`: Flatten inputs. Must appear exactly once in the architecture. - # - `H-hidden_layer_size`: Add a dense layer with ReLU activation and the specified size. - # - `D-dropout_rate`: Apply dropout with the given dropout rate. + cnn_args = re.split(r",(?![^\[]*\])", args.cnn) + + for layer in cnn_args: + layer_type, *layer_args = layer.split("-") + + hidden = create_layer(layer_type, layer_args, hidden) + # You can assume the resulting network is valid; it is fine to crash if it is not. # # Produce the results in the variable `hidden`. - hidden = ... # Add the final output layer outputs = keras.layers.Dense(MNIST.LABELS, activation="softmax")(hidden) @@ -73,13 +151,19 @@ def main(args: argparse.Namespace) -> dict[str, float]: model = Model(args) logs = model.fit( - mnist.train.data["images"], mnist.train.data["labels"], - batch_size=args.batch_size, epochs=args.epochs, + mnist.train.data["images"], + mnist.train.data["labels"], + batch_size=args.batch_size, + epochs=args.epochs, validation_data=(mnist.dev.data["images"], mnist.dev.data["labels"]), ) # Return development metrics for ReCodEx to validate. 
- return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")} + return { + metric: values[-1] + for metric, values in logs.history.items() + if metric.startswith("val_") + } if __name__ == "__main__": diff --git a/labs/04/mnist_multiple.ps1 b/labs/04/mnist_multiple.ps1 new file mode 100644 index 0000000..3416b36 --- /dev/null +++ b/labs/04/mnist_multiple.ps1 @@ -0,0 +1,11 @@ +"" +"👉 TEST 1" +"python3 mnist_multiple.py --epochs=1 --batch_size=50" +python3 mnist_multiple.py --epochs=1 --batch_size=50 +"275/275 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - direct_comparison_accuracy: 0.7993 - indirect_comparison_accuracy: 0.8930 - loss: 1.6710 - val_direct_comparison_accuracy: 0.9508 - val_indirect_comparison_accuracy: 0.9836 - val_loss: 0.2984" +"" +"👉 TEST 2" +"python3 mnist_multiple.py --epochs=1 --batch_size=100" +python3 mnist_multiple.py --epochs=1 --batch_size=100 +"275/275 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - direct_comparison_accuracy: 0.7680 - indirect_comparison_accuracy: 0.8637 - loss: 2.1429 - val_direct_comparison_accuracy: 0.9288 - val_indirect_comparison_accuracy: 0.9772 - val_loss: 0.4157" +"" diff --git a/labs/04/mnist_multiple.py b/labs/04/mnist_multiple.py index 06b9d9e..def13ab 100644 --- a/labs/04/mnist_multiple.py +++ b/labs/04/mnist_multiple.py @@ -1,7 +1,10 @@ #!/usr/bin/env python3 import argparse import os -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import numpy as np import keras @@ -13,9 +16,13 @@ # These arguments will be set appropriately by ReCodEx, even if you change them.
parser.add_argument("--batch_size", default=50, type=int, help="Batch size.") parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.") -parser.add_argument("--recodex", default=False, action="store_true", help="Evaluation in ReCodEx.") +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) # If you add more arguments, ReCodEx will keep them with your default values. @@ -27,7 +34,7 @@ def __init__(self, args: argparse.Namespace) -> None: keras.Input(shape=[MNIST.H, MNIST.W, MNIST.C]), ) - # TODO: The model starts by passing each input image through the same + # The model starts by passing each input image through the same # subnetwork (with shared weights), which should perform # - keras.layers.Rescaling(1 / 255) to convert images to floats in [0, 1] range, # - convolution with 10 filters, 3x3 kernel size, stride 2, "valid" padding, ReLU activation, @@ -36,24 +43,49 @@ def __init__(self, args: argparse.Namespace) -> None: # - fully connected layer with 200 neurons and ReLU activation, # obtaining a 200-dimensional feature vector FV of each image. 
- # TODO: Using the computed representations, the model should produce four outputs: + rescale = keras.layers.Rescaling(1 / 255) + c1 = keras.layers.Conv2D( + filters=10, kernel_size=3, strides=2, padding="valid", activation="relu" + ) + c2 = keras.layers.Conv2D( + filters=20, kernel_size=3, strides=2, padding="valid", activation="relu" + ) + flat = keras.layers.Flatten() + hidden = keras.layers.Dense(200, activation="relu") + + fv1 = hidden(flat(c2(c1(rescale(images[0]))))) + fv2 = hidden(flat(c2(c1(rescale(images[1]))))) + + # Using the computed representations, the model should produce four outputs: # - first, compute _direct comparison_ whether the first digit is # greater than the second, by # - concatenating the two 200-dimensional image representations FV, # - processing them using another 200-neuron ReLU dense layer # - computing one output using a dense layer with "sigmoid" activation + concatenation = keras.layers.Concatenate()([fv1, fv2]) + hidden2 = keras.layers.Dense(200, activation="relu") + pred_layer = keras.layers.Dense(1, activation="sigmoid") + direct_comparison = pred_layer(hidden2(concatenation)) # - then, classify the computed representation FV of the first image using # a densely connected softmax layer into 10 classes; # - then, classify the computed representation FV of the second image using # the same layer (identical, i.e., with shared weights) into 10 classes; + classification_layer = keras.layers.Dense(10, activation="softmax") + d1 = classification_layer(fv1) + d2 = classification_layer(fv2) # - finally, compute _indirect comparison_ whether the first digit # is greater than second, by comparing the predictions from the above # two outputs; convert the comparison to "float32" using `keras.ops.cast`. 
outputs = { - "direct_comparison": ..., - "digit_1": ..., - "digit_2": ..., - "indirect_comparison": ..., + "direct_comparison": direct_comparison, + "digit_1": d1, + "digit_2": d2, + "indirect_comparison": keras.ops.cast( + keras.ops.greater( + keras.ops.argmax(d1, axis=1), keras.ops.argmax(d2, axis=1) + ), + "float32", + ), } # Finally, construct the model. @@ -65,7 +97,7 @@ def __init__(self, args: argparse.Namespace) -> None: # the keys of the `outputs` dictionary. self.output_names = sorted(outputs.keys()) - # TODO: Define the appropriate losses for the model outputs + # Define the appropriate losses for the model outputs # "direct_comparison", "digit_1", "digit_2". Regarding metrics, # the accuracy of both the direct and indirect comparisons should be # computed; name both metrics "accuracy" (i.e., pass "accuracy" as the @@ -73,19 +105,25 @@ def __init__(self, args: argparse.Namespace) -> None: self.compile( optimizer=keras.optimizers.Adam(), loss={ - "direct_comparison": ..., - "digit_1": ..., - "digit_2": ..., + "direct_comparison": keras.losses.BinaryCrossentropy(), + "digit_1": keras.losses.SparseCategoricalCrossentropy(), + "digit_2": keras.losses.SparseCategoricalCrossentropy(), }, metrics={ - "direct_comparison": [...], - "indirect_comparison": [...], + "direct_comparison": [ + keras.metrics.BinaryAccuracy(name="accuracy"), + ], + "indirect_comparison": [ + keras.metrics.BinaryAccuracy(name="accuracy"), + ], }, ) # Create an appropriate dataset using the MNIST data. def create_dataset( - self, mnist_dataset: MNIST.Dataset, args: argparse.Namespace, + self, + mnist_dataset: MNIST.Dataset, + args: argparse.Namespace, ) -> torch.utils.data.Dataset: # Original MNIST dataset. images, labels = mnist_dataset.data["images"], mnist_dataset.data["labels"] @@ -94,16 +132,27 @@ def create_dataset( # You can assume that the size of the original dataset is even. 
class TorchDataset(torch.utils.data.Dataset): def __len__(self) -> int: - # TODO: The new dataset has half the size of the original one. - return ... + # The new dataset has half the size of the original one. + return len(images) // 2 - def __getitem__(self, index: int) -> tuple[tuple[np.ndarray, np.ndarray], dict[str, np.ndarray]]: - # TODO: Given an `index`, generate a dataset element suitable for our model. + def __getitem__( + self, index: int + ) -> tuple[tuple[np.ndarray, np.ndarray], dict[str, np.ndarray]]: + # Given an `index`, generate a dataset element suitable for our model. # Notably, the element should be a pair `(input, output)`, with # - `input` being a pair of images `(images[2 * index], images[2 * index + 1])`, # - `output` being a dictionary with keys "digit_1", "digit_2", "direct_comparison", # and "indirect_comparison". - return ... + return ( + (images[2 * index], images[2 * index + 1]), + { + "digit_1": labels[2 * index], + "digit_2": labels[2 * index + 1], + "direct_comparison": labels[2 * index] > labels[2 * index + 1], + "indirect_comparison": labels[2 * index] + > labels[2 * index + 1], + }, + ) return TorchDataset() @@ -122,14 +171,22 @@ def main(args: argparse.Namespace) -> dict[str, float]: model = Model(args) # Construct suitable dataloaders from the MNIST data. - train = torch.utils.data.DataLoader(model.create_dataset(mnist.train, args), args.batch_size, shuffle=True) - dev = torch.utils.data.DataLoader(model.create_dataset(mnist.dev, args), args.batch_size) + train = torch.utils.data.DataLoader( + model.create_dataset(mnist.train, args), args.batch_size, shuffle=True + ) + dev = torch.utils.data.DataLoader( + model.create_dataset(mnist.dev, args), args.batch_size + ) # Train logs = model.fit(train, epochs=args.epochs, validation_data=dev) # Return development metrics for ReCodEx to validate. 
- return {metric: values[-1] for metric, values in logs.history.items() if metric.startswith("val_")} + return { + metric: values[-1] + for metric, values in logs.history.items() + if metric.startswith("val_") + } if __name__ == "__main__": diff --git a/labs/04/torch_dataset.ps1 b/labs/04/torch_dataset.ps1 new file mode 100644 index 0000000..46fa378 --- /dev/null +++ b/labs/04/torch_dataset.ps1 @@ -0,0 +1,11 @@ +# "" +# "👉 TEST 1" +# "python3 torch_dataset.py --epochs=1 --batch_size=100" +# python3 torch_dataset.py --epochs=1 --batch_size=100 +# "50/50 ━━━━━━━━━━━━━━━━━━━━ 3s 33ms/step - accuracy: 0.1297 - loss: 2.2519 - val_accuracy: 0.2710 - val_loss: 1.9796" +"" +"👉 TEST 2" +"python3 torch_dataset.py --epochs=1 --batch_size=50 --augment" +python3 torch_dataset.py --epochs=1 --batch_size=50 --augment +"100/100 ━━━━━━━━━━━━━━━━━━━━ 4s 34ms/step - accuracy: 0.1354 - loss: 2.2565 - val_accuracy: 0.2690 - val_loss: 1.9889" +"" diff --git a/labs/04/torch_dataset.py b/labs/04/torch_dataset.py index 5e0c330..f689e54 100644 --- a/labs/04/torch_dataset.py +++ b/labs/04/torch_dataset.py @@ -53,54 +53,67 @@ def main(args: argparse.Namespace) -> dict[str, float]: metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], ) - # TODO: Create a Torch dataset constructible from the given `CIFAR10.Dataset`. + # Create a Torch dataset constructible from the given `CIFAR10.Dataset`. # You should use only the first `size` examples of the dataset, and optional # augmentation function `augmentation_fn` may be applied to the images. class TorchDataset(torch.utils.data.Dataset): + images: np.ndarray + labels: np.ndarray + augmentation_fn: callable + def __init__(self, cifar: CIFAR10.Dataset, size: int, augmentation_fn=None) -> None: - # TODO: Note that the images and labels are available in `cifar.data["images"]` + # Note that the images and labels are available in `cifar.data["images"]` # and `cifar.data["labels"]`. - ... 
+ self.images = cifar.data["images"][:size] + self.labels = cifar.data["labels"][:size] + self.augmentation_fn = augmentation_fn def __len__(self) -> int: - # TODO: Return the appropriate size. - ... + # Return the appropriate size. + size = len(self.images) + return size + def __getitem__(self, index: int) -> tuple[np.ndarray | torch.Tensor, int]: - # TODO: Return the `index`-th example from the dataset, with the image optionally + # Return the `index`-th example from the dataset, with the image optionally # passed through the `augmentation_fn` if it is not `None`. - ... + return self.augmentation_fn(self.images[index]) if self.augmentation_fn else self.images[index], self.labels[index] if args.augment: # Construct a sequence of augmentation transformations from `torchvision.transforms.v2`. transformation = v2.Compose([ - # TODO: Add the following transformations: + # Add the following transformations: # - first create a `v2.RandomResize` that scales the image to # random size in range [28, 36], # - then add `v2.Pad` that pads the image with 4 pixels on each side, # - then add `v2.RandomCrop` that chooses a random crop of size 32x32, # - and finally add `v2.RandomHorizontalFlip` that uniformly # randomly flips the image horizontally. - ... + v2.RandomResize(28, 36), + v2.Pad(4), + v2.RandomCrop(32), + v2.RandomHorizontalFlip(), ]) def augmentation_fn(image: np.ndarray) -> torch.Tensor: - # TODO: First, convert the numpy `images` to a PyTorch tensor of uint8s, + # First, convert the numpy `images` to a PyTorch tensor of uint8s, # preferably by using `torch.from_numpy` or `torch.as_tensor` to avoid copying. # Then, because of the channels-position mismatch, permute the axes # in the image to change the order of the axes from HWC to CHW. # Next, apply the `transformation` to the image (by calling it with # the image as an argument), and finally permute the axes back to # the original order. - return ... 
+ + return transformation(torch.as_tensor(image).permute(2, 0, 1)).permute(1, 2, 0) + else: augmentation_fn = None - # TODO: Create `train` and `dev` instances of `TorchDataset` from the corresponding + # Create `train` and `dev` instances of `TorchDataset` from the corresponding # `cifar` datasets. Limit their sizes to 5_000 and 1_000 examples, respectively, # and use the `augmentation_fn` for the training dataset. - train = ... - dev = ... + train = TorchDataset(cifar.train, 5_000, augmentation_fn) + dev = TorchDataset(cifar.dev, 1_000) if args.show_images: from torch.utils import tensorboard @@ -114,10 +127,10 @@ def augmentation_fn(image: np.ndarray) -> torch.Tensor: tb_writer.close() print("Saved first {} training imaged to logs/{}".format(GRID * GRID, TAG)) - # TODO: Create `train` and `dev` instances of `torch.utils.data.DataLoader` from + # Create `train` and `dev` instances of `torch.utils.data.DataLoader` from # the datasets, using the given `args.batch_size` and shuffling the training dataset. - train = ... - dev = ... + train = torch.utils.data.DataLoader(train, args.batch_size, shuffle=True) + dev = torch.utils.data.DataLoader(dev, args.batch_size) # Train logs = model.fit(train, epochs=args.epochs, validation_data=dev) diff --git a/labs/05/cags_classification.py b/labs/05/cags_classification.py index 6381238..e12a9eb 100644 --- a/labs/05/cags_classification.py +++ b/labs/05/cags_classification.py @@ -3,7 +3,10 @@ import datetime import os import re -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras import numpy as np @@ -11,13 +14,19 @@ from cags_dataset import CAGS -# TODO: Define reasonable defaults and optionally more parameters. +# Define reasonable defaults and optionally more parameters. # Also, you can set the number of threads to 0 to use all your CPU cores. 
parser = argparse.ArgumentParser() -parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") -parser.add_argument("--epochs", default=None, type=int, help="Number of epochs.") +parser.add_argument("--learning_rate", default=0.0001, type=float, help="Initial learning rate.") +parser.add_argument( + "--weight_decay", default=1e-4, type=float, help="L2 regularization weight decay." +) +parser.add_argument("--batch_size", default=10, type=int, help="Batch size.") +parser.add_argument("--epochs", default=35, type=int, help="Number of epochs.") parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) class TorchTensorBoardCallback(keras.callbacks.Callback): @@ -28,7 +37,10 @@ def __init__(self, path): def writer(self, writer): if writer not in self._writers: import torch.utils.tensorboard - self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + + self._writers[writer] = torch.utils.tensorboard.SummaryWriter( + os.path.join(self._path, writer) + ) return self._writers[writer] def add_logs(self, writer, logs, step): @@ -39,10 +51,24 @@ def add_logs(self, writer, logs, step): def on_epoch_end(self, epoch, logs=None): if logs: - if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): - logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} - self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) - self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + if isinstance( + getattr(self.model, "optimizer", None), keras.optimizers.Optimizer + ): + logs = logs | { + "learning_rate": keras.ops.convert_to_numpy( + self.model.optimizer.learning_rate + ) + } + self.add_logs( +
"train", + {k: v for k, v in logs.items() if not k.startswith("val_")}, + epoch + 1, + ) + self.add_logs( + "val", + {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, + epoch + 1, + ) def main(args: argparse.Namespace) -> None: @@ -53,11 +79,19 @@ def main(args: argparse.Namespace) -> None: torch.set_num_interop_threads(args.threads) # Create logdir name - args.logdir = os.path.join("logs", "{}-{}-{}".format( - os.path.basename(globals().get("__file__", "notebook")), - datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), - ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) - )) + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) # Load the data. The individual examples are dictionaries with the keys: # - "image", a `[224, 224, 3]` tensor of `torch.uint8` values in [0-255] range, @@ -65,18 +99,70 @@ def main(args: argparse.Namespace) -> None: # - "label", a scalar of the correct class in `range(len(CAGS.LABELS))`. cags = CAGS() + train_images = np.array([entry["image"] for entry in cags.train]) + train_labels = np.array([entry["label"] for entry in cags.train]) + + dev_images = np.array([entry["image"] for entry in cags.dev]) + dev_labels = np.array([entry["label"] for entry in cags.dev]) + + test_images = np.array([entry["image"] for entry in cags.test]) + test_labels = np.array([entry["label"] for entry in cags.test]) + # Load the EfficientNetV2-B0 model. It assumes the input images are # represented in the [0-255] range. backbone = keras.applications.EfficientNetV2B0(include_top=False, pooling="avg") - # TODO: Create the model and train it - model = ... 
+ # Create the model and train it + model = keras.models.Sequential( + [ + backbone, + keras.layers.Dense(len(CAGS.LABELS), activation="softmax"), + ] + ) + + model.compile( + optimizer=keras.optimizers.Adam( + learning_rate=keras.optimizers.schedules.CosineDecay( + initial_learning_rate=args.learning_rate, + decay_steps=int(len(train_images) / args.batch_size * args.epochs), + ), + weight_decay=args.weight_decay, + ), + loss=keras.losses.SparseCategoricalCrossentropy(), + metrics=[keras.metrics.SparseCategoricalAccuracy(name="accuracy")], + ) + + model.summary() + + earlyStopping = keras.callbacks.EarlyStopping( + + + ) + + model.fit( + train_images, + train_labels, + batch_size=args.batch_size, + epochs=args.epochs, + validation_data=(dev_images, dev_labels), + callbacks=[ + keras.callbacks.TensorBoard(args.logdir, update_freq="epoch"), + TorchTensorBoardCallback(args.logdir), + earlyStopping, + ], + ) # Generate test set annotations, but in `args.logdir` to allow parallel execution. os.makedirs(args.logdir, exist_ok=True) - with open(os.path.join(args.logdir, "cags_classification.txt"), "w", encoding="utf-8") as predictions_file: - # TODO: Predict the probabilities on the test set - test_probabilities = model.predict(...)
+ with open( + os.path.join(args.logdir, "cags_classification.txt"), "w", encoding="utf-8" + ) as predictions_file: + # Predict the probabilities on the test set + test_probabilities = model.predict( + test_images, + batch_size=args.batch_size, + verbose=1, + ) for probs in test_probabilities: print(np.argmax(probs), file=predictions_file) diff --git a/labs/05/cags_dataset.py b/labs/05/cags_dataset.py index 782d19f..698a88c 100644 --- a/labs/05/cags_dataset.py +++ b/labs/05/cags_dataset.py @@ -56,7 +56,7 @@ def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torc return CAGS.TransformedDataset(self, transform) class TransformedDataset(torch.utils.data.Dataset): - def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None: + def __init__(self, dataset: "CAGS.Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None: self._dataset = dataset self._transform = transform @@ -66,6 +66,9 @@ def __len__(self) -> int: def __getitem__(self, index: int) -> Any: return self._transform(self._dataset[index]) + def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset: + return CAGS.TransformedDataset(self, transform) + def __init__(self) -> None: for dataset, size in [("train", 2_142), ("dev", 306), ("test", 612)]: path = "cags.{}.tfrecord".format(dataset) @@ -116,19 +119,16 @@ def get_value_of_kind(kind: int) -> int: get_value_of_kind(0x12) if data[offset] == 0x0A: - get_value_of_kind(0x0A) - length = get_value_of_kind(0x0A) + length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A) entries[-1][key] = np.frombuffer(data, np.uint8, length, offset).copy(); offset += length elif data[offset] == 0x1A: - get_value_of_kind(0x1A) - length = get_value_of_kind(0x0A) + length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A) values, target_offset = [], offset + length while offset < target_offset: values.append(get_value()) entries[-1][key] = np.array(values, 
dtype=np.int64) elif data[offset] == 0x12: - get_value_of_kind(0x12) - length = get_value_of_kind(0x0A) + length = get_value_of_kind(0x12) and get_value_of_kind(0x0A) entries[-1][key] = np.frombuffer( data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).copy(); offset += length else: diff --git a/labs/05/cags_segmentation.py b/labs/05/cags_segmentation.py index f81402b..e2ed25e 100644 --- a/labs/05/cags_segmentation.py +++ b/labs/05/cags_segmentation.py @@ -3,7 +3,10 @@ import datetime import os import re -os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise import keras import numpy as np @@ -11,13 +14,22 @@ from cags_dataset import CAGS -# TODO: Define reasonable defaults and optionally more parameters. +# Define reasonable defaults and optionally more parameters. # Also, you can set the number of threads to 0 to use all your CPU cores. parser = argparse.ArgumentParser() -parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") -parser.add_argument("--epochs", default=None, type=int, help="Number of epochs.") +parser.add_argument("--batch_size", default=64, type=int, help="Batch size.") +parser.add_argument("--epochs", default=50, type=int, help="Number of epochs.") parser.add_argument("--seed", default=42, type=int, help="Random seed.") -parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument( + "--threads", default=0, type=int, help="Maximum number of threads to use." +) + +parser.add_argument( + "--learning_rate", default=0.001, type=float, help="(Initial) Learning rate." +) +parser.add_argument( + "--final_learning_rate", default=0.0001, type=float, help="Final learning rate."
+) class TorchTensorBoardCallback(keras.callbacks.Callback): @@ -28,7 +40,10 @@ def __init__(self, path): def writer(self, writer): if writer not in self._writers: import torch.utils.tensorboard - self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + + self._writers[writer] = torch.utils.tensorboard.SummaryWriter( + os.path.join(self._path, writer) + ) return self._writers[writer] def add_logs(self, writer, logs, step): @@ -39,10 +54,24 @@ def add_logs(self, writer, logs, step): def on_epoch_end(self, epoch, logs=None): if logs: - if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): - logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} - self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) - self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + if isinstance( + getattr(self.model, "optimizer", None), keras.optimizers.Optimizer + ): + logs = logs | { + "learning_rate": keras.ops.convert_to_numpy( + self.model.optimizer.learning_rate + ) + } + self.add_logs( + "train", + {k: v for k, v in logs.items() if not k.startswith("val_")}, + epoch + 1, + ) + self.add_logs( + "val", + {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, + epoch + 1, + ) def main(args: argparse.Namespace) -> None: @@ -53,11 +82,19 @@ def main(args: argparse.Namespace) -> None: torch.set_num_interop_threads(args.threads) # Create logdir name - args.logdir = os.path.join("logs", "{}-{}-{}".format( - os.path.basename(globals().get("__file__", "notebook")), - datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), - ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) - )) + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + 
",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) # Load the data. The individual examples are dictionaries with the keys: # - "image", a `[224, 224, 3]` tensor of `torch.uint8` values in [0-255] range, @@ -65,27 +102,108 @@ def main(args: argparse.Namespace) -> None: # - "label", a scalar of the correct class in `range(len(CAGS.LABELS))`. cags = CAGS() + train_images = np.array([entry["image"] for entry in cags.train]) + train_masks = np.array([entry["mask"] for entry in cags.train]) + + dev_images = np.array([entry["image"] for entry in cags.dev]) + dev_masks = np.array([entry["mask"] for entry in cags.dev]) + + test_images = np.array([entry["image"] for entry in cags.test]) + test_masks = np.array([entry["mask"] for entry in cags.test]) + # Load the EfficientNetV2-B0 model. It assumes the input images are # represented in the [0-255] range. - backbone = keras.applications.EfficientNetV2B0(include_top=False) + backbone = keras.applications.EfficientNetV2B0(include_top=False, input_shape=[cags.H, cags.W, cags.C]) + backbone.trainable = False # Extract features of different resolution. Assuming 224x224 input images # (you can set this explicitly via `input_shape` of the above constructor), # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112. 
backbone = keras.Model( inputs=backbone.input, - outputs=[backbone.get_layer(layer).output for layer in [ - "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]] + outputs=[ + backbone.get_layer(layer).output + for layer in [ + "top_activation", + "block5e_add", + "block3b_add", + "block2b_add", + "block1a_project_activation", + ] + ], + ) + + def rlbn(inputs): + return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs)) + input_layer = keras.layers.Input(shape=[cags.H, cags.W, cags.C]) + + x7_top_act, x14_5e_add, x28_3b_add, x56_2b_add, x112_1a_project_act = backbone(input_layer) + + + + c1 = rlbn(keras.layers.Conv2D(x7_top_act.shape[-1], 3, 1, "same")(x7_top_act)) + c1 = rlbn(keras.layers.Conv2D(x7_top_act.shape[-1], 3, 1, "same")(c1)) + c1 = rlbn(keras.layers.Conv2DTranspose(x14_5e_add.shape[-1], 3, 2, "same")(c1)) #shape: 14, 14, 112 + + c2 = keras.layers.Concatenate()([c1, x14_5e_add]) # shape 14, 14, 224 + c2 = rlbn(keras.layers.Conv2D(x14_5e_add.shape[-1], 3, 1, "same")(c2)) # shape 14, 14, 112 + c2 = rlbn(keras.layers.Conv2D(c2.shape[-1], 3, 1, "same")(c2)) + c2 = rlbn(keras.layers.Conv2DTranspose(x28_3b_add.shape[-1], 3, 2, "same")(c2)) # filters = 40 shape: 28, 28, 40 + + c3 = keras.layers.Concatenate()([c2, x28_3b_add]) # shape 28, 28, 80 + c3 = rlbn(keras.layers.Conv2D(x28_3b_add.shape[-1], 3, 1, "same")(c3)) # shape 28, 28, 40 + c3 = rlbn(keras.layers.Conv2D(c3.shape[-1], 3, 1, "same")(c3)) + c3 = rlbn(keras.layers.Conv2DTranspose(x56_2b_add.shape[-1], 3, 2, "same")(c3)) # shape 56, 56, 24 + # layer 4 + c4 = keras.layers.Concatenate()([c3, x56_2b_add]) # shape 56, 56, 48 + c4 = rlbn(keras.layers.Conv2D(x56_2b_add.shape[-1], 3, 1, "same")(c4)) # shape 56, 56, 24 + c4 = rlbn(keras.layers.Conv2D(c4.shape[-1], 3, 1, "same")(c4)) + c4 = rlbn(keras.layers.Conv2DTranspose(x112_1a_project_act.shape[-1], 3, 2, "same")(c4)) # shape 112, 112, 16 + # layer 5 + c5 = keras.layers.Concatenate()([c4, 
x112_1a_project_act]) # shape 112, 112, 32 + c5 = rlbn(keras.layers.Conv2D(x112_1a_project_act.shape[-1], 3, 1, "same")(c5)) # shape 112, 112, 16 + c5 = rlbn(keras.layers.Conv2D(c5.shape[-1], 3, 1, "same")(c5)) + c5 = rlbn(keras.layers.Conv2DTranspose(input_layer.shape[-1], 3, 2, "same")(c5)) # shape 224, 224, 3 + # outputs + output_layer = keras.layers.Concatenate()([c5, input_layer]) # shape 224, 224, 6 + output_layer = rlbn(keras.layers.Conv2D(output_layer.shape[-1], 3, 1, "same")(output_layer)) # shape 224, 224, 6 + output_layer = rlbn(keras.layers.Conv2D(output_layer.shape[-1], 3, 1, "same")(output_layer)) + output_layer = keras.layers.Conv2D(1, 1, 1, "same", activation="sigmoid")(output_layer) + + + # Create the model and train it + model = keras.Model(input_layer, output_layer, name="cags_segmentation") + lr = args.learning_rate + if args.final_learning_rate: + steps = len(train_images)/args.batch_size*args.epochs + lr = keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=args.learning_rate, + decay_steps=steps, end_learning_rate=args.final_learning_rate) + + model.compile( + optimizer=keras.optimizers.Adam(learning_rate=lr), + loss=keras.losses.BinaryCrossentropy(), + metrics=[ + cags.MaskIoUMetric(name="iou"), + ], ) - # TODO: Create the model and train it - model = ... + model.summary() + + model.fit( + train_images, + train_masks, + epochs=args.epochs, + batch_size=args.batch_size, + validation_data=(dev_images, dev_masks), + ) # Generate test set annotations, but in `args.logdir` to allow parallel execution. os.makedirs(args.logdir, exist_ok=True) - with open(os.path.join(args.logdir, "cags_segmentation.txt"), "w", encoding="utf-8") as predictions_file: - # TODO: Predict the masks on the test set - test_masks = model.predict(...) 
+ with open( + os.path.join(args.logdir, "cags_segmentation.txt"), "w", encoding="utf-8" + ) as predictions_file: + # Predict the masks on the test set + test_masks = model.predict(test_images) for mask in test_masks: zeros, ones, runs = 0, 0, [] diff --git a/labs/06/bboxes_utils.py b/labs/06/bboxes_utils.py new file mode 100644 index 0000000..13f4dc8 --- /dev/null +++ b/labs/06/bboxes_utils.py @@ -0,0 +1,181 @@ +#!/usr/bin/env python3 +import argparse +from typing import Callable +import unittest + +import numpy as np + +# Bounding boxes and anchors are expected to be Numpy tensors, +# where the last dimension has size 4. + +# For bounding boxes in pixel coordinates, the 4 values correspond to: +TOP: int = 0 +LEFT: int = 1 +BOTTOM: int = 2 +RIGHT: int = 3 + + +def bboxes_area(bboxes: np.ndarray) -> np.ndarray: + """ Compute area of given set of bboxes. + + Each bbox is parametrized as a four-tuple (top, left, bottom, right). + + If the bboxes.shape is [..., 4], the output shape is bboxes.shape[:-1]. + """ + return np.maximum(bboxes[..., BOTTOM] - bboxes[..., TOP], 0) \ + * np.maximum(bboxes[..., RIGHT] - bboxes[..., LEFT], 0) + + +def bboxes_iou(xs: np.ndarray, ys: np.ndarray) -> np.ndarray: + """ Compute IoU of corresponding pairs from two sets of bboxes `xs` and `ys`. + + Each bbox is parametrized as a four-tuple (top, left, bottom, right). + + Note that broadcasting is supported, so passing inputs with + `xs.shape=[num_xs, 1, 4]` and `ys.shape=[1, num_ys, 4]` produces an output + with shape `[num_xs, num_ys]`, computing IoU for all pairs of bboxes from + `xs` and `ys`. Formally, the output shape is `np.broadcast(xs, ys).shape[:-1]`. 
+ """ + intersections = np.stack([ + np.maximum(xs[..., TOP], ys[..., TOP]), + np.maximum(xs[..., LEFT], ys[..., LEFT]), + np.minimum(xs[..., BOTTOM], ys[..., BOTTOM]), + np.minimum(xs[..., RIGHT], ys[..., RIGHT]), + ], axis=-1) + + xs_area, ys_area, intersections_area = bboxes_area(xs), bboxes_area(ys), bboxes_area(intersections) + + return intersections_area / (xs_area + ys_area - intersections_area) + + +def bboxes_to_rcnn(anchors: np.ndarray, bboxes: np.ndarray) -> np.ndarray: + """ Convert `bboxes` to a R-CNN-like representation relative to `anchors`. + + The `anchors` and `bboxes` are arrays of four-tuples (top, left, bottom, right); + you can use the TOP, LEFT, BOTTOM, RIGHT constants as indices of the + respective coordinates. + + The resulting representation of a single bbox is a four-tuple with: + - (bbox_y_center - anchor_y_center) / anchor_height + - (bbox_x_center - anchor_x_center) / anchor_width + - log(bbox_height / anchor_height) + - log(bbox_width / anchor_width) + + If the `anchors.shape` is `[anchors_len, 4]` and `bboxes.shape` is `[anchors_len, 4]`, + the output shape is `[anchors_len, 4]`. + """ + + # TODO: Implement according to the docstring. + raise NotImplementedError() + + +def bboxes_from_rcnn(anchors: np.ndarray, rcnns: np.ndarray) -> np.ndarray: + """ Convert R-CNN-like representation relative to `anchor` to a `bbox`. + + If the `anchors.shape` is `[anchors_len, 4]` and `rcnns.shape` is `[anchors_len, 4]`, + the output shape is `[anchors_len, 4]`. + """ + + # TODO: Implement according to the docstring. + raise NotImplementedError() + + +def bboxes_training( + anchors: np.ndarray, gold_classes: np.ndarray, gold_bboxes: np.ndarray, iou_threshold: float +) -> tuple[np.ndarray, np.ndarray]: + """ Compute training data for object detection. 
+ + Arguments: + - `anchors` is an array of four-tuples (top, left, bottom, right) + - `gold_classes` is an array of zero-based classes of the gold objects + - `gold_bboxes` is an array of four-tuples (top, left, bottom, right) + of the gold objects + - `iou_threshold` is a given threshold + + Returns: + - `anchor_classes` contains for every anchor either 0 for background + (if no gold object is assigned) or `1 + gold_class` if a gold object + with `gold_class` is assigned to it + - `anchor_bboxes` contains for every anchor a four-tuple + `(center_y, center_x, height, width)` representing the gold bbox of + a chosen object using parametrization of R-CNN; zeros if no gold object + was assigned to the anchor + If the `anchors` shape is `[anchors_len, 4]`, the `anchor_classes` shape + is `[anchors_len]` and the `anchor_bboxes` shape is `[anchors_len, 4]`. + + Algorithm: + - First, for each gold object, assign it to an anchor with the largest IoU + (the one with smaller index if there are several). In case several gold + objects are assigned to a single anchor, use the gold object with smaller + index. + - For each unused anchor, find the gold object with the largest IoU + (again the one with smaller index if there are several), and if the IoU + is >= iou_threshold, assign the object to the anchor. + """ + + # TODO: First, for each gold object, assign it to an anchor with the + # largest IoU (the one with smaller index if there are several). In case + # several gold objects are assigned to a single anchor, use the gold object + # with smaller index. + + # TODO: For each unused anchor, find the gold object with the largest IoU + # (again the one with smaller index if there are several), and if the IoU + # is >= threshold, assign the object to the anchor. + + anchor_classes, anchor_bboxes = ..., ... 
+ + return anchor_classes, anchor_bboxes + + +def main(args: argparse.Namespace) -> tuple[Callable, Callable, Callable]: + return bboxes_to_rcnn, bboxes_from_rcnn, bboxes_training + + +class Tests(unittest.TestCase): + def test_bboxes_to_from_rcnn(self): + data = [ + [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 0, 0]], + [[0, 0, 10, 10], [5, 0, 15, 10], [.5, 0, 0, 0]], + [[0, 0, 10, 10], [0, 5, 10, 15], [0, .5, 0, 0]], + [[0, 0, 10, 10], [0, 0, 20, 30], [.5, 1, np.log(2), np.log(3)]], + [[0, 9, 10, 19], [2, 10, 5, 16], [-0.15, -0.1, -1.20397, -0.51083]], + [[5, 3, 15, 13], [7, 7, 10, 9], [-0.15, 0, -1.20397, -1.60944]], + [[7, 6, 17, 16], [9, 10, 12, 13], [-0.15, 0.05, -1.20397, -1.20397]], + [[5, 6, 15, 16], [7, 7, 10, 10], [-0.15, -0.25, -1.20397, -1.20397]], + [[6, 3, 16, 13], [8, 5, 12, 8], [-0.1, -0.15, -0.91629, -1.20397]], + [[5, 2, 15, 12], [9, 6, 12, 8], [0.05, 0, -1.20397, -1.60944]], + [[2, 10, 12, 20], [6, 11, 8, 17], [0, -0.1, -1.60944, -0.51083]], + [[10, 9, 20, 19], [12, 13, 17, 16], [-0.05, 0.05, -0.69315, -1.20397]], + [[6, 7, 16, 17], [10, 11, 12, 14], [0, 0.05, -1.60944, -1.20397]], + [[2, 2, 12, 12], [3, 5, 8, 8], [-0.15, -0.05, -0.69315, -1.20397]], + ] + # First run on individual anchors, and then on all together + for anchors, bboxes, rcnns in [map(lambda x: [x], row) for row in data] + [zip(*data)]: + anchors, bboxes, rcnns = [np.array(data, np.float32) for data in [anchors, bboxes, rcnns]] + np.testing.assert_almost_equal(bboxes_to_rcnn(anchors, bboxes), rcnns, decimal=3) + np.testing.assert_almost_equal(bboxes_from_rcnn(anchors, rcnns), bboxes, decimal=3) + + def test_bboxes_training(self): + anchors = np.array([[0, 0, 10, 10], [0, 10, 10, 20], [10, 0, 20, 10], [10, 10, 20, 20]], np.float32) + for gold_classes, gold_bboxes, anchor_classes, anchor_bboxes, iou in [ + [[1], [[14., 14, 16, 16]], [0, 0, 0, 2], [[0, 0, 0, 0]] * 3 + [[0, 0, np.log(.2), np.log(.2)]], 0.5], + [[2], [[0., 0, 20, 20]], [3, 0, 0, 0], [[.5, .5, np.log(2), np.log(2)]] + [[0, 
0, 0, 0]] * 3, 0.26], + [[2], [[0., 0, 20, 20]], [3, 3, 3, 3], + [[y, x, np.log(2), np.log(2)] for y in [.5, -.5] for x in [.5, -.5]], 0.24], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 0, 1], + [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [-0.35, -0.45, 0.53062, 0.40546]], 0.5], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 2, 1], + [[0, 0, 0, 0], [0, 0, 0, 0], [-0.1, 0.6, -0.22314, 0.69314], [-0.35, -0.45, 0.53062, 0.40546]], 0.3], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 1, 2, 1], + [[0, 0, 0, 0], [0.65, -0.45, 0.53062, 0.40546], [-0.1, 0.6, -0.22314, 0.69314], + [-0.35, -0.45, 0.53062, 0.40546]], 0.17], + ]: + gold_classes, anchor_classes = np.array(gold_classes, np.int32), np.array(anchor_classes, np.int32) + gold_bboxes, anchor_bboxes = np.array(gold_bboxes, np.float32), np.array(anchor_bboxes, np.float32) + computed_classes, computed_bboxes = bboxes_training(anchors, gold_classes, gold_bboxes, iou) + np.testing.assert_almost_equal(computed_classes, anchor_classes, decimal=3) + np.testing.assert_almost_equal(computed_bboxes, anchor_bboxes, decimal=3) + + +if __name__ == '__main__': + unittest.main() diff --git a/labs/06/svhn_competition.py b/labs/06/svhn_competition.py new file mode 100644 index 0000000..ef3e6d0 --- /dev/null +++ b/labs/06/svhn_competition.py @@ -0,0 +1,101 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch + +import bboxes_utils +from svhn_dataset import SVHN + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores. 
+parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") +parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") + + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if logs: + if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): + logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} + self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) + self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + + +def main(args: argparse.Namespace) -> None: + # Set the random seed and the number of threads. + keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + # Load the data. 
The individual examples are dictionaries with the keys: + # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range, + # - "classes", a `[num_digits]` vector with classes of image digits, + # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits. + svhn = SVHN() + + # Load the EfficientNetV2-B0 model. It assumes the input images are + # represented in the [0-255] range. + backbone = keras.applications.EfficientNetV2B0(include_top=False) + + # Extract features of different resolution. Assuming 224x224 input images + # (you can set this explicitly via `input_shape` of the above constructor), + # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112. + backbone = keras.Model( + inputs=backbone.input, + outputs=[backbone.get_layer(layer).output for layer in [ + "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]] + ) + + # TODO: Create the model and train it + model = ... + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the digits and their bounding boxes on the test set. 
+ # Assume that for a single test image we get + # - `predicted_classes`: a 1D array with the predicted digits, + # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes; + for predicted_classes, predicted_bboxes in ...: + output = [] + for label, bbox in zip(predicted_classes, predicted_bboxes): + output += [label] + list(bbox) + print(*output, file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/06/svhn_dataset.py b/labs/06/svhn_dataset.py new file mode 100644 index 0000000..bd368b8 --- /dev/null +++ b/labs/06/svhn_dataset.py @@ -0,0 +1,238 @@ +import os +import sys +import struct +from typing import Any, Callable, Sequence, TextIO +import urllib.request +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch +import torchvision + + +class SVHN: + LABELS: int = 10 + + # Type alias for a bounding box -- a list of floats. + BBox = list[float] + + # The indices of the bounding box coordinates. 
+ TOP: int = 0 + LEFT: int = 1 + BOTTOM: int = 2 + RIGHT: int = 3 + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/" + + class Dataset(torch.utils.data.Dataset): + def __init__(self, path: str, size: int) -> None: + self._path = path + self._data = None + self._size = size + + def __len__(self) -> int: + return self._size + + def __getitem__(self, index: int) -> dict[str, torch.Tensor]: + if self._data is None: + self._data = [] + for entry in SVHN._load_data(self._path, self._size): + entry["image"] = torchvision.io.decode_image( + torch.from_numpy(entry["image"]), torchvision.io.ImageReadMode.RGB).permute(1, 2, 0) + entry["classes"] = np.asarray(entry["classes"], np.int64) + entry["bboxes"] = np.asarray(entry["bboxes"], np.int64).reshape(-1, 4) + self._data.append(entry) + return self._data[index] + + def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset: + return SVHN.TransformedDataset(self, transform) + + class TransformedDataset(torch.utils.data.Dataset): + def __init__(self, dataset: "SVHN.Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None: + self._dataset = dataset + self._transform = transform + + def __len__(self) -> int: + return self._dataset._size + + def __getitem__(self, index: int) -> Any: + return self._transform(self._dataset[index]) + + def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset: + return SVHN.TransformedDataset(self, transform) + + def __init__(self) -> None: + for dataset, size in [("train", 10_000), ("dev", 1_267), ("test", 4_535)]: + path = "svhn.{}.tfrecord".format(dataset) + if not os.path.exists(path): + print("Downloading file {}...".format(path), file=sys.stderr) + urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + setattr(self, dataset, self.Dataset(path, size)) + + train: Dataset + dev: Dataset + 
test: Dataset + + # TFRecord loading + @staticmethod + def _load_data(path: str, items: int) -> list[dict[str, Any]]: + def get_value() -> int: + nonlocal data, offset + value = np.int64(data[offset] & 0x7F); start = offset; offset += 1 + while data[offset - 1] & 0x80: + value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1 + return value + + def get_value_of_kind(kind: int) -> int: + nonlocal data, offset + assert data[offset] == kind; offset += 1 + return get_value() + + entries = [] + with open(path, "rb") as file: + while len(entries) < items: + entries.append({}) + + length = file.read(8); assert len(length) == 8 + length, = struct.unpack("> 2, offset).astype(np.float32).copy(); offset += length + else: + raise ValueError("Unsupported data tag {}".format(data[offset])) + return entries + + # Evaluation infrastructure. + @staticmethod + def evaluate( + gold_dataset: "SVHN.Dataset", predictions: Sequence[tuple[list[int], list[BBox]]], iou_threshold: float = 0.5, + ) -> float: + def bbox_iou(x: SVHN.BBox, y: SVHN.BBox) -> float: + def area(bbox: SVHN.BBox) -> float: + return max(bbox[SVHN.BOTTOM] - bbox[SVHN.TOP], 0) * max(bbox[SVHN.RIGHT] - bbox[SVHN.LEFT], 0) + intersection = [max(x[SVHN.TOP], y[SVHN.TOP]), max(x[SVHN.LEFT], y[SVHN.LEFT]), + min(x[SVHN.BOTTOM], y[SVHN.BOTTOM]), min(x[SVHN.RIGHT], y[SVHN.RIGHT])] + x_area, y_area, intersection_area = area(x), area(y), area(intersection) + return intersection_area / (x_area + y_area - intersection_area) + + gold = [(np.array(example["classes"]), np.array(example["bboxes"])) for example in gold_dataset] + + if len(predictions) != len(gold): + raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format( + len(predictions), len(gold))) + + correct = 0 + for (gold_classes, gold_bboxes), (prediction_classes, prediction_bboxes) in zip(gold, predictions): + if len(gold_classes) != len(prediction_classes): + continue + + used = [False] * len(gold_classes) + for cls, bbox in 
zip(prediction_classes, prediction_bboxes): + best = None + for i in range(len(gold_classes)): + if used[i] or gold_classes[i] != cls: + continue + iou = bbox_iou(bbox, gold_bboxes[i]) + if iou >= iou_threshold and (best is None or iou > best_iou): + best, best_iou = i, iou + if best is None: + break + used[best] = True + correct += all(used) + + return 100 * correct / len(gold) + + @staticmethod + def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float: + predictions = [] + for line in predictions_file: + values = line.split() + if len(values) % 5: + raise RuntimeError("Each prediction must contain multiple of 5 numbers, found {}".format(len(values))) + + predictions.append(([], [])) + for i in range(0, len(values), 5): + predictions[-1][0].append(int(values[i])) + predictions[-1][1].append([float(value) for value in values[i + 1:i + 5]]) + + return SVHN.evaluate(gold_dataset, predictions) + + # Visualization infrastructure. + @staticmethod + def visualize(image: np.ndarray, labels: list[Any], bboxes: list[BBox], show: bool): + """Visualize the given image plus recognized objects. + + Arguments: + - `image` is NumPy input image with pixels in range [0-255]; + - `labels` is a list of labels to be shown using the `str` method; + - `bboxes` is a list of `BBox`es (fourtuples TOP, LEFT, BOTTOM, RIGHT); + - `show` controls whether to show the figure or return it: + - if `True`, the figure is shown using `plt.show()`; + - if `False`, the `plt.Figure` instance is returned; it can be saved + to TensorBoard using a the `add_figure` method of a `SummaryWriter`. 
+ """ + import matplotlib.pyplot as plt + + figure = plt.figure(figsize=(4, 4)) + plt.axis("off") + plt.imshow(np.asarray(image, np.uint8)) + for label, (top, left, bottom, right) in zip(labels, bboxes): + plt.gca().add_patch(plt.Rectangle( + [left, top], right - left, bottom - top, fill=False, edgecolor=[1, 0, 1], linewidth=2)) + plt.gca().text(left, top, str(label), bbox={"facecolor": [1, 0, 1], "alpha": 0.5}, + clip_box=plt.gca().clipbox, clip_on=True, ha="left", va="top") + + if show: + plt.show() + else: + return figure + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--visualize", default=None, type=str, help="Prediction file to visualize") + parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + args = parser.parse_args() + + if args.evaluate: + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + accuracy = SVHN.evaluate_file(getattr(SVHN(), args.dataset), predictions_file) + print("SVHN accuracy: {:.2f}%".format(accuracy)) + + if args.visualize: + with open(args.visualize, "r", encoding="utf-8-sig") as predictions_file: + for line, example in zip(predictions_file, getattr(SVHN(), args.dataset)): + values = line.split() + classes, bboxes = [], [] + for i in range(0, len(values), 5): + classes.append(values[i]) + bboxes.append([float(value) for value in values[i + 1:i + 5]]) + SVHN.visualize(example["image"], classes, bboxes, show=True) diff --git a/labs/team_description.py b/labs/team_description.py index 14ed5e1..1d232bc 100644 --- a/labs/team_description.py +++ b/labs/team_description.py @@ -6,4 +6,7 @@ # # You can find out ReCodEx ID in the URL bar after navigating # to your User profile page. The ID has the following format: -# 01234567-89ab-cdef-0123-456789abcdef. 
+# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 diff --git a/lectures/lecture06.md b/lectures/lecture06.md new file mode 100644 index 0000000..3631cee --- /dev/null +++ b/lectures/lecture06.md @@ -0,0 +1,20 @@ +### Lecture: 6. Object Detection +#### Date: Mar 25 +#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?06 +#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-06.pdf, PDF Slides +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.mp4, CZ Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.practicals.mp4, CZ Practicals +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.mp4, EN Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.practicals.mp4, EN Practicals +#### Questions: #lecture_6_questions +#### Lecture assignment: bboxes_utils +#### Lecture assignment: svhn_competition + +- R-CNN [[R-CNN](https://arxiv.org/abs/1311.2524)] +- Fast R-CNN [[Fast R-CNN](https://arxiv.org/abs/1504.08083)] +- Proposing RoIs using Faster R-CNN [[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)] +- Mask R-CNN [[Mask R-CNN](https://arxiv.org/abs/1703.06870)] +- Feature Pyramid Networks [[Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)] +- Focal Loss, RetinaNet [[Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)] +- _EfficientDet [[EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)]_ +- Group Normalization [[Group Normalization](https://arxiv.org/abs/1803.08494)] diff --git a/lectures/lecture07.md b/lectures/lecture07.md new file mode 100644 index 0000000..e53fb70 --- /dev/null +++ b/lectures/lecture07.md @@ -0,0 +1,2 @@ 
+### Lecture: 7. Easter Monday +#### Date: Apr 01 diff --git a/pull.ps1 b/pull.ps1 new file mode 100644 index 0000000..9cadfe4 --- /dev/null +++ b/pull.ps1 @@ -0,0 +1 @@ +git pull upstream master diff --git a/setup.ps1 b/setup.ps1 new file mode 100644 index 0000000..f1f7bbe --- /dev/null +++ b/setup.ps1 @@ -0,0 +1,6 @@ +git remote rename origin upstream +git remote add origin git@github.com:joglr/npfl138.git +git fetch +git checkout master +python -m venv .venv +.venv/Scripts/pip install -r .\labs\requirements.txt diff --git a/slides/06/06.md b/slides/06/06.md new file mode 100644 index 0000000..730d29e --- /dev/null +++ b/slides/06/06.md @@ -0,0 +1,711 @@ +title: NPFL138, Lecture 6 +class: title, langtech, cc-by-sa + +# Object Detection + +## Milan Straka + +### March 25, 2024 + +--- +section: FastR-CNN +class: middle, center +# Beyond Image Classification + +# Beyond Image Classification + +--- +# Beyond Image Classification + +![w=70%,f=right](../01/object_detection.svgz) + +- Object detection (including location) +
+ +~~~ +![w=70%,f=right](../01/image_segmentation.svgz) + +- Image segmentation +
+ +~~~ +![w=70%,f=right](../01/human_pose_estimation.jpg) + +- Human pose estimation + +--- +# Beyond Image Classification + +![w=100%,v=middle](cv_tasks.jpg) + +--- +# Object Localization + +![w=100%](object_localization.png) + +We can perform object localization by jointly predicting the bounding box +coordinates using regression. + +--- +# R-CNN + +![w=42%,f=right](roi_generation.jpg) + +To be able to recognize and localize _several_ objects, assume we were given +multiple interesting regions of the image, called **regions of interest** (RoI). +For each of them, we decide: +- whether it contains an object; +- the location of the object relative to the RoI. + +~~~ +![w=45%,f=right](rcnn_architecture.svgz) + +In R-CNN, we start with a network pre-trained on ImageNet (VGG-16 is used in the +original paper), and we use it to process _every RoI_, rescaling every one of +them to the size of $224×224$. + +~~~ +For every RoI, two sibling heads are added: +- _classification head_ predicts either _background_ or one of $K$ object types + ($K+1$ in total), +~~~ +- _bounding box regression head_ predicts 4 bounding box parameters relative + to RoI. + +--- +# R-CNN – Bounding Boxes + +A bounding box is parametrized as follows. Let $x_r, y_r, w_r, h_r$ be +center coordinates and width and height of the RoI respectively, and let $x, y, w, h$ be +parameters of the bounding box. We represent the bounding box relative +to the RoI as follows: +$$\begin{aligned} +t_x &= (x - x_r)/w_r, & t_y &= (y - y_r)/h_r, \\ +t_w &= \log (w/w_r), & t_h &= \log (h/h_r). +\end{aligned}$$ + +~~~ +In Fast R-CNN, the $\textrm{smooth}_{L_1}$ loss, or **Huber loss**, is employed for bounding box parameters: + +![w=19.5%,f=right](huber_loss.svgz) + +$$\textrm{smooth}_{L_1}(x) = \begin{cases} + 0.5x^2 & \textrm{if }|x| < 1, \\ + |x| - 0.5 & \textrm{otherwise}. 
+\end{cases}$$ + +~~~ +The complete loss is then ($λ=1$ is used in the Fast R-CNN paper) +$$L(ĉ, t̂, c, t) = L_\textrm{cls}(ĉ, c) + λ ⋅ [c ≥ 1] ⋅ + ∑\nolimits_{i ∈ \lbrace \mathrm{x, y, w, h}\rbrace} \textrm{smooth}_{L_1}(t̂_i - t_i).$$ + +--- +# R-CNN – Bounding Boxes + +The described bounding box representation is usually called `CXCYWH`: + +![w=60%,h=center](bbox_representation_cxcywh.webp) + +--- +# R-CNN – Bounding Boxes + +In the datasets, the bounding boxes are usually represented using `XYXY` format: + +![w=60%,h=center](bbox_representation_xyxy.webp) + +--- +# R-CNN – Bounding Boxes + +Finally, you could also come across the `XYWH` format: + +![w=60%,h=center](bbox_representation_xywh.webp) + +--- +# Fast R-CNN Architecture + +The R-CNN is slow, because it needs to process every RoI by the convolutional +backbone. To speed it up, we might want to first process the whole image by the +backbone and only then extract a fixed-size representation for every RoI. + +~~~ + +We achieve that using **RoI pooling**, replacing the last max-pool $14×14 → 7×7$ +VGG layer. + +![w=50%](roi_projection.svgz)![w=50%,mw=50%,h=center](roi_pooling.svgz) + +During RoI pooling, we obtain a $7×7$ RoI representation by first projecting the +RoI to the $14×14$ resolution and then computing each of the $7×7$ values by +**max-pooling** the corresponding “pixels” of the convolutional image features. + +--- +# Fast R-CNN + +![w=85%,h=center](fast_rcnn_rumcajs.svgz) + +~~~ +![w=85%,h=center](fast_rcnn_vgg.png) + +--- +# Fast R-CNN and R-CNN Comparison + +![w=100%](fast_rcnn_architecture.svgz) + +--- +# Fast R-CNN Architecture + +![w=100%,v=middle](fast_rcnn.jpg) + +--- +# Fast R-CNN Training and Inference + +## Intersection over Union +For two bounding boxes (or two masks) the _intersection over union_ (_IoU_) +is a ratio of the intersection of the boxes (or masks) and the union +of the boxes (or masks). 
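The IoU definition above can be sketched in a few lines of plain Python. This is only an illustrative sketch: it assumes boxes are given as `[top, left, bottom, right]` (the order used by the `SVHN` helper in this diff), and the function names `bbox_area` and `bbox_iou` are chosen here for illustration.

```python
def bbox_area(bbox):
    """Area of a [top, left, bottom, right] box; degenerate boxes have area 0."""
    top, left, bottom, right = bbox
    return max(bottom - top, 0) * max(right - left, 0)


def bbox_iou(x, y):
    """Intersection over union of two [top, left, bottom, right] boxes."""
    # The intersection is again a box; if the inputs are disjoint, it is
    # degenerate (bottom < top or right < left) and its area is 0.
    intersection = [max(x[0], y[0]), max(x[1], y[1]), min(x[2], y[2]), min(x[3], y[3])]
    intersection_area = bbox_area(intersection)
    union_area = bbox_area(x) + bbox_area(y) - intersection_area
    # Guard against two empty boxes (an assumption of this sketch).
    return intersection_area / union_area if union_area > 0 else 0.0
```

For two $2×2$ boxes shifted by one pixel in both directions, the intersection has area 1 and the union area 7, so the IoU is $1/7$.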
+ +~~~ +## Choosing RoIs for Training +During training, we use 2 images with 64 RoIs each. The RoIs are selected +so that 25% have intersection over union (IoU) overlap with ground-truth +boxes at least 0.5; the others are chosen to have the IoU in range $[0.1, 0.5)$, +the so-called _hard examples_. + +~~~ +## Running Inference +During inference, we utilize all RoIs, but a single object can be found in +several of them. To choose the most salient prediction, we perform **non-maximum +suppression** – we ignore predictions which have an overlap with a higher +scoring prediction of the _same class_, where the overlap is computed using IoU +(0.3 threshold is used in the paper). Higher-scoring predictions are the ones +with higher probability from the _classification head_. + +--- +# Object Detection Evaluation + +## Average Precision +Evaluation is performed using _Average Precision_ ($\mathit{AP}$ or $\mathit{AP}_{50}$). + +We assume all bounding boxes (or masks) produced by a system have confidence +values which can be used to rank them. Then, for a single class, we take the +boxes (or masks) in the order of the ranks and generate a precision/recall curve, +considering a bounding box correct if it has IoU at least 50% with any +ground-truth box. + +![w=60%,mw=50%,h=center](precision_recall_person.svgz)![w=60%,mw=50%,h=center](precision_recall_bottle.svgz) + +--- +# Object Detection Evaluation – Average Precision + +The general idea of AP is to compute the area under the precision/recall curve. + +![w=80%,mw=49%,h=center](precision_recall_curve.png) + +~~~ +![w=80%,mw=49%,h=center](precision_recall_curve_interpolated.jpg) + +We start by interpolating the precision/recall curve, so that it is always +nonincreasing. + +~~~ +![w=80%,mw=49%,h=center,f=right](average_precision.jpg) + +Finally, the average precision for a single class is an average of precision at +recall $0.0, 0.1, 0.2, …, 1.0$. + +~~~ +The final AP is a mean of average precision of all classes.
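The per-class computation described above (rank by confidence, build the precision/recall curve, interpolate it to be nonincreasing, average at 11 recall levels) might be sketched as follows. The function name `average_precision` and its input format, a list of `(confidence, is_correct)` pairs plus the number of ground-truth objects, are assumptions made for this illustration; deciding whether a prediction is correct (IoU at least 50%) is assumed to have happened beforehand.

```python
def average_precision(predictions, num_gold, recall_points=11):
    """Interpolated AP for a single class.

    `predictions` is a list of (confidence, is_correct) pairs and `num_gold`
    is the number of ground-truth objects of the class.
    """
    # Rank the predictions from the most to the least confident.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)

    # Precision and recall after each prefix of the ranked predictions.
    precisions, recalls, true_positives = [], [], 0
    for rank, (_, correct) in enumerate(ranked, start=1):
        true_positives += bool(correct)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gold)

    # Interpolation: make the precision nonincreasing by replacing each value
    # with the maximum precision achieved at the same or a higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Average the interpolated precision at recall 0.0, 0.1, ..., 1.0;
    # recall levels that are never reached contribute precision 0.
    total = 0.0
    for k in range(recall_points):
        level = k / (recall_points - 1)
        total += next((p for p, r in zip(precisions, recalls) if r >= level), 0.0)
    return total / recall_points
```

Passing `recall_points=101` gives the 101-point variant; the final AP would then be the mean of the per-class values.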
+ +--- +class: tablewide +style: table {line-height: 1} +# Object Detection Evaluation – Average Precision + +For the COCO dataset, the AP is computed slightly differently. First, it is an +average over 101 recall points $0.00, 0.01, 0.02, …, 1.00$. + +~~~ +In the original metric, IoU of 50% is enough to consider a prediction valid. +We can generalize the definition to $\mathit{AP}_{t}$, where an object +prediction is considered valid if IoU is at least $t$%. + +~~~ +The main COCO metric, denoted just $\mathit{AP}$, is the mean of +$\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, …, \mathit{AP}_{95}$. + +~~~ +| Metric | Description | +|:------:|:------------| +| $\mathit{AP}$ | Mean of $\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, \mathit{AP}_{65}, …, \mathit{AP}_{95}$ | +| $\mathit{AP}_{50}$ | AP at IoU 50% | +| $\mathit{AP}_{75}$ | AP at IoU 75% | +~~~ +| $\mathit{AP}_{S}$ | AP for small objects: $\textit{area} < 32^2$ | +| $\mathit{AP}_{M}$ | AP for medium objects: $32^2 < \textit{area} < 96^2$ | +| $\mathit{AP}_{L}$ | AP for large objects: $96^2 < \textit{area}$ | + + +--- +section: FasterR-CNN +# Faster R-CNN + +![w=40%,f=right](fast_rcnn_speed.svgz) + +Even if Fast R-CNN is much faster than R-CNN, it can still be improved, +considering that the most problematic and time-consuming part is generating the RoIs. +
+ +~~~ +![w=30%,f=right](faster_rcnn_architecture.png) + +Faster R-CNN extends Fast R-CNN by including a **region proposal +network (RPN)**, whose goal is to generate the RoIs automatically. + +~~~ +The region proposal network produces the so-called **region proposals**, +which then play the role of RoIs in the rest of the pipeline (i.e., +the Fast R-CNN). + +~~~ +The region proposals are generated similarly to how predictions are generated +in Fast R-CNN. We start with several **anchors** and from each anchor +we generate either a single region proposal or nothing. + +--- +# Faster R-CNN – Anchors + +If we consider the $14×14$ VGG backbone output, each “pixel” corresponds +to a region of size $16×16$ in the original image. + +![w=45%,h=center](anchor_net.svgz) + +~~~ +We can therefore interpret each value in the $14×14$ output as a representation +of a part of the image _centered_ in the corresponding image region, and try +predicting a region proposal from **every one** of them. + +~~~ +We call the dense grid of image regions from which we are predicting the +proposals the **anchors**. They have fixed size, and in practice we use +_several_ anchors per position. + +--- +# Faster R-CNN + +For every anchor, we classify it into two classes (background, object) +and also predict the region proposal bounding box relative to the anchor, +exactly as in (Fast) R-CNN. + +~~~ +![w=58%,f=right](faster_rcnn_rpn.svgz) + +We perform the classification and the bounding box regression by first +running a $3×3$ convolution followed by ReLU on the $14×14$ VGG output, +and then attaching the two heads. +~~~ +Assuming there are $A$ anchors on every position: +- the classification head generates $2A$ outputs, performing $\softmax$ on every + 2 of them; +- the regression head generates $4A$ region proposal coordinates. + +~~~ +The authors consider 3 scales $(128^2, 256^2, 512^2)$ and 3 aspect ratios +$(1:1, 1:2, 2:1)$.
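To make the anchor grid concrete, the following sketch enumerates $A = 9$ anchors (3 scales times 3 aspect ratios) for every position of a backbone output with stride 16. It is only an illustration under stated assumptions: the function name `generate_anchors`, the `[top, left, bottom, right]` output order, placing anchor centers in the middle of each $16×16$ region, and interpreting an aspect ratio as height divided by width are all choices made here, not details taken from the paper.

```python
import math


def generate_anchors(rows, cols, stride=16, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return [top, left, bottom, right] anchors for a rows x cols feature map.

    Every anchor with scale `s` has area s^2; `ratio` is assumed to be
    height / width, so the ratios 1.0, 0.5, 2.0 stand for 1:1, 1:2, 2:1.
    """
    anchors = []
    for row in range(rows):
        for col in range(cols):
            # Center of the image region corresponding to this "pixel".
            center_y, center_x = (row + 0.5) * stride, (col + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    height = scale * math.sqrt(ratio)
                    width = scale / math.sqrt(ratio)
                    anchors.append([center_y - height / 2, center_x - width / 2,
                                    center_y + height / 2, center_x + width / 2])
    return anchors
```

For the $14×14$ output this yields $14 ⋅ 14 ⋅ 9 = 1764$ anchors, many of which extend beyond the image boundary.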
+ +--- +# Faster R-CNN + +During training, we generate +- positive training examples for every anchor that has the highest IoU with + a ground-truth box; +~~~ +- furthermore, a positive example is also any anchor with + IoU at least 0.7 for any ground-truth box; +~~~ +- negative training examples for every anchor that has IoU at most 0.3 with all + ground-truth boxes; +~~~ +- the positive and negative examples are generated with a ratio _up to_ 1:1 + (less, if there are not enough positive examples; each minibatch consists of + a single image and 256 anchors). + +~~~ +During inference, we consider all predicted non-background regions, run +non-maximum suppression on them using a 0.7 IoU threshold, and then take $N$ +top-scored regions (i.e., the ones with the highest probability from the +classification head) – the paper uses 300 proposals, compared to 2000 in the Fast +R-CNN. + +--- +# Faster R-CNN + +![w=94%,h=center](faster_rcnn_performance.svgz) + +--- +# Two-stage Detectors + +The Faster R-CNN is a so-called **two-stage** detector, where the regions are +refined twice – once in the region proposal network, and then in the final +bounding box regressor. + +~~~ +Several **single-stage** detector architectures have been proposed, mainly +because they are faster and smaller, but until circa 2017 the two-stage +detectors achieved better results. + +--- +section: MaskR-CNN +# Mask R-CNN + +Straightforward extension of Faster R-CNN able to produce image segmentation +(i.e., masks for every object). + +![w=100%,mh=80%,v=middle](../01/image_segmentation.svgz) + +--- +# Mask R-CNN – Architecture + +![w=100%,v=middle](mask_rcnn_architecture.png) + +--- +# Mask R-CNN – RoIAlign + +More precise alignment is required for the RoI in order to predict the masks. +Instead of quantization and max-pooling in RoI pooling, **RoIAlign** uses bilinear +interpolation of features at four regularly sampled locations in each RoI bin +and averages them.
+ +![w=68%,mw=50%,h=center](roi_pooling.svgz)![w=68%,mw=50%,h=center](mask_rcnn_roialign.svgz) + +~~~ +TorchVision provides `torchvision.ops.roi_align` and `torchvision.ops.roi_pool`. + +--- +# Mask R-CNN + +Masks are predicted in a third branch of the object detector. + +- Higher resolution of the mask is usually needed (at least $14×14$, or even more). +- The masks are predicted for each class separately. +- The masks are predicted using convolutions instead of fully connected layers + (the upscaling convolutions are $2×2$ with stride 2). + +![w=79%,h=center](mask_rcnn_heads.svgz) + +~~~ +Improvements from Nov 2021: all convs (except for the output layer) are followed +by BN, the _class&bbox_ head uses 4 convs instead of 2 MLPs, RPN contains +two convs instead of one. + +--- +# Mask R-CNN + +![w=100%,v=middle](mask_rcnn_ablation.svgz) + +--- +# Mask R-CNN – Human Pose Estimation + +![w=80%,h=center](../01/human_pose_estimation.jpg) + +~~~ +- Testing applicability of Mask R-CNN architecture. + +- Keypoints (e.g., left shoulder, right elbow, …) are detected + as independent one-hot masks of size $56×56$ with $\softmax$ output function. + +~~~ +![w=70%,h=center](mask_rcnn_hpe_performance.svgz) + +--- +section: FPN +# Feature Pyramid Networks + +![w=85%,h=center](fpn_overview.svgz) + +--- +# Feature Pyramid Networks + +![w=62%,h=center](fpn_architecture.svgz) + +--- +# Feature Pyramid Networks + +![w=56%,h=center](fpn_architecture_detailed.svgz) + +--- +# Feature Pyramid Networks + +We employ FPN as a backbone in Faster R-CNN. + +~~~ +Assuming ResNet-like network with $224×224$ input, we denote $C_2, C_3, …, C_5$ +the image features of the last convolutional layer of size $56×56, 28×28, …, +7×7$ (i.e., $C_i$ indicates a downscaling of $2^i$). +~~~ +The FPN representations incorporating the smaller resolution features are +denoted as $P_2, …, P_5$, each consisting of 256 channels; the classification +heads are shared. 
+ +~~~ +In both the RPN and the Fast R-CNN, authors utilize the $P_2, …, P_5$ +representations, considering single-size anchors for every $P_i$ (of size +$32^2, 64^2, 128^2, 256^2$, respectively). However, three aspect ratios +$(1:1, 1:2, 2:1)$ are still used. + +~~~ +![w=100%](fpn_results.svgz) + +--- +section: FocalLoss +# Focal Loss + +![w=46%,f=right](fast_rcnn_rumcajs.svgz) + +For single-stage object detection architectures, _class imbalance_ has been +identified as the main issue preventing obtaining performance comparable to +two-stage detectors. In a single-stage detector, there can be tens of thousands +of anchors, with only dozens of useful training examples. + +~~~ +![w=46%,f=right](focal_loss_graph.svgz) + +Cross-entropy loss is computed as +$$𝓛_\textrm{cross-entropy} = -\log p_\textrm{model}(y | x).$$ + +~~~ +Focal-loss (loss focused on hard examples) is proposed as +$$𝓛_\textrm{focal-loss} = -(1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$ + +--- +# Focal Loss + +For $γ=0$, focal loss is equal to cross-entropy loss. + +~~~ +Authors reported that $γ=2$ worked best for them for training a single-stage +detector. + +~~~ +![w=100%,mh=75%,v=bottom](focal_loss_cdf.svgz) + +--- +# Focal Loss and Class Imbalance + +Focal loss is connected to another solution to class imbalance – we might +introduce a weighting factor $α ∈ (0, 1)$ for one class and $1 - α$ for the other +class, arriving at +$$ -α_y ⋅ \log p_\textrm{model}(y | x).$$ + +~~~ +The weight $α$ might be set to the inverse class frequency or treated as +a hyperparameter. + +~~~ +Even if weighting focuses more on the low-frequency class, it does not distinguish +between easy and hard examples, contrary to focal loss. + +~~~ +In practice, the focal loss is usually used together with class weighting: +$$ -α_y ⋅ (1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$ +For example, authors report that $α=0.25$ (weight of the rare class) works best with $γ=2$.
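The class-weighted focal loss above can be written down directly for a single binary example. The following is a minimal sketch; the function name `focal_loss` and its arguments are chosen here for illustration, with `p_correct` standing for $p_\textrm{model}(y | x)$ and `is_rare_class` selecting the weight $α$ (versus $1 - α$ for the other class).

```python
import math


def focal_loss(p_correct, is_rare_class, alpha=0.25, gamma=2.0):
    """Class-weighted focal loss of one example.

    `p_correct` is the probability the model assigns to the gold class;
    the rare class gets weight `alpha`, the other class `1 - alpha`.
    """
    weight = alpha if is_rare_class else 1 - alpha
    # The factor (1 - p)^gamma down-weights easy examples (p close to 1).
    return -weight * (1 - p_correct) ** gamma * math.log(p_correct)
```

With `gamma=0` the modulating factor is 1 and the expression reduces to class-weighted cross-entropy. With the default values, an easy example with `p_correct=0.9` is down-weighted by $(1 - 0.9)^2 = 0.01$ relative to plain weighted cross-entropy.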
+ +--- +section: RetinaNet +# RetinaNet + +RetinaNet is a single-stage detector, using feature pyramid network +architecture. Built on top of ResNet architecture, the feature pyramid +contains levels $P_3$ through $P_7$, with each $P_l$ having 256 channels +and resolution $2^l$ times lower than the input. On each pyramid level $P_l$, +we consider 9 anchors for every position, with 3 different aspect ratios ($1$, $1:2$, $2:1$) +and with 3 different sizes $(\{2^0, 2^{1/3}, 2^{2/3}\} ⋅ 4 ⋅ 2^l)^2$. + +~~~ +Note that ResNet provides only $C_3$ to $C_5$ features. $C_6$ is computed +using a $3×3$ convolution with stride 2 on $C_5$, and $C_7$ is obtained +by applying ReLU followed by another $3×3$ stride-2 convolution. The $C_6$ and +$C_7$ are included to improve large object detection. + +--- +# RetinaNet – Architecture + +The classification head and the boundary regression heads are fully +convolutional and do not share parameters (but classification heads are shared +across levels, and so are the boundary regression heads), generating +$\mathit{anchors} ⋅ \mathit{classes}$ sigmoids and $\mathit{anchors}$ bounding +boxes per position. + +![w=100%](retinanet.svgz) + +--- +# RetinaNet + +During training, anchors are assigned to ground-truth object boxes if IoU is at +least 0.5; to background if IoU with any ground-truth region is at most 0.4 +(the rest of anchors is ignored during training). +~~~ +The classification head is trained using focal loss with $γ=2$ and $α=0.25$ (but +according to the paper, all values of $γ$ in $[0.5, 5]$ range work well); the +boundary regression head is trained using $\textrm{smooth}_{L_1}$ loss as in +Fast(er) R-CNN. + +~~~ +During inference, at most 1000 objects with at least 5% probability from all +pyramid levels are considered, and all of them are combined using non-maximum +suppression with a threshold of 0.5. Fixed-size training and testing is used, +with sizes 400, 500, …, 800 pixels. 
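The non-maximum suppression step used during inference might be sketched as below. This is the common greedy variant, given here only as an illustration: the function names, the `[top, left, bottom, right]` box order, and the choice to suppress only when the IoU strictly exceeds the threshold are assumptions of this sketch, and the score filtering (at least 5% probability, at most 1000 objects) is assumed to happen before the call.

```python
def iou(x, y):
    """IoU of two [top, left, bottom, right] boxes."""
    def area(b):
        return max(b[2] - b[0], 0) * max(b[3] - b[1], 0)
    inter = area([max(x[0], y[0]), max(x[1], y[1]), min(x[2], y[2]), min(x[3], y[3])])
    union = area(x) + area(y) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, classes, iou_threshold=0.5):
    """Return indices of kept predictions, from the highest score down."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a prediction unless it overlaps too much with an already kept
        # (and therefore higher-scoring) prediction of the same class.
        if all(classes[i] != classes[j] or iou(boxes[i], boxes[j]) <= iou_threshold
               for j in kept):
            kept.append(i)
    return kept
```

In practice, a batched implementation such as `torchvision.ops.batched_nms` would typically be used instead of this quadratic loop.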
+ +~~~ +![w=68%](retinanet_results.svgz)![w=32%](retinanet_graph.svgz) + +--- +# RetinaNet – Ablations + +Ablations use ResNet-50-FPN backbone trained and tested with 600-pixel images. + +![w=80%,h=center](retinanet_ablations.svgz) + +--- +section: EfficientDet +# EfficientDet – Architecture + +EfficientDet builds upon EfficientNet and delivered state-of-the-art performance +in Nov 2019 with minimum time and space requirements (however, its performance +has already been surpassed significantly). It is a single-scale detector similar +to RetinaNet, which: + +~~~ +- uses EfficientNet as a backbone; +~~~ +- employs compound scaling; +~~~ +- uses a newly proposed BiFPN, “efficient bidirectional cross-scale connections + and weighted feature fusion”. + +~~~ +![w=78%,h=center](efficientdet_architecture.svgz) + +--- +# EfficientDet – BiFPN + +In multi-scale fusion in FPN, information flows only from the pyramid levels +with smaller resolution to the levels with higher resolution. + +![w=80%,h=center](efficientdet_bifpn.svgz) + +~~~ +BiFPN consists of several rounds of bidirectional flows. Each bidirectional flow +employs residual connections and does not include nodes that have only one input +edge with no feature fusion. All operations are $3×3$ separable convolutions with +batch normalization and ReLU, upsampling is done by repeating rows and columns +and downsampling by max-pooling. + +--- +# EfficientDet – Weighted BiFPN + +When combining features with different resolutions, it is common to resize them +to the same resolution and sum them – therefore, all sets of features are +considered to be of the same importance. The authors however argue that features +from different resolutions contribute to the final result _unequally_ and propose +to combine them with trainable weights.
+ +~~~ +- **Softmax-based fusion**: In each BiFPN node, we create a trainable weight + $w_i$ for every input $⇶I_i$ and the final combination (after resize, before + a convolution) is + $$∑_i \frac{e^{w_i}}{∑\nolimits_j e^{w_j}} ⇶I_i.$$ + +~~~ +- **Fast normalized fusion**: Authors propose a simpler alternative of + weighting: + $$∑_i \frac{\ReLU(w_i)}{ε + ∑\nolimits_j \ReLU(w_j)} ⇶I_i.$$ + It uses $ε=0.0001$ for stability and is up to 30% faster on a GPU. + + +--- +# EfficientDet – Compound Scaling + +Similar to EfficientNet, authors propose to scale various dimensions of the +network, using a single compound coefficient $ϕ$. + +~~~ +After performing a grid search: +- the width of BiFPN is scaled as $W_\mathit{BiFPN} = 64 ⋅ 1.35^ϕ,$ +- the depth of BiFPN is scaled as $D_\mathit{BiFPN} = 3 + ϕ,$ +- the box/class predictor has the same width as BiFPN and depth $D_\mathit{class} = 3 + \lfloor ϕ/3 \rfloor,$ +- input image resolution increases according to $R_\mathit{image} = 512 + 128 ⋅ ϕ.$ + +![w=45%,h=center](efficientdet_scaling.svgz) + +--- +# EfficientDet – Results + +![w=50%](efficientdet_flops.svgz)![w=50%](efficientdet_size.svgz) + +--- +# EfficientDet – Results + +![w=83%,h=center](efficientdet_results.svgz) + +--- +# EfficientDet – Inference Latencies + +![w=100%](efficientdet_latency.svgz) + +--- +# EfficientDet – Ablations + +Given that EfficientDet employs both a powerful backbone and new BiFPN, authors +quantify the improvement of the individual components. + +![w=49%,h=center](efficientdet_ablations_backbone.svgz) + +~~~ +The comparison with previously used cross-scale fusion architectures is also +provided: + +![w=49%,h=center](efficientdet_ablations_fpn.svgz) + +--- +class: wide +# EfficientDet-D0 Example + +![w=98%,h=center](efficientdet_example.jpg) + +--- +section: GroupNorm +# Normalization + +## Batch Normalization + +Neuron value is normalized across the minibatch, and in case of CNN also across +all positions. 
+ +~~~ +## Layer Normalization + +Neuron value is normalized across the layer. + +~~~ +![w=100%](normalizations.svgz) + +--- +# Group Normalization + +Group Normalization is analogous to Layer normalization, but the channels are +normalized in groups (by default, $G=32$). + +![w=40%,h=center](normalizations.svgz) + +~~~ +![w=40%,h=center](group_norm.svgz) + +--- +# Group Normalization + +![w=78%,h=center](group_norm_vs_batch_norm.svgz) + +--- +# Group Normalization + +![w=65%,h=center](group_norm_coco.svgz) diff --git a/slides/06/anchor_net.svgz b/slides/06/anchor_net.svgz new file mode 100644 index 0000000..a78b80f Binary files /dev/null and b/slides/06/anchor_net.svgz differ diff --git a/slides/06/anchor_net.svgz.ref b/slides/06/anchor_net.svgz.ref new file mode 100644 index 0000000..8473ea0 --- /dev/null +++ b/slides/06/anchor_net.svgz.ref @@ -0,0 +1 @@ +Adapted from slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/average_precision.jpg b/slides/06/average_precision.jpg new file mode 100644 index 0000000..aa92c3a Binary files /dev/null and b/slides/06/average_precision.jpg differ diff --git a/slides/06/average_precision.jpg.ref b/slides/06/average_precision.jpg.ref new file mode 100644 index 0000000..0bdfae7 --- /dev/null +++ b/slides/06/average_precision.jpg.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*naz02wO-XMywlwAdFzF-GA.jpeg diff --git a/slides/06/bbox_representation_cxcywh.webp b/slides/06/bbox_representation_cxcywh.webp new file mode 100644 index 0000000..745ad04 Binary files /dev/null and b/slides/06/bbox_representation_cxcywh.webp differ diff --git a/slides/06/bbox_representation_cxcywh.webp.ref b/slides/06/bbox_representation_cxcywh.webp.ref new file mode 100644 index 0000000..91b33ac --- /dev/null +++ b/slides/06/bbox_representation_cxcywh.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*Z80D7vwD-3UwP16asY-k6A.jpeg diff --git a/slides/06/bbox_representation_xywh.webp 
b/slides/06/bbox_representation_xywh.webp new file mode 100644 index 0000000..f82925e Binary files /dev/null and b/slides/06/bbox_representation_xywh.webp differ diff --git a/slides/06/bbox_representation_xywh.webp.ref b/slides/06/bbox_representation_xywh.webp.ref new file mode 100644 index 0000000..0e2a026 --- /dev/null +++ b/slides/06/bbox_representation_xywh.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*JLeFS2KIOzSTk6lUp1Ou2w.jpeg diff --git a/slides/06/bbox_representation_xyxy.webp b/slides/06/bbox_representation_xyxy.webp new file mode 100644 index 0000000..2f7d93b Binary files /dev/null and b/slides/06/bbox_representation_xyxy.webp differ diff --git a/slides/06/bbox_representation_xyxy.webp.ref b/slides/06/bbox_representation_xyxy.webp.ref new file mode 100644 index 0000000..7399ff7 --- /dev/null +++ b/slides/06/bbox_representation_xyxy.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*oZcZhzOWKb3kvBHPOHYfow.jpeg diff --git a/slides/06/cv_tasks.jpg b/slides/06/cv_tasks.jpg new file mode 100644 index 0000000..de4459b Binary files /dev/null and b/slides/06/cv_tasks.jpg differ diff --git a/slides/06/cv_tasks.jpg.ref b/slides/06/cv_tasks.jpg.ref new file mode 100644 index 0000000..1f5753a --- /dev/null +++ b/slides/06/cv_tasks.jpg.ref @@ -0,0 +1 @@ +https://www.implantology.or.kr/articles/xml/RvNO/ diff --git a/slides/06/efficientdet_ablations_backbone.svgz b/slides/06/efficientdet_ablations_backbone.svgz new file mode 100644 index 0000000..a73b0d0 Binary files /dev/null and b/slides/06/efficientdet_ablations_backbone.svgz differ diff --git a/slides/06/efficientdet_ablations_backbone.svgz.ref b/slides/06/efficientdet_ablations_backbone.svgz.ref new file mode 100644 index 0000000..8ea6795 --- /dev/null +++ b/slides/06/efficientdet_ablations_backbone.svgz.ref @@ -0,0 +1 @@ +Table 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_ablations_fpn.svgz 
b/slides/06/efficientdet_ablations_fpn.svgz new file mode 100644 index 0000000..ac3affa Binary files /dev/null and b/slides/06/efficientdet_ablations_fpn.svgz differ diff --git a/slides/06/efficientdet_ablations_fpn.svgz.ref b/slides/06/efficientdet_ablations_fpn.svgz.ref new file mode 100644 index 0000000..dd61bd6 --- /dev/null +++ b/slides/06/efficientdet_ablations_fpn.svgz.ref @@ -0,0 +1 @@ +Table 5 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_architecture.svgz b/slides/06/efficientdet_architecture.svgz new file mode 100644 index 0000000..dd376f1 Binary files /dev/null and b/slides/06/efficientdet_architecture.svgz differ diff --git a/slides/06/efficientdet_architecture.svgz.ref b/slides/06/efficientdet_architecture.svgz.ref new file mode 100644 index 0000000..66db1af --- /dev/null +++ b/slides/06/efficientdet_architecture.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_bifpn.svgz b/slides/06/efficientdet_bifpn.svgz new file mode 100644 index 0000000..bc694d3 Binary files /dev/null and b/slides/06/efficientdet_bifpn.svgz differ diff --git a/slides/06/efficientdet_bifpn.svgz.ref b/slides/06/efficientdet_bifpn.svgz.ref new file mode 100644 index 0000000..86130e9 --- /dev/null +++ b/slides/06/efficientdet_bifpn.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_example.jpg b/slides/06/efficientdet_example.jpg new file mode 100644 index 0000000..1f1aa1b Binary files /dev/null and b/slides/06/efficientdet_example.jpg differ diff --git a/slides/06/efficientdet_example.jpg.ref b/slides/06/efficientdet_example.jpg.ref new file mode 100644 index 0000000..2e9aaab --- /dev/null +++ b/slides/06/efficientdet_example.jpg.ref @@ -0,0 +1 @@ 
+https://github.com/google/automl/blob/master/efficientdet/g3doc/street.jpg diff --git a/slides/06/efficientdet_flops.svgz b/slides/06/efficientdet_flops.svgz new file mode 100644 index 0000000..24d9e8c Binary files /dev/null and b/slides/06/efficientdet_flops.svgz differ diff --git a/slides/06/efficientdet_flops.svgz.ref b/slides/06/efficientdet_flops.svgz.ref new file mode 100644 index 0000000..186b61d --- /dev/null +++ b/slides/06/efficientdet_flops.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_latency.svgz b/slides/06/efficientdet_latency.svgz new file mode 100644 index 0000000..0a5dd99 Binary files /dev/null and b/slides/06/efficientdet_latency.svgz differ diff --git a/slides/06/efficientdet_latency.svgz.ref b/slides/06/efficientdet_latency.svgz.ref new file mode 100644 index 0000000..bb23a56 --- /dev/null +++ b/slides/06/efficientdet_latency.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_results.svgz b/slides/06/efficientdet_results.svgz new file mode 100644 index 0000000..b2e4058 Binary files /dev/null and b/slides/06/efficientdet_results.svgz differ diff --git a/slides/06/efficientdet_results.svgz.ref b/slides/06/efficientdet_results.svgz.ref new file mode 100644 index 0000000..c4f6073 --- /dev/null +++ b/slides/06/efficientdet_results.svgz.ref @@ -0,0 +1 @@ +Table 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_scaling.svgz b/slides/06/efficientdet_scaling.svgz new file mode 100644 index 0000000..675dbb8 Binary files /dev/null and b/slides/06/efficientdet_scaling.svgz differ diff --git a/slides/06/efficientdet_scaling.svgz.ref b/slides/06/efficientdet_scaling.svgz.ref new file mode 100644 index 0000000..5f14bba --- /dev/null +++ 
b/slides/06/efficientdet_scaling.svgz.ref @@ -0,0 +1 @@ +Table 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_size.svgz b/slides/06/efficientdet_size.svgz new file mode 100644 index 0000000..f42947b Binary files /dev/null and b/slides/06/efficientdet_size.svgz differ diff --git a/slides/06/efficientdet_size.svgz.ref b/slides/06/efficientdet_size.svgz.ref new file mode 100644 index 0000000..bb23a56 --- /dev/null +++ b/slides/06/efficientdet_size.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/fast_rcnn.jpg b/slides/06/fast_rcnn.jpg new file mode 100644 index 0000000..1803bb5 Binary files /dev/null and b/slides/06/fast_rcnn.jpg differ diff --git a/slides/06/fast_rcnn.jpg.ref b/slides/06/fast_rcnn.jpg.ref new file mode 100644 index 0000000..fbecdb1 --- /dev/null +++ b/slides/06/fast_rcnn.jpg.ref @@ -0,0 +1 @@ +Figure 1 of "Fast R-CNN", https://arxiv.org/abs/1504.08083 diff --git a/slides/06/fast_rcnn_architecture.svgz b/slides/06/fast_rcnn_architecture.svgz new file mode 100644 index 0000000..b7bda19 Binary files /dev/null and b/slides/06/fast_rcnn_architecture.svgz differ diff --git a/slides/06/fast_rcnn_architecture.svgz.ref b/slides/06/fast_rcnn_architecture.svgz.ref new file mode 100644 index 0000000..6efa2ff --- /dev/null +++ b/slides/06/fast_rcnn_architecture.svgz.ref @@ -0,0 +1 @@ +Slide 61 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/fast_rcnn_rumcajs.svgz b/slides/06/fast_rcnn_rumcajs.svgz new file mode 100644 index 0000000..c774a93 Binary files /dev/null and b/slides/06/fast_rcnn_rumcajs.svgz differ diff --git a/slides/06/fast_rcnn_rumcajs.svgz.ref b/slides/06/fast_rcnn_rumcajs.svgz.ref new file mode 100644 index 0000000..3ebdb63 --- /dev/null +++ b/slides/06/fast_rcnn_rumcajs.svgz.ref @@ -0,0 +1 @@ +https://commons.wikimedia.org/wiki/File:Tišnov,_Hajánky,_garážová_ozdoba_(6597).jpg diff --git a/slides/06/fast_rcnn_speed.svgz b/slides/06/fast_rcnn_speed.svgz new file mode 100644 index 0000000..9f24720 Binary files /dev/null and b/slides/06/fast_rcnn_speed.svgz differ diff --git a/slides/06/fast_rcnn_speed.svgz.ref b/slides/06/fast_rcnn_speed.svgz.ref new file mode 100644 index 0000000..436c3bf --- /dev/null +++ b/slides/06/fast_rcnn_speed.svgz.ref @@ -0,0 +1 @@ +Slide 76 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/fast_rcnn_vgg.png b/slides/06/fast_rcnn_vgg.png new file mode 100644 index 0000000..07cfbf0 Binary files /dev/null and b/slides/06/fast_rcnn_vgg.png differ diff --git a/slides/06/fast_rcnn_vgg.png.ref b/slides/06/fast_rcnn_vgg.png.ref new file mode 100644 index 0000000..62ac59b --- /dev/null +++ b/slides/06/fast_rcnn_vgg.png.ref @@ -0,0 +1 @@ +https://en.wikipedia.org/wiki/File:VGG_neural_network.png diff --git a/slides/06/faster_rcnn_architecture.png b/slides/06/faster_rcnn_architecture.png new file mode 100644 index 0000000..8464540 Binary files /dev/null and b/slides/06/faster_rcnn_architecture.png differ diff --git a/slides/06/faster_rcnn_architecture.png.ref b/slides/06/faster_rcnn_architecture.png.ref new file mode 100644 index 0000000..657ebdd --- /dev/null +++ b/slides/06/faster_rcnn_architecture.png.ref @@ -0,0 +1 @@ +Figure 2 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/faster_rcnn_performance.svgz 
b/slides/06/faster_rcnn_performance.svgz new file mode 100644 index 0000000..f2ccc58 Binary files /dev/null and b/slides/06/faster_rcnn_performance.svgz differ diff --git a/slides/06/faster_rcnn_performance.svgz.ref b/slides/06/faster_rcnn_performance.svgz.ref new file mode 100644 index 0000000..8796742 --- /dev/null +++ b/slides/06/faster_rcnn_performance.svgz.ref @@ -0,0 +1 @@ +Tables 3 and 4 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/faster_rcnn_rpn.svgz b/slides/06/faster_rcnn_rpn.svgz new file mode 100644 index 0000000..b493b07 Binary files /dev/null and b/slides/06/faster_rcnn_rpn.svgz differ diff --git a/slides/06/faster_rcnn_rpn.svgz.ref b/slides/06/faster_rcnn_rpn.svgz.ref new file mode 100644 index 0000000..1fac88c --- /dev/null +++ b/slides/06/faster_rcnn_rpn.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/focal_loss_cdf.svgz b/slides/06/focal_loss_cdf.svgz new file mode 100644 index 0000000..403d6d5 Binary files /dev/null and b/slides/06/focal_loss_cdf.svgz differ diff --git a/slides/06/focal_loss_cdf.svgz.ref b/slides/06/focal_loss_cdf.svgz.ref new file mode 100644 index 0000000..0dd7c12 --- /dev/null +++ b/slides/06/focal_loss_cdf.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/focal_loss_graph.svgz b/slides/06/focal_loss_graph.svgz new file mode 100644 index 0000000..44ebdf2 Binary files /dev/null and b/slides/06/focal_loss_graph.svgz differ diff --git a/slides/06/focal_loss_graph.svgz.ref b/slides/06/focal_loss_graph.svgz.ref new file mode 100644 index 0000000..ccc201a --- /dev/null +++ b/slides/06/focal_loss_graph.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git 
a/slides/06/fpn_architecture.svgz b/slides/06/fpn_architecture.svgz new file mode 100644 index 0000000..af04b27 Binary files /dev/null and b/slides/06/fpn_architecture.svgz differ diff --git a/slides/06/fpn_architecture.svgz.ref b/slides/06/fpn_architecture.svgz.ref new file mode 100644 index 0000000..96d788c --- /dev/null +++ b/slides/06/fpn_architecture.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_architecture_detailed.svgz b/slides/06/fpn_architecture_detailed.svgz new file mode 100644 index 0000000..ff42dd0 Binary files /dev/null and b/slides/06/fpn_architecture_detailed.svgz differ diff --git a/slides/06/fpn_architecture_detailed.svgz.ref b/slides/06/fpn_architecture_detailed.svgz.ref new file mode 100644 index 0000000..bfb0bc8 --- /dev/null +++ b/slides/06/fpn_architecture_detailed.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_overview.svgz b/slides/06/fpn_overview.svgz new file mode 100644 index 0000000..c6c1574 Binary files /dev/null and b/slides/06/fpn_overview.svgz differ diff --git a/slides/06/fpn_overview.svgz.ref b/slides/06/fpn_overview.svgz.ref new file mode 100644 index 0000000..c00542b --- /dev/null +++ b/slides/06/fpn_overview.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_results.svgz b/slides/06/fpn_results.svgz new file mode 100644 index 0000000..02db310 Binary files /dev/null and b/slides/06/fpn_results.svgz differ diff --git a/slides/06/fpn_results.svgz.ref b/slides/06/fpn_results.svgz.ref new file mode 100644 index 0000000..8ced9a5 --- /dev/null +++ b/slides/06/fpn_results.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/group_norm.svgz 
b/slides/06/group_norm.svgz new file mode 100644 index 0000000..0be782b Binary files /dev/null and b/slides/06/group_norm.svgz differ diff --git a/slides/06/group_norm.svgz.ref b/slides/06/group_norm.svgz.ref new file mode 100644 index 0000000..6e47f02 --- /dev/null +++ b/slides/06/group_norm.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/group_norm_coco.svgz b/slides/06/group_norm_coco.svgz new file mode 100644 index 0000000..fe964af Binary files /dev/null and b/slides/06/group_norm_coco.svgz differ diff --git a/slides/06/group_norm_coco.svgz.ref b/slides/06/group_norm_coco.svgz.ref new file mode 100644 index 0000000..86ea266 --- /dev/null +++ b/slides/06/group_norm_coco.svgz.ref @@ -0,0 +1 @@ +Tables 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/group_norm_vs_batch_norm.svgz b/slides/06/group_norm_vs_batch_norm.svgz new file mode 100644 index 0000000..2c017ac Binary files /dev/null and b/slides/06/group_norm_vs_batch_norm.svgz differ diff --git a/slides/06/group_norm_vs_batch_norm.svgz.ref b/slides/06/group_norm_vs_batch_norm.svgz.ref new file mode 100644 index 0000000..e6c9431 --- /dev/null +++ b/slides/06/group_norm_vs_batch_norm.svgz.ref @@ -0,0 +1 @@ +Figures 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/huber_loss.py b/slides/06/huber_loss.py new file mode 100644 index 0000000..f6f93d2 --- /dev/null +++ b/slides/06/huber_loss.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python3 +import os + +import matplotlib +import matplotlib.pyplot as plt +import numpy as np + +matplotlib.rcParams["mathtext.fontset"] = "cm" + +xs = np.linspace(-3, 3, 51) +l2 = xs * xs / 2 +huber = np.where(np.abs(xs) <= 1, xs * xs / 2, np.abs(xs) - 0.5) +d_huber = np.where(np.abs(xs) <= 1, xs, np.sign(xs)) + +plt.figure(figsize=(5, 3.5)) +plt.plot(xs, l2, label="L2 loss $\\frac{1}{2} x^2$") +plt.plot(xs, huber, label="Huber loss") +plt.plot(xs, 
d_huber, label="Huber loss derivative") +plt.gca().set_aspect(1) +plt.grid(True) +plt.legend(loc="upper center") +plt.savefig("huber_loss.svg", bbox_inches="tight", transparent=True) diff --git a/slides/06/huber_loss.svgz b/slides/06/huber_loss.svgz new file mode 100644 index 0000000..a3362fa Binary files /dev/null and b/slides/06/huber_loss.svgz differ diff --git a/slides/06/huber_loss.svgz.ref b/slides/06/huber_loss.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/06/mask_rcnn_ablation.svgz b/slides/06/mask_rcnn_ablation.svgz new file mode 100644 index 0000000..1b6b8e2 Binary files /dev/null and b/slides/06/mask_rcnn_ablation.svgz differ diff --git a/slides/06/mask_rcnn_ablation.svgz.ref b/slides/06/mask_rcnn_ablation.svgz.ref new file mode 100644 index 0000000..8877b9d --- /dev/null +++ b/slides/06/mask_rcnn_ablation.svgz.ref @@ -0,0 +1 @@ +Table 2 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_architecture.png b/slides/06/mask_rcnn_architecture.png new file mode 100644 index 0000000..5b9e6ed Binary files /dev/null and b/slides/06/mask_rcnn_architecture.png differ diff --git a/slides/06/mask_rcnn_architecture.png.ref b/slides/06/mask_rcnn_architecture.png.ref new file mode 100644 index 0000000..2d5bd13 --- /dev/null +++ b/slides/06/mask_rcnn_architecture.png.ref @@ -0,0 +1 @@ +Figure 1 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_heads.svgz b/slides/06/mask_rcnn_heads.svgz new file mode 100644 index 0000000..f5c90b1 Binary files /dev/null and b/slides/06/mask_rcnn_heads.svgz differ diff --git a/slides/06/mask_rcnn_heads.svgz.ref b/slides/06/mask_rcnn_heads.svgz.ref new file mode 100644 index 0000000..5e303ff --- /dev/null +++ b/slides/06/mask_rcnn_heads.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_hpe_performance.svgz b/slides/06/mask_rcnn_hpe_performance.svgz new file mode 100644 index 
0000000..b79f401 Binary files /dev/null and b/slides/06/mask_rcnn_hpe_performance.svgz differ diff --git a/slides/06/mask_rcnn_hpe_performance.svgz.ref b/slides/06/mask_rcnn_hpe_performance.svgz.ref new file mode 100644 index 0000000..19c0665 --- /dev/null +++ b/slides/06/mask_rcnn_hpe_performance.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_roialign.svgz b/slides/06/mask_rcnn_roialign.svgz new file mode 100644 index 0000000..0cefb39 Binary files /dev/null and b/slides/06/mask_rcnn_roialign.svgz differ diff --git a/slides/06/mask_rcnn_roialign.svgz.ref b/slides/06/mask_rcnn_roialign.svgz.ref new file mode 100644 index 0000000..b4070e5 --- /dev/null +++ b/slides/06/mask_rcnn_roialign.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/normalizations.svgz b/slides/06/normalizations.svgz new file mode 100644 index 0000000..6230387 Binary files /dev/null and b/slides/06/normalizations.svgz differ diff --git a/slides/06/normalizations.svgz.ref b/slides/06/normalizations.svgz.ref new file mode 100644 index 0000000..7b89167 --- /dev/null +++ b/slides/06/normalizations.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/object_localization.png b/slides/06/object_localization.png new file mode 100644 index 0000000..a6d3c85 Binary files /dev/null and b/slides/06/object_localization.png differ diff --git a/slides/06/object_localization.png.ref b/slides/06/object_localization.png.ref new file mode 100644 index 0000000..b84eac5 --- /dev/null +++ b/slides/06/object_localization.png.ref @@ -0,0 +1 @@ +Slide 38 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/precision_recall_bottle.svgz b/slides/06/precision_recall_bottle.svgz new file mode 100644 index 0000000..41de99d Binary files /dev/null and b/slides/06/precision_recall_bottle.svgz differ diff --git a/slides/06/precision_recall_bottle.svgz.ref b/slides/06/precision_recall_bottle.svgz.ref new file mode 100644 index 0000000..5a828ee --- /dev/null +++ b/slides/06/precision_recall_bottle.svgz.ref @@ -0,0 +1 @@ +Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf diff --git a/slides/06/precision_recall_curve.png b/slides/06/precision_recall_curve.png new file mode 100644 index 0000000..13f8fb9 Binary files /dev/null and b/slides/06/precision_recall_curve.png differ diff --git a/slides/06/precision_recall_curve.png.ref b/slides/06/precision_recall_curve.png.ref new file mode 100644 index 0000000..fc537f8 --- /dev/null +++ b/slides/06/precision_recall_curve.png.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*VenTq4IgxjmIpOXWdFb-jg.png diff --git a/slides/06/precision_recall_curve_interpolated.jpg b/slides/06/precision_recall_curve_interpolated.jpg new file mode 100644 index 0000000..817eae0 Binary files /dev/null and b/slides/06/precision_recall_curve_interpolated.jpg differ diff --git a/slides/06/precision_recall_curve_interpolated.jpg.ref b/slides/06/precision_recall_curve_interpolated.jpg.ref new file mode 100644 index 0000000..9a840d2 --- /dev/null +++ b/slides/06/precision_recall_curve_interpolated.jpg.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*pmSxeb4EfdGnzT6Xa68GEQ.jpeg diff --git a/slides/06/precision_recall_person.svgz b/slides/06/precision_recall_person.svgz new file mode 100644 index 0000000..808dd55 Binary files /dev/null and b/slides/06/precision_recall_person.svgz differ diff --git a/slides/06/precision_recall_person.svgz.ref b/slides/06/precision_recall_person.svgz.ref new file mode 100644 index 0000000..5a828ee --- /dev/null +++ 
b/slides/06/precision_recall_person.svgz.ref @@ -0,0 +1 @@ +Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf diff --git a/slides/06/pyramidnet_architecture.svgz b/slides/06/pyramidnet_architecture.svgz new file mode 100644 index 0000000..d773f10 Binary files /dev/null and b/slides/06/pyramidnet_architecture.svgz differ diff --git a/slides/06/pyramidnet_architecture.svgz.ref b/slides/06/pyramidnet_architecture.svgz.ref new file mode 100644 index 0000000..321784e --- /dev/null +++ b/slides/06/pyramidnet_architecture.svgz.ref @@ -0,0 +1 @@ +Table 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_blocks.svgz b/slides/06/pyramidnet_blocks.svgz new file mode 100644 index 0000000..077785f Binary files /dev/null and b/slides/06/pyramidnet_blocks.svgz differ diff --git a/slides/06/pyramidnet_blocks.svgz.ref b/slides/06/pyramidnet_blocks.svgz.ref new file mode 100644 index 0000000..2fde23d --- /dev/null +++ b/slides/06/pyramidnet_blocks.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_cifar.svgz b/slides/06/pyramidnet_cifar.svgz new file mode 100644 index 0000000..4f2b985 Binary files /dev/null and b/slides/06/pyramidnet_cifar.svgz differ diff --git a/slides/06/pyramidnet_cifar.svgz.ref b/slides/06/pyramidnet_cifar.svgz.ref new file mode 100644 index 0000000..bc183f0 --- /dev/null +++ b/slides/06/pyramidnet_cifar.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_growth_rate.svgz b/slides/06/pyramidnet_growth_rate.svgz new file mode 100644 index 0000000..5474788 Binary files /dev/null and b/slides/06/pyramidnet_growth_rate.svgz differ diff --git a/slides/06/pyramidnet_growth_rate.svgz.ref b/slides/06/pyramidnet_growth_rate.svgz.ref new file mode 100644 index 
0000000..12ee550 --- /dev/null +++ b/slides/06/pyramidnet_growth_rate.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_residuals.svgz b/slides/06/pyramidnet_residuals.svgz new file mode 100644 index 0000000..c4290c1 Binary files /dev/null and b/slides/06/pyramidnet_residuals.svgz differ diff --git a/slides/06/pyramidnet_residuals.svgz.ref b/slides/06/pyramidnet_residuals.svgz.ref new file mode 100644 index 0000000..b53108d --- /dev/null +++ b/slides/06/pyramidnet_residuals.svgz.ref @@ -0,0 +1 @@ +Figure 5 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/rcnn_architecture.svgz b/slides/06/rcnn_architecture.svgz new file mode 100644 index 0000000..0a7cf0e Binary files /dev/null and b/slides/06/rcnn_architecture.svgz differ diff --git a/slides/06/rcnn_architecture.svgz.ref b/slides/06/rcnn_architecture.svgz.ref new file mode 100644 index 0000000..1a20f30 --- /dev/null +++ b/slides/06/rcnn_architecture.svgz.ref @@ -0,0 +1 @@ +Slide 54 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/retinanet.svgz b/slides/06/retinanet.svgz new file mode 100644 index 0000000..60fe4c1 Binary files /dev/null and b/slides/06/retinanet.svgz differ diff --git a/slides/06/retinanet.svgz.ref b/slides/06/retinanet.svgz.ref new file mode 100644 index 0000000..aab04d0 --- /dev/null +++ b/slides/06/retinanet.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_ablations.svgz b/slides/06/retinanet_ablations.svgz new file mode 100644 index 0000000..aec5956 Binary files /dev/null and b/slides/06/retinanet_ablations.svgz differ diff --git a/slides/06/retinanet_ablations.svgz.ref b/slides/06/retinanet_ablations.svgz.ref new file mode 100644 index 0000000..1e51d14 --- /dev/null +++ b/slides/06/retinanet_ablations.svgz.ref @@ -0,0 +1 @@ +Table 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_graph.svgz b/slides/06/retinanet_graph.svgz new file mode 100644 index 0000000..299a928 Binary files /dev/null and b/slides/06/retinanet_graph.svgz differ diff --git a/slides/06/retinanet_graph.svgz.ref b/slides/06/retinanet_graph.svgz.ref new file mode 100644 index 0000000..b54356d --- /dev/null +++ b/slides/06/retinanet_graph.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_results.svgz b/slides/06/retinanet_results.svgz new file mode 100644 index 0000000..80a5c4d Binary files /dev/null and b/slides/06/retinanet_results.svgz differ diff --git a/slides/06/retinanet_results.svgz.ref b/slides/06/retinanet_results.svgz.ref new file mode 100644 index 0000000..38a2dcf --- /dev/null +++ b/slides/06/retinanet_results.svgz.ref @@ -0,0 +1 @@ +Table 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/roi_generation.jpg b/slides/06/roi_generation.jpg new file mode 100644 index 
0000000..18f7350 Binary files /dev/null and b/slides/06/roi_generation.jpg differ diff --git a/slides/06/roi_generation.jpg.ref b/slides/06/roi_generation.jpg.ref new file mode 100644 index 0000000..fbb2b02 --- /dev/null +++ b/slides/06/roi_generation.jpg.ref @@ -0,0 +1 @@ +Slide 48 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/roi_pooling.svgz b/slides/06/roi_pooling.svgz new file mode 100644 index 0000000..b5d6c0d Binary files /dev/null and b/slides/06/roi_pooling.svgz differ diff --git a/slides/06/roi_pooling.svgz.ref b/slides/06/roi_pooling.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/06/roi_projection.svgz b/slides/06/roi_projection.svgz new file mode 100644 index 0000000..a6aee2e Binary files /dev/null and b/slides/06/roi_projection.svgz differ diff --git a/slides/06/roi_projection.svgz.ref b/slides/06/roi_projection.svgz.ref new file mode 100644 index 0000000..1cc5acc --- /dev/null +++ b/slides/06/roi_projection.svgz.ref @@ -0,0 +1 @@ +Slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/tasks/bboxes_utils.md b/tasks/bboxes_utils.md new file mode 100644 index 0000000..64c58c8 --- /dev/null +++ b/tasks/bboxes_utils.md @@ -0,0 +1,26 @@ +### Assignment: bboxes_utils +#### Date: Deadline: Apr 09, 22:00 +#### Points: 2 points + +This is a preparatory assignment for `svhn_competition`. The goal is to +implement several bounding box manipulation routines in the +[bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py) +module. 
Notably, you need to implement the following methods: +- `bboxes_to_rcnn`: convert given bounding boxes to an R-CNN-like + representation relative to the given anchors; +- `bboxes_from_rcnn`: convert R-CNN-like representations relative to + given anchors back to bounding boxes; +- `bboxes_training`: given a list of anchors and gold objects, assign gold + objects to anchors and generate suitable training data (the exact algorithm + is described in the template). + +The [bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py) +contains simple unit tests, which are evaluated when executing the module, +which you can use to check the validity of your implementation. Note that +the template does not contain type annotations because the Python typing system is +not flexible enough to describe the tensor shape changes. + +When submitting to ReCodEx, the method `main` is executed, returning the +implemented `bboxes_to_rcnn`, `bboxes_from_rcnn` and `bboxes_training` +methods. These methods are then executed and compared to the reference +implementation. diff --git a/tasks/cags_classification.md b/tasks/cags_classification.md index eedad3a..42009a4 100644 --- a/tasks/cags_classification.md +++ b/tasks/cags_classification.md @@ -31,8 +31,8 @@ estimates on the batch) or in inference regime. There is one exception though inference regime even when `training == True`._ The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _93%_ test set accuracy will get 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _93%_ test set accuracy gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. 
You may want to start with the [cags_classification.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_classification.py) diff --git a/tasks/cags_segmentation.md b/tasks/cags_segmentation.md index 7669d70..d9677e5 100644 --- a/tasks/cags_segmentation.md +++ b/tasks/cags_segmentation.md @@ -18,8 +18,8 @@ module, which can also evaluate your predictions (either by running with `evaluate_segmentation_file` method). The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _87%_ test set IoU gets 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _87%_ test set IoU gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. You may want to start with the [cags_segmentation.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_segmentation.py) diff --git a/tasks/cifar_competition.md b/tasks/cifar_competition.md index ae455fb..9505c18 100644 --- a/tasks/cifar_competition.md +++ b/tasks/cifar_competition.md @@ -8,8 +8,9 @@ You can load the data using the module. Note that the test set is different than that of official CIFAR-10. The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _70%_ test set accuracy will get 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _70%_ test set accuracy gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. + Note that my solutions usually need to achieve around ~85% on the development set to score 70% on the test set. 
diff --git a/tasks/cnn_manual.md b/tasks/cnn_manual.md index 6d1483a..06f3a83 100644 --- a/tasks/cnn_manual.md +++ b/tasks/cnn_manual.md @@ -12,9 +12,9 @@ activation and `valid` padding, specified in the `args.cnn` option. The `args.cnn` contains comma-separated layer specifications in the format `filters-kernel_size-stride`. -Of course, you cannot use any TensorFlow convolutional operation (instead, +Of course, you cannot use any PyTorch convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other -operations), nor the `tf.GradientTape` for gradient computation. +operations), nor the `.backward()` for gradient computation. To make debugging easier, the template supports a `--verify` option, which allows comparing the forward pass and the three gradients you compute in the diff --git a/tasks/mnist_ensemble.md b/tasks/mnist_ensemble.md index 9deb3d5..7bee385 100644 --- a/tasks/mnist_ensemble.md +++ b/tasks/mnist_ensemble.md @@ -8,7 +8,7 @@ Your goal in this assignment is to implement model ensembling. The [mnist_ensemble.py](https://github.com/ufal/npfl138/tree/master/labs/03/mnist_ensemble.py) template trains `args.models` individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all -models, and evaluate their accuracy on the test set. +models, and evaluate their accuracy on the development set. #### Tests Start: mnist_ensemble_tests _Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ diff --git a/tasks/sgd_manual.md b/tasks/sgd_manual.md index fd268e7..d054d94 100644 --- a/tasks/sgd_manual.md +++ b/tasks/sgd_manual.md @@ -17,7 +17,7 @@ Start with the [sgd_manual.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_manual.py) template, which is based on [sgd_backpropagation.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_backpropagation.py) -one. 
Be aware that these templates generates each a different output file. +one. Note that ReCodEx disables the PyTorch automatic differentiation during evaluation. diff --git a/tasks/svhn_competition.md b/tasks/svhn_competition.md new file mode 100644 index 0000000..902484f --- /dev/null +++ b/tasks/svhn_competition.md @@ -0,0 +1,44 @@ +### Assignment: svhn_competition +#### Date: Deadline: Apr 09, 22:00 +#### Points: 5 points+5 bonus + +The goal of this assignment is to implement a system performing object +recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone +(or any other model from `keras.applications`). + +The [Street View House Numbers (SVHN) dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/svhn_train.html) +annotates for every photo all digits appearing on it, including their bounding +boxes. The dataset can be loaded using the [svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py) +module. Similarly to the `CAGS` dataset, the `train/dev/test` are PyTorch +`torch.utils.data.Dataset`s, and every element is a dictionary with the following keys: +- `"image"`: a square 3-channel image stored as a PyTorch tensor of type `torch.uint8`, +- `"classes"`: a 1D `np.ndarray` with all digit labels appearing in the image, +- `"bboxes"`: a `[num_digits, 4]` 2D `np.ndarray` with bounding boxes of every + digit in the image, each represented as `[TOP, LEFT, BOTTOM, RIGHT]`. + +Each test set image annotation consists of a sequence of space-separated +five-tuples _label top left bottom right_, and the annotation is considered +correct if exactly the gold digits are predicted, each with IoU at least 0.5. +The whole test set score is then the prediction accuracy of individual images. 
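The IoU criterion mentioned above can be illustrated with a minimal sketch. It assumes boxes in the `[TOP, LEFT, BOTTOM, RIGHT]` format described for `"bboxes"`; the function name `bbox_iou` is hypothetical and is not taken from `svhn_dataset.py`, whose actual evaluation code may differ.

```python
def bbox_iou(a, b):
    """Intersection over union of two boxes given as [TOP, LEFT, BOTTOM, RIGHT].

    Hypothetical helper for illustration only, not the course implementation.
    """
    # Coordinates of the intersection rectangle.
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes yield an empty intersection.
    intersection = max(bottom - top, 0) * max(right - left, 0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Degenerate boxes with zero area have no meaningful overlap.
    return intersection / union if union > 0 else 0.0
```

Under this sketch, a predicted digit would count as matching a gold digit when the labels agree and `bbox_iou(predicted, gold) >= 0.5`.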
+You can again evaluate your predictions using the +[svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py) +module, either by running with the `--evaluate=path` argument, or using its +`evaluate_file` method. + +The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). +Everyone who submits a solution achieving at least _20%_ test set accuracy gets +5 points; the remaining 5 bonus points are distributed depending on relative ordering +of your solutions. Note that I usually need at least _35%_ development set +accuracy to achieve the required test set performance. + +You should start with the +[svhn_competition.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_competition.py) +template, which generates the test set annotation in the required format. + +_A baseline solution can use a RetinaNet-like single-stage detector, +using only a single level of convolutional features (no FPN) +with single-scale and single-aspect anchors. Focal loss is available +as [keras.losses.BinaryFocalCrossentropy](https://keras.io/api/losses/probabilistic_losses/#binaryfocalcrossentropy-class) +and non-maximum suppression as +[torchvision.ops.nms](https://pytorch.org/vision/main/generated/torchvision.ops.nms.html#nms) or +[torchvision.ops.batched_nms](https://pytorch.org/vision/main/generated/torchvision.ops.batched_nms.html#batched-nms)._ diff --git a/tasks/uppercase.md b/tasks/uppercase.md index 66288d4..5c9a9e4 100644 --- a/tasks/uppercase.md +++ b/tasks/uppercase.md @@ -15,8 +15,8 @@ only used to understand the approach you took, and to indicate teams). Explicitly, submit **exactly one .txt file** and **at least one .py/ipynb file**. The task is also a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
Everyone who submits -a solution which achieves at least _98.5%_ accuracy will get 4 basic points; the -5 bonus points will be distributed depending on relative ordering of your +a solution achieving at least _98.5%_ accuracy gets 4 basic points; the +remaining 5 bonus points are distributed depending on relative ordering of your solutions. The accuracy is computed per-character and can be evaluated by running [uppercase_data.py](https://github.com/ufal/npfl138/tree/master/labs/03/uppercase_data.py) with `--evaluate` argument, or using its `evaluate_file` method.