
Commit 19c46b6

Updated Practical Guidance
1 parent 5fdb001 commit 19c46b6

File tree

4 files changed: +198 -191 lines changed


dl-overview.md

Lines changed: 0 additions & 191 deletions
This file was deleted.

dl-practical-guidance.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# [fit] Deep Learning Practical Guidance

_**Getting Started with Image & Text**_

<br>
<br>
<br>

**Amit Kapoor** [@amitkaps](http://amitkaps.com)
**Bargava Subramanian** [@bargava](http://bargava.com)
**Anand Chitipothu** [@anandology](http://anandology.com)
---
# Bootcamp Approach

- **Domain**: Image & Text
- **Applied**: Proven & Practical
- **Intuition**: Visualisation & Analogies
- **Code**: Learning by Doing
- **Math**: Attend HackerMath!
---
# Learning Paradigm

![inline 120%](img/learning_paradigm.png)
---
# Learning Types & Applications

- **Supervised**: Regression, Classification, ...
- Unsupervised: Dimensionality Reduction, Clustering, ...
- Self (Semi)-supervised: Auto-encoders, Generative Adversarial Networks, ...
- Reinforcement Learning: Games, Self-Driving Cars, Robotics, ...
---
# Focus: Supervised Learning

- **Classification**: Image, Text, Speech, Translation
- Sequence generation: Given a picture, predict a caption describing it.
- Syntax tree prediction: Given a sentence, predict its decomposition into a syntax tree.
- Object detection: Given a picture, draw a bounding box around certain objects inside the picture.
- Image segmentation: Given a picture, draw a pixel-level mask on a specific object.
---
# Learning Approach

<br>

![original 110%](img/learning_approach.png)
---
# Data Representation: Tensors

- NumPy arrays (a.k.a. tensors)
- Generalised form of a matrix (2D array)
- Attributes (see the sketch below)
  - Axes or rank: `ndim`
  - Dimensions: `shape`, e.g. `(5, 3)`
  - Data type: `dtype`, e.g. `float32`, `uint8`, `float64`
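A minimal NumPy sketch of these attributes (shape and dtype chosen just for illustration):

```python
import numpy as np

# A 2D tensor: 5 samples x 3 features
x = np.zeros((5, 3), dtype="float32")

print(x.ndim)   # 2 -> rank (number of axes)
print(x.shape)  # (5, 3)
print(x.dtype)  # float32
```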
---
# Tensor Types

- **Scalar**: 0D Tensor
- **Vector**: 1D Tensor
- **Matrix**: 2D Tensor
- **Higher-order**: 3D, 4D or 5D Tensor
---
# Input $$X$$

| Tensor | Example  | Shape |
|:-------|:---------|:----------------------|
| 2D     | Tabular  | (samples, features) |
| 3D     | Sequence | (samples, steps, features) |
| 4D     | Images   | (samples, height, width, channels) |
| 5D     | Videos   | (samples, frames, height, width, channels) |
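For instance, a 4D image batch in NumPy (all sizes are illustrative):

```python
import numpy as np

# 32 RGB images of 64x64 pixels: (samples, height, width, channels)
images = np.zeros((32, 64, 64, 3), dtype="uint8")
print(images.shape)  # (32, 64, 64, 3)
```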
---
# Learning Unit

$$ y = \mathrm{ReLU}(w \cdot x + b) $$

**Weights** are $$ w_1 \ldots w_n $$ and the **activation** is ReLU: $$ f(z) = \max(z, 0) $$

![inline 200%](img/learning_unit.png)
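A NumPy sketch of one unit (the input, weight, and bias values are made up):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.1, 0.4, -0.2])   # weights w_1 ... w_n
b = 0.05                         # bias

y = relu(np.dot(w, x) + b)       # y = ReLU(w . x + b)
print(y)                         # 0.0 (the weighted sum is negative here)
```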
---
# Model Architecture

Basic model: **Sequential** - a linear stack of layers.

Core layers (see the sketch below):
- Dense: a fully connected layer of learning units (also called a Multi-Layer Perceptron)
- Flatten: reshapes a higher-order tensor into a vector
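A minimal Keras sketch of a Sequential model (layer sizes and input shape are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

model = Sequential([
    Flatten(input_shape=(28, 28)),    # flatten a 28x28 image into a 784-vector
    Dense(128, activation="relu"),    # fully connected layer of learning units
    Dense(10, activation="softmax"),  # output layer for 10 classes
])
model.summary()
```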
---
# Output $$y$$ & Loss

| $$y$$ | Last-Layer Activation | Loss Function |
|:-------|:---------|:----------------------|
| Binary Class | sigmoid | Binary Crossentropy |
| Multi Class | softmax | Categorical Crossentropy |
| Multi Class Multi Label | sigmoid | Binary Crossentropy |
| Regression | None | Mean Squared Error |
| Regression (0-1) | sigmoid | MSE or Binary Crossentropy |
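In Keras, each row of this table becomes a choice of last-layer activation plus a `loss` argument to `model.compile`; a sketch for the multi-class case (reusing the `model` from the earlier sketch):

```python
# Multi-class: softmax last layer + categorical crossentropy
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# A binary classifier would instead end in Dense(1, activation="sigmoid")
# and use loss="binary_crossentropy"
```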
---
# Optimizers

- **SGD**: Excellent, but requires tuning the learning rate, decay, and momentum parameters
- **RMSProp**: Good for RNNs
- **Adam**: Adaptive-momentum optimiser; generally a good starting point
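Configured explicitly in Keras (argument names vary across versions; older releases use `lr=` instead of `learning_rate=`):

```python
from tensorflow.keras.optimizers import SGD, RMSprop, Adam

adam = Adam()                                # sensible defaults; good starting point
rmsprop = RMSprop(learning_rate=0.001)       # often a good fit for RNNs
sgd = SGD(learning_rate=0.01, momentum=0.9)  # needs tuning: learning rate, decay, momentum
```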
---
# Guidance for DL

> *General guidance on building and training neural networks. Treat them as heuristics (derived from experimentation) and as good starting points for your own explorations.*
---
# Pre-Processing

- **Normalize** / **whiten** your data (not for text!)
- **Scale** your data appropriately (to handle outliers)
- Handle **missing values**: set them to 0, and make sure 0 also occurs in the training data
- Create a **training & validation split** (see the sketch below)
  - **Stratified** split for multi-class data
  - **Shuffle** non-sequence data. Be careful with sequences!
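A stratified split sketch with scikit-learn (assuming `X` and `y` are NumPy arrays):

```python
from sklearn.model_selection import train_test_split

# Stratified, shuffled hold-out split: class proportions are preserved
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, shuffle=True, random_state=42)
```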
---
# General Architecture

- Use the **Adam** optimizer (to start with)
- Use **ReLU** for non-linear activation (learns faster than the alternatives)
- Add a **bias** term to each layer
- Use **Xavier** or **variance-scaling** initialisation (better than random initialisation); see the sketch below
- Refer to the output-layer activation & loss-function guidance for your task
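In Keras this looks roughly as follows (Glorot is Keras's name for Xavier; `he_normal` is a variance-scaling scheme):

```python
from tensorflow.keras.layers import Dense

# Bias is on by default (use_bias=True)
layer = Dense(64, activation="relu",
              kernel_initializer="glorot_uniform")  # or "he_normal"
```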
---
# Dense / MLP Architecture

- The number of units shrinks in deeper layers
- Unit counts are typically $$2^n$$
- Don't use more than 4-5 layers in dense networks
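A sketch of such a stack (sizes are illustrative powers of two, shrinking with depth):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(256, activation="relu", input_shape=(784,)),  # 2^8
    Dense(128, activation="relu"),                      # 2^7
    Dense(64, activation="relu"),                       # 2^6
    Dense(10, activation="softmax"),
])
```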
---
# CNN Architecture (for Images)

- Increase **convolution filters** as you go deeper, from 32 to 64 or 128 (max)
- Use **pooling** to subsample: it makes the model robust to translation, scaling, and rotation
- Use **pre-trained models** as *feature extractors* for similar tasks
- Progressively **train the n last layers** if the model is not learning
- **Image augmentation** is key for small data and for faster learning
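A small CNN sketch in Keras following these heuristics (filter counts grow 32 -> 64; all sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),                  # subsample for translation robustness
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),
])
```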
---
# RNN / CNN Architecture (for NLP)

- The **embedding** layer is critical. **Words** work better than **characters**
- Learn the embedding with the task, or use a pre-trained embedding as a starting point
- Use BiLSTM / LSTM over a simple RNN. Remember, RNNs are really slow to train
- Experiment with 1D CNNs with a larger kernel size (7 or 9) than used for images
- An MLP over bi-grams can work for many simple tasks
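A word-level BiLSTM sketch in Keras (vocabulary and embedding sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=128),  # 10k-word vocabulary
    Bidirectional(LSTM(64)),                     # BiLSTM over the sequence
    Dense(1, activation="sigmoid"),              # e.g. binary sentiment
])
```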
---
# Learning Process

- **Validation process**
  - Large data: hold-out validation
  - Smaller data: k-fold (stratified) validation, as in the sketch below
- **For underfitting**
  - Add more layers: **go deeper**
  - Make the layers bigger: **go wider**
  - Train for more epochs
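A stratified k-fold sketch with scikit-learn (assuming `X` and `y` are NumPy arrays):

```python
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # build a fresh model, then train and evaluate on this fold
```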
---
# Learning Process

- **For overfitting**
  - Get **more training data** (e.g. actual data or image augmentation)
  - Reduce **model capacity**
  - Add **weight regularisation** (e.g. L1, L2)
  - Add **dropout** or use **batch normalization**, as in the sketch below
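A sketch combining L2 weight regularisation and dropout in Keras (sizes and rates are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(64, activation="relu", kernel_regularizer=l2(0.01),
          input_shape=(784,)),
    Dropout(0.5),                     # randomly zero half the activations in training
    Dense(10, activation="softmax"),
])
```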

dl-practical-guidance.pdf

538 KB
Binary file not shown.

img/learning_unit.png

-2.46 KB
