Merged
Binary file modified 01_materials/slides/01_introduction.pdf
Binary file not shown.
Binary file modified 01_materials/slides/02_data_engineering.pdf
Binary file not shown.
Binary file modified 01_materials/slides/03_training.pdf
Binary file not shown.
Binary file modified 01_materials/slides/04_feature_engineering.pdf
Binary file not shown.
Binary file modified 01_materials/slides/05_model_development.pdf
Binary file not shown.
Binary file modified 01_materials/slides/06_deployment.pdf
Binary file not shown.
Binary file modified 01_materials/slides/07_monitoring.pdf
Binary file not shown.
Binary file modified 01_materials/slides/08_infra_and_org.pdf
Binary file not shown.
81 changes: 41 additions & 40 deletions 03_instructional_team/markdown_slides/01_introduction.md
@@ -18,27 +18,27 @@ $ echo "Data Sciences Institute"
## Agenda

- **1.1 Overview of ML Systems**
    - When to Use ML
    - ML in Production
    - ML vs Traditional Software
- When to Use ML
- ML in Production
- ML vs Traditional Software
- **1.2 Introduction to ML System Design**
    - Business and ML Objectives
    - Requirements of Data-Driven Products
    - Iterative Process
    - Framing ML Problems
- Business and ML Objectives
- Requirements of Data-Driven Products
- Iterative Process
- Framing ML Problems

---

## Agenda

- **1.3 Project Setup**
    - Introduction.
    - Repo File Structure.
    - Git, authorisation, and production pipelines.
    - VS Code and Git.
    - Python virtual environments.
    - Branching Strategies.
    - Commit Messages.
- Introduction.
- Repo File Structure.
- Git, authorisation, and production pipelines.
- VS Code and Git.
- Python virtual environments.
- Branching Strategies.
- Commit Messages.

---

@@ -88,9 +88,9 @@ ML is a collection of methods that allow a computer to:

- ML is used when a task is too complex or impractical to program explicitly.
- When applied successfully, ML will enable
    - Greater scale: automation.
    - Better performance.
    - Doing things that were not possible before.
- Greater scale: automation.
- Better performance.
- Doing things that were not possible before.

- ([Image Source](https://www.augmentedstartups.com/blog/overcoming-challenges-in-object-detection-accuracy-speed-and-scalability))

@@ -125,7 +125,7 @@ ML is a collection of methods that allow a computer to:
- There are patterns to learn, and they are complex.
- ML solutions are only helpful if there are patterns.
- An ML model can learn simple patterns, but the cost of applying ML may be unreasonable.
   
---

## Characteristics of ML Use Cases (2/4)
@@ -180,11 +180,11 @@ ML is a collection of methods that allow a computer to:

- ML methods are not ML systems: the learning method needs to be applied to data, assessed, tuned, deployed, governed, and so on.
- ML system design is a system approach to MLOps, i.e., we will consider the system holistically, including
    - Business requirements.
    - Data stack.
    - Infrastructure.
    - Deployment.
    - Monitoring.
- Business requirements.
- Data stack.
- Infrastructure.
- Deployment.
- Monitoring.

---

@@ -227,7 +227,7 @@ ML is a collection of methods that allow a computer to:

## Business and ML Objectives (2/5)

### Computational priorities during model development        
### Computational priorities during model development    

- Training is the bottleneck.
- Throughput, the number of cases processed, should be maximised.
@@ -284,8 +284,8 @@ ML is a collection of methods that allow a computer to:
## Designing Data-Intensive Applications

- Many applications today are data-intensive instead of compute-intensive.
    - The limiting factor is data and not computation.
    - Concerns: the amount of data, the complexity of data, and the speed at which it changes.
- The limiting factor is data and not computation.
- Concerns: the amount of data, the complexity of data, and the speed at which it changes.
- ML Systems tend to be embedded in data-intensive applications.
- (Kleppmann, 2017)

@@ -297,14 +297,15 @@ ML is a collection of methods that allow a computer to:

- **Reliability**: The system should continue to perform the correct function at the desired level of performance, even in the face of adversity.

    - May require reporting uncertainty of results.
    - Remove "silent failures": The system should alert the users to unexpected conditions.
    - If all else fails, shut down gracefully (e.g., close connections, log errors, alert downstream processes, etc.)
- May require reporting uncertainty of results.
- Remove "silent failures": The system should alert the users to unexpected conditions.
- If all else fails, shut down gracefully (e.g., close connections, log errors, alert downstream processes, etc.)

- **Scalability**: To ensure the possibility of growth.
    - Increase complexity.
    - Traffic volume or throughput.
    - Model count.

- Increase complexity.
- Traffic volume or throughput.
- Model count.
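
The reliability bullets above (no silent failures, alert on unexpected conditions, shut down gracefully) can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the slides: `UnexpectedInputError`, `predict_with_checks`, `serve`, and the logger name `ml_service` are hypothetical, and the "model" is any callable.

```python
import logging
import math

logger = logging.getLogger("ml_service")  # hypothetical service name


class UnexpectedInputError(ValueError):
    """Raised rather than silently returning a prediction for bad input."""


def predict_with_checks(model, features, expected_len):
    # Validate before predicting so that failures are loud, not silent.
    if len(features) != expected_len:
        raise UnexpectedInputError(
            f"expected {expected_len} features, got {len(features)}"
        )
    if any(math.isnan(x) for x in features):
        raise UnexpectedInputError("features contain NaN")
    return model(features)


def serve(model, requests, expected_len):
    results, alerts = [], []
    try:
        for features in requests:
            try:
                results.append(predict_with_checks(model, features, expected_len))
            except UnexpectedInputError as err:
                # Surface the problem to users instead of hiding it.
                logger.warning("Alerting on unexpected input: %s", err)
                alerts.append(str(err))
    finally:
        # Graceful shutdown hook: runs even if an unhandled error escapes.
        logger.info("Shutting down: closing connections, notifying downstream")
    return results, alerts


# Illustrative run with a stand-in model (the built-in sum).
results, alerts = serve(sum, [[1.0, 2.0], [float("nan"), 1.0], [3.0]], expected_len=2)
```

In this sketch, bad requests are reported and skipped rather than stopping the service; whether to skip, retry, or halt is a design choice that depends on the product's requirements.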

---

@@ -338,13 +339,13 @@ Have things changed that much? (Huyen, 2022) and [CRISP-DM (c. 1999)](https://ww

- The output of an ML model dictates the type of ML problem.
- In general, there are two types of ML tasks
    - Classification.
    - Regression.    
- Classification.
- Regression.    

- A regression model can be framed as a classification model and vice versa.
    - Regression to classification: apply quantisation.
    - Classification to regression: predict the likelihood of a class.
   
- Regression to classification: apply quantisation.
- Classification to regression: predict the likelihood of a class.
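
The two reframings above can be illustrated with a short sketch. The data, the bin edges, and the function names `to_class` and `class_likelihood` are assumptions made for illustration; the logistic function stands in for any model that outputs a class likelihood.

```python
import bisect
import math

# Hypothetical continuous targets, e.g., prices in thousands.
prices = [120.0, 250.0, 410.0, 95.0, 330.0]

# Regression to classification: quantise the target into ordered buckets.
# The bin edges are illustrative assumptions.
bin_edges = [100.0, 200.0, 300.0, 400.0]


def to_class(value, edges):
    # Index of the bucket that contains the value (0 means below the first edge).
    return bisect.bisect_right(edges, value)


price_classes = [to_class(p, bin_edges) for p in prices]


# Classification to regression: predict a continuous likelihood in (0, 1)
# for the positive class instead of a hard label.
def class_likelihood(score):
    return 1.0 / (1.0 + math.exp(-score))


likelihoods = [class_likelihood(s) for s in (-2.0, 0.0, 3.0)]
```

Quantisation discards information within each bucket, so the choice of bin edges should follow from the business question rather than from convenience.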
---

## Framing ML Problems (1/2)
@@ -374,9 +375,9 @@ Have things changed that much? (Huyen, 2022) and [CRISP-DM (c. 1999)](https://ww

- ML requires an objective function to guide the learning process through optimisation.
- In the context of ML:
   
    - Regression tasks generally employ error or accuracy metrics: Root Mean Square Error (RMSE) or Mean Absolute Error (MAE).
    - Classification tasks are generally performed using log loss or cross-entropy.
- Regression tasks generally employ error metrics: Root Mean Square Error (RMSE) or Mean Absolute Error (MAE).
- Classification tasks are generally performed using log loss or cross-entropy.
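
As a rough sketch of the objective functions named above, the following computes RMSE, MAE, and binary log loss in plain Python. The function names and the clipping constant `eps` are assumptions for illustration; in practice a library implementation would normally be used.

```python
import math


def rmse(y_true, y_pred):
    # Root Mean Square Error: penalises large errors more heavily.
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    # Mean Absolute Error: less sensitive to outliers than RMSE.
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Illustrative values only.
regression_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
regression_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
classification_loss = log_loss([1, 0], [0.5, 0.5])
```

With a single error of 2 across three cases, RMSE exceeds MAE, which reflects how squaring weights larger errors.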
---

## Objective Functions (2/2)
30 changes: 17 additions & 13 deletions README.md
@@ -42,6 +42,8 @@ The module covers the following areas:

We will discuss the tools and techniques required to do the above in good order and at scale. However, we will not discuss the inner workings of models, advantages, and so on. We will also not discuss the theoretical aspects of feature engineering or hyperparameter tuning. We will focus on tools and reproducibility.

This module follows the contents of [Designing Machine Learning Systems, by Chip Huyen](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/).

## Learning Outcomes

By the end of this module, participants will be able to:
@@ -76,20 +78,22 @@ Participants are encouraged to engage actively during the learning module. The k

# Schedule

|Live Learning Session |Date        |Topic                             |
| Session |Date        |Topic                             |
|-----|------------|----------------------------------|
|  1  | Tue., Jan. 13, 2025    | ML System Design                 |
|  2  | Wed., Jan. 14, 2025    | Data Engineering Fundamentals    |
|  3  | Thur., Jan. 15, 2025    | Working with Training Data       |
|  --  | Fri., Jan. 16, 2025     | Work Period  |
|  --  | Sat., Jan. 17, 2025     | Work Period  |
| --  | **Sun., Jan. 18, 2025**        | **Submission deadline for Assignment 1 and Quizzes 1-3** |
|  4  | Tue., Jan. 19, 2025     | Feature Engineering              |
|  5  | Wed., Jan. 20, 2025     | Model Development and Evaluation |
|  6  | Thur., Jan. 21, 2025     | Model Explanations and Monitoring|
|  --  | Fri., Jan. 22, 2025     | Work Period  |
|  --  | Sat., Jan. 23, 2025     | Work Period  |
|  --  | **Sun., Jan. 24, 2025**     | **Submission deadline for Assignment 2 and Quizzes 4-6** |
|  1  | Tue., Jan. 13, 2026    | ML System Design                 |
|  2  | Wed., Jan. 14, 2026    | Data Engineering Fundamentals    |
|  3  | Thur., Jan. 15, 2026    | Working with Training Data       |
|  --  | Fri., Jan. 16, 2026     | Work Period  |
|  --  | Sat., Jan. 17, 2026     | Work Period  |
| --  | **Sun., Jan. 18, 2026**        | **Submission deadline for Quizzes 1-3** |
| --  | **Mon., Jan. 19, 2026**        | **Submission deadline for Assignment 1** |
|  4  | Tue., Jan. 20, 2026     | Feature Engineering              |
|  5  | Wed., Jan. 21, 2026     | Model Development and Evaluation |
|  6  | Thur., Jan. 22, 2026     | Model Explanations and Monitoring|
|  --  | Fri., Jan. 23, 2026     | Work Period  |
|  --  | Sat., Jan. 24, 2026     | Work Period  |
|  --  | **Sun., Jan. 25, 2026**     | **Submission deadline for Quizzes 4-6** |
|  --  | **Mon., Jan. 26, 2026**     | **Submission deadline for Assignment 2** |

### Requirements
