diff --git a/01_materials/slides/01_introduction.pdf b/01_materials/slides/01_introduction.pdf
index ed6f302b4..bb5149f8a 100644
Binary files a/01_materials/slides/01_introduction.pdf and b/01_materials/slides/01_introduction.pdf differ
diff --git a/01_materials/slides/02_data_engineering.pdf b/01_materials/slides/02_data_engineering.pdf
index 94abd5b19..37c39115d 100644
Binary files a/01_materials/slides/02_data_engineering.pdf and b/01_materials/slides/02_data_engineering.pdf differ
diff --git a/01_materials/slides/03_training.pdf b/01_materials/slides/03_training.pdf
index fdcc8f8c5..103b43132 100644
Binary files a/01_materials/slides/03_training.pdf and b/01_materials/slides/03_training.pdf differ
diff --git a/01_materials/slides/04_feature_engineering.pdf b/01_materials/slides/04_feature_engineering.pdf
index bad110ee2..2feab77ac 100644
Binary files a/01_materials/slides/04_feature_engineering.pdf and b/01_materials/slides/04_feature_engineering.pdf differ
diff --git a/01_materials/slides/05_model_development.pdf b/01_materials/slides/05_model_development.pdf
index 10388a446..5a97988a6 100644
Binary files a/01_materials/slides/05_model_development.pdf and b/01_materials/slides/05_model_development.pdf differ
diff --git a/01_materials/slides/06_deployment.pdf b/01_materials/slides/06_deployment.pdf
index 36eb308c2..be4c4729c 100644
Binary files a/01_materials/slides/06_deployment.pdf and b/01_materials/slides/06_deployment.pdf differ
diff --git a/01_materials/slides/07_monitoring.pdf b/01_materials/slides/07_monitoring.pdf
index 581063927..ee6559dbc 100644
Binary files a/01_materials/slides/07_monitoring.pdf and b/01_materials/slides/07_monitoring.pdf differ
diff --git a/01_materials/slides/08_infra_and_org.pdf b/01_materials/slides/08_infra_and_org.pdf
index 6d405c5f8..8fe181f22 100644
Binary files a/01_materials/slides/08_infra_and_org.pdf and b/01_materials/slides/08_infra_and_org.pdf differ
diff --git a/03_instructional_team/markdown_slides/01_introduction.md b/03_instructional_team/markdown_slides/01_introduction.md
index f44d8061a..d9866f8a8 100644
--- a/03_instructional_team/markdown_slides/01_introduction.md
+++ b/03_instructional_team/markdown_slides/01_introduction.md
@@ -18,27 +18,27 @@ $ echo "Data Sciences Institute"
 ## Agenda
 - **1.1 Overview of ML Systems**
-    - When to Use ML
-    - ML in Production
-    - ML vs Traditional Software
+ - When to Use ML
+ - ML in Production
+ - ML vs Traditional Software
 - **1.2 Introduction to ML System Design**
-    - Business and ML Objectives
-    - Requirements of Data-Driven Products
-    - Iterative Process
-    - Framing ML Problems
+ - Business and ML Objectives
+ - Requirements of Data-Driven Products
+ - Iterative Process
+ - Framing ML Problems
 ---
 ## Agenda
 - **1.3 Project Setup**
-    - Introduction.
-    - Repo File Structure.
-    - Git, authorisation, and production pipelines.
-    - VS Code and Git.
-    - Python virtual environments.
-    - Branching Strategies.
-    - Commit Messages.
+ - Introduction.
+ - Repo File Structure.
+ - Git, authorisation, and production pipelines.
+ - VS Code and Git.
+ - Python virtual environments.
+ - Branching Strategies.
+ - Commit Messages.
 ---
@@ -88,9 +88,9 @@ ML is a collection of methods that allow a computer to:
 - ML is used when a task is too complex or impractical to program explicitly.
 - When applied successfully, ML will enable
-    - Greater scale: automation.
-    - Better performance.
-    - Doing things that were not possible before.
+ - Greater scale: automation.
+ - Better performance.
+ - Doing things that were not possible before.
 - ([Image Source](https://www.augmentedstartups.com/blog/overcoming-challenges-in-object-detection-accuracy-speed-and-scalability))
@@ -125,7 +125,7 @@ ML is a collection of methods that allow a computer to:
 - There are patterns to learn, and they are complex.
 - ML solutions are only helpful if there are patterns.
 - An ML model can learn simple patterns, but the cost of applying ML may be unreasonable.
-    
+
 ---
 ## Characteristics of ML Use Cases (2/4)
@@ -180,11 +180,11 @@ ML is a collection of methods that allow a computer to:
 - ML methods are not ML systems: the learning method needs to be applied to data, assessed, tuned, deployed, governed, and so on.
 - ML system design is a system approach to MLOps, i.e., we will consider the system holistically, including
-    - Business requirements.
-    - Data stack.
-    - Infrastructure.
-    - Deployment.
-    - Monitoring.
+ - Business requirements.
+ - Data stack.
+ - Infrastructure.
+ - Deployment.
+ - Monitoring.
 ---
@@ -227,7 +227,7 @@ ML is a collection of methods that allow a computer to:
 ## Business and ML Objectives (2/5)
-### Computational priorities during model development         
+### Computational priorities during model development     
 - Training is the bottleneck.
 - Throughput, the number of cases processed, should be maximised.
@@ -284,8 +284,8 @@ ML is a collection of methods that allow a computer to:
 ## Designing Data-Intensive Applications
 - Many applications today are data-intensive instead of compute-intensive.
-    - The limiting factor is data and not computation.
-    - Concerns: the amount of data, the complexity of data, and the speed at which it changes.
+ - The limiting factor is data and not computation.
+ - Concerns: the amount of data, the complexity of data, and the speed at which it changes.
 - ML Systems tend to be embedded in data-intensive applications.
 - (Kleppmann, 2017)
@@ -297,14 +297,15 @@ ML is a collection of methods that allow a computer to:
 - **Reliability**: The system should continue to perform the correct function at the desired level of performance, even in the face of adversity.
-    - May require reporting uncertainty of results.
-    - Remove "silent failures": The system should alert the users to unexpected conditions.
-    - If all else fails, shut down gracefully (e.g., close connections, log errors, alert downstream processes, etc.)
+ - May require reporting uncertainty of results.
+ - Remove "silent failures": The system should alert the users to unexpected conditions.
+ - If all else fails, shut down gracefully (e.g., close connections, log errors, alert downstream processes, etc.)
 - **Scalability**: To ensure the possibility of growth.
-    - Increase complexity.
-    - Traffic volume or throughput.
-    - Model count.
+
+ - Increase complexity.
+ - Traffic volume or throughput.
+ - Model count.
 ---
@@ -338,13 +339,13 @@ Have things changed that much? (Huyen, 2022) and [CRISP-DM (c. 1999)](https://ww
 - The output of an ML model dictates the type of ML problem.
 - In general, there are two types of ML tasks
-    - Classification.
-    - Regression.     
+ - Classification.
+ - Regression.     
 - A regression model can be framed as a classification model and vice versa.
-    - Regression to classification: apply quantisation.
-    - Classification to regression: predict the likelihood of a class.
-    
+ - Regression to classification: apply quantisation.
+ - Classification to regression: predict the likelihood of a class.
+
 ---
 ## Framing ML Problems (1/2)
@@ -374,9 +375,9 @@ Have things changed that much? (Huyen, 2022) and [CRISP-DM (c. 1999)](https://ww
 - ML requires an objective function to guide the learning process through optimisation.
 - In the context of ML:
-    
-    - Regression tasks generally employ error or accuracy metrics: Root Mean Square Error (RMSE) or Mean Absolute Error (MAE).
-    - Classification tasks are generally performed using log loss or cross-entropy.
+
+ - Regression tasks generally employ error metrics: Root Mean Square Error (RMSE) or Mean Absolute Error (MAE).
+ - Classification tasks are generally performed using log loss or cross-entropy.
 ---
 ## Objective Functions (2/2)
diff --git a/README.md b/README.md
index 588052033..fad870f8c 100644
--- a/README.md
+++ b/README.md
@@ -42,6 +42,8 @@ The module covers the following areas:
 We will discuss the tools and techniques required to do the above in good order and at scale. However, we will not discuss the inner workings of models, advantages, and so on. We will also not discuss the theoretical aspects of feature engineering or hyperparameter tuning. We will focus on tools and reproducibility.
+This module follows the contents of [Desinging Machine Learning Systems, by Chip Huyen](https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/).
+
 ## Learning Outcomes
 By the end of this module, participants will be able to:
@@ -76,20 +78,22 @@ Participants are encouraged to engage actively during the learning module. The k
 # Schedule
-|Live Learning Session |Date        |Topic                             |
+| Session |Date        |Topic                             |
 |-----|------------|----------------------------------|
-|  1  | Tue., Jan. 13, 2025    | ML System Design                 |
-|  2  | Wed., Jan. 14, 2025    | Data Engineering Fundamentals    |
-|  3  | Thur., Jan. 15, 2025    | Working with Training Data       |
-|  --  | Fri., Jan. 16, 2025     | Work Period  |
-|  --  | Sat., Jan. 17, 2025     | Work Period  |
-| --  | **Sun., Jan. 18, 2025**        | **Submission deadline for Assignment 1 and Quizzes 1-3** |
-|  4  | Tue., Jan. 19, 2025     | Feature Engineering              |
-|  5  | Wed., Jan. 20, 2025     | Model Development and Evaluation |
-|  6  | Thur., Jan. 21, 2025     | Model Explanations and Monitoring|
-|  --  | Fri., Jan. 22, 2025     | Work Period  |
-|  --  | Sat., Jan. 23, 2025     | Work Period  |
-|  --  | **Sun., Jan. 24, 2025**     | **Submission deadline for Assignment 2 and Quizzes 4-6** |
+|  1  | Tue., Jan. 13, 2026    | ML System Design                 |
+|  2  | Wed., Jan. 14, 2026    | Data Engineering Fundamentals    |
+|  3  | Thur., Jan. 15, 2026    | Working with Training Data       |
+|  --  | Fri., Jan. 16, 2026     | Work Period  |
+|  --  | Sat., Jan. 17, 2026     | Work Period  |
+| --  | **Sun., Jan. 18, 2026**        | **Submission deadline for Quizzes 1-3** |
+| --  | **Mon., Jan. 19, 2026**        | **Submission deadline for Assignment 1** |
+|  4  | Tue., Jan. 20, 2026     | Feature Engineering              |
+|  5  | Wed., Jan. 21, 2026     | Model Development and Evaluation |
+|  6  | Thur., Jan. 22, 2026     | Model Explanations and Monitoring|
+|  --  | Fri., Jan. 23, 2026     | Work Period  |
+|  --  | Sat., Jan. 24, 2026     | Work Period  |
+|  --  | **Sun., Jan. 25, 2026**     | **Submission deadline for Quizzes 4-6** |
+|  --  | **Mon., Jan. 26, 2026**     | **Submission deadline for Assignment 2** |
 ### Requirements