# Building and Using an MLOps Stack with ZenML

[](https://pypi.org/project/zenml/)

The purpose of this repository is to demonstrate how [ZenML](https://github.com/zenml-io/zenml) enables your machine
learning projects in a multitude of ways:

- By offering you a framework or template to develop within
- By seamlessly integrating into the tools you love and need
- By allowing you to easily switch orchestrators for your pipelines
- By bringing much-needed Zen into your machine learning

**ZenML** is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for
data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions
catered towards ML workflows.

At its core, **ZenML pipelines execute ML-specific workflows** from sourcing data to splitting, preprocessing, and training,
all the way to the evaluation of results and even serving. There are many built-in batteries to support common ML
development tasks. ZenML is not here to replace the great tools that solve these individual problems. Rather, it
**integrates natively with popular ML tooling** and provides a standard abstraction for writing your workflows.

Within this repo we will use ZenML to build pipelines that seamlessly use [Evidently](https://evidentlyai.com/),
[MLflow](https://mlflow.org/), and [Kubeflow Pipelines](https://www.kubeflow.org/), and post
results to our [Discord](https://discord.com/).

[](https://www.youtube.com/watch?v=Ne-dt9tu11g)

_Come watch along as Hamza Tahir, Co-Founder and CTO of ZenML, showcases an early version of this repo
to the MLOps.community._

## :computer: System Requirements

In order to run this demo you need to have some packages installed on your machine.

Currently, this will only run on UNIX systems.

| package | macOS installation | Linux installation |
| ------- | ------------------ | ------------------ |
| docker  | [Docker Desktop for Mac](https://docs.docker.com/desktop/mac/install/) | [Docker Engine for Linux](https://docs.docker.com/engine/install/ubuntu/) |
| kubectl | [kubectl for macOS](https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/) | [kubectl for Linux](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/) |
| k3d     | [Brew installation of k3d](https://formulae.brew.sh/formula/k3d) | [k3d installation for Linux](https://k3d.io/v5.2.2/) |

## :snake: Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need.
Within the Python environment of your choice, run:

```bash
git clone https://github.com/zenml-io/zenfiles
cd nba-pipeline
pip install -r requirements.txt
```

If you are running the `run_pipeline.py` script, you will also need to install some integrations using ZenML:

```bash
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
```
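
If you want to confirm the installs worked before launching anything, a small pre-flight check can probe for the integrations' Python packages. This is a convenience sketch, not part of the repo; the package names (`evidently`, `mlflow`, `kfp`) are assumptions about what the integrations install:

```python
import importlib.util

# Assumed package names for the Evidently, MLflow, and Kubeflow integrations.
for pkg in ("evidently", "mlflow", "kfp"):
    spec = importlib.util.find_spec(pkg)
    print(pkg, "OK" if spec is not None else "missing")
```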

## :basketball: The Task

A couple of weeks ago, we were looking for a fun project to work on for the next chapter of our ZenHacks. During our
initial discussions, we realized that it would be really great to work with an NBA dataset, as we could quickly get
close to a real-life application like a "3-Pointer Predictor" while simultaneously entertaining ourselves with one
of the trending topics within our team.

As we were building the dataset around a "3-Pointer Predictor", we realized that there is one factor we needed to
take into consideration first: Stephen Curry, the Baby-Faced Assassin. In our opinion, there is no denying that he
changed the way the game is played in the NBA, and we wanted to prove that this was the case first.

That's why our story in this ZenHack will start with a pipeline dedicated to drift detection. As the breakpoint of this
drift, we will be using the famous "Double Bang" game that the Golden State Warriors played against the Oklahoma City
Thunder back in 2016. Following that, we will build a training pipeline which will generate a model that predicts
the number of three-pointers made by a team in a single game, and ultimately, we will use these trained models to
create an inference pipeline for the upcoming matches in the NBA.

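The drift framing above — comparing three-point behavior before and after a fixed breakpoint game — can be sketched with plain Python before reaching for Evidently. This is a hypothetical illustration with made-up numbers, not the repository's drift pipeline:

```python
from datetime import date
from statistics import mean

# Hypothetical per-game three-pointers made, keyed by game date (made-up data).
games = [
    (date(2014, 11, 1), 7.2), (date(2015, 2, 10), 7.8),
    (date(2017, 1, 5), 10.4), (date(2018, 3, 20), 11.1),
]

# Breakpoint: the 2016 "Double Bang" game mentioned above.
breakpoint_date = date(2016, 2, 27)

before = [made for d, made in games if d < breakpoint_date]
after = [made for d, made in games if d >= breakpoint_date]

# A positive shift would be (weak) evidence that the game changed.
shift = mean(after) - mean(before)
print(f"mean 3PM shift across breakpoint: {shift:+.2f}")
```

Evidently does a far more principled version of this comparison (statistical tests per feature); the pipeline in this repo hands it the pre- and post-breakpoint slices.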

## :notebook: Diving into the code

We're ready to go now. You have two options:

### Notebook

You can spin up a step-by-step guide in `Building and Using An MLOPs Stack With ZenML.ipynb`:

```bash
jupyter notebook
```

### Script

You can also directly run the code using the `run_pipeline.py` script:

```bash
python run_pipeline.py drift  # Run one-shot drift pipeline
python run_pipeline.py train  # Run training pipeline
python run_pipeline.py infer  # Run inference pipeline
```
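
A script like `run_pipeline.py` presumably just dispatches on that first argument. A minimal sketch of such a dispatcher — the pipeline functions here are placeholders, not the repository's actual code:

```python
import argparse

# Placeholder pipeline entry points; the real ones assemble ZenML pipelines.
def drift_pipeline():
    return "drift"

def train_pipeline():
    return "train"

def infer_pipeline():
    return "infer"

PIPELINES = {"drift": drift_pipeline, "train": train_pipeline, "infer": infer_pipeline}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run an NBA pipeline")
    parser.add_argument("pipeline", choices=sorted(PIPELINES))
    args = parser.parse_args(argv)
    return PIPELINES[args.pipeline]()

if __name__ == "__main__":
    print(main())
```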

## :rocket: Going from local orchestration to Kubeflow Pipelines

ZenML manages the configuration of the infrastructure where ZenML pipelines are run using ZenML `Stacks`. For now, a stack consists of:

- A metadata store: to store metadata like parameters and artifact URIs.
- An artifact store: to store interim data step outputs.
- An orchestrator: a service that actually kicks off and runs each step of the pipeline.
- An optional container registry: to store Docker images that are created to run your pipeline.
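
Conceptually, a stack is just a named bundle of references to those four components. A rough model of that idea — an illustration only, not ZenML's internal representation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stack:
    name: str
    metadata_store: str
    artifact_store: str
    orchestrator: str
    container_registry: Optional[str] = None  # optional component

# The default local stack needs no registry: steps run in-process.
local_stack = Stack(
    name="local_stack",
    metadata_store="local_metadata_store",
    artifact_store="local_artifact_store",
    orchestrator="local_orchestrator",
)
print(local_stack)
```

Switching orchestrators then amounts to registering a new bundle that swaps one component and reuses the rest — exactly what the commands below do.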

To transition from running our pipelines locally (see diagram above) to running them on Kubeflow Pipelines, we only need to register a new stack:

```bash
zenml container-registry register local_registry --flavor=default --uri=localhost:5000
zenml orchestrator register kubeflow_orchestrator --flavor=kubeflow
zenml stack register local_kubeflow_stack \
    -m local_metadata_store \
    -a local_artifact_store \
    -o kubeflow_orchestrator \
    -c local_registry
```

127 |
| - |
128 |
| -To reduce the amount of manual setup steps, we decided to work with a local Kubeflow Pipelines deployment in this repository (if you're interested in running your ZenML pipelines remotely, check out [our docs](https://docs.zenml.io/component-gallery/orchestrators/kubeflow#how-to-use-it). |
129 |
| - |
130 |
| -For the local setup, our kubeflow stack keeps the existing `local_metadata_store` and `local_artifact_store` but replaces the orchestrator and adds a local container registry (see diagram below). |
131 |
| - |
132 |
| -Once the stack is registered we can activate it and provision resources for the local Kubeflow Pipelines deployment: |
133 |
| - |
134 |
| -```bash |
135 |
| -zenml stack set local_kubeflow_stack |
136 |
| -zenml stack up |
137 |
| -``` |
## :checkered_flag: Cleaning up when you're done

Once you are done running this notebook, you might want to stop all running processes. For this, run the following commands
(this will tear down your `k3d` cluster and the local Docker registry):

```bash
zenml stack set local_kubeflow_stack
zenml stack down -f
```

## :question: FAQ

1. **macOS** When starting the container registry for Kubeflow, I get an error about port 5000 not being available:
   `OSError: [Errno 48] Address already in use`

   Solution: In order for Kubeflow to run, the Docker container registry currently needs to be at port 5000. macOS, however, uses
   port 5000 for the AirPlay Receiver. Here is a guide on how to fix this: [Freeing up port 5000](https://12ft.io/proxy?q=https%3A%2F%2Fanandtripathi5.medium.com%2Fport-5000-already-in-use-macos-monterey-issue-d86b02edd36c).
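
You can also check whether port 5000 is already taken before bringing the stack up. A small stdlib probe — a convenience sketch, not part of this repo:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0

if port_in_use(5000):
    print("Port 5000 is busy -- on macOS, disable the AirPlay Receiver first.")
else:
    print("Port 5000 is free.")
```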