# Building and Using an MLOps Stack with ZenML

[](https://pypi.org/project/zenml/)

The purpose of this repository is to demonstrate how [ZenML](https://github.com/zenml-io/zenml) enables your machine
learning projects in a multitude of ways:

- By offering you a framework or template to develop within
- By seamlessly integrating into the tools you love and need
- By allowing you to easily switch orchestrators for your pipelines
- By bringing much-needed Zen into your machine learning

**ZenML** is an extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for
data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions
catered towards ML workflows.

At its core, **ZenML pipelines execute ML-specific workflows** from sourcing data to splitting, preprocessing, and training,
all the way to the evaluation of results and even serving. There are many built-in batteries to support common ML
development tasks. ZenML is not here to replace the great tools that solve these individual problems. Rather, it
**integrates natively with popular ML tooling** and provides a standard abstraction for writing your workflows.

Within this repo we will use ZenML to build pipelines that seamlessly use [Evidently](https://evidentlyai.com/),
[MLflow](https://mlflow.org/), and [Kubeflow Pipelines](https://www.kubeflow.org/), and post
results to our [Discord](https://discord.com/).

[](https://www.youtube.com/watch?v=Ne-dt9tu11g)

_Come watch along as Hamza Tahir, Co-Founder and CTO of ZenML, showcases an early version of this repo
to the MLOps.community._

## :computer: System Requirements

In order to run this demo you need to have some packages installed on your machine.

Currently, this will only run on UNIX systems.

| package | macOS installation | Linux installation |
| ------- | ------------------ | ------------------ |
| docker  | [Docker Desktop for Mac](https://docs.docker.com/desktop/mac/install/) | [Docker Engine for Linux](https://docs.docker.com/engine/install/ubuntu/) |
| kubectl | [kubectl for macOS](https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/) | [kubectl for Linux](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/) |
| k3d     | [Brew installation of k3d](https://formulae.brew.sh/formula/k3d) | [k3d installation for Linux](https://k3d.io/v5.2.2/) |

## :snake: Python Requirements

Once you've got the system requirements figured out, let's jump into the Python packages you need.
Within the Python environment of your choice, run:

```bash
git clone https://github.com/zenml-io/zenfiles
cd nba-pipeline
pip install -r requirements.txt
```

If you are running the `run_pipeline.py` script, you will also need to install some integrations using ZenML:

```bash
zenml integration install evidently -f
zenml integration install mlflow -f
zenml integration install kubeflow -f
```
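
If you want to confirm the installs worked before launching anything, a small pre-flight check can probe for the integrations' Python packages. This is a convenience sketch, not part of the repo; the package names (`evidently`, `mlflow`, `kfp`) are assumptions about what the integrations install:

```python
import importlib.util

# Assumed package names for the Evidently, MLflow, and Kubeflow integrations.
for pkg in ("evidently", "mlflow", "kfp"):
    spec = importlib.util.find_spec(pkg)
    print(pkg, "OK" if spec is not None else "missing")
```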

## :basketball: The Task

A couple of weeks ago, we were looking for a fun project to work on for the next chapter of our ZenHacks. During our
initial discussions, we realized that it would be really great to work with an NBA dataset, as we could quickly get
close to a real-life application like a "3-Pointer Predictor" while simultaneously entertaining ourselves with one
of the trending topics within our team.

As we were building the dataset around a "3-Pointer Predictor", we realized that there is one factor we needed to
take into consideration first: Stephen Curry, the Baby-Faced Assassin. In our opinion, there is no denying that he
changed the way the game is played in the NBA, and we wanted to prove that this was the case first.

That's why our story in this ZenHack will start with a pipeline dedicated to drift detection. As the breakpoint of this
drift, we will be using the famous "Double Bang" game that the Golden State Warriors played against the Oklahoma City
Thunder back in 2016. Following that, we will build a training pipeline which will generate a model that predicts
the number of three-pointers made by a team in a single game, and ultimately, we will use these trained models to
create an inference pipeline for the upcoming matches in the NBA.

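The drift framing above — comparing three-point behavior before and after a fixed breakpoint game — can be sketched with plain Python before reaching for Evidently. This is a hypothetical illustration with made-up numbers, not the repository's drift pipeline:

```python
from datetime import date
from statistics import mean

# Hypothetical per-game three-pointers made, keyed by game date (made-up data).
games = [
    (date(2014, 11, 1), 7.2), (date(2015, 2, 10), 7.8),
    (date(2017, 1, 5), 10.4), (date(2018, 3, 20), 11.1),
]

# Breakpoint: the 2016 "Double Bang" game mentioned above.
breakpoint_date = date(2016, 2, 27)

before = [made for d, made in games if d < breakpoint_date]
after = [made for d, made in games if d >= breakpoint_date]

# A positive shift would be (weak) evidence that the game changed.
shift = mean(after) - mean(before)
print(f"mean 3PM shift across breakpoint: {shift:+.2f}")
```

Evidently does a far more principled version of this comparison (statistical tests per feature); the pipeline in this repo hands it the pre- and post-breakpoint slices.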

## :notebook: Diving into the code

We're ready to go now. You have two options:

### Notebook

You can spin up a step-by-step guide in `Building and Using An MLOPs Stack With ZenML.ipynb`:

```bash
jupyter notebook
```

### Script

You can also directly run the code using the `run_pipeline.py` script:

```bash
python run_pipeline.py drift  # Run one-shot drift pipeline
python run_pipeline.py train  # Run training pipeline
python run_pipeline.py infer  # Run inference pipeline
```
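
A script like `run_pipeline.py` presumably just dispatches on that first argument. A minimal sketch of such a dispatcher — the pipeline functions here are placeholders, not the repository's actual code:

```python
import argparse

# Placeholder pipeline entry points; the real ones assemble ZenML pipelines.
def drift_pipeline():
    return "drift"

def train_pipeline():
    return "train"

def infer_pipeline():
    return "infer"

PIPELINES = {"drift": drift_pipeline, "train": train_pipeline, "infer": infer_pipeline}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run an NBA pipeline")
    parser.add_argument("pipeline", choices=sorted(PIPELINES))
    args = parser.parse_args(argv)
    return PIPELINES[args.pipeline]()

if __name__ == "__main__":
    print(main())
```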

## :rocket: Going from local orchestration to Kubeflow Pipelines

ZenML manages the configuration of the infrastructure where ZenML pipelines are run using ZenML `Stacks`. For now, a stack consists of:

- A metadata store: to store metadata like parameters and artifact URIs.
- An artifact store: to store interim data step outputs.
- An orchestrator: a service that actually kicks off and runs each step of the pipeline.
- An optional container registry: to store Docker images that are created to run your pipeline.
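
Conceptually, a stack is just a named bundle of references to those four components. A rough model of that idea — an illustration only, not ZenML's internal representation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stack:
    name: str
    metadata_store: str
    artifact_store: str
    orchestrator: str
    container_registry: Optional[str] = None  # optional component

# The default local stack needs no registry: steps run in-process.
local_stack = Stack(
    name="local_stack",
    metadata_store="local_metadata_store",
    artifact_store="local_artifact_store",
    orchestrator="local_orchestrator",
)
print(local_stack)
```

Switching orchestrators then amounts to registering a new bundle that swaps one component and reuses the rest — exactly what the commands below do.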

To transition from running our pipelines locally (see diagram above) to running them on Kubeflow Pipelines, we only need to register a new stack:

```bash
zenml container-registry register local_registry --flavor=default --uri=localhost:5000
zenml orchestrator register kubeflow_orchestrator --flavor=kubeflow
zenml stack register local_kubeflow_stack \
    -m local_metadata_store \
    -a local_artifact_store \
    -o kubeflow_orchestrator \
    -c local_registry
```

127 |
| - |
128 |
| -To reduce the amount of manual setup steps, we decided to work with a local Kubeflow Pipelines deployment in this repository (if you're interested in running your ZenML pipelines remotely, check out [our docs](https://docs.zenml.io/component-gallery/orchestrators/kubeflow#how-to-use-it). |
129 |
| - |
130 |
| -For the local setup, our kubeflow stack keeps the existing `local_metadata_store` and `local_artifact_store` but replaces the orchestrator and adds a local container registry (see diagram below). |
131 |
| - |
132 |
| -Once the stack is registered we can activate it and provision resources for the local Kubeflow Pipelines deployment: |
133 |
| - |
134 |
| -```bash |
135 |
| -zenml stack set local_kubeflow_stack |
136 |
| -zenml stack up |
137 |
| -``` |
## :checkered_flag: Cleaning up when you're done

Once you are done running this notebook, you might want to stop all running processes. For this, run the following commands
(this will tear down your `k3d` cluster and the local Docker registry):

```bash
zenml stack set local_kubeflow_stack
zenml stack down -f
```

## :question: FAQ

1. **macOS** When starting the container registry for Kubeflow, I get an error about port 5000 not being available:
   `OSError: [Errno 48] Address already in use`

   Solution: In order for Kubeflow to run, the Docker container registry currently needs to be at port 5000. macOS, however, uses
   port 5000 for the AirPlay Receiver. Here is a guide on how to fix this: [Freeing up port 5000](https://12ft.io/proxy?q=https%3A%2F%2Fanandtripathi5.medium.com%2Fport-5000-already-in-use-macos-monterey-issue-d86b02edd36c).
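
You can also check whether port 5000 is already taken before bringing the stack up. A small stdlib probe — a convenience sketch, not part of this repo:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0

if port_in_use(5000):
    print("Port 5000 is busy -- on macOS, disable the AirPlay Receiver first.")
else:
    print("Port 5000 is free.")
```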