
Commit b42b7b8

add pixi support
1 parent 65edded commit b42b7b8

60 files changed (+167 -400 lines changed)

README.md

Lines changed: 16 additions & 280 deletions
@@ -61,7 +61,9 @@ As we have not tested it yet on MacOS and directly on Windows, we are not sure i
 ## Overview
 
 This template provides a standardized project structure for ML initiatives at
-BC, integrating essential MLOps tools:
+BC.
+
+A python package [Gaiaflow](https://pypi.org/project/gaiaflow/) has also been developed for integrating essential MLOps tools:
 - **Apache Airflow**: For orchestrating ML pipelines and workflows
 - **MLflow**: For experiment tracking and model registry
 - **JupyterLab**: For interactive development and experimentation
@@ -79,7 +81,7 @@ your ML project.
 │ (you can either define dags using a config-file (dag-factory)
 │ or use Python scripts.)
 ├── notebooks/ # JupyterLab notebooks
-├── your_package/
+├── your_package/ (If you chose pixi as env manager, this will be suffixed by `src/`
 │ │ (For new projects, it would be good to follow this standardized folder structure.
 │ │ You are of course allowed to add anything you like to it.)
 │ ├── dataloader/ # Your Data loading scripts
@@ -96,229 +98,32 @@ your ML project.
 ├── pyproject.toml # Config file containing your package's build information and its metadata
 ├── .env # Your environment variables that docker compose and python scripts can use (already added to .gitignore)
 ├── .gitignore # Files to ignore when pushing to git.
-└── environment.yml # Libraries required for local mlops and your project
+└── environment.yml # Libraries required for local mlops and your project (if pixi is used, this will not be present)
 ```
 
-
-## MLOps Components
-
-Before you get started, let's explore the tools that we are using for this
-standardized MLOps framework
-
-### 0. Cookiecutter
-Purpose: Project scaffolding and template generation
-
-- Provides a standardized way to create ML projects with predefined structures.
-- Ensures consistency across different ML projects within BC
-
-
-### 1. Apache Airflow
-
-Purpose: Workflow orchestration
-
-- Manages and schedules data pipelines.
-- Automates end-to-end ML workflows, including data ingestion, training, deployment and re-training.
-- Provides a user-friendly web interface for tracking task execution's status.
-
-#### Airflow UI
-
-https://github.com/user-attachments/assets/b7a76c27-2f38-489f-9798-d0af4ac7619b
-
-- **DAGs (Directed Acyclic Graphs)**: A workflow representation in Airflow. You
-can enable, disable, and trigger DAGs from the UI.
-- **Graph View**: Visual representation of task dependencies.
-- **Tree View**: Displays DAG execution history over time.
-- T**ask Instance**: A single execution of a task in a DAG.
-- **Logs**: Each task's execution details and errors.
-- **Code View**: Shows the Python code of a DAG.
-- **Trigger DAG**: Manually start a DAG run.
-- **Pause DAG**: Stops automatic DAG execution.
-
-Common Actions
-
-- **Enable a DAG**: Toggle the On/Off button.
-- **Manually trigger a DAG**: Click Trigger DAG ▶️.
-- **View logs**: Click on a task instance and select Logs.
-- **Restart a failed task**: Click Clear to rerun a specific task.
-
-### 2. MLflow
-
-Purpose: Experiment tracking and model management
-
-- Tracks and records machine learning experiments, including hyperparameters, performance metrics, and model artifacts.
-- Facilitates model versioning and reproducibility.
-- Supports multiple deployment targets, including cloud platforms, Kubernetes, and on-premises environments.
-
-#### MLFlow UI
-
-https://github.com/user-attachments/assets/5c639c34-cba2-4682-a2ed-6a854e9386c1
-
-- **Experiments**: Group of runs tracking different versions of ML models.
-- **Runs**: A single execution of an ML experiment with logged parameters,
-metrics, and artifacts.
-- **Parameters**: Hyperparameters or inputs logged during training.
-- **Metrics**: Performance indicators like accuracy or loss.
-- **Artifacts**: Files such as models, logs, or plots.
-- **Model Registry**: Centralized storage for trained models with versioning.
-
-Common Actions
-
-- **View experiment runs**: Go to Experiments > Select an experiment
-- **Compare runs**: Select multiple runs and click Compare.
-- **View parameters and metrics**: Click on a run to see details.
-- **View registered model**: Under Artifacts, select a model and click Register
-Model.
-
-### 3. JupyterLab
-
-Purpose: Interactive development environment
-
-- Provides an intuitive and interactive web-based interface for exploratory data analysis, visualization, and model development.
-
-### 4. MinIO
-
-Purpose: Object storage for ML artifacts
-
-- Acts as a cloud-native storage solution for datasets and models.
-- Provides an S3-compatible API for seamless integration with ML tools.
-
-### 5. Minikube
-
-Purpose: Local Kubernetes cluster for development & testing
-
-- Allows you to run a single-node Kubernetes cluster locally.
-- Simulates a production-like environment to test Airflow DAGs end-to-end.
-- Great for validating KubernetesExecutor, and Dockerized task behavior before deploying to a real cluster.
-- Mimics production deployment without the cost or risk of real cloud infrastructure.
-
-
 ## Getting Started
 
 Please make sure that you install the following from the links provided as they
 have been tried and tested.
 
-If you face any issues, please check out the [troubleshooting section](#troubleshooting)
-
+If you face any issues, please let us know.
 
 ---
 ### Prerequisites
 
-> **Note:** These steps are required only once during setup. You may need to update individual components later, but you won’t need to repeat the full installation process.
-
-- Docker and Docker Compose
 - [Mamba](https://github.com/conda-forge/miniforge) – Please make sure you install **Python 3.12**, as this repository has been tested with that version.
-- [Minikube on Linux](https://minikube.sigs.k8s.io/docs/start/?arch=%2Flinux%2Fx86-64%2Fstable%2Fbinary+download)
-- [Minikube on Windows](https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download)
-
----
-
-#### Docker and Docker Compose Plugin Installation
-
-**For Linux users:** Follow the steps in the official Docker guide:
-https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
-
-**For Windows users:** Follow the steps in the official Docker Desktop guide:
-https://docs.docker.com/desktop/setup/install/windows-install/
-
-- On Windows, make sure to use the **WSL2 version** in the system requirements.
-- This installation will also include the **Docker Compose plugin**.
-
-Verify the installation by running:
-
-docker --version
-docker compose version
-
-Expected output will look similar to:
-
-Docker version 27.5.1, build 9f9e405
-Docker Compose version v2.32.4
-
-If you see something like the above, Docker is successfully installed.
-
----
-
-#### Install WSL2 (Windows only)
-
-Follow the official Microsoft instructions:
-https://learn.microsoft.com/en-us/windows/wsl/install
-
-Run the following command in **PowerShell (Admin mode):**
-
-wsl --install
-
-After installation, log in to Ubuntu with:
-
-wsl.exe -d Ubuntu
-
-
-NOTE: If there are any issues installing WSL2, see if this guide helps,
-if not contact us.
-https://allthings.how/how-to-install-virtual-machine-platform-in-optional-windows-features-on-windows-11/
-
----
-
-#### Install Mamba (Miniforge) inside WSL2 / Linux
-
-Follow instructions here:
-https://github.com/conda-forge/miniforge
-
-Run inside your terminal:
-
-curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
-
-bash Miniforge3-$(uname)-$(uname -m).sh
+or
+- [Pixi](https://pixi.prefix.dev/latest/installation/) (We recommend using this)
 ---
 
-#### Install Minikube inside WSL2 / Linux
-
-Official guide:
-https://minikube.sigs.k8s.io/docs/start/?arch=%2Flinux%2Fx86-64%2Fstable%2Fbinary+download
-
-Run inside your terminal:
-
-curl -LO https://github.com/kubernetes/minikube/releases/latest/download/minikube-linux-amd64
-sudo install minikube-linux-amd64 /usr/local/bin/minikube
-rm minikube-linux-amd64
-
----
 
 #### Verify Installations
 
 Inside your terminal (Linux or WSL2), check:
 
-docker # should print Docker help page
-minikube # should print Minikube help page
 mamba # should print Mamba help page
-ls -la /var/run/docker.sock # should print socket permissions
-
-If `/var/run/docker.sock` does not appear or has wrong permissions, adjust Docker Desktop settings (Windows only):
-- **Settings → General → Use WSL**
-- **Settings → Resources → WSL Integration → Enable Ubuntu**
-
----
-
-#### Configure Docker Permissions inside WSL2
-
-Add your user to the docker group:
-
-sudo usermod -aG docker $USER
-
-Apply the group changes immediately:
-
-newgrp docker
-
-Alternatively, log out and back into your terminal session:
-
-exit
-wsl -d Ubuntu-20.04 # Windows only
-
----
-
-#### Fix Docker Socket Permissions (if needed)
-
-If necessary, run:
-
-sudo chmod 777 /var/run/docker.sock
+or
+pixi # should print pixi help page
 
 ---
 
@@ -327,94 +132,25 @@ Once the pre-requisites are done, you can go ahead with the project creation:
 
 1. Create a separate environment for cookiecutter
 ```bash
-mamba create -n cc cookiecutter ruamel.yaml
+mamba create -n cc cookiecutter ruamel.yaml
 mamba activate cc
 ```
 
 2. Generate the project from template:
 ```bash
-cookiecutter https://github.com/bcdev/gaiaflow
+cookiecutter https://github.com/bcdev/gaiaflow-cookiecutter
 ```
 
 When prompted for input, enter the details requested. If you dont provide any
 input for a given choice, the first choice from the list is taken as the default.
 
-Once the project is created, please read the [user guide](https://bcdev.github.io/gaiaflow/dev_guide/).
+3. (Optional) - If you wish to use Gaiaflow dockerized MLOps services
+(Airflow, MLFlow, Minio) please follow the steps
+[here](https://github.com/bcdev/gaiaflow). Once gaiaflow is installed,
+please read the [user guide](https://bcdev.github.io/gaiaflow/dev_guide/).
 
 ---
 
-
-## Troubleshooting
-0. If you are windows, please use the `miniforge prompt` commandline.
-
-1. If you face issue like `Docker Daemon not started`, start it using:
-```bash
-sudo systemctl start docker
-```
-and try the docker commands again in a new terminal.
-
-
-2. If you face an issue as follows:
-`Got permission denied while trying to connect to the Docker daemon socket at
-unix:///var/run/docker.sock: `,
-do the following
-```bash
-sudo chmod 666 /var/run/docker.sock
-```
-and try the docker commands again in a new terminal.
-
-
-3. If you face an issue like
-`Cannot connect to the Docker daemon at unix:///home//.docker/desktop/docker.sock.
-Is the docker daemon running?`,
-it is likely because of you have two contexts of docker running.
-
-To view the docker contexts,
-```bash
-docker context ls
-```
-This will show the list of docker contexts. Check if default is enabled (it
-should have a * beside it)
-If not, you might probably have desktop as your context enabled.
-To confirm which context you are in:
-```bash
-docker context show
-```
-
-To use the default context, do this:
-```bash
-docker context use default
-```
-
-Check for the following file:
-```bash
-cat ~/.docker/config.json
-```
-If it is empty, all good, if not, it might be something like this:
-```
-{
-"auths": {},
-"credsStore": "desktop"
-}
-```
-Completely move this file away from this location or delete it and try running
-docker again.
-
-4. If you face some permissions issues on some files like `Permission Denied`,
-as a workaround, please use this and let us know so that we can update this
-repo.
-```bash
-sudo chmod 666 <your-filename>
-```
-
-If you face any other problems not mentioned above, please reach out to us.
-
-
 ## Acknowledgments
 
 - [Cookiecutter](https://github.com/cookiecutter/cookiecutter)
-- [Apache Airflow](https://airflow.apache.org/)
-- [MLflow](https://mlflow.org/)
-- [Minio](https://min.io/docs/minio/container/index.html)
-- [JupyterLab](https://jupyterlab.readthedocs.io/)
-- [Minikube](https://minikube.sigs.k8s.io/docs/)
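The updated "Verify Installations" step now accepts either `mamba` or `pixi` on the PATH. If you want to script that check, a minimal Python sketch could look like the following. `available_env_managers` is a hypothetical helper and is not part of the template or the Gaiaflow package.

```python
import shutil


def available_env_managers(candidates=("pixi", "mamba")):
    """Return the candidates whose executables resolve on PATH.

    Hypothetical helper: the order of ``candidates`` is preserved, so pixi
    (the manager the updated README recommends) is reported first when both
    are installed.
    """
    return [name for name in candidates if shutil.which(name) is not None]


if __name__ == "__main__":
    found = available_env_managers()
    if found:
        print(f"Found {', '.join(found)}; preferred: {found[0]}")
    else:
        print("Neither pixi nor mamba is on PATH; install one of them first.")
```

`shutil.which` only confirms that an executable resolves. Running `pixi` or `mamba` by hand, as the README suggests, is still the more thorough check.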

cookiecutter.json

Lines changed: 6 additions & 3 deletions
@@ -1,5 +1,5 @@
 {
-" ": "\n\n\n ______ _______ _____ _______ _______ _____ _ _ _\n | ____ |_____| | |_____| |______ | | | | | |\n |_____| | | __|__ | | | |_____ |_____| |__|__|\n\n\n\n\nGaiaFlow is a ML project template that helps you create standardized projects across BC and also providing you with a MLOps framework (currently local) to streamline your ML projects.\n\nIn this Cookiecutter ML project template, you will get the following questions.\n\nProject Name: Please provide your project name (only spaces, dots, underscores or dashes special characters allowed)\n\nProject Description: A small description of your project.\n\nYour name and email address: For adding it to the python package metadata.\n\nShow examples: Do you want to see the out-of-the-box airflow examples along with an example ML project working end-to-end? These examples would be visible in the Airflow UI. (Highly recommeded for first time users!!)\n\nFolder name: By default, we will provide you with a folder name based on your project name. If you don't like it, you can change it in this option.\n\nPackage Name: Please provide a package name where you will develop your project. It should be different than the folder name.\n\n[Please press enter to continue]",
+" ": "\n\n\n ______ _______ _____ _______ _______ _____ _ _ _\n | ____ |_____| | |_____| |______ | | | | | |\n |_____| | | __|__ | | | |_____ |_____| |__|__|\n\n\n\n\nGaiaFlow is a ML project template that helps you create standardized projects across BC and also providing you with a MLOps framework (currently local) to streamline your ML projects.\n\nIn this Cookiecutter ML project template, you will get the following questions.\n\nProject Name: Please provide your project name (only spaces, dots, underscores or dashes special characters allowed)\n\nProject Description: A small description of your project.\n\nYour name and email address: For adding it to the python package metadata.\n\nShow examples: Do you want to see the out-of-the-box airflow examples along with an example ML project working end-to-end? These examples would be visible in the Airflow UI. (Highly recommeded for first time users!!)\n\nEnvironment Manager: Please choose which python environment manager you would like to use for your project. We recommend using pixi, which is the default.\n\nPackage Name: Please provide a package name which you will develop in this project.\n\n[Please press enter to continue]",
 "project_name": "Enter the name of your ML Project",
 "project_description": "A short description of the project",
 "author_name": "Your Name",
@@ -8,6 +8,9 @@
 "yes",
 "no"
 ],
-"folder_name": "{{ cookiecutter.project_name.lower().replace(' ', '_').replace('-', '_').replace('.', '_') }}",
-"package_name": "Enter your package name (should be different from folder name, only underscores allowed)"
+"environment_manager": [
+"pixi",
+"conda"
+],
+"package_name": "Enter your package name (only underscores allowed)"
 }
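The new `environment_manager` entry is a list. The README states that when no input is given for a choice, the first entry is taken as the default, so pressing enter selects `pixi`. The sketch below is a simplified model of that rule: it skips the Jinja rendering that cookiecutter applies to string values. `resolve_context` is a hypothetical name, and rejecting an override that is not among the listed choices is this sketch's own behaviour.

```python
import json
from typing import Optional

# Abbreviated copy of the new cookiecutter.json. The " " banner key, the
# author fields and the yes/no example choice are omitted here.
COOKIECUTTER_JSON = """
{
  "project_name": "Enter the name of your ML Project",
  "project_description": "A short description of the project",
  "environment_manager": ["pixi", "conda"],
  "package_name": "Enter your package name (only underscores allowed)"
}
"""


def resolve_context(context: dict, overrides: Optional[dict] = None) -> dict:
    """Pick a value for every variable as if the user pressed enter at each prompt.

    List values are treated as choice variables: the first entry is the
    default, and an override must be one of the listed entries. Any other
    value is used as-is unless overridden.
    """
    overrides = overrides or {}
    resolved = {}
    for key, value in context.items():
        if isinstance(value, list):
            chosen = overrides.get(key, value[0])
            if chosen not in value:
                raise ValueError(f"{key!r} must be one of {value}, got {chosen!r}")
            resolved[key] = chosen
        else:
            resolved[key] = overrides.get(key, value)
    return resolved


context = json.loads(COOKIECUTTER_JSON)
print(resolve_context(context)["environment_manager"])  # pixi
print(resolve_context(context, {"environment_manager": "conda"})["environment_manager"])  # conda
```

With the real CLI, extra context can be passed as `key=value` arguments after the template, for example `cookiecutter https://github.com/bcdev/gaiaflow-cookiecutter environment_manager=conda`. As far as we know, this pre-selects that value at the prompt rather than bypassing it.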

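According to the updated README, choosing pixi means `environment.yml` is not generated and the package directory gets a `src/` level, which we read as a `src/` layout. A common way for a cookiecutter template to make its layout conditional is a post-generation hook (`hooks/post_gen_project.py`), which cookiecutter runs inside the freshly generated project directory. The sketch below assumes that approach. This commit may achieve the same result differently, for example with templated directories. `prune_for_env_manager` is a hypothetical name, and it takes its inputs as parameters so it can run outside cookiecutter.

```python
import shutil
import tempfile
from pathlib import Path


def prune_for_env_manager(project_dir: Path, package_name: str, env_manager: str) -> None:
    """Hypothetical post-generation step adjusting the layout to the chosen manager.

    In a real hooks/post_gen_project.py, env_manager would be the rendered
    "{{ cookiecutter.environment_manager }}" and project_dir would be the
    current working directory, which cookiecutter sets to the generated project.
    """
    if env_manager == "pixi":
        # pixi reads its environment from pixi.toml or a [tool.pixi] table in
        # pyproject.toml, so the conda environment file is removed.
        env_file = project_dir / "environment.yml"
        if env_file.exists():
            env_file.unlink()
        # Move the package one level down into src/.
        src_dir = project_dir / "src"
        src_dir.mkdir(exist_ok=True)
        shutil.move(str(project_dir / package_name), str(src_dir / package_name))
    elif env_manager != "conda":
        raise ValueError(f"unsupported environment manager: {env_manager!r}")
    # For conda, the generated layout is left unchanged.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "my_pkg").mkdir()
        (root / "environment.yml").write_text("name: demo\n")
        prune_for_env_manager(root, "my_pkg", "pixi")
        print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
        # ['src', 'src/my_pkg']
```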