6 changes: 6 additions & 0 deletions topic/machine-learning/classification-automl/.gitignore
@@ -0,0 +1,6 @@
.env
.env.local
*.log
catboost_info/
mlruns/
model/
71 changes: 71 additions & 0 deletions topic/machine-learning/classification-automl/README.md
@@ -0,0 +1,71 @@
# AutoML with PyCaret and CrateDB

This folder provides examples, tutorials and runnable code on how to use CrateDB
with PyCaret to automatically create high-performing machine learning models.

The tutorials and examples focus on being easy to understand and use. They
should be a good starting point for your own projects.

## About PyCaret

[PyCaret] is a Python library that makes it easy to create and train machine
learning models in Python. Its outstanding feature is its AutoML capabilities.

PyCaret is a high-level interface on top of popular machine learning
frameworks, among them scikit-learn, XGBoost, Ray, LightGBM and many more.

PyCaret provides a simple low-code interface to these libraries, so you can
use them without needing to know the details of the underlying model
architectures and parameters.

The general concept of PyCaret, and of AutoML in general, is rather simple:
take the raw data, split it into a training set and a test set, and train a
number of different models on the training set. The models are then evaluated
on the test set, and the best-performing model is selected. The same process is
repeated to tune the hyperparameters of the best models. This step is also
highly empirical: the parameters are changed, and the model is retrained and
evaluated again, until the best-performing parameters are found.
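
The select-the-best loop described above can be sketched in a few lines of
plain Python. This is only an illustration of the idea, not PyCaret's
implementation: the synthetic data, the three toy models, and all function
names (`make_data`, `split`, `compare`, ...) are assumptions made up for this
sketch.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic 1-D binary data: y = 1 when x > 0.5, with about 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        rows.append((x, y))
    return rows


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into a training and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Each hypothetical "model" is a fit function that returns a predict function.
def fit_majority(train):
    """Baseline: always predict the most frequent training label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label


def fit_threshold(train):
    """Decision stump: pick the cut-off with the most correct training predictions."""
    def hits(t):
        return sum(int(x > t) == y for x, y in train)

    best_t = max((x for x, _ in train), key=hits)
    return lambda x: int(x > best_t)


def fit_nearest_neighbor(train):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(predict, rows):
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def compare(candidates, train, test):
    """Fit every candidate on the training set and score it on the test set."""
    scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


train, test = split(make_data())
candidates = {
    "majority": fit_majority,
    "threshold": fit_threshold,
    "nearest_neighbor": fit_nearest_neighbor,
}
best, scores = compare(candidates, train, test)
print("best model:", best)
print("test accuracy per model:", scores)
```

An AutoML library runs the same kind of loop over real estimators and adds
cross-validation, preprocessing and reporting on top.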

Common algorithms for running all these experiments are, among others, grid
search, random search and Bayesian search. For a quick introduction to these
methods, see this [Introduction to hyperparameter tuning].
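
To make the difference between the first two strategies concrete, here is a
minimal sketch in plain Python. The `validation_score` function is a
hypothetical stand-in for "train a model and score it", and the parameter
names and ranges are assumptions chosen for illustration. Bayesian search is
not shown, since it needs a surrogate model of the score.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical objective; by construction it peaks at (0.1, 6) with score 1.0."""
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(space, n_trials, seed=0):
    """Sample n_trials random points from the given ranges and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "depth": rng.randint(*space["depth"]),  # both bounds inclusive
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid_params, grid_score = grid_search(
    {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6, 8]}
)
random_params, random_score = random_search(
    {"learning_rate": (0.01, 0.3), "depth": (2, 8)}, n_trials=12
)
print("grid search:  ", grid_params, grid_score)
print("random search:", random_params, random_score)
```

Both runs spend 12 evaluations here. Grid search only ever tries the listed
values, while random search can land anywhere inside the ranges.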

In the past, all these trial-and-error experiments had to be done manually,
which is a tedious and time-consuming task. PyCaret automates this process
and provides a simple interface to execute all these experiments in a
straightforward way. The notebook in this folder shows how.

## What's inside

[![Made with Jupyter](https://img.shields.io/badge/Made%20with-Jupyter-orange?logo=Jupyter)](https://jupyter.org/try) [![Made with Markdown](https://img.shields.io/badge/Made%20with-Markdown-1f425f.svg?logo=Markdown)](https://commonmark.org)

This folder provides guidelines and runnable code to get started with [PyCaret]
and [CrateDB].

- [README.md](README.md): The file you are currently reading contains a
  walkthrough on how to get started with the PyCaret framework and CrateDB,
  and guides you to corresponding example programs that train different
  models.

- [requirements.txt](requirements.txt): Lists the dependencies required to
  run the example programs.

- `automl_classification_with_pycaret.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](automl_classification_with_pycaret.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/feature%2Fpycaret_example/topic/machine-learning/classification-automl/automl_classification_with_pycaret.ipynb)

  This notebook explores the PyCaret framework and shows how to use it to
  train different classification models, using a user churn dataset as an
  example. The notebook demonstrates how to use PyCaret to automatically train
  and benchmark a multitude of models and, at the end, select the
  best-performing model. The notebook also shows how to use CrateDB as storage
  for both the raw data and the experiment tracking and model registry data.

- Accompanying the Jupyter Notebook, there is also a basic variant of the
  above example as a plain Python program,
  [automl_classification_with_pycaret.py](automl_classification_with_pycaret.py).

[Pycaret]: https://github.com/pycaret/pycaret
[CrateDB]: https://github.com/crate/crate
[Introduction to hyperparameter tuning]: https://medium.com/analytics-vidhya/comparison-of-hyperparameter-tuning-algorithms-grid-search-random-search-bayesian-optimization-5326aaef1bd1
