[DRAFT] ML: AutoML classification example notebook, demo program, and Markdown documentation #122
f765808 to d5a9e16
amotl left a comment
I've added another suggestion about "naming things", with regard to the folder structure.
machine-learning/readme.md
Outdated
With 338397d, I've refactored the existing example to framework/mlflow-merlion. May I suggest to put your example side-by-side into framework/mlflow-automl?
We are employing a more technical, flat namespace here, not aligned to topics, but more to stable URLs. Topic-oriented concerns will be one layer on top. And we can well symlink stuff into a synthetic machine-learning folder if you really want to have it.
Otherwise, it will all be about proper guidance from the cratedb-guides:/docs/t/ml folder, which will be populated with prose within a specific section dedicated to machine learning topics, and link to the material here. At least, that's the idea.
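The symlink idea mentioned above could look roughly like the following sketch. It runs in a throwaway directory, and the layout (a `framework/` folder holding the real content, plus a synthetic `machine-learning/` folder containing only links) is an assumption derived from this comment, not the repository's actual structure.

```shell
#!/bin/sh
set -eu

# Throwaway directory tree standing in for the repository root (assumption).
root="$(mktemp -d)"
cd "$root"

# Technical, flat namespace: the real content lives here.
mkdir -p framework/mlflow-merlion framework/mlflow-automl
echo "demo" > framework/mlflow-automl/readme.md

# Synthetic, topic-oriented folder that only holds relative symlinks.
# A relative target is resolved from the directory containing the link.
mkdir machine-learning
ln -s ../framework/mlflow-merlion machine-learning/mlflow-merlion
ln -s ../framework/mlflow-automl machine-learning/mlflow-automl

# Inspect the link and read a file through it.
link_target="$(readlink machine-learning/mlflow-automl)"
content="$(cat machine-learning/mlflow-automl/readme.md)"
echo "link target: $link_target"
echo "content via link: $content"
```

Relative targets keep the links valid wherever the repository is cloned, and the stable URLs under `framework/` are unaffected.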
By the way, it is easy to add content there, and it will be instantly rendered to the preview page at https://cratedb.com/docs/guide/t/ml/, all the wiring is in place already. Enjoy! [1]
Footnotes
[1] Just add water.
    git clone https://github.com/crate/cratedb-guides
    cd cratedb-guides/docs
    make dev
/guide/t/ml was just a suggestion, we can just say /guide/machine-learning there, if you like that URL slug better. I usually also prefer longer names -- so let's do it?
@amotl Quick question, as the focus is not really on mlflow but rather the machine learning process itself, would a different dir name make sense?
Would you agree to ml-automl to have some sort of structure for machine learning in general?
However, no hard feelings there, your call.
"[Before], we've refactored the existing example to framework/mlflow-merlion. May I suggest to put your example side-by-side into framework/mlflow-automl?"
- Now, with "Refactor content from framework folder into topic/machine-learning folder" #123, this earlier proposal quickly became /topic/machine-learning/, and we agreed on it, correct?
- You suggested to use /topic/machine-learning/{classification,timeseries}-automl for your new examples, right?
Correct!
I think this move makes total sense. What is a bit misleading now is the folder https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/timeseries-basics: the examples in there are about MLflow, not time-series basics.
From a different perspective, those are basic examples about time-series modeling in the context of machine learning, in contrast to other machine learning domains (e.g. llm tech), and in contrast to a more advanced example around time-series data. But we can sure name it differently, and keep a more technical name. Suggestions?
GH-125 will rename the folder to mlops-mlflow, so it follows the same naming scheme as llm-langchain.
apparently github doesn't render style tags
af67167 to 8fd4885
Maybe squash first, and then amend the most recent commit, in order to take the .csv file back out again.
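A minimal sketch of that squash-then-amend sequence, run against a throwaway repository so no real branch is touched. The file names other than churn-dataset.csv, the commit messages, and the identity settings are hypothetical placeholders. `git reset --soft` is used here as one common way to squash; an interactive rebase would work as well.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so nothing real is rewritten.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.org"   # placeholder identity
git config user.name "Demo"

# Base commit, standing in for the state of the main branch.
echo "readme" > README.md
git add README.md
git commit -q -m "Initial commit"
base="$(git rev-parse HEAD)"

# Two feature commits; the second one adds the data file.
echo "notebook" > example.ipynb
git add example.ipynb
git commit -q -m "Add notebook"
echo "a,b" > churn-dataset.csv
git add churn-dataset.csv
git commit -q -m "Add dataset"

# 1. Squash: move the branch back to the base, keeping all changes staged.
git reset -q --soft "$base"
git commit -q -m "Add AutoML classification example"

# 2. Amend: drop the .csv from the index only, then rewrite the squashed commit.
git rm -q --cached churn-dataset.csv
git commit -q --amend --no-edit

commit_count="$(git rev-list --count HEAD)"
tracked_files="$(git ls-files | tr '\n' ' ')"
echo "commits: $commit_count"
echo "tracked: $tracked_files"
```

The .csv stays in the working tree as an untracked file. On a branch that was already pushed, the rewritten history would then need a forced push, for example with `git push --force-with-lease`.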
@amotl Would you mind taking over this change request? I have very hard feelings about rewriting the commit history. Sorry for being a Diva 😅
I can take it over, sure. Shall I merge it afterwards?
Thanks a bunch!
If you and Christian are satisfied, feel free to merge on your behalf.
I'm happy to assist with content change requests though.
I think after squashing the commits we are good to merge, when there are no other objections?
I will re-add the .csv as the final stroke before actually integrating the PR, when you are done with any other changes.
- I've decided not to touch your working branch, but divert the integration to "ML: Add AutoML classification example using PyCaret" #131 instead.
- The churn-dataset.csv file is now managed out-of-band instead, see "Add Telecom Provider "churn" dataset" cratedb-datasets#3.
This PR has been converged into GH-131. Thank you again.
Summary of the changes / Why this is an improvement
Adds an end-to-end example for creating machine learning models using AutoML with PyCaret.
The PR aims to highlight PyCaret and how easy the package is to use.
The examples use CrateDB as the data store (with MLflow as an honorable mention). However, the texts and examples do not focus exclusively on CrateDB, but rather on machine learning workflows.
As of now, this PR also provides a Markdown export of the demo for easier readability (and possibly for publishing). However, I'm open to removing the Markdown parts for DRY reasons.
Tests are not part of this PR. I suggest adding them after we have discussed the results and the form of delivery.
Checklist