Conversation

@andnig
Contributor

@andnig andnig commented Nov 6, 2023

Summary of the changes / Why this is an improvement

Adds an end-to-end example for creating machine learning models using AutoML with PyCaret.
The PR aims to highlight PyCaret and how easy the package is to use.

The examples use CrateDB as the data store (with MLflow as an honorable mention). However, the texts and examples do not focus exclusively on CrateDB, but rather on machine learning workflows.

For now, this PR also provides a Markdown export of the demo for easier readability (and maybe publishability). However, I'm open to removing the Markdown parts for DRY reasons.

Tests are not part of this PR. I suggest adding them after we have discussed the results and the form of delivery.

Checklist

  • CLA is signed

WORK IN PROGRESS: Request for feedback before finalization

@andnig andnig requested a review from amotl November 6, 2023 13:46
@andnig andnig marked this pull request as draft November 6, 2023 13:46
@andnig andnig force-pushed the feature/pycaret_example branch from f765808 to d5a9e16 Compare November 6, 2023 14:00
Member

@amotl amotl left a comment

I've added another suggestion about "naming things", with respect to the folder structure.

Member

@amotl amotl Nov 6, 2023

With 338397d, I've refactored the existing example to framework/mlflow-merlion. May I suggest to put your example side-by-side into framework/mlflow-automl?

We are employing a more technical, flat namespace here, not aligned to topics, but more to stable URLs. Topic-oriented concerns will be one layer on top. And we can well symlink stuff into a synthetic machine-learning folder if you really want to have it.

Otherwise, it will all be about proper guidance from the cratedb-guides:/docs/t/ml folder, which will be populated with prose within a specific section dedicated to machine learning topics, and link to the material here. At least, that's the idea.

Member

@amotl amotl Nov 6, 2023

By the way, it is easy to add content there, and it will be instantly rendered to the preview page at https://cratedb.com/docs/guide/t/ml/. All the wiring is in place already. Enjoy! 1

Footnotes

  1. Just add water.

    git clone https://github.com/crate/cratedb-guides
    cd cratedb-guides/docs
    make dev
    

Member

@amotl amotl Nov 6, 2023

/guide/t/ml was just a suggestion, we can just say /guide/machine-learning there, if you like that URL slug better. I usually also prefer longer names -- so let's do it?

Contributor Author

@amotl Quick question, as the focus is not really on mlflow but rather the machine learning process itself, would a different dir name make sense?
Would you agree to ml-automl to have some sort of structure for machine learning in general?
However, no hard feelings there, your call.

Member

@amotl amotl Nov 6, 2023

[Before], we've refactored the existing example to framework/mlflow-merlion. May I suggest to put your example side-by-side into framework/mlflow-automl?

  1. Now, with "Refactor content from framework folder into topic/machine-learning folder" (#123), this earlier proposal quickly became /topic/machine-learning/, and we agreed on it, correct?
  2. You suggested to use /topic/machine-learning/{classification,timeseries}-automl for your new examples, right?

Contributor Author

Correct!

Contributor

I think this move makes total sense. What is a bit misleading now is the folder https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/timeseries-basics: the examples in there are MLflow examples, not time-series basics.

@amotl

Member

@amotl amotl Nov 7, 2023

From a different perspective, those are basic examples about time-series modeling in the context of machine learning, in contrast to other machine learning domains (e.g. llm tech), and in contrast to a more advanced example around time-series data. But we can sure name it differently, and keep a more technical name. Suggestions?

Member

@amotl amotl Nov 7, 2023

GH-125 will rename the folder to mlops-mlflow, so it follows the same scheme as llm-langchain.

@andnig andnig force-pushed the feature/pycaret_example branch from af67167 to 8fd4885 Compare November 6, 2023 15:20
Member

@amotl amotl Nov 6, 2023

Let's add .csv files with Git LFS. If you don't want to have the hassle, just remove it from your patch completely (no git rm, just amend the original patch), and I will re-add it right away using Git LFS. 1

Footnotes

  1. I've downloaded the file already.
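
For readers unfamiliar with Git LFS, tracking .csv files could look roughly like the sketch below. It runs in a throwaway repository, which is an assumption made for illustration and not the actual cratedb-examples checkout. The fallback branch writes by hand the same .gitattributes rule that `git lfs track` records, so the sketch also runs where the git-lfs extension is missing.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched (assumption).
repo="$(mktemp -d)"
cd "$repo"
git init -q .

if git lfs version >/dev/null 2>&1; then
    # Enable the LFS filters and hooks for this repository only,
    # then register the pattern in .gitattributes.
    git lfs install --local >/dev/null
    git lfs track "*.csv" >/dev/null
else
    # Without git-lfs installed, only the attribute rule is written;
    # files would still be stored as regular blobs in that case.
    printf '%s\n' '*.csv filter=lfs diff=lfs merge=lfs -text' >> .gitattributes
fi

# The rule must be committed alongside the data files it covers.
git add .gitattributes
lfs_rule="$(grep -F '*.csv' .gitattributes)"
echo "$lfs_rule"
```

Once the rule is staged, any .csv file added afterwards is stored as an LFS pointer, provided the extension is installed.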

Member

Maybe squash first, and then amend the most recent commit to take the .csv file back out again.
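
One possible way to do this is sketched below, in a throwaway repository. The file names (automl.ipynb, data.csv), commit messages, and identity are hypothetical, and `git reset --soft` is just one of several ways to squash. Note that `git rm --cached` followed by `--amend` rewrites the squashed commit itself and does not add a separate removal commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical feature branch (assumption).
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.org"

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"
base="$(git rev-parse HEAD)"

# Two feature commits, one of which includes the .csv file.
echo "notebook" > automl.ipynb
echo "a,b" > data.csv
git add automl.ipynb data.csv
git commit -q -m "Add notebook and data"
echo "more" >> automl.ipynb
git commit -q -a -m "Fix notebook"

# Squash: move the branch back to the base, keep all changes staged,
# and record them as a single commit.
git reset -q --soft "$base"
git commit -q -m "Add AutoML example"

# Amend: drop the .csv from the index (it stays on disk) and rewrite
# the squashed commit without it.
git rm -q --cached data.csv
git commit -q --amend --no-edit

commit_count="$(git rev-list --count HEAD)"
files_in_head="$(git ls-tree -r --name-only HEAD)"
echo "$commit_count commits on the branch"
echo "$files_in_head"
```

Because this rewrites history, the branch would then need a force-push, which is why it is usually coordinated with the branch owner first.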

Contributor Author

@amotl Would you mind taking over this change request? I have very hard feelings about rewriting the commit history. Sorry for being a Diva 😅

Member

I can take it over, sure. Shall I merge it afterwards?

Contributor Author

Thanks a bunch!
If you and Christian are satisfied, feel free to merge on your behalf.

I'm happy to assist with content change requests though.

Member

@amotl amotl Nov 7, 2023

I think we are good to merge after squashing the commits, if there are no other objections?
I will re-add the .csv as the final stroke before actually integrating the PR, when you are done with any other changes.

@amotl amotl changed the title from "WIP: Add AutoML classification example notebook, script and MD" to "AutoML: Classification example notebook, demo program, and Markdown documentation" Nov 6, 2023
@amotl amotl changed the title from "AutoML: Classification example notebook, demo program, and Markdown documentation" to "[DRAFT] AutoML: Classification example notebook, demo program, and Markdown documentation" Nov 8, 2023
@amotl amotl changed the title from "[DRAFT] AutoML: Classification example notebook, demo program, and Markdown documentation" to "[DRAFT] ML: AutoML classification example notebook, demo program, and Markdown documentation" Nov 8, 2023
@amotl
Member

amotl commented Nov 8, 2023

This PR has been converged into GH-131. Thank you again.

@amotl amotl closed this Nov 8, 2023
@amotl amotl deleted the feature/pycaret_example branch November 13, 2023 13:07
4 participants