GitHub - deem-data/lester

Supplemental material for our CIDR submission: Messy Code Makes Managing ML Pipelines Difficult? Just Let LLMs Rewrite the Code!

Abstraction for ML pipelines

We provide the source code for a prototype implementation of our proposed pipeline abstraction. Below are pointers to the core components of Lester:

Code Rewriting with LLMs

- We provide the messy original code for the ML pipeline from our running example (`messy_original_pipeline.py`).
- We detail the hand-crafted prompts that we used for rewriting our example pipeline (`llm-based-rewrites.md`).
- We provide the generated pipeline code and mark the code locations that needed manual fixing (`generated_pipeline_code.py`).

As detailed in our submission, we consider it future work to streamline this rewriting process with a conversational interface.

Experiments

Benefits of incremental view maintenance for a deployed ML pipeline

- Generate synthetic data for experimentation with the following Jupyter notebook: https://github.com/deem-data/lester/blob/main/utils/generate_synthetic_data.ipynb
- Baseline -- retraining from scratch:
  - Implementation available at `experiment__retraining_time.py`
  - Execution via `python experiment__retraining_time.py --num_customers <num_customers> --num_repetitions <num_repetitions>`
- Incremental updates with Lester:
  - Initial execution of the pipeline via `python creditcard_example__initial_execution.py` (requires adjusting the source paths to point to the generated data)
  - IVM update of the captured artifacts of the pipeline
    - Implementation available at `experiment__ivm.py`
    - Execution via `python experiment__ivm.py --run_id <run_id> --num_customers <num_customers> --num_repetitions <num_repetitions>`
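
To make the contrast between the two setups above concrete, here is a minimal, self-contained sketch of the general idea behind incremental view maintenance (IVM). It is not Lester's implementation: the function names `aggregate_from_scratch` and `apply_delta`, the toy per-customer aggregate, and the sample rows are all hypothetical and chosen only for illustration. The baseline recomputes a derived result from all input rows, while the incremental variant touches only the inserted and deleted rows.

```python
from collections import defaultdict


def aggregate_from_scratch(rows):
    """Baseline: recompute the per-customer (sum, count) view over all rows."""
    totals = defaultdict(lambda: [0.0, 0])
    for customer_id, amount in rows:
        totals[customer_id][0] += amount
        totals[customer_id][1] += 1
    return {cid: (s, c) for cid, (s, c) in totals.items()}


def apply_delta(view, inserted=(), deleted=()):
    """Incremental: update an existing view in place using only the changed rows."""
    for customer_id, amount in inserted:
        s, c = view.get(customer_id, (0.0, 0))
        view[customer_id] = (s + amount, c + 1)
    for customer_id, amount in deleted:
        s, c = view[customer_id]
        if c == 1:
            # Last row for this customer disappears, so the entry goes too.
            del view[customer_id]
        else:
            view[customer_id] = (s - amount, c - 1)
    return view


# Hypothetical toy data: (customer_id, amount) pairs.
base = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
view = aggregate_from_scratch(base)

# A small change to the input: two new rows, one removed row.
inserted = [("c3", 7.0), ("c2", 1.0)]
deleted = [("c1", 2.5)]
apply_delta(view, inserted, deleted)

# The incrementally maintained view matches a full recomputation.
updated_rows = [row for row in base if row not in deleted] + inserted
assert view == aggregate_from_scratch(updated_rows)
```

The cost of `apply_delta` depends on the size of the change rather than the size of the full input, which is the effect the retraining-time and IVM experiments compare for a whole pipeline.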

User Study

We conduct a small user study to showcase that even basic tasks, such as computing certain metadata in ML pipelines, are difficult for data scientists without system support. We provide the tasks, code, reference solution, questionnaire, and participant code.

