
deem-data/lester


Supplemental material for our CIDR submission: Messy Code Makes Managing ML Pipelines Difficult? Just Let LLMs Rewrite the Code!

Abstraction for ML pipelines

We provide the source code for our prototype implementation of the proposed pipeline abstraction. Below are pointers to the core components of Lester:

Code Rewriting with LLMs

As detailed in our submission, we consider it future work to streamline this rewriting process with a conversational interface.

Experiments

Benefits of incremental view maintenance for a deployed ML pipeline

  1. Generate synthetic data for the experiments with the following Jupyter notebook: https://github.com/deem-data/lester/blob/main/utils/generate_synthetic_data.ipynb
  2. Baseline: retraining from scratch
    • Implementation available at experiment__retraining_time.py
    • Execution via python experiment__retraining_time.py --num_customers <num_customers> --num_repetitions <num_repetitions>
  3. Incremental updates with Lester
    • Initial execution of the pipeline via python creditcard_example__initial_execution.py (requires adjusting the source paths to point to the generated data)
    • IVM update of the captured artifacts of the pipeline
      • Implementation available at experiment__ivm.py
      • Execution via python experiment__ivm.py --run_id <run_id> --num_customers <num_customers> --num_repetitions <num_repetitions>
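The experiment above contrasts recomputing pipeline artifacts from scratch with updating them incrementally. The following is a minimal sketch of the general idea behind incremental view maintenance (IVM), not Lester's implementation: it assumes a hypothetical materialized view holding a per-customer transaction count and total amount (in integer cents), and the names CustomerStats, recompute, and apply_delta are illustrative only. The artifacts actually captured and updated by experiment__ivm.py may differ substantially.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CustomerStats:
    # Hypothetical materialized view entry: per-customer count and total (in cents)
    count: int = 0
    total: int = 0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def recompute(transactions):
    """Baseline: rebuild the entire view from all (customer_id, amount) rows."""
    view = defaultdict(CustomerStats)
    for customer_id, amount in transactions:
        stats = view[customer_id]
        stats.count += 1
        stats.total += amount
    return view


def apply_delta(view, inserted, deleted=()):
    """IVM: touch only the customers affected by the inserted/deleted rows."""
    for customer_id, amount in inserted:
        stats = view[customer_id]
        stats.count += 1
        stats.total += amount
    for customer_id, amount in deleted:
        # A deletion must refer to a row that was previously part of the view
        if customer_id not in view:
            raise KeyError(f"unknown customer: {customer_id}")
        stats = view[customer_id]
        stats.count -= 1
        stats.total -= amount
        if stats.count == 0:
            del view[customer_id]
    return view


if __name__ == "__main__":
    # "Initial execution": materialize the view once
    view = recompute([("c1", 1000), ("c2", 250), ("c1", 500)])
    # Later update: two new rows arrive, one old row is removed
    apply_delta(view, inserted=[("c2", 750), ("c3", 100)], deleted=[("c1", 500)])
    # The incrementally maintained view matches a full recomputation
    current = [("c1", 1000), ("c2", 250), ("c2", 750), ("c3", 100)]
    assert dict(view) == dict(recompute(current))
    print(view["c2"].mean)
```

The design point is that apply_delta costs time proportional to the size of the delta rather than the size of the full input, which is the property the IVM experiment is meant to measure against retraining from scratch.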

User Study

We conduct a small user study to show that even basic tasks, such as computing certain metadata in ML pipelines, are difficult for data scientists without system support. We provide the tasks, code, reference solution, questionnaire, and participant code.
