Ploomber sample projects

Join our community | Newsletter | Docs | Twitter | Blog | YouTube | Contact us

Tip

Deploy AI apps for free on Ploomber Cloud!

This repository contains sample pipelines developed using Ploomber.

Note: We recommend going through the first tutorial to learn the basics of Ploomber.

Running examples

Use Colab:

Open In Colab

Or run locally:

pip install ploomber

# list examples
ploomber examples

# download example with name
ploomber examples --name {name}

# example
ploomber examples --name templates/mlflow

How to read the examples

Each example contains a README.md file that describes it; a README.ipynb is also available with the same contents but in Jupyter notebook format and with command outputs. In addition, files for pip (requirements.txt) and conda (environment.yml) are provided for local execution.
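As a rough illustration of the paragraph above, the sketch below checks which dependency files a downloaded example contains and returns the matching install commands. The helper name `install_commands` is hypothetical and not part of Ploomber. It also assumes the example was downloaded into a local directory, such as `templates/mlflow`.

```python
from pathlib import Path


def install_commands(example_dir):
    """Return install commands for the dependency files found in example_dir.

    Hypothetical helper for illustration; it only inspects the directory
    and does not run anything.
    """
    example_dir = Path(example_dir)
    commands = []

    requirements = example_dir / "requirements.txt"
    if requirements.exists():
        # pip reads pinned dependencies from a requirements file
        commands.append(f"pip install -r {requirements.as_posix()}")

    environment = example_dir / "environment.yml"
    if environment.exists():
        # conda creates a new environment from an environment file
        commands.append(f"conda env create --file {environment.as_posix()}")

    return commands
```

For instance, `install_commands("templates/mlflow")` would list one command per dependency file present in that directory, and an empty list if neither file exists.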

Index

Templates

Starting points for common use cases. Use them to ramp up a project quickly.

  1. templates/etl Download a data file, upload it to a database, process it, and plot with Python and R.

  2. templates/exploratory-analysis Sample pipeline that explores penguins data.

  3. templates/google-cloud Use Google Cloud and Ploomber to develop a scalable and production-ready pipeline.

  4. templates/ml-advanced ML pipeline using the Python API. Shows how to create a Python package, test it with pytest, and train models in parallel.

  5. templates/ml-basic Download data, clean it, generate features and train a model.

  6. templates/ml-intermediate Training and serving ML pipelines with integration testing to evaluate training data quality.

  7. templates/ml-online Load data, generate features, train a model, and deploy the model with Flask.

  8. templates/mlflow Train a grid of models and log them to MLflow.

  9. templates/python-api Load, clean, and plot data using the Python API.

  10. templates/pytorch Using GPUs to train models in Ploomber Cloud.

  11. templates/shell Create a pipeline with shell scripts as tasks.

  12. templates/spec-api-directory Create a pipeline from a directory with scripts (without a pipeline.yaml file).

  13. templates/spec-api-r Load, clean and plot data with R.

  14. templates/spec-api-sql Use SQL scripts to manipulate data in a database, dump a table, and plot it with Python.

Cookbook

Short and to-the-point examples showing how to use a specific feature.

  1. cookbook/dynamic-params Pipeline parameters whose values are computed at runtime.

  2. cookbook/file-client Upload a task's products upon execution (local, S3, GCloud storage).

  3. cookbook/grid An example showing how to create a grid of tasks to train models with different parameters.

  4. cookbook/hooks Task hooks

  5. cookbook/incremental A pipeline that processes new records from a database and uploads them.

  6. cookbook/nested-cv Nested cross-validation for model selection and hyperparameter tuning.

  7. cookbook/python-load Load pipeline.yaml file in a Python session to customize initialization.

  8. cookbook/report-generation Generating HTML/PDF reports.

  9. cookbook/serialization Shows how to use the serializer and unserializer decorators.

  10. cookbook/sql-dump A minimal example showing how to dump a table from a SQL database.

  11. cookbook/variable-number-of-products Shows how to create tasks whose number of products depends on runtime conditions.

Guides

In-depth tutorials for learning. These are part of the documentation.

  1. guides/cron This guide shows how to schedule Ploomber pipelines using cron.

  2. guides/debugging Tutorial showing techniques for debugging pipelines.

  3. guides/first-pipeline Introductory tutorial to learn the basics of Ploomber.

  4. guides/intro-to-ploomber Introductory tutorial to learn the basics of Ploomber.

  5. guides/logging Tutorial showing how to add logging to a pipeline.

  6. guides/parametrized Tutorial showing how to parametrize pipelines and change parameters from the command-line.

  7. guides/refactor Using Soorgeon to convert a notebook into a Ploomber pipeline.

  8. guides/serialization Tutorial explaining how the serializer and unserializer fields in a pipeline.yaml file work.

  9. guides/sql-templating Introductory tutorial teaching how to develop modular SQL pipelines.

  10. guides/testing Tutorial showing how to use a task's on_finish hook to test data quality.

  11. guides/versioning A tutorial showing how to version pipeline products.

Python API

The simplest way to get started with Ploomber is via the Spec API, which allows you to describe pipelines using a pipeline.yaml file. Most examples in this repository use the Spec API. However, if you want more flexibility, you may write pipelines with Python.

The templates/python-api/ directory contains a project written using the Python API, and the python-api-examples/ directory includes some tutorials and more examples.
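To give a feel for what a pipeline written in Python looks like, here is a minimal sketch with two tasks. It is not taken from the templates/python-api/ project. The task functions, file names, and the `make_dag` helper are all made up for illustration, and `make_dag` assumes Ploomber is installed.

```python
from pathlib import Path


def make_data(product):
    """First task: write a tiny CSV file to the product path."""
    Path(str(product)).write_text("x\n1\n2\n")


def count_rows(upstream, product):
    """Second task: count the data rows produced by make_data."""
    lines = Path(str(upstream["make_data"])).read_text().splitlines()
    # subtract one to skip the header line
    Path(str(product)).write_text(str(len(lines) - 1))


def make_dag(output_dir="output"):
    """Assemble the two tasks into a DAG (illustrative helper)."""
    # Imported here so the task functions above work without Ploomber
    from ploomber import DAG
    from ploomber.products import File
    from ploomber.tasks import PythonCallable

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    dag = DAG()
    first = PythonCallable(
        make_data, File(str(out / "data.csv")), dag, name="make_data"
    )
    second = PythonCallable(
        count_rows, File(str(out / "count.txt")), dag, name="count_rows"
    )
    # declare that count_rows depends on make_data
    first >> second
    return dag
```

With Ploomber installed, calling `make_dag().build()` should run both tasks in order. The Spec API expresses the same structure declaratively in a pipeline.yaml file.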

Micro-pipelines

In Ploomber 0.21, we introduced a simplified API to write pipelines in a single Jupyter notebook (or .py) file. This is a great option for small projects.

You can find the examples in the micro-pipelines/ directory.