A library for defining and generating Apache Airflow DAGs declaratively using YAML. Currently focused on orchestration of GCP resources (Dataproc, BigQuery, Dataform) and DBT.
Note
This library is currently in Preview.
orchestration-pipelines allows you to define complex data workflows in simple, human-readable YAML files. It abstracts away the boilerplate of writing Airflow DAGs in Python, making it easier for non-Python experts to create and manage pipelines.
Python >= 3.9
- Declarative DAGs: Define your pipeline structure, triggers, and actions in YAML.
- Rich Actions Support: Built-in support for:
- Python Scripts
- Google Cloud BigQuery
- Google Cloud Dataproc (Serverless, Ephemeral and existing clusters)
- Google Cloud Dataform
- DBT
- Automatic Generation: A simple Python call generates the full Airflow DAG.
- Versioning: Supports versioning of pipelines via a manifest file(as of Preview, on Google Cloud Composer).
You can install orchestration-pipelines from PyPI:
pip install orchestration-pipelinesImportant
Ensure your apache-airflow-client version is fully compatible with Airflow 3 to prevent critical DAG parsing or runtime errors. This package utilizes Airflow Client API calls to interact with the metadata database; apache-airflow-client library introduces significant architectural shifts in newer versions, a version mismatch will likely break communication and disrupt your pipelines. Always verify that your client version aligns with your Airflow environment to ensure stability.
Create a file named my_pipeline.yml:
modelVersion: "1.0"
pipelineId: "my_pipeline"
description: "A simple example pipeline"
runner: "airflow"
defaults:
projectId: "your-gcp-project"
location: "us-central1"
triggers:
- schedule:
interval: "0 4 * * *"
startTime: "2026-01-01T00:00:00"
catchup: false
actions:
- sql:
name: "create_table"
query:
inline: "CREATE TABLE IF NOT EXISTS `your-gcp-project.my_dataset.my_table` (id INT64, name STRING);"
engine:
bigquery:
location: "US"Create a Python file named my_pipeline.py in your Airflow DAGs folder:
from orchestration_pipelines_lib.api import generate
# Generate Airflow DAG from pipeline definition file
# airflow | dag
# Root is "dags" directory in Composer bucket
generate("dataform-pipeline-local.yml")Airflow will parse this Python file and automatically generate the DAG based on your YAML definition.
You can manage multiple versions of your pipelines using a manifest.yaml file. This allows you to specify which version of a pipeline should be active.
See the examples/ directory for a sample manifest.yaml and how to use it.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.