Workflow orchestration with Prefect

Python Prefect Pandas Docker

This project defines Prefect flows that fetch NYC Taxi trip data CSV datasets from the endpoints specified in app.yml and load them into Postgres and Google Cloud Storage.

Note: The Prefect Orion server is now called Prefect Server

Tech Stack

Up and Running

Developer Setup

1. Install the dependencies listed in pyproject.toml:

uv sync

2. Activate the virtualenv created by uv:

source .venv/bin/activate

3. (Optional) Install pre-commit:

brew install pre-commit

# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install

4. Start the Prefect Server:

prefect server start

5. Point the Prefect client at the local server API so the flows report to it:

prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
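
Before running any flow, it can help to confirm that the server started in step 4 is reachable at the configured URL. The following is a minimal sketch using only the standard library; the helper name `prefect_api_is_up` is hypothetical and not part of this repository, and it assumes the server answers on a `/health` route under the API URL.

```python
import urllib.request


def prefect_api_is_up(api_url: str = "http://127.0.0.1:4200/api",
                      timeout: float = 2.0) -> bool:
    """Return True if a GET on <api_url>/health answers with HTTP 200.

    Assumption: the Prefect server exposes a health route at /health
    under the API URL. Any connection or HTTP error is treated as "down".
    """
    url = f"{api_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False


if __name__ == "__main__":
    print("Prefect API reachable:", prefect_api_is_up())
```

If the check reports that the API is unreachable, make sure `prefect server start` is still running in another terminal.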

Prefect Flows

flows/web_csv_to_gcs.py:

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your credentials JSON file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcs_credentials.json

Execute the flow with:

python flows/web_csv_to_gcs.py
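
A missing or mistyped credentials path is a common reason for this flow to fail at upload time. As a rough pre-flight check, the sketch below verifies that the variable is set and points to a parseable JSON file. The function `check_gcs_credentials` is a hypothetical helper for illustration, not something the repository provides.

```python
import json
import os
from pathlib import Path


def check_gcs_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Return the credentials path if it is set and holds valid JSON.

    Raises RuntimeError if the variable is unset, FileNotFoundError if the
    file is missing, and json.JSONDecodeError if the file is not valid JSON.
    """
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    with path.open() as fh:
        json.load(fh)  # only checks that the file parses; contents are not validated
    return path


if __name__ == "__main__":
    print("Using credentials file:", check_gcs_credentials())
```

This only checks that the file exists and parses. It does not verify that the key grants access to the target bucket.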

flows/sqlalchemy_ingest.py:

Set the environment variables:

export DB_USERNAME=postgres
export DB_PASSWORD=postgres
export DB_HOST=localhost
export DB_PORT=5432
export DB=nyc_taxi

Then execute the flow with:

python flows/sqlalchemy_ingest.py
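
The five variables above map naturally onto a SQLAlchemy-style connection URL. How `flows/sqlalchemy_ingest.py` actually assembles it is not shown here, so the following is only a sketch under assumptions: the `postgresql://` scheme, the fallback values for host and port, and the helper name `build_db_url` are all illustrative.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_db_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a Postgres connection URL from the DB_* variables.

    Assumptions: the postgresql:// scheme, plus localhost and 5432 as
    fallbacks when DB_HOST or DB_PORT is unset. A missing DB_USERNAME,
    DB_PASSWORD, or DB raises KeyError.
    """
    # Percent-encode credentials so characters like "@" or ":" stay valid in a URL.
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    database = env["DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(build_db_url())
```

The resulting string is in the form accepted by `sqlalchemy.create_engine`. Encoding the username and password matters once the password contains characters that are reserved in URLs.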

TODO:

  • PEP-517: Packaging and dependency management with uv
  • Deploy Prefect Server / Agent on Docker
  • Code format/lint with Ruff
  • Run Prefect flows on Docker