This project uses Prefect flows to fetch NYC Taxi Tripdata CSV datasets from the endpoints specified in app.yml and load them into Postgres and Google Cloud Storage.
Note: The Prefect Orion server is now called Prefect Server.
1. Install the dependencies listed in pyproject.toml:
uv sync
2. Activate the virtualenv created by uv:
source .venv/bin/activate
3. (Optional) Install pre-commit:
brew install pre-commit
# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install
4. Start the Prefect Server:
prefect server start
5. Point the Prefect client at the local server API so the flows report to it:
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
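Before running any flow, it can help to confirm that the server started in step 4 is reachable at the configured API URL. The sketch below uses only the standard library and assumes the server exposes a health endpoint at `/health` under the API URL; the function names `health_url` and `server_is_up` are hypothetical helpers, not part of this project or of Prefect.

```python
import urllib.error
import urllib.request


def health_url(api_url):
    """Build the health-check URL from a PREFECT_API_URL-style value.

    Assumption: the server answers on <api_url>/health.
    """
    return api_url.rstrip("/") + "/health"


def server_is_up(api_url="http://127.0.0.1:4200/api", timeout=2.0):
    """Return True if the health endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(health_url(api_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling `server_is_up()` with no arguments checks the same address used in step 5; a `False` result usually means the server from step 4 is not running yet.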
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcs_credentials.json
Execute the flow with:
python flows/web_cs_to_gcs.py
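A missing or mistyped credentials path is a common reason for the GCS flow to fail at upload time. As a minimal sketch, a pre-flight check like the one below could be run before the flow; `resolve_gcs_credentials` is a hypothetical helper and is not taken from this repository.

```python
import os
from pathlib import Path


def resolve_gcs_credentials(env=os.environ):
    """Return the credentials file path, or raise if it is unset or missing.

    Hypothetical helper: it only checks that the variable is set and that
    the file exists; it does not validate the key's contents.
    """
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary instead of the real process environment.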
Set the environment variables:
export DB_USERNAME=postgres
export DB_PASSWORD=postgres
export DB_HOST=localhost
export DB_PORT=5432
export DB=nyc_taxi
And then execute with:
python flows/sqlalchemy_ingest.py
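The five variables above map naturally onto a Postgres connection URL of the kind SQLAlchemy accepts. The sketch below shows one way they could be combined, using only the standard library; it assumes `DB` holds the database name, and `build_postgres_url` is a hypothetical helper, so the actual script may assemble its connection differently.

```python
import os
from urllib.parse import quote_plus

# Variable names taken from the export commands above.
REQUIRED_VARS = ("DB_USERNAME", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB")


def build_postgres_url(env=os.environ):
    """Assemble a postgresql:// URL from the DB_* environment variables.

    Assumption: DB is the database name. Raises if any variable is missing.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Percent-encode credentials so characters like '@' or ':' stay valid in the URL.
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB']}"
```

With the example values exported above, the function returns `postgresql://postgres:postgres@localhost:5432/nyc_taxi`.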
- PEP-517: Packaging and dependency management with uv
- Deploy Prefect Server / Agent on Docker
- Code format/lint with Ruff
- Run Prefect flows on Docker