dbt and Redshift for Analytics Engineering

This project is meant for experimenting with dbt and the dbt-redshift adapter for analytics engineering, using the NYC TLC Trip Record dataset as the data source and the Kimball dimensional modeling technique.

IMPORTANT NOTE: To access awsdatacatalog from Redshift, the IAM auth method is required. The IAM user also explicitly needs USAGE grants on that database; therefore, in the Redshift Query Editor, run:

GRANT USAGE ON DATABASE awsdatacatalog to "IAM:my_iam_user";
GRANT ALL ON DATABASE <DATABASE_NAME> to "IAM:my_iam_user";
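
To sanity-check the grants without going back to the Query Editor, you can run a probe query through the Redshift Data API. A minimal sketch, assuming a Redshift Serverless workgroup named default and some existing Glue table (both are placeholders to replace):

STATEMENT_ID=$(aws redshift-data execute-statement \
  --workgroup-name default \
  --database dev \
  --sql "SELECT * FROM awsdatacatalog.raw_nyc_tlc_tripdata.<table_name> LIMIT 10" \
  --query 'Id' --output text)
# Once the statement finishes, fetch the rows:
aws redshift-data get-statement-result --id "$STATEMENT_ID"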

Tech Stack

Up and Running

Developer Setup

1. Install the dependencies declared in pyproject.toml:

uv sync

2. Activate the virtualenv created by uv:

source .venv/bin/activate

3. (Optional) Install pre-commit:

brew install pre-commit

From the root folder (where .pre-commit-config.yaml is located), run:

pre-commit install
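
Once the hooks are installed, you can optionally run them against the whole codebase to verify the setup:

pre-commit run --all-files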

4. Set up the dbt profiles.yml accordingly (use profiles.tmpl.yml as a template)

4.1. By default, the profiles_dir is the user's $HOME/.dbt/:

mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml
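
If you prefer to write the profile by hand instead of copying the template, a plausible shape is sketched below. The profile name (dbt_redshift), the iam method, and the exact keys are assumptions based on the IAM note above and the variables exported in step 4.2; the quoted heredoc keeps the env_var() calls literal so dbt resolves them at runtime:

cat > ~/.dbt/profiles.yml <<'EOF'
dbt_redshift:
  target: dev
  outputs:
    dev:
      type: redshift
      method: iam
      host: "{{ env_var('DBT_REDSHIFT_HOST') }}"
      port: 5439
      dbname: "{{ env_var('DBT_REDSHIFT_DATABASE') }}"
      schema: "{{ env_var('DBT_REDSHIFT_TARGET_SCHEMA') }}"
      threads: 4
EOF

Note that the top-level profile name must match the profile: key in dbt_project.yml.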

4.2. Set the environment variables for dbt-redshift:

export DBT_REDSHIFT_HOST=redshift.[id].[region].redshift-serverless.amazonaws.com
export DBT_REDSHIFT_DATABASE=dev
export DBT_REDSHIFT_USE_DATA_CATALOG=1
export DBT_REDSHIFT_SOURCE_GLUE_CATALOG_DB=raw_nyc_tlc_tripdata
export DBT_REDSHIFT_TARGET_SCHEMA=nyc_tlc_record_data
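
How these variables are consumed is project-specific, but for illustration, a dbt source pointing at the Glue catalog database could be declared roughly as below (the file path, source name, and table name are hypothetical; DBT_REDSHIFT_USE_DATA_CATALOG would presumably gate this logic elsewhere in the project):

cat > models/staging/sources.yml <<'EOF'
version: 2

sources:
  - name: nyc_tlc_tripdata
    database: awsdatacatalog
    schema: "{{ env_var('DBT_REDSHIFT_SOURCE_GLUE_CATALOG_DB', 'raw_nyc_tlc_tripdata') }}"
    tables:
      - name: yellow_tripdata
EOF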

4.3. Also, either have your AWS credentials set in ~/.aws/credentials or export them as well:

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
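
To confirm which identity those credentials resolve to, the AWS CLI provides a quick check:

aws sts get-caller-identity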

5. Install dbt dependencies and trigger the pipeline

5.1. Run dbt deps to install the dbt package dependencies:

dbt deps

5.2. Run dbt seed to create and load the tables from the .csv seed files into the target schema:

dbt seed
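
To (re)load a single seed rather than all of them, dbt's node selection applies to seeds as well (the seed name here is a hypothetical example):

dbt seed --select taxi_zone_lookup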

5.3. Run dbt build to run and test the dbt models:

dbt build

# Alternatively, you can run only a subset of the models with:

## +models/staging: runs the upstream models that 'models/staging'
## depends on first, and then the target models themselves
dbt [build|run] --select +models/staging

## models/staging+: runs the target models first, and then all
## downstream models that depend on them
dbt [build|run] --select models/staging+
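
To preview which nodes a selector will match before running anything, the same selection syntax works with dbt ls:

dbt ls --select +models/staging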

6. Generate the Docs and the Data Lineage graph with:

dbt docs generate
dbt docs serve

Access the generated docs at:

open http://localhost:8080

Containerization and Testing

1. Build the Docker Image with:

docker build -t dbt-redshift:latest . --no-cache

2. Start a container with it:

docker run -d --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e DBT_REDSHIFT_HOST=${DBT_REDSHIFT_HOST} \
  -e DBT_REDSHIFT_DATABASE=dev \
  -e DBT_REDSHIFT_USE_DATA_CATALOG=1 \
  -e DBT_REDSHIFT_SOURCE_GLUE_CATALOG_DB=raw_nyc_tlc_tripdata \
  -e DBT_REDSHIFT_TARGET_SCHEMA=nyc_tlc_record_data \
  --name dbt-redshift \
  dbt-redshift
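
The container starts detached; assuming the image's entrypoint kicks off the dbt pipeline (that depends on the Dockerfile's CMD/ENTRYPOINT), you can follow its output with:

docker logs -f dbt-redshift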

TODO:

  • PEP-517: Packaging and dependency management with uv
  • Bootstrap dbt with Redshift Adapter (dbt-redshift)
  • Add dbt macro to configure target schemas dynamically (see the sketch below)
  • Run dbt-core in Docker
  • Terraform AWS Glue Catalog and Crawler
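
For the dynamic target-schema item above, a common starting point is overriding dbt's built-in generate_schema_name macro. A minimal sketch based on the stock override from the dbt documentation (the file name is a convention, not necessarily what this repo uses):

cat > macros/generate_schema_name.sql <<'EOF'
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Use the custom schema as-is instead of dbt's default
        "<target_schema>_<custom_schema>" concatenation -#}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
EOF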