Module 4: Analytics Engineering

Goal: Transform the data loaded into the data warehouse (DWH) into analytical views by developing a dbt project.

Prerequisites

By this stage of the course you should already have:

  • A running warehouse (BigQuery or Postgres)
  • A set of running pipelines ingesting the project dataset (week 3 completed)
  • The following datasets ingested from the course Datasets list:
    • Yellow taxi data - Years 2019 and 2020
    • Green taxi data - Years 2019 and 2020
    • FHV data - Year 2019

Note

  • We have prepared a Python script for loading the data, which can be found in week3/extras.

Setting up your environment

Note

The cloud setup is the preferred option.

The local setup does not require a cloud database.

Alternative A: Setting up dbt for using BigQuery (cloud)

  • Open a free developer dbt cloud account following this link
  • Follow these instructions to connect to your BigQuery instance
  • More detailed instructions in dbt_cloud_setup.md

Alternative B: Setting up dbt for using Postgres locally

  • Open a free developer dbt cloud account following this link
  • Follow the official dbt documentation, or
  • follow the dbt core with BigQuery on Docker guide to set up dbt locally on Docker, or
  • use a docker image from the official Install with Docker instructions
  • You will need to install the latest version with the Postgres adapter (dbt-postgres), or with the BigQuery adapter (dbt-bigquery) if you follow the BigQuery on Docker guide
  • After local installation you will have to set up the connection to Postgres in profiles.yml; you can find the templates here

Content

Introduction to analytics engineering

  • What is analytics engineering?
  • ETL vs ELT
  • Data modeling concepts (fact and dim tables)
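
To make the fact/dim split concrete, here is a minimal star-schema query, assuming a fact_trips table (one row per taxi trip, holding measures and foreign keys) and a dim_zones table (one row per taxi zone, holding descriptive attributes); both names are illustrative, loosely based on the dataset used in this module:

    -- Facts carry the measures, dimensions carry the descriptive context
    SELECT
        dz.borough,
        COUNT(*)             AS total_trips,
        SUM(ft.total_amount) AS total_revenue
    FROM fact_trips AS ft
    JOIN dim_zones  AS dz
      ON ft.pickup_locationid = dz.locationid
    GROUP BY dz.borough;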

What is dbt?

  • Introduction to dbt

Starting a dbt project

Alternative A: Using BigQuery + dbt cloud

  • Starting a new project with dbt init (dbt cloud and core)
  • dbt cloud setup
  • project.yml

Alternative B: Using Postgres + dbt core (locally)

  • Starting a new project with dbt init (dbt cloud and core)
  • dbt core local setup
  • profiles.yml
  • project.yml

dbt models

  • Anatomy of a dbt model: written code vs compiled code
  • Materializations: table, view, incremental, ephemeral
  • Seeds, sources and ref
  • Jinja and Macros
  • Packages
  • Variables
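
The topics above all show up in a single model file. A minimal sketch of a staging model, assuming a source named staging.green_tripdata and an is_test_run variable (the names are illustrative, not taken from this README):

    -- e.g. models/staging/stg_green_tripdata.sql (hypothetical path)
    {{ config(materialized='view') }}

    SELECT
        vendorid,
        lpep_pickup_datetime  AS pickup_datetime,
        lpep_dropoff_datetime AS dropoff_datetime,
        pulocationid          AS pickup_locationid,
        total_amount
    FROM {{ source('staging', 'green_tripdata') }}

    -- Jinja lets you change behaviour at run time, e.g.
    -- dbt run --vars '{is_test_run: false}' to build on the full dataset
    {% if var('is_test_run', default=true) %}
    LIMIT 100
    {% endif %}

dbt compiles the Jinja away before the SQL reaches the warehouse; downstream models reference this one with {{ ref('stg_green_tripdata') }} rather than a hard-coded table name, which is how dbt builds its dependency graph.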

Note

This video is shown entirely in the dbt Cloud IDE, but the same steps can be followed locally in the IDE of your choice.

Tip

  • If you receive an error stating "Permission denied while globbing file pattern." when attempting to run fact_trips.sql, this video may be helpful in resolving the issue.

Testing and documenting dbt models

  • Tests
  • Documentation
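
Besides the generic tests configured in YAML (unique, not_null, accepted_values, relationships), dbt also supports singular tests: a SQL file under tests/ that must return zero rows for the test to pass. A minimal sketch, assuming a fact_trips model with a fare_amount column (both names illustrative):

    -- tests/assert_no_negative_fares.sql (hypothetical file name)
    -- The test fails if this query returns any rows
    SELECT *
    FROM {{ ref('fact_trips') }}
    WHERE fare_amount < 0

Tests run with dbt test; documentation is generated and served locally with dbt docs generate and dbt docs serve.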

Note

This video is shown entirely in the dbt Cloud IDE, but the same steps can be followed locally in the IDE of your choice.

Deployment

Alternative A: Using BigQuery + dbt cloud

  • Deployment: development environment vs production
  • dbt cloud: scheduler, sources and hosted documentation

Alternative B: Using Postgres + dbt core (locally)

  • Deployment: development environment vs production
  • dbt cloud: scheduler, sources and hosted documentation

Visualizing the transformed data

🎥 Google Data Studio video (now renamed to Looker Studio)

🎥 Metabase video

Extra resources

Note

If you find the videos above overwhelming, we recommend completing the dbt Fundamentals course and then rewatching the module. It provides a solid foundation for all the key concepts you need in this module.

Advanced concepts

SQL refresher

The homework for this module focuses heavily on window functions and CTEs. If you need a refresher on these topics, you can refer to these notes.
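
As a quick reminder of both patterns, here is a minimal sketch combining a CTE with a window function (table and column names are illustrative, loosely based on the taxi data; DATE_TRUNC is written Postgres-style, while BigQuery reverses the argument order):

    -- CTE: aggregate revenue per pickup zone and month
    WITH monthly_revenue AS (
        SELECT
            pickup_locationid,
            DATE_TRUNC('month', pickup_datetime) AS revenue_month,
            SUM(total_amount)                    AS revenue
        FROM green_tripdata
        GROUP BY 1, 2
    )
    -- Window function: rank zones by revenue within each month
    SELECT
        pickup_locationid,
        revenue_month,
        revenue,
        RANK() OVER (PARTITION BY revenue_month ORDER BY revenue DESC) AS revenue_rank
    FROM monthly_revenue;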

Homework

Community notes

Did you take notes? You can share them here.

Useful links