Benchmark for Airflow with BigQuery as the Data Warehouse using TPC-DI

raavioli/tpc-di_benchmark

 
 


TPC-DI Benchmark artefacts for Cloud Composer and BigQuery

Install Airflow:

Create Airflow Docker Image

  1. Clone the Airflow Docker repo
  2. Build a Docker image:
    cd docker-airflow
    docker build --rm --build-arg AIRFLOW_DEPS="gcp" -t tpc-di/benchmark-airflow .
  3. Confirm the image was created by executing
    docker images
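Step 3 can also be scripted. A small Python sketch that checks the `docker images` listing for the tag built above — the helper names and the sample listing are illustrative only, not part of this repo:

```python
import subprocess

def image_present(listing: str, repository: str) -> bool:
    """Return True if `repository` appears in `docker images` output."""
    return any(
        line.split()[:1] == [repository]   # first column is REPOSITORY
        for line in listing.splitlines()[1:]  # skip the header row
        if line.strip()
    )

def check_built(repository: str = "tpc-di/benchmark-airflow") -> bool:
    """Run `docker images` and look for the freshly built repository tag."""
    out = subprocess.run(
        ["docker", "images"], capture_output=True, text=True, check=True
    ).stdout
    return image_present(out, repository)

# Fabricated `docker images` output, for illustration:
sample = (
    "REPOSITORY                 TAG     IMAGE ID      CREATED        SIZE\n"
    "tpc-di/benchmark-airflow   latest  0123456789ab  2 minutes ago  1.1GB\n"
)
print(image_present(sample, "tpc-di/benchmark-airflow"))  # True
```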

Start Airflow

  1. Run the following command to start Apache Airflow
    docker-compose up -d
  2. Navigate to the Airflow Web UI home page
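Before moving on, it can help to confirm the webserver is actually answering. A minimal Python sketch, assuming the UI is exposed at http://localhost:8080 and that the webserver's /health endpoint is available (true for recent Airflow releases):

```python
from urllib.request import urlopen
from urllib.error import URLError

def airflow_health_url(base: str = "http://localhost:8080") -> str:
    """Build the URL of Airflow's webserver health endpoint."""
    return base.rstrip("/") + "/health"

def is_airflow_up(base: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if the webserver answers its health check with HTTP 200."""
    try:
        with urlopen(airflow_health_url(base), timeout=timeout) as resp:
            return resp.getcode() == 200
    except URLError:
        # Connection refused or similar: the webserver is not up yet.
        return False
```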

Create a Service Account in GCP for Airflow to use

  1. Navigate to the GCP BigQuery API page and enable the API if it is not already enabled
  2. Navigate to the Credentials page
  3. Click the Create Credentials button and choose Service Account
  4. Fill in the Service Account name and ID
  5. Click Create
  6. Grant the BigQuery Admin and Storage Admin roles and click Continue
  7. Click Create Key and choose the JSON key type
  8. Download the generated JSON key and keep it safe
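The downloaded keyfile is plain JSON, and a quick sanity check that it contains the fields the BigQuery connection needs can save a confusing failure later. A hedged sketch — the sample keyfile below is a placeholder, not a real credential:

```python
import json

# Fields a GCP service-account keyfile is expected to contain.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_keyfile(raw: str) -> dict:
    """Parse a service-account keyfile and check the required fields."""
    key = json.loads(raw)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"keyfile is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']}")
    return key

# Placeholder keyfile for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-gcp-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "airflow@my-gcp-project.iam.gserviceaccount.com",
})
print(validate_keyfile(sample)["project_id"])  # my-gcp-project
```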

Create connections in Airflow Admin UI

  1. Navigate to Admin -> Connections
  2. Find the bigquery_default connection (or equivalent) and open it for editing
  3. Paste in the contents of the keyfile.json obtained from GCP in the earlier step and save the connection
  4. Repeat the above steps to set up the google_cloud_default connection
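As an alternative to the Admin UI, the same connections can be created from the command line. The sketch below builds an `airflow connections add` invocation that embeds the keyfile; note that the flag spelling differs between Airflow 1.10 (`--conn_type`, `--conn_extra`) and 2.x (`--conn-type`, `--conn-extra`), and the extra-field names vary across Google provider versions, so verify both against your install:

```python
import json
import shlex

def connection_add_command(conn_id: str, keyfile: dict) -> str:
    """Build an `airflow connections add` command embedding a GCP keyfile.

    The extra-field names below follow the older long-form convention of
    the Google provider; newer provider versions also accept short names
    such as `keyfile_dict`.
    """
    extra = {
        "extra__google_cloud_platform__project": keyfile["project_id"],
        "extra__google_cloud_platform__keyfile_dict": json.dumps(keyfile),
    }
    return (
        f"airflow connections add {shlex.quote(conn_id)} "
        f"--conn-type google_cloud_platform "
        f"--conn-extra {shlex.quote(json.dumps(extra))}"
    )

# Placeholder keyfile contents, for illustration only.
keyfile = {"type": "service_account", "project_id": "my-gcp-project"}
for conn_id in ("bigquery_default", "google_cloud_default"):
    print(connection_add_command(conn_id, keyfile))
```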
