This project demonstrates how to orchestrate DBT workflows on Snowflake using Apache Airflow.
It provides a modern, scalable data engineering pipeline leveraging Airflow for scheduling, DBT for transformations, and Snowflake as the data warehouse.
- Airflow: Orchestrates and schedules DBT runs.
- DBT: Handles SQL-based data transformations in Snowflake.
- Snowflake: Cloud data warehouse for storage and analytics.
- Python 3.8 or higher
- Snowflake account
- DBT CLI
- Apache Airflow
- Snowflake Python Connector
- DBT Snowflake Adapter
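The dbt-snowflake adapter reads its connection details from a dbt profile. A typical `~/.dbt/profiles.yml` looks roughly like this; the profile name, role, database, warehouse, and schema below are placeholders, not values this repository defines:

```yaml
# ~/.dbt/profiles.yml — every identifier below is a placeholder
snowflake_dbt_airflow:        # must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "<your-account-identifier>"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt_dev
      threads: 4
```

Reading the user and password via `env_var()` keeps secrets out of the file itself.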
- Clone the repository

```sh
git clone https://github.com/<your-username>/snowflake-dbt-airflow-setup.git
cd snowflake-dbt-airflow-setup
```
- Create and activate a virtual environment

```sh
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies

```sh
pip install -r requirements.txt
pip install dbt-snowflake
```

Airflow is listed in requirements.txt, so it is installed along with the other dependencies. Run `airflow version` to confirm that Airflow is available.
- Set AIRFLOW_HOME to the current directory

```sh
export AIRFLOW_HOME="$(pwd)"
```
- Initialise the Airflow metadata database

```sh
airflow db migrate
```
- Kick-start Airflow using the pre-built bash script

```sh
cd dbt_airflow
./operational_scripts/start_airflow.sh
```
- Access the Airflow UI

Open your browser and navigate to http://localhost:8080. You should see the Airflow web interface.
- Stop Airflow services
To stop the Airflow services, you can use the provided stop script:
```sh
cd dbt_airflow
./operational_scripts/stop_airflow.sh
```
- DAG not showing up:
  - Ensure your DAG file is in the `$AIRFLOW_HOME/dags` directory.
  - The file must end with `.py` and contain a valid `DAG` object.
  - Restart the scheduler and webserver after adding new DAGs.
  - Check logs in `$AIRFLOW_HOME/logs` for errors.
- Port conflicts:
  - If port 8080 is in use, start the webserver on a different port:

```sh
airflow webserver --port 8081
```
- Database backend:
  - For production, use PostgreSQL or MySQL instead of the default SQLite.
  - Update the `sql_alchemy_conn` setting in `airflow.cfg`.
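For example, with PostgreSQL the relevant fragment of `airflow.cfg` looks like this. Host, user, password, and database name are placeholders; note that in Airflow 2.3+ the setting lives under `[database]`, while older releases keep it under `[core]`:

```ini
[database]
# Placeholder credentials — substitute your own.
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow
```

After changing the connection string, re-run `airflow db migrate` against the new backend.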
- Do not commit your .env file or secrets to version control.
- Use Airflow Connections or environment variables for sensitive credentials.
- Regularly update dependencies to patch security vulnerabilities.
- Follow best practices for Snowflake security, including role-based access control and network policies.
- Regularly back up your Airflow metadata database.
- Monitor Airflow logs for any suspicious activity.
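As a sketch of the environment-variable approach to credentials, the snippet below assembles Snowflake connection parameters at runtime instead of hard-coding them. The variable names are an assumed convention, not something this repository defines:

```python
import os


def snowflake_conn_params() -> dict:
    """Build Snowflake connection parameters from environment variables
    so that no secret is ever committed to version control.
    The variable names used here are an illustrative convention."""
    required = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")
    missing = [name for name in required if name not in os.environ]
    if missing:
        # Fail fast with a clear message rather than a cryptic auth error.
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        # Optional setting with an illustrative default.
        "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
    }
```

The resulting dict could then be passed to `snowflake.connector.connect(**params)`; storing the same values in an Airflow Connection and reading them through a hook works equally well.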