dq-vault

This DBT package provides an overview on Data Quality for all DataVault (DV) models in your DBT project.

dq-vault enables you to:

Monitor your test coverage for DV models
Monitor no. of tests are there for each model or each DV entity (Hub, Link, Satellite ...)
Be aware of tests that fail or throw warnings frequently
Compare the number of warnings/errors/test failures between each DBT run

What does this package do?

It works by scanning your DBT test run results related to DataVault models and putting them into the following views & metrics:

Views:
- dim_dv_test: information on all your available DBT tests (test name, test tag, which model or column it's assigned to ...)
- dim_dv_test_execution: details on each test run (execution time, run id, test name, status, failed rows count, test message ...)
- fact_dv_tests: a combined view of dim_dv_test and dim_dv_test_execution
- fact_dv_tests_unnest_models: fact_dv_tests but on the granularity of referenced models in each test
Metrics:
- average_test_execution_time
- no_of_test_of_each_model
- no_of_test_of_each_run
- test_status_of_each_run

What are the requirements to use this package?

A DBT project with:

DataVault models' names that match dq_vault__selected_model_rules below (the rules are configurable)
DBT tests for the mentioned models

Installation

packages:
  - package: infinitelambda/dq_vault
    version: [">=0.1.0", "<1.0.0"]

# dbt_project.yml
on-run-end:
  - '{{ dq_vault.store_test_results_json(results) }}'

Variables

vars:
  dq_vault__enable_store_test: true or false
  dq_vault__raw_db: 'your_custom_db or target.database'
  dq_vault__raw_schema: 'your_custom_schema or target.schema'
  dq_vault__selected_model_rules:
    - hub: ['hub']
    - sat: ['sat','satellite']
    - link: ['link','tlink','t_link','lnk','tlnk','t_lnk']
    - pit: ['pit']
    - bridge: ['bridge']
    - xts: ['xts']

In the above:

dq_vault__enable_store_test: bool
- Set as true to capture test results on the run-end of dbt command. Default false.
dq_vault__raw_db: string
- Configure the DATABASE for the raw test log table (RAW_TEST)
dq_vault__raw_schema: string
- Configure the SCHEMA for the raw test log table (RAW_TEST)
dq_vault__selected_model_rules: list
- Define the mapping for Data Vault model selection based on model names. The order of rules in the list determines the priority of classification.
- For example, with the default rules list ordered as above:
  - Model random_sample_hub.sql will be classified as HUB
  - Model this_sample_tlink will be classified as LINK
  - Model sample_SAT_hub.sql will also be classified as HUB and not SAT, because the hub rule ranks 1st in the rules list.

Classify the Test Type of your test cases

Currently, there are 4 built-in test types based on the test name:

Duplication: Generic tests' names that contain unique
Reconciliation:
- Singular tests
- Generic tests' names that contain the following: equality, equal
Reference: Generic tests' names that contain the following: reference, relationship
Unknown: default test type

Labeling test type for each test case:

Using test config

models:
  - name: my_model
    tests:
      - my_test:
          test_type: duplication

Using test meta

models:
  - name: my_model
    tests:
      - my_test:
          meta:
            test_type: duplication

Macros

List of custom built-in macros in this package:

store_test_results_json (source, doc)

get_datavault_type (source, doc)

get_test_type (source, doc)

where_select_dv_models (source, doc)

result_to_dict (source, doc)

get_raw_test (source, doc)

create_resources (source, doc)

refresh_resouces (source, doc)

Integration Tests

The integration_tests directory contains a DBT project that tests the macros/models/etc in this dq-vault package. An integration test typically involves making 1- a new seed file; 2- a new model file; 3- a generic test to assert anticipated behavior.

For an example on integration tests, check out the tests for get_datavault_type macro:

Once you've added all of these files, you should be able to run:

Make sure you are currently in the integration_tests folder,

dbt deps --target {your_target}
dbt seed --target {your_target}
dbt run --target {your_target} --model {your_model_name}
dbt test --target {your_target} --model {your_model_name}

Note: You might see some Failure and Warning while running tests for models specified in integration_tests. This is a part of this package testing and is completely normal. For the current version, running dbt build inside integration_tests will yield Completed with 10 errors and 3 warnings

Alternatively, at the repo's root (/dq-vault):

chmod +x run_test.sh
./run_test.sh {your_target} {your_models}

Once all the tests are passed you're good to go! All tests will be run automatically when you create a PR against this repo.

Developer's Guide

Quick Start (if you already setup the local dev, if not see the local dev setting up section below):

Start the shell

cd /path/to/dq-vault/integration_tests
python3 -m poetry shell

Some sample commands:

# Build model and capture test result - with refresh the resources (raw_tests table)
dbt build --exclude source:run_result_log+ tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true, fresh: true}'
# Build integration test/models with potential failed cases
dbt build --select tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true}'
# Build dq vault main models - downstream of test result log table
dbt build --select source:run_result_log+

Prerequisites:
- Install Python 3.9.6+ as recommended (specified in pyproject.toml)
  
  Assuming your python alias: python3
  
  Don't need to use an alias if your environment is not multi python version
- Install poetry
```
python3 -m pip install poetry
```
Setup dev local environment
1. Set working dir
```
cd /path/to/dq-vault
```
1. Install dependencies
```
python3 -m poetry install
```
1. Start shell (equivalent to activate virtualenv)
```
python3 -m poetry shell
```
1. Install dev dependencies
```
poe git-hooks
# Yes, it's poe, it's not a spelling mistake :)
```
Now you can set up your DBT:
- Verify dbt installed version
```
dbt --version
```
- Copy profiles to '.dbt' dir (create if not exists) under the Users dir.
```
# Linux/MacOs
mkdir ~/.dbt > /dev/null
cp ./profiles/profiles.yml ~/.dbt/profiles.yml
```
  NOTE: To simplify the dev, here we update the real password value (not using env_vars) in the profiles.yml after copying
- Check dbt configs:
```
cd integration_tests
dbt debug [--profiles-dir /path/to/profiles-dir]
```
- Run your model
```
dbt deps
dbt seed
dbt run
```
To exit the shell:
```
exit
# Enter
```

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
integration_tests		integration_tests
macros		macros
models		models
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
dbt_project.yml		dbt_project.yml
pyproject.toml		pyproject.toml
run_test.sh		run_test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dq-vault

What does this package do?

What are the requirements to use this package?

Variables

Classify the Test Type of your test cases

Labeling test type for each test case:

Macros

store_test_results_json (source, doc)

get_datavault_type (source, doc)

get_test_type (source, doc)

where_select_dv_models (source, doc)

result_to_dict (source, doc)

get_raw_test (source, doc)

create_resources (source, doc)

refresh_resouces (source, doc)

Integration Tests

Developer's Guide

About

Releases 2

Contributors 2

Languages

License

infinitelambda/dq-vault

Folders and files

Latest commit

History

Repository files navigation

dq-vault

What does this package do?

What are the requirements to use this package?

Variables

Classify the Test Type of your test cases

Labeling test type for each test case:

Macros

store_test_results_json (source, doc)

get_datavault_type (source, doc)

get_test_type (source, doc)

where_select_dv_models (source, doc)

result_to_dict (source, doc)

get_raw_test (source, doc)

create_resources (source, doc)

refresh_resouces (source, doc)

Integration Tests

Developer's Guide

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Contributors 2

Languages