Add functional_dependency test (PR included) #1020

See PR #1019

### Describe the feature

Add a functional_dependency test:

```yaml
models:
  - name: orders
    columns:
      - name: customer_name
        tests:
          - dbt_utils.functional_dependency:
              depends_on:
                - customer_id
```

A column is [functionally dependent](https://en.wikipedia.org/wiki/Functional_dependency#Employee_department) on one or more other columns in a table if each distinct combination of values in those other columns always maps to the same (possibly null) value in that column.

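*Example*. In this table, `customer_name` is functionally dependent on `customer_id`. The important part is that both rows for customer 2 have the same name, "Brock." Customers 1 and 3 sharing the name "Ash," and customer 4 having a null name, do not break the dependency.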

| order_id | customer_id | customer_name |
| --- | --- | --- |
| 1001 | 1 | Ash |
| 1002 | 2 | Brock |
| 1003 | 2 | Brock |
| 1004 | 3 | Ash |
| 1005 | 4 | |
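For concreteness, here is a minimal Python sketch of the definition applied to the table above. The function name `is_functionally_dependent` and the row-dict representation are illustrative assumptions, not the implementation in PR #1019 (which, as a dbt test, would be SQL). It groups rows by the determining columns and requires exactly one distinct value per group, counting `None` (NULL) as a value.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping, Sequence


# Hypothetical helper for illustration only; not the dbt_utils / PR #1019 code.
def is_functionally_dependent(
    rows: Iterable[Mapping[str, Any]],
    column: str,
    depends_on: Sequence[str],
) -> bool:
    """True if each distinct combination of `depends_on` values maps to
    exactly one value of `column`. None (NULL) counts as a value."""
    values_by_key = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in depends_on)
        values_by_key[key].add(row[column])
    return all(len(values) == 1 for values in values_by_key.values())


orders = [
    {"order_id": 1001, "customer_id": 1, "customer_name": "Ash"},
    {"order_id": 1002, "customer_id": 2, "customer_name": "Brock"},
    {"order_id": 1003, "customer_id": 2, "customer_name": "Brock"},
    {"order_id": 1004, "customer_id": 3, "customer_name": "Ash"},
    {"order_id": 1005, "customer_id": 4, "customer_name": None},
]

print(is_functionally_dependent(orders, "customer_name", ["customer_id"]))  # True

# A misspelled name for customer 2 breaks the dependency.
broken = orders + [{"order_id": 1006, "customer_id": 2, "customer_name": "Brok"}]
print(is_functionally_dependent(broken, "customer_name", ["customer_id"]))  # False

# The reverse direction does not hold: "Ash" maps to customers 1 and 3.
print(is_functionally_dependent(orders, "customer_id", ["customer_name"]))  # False
```

The last check shows that the dependency is directional: `customer_name` depends on `customer_id`, not the other way round.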

<br>
<br>

### Describe alternatives you've considered

(1) **Practitioners make their own custom generic test**. However, functional dependency is a fundamental enough concept that it seems better served by a shared test than by every project writing its own.

(2) **Practitioners add a normalizing table and test that downstream table for uniqueness**. For example, materialize a table using `select distinct customer_id, customer_name from orders` and check `customer_id` for uniqueness.
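The normalizing approach can be sketched in Python as follows. The helper names `distinct_customers` and `customer_id_is_unique` are hypothetical, and plain tuples stand in for table rows. In this sketch, deduplicating `(customer_id, customer_name)` pairs leaves exactly one row per `customer_id` precisely when the dependency holds; a set treats repeated `None` values as equal, much as SQL `select distinct` groups NULLs together.

```python
from collections import Counter

# (order_id, customer_id, customer_name) rows from the example table.
orders = [
    (1001, 1, "Ash"),
    (1002, 2, "Brock"),
    (1003, 2, "Brock"),
    (1004, 3, "Ash"),
    (1005, 4, None),
]


def distinct_customers(rows):
    """Mimic `select distinct customer_id, customer_name from orders`."""
    return {(customer_id, customer_name) for _, customer_id, customer_name in rows}


def customer_id_is_unique(pairs):
    """Mimic a uniqueness check on customer_id in the normalized table."""
    counts = Counter(customer_id for customer_id, _ in pairs)
    return all(n == 1 for n in counts.values())


print(customer_id_is_unique(distinct_customers(orders)))  # True

# A second spelling for customer 2 yields two distinct rows for that id.
broken = orders + [(1006, 2, "Brok")]
print(customer_id_is_unique(distinct_customers(broken)))  # False
```

The check gives the same answer as testing the dependency directly, but in dbt it requires materializing an extra model just to run the test.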

(3) **Practitioners adapt a dbt_expectations test as a workaround**. Functional dependency can be framed as "constant within groups," i.e. "fewer than 2 distinct values within each group." The dbt_expectations package has an appropriate test. However, using a workaround likely obscures what we're testing.

```yaml
models:
  - name: orders
    columns:
      - name: customer_name
        tests:
          - dbt_expectations.expect_column_distinct_count_to_be_less_than:
              value: 2
              group_by:
                - customer_id
```
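One subtlety is worth checking with any distinct-count workaround. SQL's `COUNT(DISTINCT ...)` ignores NULLs, so if a group-wise distinct count is computed that way, a customer whose rows mix a name and a NULL still counts as one distinct value, even though the definition above treats NULL as a value of its own. Whether dbt_expectations counts this way is an assumption here, not something verified. The Python sketch below only contrasts the two counting rules; the helper `max_distinct_per_group`, its `ignore_nulls` flag, and the NULL row for customer 2 are hypothetical.

```python
from collections import defaultdict

# (customer_id, customer_name); the None row for customer 2 is hypothetical.
rows = [
    (1, "Ash"),
    (2, "Brock"),
    (2, None),
    (3, "Ash"),
]


def max_distinct_per_group(rows, ignore_nulls):
    """Largest number of distinct names seen for any one customer_id."""
    groups = defaultdict(set)
    for customer_id, name in rows:
        if ignore_nulls and name is None:
            continue  # mimics SQL COUNT(DISTINCT name), which skips NULLs
        groups[customer_id].add(name)
    return max((len(names) for names in groups.values()), default=0)


# Counting like COUNT(DISTINCT ...): customer 2 has 1 distinct name, so "< 2" passes.
print(max_distinct_per_group(rows, ignore_nulls=True))   # 1
# Treating NULL as a value: customer 2 has 2 distinct values, so the dependency fails.
print(max_distinct_per_group(rows, ignore_nulls=False))  # 2
```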

(4) **dbt_utils adds an is_constant test so it can be adapted as a workaround**. We can't do the dbt_expectations workaround in dbt_utils alone at present, because we don't have an is_constant test. We could add one if that seems useful (or extend another test to cover that use case) and then give it group_by support. However, an is_constant test isn't very useful on its own.


### Additional context
Not database-specific.


### Who will this benefit?

Any dbt practitioner with messy source data!

This test is often useful for denormalized source data, where logical relationships between fields are implicitly expected but don't always hold because of manual entry errors or merges from different systems. Broken functional dependencies often surface downstream as duplicates and other anomalies.

Even when downstream duplicates are due to an "error in a model," the error might be that the model's logic assumes an upstream functional dependency that used to hold. For example, a doctor misspells a patient's name in one visit log, and now various models double all of that patient's rows.

### Are you interested in contributing this feature?
Yes! Here's the PR: https://github.com/dbt-labs/dbt-utils/pull/1019

