Define set of tests that are executed on models/columns that meet condition #5644
Replies: 2 comments 4 replies
-
@filipjelenkovicFO Thanks for opening the issue! I know this is one that's come up for many, many folks searching for a more ergonomic way to apply tests to many, many models at once. Some recent Slack threads: here + here.

I think the use cases here are totally legitimate. The question is all around how we'd want to model this in dbt's internals, to expose and enable the right user experience. Here are some possibilities I can imagine, starting linear and getting increasingly far out:

Explicit configuration

At simplest, make it possible to add tests in dbt_project.yml:

models:
  subdirectory:
    # every model in this subdirectory has a column called 'id', with unique + not_null tests
    # if any model *doesn't* have a column called 'id', the tests will still run (and fail)
    +columns:
      - name: id
        tests:
          - unique
          - not_null

A better approach, and a heavier lift, would be to start treating all tests—or generic tests, specifically—as actual properties of models, rather than as separate nodes (which is what they are today). Then, we'd make it possible to configure/add tests from dbt_project.yml.

Implicit derivation

Make it possible to define rule sets: "if this {property}, then that {generic test}." Those rules would be defined ... somewhere ... such that dbt would infer or derive the creation of tests based on another model property. Imagine: whenever dbt finds a model with a column tagged primary_key, it derives unique and not_null tests on that column.

We'd need a place to put these rules, front-and-center, so that it isn't hidden magic which just takes effect at runtime. I hesitate to say where, exactly.

Another form of implicit derivation, which you mention: if model A has a given test (say, a recency check), then models downstream of model A should pick it up too.

Other ways to define project resources

In dbt today, if you want to code up one function that produces multiple project resources, that requires code generation. Lots of folks have done this, with scripts and helpers written in Python, targeting dbt code (yaml + Jinja-SQL) as their "compilation target." It's totally possible to do this by reading in dbt project data (yaml files) or metadata (manifest.json).
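As a concrete illustration of that generation approach, here is a minimal sketch. It is not anything dbt ships: the directory layout (models/staging/), the assumption that every model there has an id column, and the output filename are all made up for the example.

from pathlib import Path

import yaml  # PyYAML

MODELS_DIR = Path("models/staging")          # assumed location of the covered models
OUTPUT_FILE = MODELS_DIR / "_generated.yml"  # assumed name for the generated properties file


def build_properties() -> dict:
    """Build a dbt properties (schema.yml) structure with id tests for every model found."""
    models = [
        {
            "name": sql_file.stem,
            "columns": [{"name": "id", "tests": ["unique", "not_null"]}],
        }
        for sql_file in sorted(MODELS_DIR.glob("*.sql"))
    ]
    return {"version": 2, "models": models}


if __name__ == "__main__":
    properties = build_properties()
    OUTPUT_FILE.write_text(yaml.safe_dump(properties, sort_keys=False))
    print(f"Wrote tests for {len(properties['models'])} models to {OUTPUT_FILE}")

dbt test then picks up the generated file like any other properties file; the trade-off is that the generator has to be re-run and its output re-committed whenever the models change.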
What if (huge what if) we could support alternative modes of defining dbt project resources? So, in addition to the standard yaml + Jinja-SQL files, imagine defining project resources programmatically.

Inspired by some pseudo-code from @lostmygithubaccount, when we were talking about this issue the other day:

models = dbt.Manifest.models()
for model in models:
    for column in model.columns:
        if "primary_key" in column.tags:
            new_test = dbt.Test(
                test_name = "unique",
                model = model.name,
                column_name = column.name
            )
            dbt.Manifest.tests().add(new_test)

Love it? Hate it? We're not sure if this sort of "dbt Python SDK" would be all that interesting to the average end user — but it could make a lot of sense for developers of packages and integrations, who will want to interact programmatically with dbt project contents. We already know many folks today who want to interact programmatically with dbt's runtime (tasks, hooks, logging, etc).

I don't see us making immediate progress on this one. We will continue making stepwise progress toward better defining dbt's internal constructs, so that we could imagine exposing them via a Python API. In the meantime, this need is one that should stay on our minds.

I'm going to convert this into a discussion; let's keep the conversation flowing over there.
-
I'm doing something to this effect with a post-hook macro today. That means I can define the post-hook once in dbt_project.yml for a particular directory, and it will always run for any model in that folder. I would prefer to define this as test(s) rather than a macro, since that makes the coding simpler: I don't have to write the error/warning handling myself.

In places where I generate code, I have also generated yml files with tests in them. That works well for generated code; where the code is hand-written, there is more work involved in managing those yml files (as you point out: reading, modifying, and then writing them). It also creates a "cannot see the trees for the forest" problem when committing changes to git: I have about 400 generated sql files today, so 400 yml files to go with them, and hence 400 files changing in a commit.

I know all of this can be handled, but I do like the concept of defining a test (either table- or column-level) in dbt_project.yml as a single place to define a test that is applied to hundreds of models.
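For the "reading, modifying and then writing them" part, a small script can at least keep those yml files consistent. Here is a rough sketch, assuming a single properties file whose key columns carry a "primary" tag; the file path, tag name, and chosen tests are illustrative, not a dbt convention.

from pathlib import Path

import yaml  # PyYAML

SCHEMA_FILE = Path("models/schema.yml")  # assumed path to the properties file
REQUIRED_TESTS = ["unique", "not_null"]  # tests to enforce on tagged columns


def add_tests(properties: dict) -> dict:
    """Add the required tests to every column tagged 'primary', skipping any already present."""
    for model in properties.get("models", []):
        for column in model.get("columns", []):
            if "primary" in column.get("tags", []):
                tests = column.setdefault("tests", [])
                for test in REQUIRED_TESTS:
                    if test not in tests:
                        tests.append(test)
    return properties


if __name__ == "__main__":
    properties = yaml.safe_load(SCHEMA_FILE.read_text())
    SCHEMA_FILE.write_text(yaml.safe_dump(add_tests(properties), sort_keys=False))

One caveat: a plain PyYAML round-trip drops comments and hand-formatting; a library like ruamel.yaml preserves them, which matters if the yml files are partly hand-written.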
-
Is this your first time submitting a feature request?
Describe the feature
I would like to have an option to define a set of tests that are executed on models that meet a condition. That way, if I want to add another test, I can do it in one place rather than going over hundreds of columns to add it.
Also, when you run dbt test, those tests should be applied to all matching models/columns.
Describe alternatives you've considered
Currently I can manage the set of tests manually, but that requires a great deal of effort to keep them in sync.
Who will this benefit?
For example: apply/run unique and not_null tests on all columns that are tagged as "primary". I might also want to add a "greater than zero" test on those columns.
Likewise, all models downstream of ModelA should have tests for recent data (my custom test).
There are certain tests that get added to models/columns of similar type/meaning/semantics; this feature would help in those situations.
Are you interested in contributing this feature?
No response
Anything else?
No response