-
Notifications
You must be signed in to change notification settings - Fork 4
Description
Is there an existing issue for this?
- I have searched the existing issues
Describe the issue
This Issue
When testing the latest version of the package we uncovered an issue where the union data feature does not work as expected if the rag_*_schema
variable is not also defined and the default schema set within the source file is not the name of one of the schemas within the rag_*_union_schemas
variable. The end result is not a failure but simply an empty end model and a staging model with rows, but they're all null
records. Also, there is no "Table is missing so we're creating an empty model" error message which causes extra confusion.
Recreating the Issue
You can recreate this error by following the below steps
- Ensure you do not have a
hubspot
schema (default schema defined byrag_hubspot_schema
) in your destination. - Do not set a variable for
rag_hubspot_schema
- Instead define the
rag_hubspot_union_schemas
variable as your single (or multiple) hubspot schemas. Such as the example below
vars:
rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist.
rag__using_zendesk: false
rag__using_jira: false
rag__using_hubspot: true
- Run the models and see there are no warnings indicating the schemas weren't found or any other errors.
- Review the staging and end models and see that there are either no records or records populated with all null values except the source_relation field.
I realized when digging into this that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be populated. Instead the error is presented in the fields cte, specifically the source_columns parameter. The reason this is an issue is because the source defined here doesn't actually exist. Therefore, the result of that parameter is simply no fields ([]
). As a result, when the match between the staging_columns and the source_columns is initiated, there are essentially no matches. This is why we see records in the staging model, but they're all null.
Based off this, my theory is that if we can ensure the source properly points to one of the schemas in the union variable then this should work as expected.
Possible Solution
If we can ensure the source_columns reference returns rows as expected then this will address the issue. As such, we can explore a few avenues to address this issue:
- Ensure the default schema is at least one of the union schemas if defined.
- I'm unsure of this approach as it's a bit of a workaround, but it will still allow proper execution.
- Support the latest version of the union data macro (such as what we have implemented in Zendesk). I'm unsure if this will solve this issue directly, but worth the exploration.
- Remove the
fields
cte and thefill_staging_columns
macro.- This could result in more missing field errors, but still a possible solution. Albeit with downsides that must be weighed and discussed.
- Any other ideas to ensure this works as expected. Obviously this will be addressed if the customer defines the individual sources for each union data schema. However, that's not realistic to expect a customer to do. Also, we should ensure this works regardless of that effort.
Relevant error log or model output
There actually is no error which is concerning. Instead the models succeed without issue, but the end models have no data within them. This is due to the fields cte in the staging models simply populating null values.
Expected behavior
The union schemas feature works as expected without also needing the rag_*_schema
variable defined to create a placeholder source.
Possible solution
See above notes, we should explore updating the union feature model to the one most recently used in other packages.
dbt Project configurations
The below results in successful model runs, but completely empty end models.
vars:
rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist.
rag__using_zendesk: false
rag__using_jira: false
rag__using_hubspot: true
However, the below results in a successful model runs and the fully populated end model with the proper results.
vars:
rag_hubspot_schema: hubspot_testing
rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist.
rag__using_zendesk: false
rag__using_jira: false
rag__using_hubspot: true
The above prove that it's the misconfiguration of the source that's resulting in the error.
Package versions
Latest
What database are you using dbt with?
snowflake
How are you running this dbt package?
dbt Core™
dbt Version
Latest
Additional Context
No response
Are you willing to open a PR to help address this issue?
- Yes.
- Yes, but I will need assistance.
- No.