Skip to content

[Bug] Source not working as expected when using union feature and default single schema not present #22

@fivetran-joemarkiewicz

Description

@fivetran-joemarkiewicz

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

This Issue

When testing the latest version of the package we uncovered an issue where the union data feature does not work as expected if the rag_*_schema variable is not also defined and the default schema set within the source file is not the name of one of the schemas within the rag_*_union_schemas variable. The end result is not a failure but simply an empty end model and a staging model with rows, but they're all null records. Also, there is no "Table is missing so we're creating an empty model" error message which causes extra confusion.

Recreating the Issue

You can recreate this error by following the below steps

  • Ensure you do not have a hubspot schema (default schema defined by rag_hubspot_schema) in your destination.
  • Do not set a variable for rag_hubspot_schema
  • Instead define the rag_hubspot_union_schemas variable as your single (or multiple) hubspot schemas. Such as the example below
vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true
  • Run the models and see there are no warnings indicating the schemas weren't found or any other errors.
  • Review the staging and end models and see that there are either no records or records populated with all null values except the source_relation field.

I realized when digging into this that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be populated. Instead the error is presented in the fields cte, specifically the source_columns parameter. The reason this is an issue is because the source defined here doesn't actually exist. Therefore, the result of that parameter is simply no fields ([]). As a result, when the match between the staging_columns and the source_columns is initiated, there are essentially no matches. This is why we see records in the staging model, but they're all null.

Based off this, my theory is that if we can ensure the source properly points to one of the schemas in the union variable then this should work as expected.

Possible Solution

If we can ensure the source_columns reference returns rows as expected then this will address the issue. As such, we can explore a few avenues to address this issue:

  • Ensure the default schema is at least one of the union schemas if defined.
    • I'm unsure of this approach as it's a bit of a workaround, but it will still allow proper execution.
  • Support the latest version of the union data macro (such as what we have implemented in Zendesk). I'm unsure if this will solve this issue directly, but worth the exploration.
  • Remove the fields cte and the fill_staging_columns macro.
    • This could result in more missing field errors, but still a possible solution. Albeit with downsides that must be weighed and discussed.
  • Any other ideas to ensure this works as expected. Obviously this will be addressed if the customer defines the individual sources for each union data schema. However, that's not realistic to expect a customer to do. Also, we should ensure this works regardless of that effort.

Relevant error log or model output

There actually is no error which is concerning. Instead the models succeed without issue, but the end models have no data within them. This is due to the fields cte in the staging models simply populating null values.

Expected behavior

The union schemas feature works as expected without also needing the rag_*_schema variable defined to create a placeholder source.

Possible solution

See above notes, we should explore updating the union feature model to the one most recently used in other packages.

dbt Project configurations

The below results in successful model runs, but completely empty end models.

vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

However, the below results in a successful model runs and the fully populated end model with the proper results.

vars:
  rag_hubspot_schema: hubspot_testing
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

The above prove that it's the misconfiguration of the source that's resulting in the error.

Package versions

Latest

What database are you using dbt with?

snowflake

How are you running this dbt package?

dbt Core™

dbt Version

Latest

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance.
  • No.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions