Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Wrong Cast in macro type_string for Databricks Hash Function #238

Open
ambullo opened this issue Jul 15, 2024 · 3 comments
Open

[BUG] Wrong Cast in macro type_string for Databricks Hash Function #238

ambullo opened this issue Jul 15, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@ambullo
Copy link

ambullo commented Jul 15, 2024

Describe the bug
In the type_string.sql line 22 to 31 you do a cast to a varchar based on the hashtype. This is not appropriate for databricks as databricks does not know the type varchar. The comparable type in databricks would be string and for it no length is defined.

This leads then so an output in the hash algorithm like this:
UPPER(SHA2(NULLIF(CONCAT(
IFNULL(NULLIF(UPPER(TRIM(CAST(NaturalKey AS VARCHAR(32)))), ''), '^^')
), '^^'), 256)) AS NaturalKey_HK,

This is bad, since the cast to 32 bit could potentially truncate the to be hashed column and lead to unwanted hashing collisions. Luckily, databricks does not truncate the input column. Nevertheless, it should be fixed as it has the potential to corrupt all hash keys.

Environment

dbt version: 1.8.3
automate_dv version: tested on 10.2 but most likely also on 11.0
Database/Platform: Databricks
hash algorithm: SHA
enable_binary_hash: false

Expected behavior
Replace Cast to Varchar with Cast to String

AB#5567

@ambullo ambullo added the bug Something isn't working label Jul 15, 2024
Copy link

azure-boards bot commented Jul 15, 2024

✅ Successfully linked to Azure Boards work item(s):

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Jul 15, 2024

Hi @ambullo, thanks for this thorough and detailed report.

I believe we fixed this as part of v0.11.0. v0.11.0 was explicitly an update to resolve hashing and handling of hash types correctly in both Databricks and BigQuery. Please can you upgrade and try it out?

If it's still an issue then we'd be happy to take a look, however 😄 Thanks!

@ambullo
Copy link
Author

ambullo commented Jul 15, 2024

I just checked. this bug report applies also to version v0.11.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants