Incorrect user_identifier selected if snowplow__user_sql or user_identifiers is inconsistent #193
Open
Describe the bug
When building `base_create_snowplow_sessions_lifecycle_manifest` with the `snowplow__user_sql` var populated, or if `snowplow__user_identifiers` has more than one value, and the value produced is inconsistent throughout a session, the value chosen as `user_identifier` can be unexpected because the values are aggregated via `max()`.

This can occur, for instance, if `user_id` is one of the preferred values and one or more events in the session do not have a `user_id`. For example, with a `snowplow__user_sql` like `coalesce(user_id, domain_userid)`, if some events are missing the `user_id`, then the `domain_userid` will be selected for those events instead. The final `user_identifier` will then be whichever value is largest, which will not consistently be `user_id` if the random `domain_userid` happens to be larger.

This is unexpected because the intent behind coalescing multiple values is generally that the first found value is preferred, but you will randomly get other values.
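A minimal sketch of the behavior described above, using an in-memory SQLite database rather than the warehouse the package actually runs on. The table name `events`, the column `session_identifier`, and the sample values are assumptions made for illustration; only `user_id`, `domain_userid`, and the `max(coalesce(...))` shape come from this report.

```python
import sqlite3

# Hypothetical events table: one session, two events, the second missing user_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (session_identifier text, user_id text, domain_userid text)"
)
conn.executemany(
    "insert into events values (?, ?, ?)",
    [
        ("s1", "alice", "f3b1c2"),  # user_id present
        ("s1", None, "f3b1c2"),     # user_id missing, coalesce falls back
    ],
)

# Same shape as the aggregation described in this issue:
# coalesce per event, then max() across the session.
row = conn.execute(
    """
    select session_identifier,
           max(coalesce(user_id, domain_userid)) as user_identifier
    from events
    group by session_identifier
    """
).fetchone()

user_identifier = row[1]
print(user_identifier)  # the domain_userid wins because it sorts after 'alice'
```

Here the session clearly has a `user_id`, yet the aggregated value is the `domain_userid`, purely because of string ordering.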
For `snowplow__user_identifiers`, it would be preferable if, rather than `max(coalesce(..., ..., ...))`, the computation was `coalesce(max(...), max(...), max(...))`, so preferences would be more deterministic.

For `snowplow__user_sql` you cannot use this approach because you cannot nest aggregate functions. As a temporary workaround we're using an approach that turns the outer `max()` call into a window function and allows the inner aggregates; this is obviously undesirable.

Steps to reproduce
user_identifier options that are inconsistent throughout a session
user_identifier values
Expected results
Actual results
Which database are you using dbt with?
Additional context
This won't impact most users who use the default identifier settings. However, for those who use their own surrogate values for user identification, this issue occurs whenever an event is missing the surrogates for any reason.
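To make the suggested change for `snowplow__user_identifiers` concrete, here is a rough comparison of the two aggregation orders, again run against in-memory SQLite. The `events` table, the `session_identifier` column, and the sample rows are hypothetical, and this is not the package's actual macro output, just the two expression shapes side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (session_identifier text, user_id text, domain_userid text)"
)
conn.executemany(
    "insert into events values (?, ?, ?)",
    [
        ("s1", "alice", "f3b1c2"),  # session with a user_id on some events
        ("s1", None, "f3b1c2"),
        ("s2", None, "a9d0e7"),     # session that never has a user_id
    ],
)

rows = conn.execute(
    """
    select session_identifier,
           -- current shape: coalesce per event, then aggregate
           max(coalesce(user_id, domain_userid)) as current_style,
           -- suggested shape: aggregate each identifier, then coalesce in order
           coalesce(max(user_id), max(domain_userid)) as proposed_style
    from events
    group by session_identifier
    order by session_identifier
    """
).fetchall()

# Map each session to (current_style, proposed_style).
results = {r[0]: (r[1], r[2]) for r in rows}
for session, (current, proposed) in results.items():
    print(session, current, proposed)
```

In this sketch the suggested shape returns the `user_id` for the first session and still falls back to `domain_userid` for the session that has no `user_id` at all, which matches the usual expectation for a coalesce ordering.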