Fix incorrect schema change detection in spanner cdc to bq template #2183
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2183 +/- ##
=========================================
Coverage 46.97% 46.97%
- Complexity 4049 4050 +1
=========================================
Files 876 876
Lines 52193 52193
Branches 5502 5502
=========================================
+ Hits 24517 24519 +2
+ Misses 25920 25919 -1
+ Partials 1756 1755 -1
@@ -42,8 +42,8 @@ public static boolean detectDiffColumnInMod(
mod.getNewValuesJson() == ""
? new JSONObject("{}").keySet()
: new JSONObject(mod.getNewValuesJson()).keySet();
// At this mod's spannerCommitTimestamp, one column is added/dropped.
What if the user does end up dropping the column, but we don't update the stored schema?
Will we still do a read to Spanner for the dropped column? Will that cause NOT_FOUND errors?
You are absolutely right. It seems that querying the information schema columns is unavoidable whenever a mod does not modify all tracked columns, even if there are no schema changes. This feels pretty inefficient.
What do you think about making this configurable? That is, allow schema update handling to be disabled through a config option. If it is disabled, we don't perform these information schema queries.
For the `NEW_VALUES` and `OLD_AND_NEW_VALUES` value capture types, we should only query the information schema if the recorded tracked columns are fewer than the columns in a mod. For these two value capture types, the new values JSON is expected to contain fewer columns than the full set of tracked columns, and this does not necessarily mean there is a schema update.
Basically, `detectDiffColumnInMod` should return false if `keySetOfNewValuesJsonObject` is a subset of `spannerTable.getNonPkColumns()`.
Before the fix, the information schema columns query would trigger more often than needed: whenever the new values JSON contained fewer columns than the full set of tracked columns for `NEW_VALUES` and `OLD_AND_NEW_VALUES`. This is more likely to occur if a user has just added a new column and then updates only that column (in which case the mod contains only the new column).
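The subset rule described above can be sketched in isolation. This is a minimal illustration and not the template's actual code: the class `SchemaChangeCheck`, the method `hasUntrackedColumn`, and the use of plain `Set<String>` column names are assumptions made for the example. The real `detectDiffColumnInMod` works on the mod and the stored `spannerTable`.

```java
import java.util.Set;

// Hypothetical helper for illustration only; names are not from the template.
final class SchemaChangeCheck {

  // Returns true only when the mod's new-values keys include a column name
  // that is not among the tracked non-primary-key column names.
  // An empty key set is a subset of anything, so it yields false.
  static boolean hasUntrackedColumn(
      Set<String> newValueKeys, Set<String> trackedNonPkColumnNames) {
    return !trackedNonPkColumnNames.containsAll(newValueKeys);
  }
}
```

With tracked columns `{FirstName, LastName}`, a mod that touches only `FirstName` returns false, so no information schema query would be needed, while a mod that contains `Age` returns true. A subset check alone cannot tell a dropped column from a column the mod simply did not touch, which is the concern raised in the review comment above.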