-
Notifications
You must be signed in to change notification settings - Fork 24
fix(sql): Add fallback to source_defined_primary_key
in CatalogProvider
#627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Fix CatalogProvider.get_primary_keys() to fall back to stream.source_defined_primary_key when primary_key is empty/None - Add comprehensive unit tests covering all fallback scenarios - Resolves bug where destinations ignore source-defined primary keys when configured primary key is not set - Affects all destinations using CDK SQL processor base classes Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers:
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Looks like our new test suite assumes all Python connectors are sources. Since this is a destination and doesn't support 'discover', the Nothing we need to solve here in this PR and not related to our change.
|
/bump-version |
…y fix - Update airbyte-cdk dependency to use dev branch devin/1751064114-fix-primary-key-fallback - This branch contains the fix for primary key fallback logic in CatalogProvider - Allows end-to-end testing of the primary key fix with MotherDuck destination - References CDK PR: airbytehq/airbyte-python-cdk#627 Co-Authored-By: AJ Steers <[email protected]>
Address @aaronsteers feedback to use more concise approach: pks = configured_stream.primary_key or configured_stream.stream.source_defined_primary_key or [] This maintains exact same functionality while being more readable. Co-Authored-By: AJ Steers <[email protected]>
- Consolidate 7 individual test methods into 1 parametrized test with 6 scenarios - Use list[str] format for parameters with wrapping logic in test - Remove case normalization test since CatalogProvider doesn't handle normalization - Reduce code from ~150 lines to ~50 lines while maintaining full coverage Co-Authored-By: AJ Steers <[email protected]>
…behavior - Add explanation that method uses primary_key if set explicitly in configured catalog - Otherwise falls back to source_defined_primary_key if set - Addresses GitHub comment from @aaronsteers Co-Authored-By: AJ Steers <[email protected]>
- Fix Ruff Format Check CI failure - Apply proper formatting to expanded docstring in get_primary_keys method Co-Authored-By: AJ Steers <[email protected]>
… unset - Reverse priority order to favor source_defined_primary_key over primary_key - Return None when neither primary key field is set - Update all callers to handle None gracefully with 'pks or []' coalescing - Update unit tests to reflect new behavior and priority order - Addresses @aaronsteers feedback on primary key fallback logic Co-Authored-By: AJ Steers <[email protected]>
…perations - Replace 'or []' coalescing with explicit guard statements in merge methods - Raise AirbyteInternalError when no primary keys available for merge operations - Addresses @aaronsteers feedback on code clarity and error handling - Ensures merge operations fail fast when primary keys are missing Co-Authored-By: AJ Steers <[email protected]>
- Split long docstring line in get_primary_keys method across multiple lines - Addresses @aaronsteers feedback on line length issue in PR comment - Maintains readability while complying with formatting standards Co-Authored-By: AJ Steers <[email protected]>
source_defined_primary_key
in CatalogProvider
…lues - Updated docstring to specify 'primary key column names' instead of just 'primary keys' - Added Returns section explaining the method returns column names or None - Addresses @dbgold17's clarifying question about terminology in GitHub comment Co-Authored-By: AJ Steers <[email protected]>
Hi @dbgold17! Great question about the terminology. I've just updated the docstring to clarify this. The method returns column names (not actual key values). Specifically:
The confusion was understandable since the original docstring just said "primary keys" which could be ambiguous. Thanks for pointing this out! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed on a huddle. LGTM minus some variable name confusion
…ithub.com/airbytehq/airbyte-python-cdk into devin/1751064114-fix-primary-key-fallback
Related:
source_defined_primary_key
when defined (CDK bump) airbyte#62133source_defined_primary_key
in CatalogProvider #627 (this pr)Fix primary key fallback in CatalogProvider for SQL destinations
Summary
This PR fixes a critical bug in the Airbyte Python CDK where the
CatalogProvider.get_primary_keys()
method ignores source-defined primary keys when configured primary keys are empty or None. This affects all destinations using the CDK's SQL processor base classes, not just MotherDuck.Root Cause: The CDK's
CatalogProvider.get_primary_keys()
method only usesConfiguredAirbyteStream.primary_key
and returns an empty list when it's falsy, without falling back tostream.source_defined_primary_key
as specified in the Airbyte protocol.Fix: Added fallback logic to use
stream.source_defined_primary_key
whenprimary_key
is empty/None, with comprehensive unit test coverage.Review & Testing Checklist for Human
source_defined_primary_key
whenprimary_key
is empty/Noneprimary_key
still work correctly (configured primary key should take precedence)configured_stream.stream.source_defined_primary_key
lookup doesn't cause performance issuesRecommended Test Plan
primary_key
but non-emptysource_defined_primary_key
Diagram
Notes
primary_key
inConfiguredAirbyteStream
overrides source-defined primary keys when set, andsource_defined_primary_key
provides the default from the sourceImportant
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.