Skip to content

fix: properly set token_expiry_is_time_of_expiration and mask access token when logging #637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

lmossman
Copy link
Contributor

@lmossman lmossman commented Jul 3, 2025

What

When testing out some Builder changes relating to OAuth, I noticed two issues with the implementation of OAuth in the CDK:

  1. When I didn't set a token expiry date in my config, I was receiving this error: Invalid expires_in value: 2025-08-01T21:34:33Z. Expected number of seconds when no format specified.
  2. The access token was not being masked with **** in the OAuth response shown in the Builder - I could see the raw value

How

I traced the first issue down to the fact that token_expiry_is_time_of_expiration is not being set when constructing DeclarativeSingleUseRefreshTokenOauth2Authenticator here.

This caused this if statement to always return False, causing the error to be thrown.

To fix this, I simply set the token_expiry_is_time_of_expiration the same way it is being set when constructing DeclarativeOauth2Authenticator below


The second issue I traced down to the fact that the OAuth response was being logged before anything had a chance to add the access token to the secrets list. The fix was to extract the access token and add it to the secrets mask list before logging the response

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of OAuth token expiry by correctly setting the token expiry parameter during authentication.
    • Enhanced security by masking access tokens in logs immediately after retrieval from the token refresh endpoint.
    • Improved error handling by checking response status before processing OAuth token requests.

@github-actions github-actions bot added bug Something isn't working security labels Jul 3, 2025
@lmossman lmossman marked this pull request as ready for review July 3, 2025 00:26
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes two OAuth implementation issues: ensuring the token_expiry_is_time_of_expiration flag is set for single-use token authenticators and masking access tokens during logging.

  • Set token_expiry_is_time_of_expiration based on presence of token_expiry_date_format in the model factory.
  • Extract and mask the access token before logging responses in the OAuth request handler.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py Cache response.json(), extract the access token to call add_to_secrets, then log and return the stored JSON.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Pass token_expiry_is_time_of_expiration=bool(model.token_expiry_date_format) when constructing the single-use refresh token authenticator.
Comments suppressed due to low confidence (3)

airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py:222

  • [nitpick] The variable name access_key is ambiguous; consider renaming it to access_token for clearer intent.
            access_key = self._extract_access_token(response_json)

airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py:220

  • Add a unit test to verify that access tokens are properly extracted and masked by add_to_secrets before logging.
            response_json = response.json()

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:2804

  • Add tests to confirm that token_expiry_is_time_of_expiration is correctly set when token_expiry_date_format is present or absent.
                token_expiry_is_time_of_expiration=bool(model.token_expiry_date_format),

Copy link

github-actions bot commented Jul 3, 2025

PyTest Results (Fast)

3 685 tests  ±0   3 674 ✅ ±0   6m 15s ⏱️ ±0s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 503dac6. ± Comparison against base commit a562875.

♻️ This comment has been updated with latest results.

Copy link
Contributor

coderabbitai bot commented Jul 3, 2025

📝 Walkthrough

Walkthrough

This change updates the OAuth authenticator creation process in the declarative source component factory by introducing a parameter that determines if the token expiry should be treated as the time of expiration. It also modifies the request handling in the abstract OAuth class to parse and mask access tokens earlier in the flow and adjust token handling logic.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Added token_expiry_is_time_of_expiration parameter to authenticator constructors in the factory method.
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py Modified _make_handled_request to parse JSON response earlier, mask access token in logs, and adjust token handling logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ModelToComponentFactory
    participant DeclarativeOauth2Authenticator
    participant DeclarativeSingleUseRefreshTokenOauth2Authenticator

    Client->>ModelToComponentFactory: create_oauth_authenticator(model)
    alt model.refresh_token_updater present
        ModelToComponentFactory->>DeclarativeSingleUseRefreshTokenOauth2Authenticator: __init__(..., token_expiry_is_time_of_expiration)
    else
        ModelToComponentFactory->>DeclarativeOauth2Authenticator: __init__(..., token_expiry_is_time_of_expiration)
    end
Loading
sequenceDiagram
    participant OAuthAuthenticator
    participant TokenEndpoint

    OAuthAuthenticator->>TokenEndpoint: POST /token (refresh request)
    TokenEndpoint-->>OAuthAuthenticator: JSON response (includes access_token)
    OAuthAuthenticator->>OAuthAuthenticator: Parse JSON, mask access_token, log response
    OAuthAuthenticator->>OAuthAuthenticator: Return parsed JSON
Loading

Possibly related PRs

Suggested labels

oauth, oauth_in_builder


Would you like me to help draft a quick note for the reviewers or suggest any additional tests to cover these changes, wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b24db1b and 503dac6.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • add_to_secrets (67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 233-233: local variable 'e' is assigned to but never used

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (3)

220-239: Great restructuring to address the token masking issue!

The changes effectively solve the security concern by extracting and masking the access token before logging the response. The logic flow is much cleaner now - parsing JSON once, handling token extraction safely, and ensuring sensitive data is masked before any logging occurs. This addresses the core issue mentioned in the PR objectives, wdyt?


233-233: The static analysis hint about unused variable 'e' is a false positive.

The variable e is being used appropriately in the except clause context with the pass statement. This is valid Python syntax for catching and ignoring specific exceptions, so no changes are needed here, wdyt?


253-276: Confirm the duplicate add_to_secrets() call was properly addressed.

I can see that the _ensure_access_token_in_response method no longer includes the add_to_secrets() call that was mentioned in the past review comments. This successfully eliminates the duplication concern since the masking now only happens once in _make_handled_request before logging. Nice cleanup, wdyt?

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)

220-228: Great security improvement! The token masking logic looks solid.

I like how you've restructured this to parse the JSON once and mask the access token before any logging occurs. This definitely addresses the security concern from the PR objectives.

One small consideration - what happens if _extract_access_token fails on line 222? Should we wrap it in a try-catch to ensure the logging still happens even if token extraction fails, wdyt? The existing flow in _ensure_access_token_in_response already handles this case, but it might be worth being defensive here too.

The efficiency gain from parsing JSON once is also a nice bonus!

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2826-2829: Consistent derivation vs user override?

We now always derive token_expiry_is_time_of_expiration from the presence of token_expiry_date_format. Should we let users explicitly override this behaviour (e.g. via a manifest flag) rather than forcing bool(format)? Maybe accept an optional field and fall back to the derived value if unspecified, to avoid surprising edge-cases—wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a562875 and 4e6b9f2.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

undefined

<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>

<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>

<retrieved_learning>
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/, including _run.py, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
</retrieved_learning>

<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>

🧬 Code Graph Analysis (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • add_to_secrets (67-70)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)
  • token_expiry_is_time_of_expiration (68-73)
  • token_expiry_date_format (76-81)
airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (2)
  • token_expiry_is_time_of_expiration (127-128)
  • token_expiry_date_format (131-132)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2803-2805: Constructor signature mismatch risk for token_expiry_is_time_of_expiration

Nice catch adding the flag here! Could you double-check that DeclarativeSingleUseRefreshTokenOauth2Authenticator.__init__() indeed exposes a token_expiry_is_time_of_expiration keyword? If not, this call will raise a TypeError at runtime. A quick grep shows the getter property, but not the constructor arg—wdyt about confirming or extending the class accordingly?

Copy link

github-actions bot commented Jul 3, 2025

PyTest Results (Full)

3 688 tests  ±0   3 677 ✅ ±0   18m 2s ⏱️ -20s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 503dac6. ± Comparison against base commit a562875.

♻️ This comment has been updated with latest results.

@@ -217,10 +217,15 @@ def _make_handled_request(self) -> Any:
data=self.build_refresh_request_body(),
headers=self.build_refresh_request_headers(),
)
response_json = response.json()
# extract the access token and add to secrets to avoid logging the raw value
access_key = self._extract_access_token(response_json)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a more complex change than what's in scope for this PR, but it feels like a safer default to have whatever method actually make the API request (maybe make_handled_request to fetch access tokens also call the add_to_secrets() method. That way its impossible to not have it masked if you are using the cdk for auth. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is being applied to _make_handled_request, so anything that uses that will have the access token be added to secrets.

I don't think there is any lower-level method that this logic can be added to, because this method just calls requests.request() which is a library method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my mistake. I thought this change was in another file. Looks great 👍

I did notice that we make another call to add_to_secrets() here which is probably now unnecessary.

# Add the access token to the list of secrets so it is replaced before logging the response
# An argument could be made to remove the prevous access key from the list of secrets, but unmasking values seems like a security incident waiting to happen...
add_to_secrets(access_key)

Should we take that out to avoid confusion for why we are doing it twice?

Probably also worth checking that all flavors of OAuth classes are calling the add_to_secrets method either through this method or if they are using some other method to make the auth request.

Copy link
Contributor

@dbgold17 dbgold17 Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on that last point, I'm not 100% sure that all OAuth classes subclass the AbstractOauth2Authenticator class and use _make_handled_request()

Copy link
Contributor

@dbgold17 dbgold17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving with one non-blocking comment

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)

252-268: Smart cleanup to avoid duplicate secret masking!

This change aligns perfectly with the past review feedback about removing the duplicate add_to_secrets() call. Now the responsibility is clearly separated - this method only validates the token exists, while the masking happens earlier in _make_handled_request.

🧹 Nitpick comments (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)

220-232: Great security improvement! A couple of minor cleanup suggestions - wdyt?

The logic to extract and mask the access token before logging is exactly what we needed to fix the security issue. I love how you're parsing the JSON once and reusing it for efficiency.

Two small nitpicks from the static analysis:

  • The variable e in the exception handler (line 226) is unused
  • The comment formatting on line 227 has too many # characters
-            except ResponseKeysMaxRecurtionReached as e:
-                ## Could not find the access token in the response, so do nothing
+            except ResponseKeysMaxRecurtionReached:
+                # Could not find the access token in the response, so do nothing
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e6b9f2 and b1d5d05.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • add_to_secrets (67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 226-226: local variable 'e' is assigned to but never used

(F841)


[error] 227-227: too many leading '#' for block comment

(E266)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (python)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1d5d05 and bf1eaf2.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • add_to_secrets (67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 233-233: local variable 'e' is assigned to but never used

(F841)


[error] 234-234: too many leading '#' for block comment

(E266)

🪛 Pylint (3.3.7)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 241-241: Using variable 'json_exception' before assignment

(E0601)

🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format --fix' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-amplitude
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (3)

241-244: The JSON exception handling logic looks good!

The approach of caching the parsed JSON and raising exceptions later is a clean solution that ensures the response is logged regardless of JSON parsing success. This maintains the debugging capability while handling errors appropriately.


264-264: Minor comment update - looks fine!

The comment change is just a grammatical improvement and doesn't affect functionality.


227-235: No duplicate add_to_secrets() calls found in OAuth code

I ran a global search and the only invocation of add_to_secrets in the OAuth classes is at line 232 of abstract_oauth.py. There’s no redundant masking elsewhere—looks safe to leave as is, wdyt?

@lmossman lmossman requested a review from dbgold17 July 3, 2025 18:33
@lmossman
Copy link
Contributor Author

lmossman commented Jul 3, 2025

@dbgold17 I had to make a few more changes here to properly handle json parsing exceptions and cases where there is no access token in the response. Mind taking one more look?

@@ -217,10 +217,33 @@ def _make_handled_request(self) -> Any:
data=self.build_refresh_request_body(),
headers=self.build_refresh_request_headers(),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: but wonder if there is a more straightforward way to write this logic. Something like

            # log the response even if the request failed for troubleshooting purposes
            self._log_response(response)
            response.raise_for_status()
            
            response_json = response.json()

            try:
                # extract the access token and add to secrets to avoid logging the raw value
                access_key = self._extract_access_token(response_json)
                if access_key is not None:
                  add_to_secrets(access_key)
            except ResponseKeysMaxRecurtionReached as e:
                # could not find the access token in the response, so do nothing
                pass
   
            return response_json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, are we sure it makes sense to ignore the ResponseKeysMaxRecurtionReached exception? Admittedly this code is weird, but it seems like we could theoretically hit a max recursive depth but not fully explore the response json and therefore if we ignore the error, the access_key can actually be returned and we would not mask it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this over zoom and came to a simpler implementation that we both liked

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)

220-240: Consider refactoring to eliminate duplicate logging and address static analysis issues, wdyt?

A few observations about the current implementation:

  1. The response is being logged twice for successful responses (lines 223 and 237) - do we need both log calls?

  2. The variable e in the exception handler on line 233 is captured but never used, which triggers the static analysis warning.

  3. From the past review discussion, there's still a question about whether silently ignoring ResponseKeysMaxRecurtionReached is the right approach - if we hit max recursion depth but haven't fully explored the response JSON, we might miss the access token and fail to mask it.

How about consolidating the logic like this to address these concerns:

-            if not response.ok:
-                # log the response even if the request failed for troubleshooting purposes
-                self._log_response(response)
-                response.raise_for_status()
-
-            response_json = response.json()
-
-            try:
-                # extract the access token and add to secrets to avoid logging the raw value
-                access_key = self._extract_access_token(response_json)
-                if access_key:
-                    add_to_secrets(access_key)
-            except ResponseKeysMaxRecurtionReached as e:
-                # could not find the access token in the response, so do nothing
-                pass
-            
-            self._log_response(response)
-
-            return response_json
+            if not response.ok:
+                # log the response even if the request failed for troubleshooting purposes
+                self._log_response(response)
+                response.raise_for_status()
+
+            response_json = response.json()
+
+            try:
+                # extract the access token and add to secrets to avoid logging the raw value
+                access_key = self._extract_access_token(response_json)
+                if access_key:
+                    add_to_secrets(access_key)
+            except ResponseKeysMaxRecurtionReached:
+                # could not find the access token in the response, so do nothing
+                pass
+            
+            self._log_response(response)
+            return response_json

This removes the unused variable and consolidates the logging. What do you think about the exception handling approach though?

</review_comment_end>


221-237: Address the ruff formatting issue flagged by the pipeline, wdyt?

The pipeline is failing due to formatting issues. Could you run ruff format on this file to fix the code style issues?

</review_comment_end>

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab062f8 and b24db1b.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • add_to_secrets (67-70)
🪛 Flake8 (7.2.0)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 233-233: local variable 'e' is assigned to but never used

(F841)

🪛 GitHub Actions: Linters
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py

[error] 1-1: ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants