Skip to content

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Sep 12, 2025

What

https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1757701152311889?thread_ts=1757603749.279779&cid=C02U9R3AF37

Summary by CodeRabbit

  • Chores
    • Added a CLI tool to migrate acceptance test configuration files across connectors to a new nested format.
    • Validates configs, skips already-migrated files, and reports clear outcomes and errors for each processed file.
    • Preserves YAML formatting and improves list indentation to maintain consistency after migration.

@github-actions github-actions bot added the chore label Sep 12, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/update_acceptance_test_config_yml#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/update_acceptance_test_config_yml

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link
Contributor

coderabbitai bot commented Sep 12, 2025

📝 Walkthrough

Walkthrough

Adds a new CLI script that migrates connector acceptance-test YAML files from a top-level tests mapping to a nested acceptance_tests structure, including custom YAML dumping, validation, per-file transformation, and error handling via a main entrypoint.

Changes

Cohort / File(s) Summary
CLI migration script
bin/fix_acceptance_tests_yml.py
New script: defines AlreadyUpdatedError; FixingListIndentationDumper to control YAML list indentation; transform(file_path: Path) to validate, migrate testsacceptance_tests, and write YAML; main() to glob connector config files, call transform, and report errors; CLI entrypoint (if __name__ == "__main__": main()).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant CLI as fix_acceptance_tests_yml.py
  participant FS as Filesystem
  participant YAML as YAML Parser

  Dev->>CLI: Run with repo_path
  CLI->>FS: Glob airbyte-integrations/connectors/source-*/acceptance-test-config.yml
  loop For each matched file
    CLI->>FS: Read file
    CLI->>YAML: safe_load(content)
    alt acceptance_tests already present
      CLI-->>Dev: Skip file (AlreadyUpdatedError)
    else "tests" present and valid
      CLI->>CLI: Build acceptance_tests structure (per-type {'tests': ...})
      CLI->>YAML: dump(updated, FixingListIndentationDumper, sort_keys=False)
      CLI->>FS: Write file
      CLI-->>Dev: Print success
    else invalid/missing "tests"
      CLI-->>Dev: Print error (ValueError or YAMLError)
    end
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • aaronsteers
  • aldogonzalez8
  • dbgold17

Would you like me to suggest small unit tests or example input/output fixtures for this script to speed review? wdyt?

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately and concisely reflects the main change—adding the new script bin/fix_acceptance_tests_yml.py—so it is on-topic and understandable to reviewers scanning the history.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e79545c and c40a57e.

📒 Files selected for processing (1)
  • bin/fix_acceptance_tests_yml.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/fix_acceptance_tests_yml.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/update_acceptance_test_config_yml

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
bin/fix_acceptance_tests_yml.py (4)

49-51: Be tolerant if a test section is already nested

If some files already have test_type: { tests: [...] }, we’d double-nest. Prefer a conditional to pass through already-correct blocks, wdyt?

-    for test_type, test_content in tests_data.items():
-        data['acceptance_tests'][test_type] = {'tests': test_content}
+    for test_type, test_content in tests_data.items():
+        if isinstance(test_content, dict) and "tests" in test_content:
+            section = test_content
+        else:
+            section = {"tests": ([] if test_content is None else test_content)}
+        data["acceptance_tests"][test_type] = section

52-55: Clarify formatting claim or preserve comments via ruamel.yaml

PyYAML will drop comments; the “preserved formatting” comment is misleading. Do you want to switch to ruamel.yaml to preserve comments/round-tripping, or just update the comment and keep PyYAML, wdyt?

Option A (keep PyYAML; fix comment and use UTF-8):

-    # Write back to file with preserved formatting
-    with open(file_path, 'w') as f:
-        yaml.dump(data, f, default_flow_style=False, sort_keys=False, indent=2)
+    # Write back to file (note: comments/formatting may not be preserved)
+    with open(file_path, "w", encoding="utf-8") as f:
+        yaml.dump(data, f, default_flow_style=False, sort_keys=False, indent=2, allow_unicode=True)

Option B (preserve comments):

  • Replace PyYAML with ruamel.yaml’s YAML() round-trip loader/dumper (happy to provide a follow-up patch if you prefer).

60-63: Make usage resilient to script path

Would you prefer the usage use the invoked name so it’s correct whether run via bin/ or directly, wdyt?

-        print("Usage: python fix_acceptance_tests_yml.py <airbyte_repo_path>")
+        print(f"Usage: {Path(sys.argv[0]).name} <airbyte_repo_path>")

18-22: Trim unused imports

typing imports aren’t used. Shall we drop them to keep Ruff happy, wdyt?

-from typing import Dict, Any
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a075a1 and fc3888f.

📒 Files selected for processing (1)
  • bin/fix_acceptance_tests_yml.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
bin/fix_acceptance_tests_yml.py

[error] 1-1: Ruff format check failed. 1 file would be reformatted (bin/fix_acceptance_tests_yml.py). Command: 'poetry run ruff format --diff .'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
🔇 Additional comments (1)
bin/fix_acceptance_tests_yml.py (1)

1-79: Fix Ruff formatting failure

CI shows "Ruff format check failed." I tried running the formatter here but poetry isn't available in the environment (poetry: command not found). Could you run locally: poetry run ruff format bin/fix_acceptance_tests_yml.py and push the formatted file, or should I apply the formatting and open a follow-up patch? wdyt?

Comment on lines 29 to 44
def transform(file_path: Path) -> None:
with open(file_path, 'r') as f:
data = yaml.safe_load(f)

if 'acceptance_tests' in data:
raise AlreadyUpdatedError()

if 'tests' not in data:
raise ValueError(f"No 'tests' key found in {file_path}, skipping transformation")

# Extract the tests data
tests_data = data.pop('tests')

if not isinstance(tests_data, dict):
raise ValueError(f"Error: 'tests' key in {file_path} is not a dictionary")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden YAML load (empty/non-mapping files) and use UTF-8

Empty YAML evaluates to None and non-mapping roots will raise TypeError on key checks. Shall we guard and add explicit UTF-8, plus clearer error messages, wdyt?

 def transform(file_path: Path) -> None:
-    with open(file_path, 'r') as f:
-        data = yaml.safe_load(f)
-    
-    if 'acceptance_tests' in data:
-        raise AlreadyUpdatedError()
-
-    if 'tests' not in data:
-        raise ValueError(f"No 'tests' key found in {file_path}, skipping transformation")
-    
-    # Extract the tests data
-    tests_data = data.pop('tests')
-    
-    if not isinstance(tests_data, dict):
-        raise ValueError(f"Error: 'tests' key in {file_path} is not a dictionary")
+    with open(file_path, "r", encoding="utf-8") as f:
+        data = yaml.safe_load(f) or {}
+
+    if not isinstance(data, dict):
+        raise ValueError(f"{file_path}: top-level YAML must be a mapping, got {type(data).__name__}")
+
+    if "acceptance_tests" in data:
+        raise AlreadyUpdatedError()
+
+    if "tests" not in data:
+        raise ValueError(f"{file_path}: no 'tests' key; skipping transformation")
+
+    # Extract the tests data
+    tests_data = data.pop("tests") or {}
+
+    if not isinstance(tests_data, dict):
+        raise ValueError(f"{file_path}: 'tests' must be a mapping, got {type(tests_data).__name__}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In bin/fix_acceptance_tests_yml.py around lines 29–44, open the file with
explicit UTF-8 (open(file_path, 'r', encoding='utf-8')), call yaml.safe_load,
then guard against empty or non-mapping YAML by checking if data is None and
raising a clear ValueError("Empty YAML file: {file_path}") and then if not
isinstance(data, dict) raise ValueError("Unexpected YAML root type: expected
mapping in {file_path}"); only after those guards check for 'acceptance_tests'
and 'tests' keys and validate that tests_data is a dict, keeping the existing
AlreadyUpdatedError and the other error messages but making them clearer per
above.

Comment on lines 66 to 75
for file_path in repo_path.glob('airbyte-integrations/connectors/source-*/acceptance-test-config.yml'):
try:
transform(file_path)
except AlreadyUpdatedError:
print(f"File {file_path} has already been updated, skipping transformation")
except yaml.YAMLError as e:
print(f"Error parsing YAML file {file_path}: {e}")
except Exception as e:
print(f"Error transforming {file_path}: {e}")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Broaden file search to destinations and .yaml; print errors to stderr

Many connectors are destination-* and some configs use .yaml. Shall we cover both and route errors to stderr, wdyt?

-    for file_path in repo_path.glob('airbyte-integrations/connectors/source-*/acceptance-test-config.yml'):
-        try:
-            transform(file_path)
-        except AlreadyUpdatedError:
-            print(f"File {file_path} has already been updated, skipping transformation")
-        except yaml.YAMLError as e:
-            print(f"Error parsing YAML file {file_path}: {e}")
-        except Exception as e:
-            print(f"Error transforming {file_path}: {e}")
+    patterns = [
+        "airbyte-integrations/connectors/source-*/acceptance-test-config.yml",
+        "airbyte-integrations/connectors/source-*/acceptance-test-config.yaml",
+        "airbyte-integrations/connectors/destination-*/acceptance-test-config.yml",
+        "airbyte-integrations/connectors/destination-*/acceptance-test-config.yaml",
+    ]
+    for pattern in patterns:
+        for file_path in repo_path.glob(pattern):
+            try:
+                transform(file_path)
+            except AlreadyUpdatedError:
+                print(f"File {file_path} has already been updated, skipping transformation")
+            except yaml.YAMLError as e:
+                print(f"Error parsing YAML file {file_path}: {e}", file=sys.stderr)
+            except Exception as e:
+                print(f"Error transforming {file_path}: {e}", file=sys.stderr)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In bin/fix_acceptance_tests_yml.py around lines 66 to 75, broaden the file
search to include destination-* connectors and files ending with both .yml and
.yaml, and route error output to stderr; change the glob loop to iterate over a
list of patterns (e.g. for patterns like
'airbyte-integrations/connectors/source-*/acceptance-test-config.yml',
'.../source-*/acceptance-test-config.yaml',
'.../destination-*/acceptance-test-config.yml',
'.../destination-*/acceptance-test-config.yaml') or generate matches with
multiple glob calls, then for each matched file call transform(file_path) and on
exceptions print the same messages to stderr using print(..., file=sys.stderr)
(including AlreadyUpdatedError, yaml.YAMLError as e, and generic Exception as
e).

@maxi297
Copy link
Contributor Author

maxi297 commented Sep 12, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link

PyTest Results (Fast)

3 746 tests  ±0   3 734 ✅ ±0   6m 20s ⏱️ +2s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit c40a57e. ± Comparison against base commit 9a075a1.

Copy link

PyTest Results (Full)

3 749 tests  ±0   3 737 ✅ ±0   10m 59s ⏱️ ±0s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit c40a57e. ± Comparison against base commit 9a075a1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant