Skip to content

Comments

docs: add atomic chaining follow-up implementation plan and tests#61

Merged
aryeko merged 5 commits intomainfrom
claude/plan-atomic-chaining-SG0k7
Feb 22, 2026
Merged

docs: add atomic chaining follow-up implementation plan and tests#61
aryeko merged 5 commits intomainfrom
claude/plan-atomic-chaining-SG0k7

Conversation

@aryeko
Copy link
Owner

@aryeko aryeko commented Feb 21, 2026

Summary

Implements the atomic chaining follow-up plan from the design document. Adds comprehensive test coverage for the document registry and executeTasks batch mutation paths, updates SKILL.md documentation with chaining examples, and introduces two new benchmark scenarios for atomic operations.

Changes

Documentation

  • SKILL.md: Updated capability count (69→70), removed check_run from domains list, and added new "Chain" section with syntax, examples, and guidance on when to use ghx chain vs ghx run
  • Implementation Plan: Added docs/plans/impl-plan-atomic-chaining-followup.md documenting remaining work items, status tracking, and execution order

Test Coverage

  • document-registry.test.ts: Expanded from 5 tests to 30 tests with one test per registered operation (21 mutations + 9 lookups) for better failure diagnostics
  • engine-execute-tasks-no-resolution.integration.test.ts: New integration tests validating Phase 2 batch mutation path for mutations without resolution config (e.g., issue.close, issue.comments.create)
  • engine-execute-tasks-pr-review-submit.integration.test.ts: New integration tests validating Phase 1→Phase 2 resolution path for pr.reviews.submit, including successful resolution and error handling when lookup returns null

Benchmark Scenarios

  • issue-triage-atomic-wf-001.json: Workflow scenario for atomically setting labels, assignees, and milestones on an issue via single chain call
  • pr-review-submit-atomic-wf-001.json: Workflow scenario for atomically submitting a PR review via chain
  • scenario-sets.json: Added new chaining scenario set and included both new scenarios in default and all sets

Related Issue

Derived from docs/plans/2026-02-20-atomic-chaining-followup.md. Items 3, 5, and 6 were already completed in PR #60.

Change Type

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactor/maintenance

Validation

  • pnpm run ci passes
  • Added comprehensive unit and integration tests for document registry and batch execution paths
  • No GraphQL operations changed (no codegen needed)
  • No public API changes

Release Notes

  • No changeset needed (documentation and test-only changes)

Docs

  • Updated SKILL.md with chaining documentation and examples
  • Added implementation plan document

https://claude.ai/code/session_014xrxL9Cssn7begmw8Jekub

Summary by CodeRabbit

  • New Features

    • Added atomic chaining benchmark scenarios for issue triage and PR review submission; updated benchmark scenario sets to include them.
  • Documentation

    • Added implementation plan for atomic chaining follow-up.
    • Updated SKILL.md with a new Chain section covering usage, variants, and examples.
  • Tests

    • Added/expanded integration tests for batched execution and PR review submission resolution.
    • Expanded unit tests for document-registry mutations and lookups.

Items from 2026-02-20-atomic-chaining-followup.md:

**Item 1 — SKILL.md:** Update capability count 69→70, remove check_run
domain, add Chain section covering ghx chain syntax, output shape,
chainable capabilities, and a close+comment example.

**Item 2 — Benchmark:** Add chaining scenario set with two new scenarios:
- issue-triage-atomic-wf-001 (labels + assignee + milestone in one chain)
- pr-review-submit-atomic-wf-001 (PR comment review via chain)
Add chaining set to scenario-sets.json; include both in default and all.

**Item 7 — Document registry tests:** Expand document-registry.test.ts
from 5 tests to exhaustive coverage of all 21 registered mutations and
all 9 registered lookups.

**Item 8 — pr.reviews.submit resolution validation:** Add two integration
tests for executeTasks exercising the Phase 1→Phase 2 resolution path:
- Verifies pullRequestId is correctly injected from repository.pullRequest.id
- Verifies partial status when lookup returns null pullRequest
Also add executeTasks no-resolution batch tests (Phase 2 only).

https://claude.ai/code/session_014xrxL9Cssn7begmw8Jekub
@changeset-bot
Copy link

changeset-bot bot commented Feb 21, 2026

⚠️ No Changeset found

Latest commit: 2c502d6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link

coderabbitai bot commented Feb 21, 2026

Warning

Rate limit exceeded

@aryeko has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 2 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Adds documentation for atomic chaining follow-up, two benchmark chaining scenarios, updates scenario-set manifests and SKILL.md with chain usage, and expands integration and unit tests to cover batched mutation paths and pr.reviews.submit resolution across Phase 1 (ID lookup) and Phase 2 (batched mutation).

Changes

Cohort / File(s) Summary
Documentation & Plan
docs/plans/impl-plan-atomic-chaining-followup.md
New implementation follow-up plan documenting status, tasks, JSON templates, execution order, and pre-PR checklist for atomic chaining.
Benchmark Scenarios
packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json, packages/benchmark/scenarios/chaining/pr-review-submit-atomic-wf-001.json
Two new atomic chaining workflows: issue triage (label/assignee/milestone multi-step chain) and PR review submit (single-chain pr.reviews.submit scenario).
Benchmark Manifest
packages/benchmark/scenario-sets.json
Added top-level chaining set and appended new chaining workflow IDs to default, all, and full-seeded sets.
Skill Docs
packages/core/skills/using-ghx/SKILL.md
Updated capability count and domains; added a new "Chain" section describing ghx chain syntax, stdin variant, result format, chainable capabilities, and limitations.
Benchmark Tests
packages/benchmark/test/unit/scenario-sets-manifest.test.ts
Adjusted tests to include chaining in default, separated workflows set, and updated assertions comparing default/all sets.
Integration Tests — No-resolution Mutations
packages/core/test/integration/engine-execute-tasks-no-resolution.integration.test.ts
New Phase 2 batch mutation tests validating successful batching of no-resolution steps and transport error handling.
Integration Tests — PR Review Submit
packages/core/test/integration/engine-execute-tasks-pr-review-submit.integration.test.ts
New tests covering pr.reviews.submit resolution flow: Phase 1 ID resolution and Phase 2 batched mutation, including success and partial-failure cases.
Unit Tests — Document Registry
packages/core/test/unit/document-registry.test.ts
Reorganized into "mutations" and "lookups" blocks; expanded mutation coverage (many getMutationDocument assertions) and preserved lookup assertions.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Engine as Engine
  participant GH as GitHubAPI

  Client->>Engine: executeTasks(chain request)
  Engine->>GH: Phase 1 - lookup IDs (query)
  GH-->>Engine: resolved IDs (e.g., pullRequestId)
  Engine->>GH: Phase 2 - batched mutation (queryRaw with injected IDs)
  GH-->>Engine: mutation results
  Engine-->>Client: aggregated results (status, per-step outcomes)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through JSON, chains in tow,
Phase one finds IDs, phase two says go.
Benchmarks hum and tests confirm,
Atomic steps in tidy form —
A rabbit's cheer for code that flows 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary changes: documentation updates, implementation plan addition, and test coverage expansion for atomic chaining.
Description check ✅ Passed The description follows the template structure with all required sections completed: Summary, Related Issue, Change Type, Validation, Release Notes, and Docs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/plan-atomic-chaining-SG0k7

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

aryeko and others added 2 commits February 21, 2026 20:26
- Move inline mock assertions to post-call in no-resolution integration test
- Remove redundant capturedQueryRawVars array; use vi.fn mock.calls instead
- Prefix all document-registry test names with "returns document for "
- Rename checkpoint issue_state_updated → issue_view_non_empty (accuracy)
- Add chaining scenarios to full-seeded scenario set
- Update impl-plan status table: items 1,2,7,8 → Done (this PR)
- Add chain limitation note to SKILL.md (no cross-step data passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests assumed default/workflows/all were always equal. Now that chaining
scenarios are included in default and all, update assertions to match the
differentiated set structure (workflows ⊂ default = all = full-seeded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aryeko aryeko marked this pull request as ready for review February 21, 2026 18:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/core/test/unit/document-registry.test.ts (1)

89-91: Strengthen the negative-path assertions to match the expected error message.

Both .toThrow() calls pass for any thrown value, not just the expected registry error. Since getMutationDocument and getLookupDocument throw a known message ("No mutation document registered for operation: ..." / "No lookup document registered for operation: ..."), pinning the assertion to that string makes the test fail if the error source changes unexpectedly.

♻️ Proposed refinement
-  it("throws on unknown mutation", () => {
-    expect(() => getMutationDocument("NonExistentMutation")).toThrow()
-  })
+  it("throws on unknown mutation", () => {
+    expect(() => getMutationDocument("NonExistentMutation")).toThrow(
+      "No mutation document registered for operation: NonExistentMutation",
+    )
+  })
-  it("throws on unknown lookup", () => {
-    expect(() => getLookupDocument("NonExistentLookup")).toThrow()
-  })
+  it("throws on unknown lookup", () => {
+    expect(() => getLookupDocument("NonExistentLookup")).toThrow(
+      "No lookup document registered for operation: NonExistentLookup",
+    )
+  })

Also applies to: 135-137

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/test/unit/document-registry.test.ts` around lines 89 - 91,
Update the negative-path assertions to assert the exact error message thrown by
getMutationDocument and getLookupDocument instead of using a generic .toThrow():
replace expect(() => getMutationDocument("NonExistentMutation")).toThrow() with
an assertion that matches the specific message "No mutation document registered
for operation: NonExistentMutation" (and similarly for getLookupDocument use "No
lookup document registered for operation: ..."), ensuring the tests will fail if
a different error or message is thrown.
packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json (1)

20-26: Checkpoint doesn't verify the actual triage outcome — only that the issue exists.

condition: "non_empty" on issue.view is satisfied by any readable issue, including one where the chain failed silently. It doesn't confirm the label 'bug', assignee 'aryeko', or milestone 1 were applied. A false-positive pass is possible if the issue already exists before the chain runs.

If the assertion runner supports field-level conditions (e.g., contains_label or similar), consider using a more specific check. Alternatively, split into three checkpoints — one per atomic mutation — to isolate which step's post-condition failed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json` around
lines 20 - 26, The checkpoint "issue_view_non_empty" currently uses
verification_task "issue.view" with condition "non_empty", which only checks
existence and can false-positive; update this checkpoint to verify the actual
triage outcomes by either (a) using field-level conditions on the same
verification_task (e.g., conditions that assert contains_label:"bug",
assignee:"aryeko", and milestone:1) or (b) replace the single checkpoint with
three atomic checkpoints (e.g., "issue_has_label_bug",
"issue_assigned_to_aryeko", "issue_milestone_1") each calling "issue.view" and
asserting the specific field-level post-condition so you can isolate which
mutation failed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Line 38: The sentence claiming a chain call is "one GraphQL round-trip" is
inaccurate for resolution-backed capabilities; update the SKILL.md text that
follows "Use `ghx chain`..." to clarify that while `ghx chain` batches mutations
to avoid partial state, some capabilities (e.g., `pr.reviews.submit`) require a
Phase 1 lookup before the Phase 2 mutation and therefore incur two network
round-trips per chain call; mirror the behavior asserted in the
`engine-execute-tasks-pr-review-submit.integration.test.ts` by replacing the
absolute "one GraphQL round-trip" claim with wording that notes
resolution-backed capabilities may need an extra lookup round-trip.

---

Nitpick comments:
In `@packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json`:
- Around line 20-26: The checkpoint "issue_view_non_empty" currently uses
verification_task "issue.view" with condition "non_empty", which only checks
existence and can false-positive; update this checkpoint to verify the actual
triage outcomes by either (a) using field-level conditions on the same
verification_task (e.g., conditions that assert contains_label:"bug",
assignee:"aryeko", and milestone:1) or (b) replace the single checkpoint with
three atomic checkpoints (e.g., "issue_has_label_bug",
"issue_assigned_to_aryeko", "issue_milestone_1") each calling "issue.view" and
asserting the specific field-level post-condition so you can isolate which
mutation failed.

In `@packages/core/test/unit/document-registry.test.ts`:
- Around line 89-91: Update the negative-path assertions to assert the exact
error message thrown by getMutationDocument and getLookupDocument instead of
using a generic .toThrow(): replace expect(() =>
getMutationDocument("NonExistentMutation")).toThrow() with an assertion that
matches the specific message "No mutation document registered for operation:
NonExistentMutation" (and similarly for getLookupDocument use "No lookup
document registered for operation: ..."), ensuring the tests will fail if a
different error or message is thrown.

…bilities

"One GraphQL round-trip" is only true for no-resolution capabilities.
Capabilities like pr.reviews.submit require a Phase 1 node-ID lookup
plus Phase 2 mutation — two round-trips. Clarify to "as few as possible
(typically one; node-ID lookups add a preflight query)".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aryeko
Copy link
Owner Author

aryeko commented Feb 22, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Around line 38-61: The intro overstates guarantees: replace phrasing like
"must succeed together" and "atomically" for the ghx chain command with clear
language that it batches steps into fewer GraphQL round-trips but does not
provide transactional rollback; explicitly state that the returned { status,
results[], meta } can be "success", "partial" or "failed" and that callers must
check status and handle partial results (results[i] containing { task, ok, data
| error }). Update mentions of "atomically" and "must succeed together" to
clarify "batched into a single request but not all-or-nothing."

---

Duplicate comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Line 38: This review is a duplicate noting that the "one round-trip" wording
already accounts for resolution-backed preflight queries; no change to the
SKILL.md text is required—mark the duplicate review comment resolved/removed and
ensure the existing sentence mentioning "Use `ghx chain`" (and the example
`pr.reviews.submit`) remains as-is.

"Must succeed together" and "atomically" imply ACID rollback that ghx chain
doesn't provide — a partial result is possible. Replace with accurate batching
language and add an explicit note that steps are not transactional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aryeko aryeko merged commit cfb8904 into main Feb 22, 2026
7 checks passed
@aryeko aryeko deleted the claude/plan-atomic-chaining-SG0k7 branch February 22, 2026 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants