docs: add atomic chaining follow-up implementation plan and tests by aryeko · Pull Request #61 · aryeko/ghx

aryeko · 2026-02-21T18:16:27Z

Summary

Implements the atomic chaining follow-up plan from the design document. Adds comprehensive test coverage for the document registry and executeTasks batch mutation paths, updates SKILL.md documentation with chaining examples, and introduces two new benchmark scenarios for atomic operations.

Changes

Documentation

SKILL.md: Updated capability count (69→70), removed check_run from domains list, and added new "Chain" section with syntax, examples, and guidance on when to use ghx chain vs ghx run
Implementation Plan: Added docs/plans/impl-plan-atomic-chaining-followup.md documenting remaining work items, status tracking, and execution order

Test Coverage

document-registry.test.ts: Expanded from 5 tests to 30 tests with one test per registered operation (21 mutations + 9 lookups) for better failure diagnostics
engine-execute-tasks-no-resolution.integration.test.ts: New integration tests validating Phase 2 batch mutation path for mutations without resolution config (e.g., issue.close, issue.comments.create)
engine-execute-tasks-pr-review-submit.integration.test.ts: New integration tests validating Phase 1→Phase 2 resolution path for pr.reviews.submit, including successful resolution and error handling when lookup returns null

Benchmark Scenarios

issue-triage-atomic-wf-001.json: Workflow scenario for atomically setting labels, assignees, and milestones on an issue via single chain call
pr-review-submit-atomic-wf-001.json: Workflow scenario for atomically submitting a PR review via chain
scenario-sets.json: Added new chaining scenario set and included both new scenarios in default and all sets

Related Issue

Derived from docs/plans/2026-02-20-atomic-chaining-followup.md. Items 3, 5, and 6 were already completed in PR #60.

Change Type

Validation

pnpm run ci passes
Added comprehensive unit and integration tests for document registry and batch execution paths
No GraphQL operations changed (no codegen needed)
No public API changes

Release Notes

No changeset needed (documentation and test-only changes)

Docs

Updated SKILL.md with chaining documentation and examples
Added implementation plan document

https://claude.ai/code/session_014xrxL9Cssn7begmw8Jekub

Summary by CodeRabbit

New Features
- Added atomic chaining benchmark scenarios for issue triage and PR review submission; updated benchmark scenario sets to include them.
Documentation
- Added implementation plan for atomic chaining follow-up.
- Updated SKILL.md with a new Chain section covering usage, variants, and examples.
Tests
- Added/expanded integration tests for batched execution and PR review submission resolution.
- Expanded unit tests for document-registry mutations and lookups.

Items from 2026-02-20-atomic-chaining-followup.md:

**Item 1 (SKILL.md):** Update capability count 69→70, remove check_run domain, add Chain section covering ghx chain syntax, output shape, chainable capabilities, and a close+comment example.

**Item 2 (Benchmark):** Add chaining scenario set with two new scenarios:
- issue-triage-atomic-wf-001 (labels + assignee + milestone in one chain)
- pr-review-submit-atomic-wf-001 (PR comment review via chain)

Add chaining set to scenario-sets.json; include both in default and all.

**Item 7 (Document registry tests):** Expand document-registry.test.ts from 5 tests to exhaustive coverage of all 21 registered mutations and all 9 registered lookups.

**Item 8 (pr.reviews.submit resolution validation):** Add two integration tests for executeTasks exercising the Phase 1→Phase 2 resolution path:
- Verifies pullRequestId is correctly injected from repository.pullRequest.id
- Verifies partial status when lookup returns null pullRequest

Also add executeTasks no-resolution batch tests (Phase 2 only).

https://claude.ai/code/session_014xrxL9Cssn7begmw8Jekub

changeset-bot · 2026-02-21T18:16:31Z

⚠️ No Changeset found

Latest commit: 2c502d6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2026-02-21T18:16:37Z

Warning

Rate limit exceeded

@aryeko has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 2 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Adds documentation for atomic chaining follow-up, two benchmark chaining scenarios, updates scenario-set manifests and SKILL.md with chain usage, and expands integration and unit tests to cover batched mutation paths and pr.reviews.submit resolution across Phase 1 (ID lookup) and Phase 2 (batched mutation).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Plan `docs/plans/impl-plan-atomic-chaining-followup.md` | New implementation follow-up plan documenting status, tasks, JSON templates, execution order, and pre-PR checklist for atomic chaining. |
| Benchmark Scenarios `packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json`, `packages/benchmark/scenarios/chaining/pr-review-submit-atomic-wf-001.json` | Two new atomic chaining workflows: issue triage (label/assignee/milestone multi-step chain) and PR review submit (single-chain pr.reviews.submit scenario). |
| Benchmark Manifest `packages/benchmark/scenario-sets.json` | Added top-level `chaining` set and appended new chaining workflow IDs to `default`, `all`, and `full-seeded` sets. |
| Skill Docs `packages/core/skills/using-ghx/SKILL.md` | Updated capability count and domains; added a new "Chain" section describing ghx chain syntax, stdin variant, result format, chainable capabilities, and limitations. |
| Benchmark Tests `packages/benchmark/test/unit/scenario-sets-manifest.test.ts` | Adjusted tests to include chaining in `default`, separated `workflows` set, and updated assertions comparing `default`/`all` sets. |
| Integration Tests: No-resolution Mutations `packages/core/test/integration/engine-execute-tasks-no-resolution.integration.test.ts` | New Phase 2 batch mutation tests validating successful batching of no-resolution steps and transport error handling. |
| Integration Tests: PR Review Submit `packages/core/test/integration/engine-execute-tasks-pr-review-submit.integration.test.ts` | New tests covering pr.reviews.submit resolution flow: Phase 1 ID resolution and Phase 2 batched mutation, including success and partial-failure cases. |
| Unit Tests: Document Registry `packages/core/test/unit/document-registry.test.ts` | Reorganized into "mutations" and "lookups" blocks; expanded mutation coverage (many getMutationDocument assertions) and preserved lookup assertions. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client as Client
  participant Engine as Engine
  participant GH as GitHubAPI

  Client->>Engine: executeTasks(chain request)
  Engine->>GH: Phase 1 - lookup IDs (query)
  GH-->>Engine: resolved IDs (e.g., pullRequestId)
  Engine->>GH: Phase 2 - batched mutation (queryRaw with injected IDs)
  GH-->>Engine: mutation results
  Engine-->>Client: aggregated results (status, per-step outcomes)
```
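The two phases in the diagram can be illustrated with a small planning sketch. This is not the ghx engine: `ChainStep`, `needsResolution`, and `planChain` are hypothetical names introduced here, and the only facts taken from the PR are that `issue.close` and `issue.comments.create` have no resolution config while `pr.reviews.submit` needs a Phase 1 lookup.

```typescript
// Hypothetical step shape; `needsResolution` is an illustrative flag, not a ghx field.
type ChainStep = { task: string; needsResolution: boolean };

type ChainPlan = {
  lookups: string[]; // tasks that need a Phase 1 ID lookup
  mutations: string[]; // every task, batched into one Phase 2 request
  roundTrips: number;
};

function planChain(steps: ChainStep[]): ChainPlan {
  const lookups = steps.filter((s) => s.needsResolution).map((s) => s.task);
  const mutations = steps.map((s) => s.task);
  // Phase 1 runs only if some step needs an ID; Phase 2 runs whenever there is a mutation.
  const roundTrips = (lookups.length > 0 ? 1 : 0) + (mutations.length > 0 ? 1 : 0);
  return { lookups, mutations, roundTrips };
}

const noResolution = planChain([
  { task: "issue.close", needsResolution: false },
  { task: "issue.comments.create", needsResolution: false },
]);

const withResolution = planChain([
  { task: "pr.reviews.submit", needsResolution: true },
]);

console.log(noResolution.roundTrips, withResolution.roundTrips); // prints: 1 2
```

Under these assumptions, a chain of no-resolution steps costs one round-trip, and any step that needs an ID lookup adds a single preflight query for the whole chain.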

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat(core): atomic capability chaining with two-phase GraphQL batching #59: Continues and expands atomic capability chaining work; directly related to the added docs, scenarios, and tests.
feat(core): add Claude Code plugin infrastructure #48: Modifies packages/core/skills/using-ghx/SKILL.md; overlapping edits to SKILL.md suggest a close relation.
feat(core): implement agentic interface runtime, benchmarks, and docs #2: Earlier chaining-related changes (scenarios, registry/tests) that overlap with the current follow-up additions.

Poem

🐰 I hopped through JSON, chains in tow,
Phase one finds IDs, phase two says go.
Benchmarks hum and tests confirm,
Atomic steps in tidy form —
A rabbit's cheer for code that flows 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the primary changes: documentation updates, implementation plan addition, and test coverage expansion for atomic chaining. |
| Description check | ✅ Passed | The description follows the template structure with all required sections completed: Summary, Related Issue, Change Type, Validation, Release Notes, and Docs. |

codecov · 2026-02-21T18:17:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


- Move inline mock assertions to post-call in no-resolution integration test
- Remove redundant capturedQueryRawVars array; use vi.fn mock.calls instead
- Prefix all document-registry test names with "returns document for "
- Rename checkpoint issue_state_updated → issue_view_non_empty (accuracy)
- Add chaining scenarios to full-seeded scenario set
- Update impl-plan status table: items 1,2,7,8 → Done (this PR)
- Add chain limitation note to SKILL.md (no cross-step data passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests assumed default/workflows/all were always equal. Now that chaining scenarios are included in default and all, update assertions to match the differentiated set structure (workflows ⊂ default = all = full-seeded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
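The set relationship in that commit message (workflows ⊂ default = all = full-seeded) can be checked with two small helpers. This is only a sketch of the kind of assertion involved, not the code in scenario-sets-manifest.test.ts. The two chaining IDs come from this PR; `example-wf-001` is a hypothetical placeholder for the pre-existing workflow scenarios.

```typescript
// True when every ID in `a` also appears in `b`.
function isSubset(a: string[], b: string[]): boolean {
  const bSet = new Set(b);
  return a.every((id) => bSet.has(id));
}

// True when both lists contain the same IDs, ignoring order.
function sameMembers(a: string[], b: string[]): boolean {
  return isSubset(a, b) && isSubset(b, a);
}

const chaining = ["issue-triage-atomic-wf-001", "pr-review-submit-atomic-wf-001"];
const workflows = ["example-wf-001"]; // hypothetical placeholder ID
const defaultSet = workflows.concat(chaining);
const all = defaultSet.slice();
const fullSeeded = defaultSet.slice();

// workflows is a proper subset of default, while default, all, and full-seeded match.
const workflowsIsProperSubset =
  isSubset(workflows, defaultSet) && !sameMembers(workflows, defaultSet);
const broadSetsMatch = sameMembers(defaultSet, all) && sameMembers(all, fullSeeded);

console.log(workflowsIsProperSubset, broadSetsMatch); // prints: true true
```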

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

packages/core/test/unit/document-registry.test.ts (1)
89-91: Strengthen the negative-path assertions to match the expected error message.

Both .toThrow() calls pass for any thrown value, not just the expected registry error. Since getMutationDocument and getLookupDocument throw a known message ("No mutation document registered for operation: ..." / "No lookup document registered for operation: ..."), pinning the assertion to that string makes the test fail if the error source changes unexpectedly.
♻️ Proposed refinement
```diff
-  it("throws on unknown mutation", () => {
-    expect(() => getMutationDocument("NonExistentMutation")).toThrow()
-  })
+  it("throws on unknown mutation", () => {
+    expect(() => getMutationDocument("NonExistentMutation")).toThrow(
+      "No mutation document registered for operation: NonExistentMutation",
+    )
+  })

-  it("throws on unknown lookup", () => {
-    expect(() => getLookupDocument("NonExistentLookup")).toThrow()
-  })
+  it("throws on unknown lookup", () => {
+    expect(() => getLookupDocument("NonExistentLookup")).toThrow(
+      "No lookup document registered for operation: NonExistentLookup",
+    )
+  })
```
Also applies to: 135-137
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/test/unit/document-registry.test.ts` around lines 89 - 91,
Update the negative-path assertions to assert the exact error message thrown by
getMutationDocument and getLookupDocument instead of using a generic .toThrow():
replace expect(() => getMutationDocument("NonExistentMutation")).toThrow() with
an assertion that matches the specific message "No mutation document registered
for operation: NonExistentMutation" (and similarly for getLookupDocument use "No
lookup document registered for operation: ..."), ensuring the tests will fail if
a different error or message is thrown.
packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json (1)
20-26: Checkpoint doesn't verify the actual triage outcome — only that the issue exists.

condition: "non_empty" on issue.view is satisfied by any readable issue, including one where the chain failed silently. It doesn't confirm the label 'bug', assignee 'aryeko', or milestone 1 were applied. A false-positive pass is possible if the issue already exists before the chain runs.

If the assertion runner supports field-level conditions (e.g., contains_label or similar), consider using a more specific check. Alternatively, split into three checkpoints — one per atomic mutation — to isolate which step's post-condition failed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/benchmark/scenarios/chaining/issue-triage-atomic-wf-001.json` around
lines 20 - 26, The checkpoint "issue_view_non_empty" currently uses
verification_task "issue.view" with condition "non_empty", which only checks
existence and can false-positive; update this checkpoint to verify the actual
triage outcomes by either (a) using field-level conditions on the same
verification_task (e.g., conditions that assert contains_label:"bug",
assignee:"aryeko", and milestone:1) or (b) replace the single checkpoint with
three atomic checkpoints (e.g., "issue_has_label_bug",
"issue_assigned_to_aryeko", "issue_milestone_1") each calling "issue.view" and
asserting the specific field-level post-condition so you can isolate which
mutation failed.
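To make option (b) concrete, here is a rough sketch of three field-level checks over an issue payload. The `IssueView` shape and the `checkTriage` helper are assumptions for illustration; the real `issue.view` output and the benchmark's assertion runner may look different. The checkpoint names and expected values (label `bug`, assignee `aryeko`, milestone 1) are the ones named in the comment above.

```typescript
// Assumed shape of an issue.view result; the real payload may differ.
type IssueView = {
  labels: string[];
  assignees: string[];
  milestone: number | null;
};

type CheckpointResult = { name: string; passed: boolean };

// One result per mutation, so a failure points at the step that did not apply.
function checkTriage(issue: IssueView): CheckpointResult[] {
  return [
    { name: "issue_has_label_bug", passed: issue.labels.indexOf("bug") !== -1 },
    { name: "issue_assigned_to_aryeko", passed: issue.assignees.indexOf("aryeko") !== -1 },
    { name: "issue_milestone_1", passed: issue.milestone === 1 },
  ];
}

// An issue that exists (so "non_empty" would pass) but never received its milestone.
const partiallyTriaged: IssueView = { labels: ["bug"], assignees: ["aryeko"], milestone: null };

const failing = checkTriage(partiallyTriaged)
  .filter((c) => !c.passed)
  .map((c) => c.name);

console.log(failing); // prints: [ 'issue_milestone_1' ]
```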

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Line 38: The sentence claiming a chain call is "one GraphQL round-trip" is
inaccurate for resolution-backed capabilities; update the SKILL.md text that
follows "Use `ghx chain`..." to clarify that while `ghx chain` batches mutations
to avoid partial state, some capabilities (e.g., `pr.reviews.submit`) require a
Phase 1 lookup before the Phase 2 mutation and therefore incur two network
round-trips per chain call; mirror the behavior asserted in the
`engine-execute-tasks-pr-review-submit.integration.test.ts` by replacing the
absolute "one GraphQL round-trip" claim with wording that notes
resolution-backed capabilities may need an extra lookup round-trip.


packages/core/skills/using-ghx/SKILL.md

…bilities

"One GraphQL round-trip" is only true for no-resolution capabilities. Capabilities like pr.reviews.submit require a Phase 1 node-ID lookup plus a Phase 2 mutation, which makes two round-trips. Clarify to "as few as possible (typically one; node-ID lookups add a preflight query)".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

aryeko · 2026-02-22T08:48:50Z

@coderabbitai review

coderabbitai · 2026-02-22T08:48:59Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Around line 38-61: The intro overstates guarantees: replace phrasing like
"must succeed together" and "atomically" for the ghx chain command with clear
language that it batches steps into fewer GraphQL round-trips but does not
provide transactional rollback; explicitly state that the returned { status,
results[], meta } can be "success", "partial" or "failed" and that callers must
check status and handle partial results (results[i] containing { task, ok, data
| error }). Update mentions of "atomically" and "must succeed together" to
clarify "batched into a single request but not all-or-nothing."

---

Duplicate comments:
In `@packages/core/skills/using-ghx/SKILL.md`:
- Line 38: This review is a duplicate noting that the "one round-trip" wording
already accounts for resolution-backed preflight queries; no change to the
SKILL.md text is required—mark the duplicate review comment resolved/removed and
ensure the existing sentence mentioning "Use `ghx chain`" (and the example
`pr.reviews.submit`) remains as-is.

packages/core/skills/using-ghx/SKILL.md

"Must succeed together" and "atomically" imply ACID rollback that ghx chain doesn't provide; a partial result is possible. Replace with accurate batching language and add an explicit note that steps are not transactional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
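Since steps are not transactional, a caller has to inspect the result instead of assuming all-or-nothing. The sketch below models the `{ status, results[], meta }` shape described in the review comment. The status values and the per-step `{ task, ok, data | error }` fields come from that comment; the exact TypeScript types, the `failedTasks` helper, and the sample error text are assumptions.

```typescript
// Per-step outcome: either data or an error, keyed by `ok`.
type StepResult =
  | { task: string; ok: true; data: unknown }
  | { task: string; ok: false; error: unknown };

type ChainResult = {
  status: "success" | "partial" | "failed";
  results: StepResult[];
  meta: Record<string, unknown>; // shape of meta is not specified in this PR
};

// Returns the tasks that did not apply, so the caller can retry or report them.
function failedTasks(result: ChainResult): string[] {
  if (result.status === "success") return [];
  return result.results.filter((r) => !r.ok).map((r) => r.task);
}

// Illustrative partial result: the close applied, the comment did not.
const partial: ChainResult = {
  status: "partial",
  results: [
    { task: "issue.close", ok: true, data: {} },
    { task: "issue.comments.create", ok: false, error: "illustrative error" },
  ],
  meta: {},
};

console.log(failedTasks(partial)); // prints: [ 'issue.comments.create' ]
```

The point of the design is that "partial" is a normal outcome to branch on, not an exception: earlier steps may already have taken effect when a later one fails.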

aryeko and others added 2 commits February 21, 2026 20:26

aryeko marked this pull request as ready for review February 21, 2026 18:31

coderabbitai bot requested changes Feb 21, 2026

packages/core/skills/using-ghx/SKILL.md (outdated)

coderabbitai bot requested changes Feb 22, 2026

packages/core/skills/using-ghx/SKILL.md (outdated)

aryeko merged commit cfb8904 into main Feb 22, 2026
7 checks passed

aryeko deleted the claude/plan-atomic-chaining-SG0k7 branch February 22, 2026 09:04
