Commit c1e2960

feat: Add CorrectionOps pattern (#29058)
1 parent 056900f commit c1e2960

3 files changed

Lines changed: 244 additions & 0 deletions

docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions
@@ -284,6 +284,7 @@ export default defineConfig({
   items: [
     { label: 'BatchOps', link: '/patterns/batch-ops/' },
     { label: 'CentralRepoOps', link: '/patterns/central-repo-ops/' },
+    { label: 'CorrectionOps', link: '/patterns/correction-ops/' },
     { label: 'ChatOps', link: '/patterns/chat-ops/' },
     { label: 'DailyOps', link: '/patterns/daily-ops/' },
     { label: 'DataOps', link: '/patterns/data-ops/' },
Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
---
title: CorrectionOps
description: Improve agentic workflows from trusted human corrections without retraining the underlying model
---

:::caution[Experimental]
CorrectionOps is an experimental pattern. The guidance and workflow shape on this page may change as the pattern is tested in more real-world workflows.
:::

CorrectionOps is a workflow pattern that compares predictions with later human corrections.

Instead of retraining the model, CorrectionOps improves the workflow around the model. It stores predictions at decision time, compares them with later trusted human truth, and uses that evidence to update instructions, routing, thresholds, and rollout decisions.

The basic loop is simple:

1. Save what the workflow predicted
2. Collect what humans later decided
3. Use the difference to improve the workflow

## When to Use CorrectionOps

Use CorrectionOps when you want to turn a human decision process into an agentic workflow iteratively rather than all at once.

It is a good fit when humans still make or correct the real decision, but you want the workflow to improve over time by updating instructions, routing, thresholds, or rollout state.

Typical fits include labeling and classification, routing and prioritization, moderation and approvals, and summaries or recommendations that humans later correct.

It is especially useful when the rollout path is gradual:

- Start with `staged: true`
- Keep evaluation and reporting in Ops
- Use later corrections to improve the workflow
- Promote to direct writes only when the evidence is strong enough
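
One way to make "strong enough" concrete is a small deterministic gate that runs before any promotion. The sketch below is illustrative only and is not part of gh-aw: the function name `nextRolloutStage`, the stage names, and the default thresholds (50 compared items, 95% agreement) are assumptions that would need tuning for a real workflow.

```javascript
// Hypothetical rollout gate. Stage names and thresholds are illustrative assumptions.
const STAGES = ['staged', 'gated-review', 'direct-writes'];

function nextRolloutStage(current, stats, opts = {}) {
  const { minSamples = 50, minAgreement = 0.95 } = opts;
  const index = STAGES.indexOf(current);
  if (index === -1) throw new Error(`unknown stage: ${current}`);

  // Too few compared predictions: hold the current stage.
  if (stats.compared < minSamples) return current;

  const agreement = stats.agreed / stats.compared;
  // Promote one step at a time; this sketch never skips a stage and never demotes.
  if (agreement >= minAgreement && index < STAGES.length - 1) {
    return STAGES[index + 1];
  }
  return current;
}
```

Keeping the gate as a pure function of counted evidence means the promotion decision can be reviewed and replayed without involving the agent.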

## How It Works

A clean CorrectionOps setup has two long-lived surfaces: production and Ops. Production stays authoritative. Ops is the home for prediction, correction intake, reporting, instruction updates, and rollout control.

That means the workflows usually stay in Ops. Early on they report, compare, and adapt from Ops without writing back to production. After promotion they can write directly to production.

Most implementations reduce to three workflow classes: a thin relay that forwards stable facts into Ops, a prediction workflow that persists snapshots and writes safely, and a compare/report/decide workflow that checks later human truth and updates the system when the evidence is strong enough.

The important rule is to keep relays, snapshot resolution, diffing, and grouping deterministic. Use the agent for semantic judgment, not for reconstructing event history or inferring provenance after the fact.
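
As a rough illustration of what deterministic diffing can mean in practice, the sketch below compares a predicted label set with the later human label set using plain set operations and sorted output, so the same inputs always serialize to the same result. The function name `diffLabels` and the result shape are assumptions for illustration, not a gh-aw API.

```javascript
// Hypothetical deterministic diff between predicted and human-confirmed labels.
function diffLabels(predicted, truth) {
  const predictedSet = new Set(predicted);
  const truthSet = new Set(truth);

  // Sort every list so the result does not depend on input order.
  const agreed = [...predictedSet].filter((l) => truthSet.has(l)).sort();
  const missed = [...truthSet].filter((l) => !predictedSet.has(l)).sort();
  const spurious = [...predictedSet].filter((l) => !truthSet.has(l)).sort();

  return { agreed, missed, spurious, exact: missed.length === 0 && spurious.length === 0 };
}

console.log(JSON.stringify(diffLabels(['bug', 'ui'], ['ui', 'regression'])));
// prints {"agreed":["ui"],"missed":["regression"],"spurious":["bug"],"exact":false}
```

Only after a diff like this exists would the agent be asked to interpret why `bug` was predicted where humans chose `regression`.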

## Example: Issue Labeling

```mermaid
flowchart TB
  subgraph ProductionRepo[Production Repo]
    A[Issue or item in production]
    D[Later human correction in production]
    B[Thin relay]
  end

  subgraph OpsRepo[Ops Repo]
    C[Store prediction snapshot]
    E[Collect correction evidence]
    F[Build deterministic diff]
    G[Publish report or open instruction PR]
    H[Make rollout decision]
  end

  A -->|item-created event| B
  B --> C
  D -->|truth-feedback event| E
  C --> F
  E --> F
  F --> G
  G --> H
  H -.->|improves next run| A
```

In this shape, production stays authoritative. Ops records the original prediction, collects later human corrections, builds the diff, and decides whether the workflow should stay staged, update its instructions, or graduate to direct writes.

```aw wrap
---
on:
  schedule: daily
  workflow_dispatch:
  repository_dispatch:
    types: [truth-feedback]
permissions:
  contents: read
  issues: read
safe-outputs:
  create-issue:
  create-pull-request:
---

# CorrectionOps Worker

Read persisted predictions and later trusted truth, compare them deterministically, then either publish a health report or open a draft PR updating instructions.
```

CorrectionOps solves a different problem than model training. Reinforcement Learning from Human Feedback (RLHF) updates model weights from human feedback. CorrectionOps updates the workflow system *around* the model. In practice that usually means changing instruction files, routing rules, deterministic checks, thresholds, or rollout decisions rather than trying to retrain the engine.

In a healthy CorrectionOps loop, production truth stays authoritative, predictions are saved explicitly, corrections include provenance, and diffs are built deterministically before the agent is asked to reason about them.

CorrectionOps does not require a separate evaluation repository. The normal progression is to start with `staged: true`, then use Ops-managed adaptation and gated review, then enable direct production writes once the evidence is strong enough.

### Full Workflow Pieces

If you want the explicit workflow split, the same example usually breaks into four pieces, the last of which is optional.

#### 1. Relay In The Source Repo

The relay only forwards stable facts and provenance into Ops. It should not compute diffs, infer human intent, or decide whether the workflow was correct.

```yaml title="prod-repo/.github/workflows/relay-correction-signals.yml"
name: Relay Correction Signals

on:
  issues:
    types: [opened, labeled, unlabeled]

jobs:
  relay:
    runs-on: ubuntu-latest
    steps:
      - name: Forward stable facts to ops
        uses: actions/github-script@v8
        with:
          # Token must be allowed to create repository_dispatch events in the Ops repo
          github-token: ${{ secrets.OPS_DISPATCH_TOKEN }}
          script: |
            await github.rest.repos.createDispatchEvent({
              owner: 'org',
              repo: 'ops-repo',
              // New issues feed prediction; label changes are forwarded as later truth
              event_type: context.payload.action === 'opened' ? 'item-created' : 'truth-feedback',
              client_payload: {
                data: {
                  source_repository: `${context.repo.owner}/${context.repo.repo}`,
                  source_type: 'issue',
                  item_number: context.payload.issue.number,
                  item_title: context.payload.issue.title,
                  item_url: context.payload.issue.html_url,
                  event_type: context.payload.action,
                  label: context.payload.label?.name || null,
                  actor: context.actor,
                  // Heuristic: GitHub App bot logins end in '[bot]'
                  actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human',
                  // Wall-clock time of the relay run, not a timestamp taken from the webhook
                  occurred_at: new Date().toISOString(),
                },
              },
            });
```
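
On the receiving side, it can help to check the relayed `data` object before storing anything derived from it. The following is a minimal sketch of such a check, assuming the field names used in the relay above; `validateRelayPayload` and its error strings are hypothetical and not part of gh-aw.

```javascript
// Hypothetical Ops-side check for the relay's `client_payload.data` object.
const REQUIRED_FIELDS = [
  'source_repository', 'source_type', 'item_number',
  'event_type', 'actor', 'actor_type', 'occurred_at',
];

const present = (v) => v !== undefined && v !== null && v !== '';

function validateRelayPayload(data) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (!present(data[field])) errors.push(`missing ${field}`);
  }
  // Type checks only run for fields that are present, to avoid duplicate errors.
  if (present(data.item_number) && !Number.isInteger(data.item_number)) {
    errors.push('item_number must be an integer');
  }
  if (present(data.actor_type) && !['human', 'bot'].includes(data.actor_type)) {
    errors.push('actor_type must be "human" or "bot"');
  }
  if (present(data.occurred_at) && Number.isNaN(Date.parse(data.occurred_at))) {
    errors.push('occurred_at must be a parseable timestamp');
  }
  return errors; // an empty array means the payload passed every check
}
```

Rejecting malformed payloads at this boundary keeps later diffing from having to guess at missing provenance.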

#### 2. Prediction Workflow In Ops

The prediction workflow consumes normalized inputs, applies the current instructions, and persists a durable snapshot that can be compared later.

```aw wrap title="ops-repo/.github/workflows/predict-items.md"
---
name: Predict Items

on:
  schedule: daily
  workflow_dispatch:
  repository_dispatch:
    types: [item-created]

tools:
  github:
    toolsets: [issues, repos]

safe-outputs:
  create-issue:
  update-issue:
---

# Predict Items

Read prepared items from `/tmp/gh-aw/agent/item-scan`, apply the current instructions, write review artifacts through safe outputs in Ops, and append a prediction snapshot containing the source identifier, predicted action, instruction version, and timestamp.
```
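
The snapshot described in the prompt above could be stored as one JSON object per line. The sketch below shows one possible record shape, assuming the four fields the prompt names; the function names, the snake_case keys, and the example values (`org/prod-repo#42`, `add-label:bug`, `v3`) are hypothetical.

```javascript
// Hypothetical snapshot record. Field names follow the prose above but are assumptions.
function makeSnapshot({ sourceId, predictedAction, instructionVersion, predictedAt }) {
  const fields = { sourceId, predictedAction, instructionVersion, predictedAt };
  for (const [name, value] of Object.entries(fields)) {
    if (!value) throw new Error(`snapshot is missing ${name}`);
  }
  // A fixed key order keeps the serialized line stable across runs.
  return {
    source_id: sourceId,
    predicted_action: predictedAction,
    instruction_version: instructionVersion,
    predicted_at: predictedAt,
  };
}

// One JSON object per line (JSONL), so the store is append-only and easy to diff.
function toJsonLine(snapshot) {
  return JSON.stringify(snapshot) + '\n';
}

const line = toJsonLine(makeSnapshot({
  sourceId: 'org/prod-repo#42',
  predictedAction: 'add-label:bug',
  instructionVersion: 'v3',
  predictedAt: '2025-01-01T00:00:00Z',
}));
```

Recording the instruction version with every prediction is what later lets a correction be attributed to a specific revision of the instructions.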

#### 3. Compare, Report, And Decide In Ops

The review workflow reads persisted predictions and later human truth, builds deterministic diffs first, and only then asks the agent to summarize patterns or propose instruction updates.

```aw wrap title="ops-repo/.github/workflows/review-corrections.md"
---
name: Review Corrections

on:
  schedule: weekly
  workflow_dispatch:
    inputs:
      mode:
        description: report or adaptation
        required: false
        default: report
        type: choice
        options: [report, adaptation]

safe-outputs:
  create-issue:
  create-pull-request:
---

# Review Corrections

Read `correction-diffs.json` from `/tmp/gh-aw/agent/correction-review`. In `report` mode, publish a health summary. In `adaptation` mode, open a draft PR updating the instruction file only when the grouped evidence is strong enough.
```
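
"Grouped evidence" can be computed before the agent sees anything. The sketch below groups corrections by the pair of predicted and corrected values and keeps only groups that recur often enough. The input shape (`item`, `predicted`, `corrected`), the function names, and the default threshold of three occurrences are assumptions; the source does not specify the format of `correction-diffs.json`.

```javascript
// Hypothetical grouping of correction diffs. Input shape and threshold are assumptions.
function groupCorrections(diffs) {
  const groups = new Map();
  for (const d of diffs) {
    if (d.predicted === d.corrected) continue; // agreement: no correction to group
    const key = `${d.predicted} -> ${d.corrected}`;
    const group = groups.get(key) ?? { key, count: 0, items: [] };
    group.count += 1;
    group.items.push(d.item);
    groups.set(key, group);
  }
  // Order by count (descending), then by key, so report order is reproducible.
  return [...groups.values()].sort(
    (a, b) => b.count - a.count || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0),
  );
}

// Keep only patterns that recur often enough to justify an instruction change.
function strongEnough(groups, minCount = 3) {
  return groups.filter((g) => g.count >= minCount);
}
```

In `adaptation` mode, a draft PR would then be proposed only for the groups that `strongEnough` returns, while `report` mode could list every group.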

#### 4. Optional Deterministic Collector

Add a separate collector only when the later-truth boundary deserves its own trigger, permissions, or serialized write path.

```yaml title="ops-repo/.github/workflows/collect-corrections.yml"
name: Collect Corrections

on:
  repository_dispatch:
    types: [truth-feedback]

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      # The script lives in the Ops repo, so check the repo out before running it
      - uses: actions/checkout@v4
      - name: Resolve authoritative truth and store correction evidence
        run: ./scripts/store-correction-evidence.sh
```
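
The source does not show what `store-correction-evidence.sh` does. As one possible reading of "resolve authoritative truth", the sketch below replays relayed label events in time order and counts only explicit human changes. The function name `resolveTruth` and the rule that bot events are ignored are assumptions, and the event fields reuse the relay payload names.

```javascript
// Hypothetical truth resolution: replay human label events in time order.
function resolveTruth(events) {
  const labels = new Set();
  const humanEvents = events
    .filter((e) => e.actor_type === 'human' && e.label)
    // Array.prototype.sort is stable, so events with equal timestamps keep input order.
    .sort((a, b) => Date.parse(a.occurred_at) - Date.parse(b.occurred_at));

  for (const e of humanEvents) {
    if (e.event_type === 'labeled') labels.add(e.label);
    if (e.event_type === 'unlabeled') labels.delete(e.label);
  }
  return [...labels].sort(); // sorted so the stored evidence is reproducible
}
```

Under this rule a label that a bot applied and no human touched is not treated as human truth, which may or may not match the policy a given workflow needs.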

### Stable Contracts To Define First

Before adding rollout logic or adaptation prompts, define four small deterministic contracts:

1. relay payload: the minimal source identity, object identity, event type, actor facts, and timestamps forwarded into Ops
2. prediction snapshot: the durable record of what the workflow predicted and under which instruction version
3. correction review input: the deterministic diff artifact used by reporting and adaptation
4. rollout gate contract: what evidence or approvals are required before direct production writes are enabled

Discussion labeling, routing, moderation, prioritization, approvals, and summaries can all reuse this shape. The production object changes, but the CorrectionOps setup does not.

## Related Documentation

- [Staged Mode](/gh-aw/reference/staged-mode/) for the optional safe-write rollout guidance inside CorrectionOps
- [SideRepoOps](/gh-aw/patterns/side-repo-ops/) for separating workflow infrastructure from the production repository
- [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) for coordinating workflows across repository boundaries
- [Safe Outputs Reference](/gh-aw/reference/safe-outputs/) for controlling write targets and protections
- [GitHub Tools](/gh-aw/reference/github-tools/) for cross-repository reads and operations

docs/src/content/docs/reference/glossary.md

Lines changed: 4 additions & 0 deletions
@@ -626,6 +626,10 @@ Pattern for processing large volumes of work items efficiently using chunked pag

 A [MultiRepoOps](#multirepoops) deployment variant where a single private repository acts as a control plane for coordinating large-scale operations across many repositories. Enables consistent rollouts, policy updates, and centralized tracking using cross-repository safe outputs and secure authentication. See [CentralRepoOps](/gh-aw/patterns/central-repo-ops/).

+### CorrectionOps
+
+Pattern for improving workflows from trusted human corrections without retraining the underlying model. CorrectionOps stores predictions, compares them with later authoritative human decisions, and uses grouped diffs to update instructions, routing, thresholds, or rollout policy. See [CorrectionOps](/gh-aw/patterns/correction-ops/).
+
 ### ChatOps

 Interactive automation triggered by slash commands (`/review`, `/deploy`) in issues and pull requests, enabling human-in-the-loop automation where developers invoke AI assistance on demand. See [ChatOps](/gh-aw/patterns/chat-ops/).
