Fix runtime todo rendering and artifact echoes #4018

Merged
Siri-Ray merged 10 commits into main from codex/runtime-todo-artifact-ui
Jun 10, 2026

Conversation

@Siri-Ray

@Siri-Ray Siri-Ray commented Jun 9, 2026

Contributor

Why

This fixes regressions found while validating TodoWrite behavior across agent runtimes. The user-facing pain was twofold: Gemini / Claude-style runs could dump a full code artifact into chat after the file was already written, and the TodoWrite UI had drifted from its original in-message position into a composer-pinned card.

The runtime conclusion is that Gemini 3 / preview does not reliably expose native write_todos, so Open Design should not synthesize TodoWrite state from markdown, TODO files, temporary JSON files, or shell heredocs. We only map standard structured runtime events.

What users will see

  • Todo lists render once at the first TodoWrite position in the assistant message stream.
  • Later TodoWrite updates refresh that original card instead of pinning a global card above the composer or creating duplicate cards.
  • Gemini and Claude Code runs no longer show a final duplicated <artifact> / full-code block after the actual file-write tool has already produced the file.
  • Gemini only shows TodoWrite when the CLI emits native write_todos; it no longer simulates todos from markdown/files/shell output.
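
The first two bullets can be sketched as a small reduction over the message stream. This is only an illustration under a simplified event model: `StreamEvent`, `RenderItem`, and `placeTodoCard` are hypothetical names, not types or functions from the Open Design codebase.

```typescript
// Hypothetical, simplified event model (not the Open Design types).
type Todo = { content: string; status: "pending" | "in_progress" | "completed" };

type StreamEvent =
  | { kind: "text"; text: string }
  | { kind: "todo_write"; todos: Todo[] };

type RenderItem =
  | { kind: "text"; text: string }
  | { kind: "todo_card"; todos: Todo[] };

// Render one card at the first TodoWrite position. Later TodoWrite events
// replace that card's snapshot instead of adding another card.
function placeTodoCard(events: StreamEvent[]): RenderItem[] {
  const items: RenderItem[] = [];
  let card: { kind: "todo_card"; todos: Todo[] } | undefined;
  for (const event of events) {
    if (event.kind === "text") {
      items.push({ kind: "text", text: event.text });
    } else if (card === undefined) {
      card = { kind: "todo_card", todos: event.todos };
      items.push(card);
    } else {
      card.todos = event.todos; // refresh the original card in place
    }
  }
  return items;
}
```

With text, TodoWrite, text, TodoWrite as input, the result has three items: the card sits second and carries the todos from the last TodoWrite.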

Surface area

  • UI — new page / dialog / panel / menu item / setting / empty state in apps/web or apps/desktop (including Electron menu bar)
  • Keyboard shortcut — new or changed
  • CLI / env var — new od subcommand or flag, new tools-dev / tools-pack / tools-pr flag, or new OD_* env var
  • API / contract — new /api/* endpoint, new SSE event, or changed shape in packages/contracts
  • Extension point — new entry under skills/, design-systems/, design-templates/, or craft/, or change to the skills protocol
  • i18n keys — added new translation keys (see TRANSLATIONS.md for the locale workflow)
  • New top-level dependency — adding any new entry to the root package.json (dependencies or devDependencies); workspace-package package.json files are out of scope. Include a paragraph on what we get vs. what bytes we ship (see CONTRIBUTING.md → Code style)
  • Default behavior change — changes what existing users experience without opting in (default model, default setting, file/SQLite schema, auto-network on startup, auto-install)
  • None — internal refactor, docs, tests, or translation update only

Screenshots

Not attached from this environment. The changed UI behavior is covered by apps/web/tests/components/chat-todo-autoscroll.test.tsx: no .chat-pinned-todo, one inline Todo card, latest snapshot updates the original card.

Bug fix verification

  • Test path that reproduces the bug:
    • apps/daemon/tests/json-event-stream.test.ts
    • apps/daemon/tests/structured-streams.test.ts
    • apps/web/tests/components/chat-todo-autoscroll.test.tsx
  • Did the test go red on main and green on this branch? No. The local repro was interactive/manual; this PR adds focused regression coverage for the corrected behavior.

Validation

  • corepack pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/json-event-stream.test.ts -t "gemini stream|codex json stream emits TodoWrite"
  • corepack pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/structured-streams.test.ts -t "Claude"
  • corepack pnpm --filter @open-design/daemon typecheck
  • corepack pnpm --filter @open-design/web exec vitest run -c vitest.config.ts tests/components/chat-todo-autoscroll.test.tsx tests/components/assistant-message-unfinished-todos.test.tsx
  • corepack pnpm --filter @open-design/web typecheck

@lefarcen lefarcen added the size/XL (PR changes 700-1500 lines) label Jun 9, 2026
@lefarcen lefarcen requested a review from nettee June 9, 2026 16:50
@lefarcen lefarcen added the risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps) and type/bugfix (Bug fix) labels Jun 9, 2026
@lefarcen

lefarcen commented Jun 9, 2026

Contributor

Hey @Siri-Ray! 👋 Triage complete — labeled size/XL / risk/high / type/bugfix, and @nettee is queued for pool review.

The regression scope is clear from the body: inline TodoWrite position restoration + artifact-echo suppression + Gemini native-only policy. The test coverage across both daemon and web is exactly the right shape for a fix like this.

CI is still pending — pool review will kick off once it's green.

💡 This one touches daemon event streaming and web rendering together, so there may be a couple of review rounds. If you'd like to drive it to merge hands-free, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/open-design#4018 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

@nettee nettee left a comment

Contributor

I found a few blocking regressions in the new todo/artifact handling. The biggest ones are that the inline todo card now depends on the original message row staying mounted, and the stream parsers still leak provider-native todo/task tools alongside the canonical TodoWrite events.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/web/src/components/ChatPane.tsx
Comment thread apps/daemon/src/json-event-stream.ts
Comment thread apps/daemon/src/json-event-stream.ts Outdated
Siri-Ray added 2 commits June 10, 2026 00:57
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

@nettee nettee left a comment

Contributor

Three blocking regressions are still present on the current head: the canonical todo card can disappear in long conversations, the daemon still emits duplicate provider-native todo/task tool events alongside the normalized TodoWrite event, and artifact suppression now stays enabled for the rest of the run after the first file write. The inline comments below call out the concrete failure modes and the specific test coverage gaps to close.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/web/src/components/ChatPane.tsx
Comment thread apps/daemon/src/json-event-stream.ts
Comment thread apps/daemon/src/json-event-stream.ts Outdated
@lefarcen

lefarcen commented Jun 9, 2026

Contributor

@Siri-Ray, @nettee reviewed and flagged three blocking issues:

  1. fileWriteSeen suppression window too broad — the flag is never cleared, so any later legitimate artifact in the same stream also gets stripped (same issue in claude-stream.ts).
  2. Todo card tied to first message row — virtualization kicks in after 80 messages and initially renders only the last 16, so an early TodoWrite anchor can fall outside the window and make the card disappear entirely.
  3. Provider-native tool passthrough — after mapping Gemini's write_todos → TodoWrite, the raw write_todos is still forwarded as a second event; the regression test currently asserts the duplicate behavior.
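
Item 2 can be pictured with a small windowing sketch. The thresholds (80 messages, last 16 rendered) come from the review summary above. The `mountedRows` function and its `pinned` parameter are hypothetical and only show why an early anchor row needs to be pinned; they are not the ChatPane implementation.

```typescript
// Thresholds as reported in the review; the function itself is illustrative.
const VIRTUALIZE_AFTER = 80;
const INITIAL_TAIL = 16;

// Returns the indices of message rows to mount. Above the threshold only the
// tail is mounted, plus an optional pinned row (for example the row that
// anchors the todo card) so it cannot fall outside the window.
function mountedRows(total: number, pinned?: number): number[] {
  const all = Array.from({ length: total }, (_, i) => i);
  if (total <= VIRTUALIZE_AFTER) return all;
  const start = total - INITIAL_TAIL;
  const tail = all.slice(start);
  if (pinned === undefined || pinned < 0 || pinned >= start) return tail;
  return [pinned, ...tail];
}
```

With 100 messages and no pin, row 3 is not mounted, which is the disappearing-card failure. Passing `pinned = 3` keeps that row mounted alongside the tail.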

Looks like there's a new commit up (2d4e0f2) — nettee's review was against the prior head (36b73c5), so it'll need a re-review pass once CI finishes on the new head. Daemon workspace tests and browser tests are still running.

@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: cca2dba

  • ✅ Review comment on apps/web/src/components/ChatPane.tsx (@nettee) — thread
    • Moved canonical TodoWrite rendering into a standalone ChatPane render item appended to the chat log and added a virtualized long-conversation regression in chat-todo-autoscroll.test.tsx.
  • ✅ Review comment on apps/daemon/src/json-event-stream.ts:268 (@nettee) — thread
    • Suppressed native Gemini write_todos and Claude TaskCreate/TaskUpdate passthrough after canonical TodoWrite mapping, with tests updated in json-event-stream.test.ts and structured-streams.test.ts.
  • ✅ Review comment on apps/daemon/src/json-event-stream.ts (@nettee) — thread
    • Scoped artifact echo suppression to the immediate post-write artifact window in json-event-stream.ts and claude-stream.ts, with regressions proving later final artifacts survive.
  • Review comment on apps/web/src/components/ChatPane.tsx (@nettee) — thread
    • Agent did not provide a decision for this thread
  • Review comment on apps/daemon/src/json-event-stream.ts:268 (@nettee) — thread
    • Agent did not provide a decision for this thread
  • Review comment on apps/daemon/src/json-event-stream.ts (@nettee) — thread
    • Agent did not provide a decision for this thread
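
The passthrough suppression in the second resolved item can be sketched as below. The event shape and the `{ todos: [...] }` payload are assumptions made for illustration, and `normalizeToolEvent` is a hypothetical name; the actual daemon types are not shown in this thread.

```typescript
// Hypothetical event shape; the real daemon types are not shown in this thread.
type ToolEvent = { type: "tool_use"; name: string; input: unknown };

// Assumed payload shape for a native write_todos call: { todos: [...] }.
function isTodoPayload(input: unknown): input is { todos: unknown[] } {
  return (
    typeof input === "object" &&
    input !== null &&
    Array.isArray((input as { todos?: unknown }).todos)
  );
}

// Map a provider-native todo tool to a single canonical TodoWrite event and
// do not forward the native event as well. Other tools pass through unchanged.
function normalizeToolEvent(event: ToolEvent): ToolEvent[] {
  if (event.name === "write_todos" && isTodoPayload(event.input)) {
    return [{ type: "tool_use", name: "TodoWrite", input: { todos: event.input.todos } }];
  }
  return [event];
}
```

Returning an array keeps the "one event in, one event out" rule visible: a native todo call yields only the canonical event, never the canonical event plus the raw one.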

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

I found one remaining blocking regression in the artifact-echo suppression path. The inline comment below covers the concrete failure mode and the regression coverage that still needs to be added.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/json-event-stream.ts Outdated
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

Visual regression review

Head: 8fe1415 · Base: b1c60c7

6 changed · 29 unchanged · 0 missing baseline · 0 failed

Changed cases

  • visual-avatar-menu
  • visual-new-project-modal
  • visual-settings-byok
  • visual-settings-byok-openai
  • visual-settings-local-cli
  • visual-workspace-staged-contexts

Unchanged cases

  • visual-avatar-local-agent-list
  • visual-design-system-detail
  • visual-design-systems
  • visual-home
  • visual-home-catalog
  • visual-home-context-picker
  • visual-home-plugin-filter
  • visual-home-plugin-use-staged
  • visual-home-plugin-use-with-query
  • visual-home-staged-attachment
  • visual-integrations
  • visual-integrations-mcp
  • visual-integrations-use-everywhere
  • visual-onboarding-runtime
  • visual-plugin-details
  • visual-plugin-share-menu
  • visual-plugins
  • visual-project-avatar-model-dropdown
  • visual-project-workspace
  • visual-projects

Visual diff is advisory only and does not block merging.

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: 3e39e0f

  • ✅ Review comment on apps/web/src/components/ChatPane.tsx (@nettee) — thread
    • Confirmed ChatPane now renders the canonical TodoWrite card as a standalone chat render item with virtualization coverage in chat-todo-autoscroll.test.tsx. Updated ChatPane.streaming.test.tsx to assert the standalone card instead of row props.
  • ✅ Review comment on apps/daemon/src/json-event-stream.ts:278 (@nettee) — thread
    • Confirmed Gemini write_todos and Claude TaskCreate/TaskUpdate stop after emitting the canonical TodoWrite event in json-event-stream.ts and claude-stream.ts. The parser suites expect only canonical todo events.
  • ✅ Review comment on apps/daemon/src/json-event-stream.ts (@nettee) — thread
    • Confirmed artifact echo suppression is one-shot in json-event-stream.ts and claude-stream.ts, so later legitimate artifacts render after the immediate echo is consumed. Existing parser regressions cover later final artifacts.
  • ✅ Review comment on apps/daemon/src/json-event-stream.ts (@nettee) — thread
    • Changed json-event-stream.ts and claude-stream.ts to keep a buffered suppression token across plain post-write chunks until the artifact open tag is consumed. Added Gemini and Claude regressions for prose first, then <artifact.

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

Two blocking regressions remain on the current head: the standalone TodoWrite row is appended at the end of the transcript instead of the first TodoWrite position, and the artifact-echo filter now withholds ordinary post-write prose until flush/end when no <artifact> follows.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/web/src/components/ChatPane.tsx Outdated
Comment thread apps/daemon/src/claude-stream.ts
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: e05fda2

  • ✅ Review comment on apps/web/src/components/ChatPane.tsx:2428 (@nettee) — thread
    • Changed ChatPane render item construction to insert the canonical conversation-todo row after the first assistant TodoWrite message, with a virtualizer include key so the standalone row remains mounted. Added chat-todo-autoscroll coverage for ordering.
  • ✅ Review comment on apps/daemon/src/claude-stream.ts:228 (@nettee) — thread
    • Changed claude-stream.ts and json-event-stream.ts to emit plain post-write text immediately while buffering only trailing partial artifact-open candidates. Added Claude and Gemini parser coverage for file write followed by non-artifact prose.

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

One blocking artifact-suppression regression is still present on the current head. The inline comment below covers the remaining sequence where ordinary prose leaves the parser armed and a later real artifact is hidden.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/json-event-stream.ts
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: f230de3

  • ✅ Review comment on apps/daemon/src/json-event-stream.ts:436 (@nettee) — thread
    • Cleared suppressNextArtifactText in json-event-stream.ts and claude-stream.ts when ordinary prose is emitted with no partial artifact candidate. Updated the Gemini and Claude parser regressions to verify a later real artifact survives after that prose.

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

One blocking regression remains in the TodoWrite placement path: the canonical card is inserted only after the containing assistant message, so it still drifts from the original in-message TodoWrite position whenever that message has trailing prose or later tool output.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/web/src/components/ChatPane.tsx Outdated
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: 984ef8e

  • ✅ Review comment on apps/web/src/components/ChatPane.tsx (@nettee) — thread
    • Restored ChatPane to pass conversationTodoInput/showConversationTodoCard into the first TodoWrite assistant row, while always mounting that row during virtualization. Added chat-todo-autoscroll coverage for prose before and after the first TodoWrite event.

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

I found one blocking correctness issue in the new Claude task-to-TodoWrite normalization path. The inline comment below covers the collision and the regression coverage that still needs to be added.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/claude-stream.ts Outdated
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

@nettee nettee left a comment

Contributor

I found one remaining blocking correctness issue in the Claude task-to-TodoWrite normalization path. The inline comment below covers the concrete collision and the regression coverage that still needs to be added.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/claude-stream.ts Outdated
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: 546e863

  • ✅ Review comment on apps/daemon/src/claude-stream.ts (@nettee) — thread
    • Added Claude runtime task ID allocation helpers in apps/daemon/src/claude-stream.ts to advance past explicit numeric IDs and skip occupied generated IDs. Added a regression in apps/daemon/tests/structured-streams.test.ts for explicit taskId '1' followed by an implicit TaskCreate.
  • Review comment on apps/daemon/src/claude-stream.ts (@nettee) — thread
    • Agent did not provide a decision for this thread
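
The ID collision fixed in the first item can be sketched with a small allocator. `TaskIdAllocator`, `claim`, and `allocate` are hypothetical names chosen for this sketch; the helpers added to claude-stream.ts are not shown in this thread and may be structured differently.

```typescript
// Hypothetical allocator; names are illustrative, not from claude-stream.ts.
class TaskIdAllocator {
  private used = new Set<string>();
  private next = 1;

  // Record an ID supplied by the runtime. Numeric IDs move the counter forward
  // so a later generated ID cannot collide with them.
  claim(explicitId: string): string {
    this.used.add(explicitId);
    if (/^\d+$/.test(explicitId)) {
      this.next = Math.max(this.next, Number(explicitId) + 1);
    }
    return explicitId;
  }

  // Generate an ID for a TaskCreate that carries none, skipping occupied ones.
  allocate(): string {
    while (this.used.has(String(this.next))) this.next++;
    const id = String(this.next++);
    this.used.add(id);
    return id;
  }
}
```

In the regression scenario described above, claiming explicit ID '1' and then allocating for an implicit TaskCreate yields '2' rather than a second '1'.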

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

I found one remaining blocking regression in the artifact-echo suppression path. The inline comment below covers the concrete truncation case and the regression coverage that still needs to be added.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

Comment thread apps/daemon/src/claude-stream.ts
Generated-By: looper 0.9.7 (runner=fixer, agent=codex)
@Siri-Ray

Siri-Ray commented Jun 9, 2026

Contributor Author

Looper fixer round complete: 8fe1415

  • ✅ Review comment on apps/daemon/src/claude-stream.ts:244 (@nettee) — thread
    • Updated apps/daemon/src/claude-stream.ts and apps/daemon/src/json-event-stream.ts so flush emits buffered artifactOpenCandidate text. Added regressions in the daemon parser tests for Write/write_file followed by terminal 'Done <art'.
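
Taken together, the review rounds describe an echo filter with four behaviors: it drops only the artifact that immediately follows a file write, it emits ordinary prose right away, it withholds only a trailing partial `<artifact` prefix, and it releases that withheld text on flush. The sketch below reconstructs that behavior from the thread. `ArtifactEchoFilter` and its members are hypothetical names, and the real parsers in json-event-stream.ts and claude-stream.ts may differ in detail.

```typescript
const OPEN = "<artifact";
const CLOSE = "</artifact>";

// Length of the longest suffix of `text` that is a proper prefix of OPEN,
// for example 4 for "Done <art". Returns 0 when there is none.
function partialOpenLength(text: string): number {
  const max = Math.min(text.length, OPEN.length - 1);
  for (let n = max; n > 0; n--) {
    if (OPEN.startsWith(text.slice(text.length - n))) return n;
  }
  return 0;
}

class ArtifactEchoFilter {
  private armed = false; // set by a file write, consumed at most once
  private inEcho = false; // currently dropping an echoed artifact body
  private carry = ""; // withheld text: partial open tag or possibly split close tag

  noteFileWrite(): void {
    this.armed = true;
  }

  // Returns the text that should be shown for this chunk.
  push(chunk: string): string {
    const text = this.carry + chunk;
    this.carry = "";
    if (text === "") return "";
    if (this.inEcho) {
      const end = text.indexOf(CLOSE);
      if (end === -1) {
        this.carry = text.slice(-(CLOSE.length - 1)); // close tag may be split
        return "";
      }
      this.inEcho = false;
      return text.slice(end + CLOSE.length);
    }
    if (!this.armed) return text;
    const open = text.indexOf(OPEN);
    if (open !== -1) {
      this.armed = false; // one-shot: only the immediate echo is dropped
      this.inEcho = true;
      return text.slice(0, open) + this.push(text.slice(open));
    }
    const n = partialOpenLength(text);
    if (n > 0) {
      this.carry = text.slice(text.length - n); // wait for the next chunk
      return text.slice(0, text.length - n);
    }
    this.armed = false; // ordinary prose ends the suppression window
    return text;
  }

  // End of stream: release a withheld partial tag instead of truncating it.
  flush(): string {
    const rest = this.inEcho ? "" : this.carry;
    this.carry = "";
    this.armed = false;
    this.inEcho = false;
    return rest;
  }
}
```

For the terminal 'Done <art' case, `push` returns `"Done "` and `flush` returns `"<art"`, so nothing is lost when no artifact follows.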

🔁 Powered by Looper · runner=fixer · agent=codex · An autonomous AI dev team for your GitHub repos.

@nettee nettee left a comment

Contributor

@Siri-Ray I re-ran the final pass on the current head and the remaining todo/artifact fixes look solid. I verified the daemon parser regressions for Claude/Gemini/Codex TodoWrite handling and artifact-echo suppression, and the web coverage for inline Todo card placement, virtualization, and streaming behavior all pass locally. I also re-ran the underlying tsc steps for apps/web and apps/daemon directly with the pinned toolchain after the packaged daemon typecheck script tripped over a nested pnpm engine mismatch in this environment. Nice cleanup on a tricky cross-runtime bug loop.

🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.

@Siri-Ray Siri-Ray added this pull request to the merge queue Jun 10, 2026
Merged via the queue into main with commit 4154e61 Jun 10, 2026
17 checks passed

Labels

  • risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps)
  • size/XL (PR changes 700-1500 lines)
  • type/bugfix (Bug fix)

Projects

None yet

3 participants