
feat(chat): suggest matching slash commands from natural language #1803

Open
sukria-koan0 wants to merge 8 commits into main from koan0/implement-1799

Conversation

@sukria-koan0

@sukria-koan0 sukria-koan0 commented Jun 6, 2026

Collaborator

Replace intent-routing/confirmation with simpler zero-cost approach.

Closes #1799


Quality Report

Changes: 24 files changed, 1504 insertions(+), 2011 deletions(-)

Code scan: clean

Tests: passed (403 tests)

Branch hygiene: clean

Generated by Kōan

@Koan-Bot Koan-Bot self-requested a review June 6, 2026 20:31
@Koan-Bot

Koan-Bot commented Jun 6, 2026

Collaborator

PR Review — feat(chat): suggest matching slash commands from natural language

Clean feature design, but the prompt template has two issues that should be fixed before merge.

  • The example command /resume_recurring doesn't exist — will teach the LLM to hallucinate compound command names
  • The instruction block ("Available slash commands…", "suggest when…") stays in the prompt even when the catalog is empty/disabled, wasting tokens and risking hallucinated suggestions
  • Test assertion for catalog injection is vacuously true due to or with template text
  • Alphabetical top-5-per-group is arbitrary — not the "top" commands by usefulness

🟡 Important

1. Phantom command in prompt example (`koan/system-prompts/chat.md`, L21-26)

/resume_recurring doesn't exist. The actual command is /recurring resume <n> (subcommand syntax). An LLM seeing this example may hallucinate compound slash-command names that aren't in the catalog.

Use a real, simple command from the catalog — e.g. /review or /status.

suggest it in your reply — e.g. "That sounds like a job for `/resume_recurring` — want me to queue it?"

2. Instruction block leaks when catalog is empty (`koan/system-prompts/chat.md`, L21-26)

When suggestions are disabled or in lite mode, {SKILLS_CATALOG} is substituted with "" but the surrounding 5 lines ("Available slash commands…", "When the human's message clearly maps…") remain in the prompt. This wastes tokens and — worse — instructs the LLM to suggest commands from an empty list, which may cause hallucinated suggestions from training data.

Fix: move the entire block (header + catalog + instructions) into _build_command_catalog() so it returns the complete section when enabled, empty string when not. Then chat.md has just {SKILLS_CATALOG} on a line by itself.

Available slash commands (suggest when the human's message maps to one):
{SKILLS_CATALOG}

When the human's message clearly maps to a slash command from the list above...
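
A minimal sketch of the fix proposed in item 2, assuming a simplified builder: the function name `build_catalog_section`, the `render_prompt` helper, and the `INSTRUCTION` wording are hypothetical stand-ins, not the PR's actual code. It only illustrates the idea that the builder owns the header, the list, and the instruction, so the template carries nothing but the placeholder.

```python
from typing import Iterable

# Hypothetical wording; the real header and instruction text live in the PR.
HEADER = "Available slash commands (suggest when the human's message maps to one):"
INSTRUCTION = (
    "When the human's message clearly maps to a slash command from the list "
    "above, suggest at most one command in your reply."
)


def build_catalog_section(commands: Iterable[str], enabled: bool = True) -> str:
    """Return the complete prompt section, or "" when there is nothing to show."""
    lines = [f"- {command}" for command in commands]
    if not enabled or not lines:
        # Disabled or empty catalog: emit nothing at all, header included.
        return ""
    return "\n".join([HEADER, *lines, "", INSTRUCTION])


def render_prompt(template: str, catalog_section: str) -> str:
    # The template holds only the placeholder, so an empty section leaves no residue.
    return template.replace("{SKILLS_CATALOG}", catalog_section)
```

With this shape, a disabled feature and an empty registry both yield a prompt that never mentions slash commands, which removes the instruction to suggest from an empty list.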

3. Weak assertion: always passes via template text (`koan/tests/test_awake.py`, L3603-3607)

"Available slash commands" is hardcoded in the chat.md template, so the or branch always passes regardless of whether catalog injection works.

Assert on the actual injected content instead:

assert "/status" in prompt, "Catalog content should appear in prompt"
assert "/status" in prompt or "Available slash commands" in prompt, \
    "Non-lite prompt should have catalog"

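To make the vacuous-assertion point in item 3 concrete, here is a small self-contained illustration. The `TEMPLATE` string and the `render` helper are simplified assumptions that mimic a template with a hardcoded header; they are not the project's real prompt builder.

```python
# Simplified stand-in for a template whose header is hardcoded (assumption).
TEMPLATE = "Available slash commands:\n{SKILLS_CATALOG}"


def render(catalog: str) -> str:
    return TEMPLATE.replace("{SKILLS_CATALOG}", catalog)


def weak_check(prompt: str) -> bool:
    # Passes whenever the template text is present, even with no catalog.
    return "/status" in prompt or "Available slash commands" in prompt


def strict_check(prompt: str) -> bool:
    # Passes only when catalog content was actually injected.
    return "/status" in prompt


broken = render("")                       # injection silently produced nothing
working = render("/status - show status")  # injection worked
```

Here `weak_check(broken)` is true even though nothing was injected, while `strict_check(broken)` is false, which is the regression the test is supposed to catch.
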
🟢 Suggestions

1. Alphabetical sort != "top" commands (`koan/app/awake.py`, L207-210)

The docstring says "selects the top ~20 by group" but sorted(groups_dict[group])[:5] picks the first 5 alphabetically — /abort, /add_project, /ai… not necessarily the most useful.

Consider explicit priority ordering or at least sort by group relevance. Alternatively, add a suggest_priority field to SKILL.md frontmatter so operators can curate which commands surface.

for line in sorted(groups_dict[group])[:5]
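
One way the curated ordering could look, as a sketch only: `CatalogEntry` and the `suggest_priority` default of 100 are hypothetical, following the field name floated in the suggestion above rather than anything that exists in the codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    # Hypothetical frontmatter field: lower values surface first.
    suggest_priority: int = 100


def top_entries(entries: Iterable[CatalogEntry], limit: int = 5) -> List[CatalogEntry]:
    # Sort by curated priority, then by name so ties stay deterministic.
    return sorted(entries, key=lambda e: (e.suggest_priority, e.name))[:limit]
```

Entries without an explicit priority keep the current alphabetical behaviour among themselves, so the change would be backward compatible for skills that never set the field.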

Checklist

  • No hardcoded secrets
  • Input validation at boundaries
  • No bare except swallowing errors silently
  • Error handling returns safe fallback
  • No mutable default arguments
  • Tests verify observable behavior not implementation
  • No private skill/project names leaked — warning #1
  • Prompt template correctness — warning #1, warning #2
  • Test assertions are meaningful — warning #4
  • Documentation updated

To rebase specific severity levels, mention me: @Koan-Bot rebase critical (fixes 🔴 only), @Koan-Bot rebase important (fixes 🔴 + 🟡), or just @Koan-Bot rebase for all.


Silent Failure Analysis

🟡 **MEDIUM** — catch-all masks registry bugs (`koan/app/awake.py:213-216`)

Risk: The broad except Exception returns the same empty string as the feature-disabled path, so a persistent bug in the skill registry (e.g. a SKILL.md with a None description hitting [:70]) silently degrades the feature with no way for callers or users to distinguish 'disabled' from 'broken'.

    except Exception as e:
        log("warn", f"[chat] catalog builder failed: {e}")
        return ""

Fix: Narrow the catch to expected failures (e.g. FileNotFoundError, KeyError) or log at error level with the traceback so persistent registry issues surface in normal log monitoring.
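
A hedged sketch of how the two paths could be told apart. It uses the standard `logging` module as a stand-in for the project's own `log(...)` helper, and `safe_catalog` and its `build` callable are hypothetical names introduced for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger("chat.catalog")


def safe_catalog(build: Callable[[], str], enabled: bool) -> str:
    if not enabled:
        return ""  # disabled path: silent by design
    try:
        return build()
    except (FileNotFoundError, KeyError) as exc:
        # Expected, recoverable failures: a warning is enough.
        logger.warning("catalog unavailable: %s", exc)
        return ""
    except Exception:
        # Unexpected failure: logger.exception logs at ERROR with the traceback.
        logger.exception("catalog builder failed")
        return ""
```

The return value is still an empty string in every failure case, so the chat path stays safe, but a persistent registry bug now shows up as an error record with a traceback instead of looking like the disabled path.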


Automated review by Kōan (Claude · model claude-opus-4-6) HEAD=8c0be44 4 min 37s

@Koan-Bot Koan-Bot left a comment

Collaborator


Blocking issues found — see the review comment above.

@sukria-koan0

Collaborator Author

Review comments addressed:

  • Phantom command: replaced /resume_recurring (non-existent) with /status in the suggestion example
  • Instruction block leak: moved the entire Available slash commands… + suggestion instruction into _build_command_catalog() — when disabled/empty, the section is completely absent from the prompt, not just the list
  • Weak test assertion: dropped the or "Available slash commands" branch that made the test pass vacuously via template text; now asserts on actual injected content (/status in prompt)

All 8 catalog/chat tests pass.

@sukria

sukria commented Jun 7, 2026

Collaborator

@sukria-koan0 review

@sukria-koan0

sukria-koan0 commented Jun 7, 2026

Collaborator Author

Previous review — superseded by a newer review below.

@sukria

sukria commented Jun 7, 2026

Collaborator

To fix:

  • /resume_recurring has been promoted to a subcommand, /recurring resume. Change it accordingly in user-manual.md and the config.yaml example.
  • Fix the Important issue spotted: narrow the catch to expected failures, or log at error level with the traceback so persistent issues surface in normal monitoring.

@sukria-koan0 rebase

sukria-koan0 and others added 4 commits June 7, 2026 11:22
Replace the intent-routing/confirmation machinery from PR #1799 with a
simpler, zero-cost approach: enrich the chat prompt with a compact catalog
of available slash commands and an instruction to suggest the matching
command when a user's conversational message maps to one of them.

Changes:
- Add _build_command_catalog() helper in awake.py to generate human-triggerable
  commands from the skill registry, filtered by audience and grouped by help group
- Update _build_chat_prompt() to inject {SKILLS_CATALOG} placeholder (empty in
  lite mode to save tokens)
- Enhance koan/system-prompts/chat.md with command-suggestion section and
  instruction: suggest at most one command when the mapping is clear
- Add get_chat_suggest_commands_enabled() config getter with default=true
- Add chat.suggest_commands opt-in flag to instance.example/config.yaml
- Add comprehensive tests for catalog builder, config getter, and prompt injection

Benefits over intent-routing:
- Zero extra Claude calls (no 5-10s latency, no round-trip)
- No confirmation state machine or guard-bypass surface
- Natural discovery inside the existing single-call chat path
- Human always in control (suggestions only, no auto-execution)
- Config opt-in for operators who prefer it disabled

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Update CLAUDE.md awake.py description to mention optional command suggestions
- Add new 'Chat Command Suggestions' subsection in user-manual.md explaining
  the feature: advisory suggestions with copy-pasteable commands, opt-in via config
- Clarify that suggestions are always user-controlled, not auto-executed

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

…command

- _build_command_catalog() now returns the full section (header +
  body + suggestion instruction) so chat.md has a single {SKILLS_CATALOG}
  placeholder — when catalog is disabled/empty the surrounding text no
  longer leaks into the prompt
- Replace hallucinated /resume_recurring example with /status (a real command)
- Tighten test assertion: assert "/status" in prompt instead of the vacuous
  `or "Available slash commands"` branch that always passed via template text

@sukria-koan0

Collaborator Author

Rebase with requested adjustments

Branch koan0/implement-1799 was rebased onto main and review feedback was applied.

Changes applied

  • Done.
  • Changes:
  • awake.py: narrowed silent failure. Replaced the broad except Exception, which returned the same empty string as the disabled path, with log("error", ...) plus the full traceback.format_exc(), so persistent registry bugs surface in monitoring instead of silently degrading to 'disabled'. Added import traceback. (Reviewer 🟡 Important #1)
  • awake.py: None-description guard. Changed desc = cmd.description[:70] if cmd.description else skill.description[:70] to fall back through cmd.description or skill.description or "" before slicing, eliminating the TypeError when both are None. (Reviewer 🟡 Important #1)
  • user-manual.md + config.yaml example: replaced phantom /resume_recurring with the promoted sub-command /recurring resume. (Reviewer request)
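
The None-description guard described above can be sketched as follows. The helper name `short_description` and its parameters are illustrative assumptions; only the `or` fallback chain before slicing comes from the change description.

```python
from typing import Optional


def short_description(
    cmd_description: Optional[str],
    skill_description: Optional[str],
    limit: int = 70,
) -> str:
    # Resolve the fallback chain first, then slice: "" is always sliceable,
    # so two None values can no longer raise a TypeError.
    return (cmd_description or skill_description or "")[:limit]
```

The earlier conditional form sliced `skill.description` directly whenever the command description was falsy, which fails when the skill description is also None.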

Stats

8 files changed, 276 insertions(+), 1 deletion(-)

Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on main implements natural-language chat slash-command suggestions or the chat.suggest_com)
  • Rebased koan0/implement-1799 onto origin/main
  • Applied review feedback
  • Pre-push CI check: previous run passed
  • Force-pushed koan0/implement-1799 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@sukria sukria force-pushed the koan0/implement-1799 branch from c8ec4ee to 2a2647a Compare June 7, 2026 09:23
@sukria

sukria commented Jun 7, 2026

Collaborator

Reading the diff, I see that we lost the "queuing" mechanic here: koan will spot the matching command and say "want me to queue it", but it will not actually queue anything.

This is misleading.

Queuing the command was a nice idea, but led to security concerns.

Advise on the best solution here. KISS would argue for simply rephrasing the reply to "Seems you want to run ." and stopping there (enhanced, dynamic documentation, in a way, which is good). The alternative is to really queue the command, ask for confirmation, and trigger the command if the user confirms. I would really like koan to be able to do that, but it must go through the natural workflow, gates, and security checks of the existing code paths.

@sukria-koan0

Alexis Sukrieh and others added 4 commits June 7, 2026 12:13
Chat can now offer to run an eligible slash command and execute it on a
one-word "yes" — including subcommands/args derived from the conversation
(e.g. /recurring then /recurring run 3).

Security: no new trigger path. A confirmed "yes" replays the *exact*
literal command shown, through the existing handle_message -> handle_command
pipeline, so every existing gate (channel filter, permission checks,
pause/passive state) still applies. No LLM at confirmation time.

- Per-skill opt-in via `chat_confirmable: true` in SKILL.md (mechanism, not
  a hardcoded list). Destructive commands are never eligible.
- Tight affirmatives only; channel-bound, single-use, 5-min-TTL pending slot;
  injection-char rejection; rejected/garbage markers stripped before display.
- Kill switch: chat.confirm_commands: false (prose suggestions stay on).
- Opt-in set: 23 read-only/reversible core skills.

Also fixes a pre-existing PERF401 lint error in recurring.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hard anti-fabrication guardrail

The model never emitted the SUGGEST_COMMAND marker and instead fabricated
actions ("je lance la mission… elle entre en Pending", i.e. "I am launching
the mission… it moves to Pending") because:

1. The catalog showed only each skill's FIRST command, so the force-run form
   (/recurring run <n>) was invisible — the model saw only /daily. Rebuild the
   catalog with a dedicated "Commands you can OFFER TO RUN" section listing each
   confirmable skill's representative command WITH its real usage string, so the
   model can construct the exact subcommand the human asked for.
2. The selection surfaced junk/dangerous non-⚡ commands and dropped key ones
   (/status wasn't even listed). Split into a runnable ⚡ section and a prose-only
   section.
3. The directive was weak ("you MAY offer") and buried mid-prompt. Add a hard,
   last-word guardrail in chat.md: the chat cannot run/queue/launch anything and
   must NEVER claim it did — only the marker + the human's "yes" runs anything.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… in validator

Silences the spurious "[config] unrecognized key 'chat'" startup warning. The
keys were already read correctly; this just teaches the validator their schema.

Also includes the refined chat.md anti-fabrication guardrail wording.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@sukria

sukria commented Jun 7, 2026

Collaborator

Tested locally. Works.

[Screenshot, 2026-06-07 at 14:52:15]

@sukria

sukria commented Jun 7, 2026

Collaborator

@sukria-koan0 review

@sukria sukria marked this pull request as ready for review June 7, 2026 12:54
@sukria-koan0

Collaborator Author

@sukria: By reading the diff, I see that we lost the "queuing" mechanic here, koan will just spot the matchin...

Now implemented as confirm-to-run, not queue. Chat offers via SUGGEST_COMMAND: marker; footer shows literal command. A tight "yes" replays that exact string through handle_command — every existing gate applies, no LLM at execution time. Per-skill opt-in (chat_confirmable: true); destructive commands excluded. Footer wording is "Reply yes to run", not "queue", so no longer misleading.
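
As a rough illustration of the offer step described above, the sketch below extracts a marker line, validates it, and appends the footer. The marker line format, the `FORBIDDEN` character set, and the `split_offer` name are assumptions made for this example; the real validation lives in command_confirm.py and may differ.

```python
import re
from typing import Optional, Set, Tuple

# Assumed marker shape: one line of the form "SUGGEST_COMMAND: /cmd args".
MARKER_RE = re.compile(r"^SUGGEST_COMMAND:[ \t]*(.*)$", re.MULTILINE)
# Assumed set of characters treated as injection attempts.
FORBIDDEN = set(";|&`$<>\n\r")


def split_offer(reply: str, confirmable: Set[str]) -> Tuple[str, Optional[str]]:
    """Return (text shown to the human, literal command offered or None)."""
    match = MARKER_RE.search(reply)
    text = MARKER_RE.sub("", reply).strip()  # the marker never reaches the human
    if match is None:
        return text, None
    command = match.group(1).strip()
    name = command.split(" ", 1)[0]
    if name not in confirmable or FORBIDDEN & set(command):
        return text, None  # rejected offers are dropped silently
    return f"{text}\n\nReply yes to run: {command}", command
```

The second return value is the exact string that would later be replayed, so what the human sees in the footer and what runs on confirmation are the same literal.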

@sukria-koan0

Collaborator Author

PR Review — feat(chat): suggest matching slash commands from natural language

Solid confirm-to-run design; addresses prior feedback and the queuing concern. Merge-ready.

  • Security boundary is clean: command_confirm.py only validates + holds a pending offer; execution replays the literal command through the existing handle_command path, so all gates still apply.
  • Eligibility is correctly opt-in per skill (chat_confirmable: true), and destructive commands stay excluded.
  • Earlier review points resolved: narrowed catch now logs full traceback at error level; None-description fallback fixed; docs/config use the promoted /recurring resume.
  • Pending offer is channel-bound, single-use, TTL'd, and cleared on any non-affirmative — good defense in depth.
  • One micro edge case (peek/take race) noted as a suggestion, not blocking.
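
The pending-offer properties listed above (channel-bound, single-use, TTL'd, cleared on any non-affirmative) could be modelled roughly like this. `PendingOffers`, the `AFFIRMATIVES` set, and the injectable clock are hypothetical and exist only for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

AFFIRMATIVES = {"yes", "y", "ok"}  # assumed "tight affirmatives"
TTL_SECONDS = 300                  # 5-minute TTL from the commit message


@dataclass
class PendingOffer:
    command: str
    expires_at: float


class PendingOffers:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._slots: Dict[str, PendingOffer] = {}
        self._clock = clock

    def offer(self, channel: str, command: str) -> None:
        # One slot per channel: a newer offer replaces the older one.
        self._slots[channel] = PendingOffer(command, self._clock() + TTL_SECONDS)

    def take(self, channel: str, message: str) -> Optional[str]:
        # pop() reads and clears in one step, so any message consumes the slot.
        pending = self._slots.pop(channel, None)
        if pending is None or self._clock() > pending.expires_at:
            return None
        if message.strip().lower() not in AFFIRMATIVES:
            return None
        return pending.command
```

Reading and clearing the slot in a single `pop` is one simple way to avoid a separate peek-then-take sequence; whether that matches the edge case noted in the review is not confirmed by the source.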


Checklist

  • No hardcoded secrets
  • Command replay validated against registry + opt-in flag
  • No bare except swallowing errors silently
  • Error handling returns safe fallback
  • Documentation consistent with code
  • Tests verify observable behavior not implementation

Silent Failure Analysis

🟡 **MEDIUM** — fallback value re-leaks marker on error (`koan/app/awake.py:478-481`)

Risk: On any failure during suggestion processing the raw, unstripped reply is returned, so a stray SUGGEST_COMMAND: marker leaks verbatim to the human instead of being scrubbed.

    except Exception:
        log("error", f"[chat] command-suggestion processing failed:\n{traceback.format_exc()}")
        return response

Fix: On exception, fall back to a marker-stripped reply (e.g. _MARKER_RE.sub("", response or "")) rather than the raw response.

🟡 **MEDIUM** — fallback value hides empty-strip result (`koan/app/awake.py:465-467`)

Risk: When the stripped reply is empty/whitespace, the or response fallback returns the original text including the SUGGEST_COMMAND: marker, defeating the strip that this disabled-path is meant to guarantee.

from app.command_confirm import _MARKER_RE
return _MARKER_RE.sub("", response or "").strip() or response

Fix: Drop the or response fallback (return the stripped string even if empty) so a stray marker can never reach the human when confirmation is disabled.
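
Both fallbacks flagged in this analysis come down to always returning the scrubbed text. A minimal sketch, assuming a line-based marker pattern: `MARKER_RE` here is a local stand-in for the project's `_MARKER_RE`, and `process_reply` with its `handler` argument is a hypothetical wrapper.

```python
import re
from typing import Callable, Optional

# Local stand-in for the project's _MARKER_RE (assumed line-based shape).
MARKER_RE = re.compile(r"^SUGGEST_COMMAND:.*$", re.MULTILINE)


def scrub(response: Optional[str]) -> str:
    # No "or response" fallback: an empty result is returned as empty.
    return MARKER_RE.sub("", response or "").strip()


def process_reply(response: Optional[str], handler: Callable[[str], str]) -> str:
    try:
        return handler(response or "")
    except Exception:
        # Error logging with the traceback would go here; the fallback itself
        # must still be marker-free.
        return scrub(response)
```

Returning an empty string when the reply consisted only of the marker means the caller may need its own default message, but the marker itself can no longer reach the human.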


Automated review by Kōan (Claude · model opus) HEAD=17baab3 1 min 17s

@atoomic

atoomic commented Jun 7, 2026

Collaborator

Note: it would be nice to also support Telegram groups for such interactions, and to be able to reply there.


4 participants