diff --git a/CHANGELOG.md b/CHANGELOG.md index 44016bfc..2ede6699 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -24,6 +24,45 @@ Agents can propose **durable learnings** (conventions, preferences, facts, pitfa --- +## [0.8.8] - 2026-06-16 + +### Added — Anti-hallucination `enforce` mode (PR-A audit gate · PR-B disc refusal · PR-C beta) + +`AntiHallucMode::Enforce` previously behaved exactly like `warn`. PR-A gives it teeth on the **audit pipeline** — the surface that writes the most durable docs. + +- **Per-step citation gate** (`api/audit/anti_hallu_enforce.rs`, pure + unit-tested): in `enforce`, after an audit step's agent writes its `docs/` file, the file's formal `[src: …]` markers are mechanically re-linted against the real tree (`core::anti_halluc::analyze_roots`). A fabricated citation (path missing / line out of bounds / outside project / training-data) **re-runs the step** with a corrective addendum naming the broken citations (bounded by `MAX_ATTEMPTS = 3`, new `step_retry` SSE event). If they still don't resolve after the cap, the step fails → the run ends *Interrupted* (no validation discussion) instead of committing a doc with invented citations. +- **Auto-stamp `audit=""`** on every `curated="ai"` section of a clean step file (idempotent, deterministic, 0 tokens) — the date honestly reflects "verified conformant today". +- `off`/`warn` are unchanged: one attempt per step, gate inert. The gate's branching is a pure `decide(verdict, attempt, max) -> Pass|Retry|Fail` so it's unit-tested without a live agent. 12 new unit tests; clippy `-D warnings` clean. 
+ +PR-B extends `enforce` to **discussions** (chat / batch / WF agent steps), at the runner chokepoint + the streaming finalize: + +- **Auto-attached `kronn-doc-author` skill.** When `enforce` and the agent's project carries a `docs/AGENTS.md`, the doc-authoring cheat-sheet (`kronn:section` markers + `[src:]` grammar) is injected inline so any agent that edits docs writes in the convention the lint accepts — even if the user never attached the skill. Idempotent (skipped when already in `skill_ids`), inert outside enforce. +- **Non-destructive P3 fail-fast.** When a finalized agent reply carries a fabricated `[src:]` citation, the message is **kept** (with its red pill) and a System note is appended (`⛔ Réponse refusée (enforce) : N citation(s) fabriquée(s) …`) so the human arbitrates a correction. No auto-retry — on a user disc the user decides. +- Both branches are pure, unit-tested policies in `core::anti_halluc` (`should_auto_attach_doc_author`, `enforce_refusal_needed`, `enforce_refusal_message`); 3 new tests. clippy `-D warnings` clean. +PR-C lifts the `enforce` mode out of preview: the Settings label is now **Strict (beta · 0.8.8)** (FR/EN/ES) and the help text + selection toast — which still claimed *"behaves like Warn until 0.8.8, write-refusal ships then"* — now describe what `enforce` actually does (audit step-retry → clean fail; disc reply kept but flagged). The enforce feedback is already visible through existing surfaces (the disc refusal renders as a System message; an exhausted audit gate surfaces as a `step_warning` in the audit recap). 
*Optional remaining polish: a live `step_retry` chip during an audit, and extending the existing checksum drift banner with the anti-hallu signals (audit date > 6 mo, unresolved `[src:]`).* + +### Fixed — Feasibility AutoPilot `run_tests` PHP verdict reported ERROR(harness) on a healthy suite + +The parent `run_tests` step mounted the project's `vendor/` conditionally, gated by a fragile host→container path back-substitution (`${vend/#…}`); when it mis-evaluated, `vendor/` was left unmounted → phpunit couldn't autoload → boot failure mis-classified as `ERROR(php harness)`. Now vendor is resolved by checking the **container** paths directly (worktree's own `vendor/` first, else borrow main's), and a genuinely absent `vendor/` is an honest **SKIP (run composer install)** instead of a scary ERROR. Also added `--colors=never` so the `Tests: N` / `Failures:` summary parse is ANSI-free. Verified live on front_euronews (3602 tests → PASS; filtered class → clean OK). +4 assertions on the existing `run_tests` template test. + +### Fixed — Sidebar message count inflated by System rows + +The "N msg" label in the discussion sidebar (`SwipeableDiscItem`) and on the dashboard `ProjectCard` showed the raw `message_count`, which counts tool-call breadcrumbs, cached-summary lines and the new enforce-refusal note (all `MessageRole::System`) — wildly higher than the real conversation length. The 0.8.7 fix had switched the unread *badge* to `non_system_message_count` (via `unseenBasis`) but the visible total label was missed. Both now use `unseenBasis(disc)` (the backend already exposes the System-excluding count via a subquery). Render-level regression test added. + +### Fixed — Auto-summary kept firing after being disabled in Settings + +`maybe_generate_summary` only checked the **per-disc** `summary_strategy`, which is frozen at creation from the global default. 
Turning auto-summary off in Settings only affected NEW discs, so older long threads (created when the default was `Auto`) kept summarising. The global `default_summary_strategy` is now a **master kill-switch**: `SummaryStrategy::auto_fires(global, disc)` returns false whenever the global is `Off`, regardless of the disc's frozen value; otherwise the per-disc strategy decides as before. Pure + unit-tested. + +### Fixed — Anti-hallucination: bare-filename citations no longer false-flagged + +A dominant source of false "unverified" amber pills: an agent that cited a file by **bare name + line** (`` `NewslettersManager.ts:107` ``) without its full path was flagged unverified even though the file exists — `verify_file_ref` only probed the path at each root's top level, so a nested file never resolved. + +- `verify_file_ref` now falls back to a **unique-basename walk** when a separator-less name doesn't resolve at root level: exactly one matching file in the tree → `Verified` (with line-bounds check + the resolved relative path shown in the pill detail); 2+ matches → stays unresolved but with an actionable *"ambiguous, cite the full path"* reason; 0 → `NotFound` as before. Full paths are unchanged. +- The walk reuses the `scanner` skip-list (`node_modules`, `vendor`, `target`, **`.kronn`** …) — skipping `.kronn` is load-bearing: its `worktrees/` hold full project copies that otherwise make every basename look ambiguous (real case: front_euronews had 11 copies of one file, 1 real). Multi-root (Isolated worktree + main) is first-root-wins, so a file present in both isn't double-counted as ambiguous. Bounded walk (caps at 60k entries → never a false unique on a partial scan). +- Verified live on the exact false positive (disc `d344b52b`): `NewslettersManager.ts:107` and `SocialLoginManager.ts:199` now resolve `Verified` against the real checkout. 8 new unit tests (incl. 
the `.kronn/worktrees` skip, ambiguity, multi-root, out-of-bounds, and the end-to-end inline-anchor case); clippy `-D warnings` clean. + +--- + ## [0.8.7] - 2026-05-28 ### Added — Big-ticket AutoPilot: multi-agent debate + per-task test→fix loop (2026-06-13) diff --git a/VERSION b/VERSION index 1e9b46b2..6201b5f7 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.8.7 +0.8.8 diff --git a/backend/Cargo.lock b/backend/Cargo.lock index f7d920dc..81b5521b 100644 --- a/backend/Cargo.lock +++ b/backend/Cargo.lock @@ -1403,7 +1403,7 @@ dependencies = [ [[package]] name = "kronn" -version = "0.8.7" +version = "0.8.8" dependencies = [ "aes-gcm", "anyhow", diff --git a/backend/Cargo.toml b/backend/Cargo.toml index 20a0db06..81efc367 100644 --- a/backend/Cargo.toml +++ b/backend/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "kronn" -version = "0.8.7" +version = "0.8.8" edition = "2021" description = "Self-hosted AI dev workflow control plane" license = "AGPL-3.0-only" diff --git a/backend/src/agents/runner.rs b/backend/src/agents/runner.rs index 5f87fbf2..6a7462dc 100644 --- a/backend/src/agents/runner.rs +++ b/backend/src/agents/runner.rs @@ -542,6 +542,28 @@ pub async fn start_agent_with_config(config: AgentStartConfig<'_>) -> Result) -> Result, +} + +impl CitationVerdict { + pub fn count(&self) -> usize { + self.fabricated.len() + } + pub fn is_clean(&self) -> bool { + self.fabricated.is_empty() + } +} + +/// What the per-step enforce gate decides after re-linting the written file. +/// Pure so the streaming generator's branching is unit-testable without a live +/// agent (the generator owns the IO: re-run, emit SSE, stamp). +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum GateDecision { + /// Citations clean — stamp `audit=` dates and finish the step. + Pass, + /// Fabricated citations, but attempts remain — re-run with corrective feedback. + Retry, + /// Fabricated citations and the attempt budget is spent — fail the step. 
+ Fail, +} + +/// Decide the gate outcome from the lint verdict and where we are in the retry +/// budget. `attempt` is 1-based (the attempt that just produced this verdict). +pub fn decide(verdict: &CitationVerdict, attempt: usize, max_attempts: usize) -> GateDecision { + if verdict.is_clean() { + GateDecision::Pass + } else if attempt < max_attempts { + GateDecision::Retry + } else { + GateDecision::Fail + } +} + +/// Mechanically lint a written step file's content for fabricated formal +/// citations. `roots` are the project roots the `[src: file:line]` markers are +/// resolved against (the audit runs in the main checkout, so a single root). +pub fn lint_step_file(content: &str, roots: &[&Path]) -> CitationVerdict { + let report = anti_halluc::analyze_roots(content, roots); + let fabricated = report + .sources + .into_iter() + .filter(|s| s.status.is_fabricated()) + .collect(); + CitationVerdict { fabricated } +} + +/// Build the corrective prompt addendum re-injected on a retry. Names each +/// fabricated citation and the verdict so the agent can fix the reference or +/// drop the claim — it must NOT invent a new path to satisfy the linter. +pub fn corrective_feedback(file_label: &str, verdict: &CitationVerdict) -> String { + let mut out = String::from( + "## ⛔ Anti-hallucination gate (enforce mode) — fix before this step can pass\n\n", + ); + out.push_str(&format!( + "The file you just wrote for **{file_label}** contains {} formal `[src: …]` citation(s) \ +that do NOT resolve against the real codebase. 
A citation that points at a non-existent \ +path / out-of-bounds line / outside the project is treated as **fabricated** and blocks the audit.\n\n", + verdict.count() + )); + out.push_str("Fabricated citations:\n"); + for s in &verdict.fabricated { + out.push_str(&format!("- `[src: {}]` → {}\n", s.raw.trim(), s.detail.trim())); + } + out.push_str( + "\nFor EACH one: either correct it to a real `path:line` you have actually read, OR \ +remove the unverifiable claim entirely (do not weaken it into prose — drop it). \ +Do NOT invent a path just to pass the check. Re-write the file, then finish.\n", + ); + out +} + +/// Idempotently stamp `audit=""` on every `curated="ai"` section opener +/// in `content`. Returns `Some(new_content)` when something changed, `None` +/// when every `curated="ai"` marker already carries today's date (no write +/// needed). The audit just (re)generated this file, so today's date honestly +/// reflects "verified conformant today". +pub fn stamp_curated_audit_dates(content: &str, today: &str) -> Option<String> { + let today_attr = format!("audit=\"{today}\""); + let mut changed = false; + let mut lines: Vec<String> = content.lines().map(String::from).collect(); + + for line in &mut lines { + let trimmed = line.trim_start(); + if !trimmed.starts_with("") { + // No audit attr yet — insert one just before the closing marker. 
+ line.insert_str(close, &format!(" {today_attr}")); + changed = true; + } + } + + if !changed { + return None; + } + let mut out = lines.join("\n"); + if content.ends_with('\n') && !out.ends_with('\n') { + out.push('\n'); + } + Some(out) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::fs; + use tempfile::tempdir; + + #[test] + fn clean_file_yields_no_fabricated() { + let dir = tempdir().unwrap(); + fs::write(dir.path().join("real.rs"), "line1\nline2\nline3\n").unwrap(); + let content = "Stack uses real.rs [src: file: real.rs:2]."; + let verdict = lint_step_file(content, &[dir.path()]); + assert!(verdict.is_clean(), "verified citation must not be fabricated"); + } + + #[test] + fn nonexistent_path_is_fabricated() { + let dir = tempdir().unwrap(); + let content = "It lives in [src: file: does/not/exist.rs:10]."; + let verdict = lint_step_file(content, &[dir.path()]); + assert_eq!(verdict.count(), 1, "missing file → one fabricated citation"); + assert!(!verdict.is_clean()); + } + + #[test] + fn out_of_bounds_line_is_fabricated() { + let dir = tempdir().unwrap(); + fs::write(dir.path().join("short.rs"), "only one line\n").unwrap(); + let content = "See [src: file: short.rs:999]."; + let verdict = lint_step_file(content, &[dir.path()]); + assert_eq!(verdict.count(), 1, "out-of-bounds line → fabricated"); + } + + #[test] + fn decide_passes_on_clean_verdict() { + let clean = CitationVerdict::default(); + assert_eq!(decide(&clean, 1, MAX_ATTEMPTS), GateDecision::Pass); + // Clean always passes, even on the last attempt. 
+ assert_eq!(decide(&clean, MAX_ATTEMPTS, MAX_ATTEMPTS), GateDecision::Pass); + } + + #[test] + fn decide_retries_while_budget_remains() { + let dir = tempdir().unwrap(); + let dirty = lint_step_file("X [src: file: nope.rs:1].", &[dir.path()]); + assert!(!dirty.is_clean()); + assert_eq!(decide(&dirty, 1, 3), GateDecision::Retry); + assert_eq!(decide(&dirty, 2, 3), GateDecision::Retry); + } + + #[test] + fn decide_fails_when_budget_exhausted() { + let dir = tempdir().unwrap(); + let dirty = lint_step_file("X [src: file: nope.rs:1].", &[dir.path()]); + assert_eq!(decide(&dirty, 3, 3), GateDecision::Fail); + // A single-attempt budget (warn/off would never call this) fails immediately. + assert_eq!(decide(&dirty, 1, 1), GateDecision::Fail); + } + + #[test] + fn corrective_feedback_names_each_broken_citation() { + let dir = tempdir().unwrap(); + let content = + "A [src: file: ghost.rs:1] and B [src: file: phantom.rs:2] are made up."; + let verdict = lint_step_file(content, &[dir.path()]); + let fb = corrective_feedback("docs/AGENTS.md", &verdict); + assert!(fb.contains("docs/AGENTS.md")); + assert!(fb.contains("ghost.rs:1")); + assert!(fb.contains("phantom.rs:2")); + assert!(fb.contains("enforce mode")); + // Must steer the agent away from inventing a path. 
+ assert!(fb.to_lowercase().contains("do not invent")); + } + + #[test] + fn stamp_inserts_missing_audit_attr() { + let input = "\nBODY\n\n"; + let out = stamp_curated_audit_dates(input, "2026-06-14").expect("should change"); + assert!(out.contains("audit=\"2026-06-14\"")); + assert!(out.contains("curated=\"ai\"")); + // closing marker untouched + assert!(out.contains("")); + } + + #[test] + fn stamp_refreshes_stale_audit_date() { + let input = "\nB\n"; + let out = stamp_curated_audit_dates(input, "2026-06-14").expect("should change"); + assert!(out.contains("audit=\"2026-06-14\"")); + assert!(!out.contains("2026-01-01"), "stale date must be replaced"); + } + + #[test] + fn stamp_is_noop_when_already_today() { + let input = "\nB\n"; + assert_eq!(stamp_curated_audit_dates(input, "2026-06-14"), None); + } + + #[test] + fn stamp_ignores_human_sections() { + let input = "\nfree form\n"; + assert_eq!( + stamp_curated_audit_dates(input, "2026-06-14"), + None, + "human-curated sections are never stamped" + ); + } + + #[test] + fn stamp_preserves_trailing_newline() { + let with_nl = "\nB\n"; + assert!(stamp_curated_audit_dates(with_nl, "2026-06-14").unwrap().ends_with('\n')); + let no_nl = ""; + assert!(!stamp_curated_audit_dates(no_nl, "2026-06-14").unwrap().ends_with('\n')); + } +} diff --git a/backend/src/api/audit/full.rs b/backend/src/api/audit/full.rs index 1cd81411..e5d30f81 100644 --- a/backend/src/api/audit/full.rs +++ b/backend/src/api/audit/full.rs @@ -470,6 +470,13 @@ pub async fn full_audit( } }; + // 0.8.8 PR-A — read the anti-hallucination mode once for the whole run. + // In `enforce`, each step that writes a doc is gated on mechanical + // `[src:]` citation verification, with a bounded corrective retry; in + // `off`/`warn` the gate is inert and each step runs exactly once. 
+ let enforce_mode = crate::core::anti_halluc::current_mode() + == crate::core::anti_halluc::AntiHallucMode::Enforce; + for (step_num, analysis_step) in steps.iter().enumerate() { // Check for cancellation before each step if audit_tracker.lock().map(|t| t.cancelled.contains(&project_id)).unwrap_or(false) { @@ -599,12 +606,32 @@ pub async fn full_audit( // live elapsed counter (computed client-side from the // step_started_at wallclock). let step_started_at = std::time::Instant::now(); + + // 0.8.8 PR-A — enforce-mode per-step retry loop. `off`/`warn` run a + // single attempt (`max_attempts == 1`) with the gate inert, so the + // behaviour is unchanged. In `enforce`, a step that wrote fabricated + // `[src:]` citations re-runs with a corrective addendum, bounded by + // `MAX_ATTEMPTS`. The terminal attempt falls through to `step_done` + // / DB finalize exactly once. + let max_attempts = if enforce_mode { + super::anti_hallu_enforce::MAX_ATTEMPTS + } else { + 1 + }; + let mut attempt: usize = 0; + let mut citation_feedback: Option<String> = None; + 'attempts: loop { + attempt += 1; let mut step_tokens: u64 = 0; + let attempt_prompt = match &citation_feedback { + Some(fb) => format!("{full_prompt}\n\n{fb}"), + None => full_prompt.clone(), + }; match runner::start_agent_with_config(runner::AgentStartConfig { full_access: true, tier: crate::models::ModelTier::Reasoning, - ..runner::AgentStartConfig::new(&agent_type, &project_path_str, &full_prompt, &tokens) + ..runner::AgentStartConfig::new(&agent_type, &project_path_str, &attempt_prompt, &tokens) }).await { Ok(mut process) => { // Register the child PID for cancellation @@ -824,6 +851,82 @@ pub async fn full_audit( } } + // 0.8.8 PR-A — enforce-mode citation gate. Runs only when + // the step is otherwise successful and wrote a real file + // (the "REVIEW" pseudo-step writes nothing). Re-lints the + // written file's `[src:]` markers against the real tree. 
+ if enforce_mode && success && analysis_step.target_file != "REVIEW" { + let target_path = project_path.join(analysis_step.target_file); + if let Ok(written) = std::fs::read_to_string(&target_path) { + let verdict = super::anti_hallu_enforce::lint_step_file( + &written, + &[&project_path], + ); + use super::anti_hallu_enforce::GateDecision; + match super::anti_hallu_enforce::decide(&verdict, attempt, max_attempts) { + GateDecision::Retry => { + // Re-run the step with a corrective addendum + // naming the fabricated citations. + tracing::warn!( + "Audit step {} ({}) enforce gate: {} fabricated citation(s), retry {}/{}", + step, file_label, verdict.count(), attempt + 1, max_attempts + ); + yield Event::default().event("step_retry").data( + serde_json::json!({ + "step": step, + "file": file_label, + "attempt": attempt, + "max_attempts": max_attempts, + "fabricated_count": verdict.count(), + "reason": "anti_hallu_fabricated_citations", + }).to_string() + ); + citation_feedback = Some( + super::anti_hallu_enforce::corrective_feedback(file_label, &verdict), + ); + continue 'attempts; + } + GateDecision::Fail => { + // Retries exhausted — fail the step so the run + // ends Interrupted (no validation disc) rather + // than committing a doc with fabricated citations. + success = false; + let reason = format!( + "{} fabricated `[src:]` citation(s) still present after {} attempts (enforce mode)", + verdict.count(), max_attempts + ); + tracing::warn!("Audit step {} ({}) enforce gate failed: {}", step, file_label, reason); + yield Event::default().event("step_warning").data( + serde_json::json!({ + "step": step, + "file": file_label, + "reason": reason.clone(), + "repaired_from_template": false, + }).to_string() + ); + warning = Some(crate::api::audit::validation::StepValidationWarning { + reason, + repaired: false, + }); + } + GateDecision::Pass => { + // Clean citations — stamp `audit=""` on + // any curated="ai" section (deterministic, 0 tokens). 
+ if let Some(stamped) = + super::anti_hallu_enforce::stamp_curated_audit_dates(&written, &today) + { + if let Err(e) = std::fs::write(&target_path, &stamped) { + tracing::warn!( + "Audit step {} ({}): failed to stamp audit dates: {}", + step, file_label, e + ); + } + } + } + } + } + } + let step_done = serde_json::json!({ "step": step, "success": success, @@ -899,6 +1002,10 @@ pub async fn full_audit( yield Event::default().event("step_error").data(err.to_string()); } } + // Terminal attempt (success, exhausted retries, or start failure). + // The retry path `continue`s before reaching here. + break 'attempts; + } // 'attempts: per-step enforce retry loop } // ── Auto-detect project skills ── diff --git a/backend/src/api/audit/mod.rs b/backend/src/api/audit/mod.rs index 8a8cb081..d16a8d32 100644 --- a/backend/src/api/audit/mod.rs +++ b/backend/src/api/audit/mod.rs @@ -12,6 +12,7 @@ use std::pin::Pin; use axum::response::sse::Event; use futures::Stream; +pub mod anti_hallu_enforce; pub mod anti_hallu_step; pub mod briefing; pub mod drift; diff --git a/backend/src/api/discussions/orchestration.rs b/backend/src/api/discussions/orchestration.rs index 3126ca66..aad97085 100644 --- a/backend/src/api/discussions/orchestration.rs +++ b/backend/src/api/discussions/orchestration.rs @@ -569,15 +569,20 @@ pub(super) async fn maybe_generate_summary( _ => return, }; - // Per-disc strategy override (added 2026-05-09 — `OnDemand` and `Off` - // suppress the auto-fire entirely; `Auto` keeps the historical - // threshold-based behaviour). The cache itself stays around so an - // explicit summarise call (planned tool surface) can still write to - // it. - if !matches!(disc.summary_strategy, crate::models::SummaryStrategy::Auto) { + // Auto-fire gate. 
The GLOBAL Settings default is a master kill-switch + // (global `Off` suppresses everywhere — fixes "disabled in config but old + // long discs keep summarising": the default only seeds NEW discs, so older + // rows kept a frozen per-disc `Auto`). Otherwise the per-disc strategy + // decides (`OnDemand`/`Off` suppress; `Auto` keeps the threshold behaviour). + // The cache stays around either way so an explicit summarise call can write. + let global_default = { + let cfg = state.config.read().await; + cfg.server.default_summary_strategy + }; + if !crate::models::SummaryStrategy::auto_fires(global_default, disc.summary_strategy) { tracing::debug!( - "Summary auto-fire disabled for {} (strategy: {:?})", - discussion_id, disc.summary_strategy + "Summary auto-fire suppressed for {} (global: {:?}, disc: {:?})", + discussion_id, global_default, disc.summary_strategy ); return; } diff --git a/backend/src/api/discussions/streaming.rs b/backend/src/api/discussions/streaming.rs index a6847c17..9b4974bc 100644 --- a/backend/src/api/discussions/streaming.rs +++ b/backend/src/api/discussions/streaming.rs @@ -935,6 +935,49 @@ pub(crate) async fn make_agent_stream( tracing::error!("Failed to save agent message: {e}"); } + // 0.8.8 PR-B — enforce-mode P3 fail-fast (non-destructive). The + // agent message above is kept (with its red pill); when it + // carries a fabricated `[src:]` citation, append a System refusal + // so the human arbitrates a correction. No auto-retry — on a user + // disc the user decides. Inert outside enforce / when clean. 
+ let fabricated_count = agent_msg + .lint_report + .as_ref() + .map(|r| r.fabricated_count) + .unwrap_or(0); + if crate::core::anti_halluc::enforce_refusal_needed( + crate::core::anti_halluc::current_mode(), + fabricated_count, + ) { + let refusal = DiscussionMessage { + lint_report: None, + id: Uuid::new_v4().to_string(), + role: MessageRole::System, + content: crate::core::anti_halluc::enforce_refusal_message(fabricated_count), + agent_type: None, + timestamp: Utc::now(), + tokens_used: 0, + auth_mode: None, + model_tier: None, + cost_usd: None, + author_pseudo: None, + author_avatar_email: None, + source_msg_id: None, + duration_ms: None, + }; + let did_ref = disc_id.clone(); + let m = refusal.clone(); + if let Err(e) = state.db.with_conn(move |conn| { + crate::db::discussions::insert_message(conn, &did_ref, &m) + }).await { + tracing::warn!("Failed to insert enforce refusal system message: {e}"); + } + tracing::info!( + "enforce P3: disc {} agent reply has {} fabricated citation(s) — refusal surfaced", + disc_id, fabricated_count + ); + } + // ── Slash-marker fallback (Vibe / Ollama) ────────────── // Agents that don't speak MCP can request introspection // by emitting `KRONN:DISC_*` lines in their reply. Scan diff --git a/backend/src/core/anti_halluc.rs b/backend/src/core/anti_halluc.rs index c51bf0cb..08e01ea6 100644 --- a/backend/src/core/anti_halluc.rs +++ b/backend/src/core/anti_halluc.rs @@ -104,6 +104,42 @@ pub fn current_mode() -> AntiHallucMode { *cell().read().unwrap() } +// ─── P3 (0.8.8 PR-B) — enforce-mode disc policy helpers ────────────────── +// +// Pure decision functions so the runner / streaming wiring is unit-testable +// without spawning a live agent (the call sites supply the FS check + mode). + +/// Whether the enforce gate should auto-attach the `kronn-doc-author` skill to +/// an agent run: only in enforce, only when the project carries a +/// `docs/AGENTS.md`, and only if the skill isn't already attached (idempotent). 
+pub fn should_auto_attach_doc_author( + mode: AntiHallucMode, + skill_ids: &[String], + project_has_agents_md: bool, +) -> bool { + mode == AntiHallucMode::Enforce + && project_has_agents_md + && !skill_ids.iter().any(|s| s == "kronn-doc-author") +} + +/// Whether a finalized agent message must get the enforce P3 refusal note: only +/// in enforce, and only when ≥1 formal `[src:]` citation is high-confidence +/// fabricated. +pub fn enforce_refusal_needed(mode: AntiHallucMode, fabricated_count: u32) -> bool { + mode == AntiHallucMode::Enforce && fabricated_count > 0 +} + +/// The System message surfaced when a disc reply is refused under enforce. +/// Non-destructive: the agent message stays; this note tells the human to get +/// it corrected before relying on it (no auto-retry — the user arbitrates). +pub fn enforce_refusal_message(fabricated_count: u32) -> String { + format!( + "⛔ Réponse refusée (anti-hallucination · enforce) : {fabricated_count} citation(s) `[src: …]` \ +fabriquée(s) — fichier ou ligne introuvable / hors projet. La réponse est conservée mais NON validée : \ +demandez à l'agent de corriger ou retirer ces citations avant de vous en servir." + ) +} + // ─── P1 — directive ────────────────────────────────────────────────────── /// The sourcing-discipline directive injected into agent prompts. @@ -981,6 +1017,78 @@ fn line_bounds_status(candidate: &Path, line_spec: Option<(usize, usize)>) -> (S } } +/// Heavy / generated / Kronn-internal dirs never descended when resolving a +/// bare basename. Mirrors `scanner::scan_kronn_markers`'s skip list. Skipping +/// `.kronn` is load-bearing: its `worktrees/` hold FULL project copies, so +/// without it every basename looks ambiguous — the exact false-ambiguity that +/// masked the real unique file in front_euronews (11 copies, 1 real). 
+const BASENAME_WALK_SKIP_DIRS: &[&str] = &[ + "node_modules", "vendor", "target", ".git", "dist", "build", ".next", + ".kronn", ".kronn-worktrees", ".venv", "__pycache__", +]; + +/// Outcome of walking the tree for a bare basename (no path separator). +enum BasenameResolution { + /// Exactly one file under the (first matching) root carries this basename. + Unique(PathBuf), + /// Two or more candidates — too ambiguous to green-light. + Ambiguous(usize), + /// No file with this basename, or the walk was capped before deciding. + NotFound, +} + +/// Resolve a bare basename (e.g. `Foo.ts`) to a UNIQUE file in the project +/// tree, so an agent that cites `Foo.ts:42` without the full path stops being a +/// false "unverified". Roots are tried IN ORDER (Isolated worktree before the +/// main checkout, mirroring `verify_file_ref`), and the FIRST root that holds +/// any match decides — so the same file present in both a worktree root and the +/// main root is not double-counted as ambiguous. Heavy / `.kronn` dirs are +/// pruned. Bounded: past `MAX_WALK_ENTRIES` we bail to `NotFound` rather than +/// risk a false `Unique` on a partial scan. +fn resolve_unique_basename(basename: &str, roots: &[&Path]) -> BasenameResolution { + const MAX_WALK_ENTRIES: usize = 60_000; + for root in roots { + let mut found: Option<PathBuf> = None; + let mut count = 0usize; + let mut scanned = 0usize; + let mut capped = false; + let walker = walkdir::WalkDir::new(root) + .into_iter() + .filter_entry(|e| { + // Prune skip dirs by name (depth > 0 so the root itself stays). 
+ if e.depth() > 0 && e.file_type().is_dir() { + if let Some(name) = e.file_name().to_str() { + return !BASENAME_WALK_SKIP_DIRS.contains(&name); + } + } + true + }); + for entry in walker.filter_map(|e| e.ok()) { + scanned += 1; + if scanned > MAX_WALK_ENTRIES { + capped = true; + break; + } + if entry.file_type().is_file() && entry.file_name().to_str() == Some(basename) { + count += 1; + if count == 1 { + found = Some(entry.path().to_path_buf()); + } else { + return BasenameResolution::Ambiguous(count); // ≥2 in this root + } + } + } + if capped { + return BasenameResolution::NotFound; // don't claim a unique on a partial scan + } + if let Some(p) = found { + return BasenameResolution::Unique(p); // exactly one in this root → decided + } + // 0 matches in this root → fall through to the next root. + } + BasenameResolution::NotFound +} + /// Verify a file reference against one or more candidate roots. The FIRST root /// where the (jailed) relative path exists wins — this is how an Isolated /// discussion's git worktree is tried before the main checkout, fixing the @@ -991,6 +1099,8 @@ fn line_bounds_status(candidate: &Path, line_spec: Option<(usize, usize)>) -> (S /// - Relative path → lexical jail + symlink-escape re-check under EACH root. /// `../` escape in every root → OutsideProject; jailed-but-absent everywhere /// → NotFound. The SSRF/jail guarantee is unchanged (applied per-root). +/// - Bare basename (no separator) unresolved at root level → unique-match walk +/// (`resolve_unique_basename`) before NotFound. fn verify_file_ref(reference: &str, roots: &[&Path]) -> (SourceStatus, String) { let reference = clean_reference(reference); if reference.is_empty() { @@ -1048,6 +1158,34 @@ fn verify_file_ref(reference: &str, roots: &[&Path]) -> (SourceStatus, String) { "relative path escapes the project root via ../".into(), ); } + + // Bare basename (no separator) that didn't resolve at any root level — it + // very likely lives deeper in the tree. 
Walk for a UNIQUE basename match + // before giving up. This kills the dominant false positive: an agent citing + // `Foo.ts:42` without the full path was flagged "unverified" even though the + // file exists. Only a UNIQUE match is green-lit; 2+ stays unresolved (we + // never claim a fact we can't pin), but with an actionable reason. + if !path_str.is_empty() && !path_str.contains('/') && !path_str.contains('\\') { + match resolve_unique_basename(path_str, roots) { + BasenameResolution::Unique(found) => { + let (status, detail) = line_bounds_status(&found, line_spec); + let shown = roots + .iter() + .find_map(|r| found.strip_prefix(r).ok()) + .unwrap_or(found.as_path()) + .display(); + return (status, format!("{detail} — resolved by unique basename → {shown}")); + } + BasenameResolution::Ambiguous(n) => { + return ( + SourceStatus::NotFound, + format!("bare name `{path_str}` matches {n} files — ambiguous, cite the full path"), + ); + } + BasenameResolution::NotFound => {} + } + } + (SourceStatus::NotFound, format!("file not found: {}", path_str)) } @@ -1255,6 +1393,43 @@ mod tests { assert!(AntiHallucMode::Enforce.is_active()); } + // ── PR-B enforce-mode disc helpers ──────────────────────────────── + + #[test] + fn auto_attach_doc_author_only_in_enforce_with_agents_md() { + let none: Vec<String> = vec![]; + // Enforce + project has AGENTS.md + not already attached → attach. + assert!(should_auto_attach_doc_author(AntiHallucMode::Enforce, &none, true)); + // Wrong mode → never. + assert!(!should_auto_attach_doc_author(AntiHallucMode::Warn, &none, true)); + assert!(!should_auto_attach_doc_author(AntiHallucMode::Off, &none, true)); + // No docs/AGENTS.md → never (nothing to discipline against). + assert!(!should_auto_attach_doc_author(AntiHallucMode::Enforce, &none, false)); + // Idempotent: already attached → don't duplicate. 
+        let attached = vec!["rust".to_string(), "kronn-doc-author".to_string()];
+        assert!(!should_auto_attach_doc_author(AntiHallucMode::Enforce, &attached, true));
+    }
+
+    #[test]
+    fn enforce_refusal_only_when_enforce_and_fabricated() {
+        assert!(enforce_refusal_needed(AntiHallucMode::Enforce, 1));
+        assert!(enforce_refusal_needed(AntiHallucMode::Enforce, 7));
+        // No fabricated citations → no refusal even in enforce.
+        assert!(!enforce_refusal_needed(AntiHallucMode::Enforce, 0));
+        // Warn/off never refuse, regardless of count.
+        assert!(!enforce_refusal_needed(AntiHallucMode::Warn, 3));
+        assert!(!enforce_refusal_needed(AntiHallucMode::Off, 3));
+    }
+
+    #[test]
+    fn refusal_message_states_count_and_non_destructive() {
+        let msg = enforce_refusal_message(2);
+        assert!(msg.contains('2'));
+        assert!(msg.to_lowercase().contains("refus"));
+        // Must convey the message is KEPT (non-destructive), not deleted.
+        assert!(msg.contains("conservée"));
+    }
+
     #[test]
     fn is_valid_mode_accepts_only_three() {
         for ok in ["off", "warn", "enforce", "OFF", " Warn "] {
@@ -1787,6 +1962,107 @@ mod tests {
         std::fs::remove_dir_all(&root).ok();
     }

+    // ── Bare-basename resolution (kills the "Foo.ts:42 unverified" FP) ──
+
+    /// Make a fresh temp project root and run `f` to populate it. Returns root.
+    fn temp_root_with(f: impl FnOnce(&Path)) -> PathBuf {
+        let mut d = std::env::temp_dir();
+        d.push(format!("kronn_basename_{}", uuid::Uuid::new_v4()));
+        std::fs::create_dir_all(&d).unwrap();
+        f(&d);
+        d
+    }
+
+    fn write_file(root: &Path, rel: &str, lines: usize) {
+        let p = root.join(rel);
+        std::fs::create_dir_all(p.parent().unwrap()).unwrap();
+        std::fs::write(p, "x\n".repeat(lines)).unwrap();
+    }
+
+    #[test]
+    fn bare_basename_resolves_to_unique_nested_file() {
+        // The core fix: a bare `Widget.ts:2` cited without its full path, living
+        // deep in the tree, now verifies instead of showing "unverified".
+        let root = temp_root_with(|r| write_file(r, "app/assets/ts/Services/Widget.ts", 3));
+        let (status, detail) = verify_file_ref("Widget.ts:2", &[&root]);
+        assert_eq!(status, SourceStatus::Verified, "{detail}");
+        assert!(detail.contains("unique basename"), "{detail}");
+        std::fs::remove_dir_all(&root).ok();
+    }
+
+    #[test]
+    fn bare_basename_skips_dot_kronn_worktree_copies() {
+        // The real front_euronews case: 1 real file + a copy under
+        // `.kronn/worktrees/`. Without the skip the basename looks ambiguous;
+        // with it, the main copy is the unique match → Verified.
+        let root = temp_root_with(|r| {
+            write_file(r, "app/assets/ts/Widget.ts", 3);
+            write_file(r, ".kronn/worktrees/wt-abc/app/assets/ts/Widget.ts", 3);
+            write_file(r, "node_modules/pkg/Widget.ts", 3);
+        });
+        let (status, _d) = verify_file_ref("Widget.ts:1", &[&root]);
+        assert_eq!(status, SourceStatus::Verified, "{_d}");
+        std::fs::remove_dir_all(&root).ok();
+    }
+
+    #[test]
+    fn bare_basename_two_real_copies_stay_ambiguous() {
+        // Two genuine copies (NOT under a skip dir) → we refuse to guess.
+        let root = temp_root_with(|r| {
+            write_file(r, "a/Widget.ts", 3);
+            write_file(r, "b/Widget.ts", 3);
+        });
+        let (status, detail) = verify_file_ref("Widget.ts:1", &[&root]);
+        assert_eq!(status, SourceStatus::NotFound);
+        assert!(detail.contains("ambiguous"), "{detail}");
+        std::fs::remove_dir_all(&root).ok();
+    }
+
+    #[test]
+    fn bare_basename_absent_is_not_found() {
+        let root = temp_root_with(|r| write_file(r, "app/Other.ts", 3));
+        let (status, _d) = verify_file_ref("Widget.ts:1", &[&root]);
+        assert_eq!(status, SourceStatus::NotFound);
+        std::fs::remove_dir_all(&root).ok();
+    }
+
+    #[test]
+    fn bare_basename_unique_but_line_out_of_bounds() {
+        // Resolved by name, but the cited line is past EOF → honest OutOfBounds.
+        let root = temp_root_with(|r| write_file(r, "deep/dir/Widget.ts", 2));
+        let (status, _d) = verify_file_ref("Widget.ts:99", &[&root]);
+        assert_eq!(status, SourceStatus::OutOfBounds, "{_d}");
+        std::fs::remove_dir_all(&root).ok();
+    }
+
+    #[test]
+    fn bare_basename_same_file_in_two_roots_is_not_ambiguous() {
+        // Isolated-disc shape: roots = [worktree, main], same file in both.
+        // First root wins (mirrors exact-path semantics) → not double-counted.
+        let wt = temp_root_with(|r| write_file(r, "app/Widget.ts", 3));
+        let main = temp_root_with(|r| write_file(r, "app/Widget.ts", 3));
+        let (status, _d) = verify_file_ref("Widget.ts:1", &[&wt, &main]);
+        assert_eq!(status, SourceStatus::Verified, "{_d}");
+        std::fs::remove_dir_all(&wt).ok();
+        std::fs::remove_dir_all(&main).ok();
+    }
+
+    #[test]
+    fn inline_bare_basename_anchor_no_longer_unverified() {
+        // End-to-end via analyze_roots: the exact disc d344b52b false positive.
+        // A backticked bare anchor `Widget.ts:2` for a nested file used to land
+        // in `unverified_count`; it now resolves green.
+        let root = temp_root_with(|r| write_file(r, "app/assets/ts/Services/Widget.ts", 5));
+        let report = analyze_roots("See `Widget.ts:2` for the logic.", &[&root]);
+        assert_eq!(report.unverified_count, 0, "should not be a soft-amber FP");
+        assert!(
+            report.sources.iter().any(|s| s.status == SourceStatus::Verified),
+            "the inline anchor must resolve to a Verified source: {:?}",
+            report.sources
+        );
+        std::fs::remove_dir_all(&root).ok();
+    }
+
     #[test]
     fn verify_relative_path_traversal_is_jailed() {
         // Relative paths still go through the lexical jail under `root`,
diff --git a/backend/src/models/discussions.rs b/backend/src/models/discussions.rs
index e2482cf6..32d0b68f 100644
--- a/backend/src/models/discussions.rs
+++ b/backend/src/models/discussions.rs
@@ -182,6 +182,49 @@ pub enum SummaryStrategy {
     Off,
 }

+impl SummaryStrategy {
+    /// Whether the background auto-summary should fire, given the GLOBAL default
+    /// (`ServerConfig::default_summary_strategy`, the Settings toggle) and THIS
+    /// disc's stored strategy.
+    ///
+    /// The global `Off` is a **master kill-switch**: turning auto-summary off in
+    /// Settings suppresses it everywhere, including older discs whose per-disc
+    /// strategy was frozen to `Auto` at creation (the global default is only
+    /// applied to NEW discs, so changing it never rewrote existing rows — the
+    /// "I disabled it but long discs keep summarising" bug). Otherwise the
+    /// per-disc strategy decides, and only `Auto` auto-fires.
+    pub fn auto_fires(global_default: SummaryStrategy, disc: SummaryStrategy) -> bool {
+        if matches!(global_default, SummaryStrategy::Off) {
+            return false;
+        }
+        matches!(disc, SummaryStrategy::Auto)
+    }
+}
+
+#[cfg(test)]
+mod summary_strategy_tests {
+    use super::SummaryStrategy::{Auto, Off, OnDemand};
+    use super::SummaryStrategy;
+
+    #[test]
+    fn global_off_is_a_master_kill_switch() {
+        // The reported bug: global Off must suppress even an old disc frozen to Auto.
+        assert!(!SummaryStrategy::auto_fires(Off, Auto));
+        assert!(!SummaryStrategy::auto_fires(Off, OnDemand));
+        assert!(!SummaryStrategy::auto_fires(Off, Off));
+    }
+
+    #[test]
+    fn per_disc_decides_when_global_is_not_off() {
+        // Global Auto (or OnDemand) → the per-disc strategy is honoured.
+        assert!(SummaryStrategy::auto_fires(Auto, Auto));
+        assert!(!SummaryStrategy::auto_fires(Auto, Off));
+        assert!(!SummaryStrategy::auto_fires(Auto, OnDemand));
+        assert!(SummaryStrategy::auto_fires(OnDemand, Auto));
+        assert!(!SummaryStrategy::auto_fires(OnDemand, Off));
+    }
+}
+
 #[derive(Debug, Clone, PartialEq, Serialize, Deserialize, TS)]
 #[ts(export)]
 pub enum MessageRole {
diff --git a/backend/src/skills/workflow-architect.md b/backend/src/skills/workflow-architect.md
index fa7cc4ff..03735de3 100644
--- a/backend/src/skills/workflow-architect.md
+++ b/backend/src/skills/workflow-architect.md
@@ -139,6 +139,10 @@ When you need to run the same task on N items in parallel (e.g. "review each PR"
 }
 ```

+**Item shape → QP variables.** `batch_items_from` may resolve to either:
+- **An array of scalars** (`["EW-1","EW-2"]`) — each value fills the QP's **first** variable and doubles as the discussion title (legacy single-var fan-out).
+- **An array of objects** (`[{"id":"EW-1","summary":"…","descriptionWiki":"…"}, …]`) — each object's keys map onto the QP's `{{var}}` placeholders **by name** (multi-variable, identical to the MCP `qp_batch_run` path). The disc title is the first present & non-empty of `_title` / `id` / `key` / `number` (reserved keys; `_title` is title-only and not injected as a variable). This is how you feed a multi-variable Quick Prompt (e.g. a triage QP taking `id` + `descriptionWiki` + `summary` + `status` + `parentKey` + `labels`) from pre-fetched data — **0 tokens** spent shaping it, no per-child MCP fetch. JSONPath can't rename/restructure keys, so produce the flat objects with an upstream `Exec` (`python3`/`jq`) step whose object keys exactly match the QP variable names, then point `batch_items_from` at `{{steps.X.data.stdout}}`.
+
 ### 6. `BatchApiCall` — fan out an API call over a list (0 tokens)

 The mechanical counterpart of `BatchQuickPrompt`: same fan-out semantics, but **each child fires a templated HTTP request**, not an LLM run. Use this whenever the user wants to "create N tickets", "post N comments", "update N statuses", "test 8 sub-domains" — anything that's the same call with varying inputs. **Zero tokens consumed**, parallel HTTP capped by `batch_concurrent_limit` (default 5, max 20). The aggregated envelope reports per-item status so a downstream Agent step can correlate inputs with outcomes (e.g. setting `blocks` links between freshly-created tickets).
@@ -401,7 +405,7 @@ A workflow is created via `POST /api/workflows` with this JSON structure:
 | Field | Type | Description |
 |-------|------|-------------|
 | `batch_quick_prompt_id` | string | ID of the Quick Prompt template to fan out |
-| `batch_items_from` | string | Template resolving to a list (`{{steps.fetch.data}}` or raw text) |
+| `batch_items_from` | string | Template resolving to a list. **Scalars** (`["EW-1","EW-2"]`) fill the QP's first variable + become the disc title. **Objects** (`[{"id":"EW-1","summary":"…"}, …]`) map keys → QP variables **by name**; title from `_title`/`id`/`key`/`number`. Lets you drive a multi-variable QP from pre-fetched data (reshape with an upstream `Exec` step, point at `{{steps.X.data.stdout}}`). |
 | `batch_wait_for_completion` | boolean | Default `true` — workflow waits for all children before next step |
 | `batch_max_items` | number | Cap (default 50). Refuses to spawn more. |
 | `batch_workspace_mode` | string | `"Direct"` (default, all share main worktree) or `"Isolated"` (per-disc git worktree — required if children write code in parallel; needs `project_id`) |
diff --git a/backend/src/workflows/big_ticket_template.rs b/backend/src/workflows/big_ticket_template.rs
index 34710f99..bebf4add 100644
--- a/backend/src/workflows/big_ticket_template.rs
+++ b/backend/src/workflows/big_ticket_template.rs
@@ -701,20 +701,32 @@ fn build_run_tests_step() -> WorkflowStep {
         "compose=''; for c in \"$main/docker-compose.yml\" \"$main/docker-compose.yaml\" \"$main/compose.yml\"; do [ -f \"$c\" ] && { compose=\"$c\"; break; }; done",
         "if [ -n \"$phpdir\" ] && [ -n \"$compose\" ] && command -v docker >/dev/null 2>&1; then",
         " svc=\"$(grep -oE '^ [a-zA-Z0-9_-]+:' \"$compose\" | tr -d ' :' | grep -iE '^php' | head -1)\"; [ -z \"$svc\" ] && svc='php'",
-        " sub=\"${phpdir#./}\"",
-        " if [ \"$sub\" = '.' ] || [ -z \"$sub\" ]; then mnt=\"$(hosttr \"$wt\")\"; vend=\"$(hosttr \"$main\")/vendor\"; else mnt=\"$(hosttr \"$wt\")/$sub\"; vend=\"$(hosttr \"$main\")/$sub/vendor\"; fi",
-        " echo \"→ PHP via docker compose service '$svc' (worktree mounted, main vendor borrowed)\"",
-        " vmount=''; [ -d \"${vend/#${KRONN_HOST_HOME:-/host-home}//host-home}\" ] 2>/dev/null && vmount=\"-v $vend:/app/vendor\"",
-        " docker compose -f \"$compose\" run --rm --no-deps -T -v \"$mnt:/app\" $vmount -w /app \"$svc\" vendor/bin/phpunit -c phpunit.xml.dist >/tmp/php.out 2>&1",
-        " rc=$?; tail -20 /tmp/php.out",
-        " # classify: a phpunit summary line ('Tests: N') means the suite RAN —",
-        " # rc!=0 with a summary = real failures (FAIL); rc!=0 WITHOUT a summary",
-        " # = phpunit couldn't boot (harness/env error). run-11b: 176 real PHP",
-        " # failures were mis-tagged ERROR by a too-loose substring match.",
-        " if [ $rc -eq 0 ]; then php_v='PASS'",
-        " elif grep -qE 'Tests: [0-9]+' /tmp/php.out; then fails=\"$(grep -oE '(Failures|Errors): [0-9]+' /tmp/php.out | paste -sd, -)\"; php_v=\"FAIL($fails)\"",
-        " elif grep -qE '(No tests executed|Cannot open|could not open|Fatal error|Class .* not found|bootstrap)' /tmp/php.out; then php_v='ERROR(php harness — not a code failure)'",
-        " else php_v='FAIL'; fi",
+        " sub=\"${phpdir#./}\"; [ \"$sub\" = '.' ] && sub=''",
+        " base=''; [ -n \"$sub\" ] && base=\"/$sub\"",
+        " mnt=\"$(hosttr \"$wt\")$base\"",
+        " # vendor: prefer the worktree's own, else borrow main's. Check the",
+        " # CONTAINER paths directly ($wt/$main come from git inside the Kronn",
+        " # container) — the previous host→container back-substitution was",
+        " # fragile and, when it left vendor unmounted, phpunit failed to boot",
+        " # and got mis-tagged ERROR(harness). A truly absent vendor is now an",
+        " # honest SKIP, not a scary ERROR.",
+        " vend=''",
+        " if [ -d \"$wt$base/vendor\" ]; then vend=\"$(hosttr \"$wt\")$base/vendor\"",
+        " elif [ -d \"$main$base/vendor\" ]; then vend=\"$(hosttr \"$main\")$base/vendor\"; fi",
+        " if [ -z \"$vend\" ]; then php_v='SKIP(no vendor/ — run composer install in the project)'",
+        " else",
+        " echo \"→ PHP via docker compose service '$svc' (worktree mounted, vendor: $vend)\"",
+        " docker compose -f \"$compose\" run --rm --no-deps -T -v \"$mnt:/app\" -v \"$vend:/app/vendor\" -w /app \"$svc\" vendor/bin/phpunit -c phpunit.xml.dist --colors=never >/tmp/php.out 2>&1",
+        " rc=$?; tail -20 /tmp/php.out",
+        " # classify: a phpunit summary line ('Tests: N') means the suite RAN —",
+        " # rc!=0 with a summary = real failures (FAIL); rc!=0 WITHOUT a summary",
+        " # = phpunit couldn't boot (harness/env error). --colors=never keeps",
+        " # the summary parse free of ANSI codes.",
+        " if [ $rc -eq 0 ]; then php_v='PASS'",
+        " elif grep -qE 'Tests: [0-9]+' /tmp/php.out; then fails=\"$(grep -oE '(Failures|Errors): [0-9]+' /tmp/php.out | paste -sd, -)\"; php_v=\"FAIL($fails)\"",
+        " elif grep -qE '(No tests executed|Cannot open|could not open|Fatal error|Class .* not found|bootstrap)' /tmp/php.out; then php_v='ERROR(php harness — not a code failure)'",
+        " else php_v='FAIL'; fi",
+        " fi",
         "else",
         " php_v='SKIP(no dockerized php stack at repo root — run `make test` in the project)'",
         "fi",
@@ -1209,6 +1221,11 @@ mod tests {
         assert!(script.contains("vendor/bin/phpunit -c phpunit.xml.dist"), "PHP suite actually runs");
         assert!(script.contains("hosttr"), "container→host path translation for bind mounts");
         assert!(script.contains("no dockerized php stack"), "honest SKIP when no stack — never a false FAIL");
+        // 0.8.8 polish — robust PHP verdict:
+        assert!(script.contains("--colors=never"), "ANSI-free phpunit output → reliable summary parse");
+        assert!(script.contains("composer install"), "absent vendor → honest SKIP, not a scary ERROR(harness)");
+        assert!(script.contains("$wt$base/vendor"), "vendor resolved via CONTAINER path (worktree first) — no fragile host→container back-substitution");
+        assert!(!script.contains("${vend/#"), "the fragile parameter back-substitution is gone");
         assert!(!script.contains("no php runtime in the Kronn container"), "no longer installs/needs local php");
         assert!(script.contains("TEST VERDICT"), "per-suite verdict for the PR");
     }
diff --git a/desktop/package.json b/desktop/package.json
index 71de667d..2ab05f4f 100644
--- a/desktop/package.json
+++ b/desktop/package.json
@@ -1,6 +1,6 @@
 {
   "name": "kronn-desktop",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "private": true,
   "scripts": {
     "dev": "tauri dev",
diff --git a/desktop/src-tauri/Cargo.lock b/desktop/src-tauri/Cargo.lock
index b0bb92d3..1427c470 100644
--- a/desktop/src-tauri/Cargo.lock
+++ b/desktop/src-tauri/Cargo.lock
@@ -2298,7 +2298,7 @@ dependencies = [

 [[package]]
 name = "kronn"
-version = "0.8.7"
+version = "0.8.8"
 dependencies = [
  "aes-gcm",
  "anyhow",
@@ -2342,7 +2342,7 @@ dependencies = [

 [[package]]
 name = "kronn-desktop"
-version = "0.8.7"
+version = "0.8.8"
 dependencies = [
  "anyhow",
  "axum",
diff --git a/desktop/src-tauri/Cargo.toml b/desktop/src-tauri/Cargo.toml
index b23b8f4f..f58a4349 100644
--- a/desktop/src-tauri/Cargo.toml
+++ b/desktop/src-tauri/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "kronn-desktop"
-version = "0.8.7"
+version = "0.8.8"
 edition = "2021"
 description = "Kronn Desktop — Self-hosted AI coding agent control plane"
 license = "AGPL-3.0-only"
diff --git a/desktop/src-tauri/tauri.conf.json b/desktop/src-tauri/tauri.conf.json
index 5af07e26..12e8fd78 100644
--- a/desktop/src-tauri/tauri.conf.json
+++ b/desktop/src-tauri/tauri.conf.json
@@ -1,7 +1,7 @@
 {
   "$schema": "https://raw.githubusercontent.com/tauri-apps/tauri/dev/crates/tauri-config-schema/schema.json",
   "productName": "Kronn",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "identifier": "com.kronn.desktop",
   "build": {
     "frontendDist": "../../frontend/dist",
diff --git a/frontend/package.json b/frontend/package.json
index 9c252a88..3ca1ebb3 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,6 +1,6 @@
 {
   "name": "kronn-frontend",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "private": true,
   "type": "module",
   "engines": {
diff --git a/frontend/src/components/ProjectCard.tsx b/frontend/src/components/ProjectCard.tsx
index 5591df25..ac8feb80 100644
--- a/frontend/src/components/ProjectCard.tsx
+++ b/frontend/src/components/ProjectCard.tsx
@@ -5,6 +5,7 @@ import { useT } from '../lib/I18nContext';
 import { useIsMobile } from '../hooks/useMediaQuery';
 import { isValidationDisc, isBriefingDisc, isBootstrapDisc, isUsable, isTrackerMcp } from '../lib/constants';
 import { AiDocViewer } from './AiDocViewer';
+import { unseenBasis } from './SwipeableDiscItem';
 import AuditRecapPanel from './AuditRecapPanel';
 import type { AuditKind } from '../types/AuditKind';
 import { ProjectSkills } from './ProjectSkills';
@@ -1040,7 +1041,7 @@ export function ProjectCard({
 {disc.title}
-{disc.message_count ?? disc.messages.length} msg · {disc.agent}
+{unseenBasis(disc)} msg · {disc.agent}