feat: lead issue titles with the core problem instead of only type + file by NeverENG · Pull Request #16 · NeverENG/BanGD-AI-PR-Review-Tool

NeverENG · 2026-05-31T13:43:06Z

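Architecture findings gain an optional title field (a one-sentence headline of the core problem), so the title of a filed issue changes from
🐯 BanGD [并发] storage/zstorage/memtable.go
to
🐯 [并发] 读路径无同步自增 hits 计数 · memtable.go,
which shows the problem itself at a glance. The title appears in both the finding header and the issue list of the PR comment.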

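title is optional with a graceful fallback (when it is missing, the issue title falls back to type + full path): a purely cosmetic headline must not be able to fail validation and bring down the whole review. The dedup key is still the code-computed pr<N>:<file>:<type>, independent of the title, so repeated reviews are deduplicated as before.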
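A minimal sketch of the fallback and the dedup key described above. The `ArchitectureFinding` shape and the helper names `issueTitle`, `dedupKey`, and `basename` are assumptions for illustration; only the `title` field, the two title formats, and the `pr<N>:<file>:<type>` key come from this PR.

```typescript
// Hypothetical finding shape: only `title`, the file path, and the type are named in the PR.
interface ArchitectureFinding {
  file: string;   // e.g. "storage/zstorage/memtable.go"
  type: string;   // e.g. "并发"
  title?: string; // optional one-line headline of the core problem
}

// Last path segment, shown next to the headline.
function basename(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1];
}

// Headline form when a usable title exists, otherwise the old type + full path form.
function issueTitle(f: ArchitectureFinding): string {
  const title = f.title ? f.title.trim() : "";
  if (title) {
    return `🐯 [${f.type}] ${title} · ${basename(f.file)}`;
  }
  return `🐯 BanGD [${f.type}] ${f.file}`;
}

// The dedup key uses code-computed facts only; the title never enters it.
function dedupKey(prNumber: number, f: ArchitectureFinding): string {
  return `pr${prNumber}:${f.file}:${f.type}`;
}
```

Because the key ignores `title`, two reviews of the same PR that word the headline differently (or omit it) still collapse onto one issue.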

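Schema (Zod + JSON-schema mirror, sync test passing), system prompt (≤20 characters, specific and locatable), formatter, refute prompt, the primary few-shot exemplar, and test fixtures are all updated. dist/ has been rebuilt, so the node20 Action picks up the change directly.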

🤖 Generated with Claude Code

…eralFindings)

Architecture findings are BanGD's differentiator, but emitting only those cedes ordinary bugs (off-by-one, swallowed errors, nil deref, races) to Copilot. Add a second result class, generalFindings: concrete diff-evidenced correctness/logic defects, lightweight (no four-part structure), filed inline in the PR comment (not as tracked issues).

- schema: GeneralFindingSchema + generalFindings (.default([]) for graceful degradation), kept in sync with the tool JSON schema by a test
- prompt/system-prompt: 4th output part + quality red-lines (diff-evidenced only, no style nits, no dup of architecture findings, <=6, [] when none)
- verify: generalize VerifyOutcome<T>/verifyItems<T>; verifyGeneralFindings runs the SAME adversarial majority-refute pass (refuter prompt reframed to "is this a real correctness defect")
- review: verify both finding kinds in parallel; ReviewOutcome.droppedGeneralFindings
- format: render general findings inline; omit the section when empty
- action(.yml): general_finding_count output; dropped count covers both kinds
- DESIGN.md §七: why both classes coexist, structural/delivery differences
- tests: rendering, verification, schema-sync, default-omit coverage

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PR #13 shipped the generalFindings niche described in the system prompt but with no worked example, while every architecture dimension has one. A wrong-or-absent few-shot is the single biggest quality lever per CLAUDE.md, so add the missing exemplar.

- prompts/examples/general-findings.md: a worked example (binary-search off-by-one that infinite-loops) showing the 7-field generalFinding shape AND, just as important, the red-line: what NOT to report (style/naming nits, "add a test", unfounded speculation, dup of an architecture finding), plus the boundary vs architecture findings.
- prompt.ts: assembleSystemPrompt takes an always-on generalExample, appended in its own block regardless of selected dimensions (generalFindings is requested on every review). Architecture examples relabeled "架构级 Few-shot 范例".
- prompts.ts: PromptTexts.generalExample, loaded unconditionally (not dimension-gated); lives in the prompt-cached system block so marginal token cost ~= nil.
- DESIGN.md §七: note the niche now ships with its own few-shot (parity with the per-dimension architecture examples).
- tests: new prompt.test.ts (assembleSystemPrompt always-on behavior + user prompt parts); prompts.test.ts loads + red-line check; review.test.ts asserts the exemplar is present regardless of dimension selection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
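The always-on behavior of the general exemplar can be sketched as follows. This is an illustration only: the parameter list, the `##` heading markup, and the "generalFindings example" label are assumptions, and the real `assembleSystemPrompt` in prompt.ts may be shaped differently. Only the function name, the always-on `generalExample`, and the "架构级 Few-shot 范例" label come from the commit message.

```typescript
// Hypothetical signature; not the actual one in prompt.ts.
function assembleSystemPrompt(
  base: string,
  architectureExamples: string[], // only the examples for the selected dimensions
  generalExample: string,         // appended on every review
): string {
  const blocks: string[] = [base];

  // Architecture exemplars are dimension-gated: the block is omitted when none are selected.
  if (architectureExamples.length > 0) {
    blocks.push(["## 架构级 Few-shot 范例"].concat(architectureExamples).join("\n\n"));
  }

  // The general exemplar gets its own block regardless of dimension selection.
  blocks.push(["## generalFindings example", generalExample].join("\n\n"));

  return blocks.join("\n\n");
}
```

Keeping the general exemplar in a separate block means a review with zero selected dimensions still carries it.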

…DESIGN §六.2)

Builds the measurement scaffolding that turns "feels accurate" into precision/recall, and runs the first real A/B. The corpus is 6 REAL BanDB PRs (eval/cases.json, frozen from GitHub), labels anchored on author intent or a maintainer-confirmed fix (e.g. #58's crash-consistency + Stat→Seek race, which BanGD flagged and the maintainer fixed in #63). PR bodies are neutralized so the model detects from code, not from a body that states the answer.

- src/eval/score.ts: pure precision/recall/F1 scorer; fuzzy match on basename + accepted type/category, one-to-one greedy. Fully unit-tested (13 tests).
- src/eval/corpus.ts + eval/cases.json + eval/build-corpus.mjs: the frozen real-PR corpus and its (re-runnable) builder.
- src/eval/run.ts: single-variable A/B (adversarial verify off vs on, all else held constant, fresh client per config to isolate tokens); writes docs/评测报告.md. Key from ANTHROPIC_API_KEY/DEEPSEEK_API_KEY or the gitignored eval/.apikey; npm run eval.
- docs/评测报告.md: the first run (DeepSeek). Honest, counterintuitive result: on the cheap model the adversarial refuters OVER-refute: recall 100%→0% (true findings killed), F1 35.3%→0%, +173% tokens. Empirically backs DESIGN §二 (model choice is the divide) and motivates §六.3 (perspective-diverse refuters instead of homogeneous ones). Real accuracy must be re-run on Opus.

Stacks on the generalFindings branch (eval scores result.generalFindings too).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
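For readers who want the scoring rule spelled out, here is a rough sketch of a precision/recall/F1 scorer with greedy one-to-one matching on basename + accepted type, as the src/eval/score.ts bullet describes. The `Finding`, `Label`, and `Score` shapes, the function names, and the choice to return 0 for empty inputs are all assumptions, not the contents of score.ts.

```typescript
// Hypothetical shapes for illustration.
interface Finding { file: string; type: string; }
interface Label { file: string; acceptedTypes: string[]; }
interface Score { precision: number; recall: number; f1: number; }

// Last path segment, so "storage/zstorage/memtable.go" matches "memtable.go".
function baseOf(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1];
}

function score(predicted: Finding[], labels: Label[]): Score {
  const used: boolean[] = labels.map(() => false);
  let truePositives = 0;

  // Greedy one-to-one: each prediction claims the first unused label it matches.
  for (const p of predicted) {
    for (let i = 0; i < labels.length; i++) {
      const sameFile = baseOf(labels[i].file) === baseOf(p.file);
      const typeOk = labels[i].acceptedTypes.indexOf(p.type) !== -1;
      if (!used[i] && sameFile && typeOk) {
        used[i] = true;
        truePositives++;
        break;
      }
    }
  }

  // Assumed convention: an empty denominator scores 0 rather than NaN.
  const precision = predicted.length > 0 ? truePositives / predicted.length : 0;
  const recall = labels.length > 0 ? truePositives / labels.length : 0;
  const f1 = precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}
```

The one-to-one constraint matters: two predictions that hit the same label count as one true positive and one false positive, so duplicate findings cannot inflate precision.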

…findings)

The first eval run (docs/评测报告.md) exposed a disaster: on a weak refuter model the verification pass deleted EVERY true finding (recall 100%→0%, F1→0, +173% tokens). Root cause: the refuter prompt said "default to refuted when evidence is thin", and N identical refuters echoed the same over-refute. Fix:

- (A) keep-by-default: a refuter rejects a finding only when it can cite concrete evidence it is wrong; "unsure" keeps it. Inverts the old uncertain⇒refute bias.
- (B) perspective-diverse lenses: each refuter scrutinizes one angle hardest (diff-grounding / misread / can-it-happen) but returns a HOLISTIC verdict, so N refuters decorrelate. (Holistic, not per-axis: per-axis refuting would let a one-axis FP survive unanimity; DESIGN §六.3.)
- (C) unanimous-to-drop: a finding is dropped only if every refuter rejects it; one defender keeps it. Makes killing a real finding hard.

Re-run on DeepSeek confirms the fix: recall held 33.3%→33.3% (the true finding #43 survived verification, vs being killed before); F1 22.2%→40%; +5% tokens. The report is honest about confounds: part of the apparent precision gain is a DeepSeek parse-error artifact (#35) and the A/B confounds verify-effect with generation non-determinism. The clean claim is "over-refute eliminated".

Also surfaced: DeepSeek returned unparseable output 3/12 times (#58 ×2, #35 ×1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
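Rules (A) and (C) above reduce to a small aggregation step, sketched here under assumptions. The `Verdict` values, the `shouldDrop` helper, the field names on `VerifyOutcome<T>`, and the synchronous `verdictsFor` callback are hypothetical; the real verifyItems<T> collects verdicts from model calls and may differ in shape.

```typescript
// Hypothetical verdict vocabulary for one refuter.
type Verdict = "refuted" | "upheld" | "unsure";

// Name taken from the earlier commit message; the fields are assumed.
interface VerifyOutcome<T> {
  kept: T[];
  dropped: T[];
}

// (A) keep-by-default: "unsure" does not count against a finding.
// (C) unanimous-to-drop: drop only when every refuter refuted.
// Assumed: with no verdicts at all, the finding is kept.
function shouldDrop(verdicts: Verdict[]): boolean {
  return verdicts.length > 0 && verdicts.every((v) => v === "refuted");
}

// Generic over the finding kind, so both finding classes can share one pass.
function verifyItems<T>(items: T[], verdictsFor: (item: T) => Verdict[]): VerifyOutcome<T> {
  const outcome: VerifyOutcome<T> = { kept: [], dropped: [] };
  for (const item of items) {
    if (shouldDrop(verdictsFor(item))) {
      outcome.dropped.push(item);
    } else {
      outcome.kept.push(item);
    }
  }
  return outcome;
}
```

Under this rule a single "upheld" or "unsure" verdict is enough to keep a finding, which is the opposite of the earlier majority-refute behavior.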

Architecture findings gain an optional `title` field, a concise one-line headline of the core problem, so a filed issue reads `🐯 [并发] 读路径无同步自增 hits 计数 · memtable.go` instead of the old `🐯 BanGD [并发] storage/zstorage/memtable.go`. The headline also surfaces in the finding header and the PR comment's issue list.

`title` is optional with a graceful fallback (type + full path) on the rare miss: a cosmetic headline must not be able to fail validation and sink an otherwise-good review. The dedup key stays code-computed (pr<N>:<file>:<type>), independent of the title, so re-reviews still de-spam correctly.

Schema (Zod + JSON-schema mirror, sync test green), system prompt (≤20-char specific headline), formatter, refute prompt, the primary few-shot exemplar, and fixtures all updated. dist/ rebuilt so the node20 Action ships it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

NeverENG and others added 5 commits May 31, 2026 10:55

NeverENG merged commit b1573eb into main May 31, 2026
1 check passed

NeverENG merged 5 commits into main from feat/issue-title-headline
