fix: grep false negatives, output mangling, and truncation annotations#791
Conversation
📊 Automated PR Analysis
SummaryFixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix. Review Checklist
Analyzed automatically by wshm · This is an automated analysis, not a human review. |
pszymkowiak
left a comment
There was a problem hiding this comment.
Thanks @BadassBison — good analysis on grep false negatives and AI retry loops.
Please retarget to develop — all PRs must target develop, not master.
Review notes:
--no-ignoreis risky — this searches insidenode_modules/,target/, etc. Consider--no-ignore-vcsinstead (skips .gitignore but respects .ignore)- Passthrough <=50 — interesting idea but the threshold should be configurable, and it changes RTK's savings metrics
- 10 README files — doc changes should be separate from the code fix
Please retarget and address the --no-ignore concern. Thanks!
|
Hi! Two things needed before we can review:
Thanks! |
|
Hey We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes No logic changes — only file moves and import path updates. What you need to doRebase your branch on git fetch origin && git rebase origin/developGit detects renames automatically. If you get import conflicts, update the paths: use crate::git; // now: use crate::cmds::git::git;
use crate::tracking; // now: use crate::core::tracking;
use crate::config; // now: use crate::core::config;
use crate::init; // now: use crate::hooks::init;
use crate::gain; // now: use crate::analytics::gain;Need help rebasing? Tag @aeppling |
c6c979a to
2cc8a19
Compare
|
@pszymkowiak @aeppling — addressed all feedback:
Thanks for the thorough review! |
Documents the changes from rtk-ai#791: - grep now passes through raw output for <=50 matches (standard file:line:content) - grep uses grouped format only for >50 matches where token savings are meaningful - --no-ignore-vcs flag added to match grep -r behavior for .gitignore'd files - savings range updated to 0-90% to reflect passthrough for small result sets
|
also awaiting changes ;) |
baddd42 to
36041c5
Compare
|
@nicklloyd — all changes have been addressed! Retargeted to |
|
@BadassBison - just following from the sidelines as this one is a blocker. Looking forward to being able to try it out 🤘🏻 |
|
@pszymkowiak, |
|
Hello, this look fine but checks are not passing could you please check why ? Maybe you're missing platforme tags to have clean checks for each platform |
|
@BadassBison — reviewed the CI failure. Single test failing on all 3 platforms: The test parser looks for a word that parses as Fix the parsing in the test to strip brackets: let reported_more: usize = overflow_line
.split_whitespace()
.find_map(|w| w.trim_matches(|c: char| !c.is_ascii_digit()).parse().ok())
.unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));Or simpler — since you control the format, just parse directly: // Parse "[N more lines]"
let reported_more: usize = overflow_line
.trim()
.strip_prefix('[')
.and_then(|s| s.split_whitespace().next())
.and_then(|n| n.parse().ok())
.unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));Once that's fixed, CI should go green. The rest of the PR looks good — |
36041c5 to
1e4a14d
Compare
|
Rebased on latest The // Parse "[N more lines]"
let reported_more: usize = overflow_line
.trim()
.strip_prefix('[')
.and_then(|s| s.split_whitespace().next())
.and_then(|n| n.parse().ok())
.unwrap_or_else(|| panic!("Could not parse overflow count from: {}", overflow_line));CI should go green now. |
|
The However the passthrough approach for <=50 results goes against RTK's design: RTK is a proxy — we always filter and compress output to save tokens. Passthrough = 0% savings = no reason for RTK to exist on that path. The right fix for grep is to filter better, not to stop filtering:
Same concern for Core principle: RTK is a proxy — we filter and compress output, but we never disable filtering or pass through raw output. See CONTRIBUTING.md for design philosophy. |
1e4a14d to
8813737
Compare
|
@pszymkowiak — addressed both concerns from the Apr 12 review. Rebased on latest grep: removed passthrough, unified to
|
|
LGTM |
- grep: use --no-ignore-vcs so .gitignore'd files aren't silently skipped (matches grep -r behavior, avoids false negatives in large monorepos) - grep: passthrough raw output for <=50 matches so AI agents can parse standard file:line:content format without retry loops - filter: replace smart_truncate heuristic with clean first-N-lines truncation and a single [X more lines] suffix (eliminates synthetic // ... markers that AI agents misread as code, causing parsing confusion and retries)
8813737 to
9bdf435
Compare
Summary
Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.
--no-ignore-vcsto rg — prevents false negatives in repos with.gitignorewhile still respecting.ignoreand.rgignorefile:line:contentformat AI agents can parse// ... N lines omittedannotations that break AI parsingProblem
Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):
rgrespects.gitignoreby default,grep -rdoesn't"217 matches in 1F:"format// ... 81 lines omittedin file readssmart_truncateinserts synthetic comment markersQuantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.
Changes
1.
src/cmds/system/grep_cmd.rs—--no-ignore-vcsflagAdded
--no-ignore-vcsto therginvocation so it doesn't skip files listed in.gitignore/.hgignore. This matchesgrep -rbehavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories. Using--no-ignore-vcs(not--no-ignore) so.ignoreand.rgignoreare still respected.2.
src/cmds/system/grep_cmd.rs— Passthrough for small resultsResults with <=50 matches now output raw
file:line:contentformat (standard grep output that AI agents already know how to parse). The grouped"X matches in Y files:"format is preserved only for >50 matches where token savings are meaningful.3.
src/core/filter.rs— Clean truncation insmart_truncateReplaced the "smart" truncation logic that scattered
" // ... N lines omitted"markers throughout file content with clean first-N-lines truncation. A single[X more lines]marker appears at the end only.Tests
test_smart_truncate_no_annotations— verifies no// ...markers in outputtest_smart_truncate_no_truncation_when_under_limit— no truncation when content fitstest_smart_truncate_exact_limit— edge case at exact line counttest_rg_no_ignore_vcs_flag_accepted— verifies rg accepts the new flagTest plan
cargo fmt --all && cargo clippy --all-targets && cargo test --allrtk grep "fn run" src/with <=50 results outputs rawfile:line:contentformatrtk read src/main.rs --max-lines 5shows clean truncation without// ...markers.gitignored directories