fix(grep): raise asyncio stream limit and catch ValueError on long lines #110
Ripgrep can output lines longer than asyncio's default `StreamReader` limit (64 KB), for example in minified JS/CSS files or lock files. When that happens, `readline()` internally raises `LimitOverrunError`, which is re-raised as `ValueError` and propagates unhandled all the way to the REPL, crashing the session.

Two-layer fix:
- Pass `limit=8*1024*1024` to `asyncio.create_subprocess_exec` so lines up to 8 MB are handled transparently.
- Catch `ValueError` inside `_collect_rg_matches` and `_collect_rg_file_matches` as a safety net, so any line that somehow exceeds even that limit is silently skipped rather than aborting.

Closes HKUDS#108

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
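The crash described above is easy to reproduce outside the tool. The following is a standalone sketch (not code from this repository): it spawns the current Python interpreter instead of ripgrep, prints one line larger than the default 64 KB limit, and shows that `readline()` surfaces the overrun as `ValueError`:

```python
import asyncio
import sys

async def long_line_fails() -> bool:
    """Return True if readline() rejects a line longer than the default limit."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('x' * 100_000)",
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        # Default StreamReader limit is 64 KB; this line is ~100 KB.
        await proc.stdout.readline()
        return False
    except ValueError:
        # StreamReader re-raises LimitOverrunError to callers as ValueError.
        return True
    finally:
        await proc.wait()

if __name__ == "__main__":
    print(asyncio.run(long_line_fails()))
```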
Pull request overview
Fixes a crash in the grep tool when ripgrep outputs extremely long lines that exceed asyncio’s default StreamReader line limit, preventing unhandled ValueError from terminating sessions.
Changes:
- Increased asyncio subprocess stream line limit to 8 MB for ripgrep-based directory and single-file searches.
- Added `ValueError` handling in ripgrep output collection loops to skip oversized lines instead of crashing.
- Documented the fix in the changelog.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/openharness/tools/grep_tool.py` | Raises the asyncio subprocess stream limit and adds `ValueError` handling during stdout line collection to avoid crashes on long ripgrep output lines. |
| `CHANGELOG.md` | Adds an entry describing the long-line crash fix and the new behavior (8 MB limit + skipping oversized lines). |
```python
    cwd=str(root),
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=8 * 1024 * 1024,  # 8 MB per line — avoids LimitOverrunError on long lines
)
```
The 8 MB stream limit value is duplicated here and in _rg_grep_file. Consider extracting it into a module-level constant (e.g., _RG_STREAM_LIMIT_BYTES) so it’s easy to tune and avoids the two call sites drifting over time.
```python
    cwd=str(path.parent),
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=8 * 1024 * 1024,  # 8 MB per line — avoids LimitOverrunError on long lines
)
```
Same duplicated 8 MB stream limit as in _rg_grep. Please consider using a shared constant so future adjustments don’t accidentally update only one of the two subprocess call sites.
```python
try:
    raw = await process.stdout.readline()
except ValueError:
    # Line exceeded the stream buffer limit; skip it and continue.
    continue
```
New behavior: when StreamReader.readline() raises ValueError for an over-limit line, the code skips it and continues. Please add a unit test that simulates process.stdout.readline() raising ValueError once (then returning a normal match / EOF) to prevent regressions and ensure the loop doesn’t terminate or hang.
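A test of the kind the reviewer asks for could stub stdout with a fake reader whose `readline()` raises `ValueError` once, then yields a normal line, then EOF. The sketch below is illustrative only: `FlakyReader` and `collect` are hypothetical names, and the real `_collect_rg_matches` signature may differ, but the loop mirrors the skip-and-continue pattern from the patch:

```python
import asyncio

class FlakyReader:
    """Fake stdout: first readline() raises ValueError, then data, then EOF."""
    def __init__(self) -> None:
        self._responses = [ValueError("over limit"), b"src/app.py:1:match\n", b""]

    async def readline(self) -> bytes:
        item = self._responses.pop(0)
        if isinstance(item, Exception):
            raise item
        return item

async def collect(reader: FlakyReader) -> list[str]:
    # Mirrors the skip-and-continue loop added in this PR.
    lines: list[str] = []
    while True:
        try:
            raw = await reader.readline()
        except ValueError:
            continue  # oversized line: skip it, don't abort
        if not raw:
            break
        lines.append(raw.decode().rstrip("\n"))
    return lines

assert asyncio.run(collect(FlakyReader())) == ["src/app.py:1:match"]
```

The assertion verifies both that the loop survives the `ValueError` and that it still terminates on EOF rather than hanging.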
```python
try:
    raw = await process.stdout.readline()
except ValueError:
    # Line exceeded the stream buffer limit; skip it and continue.
    continue
```
Please add test coverage for this ValueError skip-path in _collect_rg_file_matches as well (simulate readline() raising ValueError and verify the tool continues collecting subsequent lines). This will help ensure long-line output can’t crash the session again.
Summary
Problem: The grep tool crashes the session with an unhandled `ValueError` when ripgrep outputs a line longer than asyncio's default `StreamReader` limit (64 KB). This happens on minified JS/CSS files, lock files, and any generated asset with very long lines.

Root cause: `asyncio.create_subprocess_exec` uses a 64 KB per-line buffer. When `rg` hits a longer line, `readline()` raises `LimitOverrunError`, which is re-raised as `ValueError` and propagates unhandled all the way to the REPL.

Fix (two-layer defence in `grep_tool.py`):
- Pass `limit=8*1024*1024` to `asyncio.create_subprocess_exec` so lines up to 8 MB are handled transparently (covers the vast majority of real-world long lines).
- Catch `ValueError` inside `_collect_rg_matches` and `_collect_rg_file_matches` as a safety net: any line that somehow exceeds even 8 MB is silently skipped rather than terminating the session.

Both `_rg_grep` (directory search) and `_rg_grep_file` (single-file search) are fixed.

Validation
- `uv run ruff check src tests scripts` — all checks passed
- `uv run pytest -q` — passed
- `cd frontend/terminal && npx tsc --noEmit` — not applicable (backend-only change)

Notes
uv run ruff check src tests scripts— all checks passeduv run pytest -q— passedcd frontend/terminal && npx tsc --noEmit— not applicable (backend-only change)Notes