Skip to content

fix(grep): raise asyncio stream limit and catch ValueError on long lines#110

Merged
tjb-tech merged 1 commit intoHKUDS:mainfrom
yl-jiang:fix/grep-limit-overrun
Apr 11, 2026
Merged

fix(grep): raise asyncio stream limit and catch ValueError on long lines#110
tjb-tech merged 1 commit intoHKUDS:mainfrom
yl-jiang:fix/grep-limit-overrun

Conversation

@yl-jiang
Copy link
Copy Markdown
Contributor

Summary

Problem: The grep tool crashes the session with an unhandled ValueError when ripgrep outputs a line longer than asyncio's default StreamReader limit (64 KB). This happens on minified JS/CSS files, lock files, and any generated asset with very long lines.

Root cause: asyncio.create_subprocess_exec uses a 64 KB per-line buffer. When rg hits a longer line, readline() raises LimitOverrunError, which is re-raised as ValueError and propagates unhandled all the way to the REPL.

Fix — two-layer defence in grep_tool.py:

  1. Pass limit=8*1024*1024 to asyncio.create_subprocess_exec so lines up to 8 MB are handled transparently (covers the vast majority of real-world long lines).
  2. Catch ValueError inside _collect_rg_matches and _collect_rg_file_matches as a safety net: any line that somehow exceeds even 8 MB is silently skipped rather than terminating the session.

Both _rg_grep (directory search) and _rg_grep_file (single-file search) are fixed.

Validation

  • uv run ruff check src tests scripts — all checks passed
  • uv run pytest -q — passed
  • cd frontend/terminal && npx tsc --noEmit — not applicable (backend-only change)

Notes

Ripgrep can output lines longer than asyncio's default StreamReader
limit (64 KB), for example in minified JS/CSS files or lock files.
When that happens, readline() internally raises LimitOverrunError which
is re-raised as ValueError and propagates unhandled all the way to the
REPL, crashing the session.

Two-layer fix:
- Pass limit=8*1024*1024 to asyncio.create_subprocess_exec so lines up
  to 8 MB are handled transparently.
- Catch ValueError inside _collect_rg_matches and
  _collect_rg_file_matches as a safety net so any line that somehow
  exceeds even that limit is silently skipped rather than aborting.

Closes HKUDS#108

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 11, 2026 08:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash in the grep tool when ripgrep outputs extremely long lines that exceed asyncio’s default StreamReader line limit, preventing unhandled ValueError from terminating sessions.

Changes:

  • Increased asyncio subprocess stream line limit to 8 MB for ripgrep-based directory and single-file searches.
  • Added ValueError handling in ripgrep output collection loops to skip oversized lines instead of crashing.
  • Documented the fix in the changelog.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/openharness/tools/grep_tool.py Raises asyncio subprocess stream limit and adds ValueError handling during stdout line collection to avoid crashes on long ripgrep output lines.
CHANGELOG.md Adds an entry describing the long-line crash fix and the new behavior (8 MB limit + skipping oversized lines).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 183 to 187
cwd=str(root),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
limit=8 * 1024 * 1024, # 8 MB per line — avoids LimitOverrunError on long lines
)
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8 MB stream limit value is duplicated here and in _rg_grep_file. Consider extracting it into a module-level constant (e.g., _RG_STREAM_LIMIT_BYTES) so it’s easy to tune and avoids the two call sites drifting over time.

Copilot uses AI. Check for mistakes.
Comment on lines 240 to 244
cwd=str(path.parent),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
limit=8 * 1024 * 1024, # 8 MB per line — avoids LimitOverrunError on long lines
)
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same duplicated 8 MB stream limit as in _rg_grep. Please consider using a shared constant so future adjustments don’t accidentally update only one of the two subprocess call sites.

Copilot uses AI. Check for mistakes.
Comment on lines +287 to +291
try:
raw = await process.stdout.readline()
except ValueError:
# Line exceeded the stream buffer limit; skip it and continue.
continue
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior: when StreamReader.readline() raises ValueError for an over-limit line, the code skips it and continues. Please add a unit test that simulates process.stdout.readline() raising ValueError once (then returning a normal match / EOF) to prevent regressions and ensure the loop doesn’t terminate or hang.

Copilot uses AI. Check for mistakes.
Comment on lines +309 to +313
try:
raw = await process.stdout.readline()
except ValueError:
# Line exceeded the stream buffer limit; skip it and continue.
continue
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test coverage for this ValueError skip-path in _collect_rg_file_matches as well (simulate readline() raising ValueError and verify the tool continues collecting subsequent lines). This will help ensure long-line output can’t crash the session again.

Copilot uses AI. Check for mistakes.
@tjb-tech tjb-tech merged commit 33d0f0d into HKUDS:main Apr 11, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants