fix(grep): raise asyncio stream limit and catch ValueError on long lines #110
Ripgrep can output lines longer than asyncio's default `StreamReader` limit (64 KB), for example in minified JS/CSS files or lock files. When that happens, `readline()` internally raises `LimitOverrunError`, which is re-raised as `ValueError` and propagates unhandled all the way to the REPL, crashing the session.

Two-layer fix:
- Pass `limit=8*1024*1024` to `asyncio.create_subprocess_exec` so lines up to 8 MB are handled transparently.
- Catch `ValueError` inside `_collect_rg_matches` and `_collect_rg_file_matches` as a safety net, so any line that somehow exceeds even that limit is silently skipped rather than aborting.

Closes HKUDS#108

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
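The crash described above is easy to reproduce outside the tool. The following is a standalone sketch (not code from this repository): it spawns the current Python interpreter instead of ripgrep, prints one line larger than the default 64 KB limit, and shows that `readline()` surfaces the overrun as `ValueError`:

```python
import asyncio
import sys

async def long_line_fails() -> bool:
    """Return True if readline() rejects a line longer than the default limit."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('x' * 100_000)",
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        # Default StreamReader limit is 64 KB; this line is ~100 KB.
        await proc.stdout.readline()
        return False
    except ValueError:
        # StreamReader re-raises LimitOverrunError to callers as ValueError.
        return True
    finally:
        await proc.wait()

if __name__ == "__main__":
    print(asyncio.run(long_line_fails()))
```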
Pull request overview
Fixes a crash in the grep tool when ripgrep outputs extremely long lines that exceed asyncio’s default StreamReader line limit, preventing unhandled ValueError from terminating sessions.
Changes:
- Increased asyncio subprocess stream line limit to 8 MB for ripgrep-based directory and single-file searches.
- Added `ValueError` handling in ripgrep output collection loops to skip oversized lines instead of crashing.
- Documented the fix in the changelog.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/openharness/tools/grep_tool.py` | Raises the asyncio subprocess stream limit and adds `ValueError` handling during stdout line collection to avoid crashes on long ripgrep output lines. |
| `CHANGELOG.md` | Adds an entry describing the long-line crash fix and the new behavior (8 MB limit + skipping oversized lines). |
```python
    cwd=str(root),
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=8 * 1024 * 1024,  # 8 MB per line — avoids LimitOverrunError on long lines
)
```
The 8 MB stream limit value is duplicated here and in _rg_grep_file. Consider extracting it into a module-level constant (e.g., _RG_STREAM_LIMIT_BYTES) so it’s easy to tune and avoids the two call sites drifting over time.
```python
    cwd=str(path.parent),
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=8 * 1024 * 1024,  # 8 MB per line — avoids LimitOverrunError on long lines
)
```
Same duplicated 8 MB stream limit as in _rg_grep. Please consider using a shared constant so future adjustments don’t accidentally update only one of the two subprocess call sites.
```python
try:
    raw = await process.stdout.readline()
except ValueError:
    # Line exceeded the stream buffer limit; skip it and continue.
    continue
```
New behavior: when StreamReader.readline() raises ValueError for an over-limit line, the code skips it and continues. Please add a unit test that simulates process.stdout.readline() raising ValueError once (then returning a normal match / EOF) to prevent regressions and ensure the loop doesn’t terminate or hang.
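A test of the kind the reviewer asks for could stub stdout with a fake reader whose `readline()` raises `ValueError` once, then yields a normal line, then EOF. The sketch below is illustrative only: `FlakyReader` and `collect` are hypothetical names, and the real `_collect_rg_matches` signature may differ, but the loop mirrors the skip-and-continue pattern from the patch:

```python
import asyncio

class FlakyReader:
    """Fake stdout: first readline() raises ValueError, then data, then EOF."""
    def __init__(self) -> None:
        self._responses = [ValueError("over limit"), b"src/app.py:1:match\n", b""]

    async def readline(self) -> bytes:
        item = self._responses.pop(0)
        if isinstance(item, Exception):
            raise item
        return item

async def collect(reader: FlakyReader) -> list[str]:
    # Mirrors the skip-and-continue loop added in this PR.
    lines: list[str] = []
    while True:
        try:
            raw = await reader.readline()
        except ValueError:
            continue  # oversized line: skip it, don't abort
        if not raw:
            break
        lines.append(raw.decode().rstrip("\n"))
    return lines

assert asyncio.run(collect(FlakyReader())) == ["src/app.py:1:match"]
```

The assertion verifies both that the loop survives the `ValueError` and that it still terminates on EOF rather than hanging.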
```python
try:
    raw = await process.stdout.readline()
except ValueError:
    # Line exceeded the stream buffer limit; skip it and continue.
    continue
```
Please add test coverage for this ValueError skip-path in _collect_rg_file_matches as well (simulate readline() raising ValueError and verify the tool continues collecting subsequent lines). This will help ensure long-line output can’t crash the session again.
Summary
Problem: The grep tool crashes the session with an unhandled `ValueError` when ripgrep outputs a line longer than asyncio's default `StreamReader` limit (64 KB). This happens on minified JS/CSS files, lock files, and any generated asset with very long lines.

Root cause: `asyncio.create_subprocess_exec` uses a 64 KB per-line buffer. When `rg` hits a longer line, `readline()` raises `LimitOverrunError`, which is re-raised as `ValueError` and propagates unhandled all the way to the REPL.

Fix (two-layer defence in `grep_tool.py`):
- Pass `limit=8*1024*1024` to `asyncio.create_subprocess_exec` so lines up to 8 MB are handled transparently (covers the vast majority of real-world long lines).
- Catch `ValueError` inside `_collect_rg_matches` and `_collect_rg_file_matches` as a safety net: any line that somehow exceeds even 8 MB is silently skipped rather than terminating the session.

Both `_rg_grep` (directory search) and `_rg_grep_file` (single-file search) are fixed.

Validation
- `uv run ruff check src tests scripts` — all checks passed
- `uv run pytest -q` — passed
- `cd frontend/terminal && npx tsc --noEmit` — not applicable (backend-only change)

Notes
uv run ruff check src tests scripts— all checks passeduv run pytest -q— passedcd frontend/terminal && npx tsc --noEmit— not applicable (backend-only change)Notes