Skip to content

fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness#4766

Closed
LucidPaths wants to merge 1 commit intoNousResearch:mainfrom
LucidPaths:fix/stale-test-skips-and-bugs
Closed

fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness#4766
LucidPaths wants to merge 1 commit intoNousResearch:mainfrom
LucidPaths:fix/stale-test-skips-and-bugs

Conversation

@LucidPaths
Copy link
Copy Markdown

Summary

Fix 2 real bugs and remove stale test skips that were hiding 256 passing tests.

Bug Fixes

agent/redact.py — catastrophic regex backtracking in _ENV_ASSIGN_RE
The pattern [A-Z_]* with re.IGNORECASE matches any letter, causing exponential backtracking on large strings (100K+ tool output). Removed re.IGNORECASE and changed to [A-Z0-9_]* to restrict to actual env var name characters. This was causing test_file_read_guards to time out.

tools/file_operations.py — over-escaped newline in find -printf
The format string had \\\\n (4 backslashes) which produced literal \n in shell output instead of a real newline. File search results were concatenated into one entry, breaking total_count and path resolution.

Test Fixes

Remove stale blanket skips from 4 test modules:
All were marked pytestmark = pytest.mark.skip(reason="Hangs in non-interactive environments") but run fine:

  • test_413_compression.py (12 tests, 25s)
  • test_file_tools_live.py (71 tests, 24s)
  • test_code_execution.py (61 tests, 99s)
  • test_agent_loop_tool_calling.py (already has proper OPENROUTER_API_KEY skip)

Fix 4 test issues:

  • test_413_compression.py: Threshold values too small for compressed output to fit in one pass
  • test_mcp_probe.py: Missing _MCP_AVAILABLE mock — tests returned early without MCP SDK
  • test_mcp_tool_issue_948.py: patch() targets don't exist without MCP SDK — inject symbols
  • test_approve_deny_commands.py: Race condition — time.sleep(0.3) replaced with deterministic polling of _gateway_queues to wait for threads to register before resolving

Impact

+256 tests recovered from skip, 8 failures fixed. Tested on Ubuntu 24.04 / Python 3.12.

Checklist

  • Changes are focused on bug fixes and test improvements only
  • No new dependencies
  • All modified tests pass individually and in full suite
  • No unrelated changes in diff

…, and test flakiness

Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
  re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
  env var name chars. Without this, the pattern backtracks exponentially on large
  strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
  produced literal backslash-n instead of a real newline, breaking file search
  result parsing (total_count always 1, paths concatenated).

Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
  'Hangs in non-interactive environments' but actually run fine:
  - test_413_compression.py (12 tests, 25s)
  - test_file_tools_live.py (71 tests, 24s)
  - test_code_execution.py (61 tests, 99s)
  - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
  where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
  SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
  of _gateway_queues — fixes race condition where resolve fires before threads
  register their approval entries, causing the test to hang indefinitely.

Net effect: +256 tests recovered from skip, 8 real failures fixed.
@britrik
Copy link
Copy Markdown

britrik commented Apr 3, 2026

Code Review: PR #4766

Thank you for this PR addressing multiple issues. I've reviewed the changes and have the following feedback:

✅ Changes Look Good

  1. Regex Backtracking Fix (agent/redact.py)

    • Changing to correctly includes digits in environment variable names (e.g., API_KEY_1)
    • Removing re.IGNORECASE is appropriate since we're now explicitly including both cases in the character class
    • This also mitigates potential ReDoS (Regular Expression Denial of Service) from catastrophic backtracking
  2. Test Flakiness Fix (tests/gateway/test_approve_deny_commands.py)

    • Replacing fixed time.sleep(0.3) with polling for _gateway_queues entries is a robust solution
    • 5-second deadline with 50ms polling intervals provides both reliability and a safety net
  3. Stale Test Skip Removals

    • test_413_compression.py: Test parameters adjusted sensibly (context_length: 100 → 2000) to avoid edge cases
    • test_agent_loop_tool_calling.py: Good comment explaining fallback to API key check
    • test_code_execution.py and test_file_tools_live.py: Properly unskipped
  4. File Search Bug Fix (tools/file_operations.py)

    • The fix from -printf '%T@ %p\\\\n' to -printf '%T@ %p\\n' correctly outputs actual newlines instead of literal backslash-n sequences
    • This was causing parsing issues when reading find results
  5. MCP Test Fixes (test_mcp_probe.py, test_mcp_tool_issue_948.py)

    • Adding _MCP_AVAILABLE patches ensures tests work without the full MCP SDK installed
    • The fallback symbol definitions in test_mcp_tool_issue_948.py are a good defensive measure

Minor Suggestions

  • test_413_compression.py: Consider adding a brief comment explaining why the specific values (2000/1700) were chosen, since the ratio matters for test reliability

  • regex pattern: The regex change looks good, but you might want to verify with some edge cases like A1B2TH=test to ensure it works as expected

Security Considerations

  • ✅ No security issues introduced
  • ✅ The regex fix actually improves security by preventing potential ReDoS

Overall

This is a solid bugfix PR that addresses real issues. The changes are well-thought-out and the test improvements should reduce flakiness. LGTM!

@britrik
Copy link
Copy Markdown

britrik commented Apr 3, 2026

Code Review - PR #4766

Overall: APPROVED - Good set of fixes addressing real bugs and test issues.

Bug Fixes

✅ agent/redact.py - Regex Backtracking (ReDoS)

  • Removing re.IGNORECASE and limiting to [A-Z0-9_]* is the correct fix
  • Environment variables only contain uppercase, numbers, and underscores
  • Prevents catastrophic backtracking on large inputs

✅ tools/file_operations.py - File Search Bug

  • Fixing \\\\n\\n resolves the escaped newline issue
  • File results were being concatenated into single entries

✅ tests/gateway/test_approve_deny_commands.py - Test Flakiness

  • Deterministic polling is much better than fixed sleep
  • Avoids race conditions in CI

Test Improvements

✅ Stale Skip Removal

  • Good to see 256 tests recovered
  • Ensure CI runs these to prevent future regressions

✅ MCP Test Fixes

  • Adding _MCP_AVAILABLE patches is the right approach
  • The fallback symbol injection is smart defensive coding

Minor Suggestions

  1. agent/redact.py: Consider adding a comment explaining why re.IGNORECASE was removed (security rationale) for future maintainers

  2. tests/test_413_compression.py: The adjusted threshold values (2000 context, 1700 threshold) look reasonable - ensure these aren't too close and cause flakiness

Overall, these are solid fixes. Nice work!

@teknium1
Copy link
Copy Markdown
Contributor

teknium1 commented Apr 4, 2026

Merged via PR #4969. Your commit was cherry-picked onto current main with your authorship preserved in git log. Conflict in redact.py resolved by combining your IGNORECASE removal with main's {0,50} length bound. Both bug fixes E2E verified. Thanks @LucidPaths!

@teknium1 teknium1 closed this Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants