perf: lift repeated regex compilations to module level #231

Open
vincerevu wants to merge 3 commits into aiming-lab:main from vincerevu:main

@vincerevu

Summary

This PR improves regex performance by moving repeatedly compiled patterns to the module level, so they are compiled once at import time instead of on every function call.

What changed

  • Moved the regex patterns used by _extract_multi_file_blocks in researchclaw/pipeline/_helpers.py to a module-level constant, e.g. _MULTI_FILE_PATTERNS
  • Preserved existing matching behavior
  • Kept the change intentionally small and review-friendly

Why

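Some regex patterns were compiled inside function bodies, so every call went through re.compile again. Python's re module caches recently compiled patterns, but each call still pays for the cache lookup, and for a full recompilation whenever the pattern has been evicted from that cache. Across many repeated calls this adds unnecessary overhead.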

Moving these patterns to module scope is a low-risk optimization that:

  • reduces repeated regex compilation
  • keeps runtime behavior the same
  • makes the intent clearer
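
One way to check the overhead claim locally is a small timeit comparison. The sketch below is an assumption-laden illustration: the pattern, the sample text, and all function names are hypothetical and not part of this PR, and the measured difference will vary with the Python version and the state of the re module's internal cache.

```python
import re
import timeit

# Hypothetical pattern for illustration; not one of the patterns in this PR.
_PATTERN_SOURCE = r"^### FILE: (?P<name>[^\n]+)$"
_PRECOMPILED = re.compile(_PATTERN_SOURCE, re.MULTILINE)


def names_compile_per_call(text):
    # Goes through re.compile on every call (at least a cache lookup).
    pattern = re.compile(_PATTERN_SOURCE, re.MULTILINE)
    return [m.group("name") for m in pattern.finditer(text)]


def names_precompiled(text):
    # Reuses the module-level pattern object.
    return [m.group("name") for m in _PRECOMPILED.finditer(text)]


def compare(text, number=10_000):
    """Return (per_call_seconds, precompiled_seconds) for `number` runs each."""
    per_call = timeit.timeit(lambda: names_compile_per_call(text), number=number)
    precompiled = timeit.timeit(lambda: names_precompiled(text), number=number)
    return per_call, precompiled


if __name__ == "__main__":
    sample = "### FILE: a.py\nprint(1)\n### FILE: b.py\nprint(2)\n"
    per_call, precompiled = compare(sample)
    print(f"compile per call: {per_call:.4f}s")
    print(f"precompiled:      {precompiled:.4f}s")
```

Both functions return the same matches, so the comparison isolates the cost of the repeated re.compile call rather than any change in behavior.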

Scope

This PR focuses only on the _extract_multi_file_blocks path in:

  • researchclaw/pipeline/_helpers.py

Other possible regex cleanup opportunities were identified, but they are intentionally left out of this PR to keep the change narrow and easy to review.

Validation

  • Ran the relevant test suite to confirm behavior is preserved

Notes

Additional low-risk regex optimizations may be proposed separately for:

  • _parse_metrics_from_stdout in researchclaw/pipeline/_helpers.py
  • _deduplicate_tables in researchclaw/templates/converter.py
