@tausbn tausbn commented Sep 19, 2025

Our previous modelling did not account for the fact that a lookahead can potentially extend all the way to the end of the input (and similarly, that a lookbehind can extend all the way to the beginning).

To fix this, I extended `firstPart` and `lastPart` to handle lookbehinds and lookaheads correctly, and added some test cases (all of which yield no new results).
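For readers who want to see the underlying regex behaviour, here is a small Python sketch. The two patterns and the helper `matched_text` are illustrative assumptions and are not taken from the PR's test file. They show that a `$` inside a lookahead and a `^` inside a lookbehind can match even when other parts of the pattern sit after or before the assertion.

```python
import re

# Illustrative pattern (not from the PR): `$` sits inside a lookahead.
# The lookahead body can run to the end of the input, even though more
# of the pattern (`a.*`) follows the assertion.
LOOKAHEAD = re.compile(r"(?=.*x$)a.*")

# Illustrative pattern (not from the PR): `^` sits inside a lookbehind.
# The lookbehind body can reach back to the start of the input, even
# though a literal `a` precedes the assertion in the pattern.
LOOKBEHIND = re.compile(r"a(?<=^a)b")


def matched_text(pattern, text):
    """Return the text matched by `pattern.search`, or None if no match."""
    m = pattern.search(text)
    return m.group(0) if m else None
```

With these definitions, `matched_text(LOOKAHEAD, "abcx")` and `matched_text(LOOKBEHIND, "ab")` both find a match, while `"abcy"` and `"xab"` do not, so neither anchor is unmatchable.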

Fixes #20429.

@tausbn tausbn marked this pull request as ready for review September 19, 2025 18:49
@tausbn tausbn requested a review from a team as a code owner September 19, 2025 18:49
@Copilot Copilot AI review requested due to automatic review settings September 19, 2025 18:49
@Copilot Copilot AI left a comment
Pull Request Overview

This PR fixes false positives in CodeQL queries that check for unmatchable `$` and `^` anchors in regular expressions by improving the handling of lookahead and lookbehind assertions. The fix ensures that anchors inside these assertions are correctly analyzed for their potential to match the beginning or end of input.

Key changes:

  • Extended lookahead/lookbehind predicate signatures to include content boundaries
  • Updated `firstPart` and `lastPart` logic to handle assertions that can reach string boundaries
  • Added test cases covering various lookahead and lookbehind scenarios
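
The idea behind these changes can be sketched with a toy model in Python. This is not the CodeQL implementation: the node classes `Lit`, `Dollar`, and `Lookahead` and the function `matchable_dollars` are hypothetical names invented for illustration, and the model only handles flat sequences and `$`. It contrasts an analysis that treats only the tail of the whole pattern as able to reach the end of input with one that also lets the tail of a lookahead body reach it.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Lit:
    """Hypothetical node: some input-consuming piece of the pattern."""
    text: str


@dataclass
class Dollar:
    """Hypothetical node: a `$` anchor."""


@dataclass
class Lookahead:
    """Hypothetical node: a `(?=...)` assertion with a flat body."""
    body: List["Node"] = field(default_factory=list)


Node = Union[Lit, Dollar, Lookahead]


def matchable_dollars(seq: List[Node], assertion_aware: bool) -> int:
    """Count `$` anchors this toy model judges able to sit at end of input."""
    count = 0
    for i, node in enumerate(seq):
        is_last = i == len(seq) - 1
        if isinstance(node, Dollar) and is_last:
            count += 1
        elif isinstance(node, Lookahead):
            # Assumption of this sketch: an assertion-aware analysis lets a
            # lookahead body reach the end of input wherever it appears.
            if assertion_aware or is_last:
                count += matchable_dollars(node.body, assertion_aware)
    return count


# Toy encoding of the pattern (?=.*x$)a.*
PATTERN = [Lookahead([Lit(".*x"), Dollar()]), Lit("a.*")]
```

In this model, `matchable_dollars(PATTERN, assertion_aware=False)` finds no matchable `$` (which would correspond to a false positive), while `assertion_aware=True` finds one.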

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `python/ql/lib/semmle/python/regexp/internal/ParseRegExp.qll` | Core fix: extended assertion predicates and added logic for handling assertions in first/last part detection |
| `python/ql/lib/semmle/python/regexp/RegexTreeView.qll` | Updated calls to assertion predicates to match new signatures |
| `python/ql/test/query-tests/Expressions/Regex/test.py` | Added test cases for lookahead and lookbehind assertions |
| `python/ql/test/library-tests/regex/FirstLast.expected` | Updated expected test results reflecting the improved analysis |
| `python/ql/src/change-notes/2025-09-19-fix-unmatchable-dollar-and-caret-in-assertions.md` | Added release notes documenting the fix |

Successfully merging this pull request may close these issues.

False positive: Unmatchable dollar in regular expression with lookahead assertion