Incorrect Bitwise Check for Newline Mode Causes Regex Matching Failure in POCO RegularExpression #4870

Open
sdottaka opened this issue Feb 14, 2025 · 0 comments

Describe the bug
The regular expression "\\s+$" compiled with RegularExpression::RE_MULTILINE | RegularExpression::RE_NEWLINE_ANYCRLF does not remove trailing whitespace at the end of each line. The cause is the way RegularExpression.cpp selects the newline mode: it tests each RE_NEWLINE_* option with a plain bitwise AND, although these options are values stored in one shared bit field rather than independent flags. As a result, RE_NEWLINE_ANYCRLF is caught by an earlier branch, a different newline mode is set, and line endings are not matched as expected.

To Reproduce
Steps to reproduce the behavior:

  1. Compile and run the following test case:
void RegularExpressionTest::testSubst5()
{
    RegularExpression re("\\s+$", RegularExpression::RE_MULTILINE | RegularExpression::RE_NEWLINE_ANYCRLF);
    std::string s = "ABC 123  \n456 789 \nDEF  ";
    assertTrue(re.subst(s, "", RegularExpression::RE_GLOBAL) == 3);
    assertTrue(s == "ABC 123\n456 789\nDEF");
}
  2. Observe that the test fails because the expected substitution does not occur.

Expected behavior
The regular expression should correctly remove trailing whitespace at the end of each line, resulting in the string:

ABC 123
456 789
DEF

Logs
No specific logs are generated, but the test assertion fails.

Screenshots
N/A

Please add relevant environment information:

  • OS Type and Version: (e.g., Windows 10 64-bit, macOS 12, Ubuntu 22.04)
  • POCO Version: (1.14.1)
  • Third-party product type and version: (if applicable)

Additional context
The issue is caused by improper bitwise checks in RegularExpression.cpp. The following patch resolves the issue by ensuring that RE_NEWLINE_ANYCRLF is properly detected:

--- a/Foundation/src/RegularExpression.cpp
+++ b/Foundation/src/RegularExpression.cpp
@@ -75,13 +75,14 @@ RegularExpression::RegularExpression(const std::string& pattern, int options, bo
        pcre2_compile_context* context = pcre2_compile_context_create(nullptr);
        if (!context) throw Poco::RegularExpressionException("cannot create compile context");

-       if (options & RE_NEWLINE_LF)
+       const unsigned int RE_NEWLINE_MASK = 0x00f00000;
+       if ((options & RE_NEWLINE_MASK) == RE_NEWLINE_LF)
                pcre2_set_newline(context, PCRE2_NEWLINE_LF);
-       else if (options & RE_NEWLINE_CRLF)
+       else if ((options & RE_NEWLINE_MASK) == RE_NEWLINE_CRLF)
                pcre2_set_newline(context, PCRE2_NEWLINE_CRLF);
-       else if (options & RE_NEWLINE_ANY)
+       else if ((options & RE_NEWLINE_MASK) == RE_NEWLINE_ANY)
                pcre2_set_newline(context, PCRE2_NEWLINE_ANY);
-       else if (options & RE_NEWLINE_ANYCRLF)
+       else if ((options & RE_NEWLINE_MASK) == RE_NEWLINE_ANYCRLF)
                pcre2_set_newline(context, PCRE2_NEWLINE_ANYCRLF);
        else // default RE_NEWLINE_CR
                pcre2_set_newline(context, PCRE2_NEWLINE_CR);

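With this change the newline bits are isolated with a mask and compared for equality, so each branch matches only its own newline mode and RE_NEWLINE_ANYCRLF reaches pcre2_set_newline as PCRE2_NEWLINE_ANYCRLF.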

@sdottaka sdottaka added the bug label Feb 14, 2025
@obiltschnig obiltschnig added this to the Release 1.15.0 milestone Feb 15, 2025