
<regex>: wregex with regular expression [\w\s] fails to match some spaces #5243

Open
muellerj2 opened this issue Jan 17, 2025 · 0 comments


With `wregex`, the regular expression `[\w\s]` fails to match whitespace characters with code points greater than 255, even though `[\s]` on its own matches them.

Test case

```cpp
#include <iostream>
#include <regex>

using namespace std;

int main() {
    const wregex re1(LR"([\s])");    // \s alone in a bracket expression
    const wregex re2(LR"([\w\s])");  // \w and \s combined in one bracket expression
    cout << R"(U+0020 SPACE is matched by "[\s]": )" << regex_match(L" ", re1) << '\n';
    cout << R"(U+0020 SPACE is matched by "[\w\s]": )" << regex_match(L" ", re2) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\s]": )" << regex_match(L"\u2028", re1) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\w\s]": )" << regex_match(L"\u2028", re2) << '\n';
}
```

https://godbolt.org/z/oEdTs3Th4

This prints:

```
U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 0
```

Expected result

This should print:

```
U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 1
```
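
The expected result follows from the semantics of bracket expressions: `[\w\s]` matches any character matched by `\w` or by `\s`, so every character matched by `[\s]` must also be matched by `[\w\s]`. The sketch below generalizes the test case into a check of that invariant over several whitespace code points. The function name `union_class_mismatches` is an illustrative assumption, not part of any library. Because the set of characters `\s` matches depends on the implementation and the locale, the sketch compares the two patterns against each other rather than asserting individual results.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every code point in `code_points` that "[\s]"
// matches but "[\w\s]" does not. Since "[\w\s]" should match the union of
// \w and \s, the result is expected to be empty on a conforming
// implementation, regardless of which characters the current locale
// classifies as whitespace.
std::vector<wchar_t> union_class_mismatches(const std::vector<wchar_t>& code_points) {
    const std::wregex space_only(LR"([\s])");
    const std::wregex word_or_space(LR"([\w\s])");

    std::vector<wchar_t> mismatches;
    for (const wchar_t cp : code_points) {
        const std::wstring subject(1, cp);  // one-character string to match against
        if (std::regex_match(subject, space_only) &&
            !std::regex_match(subject, word_or_space)) {
            mismatches.push_back(cp);
        }
    }
    return mismatches;
}
```

For instance, `union_class_mismatches({L' ', L'\u2028', L'\u3000'})` should return an empty vector. Based on the output above, an implementation affected by this issue would be expected to report at least `L'\u2028'`.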

Additional remarks

The underlying cause is #5242. While I consider a fix for #5242 to be ABI-breaking, I think this issue can be fixed without breaking ABI.
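
Until a fix is available, one possible workaround is to avoid combining `\w` and `\s` in a single bracket expression. This is a minimal sketch, assuming that an alternation of single-class bracket expressions is not affected by this issue; it relies only on the observation that `[\s]` alone matched U+2028 in the test case. The helper name `is_word_or_space` is hypothetical, and the pattern has not been verified against the affected implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical workaround helper (unverified against the affected
// implementation): matches one word or whitespace character without putting
// \w and \s in the same bracket expression. The whitespace branch uses the
// "[\s]" form that the test case shows matching U+2028 correctly.
bool is_word_or_space(wchar_t ch) {
    // Function-local static: the pattern is compiled once, on first use.
    static const std::wregex word_or_space(LR"((?:[\w]|[\s]))");
    return std::regex_match(std::wstring(1, ch), word_or_space);
}
```

The alternation is intended to accept exactly the same characters as `[\w\s]`, at the cost of a slightly more verbose pattern.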
