Regex misses Windows-style newlines #822

@BigZaphod

Description

I ran into a problem while trying to strip HTML whitespace from a string. The pattern I used comes from many online examples and appears to match HTML's definition of whitespace, but in Swift it does not seem to catch Windows-style two-byte newlines (\r\n) in my input.

I did find a workaround by turning on the .matchingSemantics(.unicodeScalar) mode, but that was pretty unexpected and so I'm filing this on the off chance it's an actual bug.
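
A likely explanation, offered as an assumption rather than a confirmed diagnosis: Swift treats the pair \r\n as a single Character (one grapheme cluster), so under the default grapheme-cluster matching semantics a character class listing \r and \n separately may never see either one on its own. The short sketch below only inspects the string; it does not use Regex at all.

```swift
// "\r\n" is one Character in Swift, but two Unicode scalars and two UTF-8 bytes.
let crlf = "\r\n"

let characterCount = crlf.count                 // counts grapheme clusters
let scalarCount = crlf.unicodeScalars.count     // counts Unicode scalars
let byteCount = crlf.utf8.count                 // counts UTF-8 code units

// The single Character is equal to neither "\r" nor "\n" on its own.
let onlyCharacter: Character = crlf.first!
let equalsCR = onlyCharacter == "\r"
let equalsLF = onlyCharacter == "\n"

print(characterCount, scalarCount, byteCount)   // prints "1 2 2"
print(equalsCR, equalsLF)                       // prints "false false"
```

If that is what is happening here, it would also explain why switching to .unicodeScalar semantics makes the pattern match.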

Reproduction

import Foundation

// I'm building a string from hex here so that we don't lose the Windows newlines
// somewhere along the way. I tried copy-pasting the offending string into the
// editor, but I think Xcode or something else was converting things when I did
// that. This seemed a good enough way to ensure nothing gets confused anywhere.

let bytes: [CChar] = [
    0x48, 0x65, 0x6C, 0x6C, 0x6F,
    0x0D, 0x0A, // Windows-style two byte newline (\r\n)
    0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21,
    0x00
]

let clip = String(utf8String: bytes)!

// This first pattern does not catch the Windows-style newline. It appears to
// miss it entirely, which is not what I expected at all. The printed string
// still contains the Windows newline and no replacement occurred.
let pattern1 = #/[\t\n\r ]+/#
print(clip.replacing(pattern1, with: ", "))

// <this line intentionally left blank>
print()

// Changing the pattern to use the unicodeScalar semantics seems to work around
// it, and the Windows-style newline is properly replaced. I don't know if
// this is more "technically correct" or if I'm hitting a bug here. It seems
// unexpected. The above regex is one I see referenced all over the web for
// matching the same whitespace definition that HTML uses, so it seemed odd
// that it didn't work in Swift without turning on a flag first.
let pattern2 = pattern1.matchingSemantics(.unicodeScalar)
print(clip.replacing(pattern2, with: ", "))
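
Another possible workaround, sketched here under the assumption that normalizing line endings is acceptable for the input, is to rewrite \r\n to \n before the regex runs, so the default semantics only ever see single-scalar whitespace characters. The helper name collapseHTMLWhitespace is hypothetical and not part of the original report.

```swift
import Foundation

// Hypothetical helper: normalizes CRLF to LF first, then collapses runs of
// HTML whitespace using the same pattern as pattern1 with default semantics.
func collapseHTMLWhitespace(_ input: String, separator: String = ", ") -> String {
    // Replace the two-scalar sequence as a whole; afterwards every newline
    // is a lone "\n" Character that the character class can match.
    let normalized = input.replacingOccurrences(of: "\r\n", with: "\n")
    return normalized.replacing(#/[\t\n\r ]+/#, with: separator)
}

let collapsed = collapseHTMLWhitespace("Hello\r\nWorld!")
print(collapsed)
```

This changes the input rather than the matching mode, so it is only a fit when the original line endings do not need to be preserved.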

Here's an Xcode playground file:
RegexBug2.playground.zip

Expected behavior

I expected the pattern to catch Windows-style newlines, but it didn't!

Environment

Xcode 26.0 beta 5 (17A5295f)
swift-driver version: 1.127.11.2 Apple Swift version 6.2 (swiftlang-6.2.0.16.14 clang-1700.3.16.4)
Target: arm64-apple-macosx15.0

Additional information

No response

Labels

bug (Something isn't working), semantics (Grapheme cluster / Unicode scalar differences)