-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update swift/main with recent changes #651
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Since we're atomically initializing the compiled program in `Regex.Program`, we need to pair that with an atomic load. Resolves swiftlang#609.
The `default` and `simple` word boundaries have different behaviors at the start and end of strings/lines. These tests validate that we have the correct behavior implemented. Related to issue swiftlang#613.
The old version looks like it was accidentally duplicated from anchorsMatchLineEndings(_:) just below it.
* Remove `RegexConsumer` and fix its dependencies This eliminates the RegexConsumer type and rewrites its users to call through to other, existing functionality on Regex or in the Algorithms implementations. RegexConsumer doesn't take account of the dual subranges required for matching, so it can produce results that are inconsistent with matches(of:) and ranges(of:), which were rewritten earlier. rdar://102841216 * Remove remaining from-end algorithm methods This removes methods that are left over from when we were considering from-end algorithms. These aren't tested and may not have the correct semantics, so it's safer to remove them entirely.
This includes documentation improvements for core types/methods, RegexBuilder types along with their generated variadic initializers, and adds some curation. It also includes tests of the documentation code samples.
This feature depends on running with a Swift 5.7 stdlib, and fails when that isn't available.
Add tweaks for Android
These changes work around a change to the way result builders are compiled that removes the ability for result builder closure outputs to affect the overload resolution elsewhere in an expression. Workarounds for rdar://104881395 and rdar://104645543
A recent compiler change results in fileprivate arrays sometimes not keeping their buffers around long enough. This change avoids that issue by removing the fileprivate annotations from the affected type.
…rkaround Add type annotations in RegexBuilder tests
…in the result builder. <rdar://104480703>
…onversion Fix an issue where named character classes weren't getting converted …
When searching for a substring that doesn't exist, it was possible for TwoWaySearcher to advance beyond the end of the search string, causing a crash. This change adds a `limitedBy:` parameter to that index movement, avoiding the invalid movement. Fixes rdar://105154010
vertial -> vertical rdar://104602317
Some regex literals (and presumably other `Regex` instances) lose their output type information when used in a RegexBuilder closure due to the way the concatenating builder calls are overloaded. In particular, any output type with labeled tuples or where the sum of tuple components in the accumulated and new output types is greater than 10 will be ignored. Regex internals don't make this distinction, however, so there ends up being a mismatch between what a `Regex.Match` instance tries to produce and the output type of the outermost regex. For example, this code results in a crash, because `regex` is a `Regex<Substring>` but the match tries to produce a `(Substring, number: Substring)`: let regex = Regex { ZeroOrMore(.whitespace) /:(?<number>\d+):/ ZeroOrMore(.whitespace) } let match = try regex.wholeMatch(in: " :21: ") print(match!.output) To fix this, we add a new `ignoreCapturesInTypedOutput` DSLTree node to mark situations where the output type is discarded. This status is propagated through the capture list into the match's storage, which lets us produce the correct output type. Note that we can't just drop the capture groups when building the compiled program because (1) different parts of the regex might reference the capture group and (2) all capture groups are available if a developer converts the output to `AnyRegexOutput`. let anyOutput = AnyRegexOutput(match) // anyOutput[1] == "21" // anyOutput["number"] == Optional("21") Fixes swiftlang#625. rdar://104823356 Note: Linux seems to crash on different tests when the two customTest overloads have `internal` visibility or are called. Switching one of the functions to be generic over a RegexComponent works around the issue.
This supports a type checker fix after the change in how result builder closure parameters are type-checked.
Type checker workaround: adjust test
Short-circuit Character.isASCII checks inside built in character class matching. Also, make benchmark try a few more times before giving up.
General ASCII fast-paths for builtin character classes
We decided not to support the `anyScalar` character class, which would match a single Unicode scalar regardless of matching mode. However, its representation was still included in the various character class types in the regex engine, leading to unreachable code and unclear requirements when changing or adding new code. This change removes that representation where possible. The `DSLTree.Atom.CharacterClass` enum is left unchanged, since it is marked `@_spi(RegexBuilder) public`. Any use of that enum case is handled with a `fatalError("Unsupported")`, and it isn't produced on any code path.
@swift-ci please test |
@swift-ci please test macOS platform |
Full Swift CI passed: swiftlang/swift#64922 |
milseman
added a commit
to milseman/swift-experimental-string-processing
that referenced
this pull request
Apr 5, 2023
* Atomically load the lowered program (swiftlang#610) Since we're atomically initializing the compiled program in `Regex.Program`, we need to pair that with an atomic load. Resolves swiftlang#609. * Add tests for line start/end word boundary diffs (swiftlang#616) The `default` and `simple` word boundaries have different behaviors at the start and end of strings/lines. These tests validate that we have the correct behavior implemented. Related to issue swiftlang#613. * Add tweaks for Android * Fix documentation typo (swiftlang#615) * Fix abstract for Regex.dotMatchesNewlines(_:). (swiftlang#614) The old version looks like it was accidentally duplicated from anchorsMatchLineEndings(_:) just below it. * Remove `RegexConsumer` and fix its dependencies (swiftlang#617) * Remove `RegexConsumer` and fix its dependencies This eliminates the RegexConsumer type and rewrites its users to call through to other, existing functionality on Regex or in the Algorithms implementations. RegexConsumer doesn't take account of the dual subranges required for matching, so it can produce results that are inconsistent with matches(of:) and ranges(of:), which were rewritten earlier. rdar://102841216 * Remove remaining from-end algorithm methods This removes methods that are left over from when we were considering from-end algorithms. These aren't tested and may not have the correct semantics, so it's safer to remove them entirely. * Improve StringProcessing and RegexBuilder documentation (swiftlang#611) This includes documentation improvements for core types/methods, RegexBuilder types along with their generated variadic initializers, and adds some curation. It also includes tests of the documentation code samples. * Set availability for inverted character class test (swiftlang#621) This feature depends on running with a Swift 5.7 stdlib, and fails when that isn't available. * Add type annotations in RegexBuilder tests These changes work around a change to the way result builders are compiled that removes the ability for result builder closure outputs to affect the overload resolution elsewhere in an expression. Workarounds for rdar://104881395 and rdar://104645543 * Workaround for fileprivate array issue A recent compiler change results in fileprivate arrays sometimes not keeping their buffers around long enough. This change avoids that issue by removing the fileprivate annotations from the affected type. * Fix an issue where named character classes weren't getting converted in the result builder. <rdar://104480703> * Stop at end of search string in TwoWaySearcher (swiftlang#631) When searching for a substring that doesn't exist, it was possible for TwoWaySearcher to advance beyond the end of the search string, causing a crash. This change adds a `limitedBy:` parameter to that index movement, avoiding the invalid movement. Fixes rdar://105154010 * Correct misspelling in DSL renderer (swiftlang#627) vertial -> vertical rdar://104602317 * Fix output type mismatch with RegexBuilder (swiftlang#626) Some regex literals (and presumably other `Regex` instances) lose their output type information when used in a RegexBuilder closure due to the way the concatenating builder calls are overloaded. In particular, any output type with labeled tuples or where the sum of tuple components in the accumulated and new output types is greater than 10 will be ignored. Regex internals don't make this distinction, however, so there ends up being a mismatch between what a `Regex.Match` instance tries to produce and the output type of the outermost regex. For example, this code results in a crash, because `regex` is a `Regex<Substring>` but the match tries to produce a `(Substring, number: Substring)`: let regex = Regex { ZeroOrMore(.whitespace) /:(?<number>\d+):/ ZeroOrMore(.whitespace) } let match = try regex.wholeMatch(in: " :21: ") print(match!.output) To fix this, we add a new `ignoreCapturesInTypedOutput` DSLTree node to mark situations where the output type is discarded. This status is propagated through the capture list into the match's storage, which lets us produce the correct output type. Note that we can't just drop the capture groups when building the compiled program because (1) different parts of the regex might reference the capture group and (2) all capture groups are available if a developer converts the output to `AnyRegexOutput`. let anyOutput = AnyRegexOutput(match) // anyOutput[1] == "21" // anyOutput["number"] == Optional("21") Fixes swiftlang#625. rdar://104823356 Note: Linux seems to crash on different tests when the two customTest overloads have `internal` visibility or are called. Switching one of the functions to be generic over a RegexComponent works around the issue. * Revert "Merge pull request swiftlang#628 from apple/result_builder_changes_workaround" This reverts commit 7e059b7, reversing changes made to 3ca8b13. * Use `some` syntax in variadics This supports a type checker fix after the change in how result builder closure parameters are type-checked. * Type checker workaround: adjust test * Further refactor to work around type checker regression * Align availability macro with OS versions (swiftlang#641) * Speed up general character class matching (swiftlang#642) Short-circuit Character.isASCII checks inside built in character class matching. Also, make benchmark try a few more times before giving up. * Test for \s matching CRLF when scalar matching (swiftlang#648) * General ascii fast paths for character classes (swiftlang#644) General ASCII fast-paths for builtin character classes * Remove the unsupported `anyScalar` case (swiftlang#650) We decided not to support the `anyScalar` character class, which would match a single Unicode scalar regardless of matching mode. However, its representation was still included in the various character class types in the regex engine, leading to unreachable code and unclear requirements when changing or adding new code. This change removes that representation where possible. The `DSLTree.Atom.CharacterClass` enum is left unchanged, since it is marked `@_spi(RegexBuilder) public`. Any use of that enum case is handled with a `fatalError("Unsupported")`, and it isn't produced on any code path. --------- Co-authored-by: Nate Cook <[email protected]> Co-authored-by: Butta <[email protected]> Co-authored-by: Ole Begemann <[email protected]> Co-authored-by: Alex Martini <[email protected]> Co-authored-by: Alejandro Alonso <[email protected]> Co-authored-by: David Ewing <[email protected]> Co-authored-by: Dave Ewing <[email protected]>
milseman
added a commit
that referenced
this pull request
Apr 5, 2023
* Atomically load the lowered program (#610) Since we're atomically initializing the compiled program in `Regex.Program`, we need to pair that with an atomic load. Resolves #609. * Add tests for line start/end word boundary diffs (#616) The `default` and `simple` word boundaries have different behaviors at the start and end of strings/lines. These tests validate that we have the correct behavior implemented. Related to issue #613. * Add tweaks for Android * Fix documentation typo (#615) * Fix abstract for Regex.dotMatchesNewlines(_:). (#614) The old version looks like it was accidentally duplicated from anchorsMatchLineEndings(_:) just below it. * Remove `RegexConsumer` and fix its dependencies (#617) * Remove `RegexConsumer` and fix its dependencies This eliminates the RegexConsumer type and rewrites its users to call through to other, existing functionality on Regex or in the Algorithms implementations. RegexConsumer doesn't take account of the dual subranges required for matching, so it can produce results that are inconsistent with matches(of:) and ranges(of:), which were rewritten earlier. rdar://102841216 * Remove remaining from-end algorithm methods This removes methods that are left over from when we were considering from-end algorithms. These aren't tested and may not have the correct semantics, so it's safer to remove them entirely. * Improve StringProcessing and RegexBuilder documentation (#611) This includes documentation improvements for core types/methods, RegexBuilder types along with their generated variadic initializers, and adds some curation. It also includes tests of the documentation code samples. * Set availability for inverted character class test (#621) This feature depends on running with a Swift 5.7 stdlib, and fails when that isn't available. * Add type annotations in RegexBuilder tests These changes work around a change to the way result builders are compiled that removes the ability for result builder closure outputs to affect the overload resolution elsewhere in an expression. Workarounds for rdar://104881395 and rdar://104645543 * Workaround for fileprivate array issue A recent compiler change results in fileprivate arrays sometimes not keeping their buffers around long enough. This change avoids that issue by removing the fileprivate annotations from the affected type. * Fix an issue where named character classes weren't getting converted in the result builder. <rdar://104480703> * Stop at end of search string in TwoWaySearcher (#631) When searching for a substring that doesn't exist, it was possible for TwoWaySearcher to advance beyond the end of the search string, causing a crash. This change adds a `limitedBy:` parameter to that index movement, avoiding the invalid movement. Fixes rdar://105154010 * Correct misspelling in DSL renderer (#627) vertial -> vertical rdar://104602317 * Fix output type mismatch with RegexBuilder (#626) Some regex literals (and presumably other `Regex` instances) lose their output type information when used in a RegexBuilder closure due to the way the concatenating builder calls are overloaded. In particular, any output type with labeled tuples or where the sum of tuple components in the accumulated and new output types is greater than 10 will be ignored. Regex internals don't make this distinction, however, so there ends up being a mismatch between what a `Regex.Match` instance tries to produce and the output type of the outermost regex. For example, this code results in a crash, because `regex` is a `Regex<Substring>` but the match tries to produce a `(Substring, number: Substring)`: let regex = Regex { ZeroOrMore(.whitespace) /:(?<number>\d+):/ ZeroOrMore(.whitespace) } let match = try regex.wholeMatch(in: " :21: ") print(match!.output) To fix this, we add a new `ignoreCapturesInTypedOutput` DSLTree node to mark situations where the output type is discarded. This status is propagated through the capture list into the match's storage, which lets us produce the correct output type. Note that we can't just drop the capture groups when building the compiled program because (1) different parts of the regex might reference the capture group and (2) all capture groups are available if a developer converts the output to `AnyRegexOutput`. let anyOutput = AnyRegexOutput(match) // anyOutput[1] == "21" // anyOutput["number"] == Optional("21") Fixes #625. rdar://104823356 Note: Linux seems to crash on different tests when the two customTest overloads have `internal` visibility or are called. Switching one of the functions to be generic over a RegexComponent works around the issue. * Revert "Merge pull request #628 from apple/result_builder_changes_workaround" This reverts commit 7e059b7, reversing changes made to 3ca8b13. * Use `some` syntax in variadics This supports a type checker fix after the change in how result builder closure parameters are type-checked. * Type checker workaround: adjust test * Further refactor to work around type checker regression * Align availability macro with OS versions (#641) * Speed up general character class matching (#642) Short-circuit Character.isASCII checks inside built in character class matching. Also, make benchmark try a few more times before giving up. * Test for \s matching CRLF when scalar matching (#648) * General ascii fast paths for character classes (#644) General ASCII fast-paths for builtin character classes * Remove the unsupported `anyScalar` case (#650) We decided not to support the `anyScalar` character class, which would match a single Unicode scalar regardless of matching mode. However, its representation was still included in the various character class types in the regex engine, leading to unreachable code and unclear requirements when changing or adding new code. This change removes that representation where possible. The `DSLTree.Atom.CharacterClass` enum is left unchanged, since it is marked `@_spi(RegexBuilder) public`. Any use of that enum case is handled with a `fatalError("Unsupported")`, and it isn't produced on any code path. --------- Co-authored-by: Nate Cook <[email protected]> Co-authored-by: Butta <[email protected]> Co-authored-by: Ole Begemann <[email protected]> Co-authored-by: Alex Martini <[email protected]> Co-authored-by: Alejandro Alonso <[email protected]> Co-authored-by: David Ewing <[email protected]> Co-authored-by: Dave Ewing <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Includes the improvements to documentation, dead code removal, and significant character class performance improvements.