Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize matching to match on scalar values when possible #525

Merged
merged 31 commits into from
Jul 12, 2022

Conversation

rctcwyvrn
Copy link
Contributor

@rctcwyvrn rctcwyvrn commented Jun 30, 2022

  • Emit matchScalar instructions when emitting ascii quoted literals, ascii characters, and any scalar, as well as when in unicode scalars mode. matchScalar stores the scalar value inline instead of in a register like the character for match, making it faster and less susceptible to ARC
  • Add matchScalarBitset to use the optimization from Use a bitset for ascii-only character classes #511 when in unicode scalars mode

Future work: We currently on match on ASCII because we may be given non normalized strings. Two options to deal with this are to A. normalize inputs B. compile two different programs, one optimized for if the input string is normalized and the other not making any assumptions

Large wins in the DNA benchmarks from #515 since they have long quoted literal sequences in their patterns

=== Improvements =====================================================================
- DnaEndsMatchAll                         200ms	288ms	-88.8ms		-30.8%
- DnaMatchAll                             112ms	158ms	-45.9ms		-29.0%
- DnaEndsMatchFirst                       38.2ms	55.3ms	-17.1ms		-30.9%
- LiteralSearchAll                        70.6ms	85.4ms	-14.8ms		-17.3%
- EmailRFCNoMatchesAll                    98.3ms	108ms	-9.45ms		-8.8%
- EmailRFCAll                             42.2ms	51.2ms	-8.98ms		-17.5%
- DnaMatchFirst                           19.5ms	27.9ms	-8.38ms		-30.0%
- EmailLookaheadAll                       74.9ms	82.5ms	-7.63ms		-9.2%
- EmailLookaheadNoMatchesAll              53.8ms	60.5ms	-6.74ms		-11.1%
- ReluctantQuantWithTerminalWhole         8.6ms	10.5ms	-1.87ms		-17.9%
- HtmlAll                                 11.5ms	13.4ms	-1.85ms		-13.8%
- HangulSyllableAll                       6.69ms	8.25ms	-1.56ms		-18.9%
- TaggedEmojisAll                         16.6ms	18.1ms	-1.5ms		-8.3%
- InvertedCCC                             27.8ms	29.1ms	-1.35ms		-4.6%
- CaseInsensitiveCCC                      11.1ms	12ms	-933µs		-7.8%
- HangulSyllableFirst                     3.1ms	3.99ms	-891µs		-22.3%
- CssAll                                  4.05ms	4.85ms	-794µs		-16.4%
- BasicRangeCCC                           10.4ms	11.2ms	-785µs		-7.0%
- BasicCCC                                9.68ms	10.4ms	-742µs		-7.1%
- NotFoundAll                             6.41ms	7.06ms	-654µs		-9.3%

@rctcwyvrn rctcwyvrn requested a review from milseman June 30, 2022 00:40
if options.semanticLevel == .graphemeCluster {
if options.isCaseInsensitive {
// future work: if all ascii, emit matchBitset instructions with
// case insensitive bitsets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other ideas are to have instructions that say to do case-insensitive matching, that is we have variants of match-character/scalar and similarly for match-sequence.

A special case for ASCII could be a dedicated instruction for the parts of ASCII that are case-sensitive. It could store both scalars to match, or just store the lowercase and the engine would do the arithmetic when matching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. we'll want to get all of this off of using consumer interfaces at some point, but doesn't have to be this PR.

if optimizationsEnabled && c.isASCII {
for scalar in c.unicodeScalars {
let boundaryCheck = scalar == c.unicodeScalars.last!
builder.buildMatchScalar(scalar, boundaryCheck: boundaryCheck)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea for the builder is to have a convenience function that adds a grapheme boundary check at the current position. It could either update the most recent instruction if that instruction is capable of supporting it or insert the anchor otherwise.

mutating func matchScalar(_ s: Unicode.Scalar, boundaryCheck: Bool) -> Bool {
guard let curScalar = loadScalar(),
curScalar == s,
let idx = nextScalarIndex(offsetBy: 1, boundaryCheck: boundaryCheck)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to drop the helper function and just do this directly:

guard s == loadScalar(), !boundaryCheck || isOnGrapheme... else { ... }

Longer term we will want to use a lower-level String interface that returns both the current scalar and the next index. E.g. a nicer version of https://github.com/apple/swift/blob/main/stdlib/public/core/UnicodeHelpers.swift#L107.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you could consider adding that now if you like, something like the below and we can micro-optimize it later:

extension String {
  func _consumeScalar(in bounds: Range<String.Index>) -> (Unicode.Scalar, Index)? {
    assert(indices.contains(bounds))
    ... 
  }
}

@rctcwyvrn
Copy link
Contributor Author

While implementing case insensitive matching instructions I opted for adding new instructions for each new case, however after doing that I benchmarked it and noticed pretty significant regressions

Comparing against benchmark result file before_case_insens_instrs.json
=== Regressions ======================================================================
- HangulSyllableFirst                     3.22ms	3.02ms	195µs		6.4%
- CssAll                                  4.34ms	4.05ms	289µs		7.1%
- HangulSyllableAll                       7.09ms	6.69ms	394µs		5.9%
- EagarQuantWithTerminalWhole             7ms	6.5ms	505µs		7.8%
- NotFoundAll                             6.95ms	6.42ms	535µs		8.3%
- GraphemeBreakNoCapAll                   9.8ms	9.22ms	578µs		6.3%
- BasicRangeCCC                           11.2ms	10.4ms	830µs		8.0%
- TaggedEmojisAll                         17.3ms	16.5ms	833µs		5.0%
- BasicCCC                                10.5ms	9.69ms	852µs		8.8%
- ReluctantQuantWithTerminalWhole         9.47ms	8.6ms	869µs		10.1%
- HtmlAll                                 12.4ms	11.5ms	873µs		7.6%
- IntersectionCCC                         16.2ms	15.3ms	901µs		5.9%
- SubtractionCCC                          15.7ms	14.8ms	936µs		6.3%
- CaseInsensitiveCCC                      12.1ms	11.1ms	1.03ms		9.3%
- symDiffCCC                              39.5ms	38.4ms	1.04ms		2.7%
- AnchoredNotFoundAll                     16ms	14.9ms	1.1ms		7.4%
- ReluctantQuantWhole                     15.2ms	13.9ms	1.27ms		9.1%
- InvertedCCC                             28.7ms	27.3ms	1.36ms		5.0%
- DnaMatchFirst                           21.4ms	19.6ms	1.79ms		9.2%
- EmailRFCAll                             45.5ms	42.2ms	3.28ms		7.8%
- DnaEndsMatchFirst                       41.9ms	38.1ms	3.8ms		10.0%
- EmailRFCNoMatchesAll                    102ms	98.3ms	3.99ms		4.1%
- LiteralSearchNotFoundAll                67.7ms	63.1ms	4.57ms		7.2%
- EmailLookaheadNoMatchesAll              58.8ms	53.8ms	4.94ms		9.2%
- LiteralSearchAll                        75.7ms	70.7ms	5.04ms		7.1%
- EmojiRegexAll                           68.1ms	61.7ms	6.38ms		10.3%
- EmailLookaheadAll                       81.3ms	74.8ms	6.47ms		8.7%
- DnaMatchAll                             123ms	112ms	10.6ms		9.4%
- DnaEndsMatchAll                         222ms	200ms	21.9ms		11.0%

After adjusting the scalar matching instructions to use one instruction with a more complex payload instead of 3 instructions the regression was much less significant

Comparing against benchmark result file before_case_insens_instrs.json
=== Regressions ======================================================================
- CssAll                                  4.08ms	4.05ms	29.7µs		0.7%
- EagarQuantWithTerminalWhole             6.56ms	6.5ms	61.7µs		0.9%
- NotFoundAll                             6.5ms	6.42ms	77.9µs		1.2%
- GraphemeBreakNoCapAll                   9.31ms	9.22ms	94.8µs		1.0%
- HtmlAll                                 11.6ms	11.5ms	101µs		0.9%
- ReluctantQuantWithTerminalWhole         8.75ms	8.6ms	150µs		1.7%
- InvertedCCC                             27.5ms	27.3ms	153µs		0.6%
- ReluctantQuantWhole                     14.1ms	13.9ms	164µs		1.2%
- DnaMatchFirst                           19.7ms	19.6ms	168µs		0.9%
- TaggedEmojisAll                         16.7ms	16.5ms	172µs		1.0%
- IntersectionCCC                         15.5ms	15.3ms	181µs		1.2%
- SubtractionCCC                          15ms	14.8ms	188µs		1.3%
- DnaEndsMatchFirst                       38.4ms	38.1ms	217µs		0.6%
- LiteralSearchAll                        71.4ms	70.7ms	729µs		1.0%
- EmailRFCAll                             43ms	42.2ms	787µs		1.9%
- DnaMatchAll                             114ms	112ms	1.16ms		1.0%
- symDiffCCC                              39.9ms	38.4ms	1.49ms		3.9%
- DnaEndsMatchAll                         202ms	200ms	2.46ms		1.2%
- EmailRFCNoMatchesAll                    102ms	98.3ms	3.96ms		4.0%
=== Improvements =====================================================================
- AnchoredNotFoundAll                     14.7ms	14.9ms	-143µs		-1.0%
- BasicRangeCCC                           10.3ms	10.4ms	-121µs		-1.2%
- BasicCCC                                9.58ms	9.69ms	-116µs		-1.2%
- CaseInsensitiveCCC                      11ms	11.1ms	-99.6µs		-0.9%

In the future we could probably continue to squeeze performance out by creating specialized instructions only in the hottest paths but for now I'll implement everything by adding bits into the payload

Got pretty large improvements from removing 6 instructions (matchScalar unchecked, case insensitive, case insensitive unchecked, match case insensitive, match seq, match scalar bitset). I'll have to rework how CompileTests work though, since it'll have to look in the payloads to differentiate the instructions now

Comparing against benchmark result file matchSeq_and_many_case_insens_instrs.json
=== Improvements =====================================================================
- DnaEndsMatchAll                         205ms	220ms	-15.4ms		-7.0%
- DnaMatchAll                             116ms	123ms	-7.73ms		-6.3%
- EmailLookaheadAll                       75.1ms	81.4ms	-6.26ms		-7.7%
- EmailRFCAll                             42.4ms	48.3ms	-5.91ms		-12.2%
- EmojiRegexAll                           62.4ms	68ms	-5.58ms		-8.2%
- EmailRFCNoMatchesAll                    98.8ms	104ms	-5.49ms		-5.3%

@rctcwyvrn
Copy link
Contributor Author

Currently CompileTests are broken, I'll fix them tomorrow

@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

init(scalar: Unicode.Scalar, caseInsensitive: Bool, boundaryCheck: Bool) {
let raw = UInt64(scalar.value)
+ (caseInsensitive ? 1 << 55: 0)
+ (boundaryCheck ? 1 << 54 : 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future: we'll be redoing instruction encoding and can clean this up alongside other changes.

case captureValue
case builtinAssertion
case builtinCharacterClass
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future: this will just be replaced with the encoding/decoding infrastructure

hamishknight and others added 3 commits July 8, 2022 14:57
Previously we would allow consuming a single
scalar in grapheme semantic mode for a DSL `.scalar`.
Change this behavior such that it is treated the
same as a character with a single scalar. That is:

- In grapheme mode it must match an entire grapheme.
- In scalar semantic mode, it may match a single scalar.
Convert AST scalars to DSL scalars such that we
can preserve the `\u{...}` syntax where the user
chooses to use it. This requires fixing up some
escaping logic such that we don't escape `\u{...}`
sequences.
@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

@milseman
Copy link
Member

@swift-ci please test

@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

emitCharacter(Character(s))
} else {
emitMatchScalar(s)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fail tests, and if it's not we may need to write the tests and XFAIL them. E.g. /a\u{301}/, IIUC, as written would try to make a proper Character from the combining scalar.

@rctcwyvrn
Copy link
Contributor Author

rctcwyvrn commented Jul 12, 2022

@swift-ci test

1 similar comment
@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

@rctcwyvrn rctcwyvrn merged commit 7752047 into swiftlang:main Jul 12, 2022
rctcwyvrn added a commit to rctcwyvrn/swift-experimental-string-processing that referenced this pull request Jul 12, 2022
)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ascii character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all ascii strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 19, 2022
)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ascii character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all ascii strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ascii character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all ascii strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
hamishknight pushed a commit to rctcwyvrn/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ascii character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all ascii strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
)

- Adds new instructions for matching characters and scalars case insensitively
- Compiles ascii character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all ascii strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants