Consider the regular expression `"(¬")*"`, which matches some text enclosed in double quotes. There are currently two ways to run the "tight loop":

1. We may use the "scalar" DFA and check each character in turn, exiting the loop when we encounter a `"` character.
2. We may use SIMD and scan several characters at a time.
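Here is a minimal sketch of the two loops for the body of `"(¬")*"`, written in Python for illustration only. Both functions return the index of the first `"` at or after `start`, or `len(text)` if there is none. `scan_wide` is a hypothetical stand-in for the SIMD loop. It examines a block of `WIDTH` characters per iteration, using `str.find` over the block in place of a vector compare. `WIDTH = 16` is an assumed vector width, not a measured one.

```python
QUOTE = '"'
WIDTH = 16  # hypothetical vector width, in characters


def scan_scalar(text: str, start: int) -> int:
    """Scalar loop: test one character per iteration."""
    i = start
    while i < len(text):
        if text[i] == QUOTE:
            return i
        i += 1
    return len(text)


def scan_wide(text: str, start: int) -> int:
    """SIMD stand-in: test a whole block of WIDTH characters per iteration."""
    i, n = start, len(text)
    while i + WIDTH <= n:
        j = text.find(QUOTE, i, i + WIDTH)  # plays the role of one vector compare
        if j != -1:
            return j
        i += WIDTH
    # The tail is shorter than one block, so finish character by character.
    return scan_scalar(text, i)
```

For a string shorter than `WIDTH`, `scan_wide` never enters its block loop and does exactly the scalar work. This is one intuition for why the wide loop cannot win on short inputs.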
The SIMD loop is faster than the scalar loop for long texts, but not for short ones. (I suspect the crossover is somewhere below one SIMD vector's width, but I don't know.) To get the best performance, we should therefore choose the more appropriate tight loop implementation. The regular expression gives no indication of the lengths to expect, and we have no way to annotate such things, so we must rely on runtime feedback. That is probably for the better anyway, in terms of performance gained per unit of effort on the user's part.
According to Cliff Click (from a conversation in the Coffee Compiler Club), it should suffice to maintain a count of how many times the tight loop was iterated. Upon entering the loop, we compare the count from the previous run to some crossover point. If the count is higher, we take the SIMD route; otherwise we use the scalar code.
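Here is a minimal sketch of that counter-based dispatch. It assumes each tight loop is a function `(text, start) -> end` that returns where the loop stopped. `AdaptiveLoop`, `crossover`, `last_count` and `last_used_simd` are hypothetical names. The recorded count is the number of characters consumed on the previous entry, which is the number of iterations the scalar loop would have run. A real crossover value would have to come from benchmarking.

```python
from typing import Callable

# A tight loop: takes (text, start) and returns the index where it stopped.
Scanner = Callable[[str, int], int]


class AdaptiveLoop:
    """Choose the scalar or SIMD loop based on the previous run's length."""

    def __init__(self, scalar: Scanner, simd: Scanner, crossover: int) -> None:
        self.scalar = scalar
        self.simd = simd
        self.crossover = crossover  # hypothetical tuning parameter
        self.last_count = 0         # nothing seen yet, so start on the scalar path
        self.last_used_simd = False

    def run(self, text: str, start: int) -> int:
        # Compare the count from the previous run against the crossover point.
        use_simd = self.last_count > self.crossover
        loop = self.simd if use_simd else self.scalar
        end = loop(text, start)
        # Record this run's length as feedback for the next entry.
        self.last_count = end - start
        self.last_used_simd = use_simd
        return end
```

With `crossover=16`, a run that consumes 100 characters makes the next entry take the SIMD path. If that next run only consumes 2 characters, the entry after it switches back to scalar. Because the decision costs one comparison per loop entry rather than per character, it stays cheap even when most matches are short.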
According to Gilbert Baumann, there is also a case for unrolling the scalar code, but I haven't tried that yet. For very long loop counts, unrolling the SIMD code may also help, at the cost of more code bloat. I still intend to compile every variant eagerly, because that is simpler, and the duplicated code is not too large anyway.
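The following is a rough sketch of an unrolled scalar loop for the same `"` search. The 4x unrolling factor is an arbitrary assumption. Each iteration of the main loop checks four characters but performs only one bounds check, and a short remainder loop handles the last few characters. In Python this gains nothing. It only shows the shape of the code a compiler might emit.

```python
QUOTE = '"'


def scan_unrolled(text: str, start: int) -> int:
    """Index of the first '"' at or after start, or len(text); unrolled 4x."""
    i, n = start, len(text)
    # Main loop: one bounds check covers four character tests.
    while i + 4 <= n:
        if text[i] == QUOTE:
            return i
        if text[i + 1] == QUOTE:
            return i + 1
        if text[i + 2] == QUOTE:
            return i + 2
        if text[i + 3] == QUOTE:
            return i + 3
        i += 4
    # Remainder: fewer than four characters left.
    while i < n:
        if text[i] == QUOTE:
            return i
        i += 1
    return n
```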
More complex tests, such as the one for `ab|ac`, will already use all of a processor's vector pipes, so unrolling won't help there. We need fairly simple tests to benefit from unrolling SIMD code, e.g. testing for a single code point rather than a range.