memoize minimumTokenRankContainingGrapheme #1825
Conversation
minimumTokenRankContainingGrapheme was accidentally quadratic in the number of tokens sharing a grapheme. It was executed for every token, and for each token it considered every other token sharing a candidate grapheme. This dominated hat allocation performance for larger numbers of tokens. Memoizing the result by grapheme text makes it linear again. No functional changes (confirmed by hat golden tests on another branch).
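For illustration, a minimal sketch of the approach the description refers to (not the PR's actual diff; HatCandidate and HatMetric here are simplified stand-ins for the real types): the metric built for a token caches its result per grapheme text, so a grapheme's token-rank list is scanned once per token rather than once per hat candidate.

```typescript
// Sketch only: simplified stand-ins for the real HatCandidate/HatMetric types.
interface HatCandidate {
  grapheme: { text: string };
}
type HatMetric = (hat: HatCandidate) => number;

// lodash-style min: returns undefined for an empty array.
function min(values: number[]): number | undefined {
  return values.length === 0 ? undefined : Math.min(...values);
}

export function minimumTokenRankContainingGrapheme(
  tokenRank: number,
  graphemeTokenRanks: { [key: string]: number[] },
): HatMetric {
  // One cache per metric instance (per token), keyed by grapheme text, so the
  // token-rank list for a grapheme is scanned once instead of once per candidate.
  const cache = new Map<string, number>();
  return ({ grapheme: { text } }) => {
    let result = cache.get(text);
    if (result === undefined) {
      result =
        min(graphemeTokenRanks[text].filter((r) => r > tokenRank)) ?? Infinity;
      cache.set(text, result);
    }
    return result;
  };
}
```

The review below ends up preferring a library memoize over a hand-rolled cache, but the caching shape is the same.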
Nice find. I don't love that we rely on the honour system to determine the cache key, though. I'd prefer if we memoized the sub-function that sees only hatCandidate.grapheme.text. Does that make sense?
I hear you. I'm not sure there's a good alternative.
is this what you meant?

--- a/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
+++ b/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
@@ -50,14 +50,12 @@ export function minimumTokenRankContainingGrapheme(
   tokenRank: number,
   graphemeTokenRanks: { [key: string]: number[] },
 ): HatMetric {
-  return memoizedHatMetric(
-    ({ grapheme: { text } }): number => {
+  return ({ grapheme: { text } }): number =>
+    memoizedHatScore(() => {
       return (
         min(graphemeTokenRanks[text].filter((r) => r > tokenRank)) ?? Infinity
       );
-    },
-    ({ grapheme }) => grapheme.text,
-  );
+    })();
 }
 
 /**
@@ -93,27 +91,21 @@ export function penaltyEquivalenceClass(hatStability: HatStability): HatMetric {
 }
 
 /**
- * Memoizes a hat metric based on a key function.
+ * Memoizes a function that returns a number.
  * Hat allocation can be highly repetitive across any given dimension
  * (grapheme, hat style, etc).
 * This helps us avoid accidentally quadratic behavior in the number of tokens
  * in minimumTokenRankContainingGrapheme.
- * @param fn The hat metric to memoize
  * @param key A function that returns a key for a given hat candidate
- * @returns A memoized version of the hat metric
+ * @returns A memoized version of the function
  */
-function memoizedHatMetric(
-  fn: HatMetric,
-  key: (hat: HatCandidate) => any,
-): HatMetric {
-  const cache = new Map<any, number>();
-  return (hat: HatCandidate): number => {
-    const k = key(hat);
-    if (cache.has(k)) {
-      return cache.get(k) as number;
+function memoizedHatScore(fn: () => number): () => number {
+  let cache: number | undefined = undefined;
+  return (): number => {
+    if (cache != null) {
+      return cache;
     }
-    const result = fn(hat);
-    cache.set(k, result);
-    return result;
+    cache = fn();
+    return cache;
   };
 }

if so, unfortunately, it doesn't work--the memoization happens at the wrong level to be effective.
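To make the "wrong level" point concrete: in the diff above, memoizedHatScore is invoked inside the returned metric, so a fresh cache is created for every hat candidate and no result is ever reused. A self-contained illustration (computeScore is a hypothetical stand-in for the metric body):

```typescript
// Same shape as memoizedHatScore in the diff above.
function memoizedHatScore(fn: () => number): () => number {
  let cache: number | undefined = undefined;
  return () => {
    if (cache == null) {
      cache = fn();
    }
    return cache;
  };
}

let calls = 0;
const computeScore = () => {
  calls += 1;
  return 42;
};

// Pattern from the diff: the wrapper (and its cache) is rebuilt on every call,
// so the underlying computation runs every time.
for (let i = 0; i < 3; i++) {
  memoizedHatScore(computeScore)();
}
console.log(calls); // 3 -- no reuse

// Memoizing once, outside the per-candidate path, actually caches.
calls = 0;
const memoized = memoizedHatScore(computeScore);
for (let i = 0; i < 3; i++) {
  memoized();
}
console.log(calls); // 1
```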
It is also worth mentioning that I have designs on deleting minimumTokenRankContainingGrapheme entirely. :) but I'm not yet at a point where I can confidently assert that that will actually happen. :P
I was thinking something like 37ba8bc, but maybe I'm missing something? Worth checking that it still works / has the desired performance effect tho; not sure how to do that.

If 37ba8bc works / looks good, feel free to merge this one.

Btw, how far are we from getting your hat tests working? Is that just waiting on a review from me? Would be great if you didn't have to just keep testing this stuff locally 😅 Would also be cool if we could get performance regression tests on hats, but I know performance regression testing is a challenge, so I'm fine with punting on that.
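A rough sketch of the shape being suggested, assuming the library memoize mentioned later in the thread (lodash, which keys its cache on the first argument by default) applied to a sub-function that sees only the grapheme text; rankForGrapheme is an illustrative name, and the exact contents of 37ba8bc may differ:

```typescript
import { memoize, min } from "lodash";

// Simplified stand-ins for the real types.
interface HatCandidate {
  grapheme: { text: string };
}
type HatMetric = (hat: HatCandidate) => number;

export function minimumTokenRankContainingGrapheme(
  tokenRank: number,
  graphemeTokenRanks: { [key: string]: number[] },
): HatMetric {
  // Memoize a helper that takes only the grapheme text, so the cache key is
  // exactly the function's input rather than an honour-system key callback.
  const rankForGrapheme = memoize(
    (text: string): number =>
      min(graphemeTokenRanks[text].filter((r) => r > tokenRank)) ?? Infinity,
  );
  return ({ grapheme: { text } }) => rankForGrapheme(text);
}
```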
I jotted down a todo list at #1815 (comment).
Once #1815 is in, if there are any significant performance regressions, they will likely cause test timeouts in CI, which is... something?
oooh, library memoize. :) will look soon.
finally tested this--looks good! thanks, this is much better. lodash docs:
I feel so deeply ambivalent about that. but it works! ship it!
😅 yeah I had the same reaction