memoize minimumTokenRankContainingGrapheme #1825

Merged
merged 2 commits into from
Sep 6, 2023

Conversation

josharian
Collaborator

minimumTokenRankContainingGrapheme was accidentally
quadratic in the number of tokens sharing a grapheme.

It was executed for every token, and for each token,
it considered every single other token sharing a candidate grapheme.

It dominated hat allocation performance for larger numbers of tokens.

Memoizing the result by grapheme text makes it linear again.

No functional changes. (Confirmed by hat golden tests on another branch.)
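The fix described above can be sketched as a generic keyed memoizer. This is a hypothetical, simplified stand-in (the names `memoizeByKey`, `Candidate`, and `graphemeText` are illustrative, not the actual types in `HatMetrics.ts`): computing the metric once per distinct grapheme text, instead of once per token, is what makes the overall pass linear again.

```typescript
// Hypothetical sketch: memoize a per-candidate metric by a key function.
// `Candidate` is a simplified stand-in for the real HatCandidate type.
type Candidate = { graphemeText: string; tokenRank: number };
type Metric = (c: Candidate) => number;

function memoizeByKey(fn: Metric, key: (c: Candidate) => string): Metric {
  const cache = new Map<string, number>();
  return (c) => {
    const k = key(c);
    const hit = cache.get(k);
    if (hit !== undefined) {
      return hit; // computed before for this grapheme text
    }
    const result = fn(c);
    cache.set(k, result);
    return result;
  };
}
```

With this shape, candidates sharing a grapheme text pay for the expensive scan only once, so total work is proportional to the number of distinct keys rather than the square of the token count.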

Checklist

  • [/] I have added tests
  • [/] I have updated the docs and cheatsheet
  • [/] I have not broken the cheatsheet

Member

@pokey pokey left a comment


Nice find. I don't love that we rely on the honour system to determine the cache key, though. I'd prefer if we memoized the sub-function that sees only hatCandidate.grapheme.text. Does that make sense?

@josharian
Collaborator Author

I don't love that we rely on the honour system to determine the cache key, though

I hear you. I'm not sure there's a good alternative.

Does that make sense?

is this what you meant?

--- a/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
+++ b/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
@@ -50,14 +50,12 @@ export function minimumTokenRankContainingGrapheme(
   tokenRank: number,
   graphemeTokenRanks: { [key: string]: number[] },
 ): HatMetric {
-  return memoizedHatMetric(
-    ({ grapheme: { text } }): number => {
+  return ({ grapheme: { text } }): number =>
+    memoizedHatScore(() => {
       return (
         min(graphemeTokenRanks[text].filter((r) => r > tokenRank)) ?? Infinity
       );
-    },
-    ({ grapheme }) => grapheme.text,
-  );
+    })();
 }
 
 /**
@@ -93,27 +91,21 @@ export function penaltyEquivalenceClass(hatStability: HatStability): HatMetric {
 }
 
 /**
- * Memoizes a hat metric based on a key function.
+ * Memoizes a function that returns a number.
  * Hat allocation can be highly repetitive across any given dimension
  * (grapheme, hat style, etc).
  * This helps us avoid accidentally quadratic behavior in the number of tokens
  * in minimumTokenRankContainingGrapheme.
- * @param fn The hat metric to memoize
  * @param key A function that returns a key for a given hat candidate
- * @returns A memoized version of the hat metric
+ * @returns A memoized version of the function
  */
-function memoizedHatMetric(
-  fn: HatMetric,
-  key: (hat: HatCandidate) => any,
-): HatMetric {
-  const cache = new Map<any, number>();
-  return (hat: HatCandidate): number => {
-    const k = key(hat);
-    if (cache.has(k)) {
-      return cache.get(k) as number;
+function memoizedHatScore(fn: () => number): () => number {
+  let cache: number | undefined = undefined;
+  return (): number => {
+    if (cache != null) {
+      return cache;
     }
-    const result = fn(hat);
-    cache.set(k, result);
-    return result;
+    cache = fn();
+    return cache;
   };
 }

if so, unfortunately, it doesn't work--the memoization happens at the wrong level to be effective.
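The wrong-level problem can be seen in a minimal, self-contained sketch (hypothetical names, not the PR's actual code): when a single-slot memoizer wraps a closure that is created fresh on every call, each invocation gets its own empty cache, so nothing is ever reused.

```typescript
// Hypothetical sketch: a single-slot memoizer applied at the wrong level.
function memoizeOnce(fn: () => number): () => number {
  let cache: number | undefined;
  return () => {
    if (cache !== undefined) {
      return cache;
    }
    cache = fn();
    return cache;
  };
}

let expensiveCalls = 0;
function expensive(): number {
  expensiveCalls += 1;
  return 42;
}

// Wrong level: a new memoized wrapper (with a new empty cache) is built
// on every call, so `expensive` still runs each time.
function metric(): number {
  return memoizeOnce(() => expensive())();
}
metric();
metric();
// expensiveCalls is now 2 — the cache never survived between calls.
```

For the cache to help, it has to live outside the per-call scope and be shared across all candidates, keyed on the value the computation actually depends on.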

@josharian
Collaborator Author

it is also worth mentioning that I have designs on deleting minimumTokenRankContainingGrapheme entirely. :)

but I'm not yet at a point where I can confidently assert that that will actually happen. :P

@pokey
Member

pokey commented Sep 1, 2023

I was thinking something like 37ba8bc, but maybe I'm missing something? Worth checking that it still works / has desired performance effect tho; not sure how to do that

If 37ba8bc works / looks good; feel free to merge this one

Btw how far are we from getting your hat tests working? Is that just waiting on a review from me? Would be great if you didn't have to just keep testing this stuff locally 😅

Would also be cool if we could get performance regression tests on hats, but I know performance regression testing is a challenge so I'm fine with punting on that

@josharian
Collaborator Author

far are we from getting your hat tests working? Is that just waiting on a review from me?

i jotted down a todo list at #1815 (comment)

@josharian
Collaborator Author

josharian commented Sep 1, 2023

Would also be cool if we could get performance regression tests on hats

Once #1815 is in, any significant performance regression will likely cause test timeouts in CI, which is...something?

@josharian
Collaborator Author

I was thinking something like 37ba8bc

oooh, library memoize. :) will look soon.

@josharian
Collaborator Author

checking that it still works / has desired performance effect

finally tested this--looks good! thanks, this is much better.

lodash docs:

By default, the first argument provided to the memoized function is used as the map cache key.

I feel so deeply ambivalent about that.

but it works! ship it!
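The ambivalence above is about a real footgun: by default, lodash's `_.memoize` keys its cache on the first argument only, so any dependence on later arguments is silently ignored unless you pass a resolver. A dependency-free sketch of that default behavior (`memoizeLikeLodash` is a hypothetical stand-in written here for illustration, not the lodash implementation):

```typescript
// Hypothetical stand-in mimicking lodash's default: cache key = first argument.
function memoizeLikeLodash<A, B, R>(fn: (a: A, b: B) => R): (a: A, b: B) => R {
  const cache = new Map<A, R>();
  return (a, b) => {
    if (cache.has(a)) {
      return cache.get(a)!; // later arguments never consulted
    }
    const result = fn(a, b);
    cache.set(a, result);
    return result;
  };
}

const add = memoizeLikeLodash((a: number, b: number) => a + b);
add(1, 2); // computes 3 and caches it under key 1
add(1, 9); // returns the cached 3 — the second argument is ignored
```

In this PR it happens to be safe because the memoized sub-function depends only on the grapheme text, but it is the kind of default that deserves the side-eye it got.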

@josharian josharian added this pull request to the merge queue Sep 6, 2023
Merged via the queue into main with commit 6e2b5ff Sep 6, 2023
@josharian josharian deleted the josh/memoize branch September 6, 2023 00:32
@pokey
Member

pokey commented Sep 6, 2023

lodash docs:

By default, the first argument provided to the memoized function is used as the map cache key.

I feel so deeply ambivalent about that.

😅 yeah I had the same reaction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants