memoize minimumTokenRankContainingGrapheme #1825

Merged
merged 2 commits into from
Sep 6, 2023

Conversation

josharian
Collaborator

minimumTokenRankContainingGrapheme was accidentally
quadratic in the number of tokens sharing a grapheme.

It was executed for every token, and for each token,
it considered every single other token sharing a candidate grapheme.

It dominated hat allocation performance for larger numbers of tokens.

Memoizing the result by grapheme text makes it linear again.

No functional changes. (Confirmed by hat golden tests on another branch.)
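The fix described above can be sketched as a generic keyed memoizer. This is a hypothetical, simplified stand-in (the names `memoizeByKey`, `Candidate`, and `graphemeText` are illustrative, not the actual types in `HatMetrics.ts`): computing the metric once per distinct grapheme text, instead of once per token, is what makes the overall pass linear again.

```typescript
// Hypothetical sketch: memoize a per-candidate metric by a key function.
// `Candidate` is a simplified stand-in for the real HatCandidate type.
type Candidate = { graphemeText: string; tokenRank: number };
type Metric = (c: Candidate) => number;

function memoizeByKey(fn: Metric, key: (c: Candidate) => string): Metric {
  const cache = new Map<string, number>();
  return (c) => {
    const k = key(c);
    const hit = cache.get(k);
    if (hit !== undefined) {
      return hit; // computed before for this grapheme text
    }
    const result = fn(c);
    cache.set(k, result);
    return result;
  };
}
```

With this shape, candidates sharing a grapheme text pay for the expensive scan only once, so total work is proportional to the number of distinct keys rather than the square of the token count.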

Checklist

  • [/] I have added tests
  • [/] I have updated the docs and cheatsheet
  • [/] I have not broken the cheatsheet

Member

@pokey pokey left a comment


Nice find. I don't love that we rely on the honour system to determine the cache key, though. I'd prefer if we memoized the sub-function that sees only hatCandidate.grapheme.text. Does that make sense?

@josharian
Collaborator Author

I don't love that we rely on the honour system to determine the cache key, though

I hear you. I'm not sure there's a good alternative.

Does that make sense?

is this what you meant?

--- a/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
+++ b/packages/cursorless-engine/src/util/allocateHats/HatMetrics.ts
@@ -50,14 +50,12 @@ export function minimumTokenRankContainingGrapheme(
   tokenRank: number,
   graphemeTokenRanks: { [key: string]: number[] },
 ): HatMetric {
-  return memoizedHatMetric(
-    ({ grapheme: { text } }): number => {
+  return ({ grapheme: { text } }): number =>
+    memoizedHatScore(() => {
       return (
         min(graphemeTokenRanks[text].filter((r) => r > tokenRank)) ?? Infinity
       );
-    },
-    ({ grapheme }) => grapheme.text,
-  );
+    })();
 }
 
 /**
@@ -93,27 +91,21 @@ export function penaltyEquivalenceClass(hatStability: HatStability): HatMetric {
 }
 
 /**
- * Memoizes a hat metric based on a key function.
+ * Memoizes a function that returns a number.
  * Hat allocation can be highly repetitive across any given dimension
  * (grapheme, hat style, etc).
  * This helps us avoid accidentally quadratic behavior in the number of tokens
  * in minimumTokenRankContainingGrapheme.
- * @param fn The hat metric to memoize
  * @param key A function that returns a key for a given hat candidate
- * @returns A memoized version of the hat metric
+ * @returns A memoized version of the function
  */
-function memoizedHatMetric(
-  fn: HatMetric,
-  key: (hat: HatCandidate) => any,
-): HatMetric {
-  const cache = new Map<any, number>();
-  return (hat: HatCandidate): number => {
-    const k = key(hat);
-    if (cache.has(k)) {
-      return cache.get(k) as number;
+function memoizedHatScore(fn: () => number): () => number {
+  let cache: number | undefined = undefined;
+  return (): number => {
+    if (cache != null) {
+      return cache;
     }
-    const result = fn(hat);
-    cache.set(k, result);
-    return result;
+    cache = fn();
+    return cache;
   };
 }

if so, unfortunately, it doesn't work--the memoization happens at the wrong level to be effective.
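The wrong-level problem can be seen in a minimal, self-contained sketch (hypothetical names, not the PR's actual code): when a single-slot memoizer wraps a closure that is created fresh on every call, each invocation gets its own empty cache, so nothing is ever reused.

```typescript
// Hypothetical sketch: a single-slot memoizer applied at the wrong level.
function memoizeOnce(fn: () => number): () => number {
  let cache: number | undefined;
  return () => {
    if (cache !== undefined) {
      return cache;
    }
    cache = fn();
    return cache;
  };
}

let expensiveCalls = 0;
function expensive(): number {
  expensiveCalls += 1;
  return 42;
}

// Wrong level: a new memoized wrapper (with a new empty cache) is built
// on every call, so `expensive` still runs each time.
function metric(): number {
  return memoizeOnce(() => expensive())();
}
metric();
metric();
// expensiveCalls is now 2 — the cache never survived between calls.
```

For the cache to help, it has to live outside the per-call scope and be shared across all candidates, keyed on the value the computation actually depends on.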

@josharian
Collaborator Author

it is also worth mentioning that I have designs on deleting minimumTokenRankContainingGrapheme entirely. :)

but I'm not yet at a point where I can confidently assert that that will actually happen. :P

@pokey
Member

pokey commented Sep 1, 2023

I was thinking something like 37ba8bc, but maybe I'm missing something? Worth checking that it still works / has desired performance effect tho; not sure how to do that

If 37ba8bc works / looks good; feel free to merge this one

Btw how far are we from getting your hat tests working? Is that just waiting on a review from me? Would be great if you didn't have to just keep testing this stuff locally 😅

Would also be cool if we could get performance regression tests on hats, but I know performance regression testing is a challenge so I'm fine with punting on that

@josharian
Collaborator Author

far are we from getting your hat tests working? Is that just waiting on a review from me?

i jotted down a todo list at #1815 (comment)

@josharian
Collaborator Author

josharian commented Sep 1, 2023

Would also be cool if we could get performance regression tests on hats

Once #1815 is in, any significant performance regression will likely cause test timeouts in CI, which is...something?

@josharian
Collaborator Author

I was thinking something like 37ba8bc

oooh, library memoize. :) will look soon.

@josharian
Collaborator Author

checking that it still works / has desired performance effect

finally tested this--looks good! thanks, this is much better.

lodash docs:

By default, the first argument provided to the memoized function is used as the map cache key.

I feel so deeply ambivalent about that.

but it works! ship it!
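The ambivalence above is about a real footgun: by default, lodash's `_.memoize` keys its cache on the first argument only, so any dependence on later arguments is silently ignored unless you pass a resolver. A dependency-free sketch of that default behavior (`memoizeLikeLodash` is a hypothetical stand-in written here for illustration, not the lodash implementation):

```typescript
// Hypothetical stand-in mimicking lodash's default: cache key = first argument.
function memoizeLikeLodash<A, B, R>(fn: (a: A, b: B) => R): (a: A, b: B) => R {
  const cache = new Map<A, R>();
  return (a, b) => {
    if (cache.has(a)) {
      return cache.get(a)!; // later arguments never consulted
    }
    const result = fn(a, b);
    cache.set(a, result);
    return result;
  };
}

const add = memoizeLikeLodash((a: number, b: number) => a + b);
add(1, 2); // computes 3 and caches it under key 1
add(1, 9); // returns the cached 3 — the second argument is ignored
```

In this PR it happens to be safe because the memoized sub-function depends only on the grapheme text, but it is the kind of default that deserves the side-eye it got.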

@josharian josharian added this pull request to the merge queue Sep 6, 2023
Merged via the queue into main with commit 6e2b5ff Sep 6, 2023
@josharian josharian deleted the josh/memoize branch September 6, 2023 00:32
@pokey
Member

pokey commented Sep 6, 2023

lodash docs:

By default, the first argument provided to the memoized function is used as the map cache key.

I feel so deeply ambivalent about that.

😅 yeah I had the same reaction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants