Conversation

@AStepanov25 AStepanov25 commented Jun 24, 2025

Profiling results: time is the average over n iterations; memory is the cumulative garbage generated by calling the corresponding method.

More details: https://github.com/research-ag/canister-profiling/tree/list

n = 100000

Time:

| method | List | Refactored |
|---|---:|---:|
| get | 205 | 205 |
| getOpt | 253 | 246 |
| put | 253 | 225 |
| forEach | 106 | 105 |
| reverseForEach | 133 | 112 |
| find | 196 | 127 |
| findIndex | 163 | 127 |
| findLastIndex | 203 | 134 |
| all | 175 | 122 |
| any | 163 | 127 |
| repeat | 15 | 15 |
| addRepeat | 16 | 16 |
| fromArray | 164 | 156 |
| fromVarArray | 164 | 156 |
| toArray | 155 | 155 |
| toVarArray | 223 | 167 |
| toText | 446 | 322 |
| map | 169 | 152 |
| clone | 188 | 116 |
| min | 183 | 166 |
| max | 183 | 166 |
| size | 127 | 127 |

Memory:

| method | List | Refactored |
|---|---:|---:|
| get | 0 | 0 |
| getOpt | 0 | 0 |
| put | 0 | 0 |
| forEach | 8 | 0 |
| reverseForEach | 0 | 0 |
| find | 172 | 16 |
| findIndex | 8 | 0 |
| findLastIndex | 0 | 0 |
| all | 24 | 0 |
| any | 8 | 0 |
| repeat | 408688 | 408688 |
| addRepeat | 406544 | 406544 |
| fromArray | 408716 | 408688 |
| fromVarArray | 408716 | 408688 |
| toArray | 400180 | 400084 |
| toVarArray | 400180 | 400008 |
| toText | 3200164 | 3199992 |
| map | 425036 | 408688 |
| clone | 425032 | 409672 |
| min | 36 | 0 |
| max | 36 | 0 |
| size | 0 | 0 |

@AStepanov25 AStepanov25 requested a review from a team as a code owner June 24, 2025 14:57
```motoko
while (i < blocksCount) {
  let oldBlock = list.blocks[i];
  let blockSize = oldBlock.size();
  let newBlock = VarArray.repeat<?R>(null, blockSize);
```

How would it perform if we used the VarArray.map function on the data blocks?

Contributor Author

First, we have an array of options, and if we encounter `null` we return early; this early return can speed the whole method up a little.

Second, `map` uses `tabulate`, which uses closures, which can be less efficient and can generate garbage.
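
A minimal sketch of the loop shape described here (the helper name and exact structure are illustrative, not the PR's code): copy elements one by one and stop at the first `null`, since a data block is filled from the left.

```motoko
import VarArray "mo:core/VarArray";

// Hypothetical helper illustrating the early-return copy described above.
// A data block is filled from the left, so the first `null` means the
// rest of the block is empty and copying can stop.
func copyBlock<R>(oldBlock : [var ?R]) : [var ?R] {
  let blockSize = oldBlock.size();
  let newBlock = VarArray.repeat<?R>(null, blockSize);
  var j = 0;
  label scan while (j < blockSize) {
    switch (oldBlock[j]) {
      case null { break scan }; // early exit instead of copying nulls
      case (?x) { newBlock[j] := ?x }
    };
    j += 1
  };
  newBlock
};
```

In contrast, `VarArray.map` would visit every slot and allocate a closure for the mapping function, which is the garbage referred to above.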

Comment on lines +635 to +644
```motoko
let (a, b) = do {
  let i = Nat32.fromNat(index);
  let lz = Nat32.bitcountLeadingZero(i);
  let lz2 = lz >> 1;
  if (lz & 1 == 0) {
    (Nat32.toNat(((i << lz2) >> 16) ^ (0x10000 >> lz2)), Nat32.toNat(i & (0xFFFF >> lz2)))
  } else {
    (Nat32.toNat(((i << lz2) >> 15) ^ (0x18000 >> lz2)), Nat32.toNat(i & (0x7FFF >> lz2)))
  }
};
```

Not sure if this inlining is worth it. We're only saving a function call, right?

If the gain is so small then readability is probably worth more.

Contributor Author

`put` and `getOpt`, along with `get`, are the most used methods of List, so they should be optimized at any cost. In any case, there is already a lot of code duplication there.
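
For reference, the factored-out form of the computation being discussed would look roughly like this (a sketch; the body is the same bit trick as in the diff):

```motoko
import Nat32 "mo:core/Nat32";

// Sketch of a standalone `locate` helper, i.e. the non-inlined version of
// the bit trick in the diff: it maps an element index to a pair
// (data block index, index within that block).
func locate(index : Nat) : (Nat, Nat) {
  let i = Nat32.fromNat(index);
  let lz = Nat32.bitcountLeadingZero(i);
  let lz2 = lz >> 1;
  if (lz & 1 == 0) {
    (Nat32.toNat(((i << lz2) >> 16) ^ (0x10000 >> lz2)), Nat32.toNat(i & (0xFFFF >> lz2)))
  } else {
    (Nat32.toNat(((i << lz2) >> 15) ^ (0x18000 >> lz2)), Nat32.toNat(i & (0x7FFF >> lz2)))
  }
};
```

For example, this maps index 0 to (1, 0), index 3 to (3, 1), and index 5 to (4, 1). Inlining the body into `get`, `getOpt`, and `put` saves one function call per access, at the cost of duplicating these lines.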

src/List.mo Outdated
Comment on lines 663 to 672
```motoko
let (a, b) = do {
  let i = Nat32.fromNat(index);
  let lz = Nat32.bitcountLeadingZero(i);
  let lz2 = lz >> 1;
  if (lz & 1 == 0) {
    (Nat32.toNat(((i << lz2) >> 16) ^ (0x10000 >> lz2)), Nat32.toNat(i & (0xFFFF >> lz2)))
  } else {
    (Nat32.toNat(((i << lz2) >> 15) ^ (0x18000 >> lz2)), Nat32.toNat(i & (0x7FFF >> lz2)))
  }
};
```

same as above

src/List.mo Outdated
Comment on lines 917 to 921
```motoko
if (predicate(x)) return ?size<T>({
  var blocks = [var];
  var blockIndex = blockIndex;
  var elementIndex = elementIndex
})
```

Is it worth defining a size_ (internal) function that takes blockIndex, elementIndex as arguments and that the public size() can use?

Contributor Author

Done.

Comment on lines +1861 to +1863
```motoko
let blocks1 = list1.blocks;
let blocks2 = list2.blocks;
let blockCount = Nat.min(blocks1.size(), blocks2.size());
```

Is this part worth it? The size calculation only happens once for the whole list. Cost does not depend on length of inputs.

Contributor Author

It's not clear what you mean. Index block sizes can differ even when the sizes of the lists are equal.

Comment on lines +63 to +67
```motoko
public func singleton<T>(element : T) : List<T> = {
  var blockIndex = 2;
  var blocks = [var [var], [var ?element]];
  var elementIndex = 0
};
```

Not sure if worth the optimization. Does not depend on length of list.

Contributor Author

There can be initializations in a loop, for example if the data structure is `List<List<T>>`.
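
A sketch of that situation (names taken from the core `List` API; the example itself is hypothetical, not code from the PR):

```motoko
import List "mo:core/List";

// Hypothetical example: building a List of Lists calls `singleton` once
// per inner list, so its constant cost is paid n times overall.
func buildNested(n : Nat) : List.List<List.List<Nat>> {
  let outer = List.empty<List.List<Nat>>();
  var k = 0;
  while (k < n) {
    List.add(outer, List.singleton<Nat>(k));
    k += 1
  };
  outer
};
```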


@timohanke timohanke left a comment

Left some questions.

Great PR which offers some substantial optimizations.

Generally, I think optimizations are worth it whenever we walk through an entire list, which is the case for most of the changed functions. However, in some (few) cases we are only optimizing something that does not get called repeatedly; that is, the optimization does not depend on the length of the list. In those cases I would not do it if it makes the code harder to read.

I found some cases of this where the locate() or size() function was inlined, and also in the singleton function. Maybe others that I overlooked?

@timohanke

Not all improved functions are visible in the profiling table. For those, it is hard to tell from the code diff alone how big the improvement is and whether it is worth the inlining or not.

@AStepanov25
Contributor Author

> Not all improved functions are visible in the profiling table. For those, it is hard to tell from the code diff alone how big the improvement is and whether it is worth the inlining or not.

I profiled only the required functions; for the others the performance increase is obvious, as the code is similar to the other functions' implementations.

Which functions are you interested in? I'll add profiling if needed.

@timohanke

Ready to merge now from my perspective.

timohanke previously approved these changes Sep 29, 2025
@github-actions github-actions bot dismissed timohanke’s stale review October 4, 2025 14:07

Review dismissed by automation script.

@Andrei1998
Contributor

Very cool to see this change, thank you! I just ran the benchmarks for the new PriorityQueue (which heavily depends on List), and the present change brings consistent improvements in instruction counts in the -4% to -8% range (the benchmarks from bench/PriorityQueues.bench.mo are below). There is only a very small regression in Garbage Collection.

Instructions (Operations)

| benchmark | A) PriorityQueue (Old List) | A) PriorityQueue (New List) | Δ% | B) PriorityQueueSet |
|---|---:|---:|---:|---:|
| 1.) 100000 operations (push:pop = 1:1) | 597_528_283 | 568_913_057 | -4.8% | 522_729_861 |
| 2.) 100000 operations (push:pop = 2:1) | 742_952_999 | 707_495_424 | -4.8% | 809_693_415 |
| 3.) 100000 operations (push:pop = 10:1) | 357_911_737 | 336_409_578 | -6.0% | 873_181_028 |
| 4.) 100000 operations (only push) | 192_422_882 | 176_982_954 | -8.0% | 886_824_792 |
| 5.) 50000 pushes, then 50000 pops | 776_632_572 | 745_226_615 | -4.0% | 961_776_534 |
| 6.) 50000 pushes, then 25000 "pop;push"es | 529_475_053 | 504_254_228 | -4.8% | 922_137_111 |

Heap (likely broken at the moment)

| benchmark | A) PriorityQueue (Old List) | A) PriorityQueue (New List) | Δ% | B) PriorityQueueSet |
|---|---:|---:|---:|---:|
| 1.) 100000 operations (push:pop = 1:1) | 272 B | 272 B | 0% | 272 B |
| 2.) 100000 operations (push:pop = 2:1) | 272 B | 272 B | 0% | 272 B |
| 3.) 100000 operations (push:pop = 10:1) | 272 B | 272 B | 0% | 272 B |
| 4.) 100000 operations (only push) | 272 B | 272 B | 0% | 272 B |
| 5.) 50000 pushes, then 50000 pops | 272 B | 272 B | 0% | 272 B |
| 6.) 50000 pushes, then 25000 "pop;push"es | 272 B | 272 B | 0% | 272 B |

Garbage Collection

| benchmark | A) PriorityQueue (Old List) | A) PriorityQueue (New List) | Δ% | B) PriorityQueueSet |
|---|---:|---:|---:|---:|
| 1.) 100000 operations (push:pop = 1:1) | 15.03 MiB | 15.07 MiB | +0.3% | 17.43 MiB |
| 2.) 100000 operations (push:pop = 2:1) | 19.73 MiB | 19.73 MiB | 0% | 19.32 MiB |
| 3.) 100000 operations (push:pop = 10:1) | 8.67 MiB | 8.67 MiB | 0% | 12.64 MiB |
| 4.) 100000 operations (only push) | 3.87 MiB | 3.87 MiB | 0% | 9.96 MiB |
| 5.) 50000 pushes, then 50000 pops | 22.03 MiB | 22.03 MiB | 0% | 26.20 MiB |
| 6.) 50000 pushes, then 25000 "pop;push"es | 14.22 MiB | 14.22 MiB | 0% | 18.44 MiB |

@timohanke

Which List operations are used in the test for which you observed the garbage increase?

@Andrei1998
Contributor

Andrei1998 commented Oct 6, 2025

We use the same operations in all benchmarks except number 4.) [which only uses PriorityQueue.push, all others do PriorityQueue.pop as well]:

  • PriorityQueue.push uses List.add, List.size, List.put, List.at.
  • PriorityQueue.pop uses List.removeLast, List.size, List.isEmpty, List.put, List.at, List.get.

The list always starts empty as follows:

```motoko
let priorityQueue = PriorityQueue.empty<Nat>();
```

Yet interestingly, the anomaly is for number 1.) [not number 4.)].

Number 1.) basically consists of a random sequence of PriorityQueue.push and PriorityQueue.pop, with an equal probability of each entry being a push or a pop. The same anomaly does not show for 2.), where push is twice as likely as pop. The length of the underlying List in 1.) and 2.) is a $\pm 1$ random walk, with the $+1$ and $-1$ probabilities being $0.5/0.5$ or $0.66/0.33$, respectively. Number 3.) does the same thing, just with an even higher probability of increasing the length of the list. My current hypothesis is that what is special about 1.) is that it grows and shrinks the List many times, in contrast to the others, which mostly grow it, or grow and shrink it only once. Hence, this could point to something like a memory leak (maybe not exactly, but something in that direction, i.e., reallocation related).

@timohanke

The garbage increased by 0.4 bytes per operation. I wonder if the garbage increase can actually mean that the code got better (for example, freeing more after a pop)?

How often do you think the random walk hits length 0? I also wonder if it can be related to how much is freed when reaching the empty list again.

@Andrei1998
Contributor

Andrei1998 commented Oct 6, 2025

> The garbage increased by 0.4 bytes per operation. I wonder if the garbage increase can actually mean that the code got better (for example, freeing more after a pop)?

This could also be the case, indeed. It could be that we used to have extra data lying around for no good reason and we are now deallocating it more eagerly. Alternatively, it could be that we now also allocate more data, and hence also need to free more. It might also be a bit of both. Technically, if we knew after a pop that the queue was going to grow again (which in this benchmark happens a lot), it would be better not to deallocate memory eagerly, because the queue will grow again later. So, in general, it's a tradeoff: deallocate too eagerly and then reallocate later when the queue grows again, or deallocate less aggressively and risk that no more operations come in the future (and hence waste space long-term). So, a good working hypothesis is that we now deallocate more, and hence also need to reallocate more when the queue grows back.

> How often do you think the random walk hits length 0? I also wonder if it can be related to how much is freed when reaching the empty list again.

Digging into this further might take more effort, but at least a quick and dirty experiment, written with the help of LLMs, shows that we should hit 0 around 500 times (the LLM computed around 504 when asked to do the math directly). Some of those 500 were probably already close to 0, but the expected maximum length of the list during the test should grow roughly with the square root of the number of operations. Of course, I could have just run the test with the fixed seed from the benchmark, but this should at least shed some preliminary light on the behavior.
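
As a back-of-the-envelope check (my own estimate, assuming a symmetric $\pm 1$ walk where a pop on an empty queue leaves the length at 0), the expected number of returns to the origin over $n$ steps is

$$\sum_{k=1}^{n/2} P(S_{2k} = 0) = \sum_{k=1}^{n/2} \binom{2k}{k} 2^{-2k} \approx \sum_{k=1}^{n/2} \frac{1}{\sqrt{\pi k}} \approx \sqrt{\frac{2n}{\pi}} \approx 252 \quad \text{for } n = 100000.$$

Since the walk is lazy at 0 (from length 0, a pop keeps it at 0 with probability $0.5$), each return stretches to about 2 consecutive zero-steps on average, giving roughly $2 \cdot 252 \approx 504$ hits, consistent with the number above.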

rvanasa previously approved these changes Oct 13, 2025
Collaborator

@rvanasa rvanasa left a comment

Is there any remaining work for this PR? Happy to merge once ready.

@github-actions github-actions bot dismissed rvanasa’s stale review October 18, 2025 14:10

Review dismissed by automation script.
