Optimize methods in List. #337
base: main
Conversation
while (i < blocksCount) {
  let oldBlock = list.blocks[i];
  let blockSize = oldBlock.size();
  let newBlock = VarArray.repeat<?R>(null, blockSize);
How would it perform if we used the VarArray.map function on the data blocks?
First, we have an array of options, and if we encounter null we return early; this early return can speed the whole method up a little.
Second, map uses tabulate, which uses closures, which can be less efficient and can generate garbage.
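As a cross-language illustration (Python, not the Motoko from the diff), the difference is that the hand-written loop can bail out at the first null, while a map-style copy always touches every slot:

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def copy_block(block: "list[Optional[T]]") -> "list[Optional[T]]":
    # Analogue of the hand-written copy loop: preallocate the target
    # block, copy until the first None (null), then stop early, since
    # the slots after the first None are unused anyway.
    out: "list[Optional[T]]" = [None] * len(block)
    for i, x in enumerate(block):
        if x is None:
            break  # early exit; trailing slots stay None
        out[i] = x
    return out
```

For example, `copy_block([1, 2, None, 99])` returns `[1, 2, None, None]`: the slot after the first None is never inspected, which is the saving the early return buys on partially filled blocks.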
let (a, b) = do {
  let i = Nat32.fromNat(index);
  let lz = Nat32.bitcountLeadingZero(i);
  let lz2 = lz >> 1;
  if (lz & 1 == 0) {
    (Nat32.toNat(((i << lz2) >> 16) ^ (0x10000 >> lz2)), Nat32.toNat(i & (0xFFFF >> lz2)))
  } else {
    (Nat32.toNat(((i << lz2) >> 15) ^ (0x18000 >> lz2)), Nat32.toNat(i & (0x7FFF >> lz2)))
  }
};
Not sure if this inlining is worth it. We're only saving a function call, aren't we?
If the gain is that small, then readability is probably worth more.
put and getOpt, along with get, are the most used methods of List, so they should be optimized at any cost. Anyway, there is a lot of code duplication there.
src/List.mo
Outdated
let (a, b) = do {
  let i = Nat32.fromNat(index);
  let lz = Nat32.bitcountLeadingZero(i);
  let lz2 = lz >> 1;
  if (lz & 1 == 0) {
    (Nat32.toNat(((i << lz2) >> 16) ^ (0x10000 >> lz2)), Nat32.toNat(i & (0xFFFF >> lz2)))
  } else {
    (Nat32.toNat(((i << lz2) >> 15) ^ (0x18000 >> lz2)), Nat32.toNat(i & (0x7FFF >> lz2)))
  }
};
Same as above.
src/List.mo
Outdated
if (predicate(x)) return ?size<T>({
  var blocks = [var];
  var blockIndex = blockIndex;
  var elementIndex = elementIndex
})
Is it worth defining a size_ (internal) function that takes blockIndex, elementIndex as arguments and that the public size() can use?
Done.
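The suggested split might look like the following Python sketch. It is hypothetical and uses a simplified layout (block b holds 2^(b-1) slots) rather than List's actual block scheme; the point is only the size_/size refactoring pattern:

```python
class SimpleList:
    # Toy growable list where data block b holds 2**(b-1) slots.
    # (A simplification of the real block layout, just to show the
    # internal-helper pattern.)

    def __init__(self) -> None:
        self.block_index = 1    # block the next element goes into
        self.element_index = 0  # next free slot inside that block

    @staticmethod
    def _size(block_index: int, element_index: int) -> int:
        # Elements in the full blocks 1..block_index-1 (a geometric
        # sum, 2**(block_index-1) - 1 in this toy layout), plus the
        # partially filled current block.
        return (1 << (block_index - 1)) - 1 + element_index

    def size(self) -> int:
        # The public method delegates to the helper. Internal code
        # such as findIndex can call _size(blockIndex, elementIndex)
        # directly instead of building a temporary record just to
        # call size() on it.
        return self._size(self.block_index, self.element_index)
```

This avoids the `?size<T>({ var blocks = [var]; ... })` construction in the quoted hunk: the throwaway record existed only to reuse size()'s arithmetic.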
let blocks1 = list1.blocks;
let blocks2 = list2.blocks;
let blockCount = Nat.min(blocks1.size(), blocks2.size());
Is this part worth it? The size calculation only happens once for the whole list. Cost does not depend on length of inputs.
It's not clear what you mean. Index block sizes can differ even when the sizes of the lists are equal.
public func singleton<T>(element : T) : List<T> = {
  var blockIndex = 2;
  var blocks = [var [var], [var ?element]];
  var elementIndex = 0
};
Not sure this is worth the optimization. It does not depend on the length of the list.
singleton can be called in a loop, for example if the data structure is List<List<T>>.
Left some questions.
Great PR which offers some substantial optimizations.
Generally, I think optimizations are worth it whenever we walk through an entire list which is the case for most functions that are changed. However, in some (few) cases we are only optimizing something that does not get called repeatedly. That is, the optimization does not depend on the length of the list. In those cases I would not do it if it makes code harder to read.
I found some cases of this where the locate() or size() function was inlined, and also in the singleton function. Maybe others that I overlooked?
Not all improved functions are visible in the profiling table. For those, it is hard to tell from the code diff alone how large the improvement is and whether it is worth the inlining or not.
I profiled only the required functions; for the others the performance increase is obvious, as the code is similar to other functions' implementations. Which functions are you interested in? I'll add profiling if needed.
This reverts commit 7003edd.
Ready to merge now from my perspective.
Review dismissed by automation script.
Very cool to see this change, thank you! I just ran the benchmarks; the reported metrics are:
- Instructions (Operations)
- Heap (likely broken at the moment)
- Garbage Collection
Which List operations are used in the test for which you observed garbage increase?
We use the same operations in all benchmarks except number 4.) [which only uses …]
The list always starts empty as follows:
Yet interestingly, the anomaly is for number 1.) [not number 4.)]. Number 1.) basically consists of a random sequence of …
The garbage increased by 0.4 bytes per operation. I wonder if the garbage increase can actually mean that the code got better (for example, freeing more after a pop)? How often do you think the random walk hits length 0? I also wonder if it can be related to how much is freed when reaching the empty list again.
This could also be the case, indeed. It could be that we used to have extra data lying around for no good reason and we are now deallocating it more eagerly. Alternatively, it could be that we now also allocate more data, and hence also need to free more. It might also be a bit of both, for organic reasons, as seen next. Technically, if someone told us after a pop that the queue is gonna grow again (which in this benchmark should happen a lot), then it would be better to not deallocate memory eagerly, because the queue will grow again later. So, in general, it's a tradeoff: deallocate too eagerly and then reallocate later when the queue grows again, or deallocate less aggressively and risk no more operations coming in the future (and hence waste space long-term). So, a good working hypothesis is that we now deallocate more, and hence also need to reallocate more when the queue grows back.
Digging into this further might take more effort, but at least a quick and dirty experiment, written with the help of LLMs, shows that we should hit 0 around 500 times (the LLM computed around 504 when asked to do the math directly). Some of those 500 were probably already close to 0, but the expected maximum length of the list during the test should grow roughly with the square root of the number of operations. Of course, I could have just run the test with the fixed seed from the benchmark, but this should at least shed some preliminary light on the behavior.
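A rough simulation of such a walk supports the square-root intuition. This is a hypothetical model (at length 0 the benchmark can only push; otherwise it pushes or pops with equal probability; the real benchmark's operation mix and seed may differ):

```python
import random

def simulate(ops: int, seed: int = 0) -> "tuple[int, int]":
    # Returns (number of times the list became empty again,
    # maximum length reached) for a reflected push/pop random walk.
    rng = random.Random(seed)
    length = zero_hits = max_len = 0
    for _ in range(ops):
        if length == 0 or rng.random() < 0.5:
            length += 1   # forced push at length 0, else 50/50
        else:
            length -= 1   # pop
        if length == 0:
            zero_hits += 1
        max_len = max(max_len, length)
    return zero_hits, max_len
```

Under this model, over 100000 operations the walk returns to the empty list a few hundred times (the same ballpark as the ~500 estimate above, the exact count depending on the model and seed), and the maximum length stays on the order of the square root of the operation count.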
Is there any remaining work for this PR? Happy to merge once ready.
Review dismissed by automation script.
Profiling results, where time is the average among n iterations and memory is the cumulative garbage generated by calling the corresponding method. More details: https://github.com/research-ag/canister-profiling/tree/list
n = 100000
Time:
Memory: