-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: Use the stack for local fixed length arrays #511
Comments
A limitation of using the stack is that we would still need to initialize the "array" elements every time the function is called. In this particular case where the array elements are compile-time constants it might be better to move the array outside the function: local t:{integer} = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4}
function m.dayOfWeek(i: integer)
return t[i]
end Still, it's an intriguing idea... |
In response to your suggestion of moving the array outside the function, yes, that is a possible workaround, but it has trade-offs which I am not always willing to make. Specifically, since Lua has a limit on the number of local variables you can have in a scope (something I sometimes hit my head on), I am reluctant to do that when working in larger modules (or working with teams of people). Also, bug #508 is still causing me a lot of problems, and I find that one terrifying because I am aimlessly changing code hoping my program will build again, so I've been limiting my use of global local variables hoping it helps. As for this optimization specifically, I guess there are two cases (and I guess I'm assuming basic primitive types too).
For case 1, I think the the Pallene compiler could emit:
This would allow the C compiler to optimize out needing to initialize the array elements every time the function is called. As for case 2, I would argue that even though there is still work in initializing the array on the stack every time the function is called, it is still a win. I consider avoiding memory allocations on the heap, and avoiding work on the garbage collector to be significant wins. Perhaps I am hyper-sensitive to these issues because I spent too much time working in game engines (and also dealing with pathologically stupid and slow garbage collectors). From a soft-realtime perspective, constant latency factors are manageable, but it is the arbitrary random latencies that get you. It seems to me that Pallene is in a unique situation that it can optimize some of these cases, so that makes it a huge win if it does. I know this is a bit advanced for the current stage Pallene is in right now, and would need things like escape analysis and constant analysis. But I wanted to put it on the radar. |
Hopefully this should fix the assertion failures from #511
I encountered some algorithms that use temporary fixed length arrays. For example, here is a popular compute which day of the week algorithm which I translated to Pallene:
After reflecting on what Pallene might do and then poking at the emitted C code, I think Pallene is creating an array in Lua on the heap, using it, and then the Lua garbage collector will have to clean it up later. Since this table is fixed size and is local to only this function, a much more performant solution would be to create the array on the stack in C, which avoids both the allocator hits and avoids creating unnecessary work for the garbage collector.
I expect this is probably a long way off for Pallene and will require things like doing escape analysis. But I thought I would file this as a potential nice-to-have optimization way down the line.
The text was updated successfully, but these errors were encountered: