Mask generator optimizations #15935
base: develop
Conversation
Thank you for your contribution to the Solidity compiler! A team member will follow up shortly. If you haven't read our contributing guidelines and our review checklist before, please do it now, as this makes reviewing and accepting your contribution smoother. If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.
Looking at the build failures...
Some of the CI failures are due to missing updates on the gas statistics in some tests.
There is an easy way to do that:
You can run
<build>/test/tools/isoltest -t semanticTests/* --optimize --accept-updates
which should adjust the test expectations (while of course verifying that only gas costs change -- you can also skip --accept-updates to manually inspect the changes and adjust case by case).
We're also wondering how best to add a unit test for this and may get back to you with a suggestion for that.
Changelog.md
Outdated
@@ -4,7 +4,7 @@ Language Features:

Compiler Features:

* Optimized constant generation for masks. With low compiler runs (optimized for size), reduces the byte code length and gas.
Suggested change:
- * Optimized constant generation for masks. With low compiler runs (optimized for size), reduces the byte code length and gas.
+ * Constant Optimizer: Compute masks using shifts when optimizing for size.
Feel free to adjust the bit after the colon, but to keep things consistent, we usually categorize by component (see e.g. 0.8.27 for reference)
Similarly
One option for testing would be to borrow from solidity/test/libevmasm/Optimiser.cpp (line 1333 at 340ae32): run ConstantOptimisationMethod::optimiseConstants on an Assembly constructed as there. A few simple cases for masks in the middle bits, the high bits, and the low bits, and maybe a non-mask, should do it. If there are problems with that, let us know!
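To make those suggested cases concrete, the three mask shapes can be sketched as follows. This is an illustration only: the function names are mine, the demonstration uses 64-bit words for brevity, and the real tests would build 256-bit EVM constants via Assembly.

```cpp
#include <cstdint>

// Shift-based construction of the three mask shapes mentioned above,
// shown on 64-bit words (the EVM operates on 256-bit words).
uint64_t lowMask(unsigned n)                      // 0x00...0FF...F, n low bits set
{
    return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

uint64_t highMask(unsigned n)                     // 0xFF...F00...0, n high bits set
{
    return ~lowMask(64 - n);
}

uint64_t midMask(unsigned offset, unsigned width) // width bits set, starting at offset
{
    return lowMask(width) << offset;
}
```

A non-mask constant would then simply be any value not expressible in one of these three forms.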
We talked about it a bit and we could generally use a new test case type to test EVM assembly import and assembly-level optimizations. I'm going to work on this and then ping you here when you'll be able to use this. It will work like
The optimized tests run by default with runs set to 200, since that's the default compiler setting (it's arguable whether that's a good default, but it has been so historically). With that, on the case in question, I currently see the masks as constants on develop. Without having looked at it more closely, my guess would be that the increase in runtime gas cost under that runs value pays off with a decrease in code size (which the test in this case doesn't show).
My pedestrian solution for EOF and constant optimizer. Happy to include in this PR or another, or just ignore.
Force-pushed from 3872849 to 7f45522
I decided messing with the default behavior was not appropriate. Changed the thresholds for the default runs of 200 so that the size optimization doesn't kick in for simple constants. The change is now strictly better (at 200), and will favor size reduction below 200. |
Force-pushed from 7f45522 to 15226e0
libevmasm/ConstantOptimiser.cpp (outdated)

// only fully optimize for size if the compiler runs parameter is not the default of 200
unsigned threshold1 = m_params.runs < 200 ? 32 : 128;
unsigned threshold2 = m_params.runs < 200 ? 16 : 128;
Instead of using arbitrary thresholds to choose between the old and the new method, why not try both and choose the better one based on gasNeeded()?
Note that this is how the constant optimizer generally operates. For example below we run the decomposition in a loop multiple times and choose the best result based on gas. And we select between using a literal, data or code based on gas as well.
This may solve some of the cost regressions you're seeing, because in cases where your method is not better, we'll simply fall back to the old one.
I'm glad you asked that. The original code did exactly as you've suggested: it was using gasNeeded to pick between using a constant or code to compute the constant. The real problem is that gasNeeded is literally a heuristic. This is the formula:
m_params.runs * _runGas + m_params.multiplicity * _repeatedDataGas
Both m_params.runs and m_params.multiplicity are arbitrary. What is 200? Where did it come from? (I personally never run with that default -- I either go with 2, for size, or 2M for speed; but this PR is not about my preferences.)
Given the above heuristic, the new code path was so much better that with the default of 200, it was getting picked over constant representation. The new code will always produce shorter code, no "regressions" on that, unless you want minimum gas at runtime. The effect is simply to change the switch-over point from constant to code based on the runs value. If this were my compiler, I'd go with it, but this compiler is used by a lot of people who may depend on that default behavior based on the arbitrary 200 value and an arbitrary heuristic, so I didn't want to disturb the switch-over point.
Just to emphasize, both values of runGas and dataGas are lower with the new algorithm for the code path. The competition is not between the old and new code representations, but between code representation and a constant.
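For illustration, the trade-off this heuristic encodes can be sketched as follows. This is a minimal sketch: the function mirrors the formula quoted in this thread, but all concrete gas and byte numbers are illustrative assumptions, not values taken from the compiler.

```cpp
#include <cstdint>

// Hedged sketch of the lifetime-cost heuristic quoted above:
//   m_params.runs * _runGas + m_params.multiplicity * _repeatedDataGas
uint64_t lifetimeCost(uint64_t runs, uint64_t multiplicity, uint64_t runGas, uint64_t repeatedDataGas)
{
    return runs * runGas + multiplicity * repeatedDataGas;
}

// Constant representation: cheap at runtime (one PUSH32), 33 bytes of code
// (illustrative numbers; deposit cost assumed at 200 gas/byte).
constexpr uint64_t constantRunGas = 3;
constexpr uint64_t constantDataGas = 33 * 200;
// Computed representation: a few extra runtime ops, far fewer code bytes.
constexpr uint64_t computeRunGas = 18;
constexpr uint64_t computeDataGas = 8 * 200;
// At low runs the shorter computed form wins; at high runs the constant does.
// That crossing is the switch-over point discussed in this thread.
```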
Well, arbitrariness aside, my main concern is that it introduces another knob to tweak, which potentially makes things more complex than making it depend just on the current formula.
both m_params.runs and m_params.multiplicity are arbitrary. What is 200? where did it come from?
It's actually not as arbitrary as it may seem. runs has a pretty straightforward interpretation. It's the number of times you expect the contract to be executed over its lifetime, with the simplifying assumption that each time you hit every opcode exactly once. The optimizer is supposed to give you the bytecode that will minimize gas usage over the contract's whole lifetime, accounting for the initial deployment and all executions.
200 is the value that could be considered the middle ground between the cost of deployment and execution. The cost of deployment of each byte of the runtime code is 200 gas and is included as a factor in repeatedDataGas. If you set runs to 200 as well, you can take it out:
200 * (runGas + multiplicity * bytesRequired)
runGas and bytesRequired are in the same ballpark, so with this value they will have a similar impact on the result.
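The factoring above can be checked mechanically. In this sketch, createDataGas and the variable names are my labels for the quantities in the quoted formula, not the compiler's actual identifiers.

```cpp
#include <cstdint>

// At runs == 200, the 200 gas/byte code-deposit factor inside
// repeatedDataGas can be pulled out of the whole expression:
//   200 * runGas + multiplicity * (200 * bytesRequired)
//     == 200 * (runGas + multiplicity * bytesRequired)
constexpr uint64_t createDataGas = 200;  // deposit cost per byte of runtime code

uint64_t lifetimeCost(uint64_t runs, uint64_t multiplicity, uint64_t runGas, uint64_t bytesRequired)
{
    uint64_t repeatedDataGas = createDataGas * bytesRequired;
    return runs * runGas + multiplicity * repeatedDataGas;
}
```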
As for multiplicity, it's the number of times the constant appears in the bytecode. I'm actually not sure it makes sense to have it in there. There's a comment saying that it's not applied to runGas because "the runs are per opcode", but the cost is being calculated for all the AssemblyItems pushing the same value at once, not for a single one. Maybe we should remove it.
I personally never run with that default -- I either go with 2, for size, or 2M for speed; but this PR is not about my preferences.
Whether it's a good default is of course a different question. Perhaps we should make it higher. I often hear that deployment cost does not matter to people at all. But that could also be due to the fact that it comes from bigger projects where that cost is a drop in the bucket and what they worry about is minimizing the cost for the users. We would be open to changing it if someone showed some convincing arguments either way.
The effect is simply to change the switch over point from constant to code based on the runs value. If this was my compiler, I'd go with it, but this compiler is used by a lot of people who may depend on that default behavior based on the arbitrary 200 value and an arbitrary heuristic, so I didn't want to disturb the switch-over point.
I don't think this is a concern. With 200 runs you get constants because they provide lower lifetime cost than the code equivalent. If the new code is better in that regard, it will naturally move the switch-over point, but this will benefit users who want a balanced ratio. Those who don't are supposed to choose a higher or lower runs value anyway.
The new code will always produce shorter code, no "regressions" on that, unless you want minimum gas at runtime.
Like @ekpyron said above, the fact that in some cases costs increased with 200 runs does not necessarily disqualify the initial version as long as it can be explained. A better benchmark would probably be whether the change universally improves deployment cost with low runs and execution cost with high runs. At 200 it's understandable to see some improve and some degrade.
Though there's of course also the question of how well the current formula reflects the actual costs. I do have some concerns here:
- The assumption that every opcode is hit once per run basically means that your contract is one big function with no branching, early returns or loops. That's not true for most real contracts. Any of these elements will skew the evaluation. For example it will overestimate the execution cost in contracts with many small functions and underestimate it in contracts with many loops.
- Like I said above, I have doubts whether multiplicity should even be in the formula. Repeating the same value many times will artificially lower the estimated execution cost.
- Data cost is taken as 200 per byte of deployed code, but should really be 216 (268 for older EVMs), because that code is also a part of the calldata of the initial transaction. It's even a bit more because each byte needs to be copied to memory before it is returned from the constructor.
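The 216/268 figures in the last point follow from known gas constants. As a sketch: EIP-2028 (Istanbul) lowered the cost of a non-zero calldata byte from 68 to 16 gas, while the code-deposit cost has stayed at 200 gas per byte.

```cpp
// Per-byte cost of deployed code, as argued above: each byte is both
// deposited as runtime code (200 gas) and carried in the deployment
// transaction's calldata (16 gas per non-zero byte post-EIP-2028,
// 68 gas on older EVMs). Memory-copy overhead is ignored here.
constexpr int codeDepositPerByte = 200;
constexpr int calldataNonZeroByte = 16;     // post-Istanbul
constexpr int calldataNonZeroByteOld = 68;  // pre-Istanbul
constexpr int effectivePerByte = codeDepositPerByte + calldataNonZeroByte;       // 216
constexpr int effectivePerByteOld = codeDepositPerByte + calldataNonZeroByteOld; // 268
```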
It's the number of times you expect the contract to be executed over its lifetime
Have you ever met an engineer who could predict that? Given how contracts and financial incentives are tightly bound, wouldn't it be peculiar if someone knew or wished that a contract would have such a temporary life as only 200 executions?
I often hear that deployment cost does not matter to people at all.
Indeed that is true, but the overall size limit is a huge pain. That is the motivation for me doing this (if you're curious, I have an issue tracking that). The patterns that avoid it (diamond, proxy, etc.) come with a 20K gas per call overhead.
because they provide lower lifetime cost than the code equivalent
I think that's where we disagree (not that it really matters, see below*). From my perspective the lifetime computation is so unrealistic that it shouldn't be considered anything other than a heuristic. As you point out, running through every opcode once is not a measure of anything resembling an actual contract call. I love me some loops 😄 (which is probably why I'm hesitant to increase runtime gas costs at the default settings).
whether the change universally improves deployment cost with low runs and execution cost with high runs
With low runs, the new code is always better, because both the new and old code choose code-to-compute-constant, and the new computation takes fewer bytes and consumes less gas. With high runs, the constants are just constants, so they're exactly equivalent.
* Since I don't run the compiler with 200, I don't really care how it behaves with default settings. That decision I leave up to you. Happy to provide more data.
The runs parameter is a rather complex topic - even putting aside the fact that its naming alone has caused quite a bit of confusion (there has been a plan to rename it to expectedExecutionsPerDeployment). While it is technically (rather) precisely defined, in practice it indeed serves more as a heuristic than a precise metric. It would be nice if we could just replace it by "optimize for size" and "optimize for runtime gas" instead; unfortunately, things are not as simple as that due to, as you say, the overall size limit. As far as I'm aware, the usual use of the parameter is indeed to set it either very low (optimizing for size) or very high (optimizing for runtime gas). However, it is not uncommon to have to reduce it from the maximum value to account for the overall code size limit, which then is more of a binary search for fitting the bytecode under the limit than actual reasoning about the number of executions. The default value is historic and rarely makes sense itself (and as such is also something we should definitely consider changing, but also nothing we need to particularly tweak here). Ideally, we'd have more targeted and direct settings for accommodating code size limits, and ideally we'd even take the control flow structure into account (e.g. emphasizing code size reductions in reverting paths or de-emphasizing them in loops). However, that's of course not a simple matter and obviously goes beyond the scope of this PR.
For this PR, I'd myself tend towards just letting it use the existing heuristics as they are (so without the additional thresholds) and would consider generally refining the heuristic and compiler settings in this regard a longer term goal that needs further consideration.
I'd myself tend towards just letting it use the existing heuristics
ok, great. Sounds like we have consensus from the core team on that. I've removed the runs parameter dependency.
@ekpyron Not strictly related to the PR, but while looking at it I noticed that ConstantOptimisationMethod::simpleRunGas() does not account for the new AssemblyItem types we introduced for EOF, especially DUPN and SWAPN. Though fortunately it should not have any real consequences, because we are not using them in the constant optimizer yet (heads up @rodiazet).
In many test types you can use
// ====
// bytecodeFormat: legacy
Normally you'd create a separate version with different expectations for EOF, but since the constant optimizer is disabled there, it's fine to just leave a TODO comment as a reminder to do that later.
If that works, then perhaps the constant optimizer does not even need many changes for EOF, but I'd still rather not enable it just as an afterthought in this PR. Even if it compiles and passes current tests, it does not necessarily mean it's not broken in some subtle way. At this point we do have working semantic tests, so there's a high chance it's not, but our coverage is not always perfect, especially for things that depend on the
Regarding costs, the ones from test cases may not always be representative, because they are rather artificial. We should also check how it affects real projects (i.e. our external tests). You can do that by getting the
Agreed. The constant optimizer is relatively straightforward to reason about, as it's completely self-contained. For the record, here are the instructions used in the 3 different paths:
constant:
compute:
copy:
My code essentially disables the third branch. Just leaving this here as a note for the next person who needs to deal with it.
Force-pushed from c0ec242 to c78e807
Thanks for the pointer.
project | bytecode_size | deployment_gas | method_gas
---|---|---|---
brink | -2.09% ✅ | |
colony | -2.32% ✅ | |
elementfi | -1.47% ✅ | |
ens | -2.11% ✅ | -2.5% ✅ | -0.06% ✅
euler | -2.9% ✅ | -2.93% ✅ | -0.34% ✅
gnosis | | |
gp2 | -1.89% ✅ | |
pool-together | -2.29% ✅ | |
uniswap | -2.77% ✅ | |
yield_liquidator | -3.42% ✅ | -3.53% ✅ | -0.05% ✅
zeppelin | -2.54% ✅ | |
This PR as is now (with 200 heuristic) vs develop:
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas
---|---|---|---
brink | -2.09% ✅ | |
colony | -1.91% ✅ | |
elementfi | -1% ✅ | |
ens | -1.21% ✅ | -1.75% ✅ | -0.05% ✅
euler | -1.84% ✅ | -1.82% ✅ | -0.25% ✅
gnosis | | |
gp2 | -1.25% ✅ | |
pool-together | -1.62% ✅ | |
uniswap | -1.17% ✅ | |
yield_liquidator | -2.13% ✅ | -2.07% ✅ | -0.07% ✅
zeppelin | -1.56% ✅ | |
I'm not really sure what method_gas is actually measuring, so it's hard to opine on what that means.
(Edit: now with fancy tables)
This is the total execution cost measured by running the project's test suite. Hard to say how representative it is, but at least it comes from actually executing most of the project's code. We should actually be getting
Also, just FYI, what you pasted are the tables in a format that's better suited for the terminal (which is why it's the default), but there's a flag that will give you more fancy markdown as well :)
I double checked and it's
The build is green (I tried to fix the spurious failures). Still to do: test cases using the new test framework (no rush, just keeping track of what's left). Anything else?
Force-pushed from 914ed2c to e128f30
auto-fix some of the tests -- these have lower gas in the test
only optimize fully if runs param is less than 200
Force-pushed from e128f30 to 1490e2b
I added a semantic test for masks. Most masks are generated by the compiler internally, so it made sense to me to test those. (Of the categories under
implements #15870