
Mask generator optimizations #15935

Open: moh-eulith wants to merge 4 commits into develop from mask_generation

Conversation

moh-eulith

Implements #15870.


Thank you for your contribution to the Solidity compiler! A team member will follow up shortly.

If you haven't read our contributing guidelines and our review checklist before, please do so now; it makes reviewing and accepting your contribution smoother.

If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.

@moh-eulith
Author

Looking at the build failures...
I've been running ./test/soltest -p, and that passes.
I'll start with the required ones.

@ekpyron (Member) left a comment

Some of the CI failures are due to missing updates on the gas statistics in some tests.
There is an easy way to do that:
You can run
<build>/test/tools/isoltest -t semanticTests/* --optimize --accept-updates
which should adjust the test expectations (while of course verifying that only gas costs change; you can also skip --accept-updates to manually inspect the changes and adjust case by case).

We're also wondering how best to add a unit test for this and may get back to you with a suggestion for that.

Changelog.md Outdated
@@ -4,7 +4,7 @@ Language Features:


Compiler Features:

* Optimized constant generation for masks. With low compiler runs (optimized for size), reduces the byte code length and gas.
Member

Suggested change
* Optimized constant generation for masks. With low compiler runs (optimized for size), reduces the byte code length and gas.
* Constant Optimizer: Compute masks using shifts when optimizing for size.

Feel free to adjust the bit after the colon, but to keep things consistent, we usually categorize by component (see e.g. 0.8.27 for reference)

@ekpyron
Member

ekpyron commented Mar 12, 2025

Similarly test/cmdlineTests.sh --no-smt --update will update the command line tests (we'll then need to sanity-check the changes).
The t_native_text_ext_* failures seem to be a transient network issue which will hopefully vanish on the next run. (Since you're not modifying the SMT solver, and the SMT tests require Z3 to be installed and are slow to run, running with --no-smt should be enough.)

@ekpyron
Member

ekpyron commented Mar 12, 2025

One option for testing would be to borrow from

BOOST_AUTO_TEST_CASE(jumpdest_removal_subassemblies, *boost::unit_test::precondition(nonEOF()))
while specifically running ConstantOptimisationMethod::optimiseConstants on an Assembly constructed there. A few simple cases for masks in the middle bits, the high bits, and the low bits, and maybe a non-mask, should do it. If there are problems with that, let us know!
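For concreteness, a rough sketch of what such a test could look like (editorial illustration, not code from the PR; the Assembly constructor arguments, namespaces, and the exact optimiseConstants signature are assumed from libevmasm and may differ):

```cpp
// Hypothetical sketch: run the constant optimiser directly on a hand-built
// assembly that pushes a high-bit mask, with a low runs value so that the
// size-oriented representation should win.
BOOST_AUTO_TEST_CASE(constant_optimiser_high_bit_mask, *boost::unit_test::precondition(nonEOF()))
{
	Assembly assembly{langutil::EVMVersion{}, /* _creation */ false, /* _eofVersion */ std::nullopt, /* _name */ ""};
	// Mask with the upper 128 bits set.
	assembly.append(u256("0xffffffffffffffffffffffffffffffff00000000000000000000000000000000"));
	assembly.append(Instruction::POP);

	unsigned replacements = ConstantOptimisationMethod::optimiseConstants(
		/* _isCreation */ false,
		/* _runs */ 1,
		langutil::EVMVersion{},
		assembly
	);
	// At least the mask constant should have been replaced by computing code.
	BOOST_CHECK(replacements > 0);
}
```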

@cameel
Member

cameel commented Mar 12, 2025

We're also wondering how best to add a unit test for this and may get back to you with a suggestion for that.

We talked about it a bit and we could generally use a new test case type to test EVM assembly import and assembly-level optimizations. I'm going to work on this and ping you here when it's ready for you to use.

It will work like --import-asm-json, just in a text format that's not as verbose as JSON.


@cameel
Member

cameel commented Mar 12, 2025

Is it fair to assume auxdata is not part of the gas?

auxdata is basically metadata and is part of the binary. It does affect gas at creation time because it has to be copied to memory along with the rest of the bytecode. We account for the code cost separately, but that's just the cost of storing it in calldata and does not include any operations that may be performed on it by the contract.

@ekpyron
Member

ekpyron commented Mar 12, 2025

The optimized tests run by default with runs set to 200, since that's the default compiler setting (it's arguable whether that's a good default, but it has been so historically). With that, on the case in question, I currently see the masks as constants on develop. Without having looked at it more closely, my guess would be that the increase in runtime gas cost under that runs value pays off with a decrease in code size (which the test in this case doesn't show).
The gas costs in the tests are meant to give a good estimate of the effect of code changes, but since we don't generate them for the boundary runs settings, it can be fine for some of them to increase if the reason is well understood.


@moh-eulith
Author

My pedestrian solution for EOF and constant optimizer. Happy to include in this PR or another, or just ignore.

moh-eulith@5805f6d

@moh-eulith
Author

I decided messing with the default behavior was not appropriate. I changed the thresholds for the default runs of 200 so that the size optimization doesn't kick in for simple constants. The change is now strictly better (at 200) and will favor size reduction below 200.

Comment on lines 260 to 262
//only fully optimize for size if the compiler parameters are not default of 200
unsigned threshold1 = m_params.runs < 200 ? 32 : 128;
unsigned threshold2 = m_params.runs < 200 ? 16 : 128;
Member

Instead of using arbitrary thresholds to choose between the old and the new method, why not try both and choose the better one based on gasNeeded()?

Note that this is how the constant optimizer generally operates. For example below we run the decomposition in a loop multiple times and choose the best result based on gas. And we select between using a literal, data or code based on gas as well.

This may solve some of the cost regressions you're seeing, because in cases where your method is not better, we'll simply fall back to the old one.

@moh-eulith (Author) Mar 15, 2025

I'm glad you asked that. The original code did exactly as you've suggested: it was using gasNeeded to pick between using a constant or code to compute the constant. The real problem is, gasNeeded is literally a heuristic. This is the formula:

m_params.runs * _runGas + m_params.multiplicity * _repeatedDataGas

both m_params.runs and m_params.multiplicity are arbitrary. What is 200? where did it come from? (I personally never run with that default -- I either go with 2, for size, or 2M for speed; but this PR is not about my preferences).

Given the above heuristic, the new code path was so much better that, with the default of 200, it was getting picked over the constant representation. The new code will always produce shorter code, no "regressions" on that, unless you want minimum gas at runtime. The effect is simply to change the switch over point from constant to code based on the runs value. If this was my compiler, I'd go with it, but this compiler is used by a lot of people who may depend on that default behavior based on the arbitrary 200 value and an arbitrary heuristic, so I didn't want to disturb the switch-over point.

Just to emphasize, both values of runGas and dataGas are lower with the new algorithm for the code path. The competition is not between the old and new code representations, but between code representation and a constant.
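For readers following the thread, the heuristic under discussion boils down to roughly the following (a paraphrased sketch, not the actual libevmasm code; the struct and function names here are invented and the unique-data term is omitted):

```cpp
// Paraphrased sketch of the cost model quoted above (illustrative only).
// The representation (literal vs. computed vs. copied) with the lowest
// estimated lifetime gas is the one that gets chosen.
struct Params
{
	size_t runs;         // assumed number of executions over the contract's lifetime (default 200)
	size_t multiplicity; // how many times the constant occurs in the bytecode
};

bigint estimatedLifetimeGas(Params const& _params, bigint const& _runGas, bigint const& _repeatedDataGas)
{
	return _params.runs * _runGas + _params.multiplicity * _repeatedDataGas;
}
```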

@cameel (Member) Mar 15, 2025

Well, arbitrariness aside, my main concern is that it introduces another knob to tweak which potentially makes things more complex than making it depend just on the current formula.

both m_params.runs and m_params.multiplicity are arbitrary. What is 200? where did it come from?

It's actually not as arbitrary as it may seem. runs has a pretty straightforward interpretation. It's the number of times you expect the contract to be executed over its lifetime, with a simplifying assumption that each time you hit every opcode exactly once. The optimizer is supposed to give you the bytecode that will minimize gas usage over the contract's whole lifetime, accounting for initial deployment and all executions.

200 is the value that could be considered the middle ground between the cost of deployment and execution. The cost of deploying each byte of the runtime code is 200 gas and is included as a factor in repeatedDataGas. If you set runs to 200 as well, you can take it out:

200 * (runGas + multiplicity * bytesRequired)

runGas and bytesRequired are in the same ballpark, so with this value they will have a similar impact on the result.
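To make that weighting concrete, here is an illustrative back-of-the-envelope comparison (editorial example, not compiler output; multiplicity is taken as 1 and deployment is costed at 200 gas per byte as above):

```cpp
// Illustrative arithmetic only: lifetime cost ~= runs * runGas + 200 * codeBytes.
constexpr unsigned runs = 200;

// Literal representation of a 256-bit mask: PUSH32 <mask>
// -> 33 bytes of code, 3 gas per execution.
constexpr unsigned literalCost  = runs * 3  + 200 * 33;  // 600 + 6600 = 7200

// Computed representation of a high-128-bit mask: PUSH1 0, NOT, PUSH1 128, SHL
// -> 6 bytes of code, 12 gas per execution.
constexpr unsigned computedCost = runs * 12 + 200 * 6;   // 2400 + 1200 = 3600
```

Under these simplified numbers the shift-based computed form already beats the literal at the default of 200 runs, which is why the switch-over point moves.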

As for multiplicity, it's the number of times the constant appears in the bytecode. I'm actually not sure it makes sense to have it in there. There's a comment saying that it's not applied to runGas because "the runs are per opcode", but cost is being calculated for all the AssemblyItems pushing the same value at once, not for a single one. Maybe we should remove it.

I personally never run with that default -- I either go with 2, for size, or 2M for speed; but this PR is not about my preferences.

Whether it's a good default is of course a different question. Perhaps we should make it higher. I often hear that deployment cost does not matter to people at all. But that could also be due to the fact that it comes from bigger projects where that cost is a drop in the bucket and what they worry about is minimizing the cost for the users. We would be open to changing it if someone showed some convincing arguments either way.

The effect is simply to change the switch over point from constant to code based on the runs value. If this was my compiler, I'd go with it, but this compiler is used by a lot of people who may depend on that default behavior based on the arbitrary 200 value and an arbitrary heuristic, so I didn't want to disturb the switch-over point.

I don't think this is a concern. With 200 runs you get constants because they provide lower lifetime cost than the code equivalent. If the new code is better in that regard, it will naturally move the switch-over point, but this will benefit users who want a balanced ratio. Those who don't are supposed to choose a higher or lower runs value anyway.

The new code will always produce shorter code, no "regressions" on that, unless you want minimum gas at runtime.

Like @ekpyron said above, the fact that in some cases costs increased with 200 runs does not necessarily disqualify the initial version as long as it can be explained. A better benchmark would probably be whether the change universally improves deployment cost with low runs and execution cost with high runs. At 200 it's understandable to see some improve and some degrade.

Though there's of course also the question of how well the current formula reflects the actual costs. I do have some concerns here:

  • The assumption that every opcode is hit once per run basically means that your contract is one big function with no branching, early returns or loops. That's not true for most real contracts. Any of these elements will skew the evaluation. For example it will overestimate the execution cost in contracts with many small functions and underestimate it in contracts with many loops.
  • Like I said above, I have doubts whether multiplicity should even be in the formula. Repeating the same value many times will artificially lower the estimated execution cost.
  • Data cost is taken as 200 per byte of deployed code, but should really be 216 (268 for older EVMs), i.e. 200 gas of code-deposit cost plus 16 gas per non-zero calldata byte (68 before EIP-2028), because that code is also a part of the calldata of the initial transaction. It's even a bit more because each byte needs to be copied to memory before it is returned from the constructor.

Author

It's the number of times you expect the contract to be executed over its lifetime

Have you ever met an engineer who could predict that? Given how contracts and financial incentives are tightly bound, wouldn't it be peculiar if someone knew or wished that a contract would have such a temporary life as only 200 executions?

I often hear that deployment cost does not matter to people at all.

Indeed that is true, but the overall size limit is a huge pain. That is the motivation for me doing this (if you're curious, I have an issue tracking that). The patterns that avoid that (diamond, proxy, etc), come with a 20K gas per call overhead.

because they provide lower lifetime cost than the code equivalent

I think that's where we disagree (not that it really matters, see below*). From my perspective the lifetime computation is so unrealistic that it shouldn't be considered anything other than a heuristic. As you point out, running through every opcode once is not a measure of anything resembling an actual contract call. I love me some loops 😄 (which is probably why I'm hesitant to increase runtime gas costs at the default settings).

whether the change universally improves deployment cost with low runs and execution cost with high runs

With low runs, the new code is always better, because both the new and old code choose code-to-compute-constant, and the new computation takes fewer bytes and consumes less gas. With high runs, the constants are just constants, so they're exactly equivalent.

  • Since I don't run the compiler with 200, I don't really care how it behaves with default settings. The decision I leave up to you. Happy to provide more data.

Member

The runs parameter is a rather complex topic, even putting aside the fact that its naming alone has caused quite a bit of confusion (there has been a plan to rename it to `expectedExecutionsPerDeployment`). While it is technically (rather) precisely defined, in practice it indeed serves more as a heuristic than a precise metric. It would be nice if we could just replace it with "optimize for size" and "optimize for runtime gas" instead; unfortunately, things are not as simple as that due to, as you say, the overall size limit. As far as I'm aware, the usual use of the parameter is indeed to set it either very low (optimizing for size) or very high (optimizing for runtime gas). However, it is not uncommon to have to reduce it down from the maximum value to account for the overall code size limit (which then is more of a binary search for fitting the bytecode into the code size limit than actual reasoning about the number of executions). The default value is historic and rarely makes sense itself (and as such is also something we should definitely consider changing, but also nothing we need to particularly tweak here). Ideally, we'd have more targeted and direct settings for accommodating code size limits, and ideally we'd even take the control flow structure into account (e.g. emphasizing code size reductions in reverting paths or de-emphasizing them for loops). However, that's of course not a simple matter and obviously goes beyond the scope of this PR.
For this PR, I'd myself tend towards just letting it use the existing heuristics as they are (so without the additional thresholds) and would consider generally refining the heuristic and compiler settings in this regard a longer-term goal that needs further consideration.

Author

I'd myself tend towards just letting it use the existing heuristics

ok, great. Sounds like we have consensus from the core team on that. I've removed the runs parameter dependency.

@cameel (Member) Mar 15, 2025

@ekpyron Not strictly related to the PR, but while looking at it I noticed that ConstantOptimisationMethod::simpleRunGas() does not account for the new AssemblyItem types we introduced for EOF.

Especially DUPN and SWAPN. Though fortunately it should not have any real consequences, because we are not using them in the constant optimizer yet (heads up @rodiazet).

@cameel
Member

cameel commented Mar 15, 2025

Not sure what the right way to handle the test cases are. Can I supply a different outcome? Can I exclude the test for EOF only?

In many test types you can use the bytecodeFormat setting to limit the test to non-EOF:

// ====
// bytecodeFormat: legacy

Normally you'd create a separate version with different expectations for EOF, but since the constant optimizer is disabled there, it's fine to just leave a TODO comment as a reminder to do that later.

My pedestrian solution for EOF and constant optimizer. Happy to include in this PR or another, or just ignore.

moh-eulith@5805f6d

If that works then perhaps the constant optimizer does not even need much changes for EOF, but I'd still rather not enable it just as an afterthought in this PR. Even if it compiles and passes current tests, it does not necessarily mean it's not broken in some subtle way. At this point we do have working semantic tests so there's a high chance it's not, but our coverage is not always perfect, especially for things that depend on the runs parameter (most tests just run with 200). #15935 (comment) is a good example. This particular problem won't affect it yet because we're not using any EOF-only instructions there, but there could be similar ones lurking in there so it needs to be given proper attention.

@cameel
Member

cameel commented Mar 15, 2025

Regarding costs, the ones from test cases may not always be representative, because they are rather artificial. We should also check how it affects real projects (i.e. our external tests). You can do that by getting the summarized-benchmarks.json artifact from the c_ext_benchmarks job from your PR and from the base branch and comparing them using benchmark_diff.py. Check --help. It can produce a markdown table that you can just paste into a comment.

@moh-eulith
Author

I'd still rather not enable it just as an afterthought in this PR.

Agreed. The constant optimizer is relatively straightforward to reason about, as it's completely self-contained. For the record, here are the instructions used in the three different paths:

constant:
PUSHX

compute:
PUSHX, EXP, NOT, MUL, ADD, SUB, SHL, SHR

copy:
CODECOPY, MLOAD, SWAP1, MSTORE, DUP1, DUP2, DUP4, SWAP2

My code essentially disables the third branch. Just leaving this here as a note for the next person who needs to deal with it.
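As an editorial illustration of the "compute" path (hypothetical snippet written in libevmasm style, not taken from the PR), the low-128-bit mask (1 << 128) - 1 could be emitted as:

```cpp
// Hypothetical illustration: a "compute" representation of the mask (1 << 128) - 1.
// Executed left to right this leaves 0x00..00ffff..ff (low 128 bits set) on the stack.
AssemblyItems computedLowMask{
	u256(1),            // PUSH1 0x01  (subtrahend for the final SUB)
	u256(1),            // PUSH1 0x01  (value to shift)
	u256(128),          // PUSH1 0x80  (shift amount)
	Instruction::SHL,   // 1 << 128
	Instruction::SUB    // (1 << 128) - 1
};
// Roughly 8 bytes of code and 15 gas at runtime, versus 33 bytes and 3 gas for a PUSH32 literal.
```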

@moh-eulith
Author

moh-eulith commented Mar 15, 2025

summarized-benchmarks.json artifact from the c_ext_benchmarks job from your PR and from the base branch

Thanks for the pointer.
This PR as originally written (no 200 heuristic) vs. develop:

ir-optimize-evm+yul

| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | -2.09% ✅ | | |
| colony | -2.32% ✅ | | |
| elementfi | -1.47% ✅ | | |
| ens | -2.11% ✅ | -2.5% ✅ | -0.06% ✅ |
| euler | -2.9% ✅ | -2.93% ✅ | -0.34% ✅ |
| gnosis | | | |
| gp2 | -1.89% ✅ | | |
| pool-together | -2.29% ✅ | | |
| uniswap | -2.77% ✅ | | |
| yield_liquidator | -3.42% ✅ | -3.53% ✅ | -0.05% ✅ |
| zeppelin | -2.54% ✅ | | |

This PR as is now (with 200 heuristic) vs develop:

ir-optimize-evm+yul

| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | -2.09% ✅ | | |
| colony | -1.91% ✅ | | |
| elementfi | -1% ✅ | | |
| ens | -1.21% ✅ | -1.75% ✅ | -0.05% ✅ |
| euler | -1.84% ✅ | -1.82% ✅ | -0.25% ✅ |
| gnosis | | | |
| gp2 | -1.25% ✅ | | |
| pool-together | -1.62% ✅ | | |
| uniswap | -1.17% ✅ | | |
| yield_liquidator | -2.13% ✅ | -2.07% ✅ | -0.07% ✅ |
| zeppelin | -1.56% ✅ | | |

I'm not really sure what method_gas is being measured, so it's hard to opine on what that means.
(Edit: now with fancy tables)

@cameel
Member

cameel commented Mar 15, 2025

I'm not really sure what method_gas is being measured, so it's hard to opine on what that means.

This is the total execution cost measured by running the project's test suite. It's hard to say how representative it is, but at least it comes from actually executing most of the project's code.

We should actually be getting method_gas for more projects. I've seen some failures in parsing the coverage report in CI (which is where the values come from). We should look into that. It does not make CI fail, because we intentionally ignore errors in getting benchmarks.

Also, just FYI, what you pasted are the tables in a format that's better suited for the terminal (which is why it's the default), but there's a flag that will give you fancier markdown as well :)

@moh-eulith
Author

We should actually be getting method_gas for more projects.

I double-checked and it's null in the JSON for the develop branch. Do you know which ones? I managed to run zeppelin and it looks like it never reports any gas metrics. Some are hard to run locally (they need an older version of foundry, etc.).

@moh-eulith
Author

The build is green (I tried to fix the spurious failures).

Still to do: test cases using the new test framework (no rush, just keeping track of what's left).

Anything else?

@moh-eulith force-pushed the mask_generation branch 2 times, most recently from 914ed2c to e128f30 on March 19, 2025 at 20:34
Mohammad Rezaei added 4 commits on March 25, 2025:
- auto-fix some of the tests -- these have lower gas in the test
- only optimize fully if runs param is less than 200
@moh-eulith
Author

I added a semantic test for masks. Most masks are generated by the compiler internally, so it made sense to me to test those. (Of the categories under /test, semantic felt like the best fit; let me know if it fits better elsewhere.)
