Better safeguard against incompatible builtin handles between dialects #15961

clonker · 2025-03-20T12:12:26Z

Inspired from the remarks by @cameel in #15952 (comment), I have refactored EVMDialect a bit so that there is a class for generating the builtin vector that will automatically insert blanks if conditions aren't met and not give direct access to the underlying raw datastructure.
Also, I have added a unit test which takes the default dialect (without EOF) as well as the latest dialect with EOF and tests whether the populated builtin functions with their handles contained in the respective dialects are preserved when compared to all other dialects with and without object access.

Between current default dialect as well as latest dialect and all other dialects.

cameel

Looks like a step in a good direction, but if we're refactoring these builtins, there's a way to do it in a way that will more completely eliminate the possibility of introducing these kinds of mistakes. We should simply separate they declarations with the logic that defines them. See comments below for details.

cameel · 2025-03-20T20:22:59Z

libyul/backends/evm/EVMDialect.cpp

 	if (!_eofVersion.has_value())
 	{
-		builtins.emplace_back(createIfObjectAccess("datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
+		builtins.addIfObjectAccess(_objectAccess, "datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
 			FunctionCall const& _call,


The fact that now some conditions are handled here and some in Builtins still bothers me a bit. I think it would be best if we made the whole thing more declarative. I.e. have one place that declares the builtins along with their scopes (i.e. EVM/EOF version range and state of object access). Then have another place that loops over all builtins that decides whether to actually define them based on their scopes.

The definition loop would then be the single place that decides whether the opcode should be skipped and whether to leave an empty spot in m_functions.

This would be a more comprehensive solution for the problem, because we would no longer touch that definition loop when adding new opcodes. We would only touch declarations and scopes without the risk of accidentally breaking the logic.

BTW, not sure "scope" is the best way to call it, but I could not think of anything better so far. Could be just "conditions", but that sounds very non-specific.

cameel · 2025-03-20T21:11:14Z

libyul/backends/evm/EVMDialect.cpp

+			!(_opcode >= evmasm::Instruction::DUP1 && _opcode <= evmasm::Instruction::DUP16) &&
+			!(_opcode >= evmasm::Instruction::SWAP1 && _opcode <= evmasm::Instruction::SWAP16) &&
+			!isPushInstruction(_opcode) &&
+			_opcode != evmasm::Instruction::JUMP &&
+			_opcode != evmasm::Instruction::JUMPI &&
+			_opcode != evmasm::Instruction::JUMPDEST &&
+			_opcode != evmasm::Instruction::DATALOADN &&
+			_opcode != evmasm::Instruction::EOFCREATE &&
+			_opcode != evmasm::Instruction::RETURNCONTRACT &&
+			_opcode != evmasm::Instruction::RJUMP &&
+			_opcode != evmasm::Instruction::RJUMPI &&
+			_opcode != evmasm::Instruction::CALLF &&
+			_opcode != evmasm::Instruction::JUMPF &&
+			_opcode != evmasm::Instruction::DUPN &&
+			_opcode != evmasm::Instruction::SWAPN &&
+			_opcode != evmasm::Instruction::RETF &&


For builtins based on opcodes, this part of the condition is the declarative part. I'd move it to a separate function that returns a boolean. Then when adding a new opcode we'd only really modify that function.

In fact, I'd actually create more than one function. I think we should make it more semantic by having smaller functions expressing specific properties the excluded opcodes satisfy:

All the DUP/SWAP/PUSH variants: low-level stack manipulation opcodes.

We should also use SemanticInformation::isSwapInstruction()/SemanticInformation::isDupInstruction() here.

All the JUMP variants and RETF/CALLF: low-level control flow opcodes

DATALOADN, EOFCREATE, RETURNCONTRACT: these are actually exposed, but not directly. We have builtins replacing them. We can define a separate function for them, but a nicer alternative would be to be able to declare for a builtin that it replaces an instruction and have the definition logic exclude that instruction automatically.

Executing these functions and EVMVersion::hasOpcode() would then be a fixed part of the definition logic and would not have to be adjusted. It would be the equivalent of checking the scope of a builtin.

cameel · 2025-03-20T21:27:13Z

libyul/backends/evm/EVMDialect.cpp

+
+/// Make sure to only add builtins in a way that is consistent over EVM versions. If the order depends on the
+/// EVM version - which can easily happen using conditionals -, different dialects' builtin handles
+/// become inherently incompatible.


Thinking about it, the compatibility between dialects with and without object access should be more prominently documented in the header. I'd mention it for strictAssemblyForEVMObjects() and maybe also for the whole EVMDialect class.

TBH we guarantee so much compatibility between them that I'm not sure we should even be calling these distinct dialects. Move like flavors of the same dialect. It seems like an orthogonal property that could apply to any dialect.

cameel · 2025-03-20T22:01:56Z

test/libyul/EVMDialectCompatibility.h

+inline void requireBuiltinCompatibility(EVMDialect const& _dialect1, EVMDialect const& _dialect2, std::string_view const _builtin)
+{
+	if (auto const currentHandle = _dialect1.findBuiltin(_builtin))
+		if (auto const handle = _dialect2.findBuiltin(_builtin))


This will only ensure that builtins available in both dialects have matching IDs, which is necessary but not sufficient for full compatibility. We also need to ensure that the dialect with objects has all the builtins present in the one without.

cameel · 2025-03-20T22:06:40Z

test/libyul/EVMDialectCompatibility.cpp

+	auto const& dialectToTestAgainst = evmDialectConfigurationToTest.dialect();
+	// no object access for current
+	{
+		auto const& currentDialect = EVMDialect::strictAssemblyForEVM({}, std::nullopt);


I'd keep it more explicit to make places where we hard-code current EVM version easier to spot.

Suggested change

auto const& currentDialect = EVMDialect::strictAssemblyForEVM({}, std::nullopt);

auto const& currentDialect = EVMDialect::strictAssemblyForEVM(EVMVersion{}, std::nullopt);

BTW, we should probably add that EVMVersion::current() helper. But just internally, without exposing it to the user.

cameel · 2025-03-20T22:11:00Z

test/libyul/EVMDialectCompatibility.cpp

+	auto const& dialectToTestAgainst = evmDialectConfigurationToTest.dialect();
+	langutil::EVMVersion latestEVMVersion = langutil::EVMVersion::allVersions().back();
+	std::optional<uint8_t> eofVersion = std::nullopt;
+	if (withEOF)


It's likely that we'll forget to update this if we ever add a EOF version. We should define allEOFVersions() (analogous to EVMVersion::allVersions()), so that we can just iterate over them.

cameel · 2025-03-20T22:15:49Z

test/libyul/EVMDialectCompatibility.cpp

+	builtin_function_handle_compatibility_eof,
+	bdata::monomorphic::grid(
+		bdata::make(generateEVMDialectConfigurationsToTest(std::nullopt)) + bdata::make(generateEVMDialectConfigurationsToTest(1)),


This seems to actually be testing both EOF and non-EOF despite the name.

And a much more relevant difference is that it's using the latest rather than the default EVM version. In not so distant future both current and latest will have EOF, but the fact that current is not always the latest will remain. If you replace withEOF check with withEOF && evmVersion.supportsEOF() it will be more future-proof. You can then unify the tests and make the source EVM version just another dimension.

Also, I think we should have test case where we cover the "inline" -> "with objects" conversion for all dialects. Converting between distinct dialects is nice to have but these pairs are the most relevant, because for them we actually do perform such a conversion in practice.

clonker requested a review from cameel March 20, 2025 12:12

clonker mentioned this pull request Mar 20, 2025

IRGeneratorForStatements: Remove outdated check against usr$ prefixing of builtins #15952

Merged

clonker force-pushed the safeguard_builtin_handles branch from 5c3353b to 27f4e90 Compare March 20, 2025 12:22

clonker added 2 commits March 20, 2025 16:16

EVMDialect: Encapsulate adding builtins during construction into a class

077ae07

EVMDialect: add test that enforces builtin handle compatibility

106775f

Between current default dialect as well as latest dialect and all other dialects.

clonker force-pushed the safeguard_builtin_handles branch from 27f4e90 to 106775f Compare March 20, 2025 15:17

cameel reviewed Mar 20, 2025

View reviewed changes

cameel added the refactor label Mar 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better safeguard against incompatible builtin handles between dialects #15961

Better safeguard against incompatible builtin handles between dialects #15961

clonker commented Mar 20, 2025

cameel left a comment

cameel Mar 20, 2025 •

edited

Loading

cameel Mar 20, 2025

cameel Mar 20, 2025

cameel Mar 20, 2025

cameel Mar 20, 2025 •

edited

Loading

cameel Mar 20, 2025 •

edited

Loading

cameel Mar 20, 2025

cameel Mar 20, 2025

cameel Mar 20, 2025

cameel Mar 20, 2025

cameel Mar 20, 2025

	auto const& currentDialect = EVMDialect::strictAssemblyForEVM({}, std::nullopt);
	auto const& currentDialect = EVMDialect::strictAssemblyForEVM(EVMVersion{}, std::nullopt);

Better safeguard against incompatible builtin handles between dialects #15961

Are you sure you want to change the base?

Better safeguard against incompatible builtin handles between dialects #15961

Conversation

clonker commented Mar 20, 2025

cameel left a comment

Choose a reason for hiding this comment

cameel Mar 20, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cameel Mar 20, 2025 • edited Loading

Choose a reason for hiding this comment

cameel Mar 20, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cameel Mar 20, 2025 •

edited

Loading

cameel Mar 20, 2025 •

edited

Loading

cameel Mar 20, 2025 •

edited

Loading