@rodiazet rodiazet commented Oct 7, 2024

This PR implements proper support for EVM benchmarking driven by state test files.

  • Remove the old-style benchmarking support that read raw bytecode from a file.
  • Load a state test JSON file and run benchmarks on the tests defined in it.
  • Run the state tests before benchmarking to make sure they pass.
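
The "verify first, then benchmark" flow from the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this PR: `TestCase` and `select_benchmarks` are hypothetical names, and the real implementation works on loaded state tests and registers Google Benchmark cases instead of returning names.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one test case loaded from a state test JSON file.
struct TestCase
{
    std::string name;
    std::function<bool()> run;  // Executes the test once; true when expectations hold.
};

// Returns the names of the cases that would be registered as benchmarks.
// Each case is executed once up front; a failing case aborts with an exception,
// so no benchmark is ever registered for a test that does not pass.
std::vector<std::string> select_benchmarks(const std::vector<TestCase>& cases)
{
    std::vector<std::string> registered;
    for (const auto& c : cases)
    {
        if (!c.run())
            throw std::runtime_error{"state test failed: " + c.name};
        registered.push_back(c.name);
    }
    return registered;
}
```

The verification run happens outside of any timed region, so it should not affect the reported benchmark times.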

@rodiazet rodiazet force-pushed the bench-fix branch 4 times, most recently from 0f7876d to ca95f71 Compare October 9, 2024 09:33
@rodiazet rodiazet changed the title from "Bench fix" to "test: Fix state test format benchmarking" Oct 9, 2024

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 84 lines in your changes missing coverage. Please review.

Project coverage is 94.69%. Comparing base (a9d5bfe) to head (a815b5f).

Files with missing lines Patch % Lines
test/bench/bench.cpp 0.00% 73 Missing ⚠️
test/bench/helpers.hpp 0.00% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
+ Coverage   94.54%   94.69%   +0.14%     
==========================================
  Files         175      175              
  Lines       19702    19672      -30     
==========================================
  Hits        18628    18628              
+ Misses       1074     1044      -30     
Flag Coverage Δ
eest_gmp 15.29% <0.00%> (+0.02%) ⬆️
eof_execution_spec_tests 19.88% <0.00%> (+0.03%) ⬆️
ethereum_tests 21.49% <0.00%> (+0.03%) ⬆️
ethereum_tests_silkpre 18.30% <0.00%> (+0.02%) ⬆️
execution_spec_tests 18.57% <0.00%> (+0.02%) ⬆️
unittests 91.99% <0.00%> (+0.14%) ⬆️

Files with missing lines Coverage Δ
test/bench/helpers.hpp 0.00% <0.00%> (ø)
test/bench/bench.cpp 0.00% <0.00%> (ø)

@chfast chfast added the tests Testing infrastructure label May 14, 2025
@rodiazet rodiazet force-pushed the bench-fix branch 5 times, most recently from bd6083a to 9588b72 Compare May 19, 2025 08:59
@rodiazet rodiazet marked this pull request as ready for review May 19, 2025 09:05
@rodiazet rodiazet requested a review from chfast May 19, 2025 09:05
@chfast chfast changed the title from "test: Fix state test format benchmarking" to "test: Handle full state tests in evmone-bench" May 19, 2025
if (const auto it = registered_vms.find("advanced"); it != registered_vms.end())
advanced_vm = &it->second;
if (const auto it = registered_vms.find("baseline"); it != registered_vms.end())
baseline_vm = &it->second;
if (const auto it = registered_vms.find("bnocgoto"); it != registered_vms.end())
Member

Why was this VM removed?

Member Author

Because all VMs are now benchmarked in the loop below. The first two VMs (advanced and baseline) are still looked up because they are used for the analysis benchmarks. bnocgoto never had an analysis benchmark, so after moving all VMs into the loop its lookup became unused.
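
A rough sketch of the idea, iterating over the whole registry instead of looking up individual VMs by name. `FakeVM` and `benchmark_names` are hypothetical, and the `<vm>/execute/<case>` name format is an assumption modeled on the old `"advanced/execute/"` names, not necessarily what the PR registers.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for evmc::VM; only the shape of the registry matters here.
struct FakeVM
{
};

// Builds one benchmark name per (registered VM, test case) pair.
// No VM is named explicitly, so a VM such as "bnocgoto" needs no lookup of its own.
std::vector<std::string> benchmark_names(
    const std::map<std::string_view, FakeVM>& vms, const std::vector<std::string>& cases)
{
    std::vector<std::string> names;
    for (const auto& [vm_name, vm] : vms)
    {
        (void)vm;  // A real implementation would capture the VM in the benchmark callback.
        for (const auto& case_name : cases)
            names.push_back(std::string{vm_name} + "/execute/" + case_name);
    }
    return names;
}
```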

#include <evmone/vm.hpp>

namespace evmone::test
{
extern std::map<std::string_view, evmc::VM> registered_vms;

- constexpr auto default_revision = EVMC_ISTANBUL;
+ constexpr auto default_revision = EVMC_PRAGUE;
Member

Maybe don't change this value if not needed. Later we will need to update synthetic tests or remove them in favor of EEST.

Member Author

Legacy change. Reverting.

auto iteration_gas_used = int64_t{0};
for (auto _ : state)
{
const auto tx_props_or_error = state::validate_transaction(pre_state, block_info, tx, rev,
Member

I think we should remove transaction validation from the benchmark. You should also add a TODO to later register "validation" subcase.

Member Author

It only validates the test; the time needed to execute it is not added to the total benchmark time.

Member Author

OK. Sorry, I misunderstood. I was referring to run_state_test. Validation can be removed.

@rodiazet rodiazet force-pushed the bench-fix branch 5 times, most recently from 6132839 to 63b9809 Compare May 20, 2025 14:00
rodiazet added 2 commits May 20, 2025 16:12
- Load benchmarks as proper state test.
- Support single file path.
- Remove support for benchmarking raw bytecode.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the benchmarking infrastructure within evmone to support state tests by loading state test JSON files, removing old raw bytecode support, and ensuring that state tests pass before running benchmarks.

  • Updated CMake builds to include new test runner sources and link necessary GTest components.
  • Removed legacy benchmarking functions and added a new bench_transition helper to facilitate state transitions.
  • Refactored benchmark registration and argument parsing to support JSON-based state tests.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
test/statetest/CMakeLists.txt Removed raw bytecode support; added statetest_runner.cpp to the build.
test/bench/helpers.hpp Removed legacy execution functions and added bench_transition.
test/bench/bench.cpp Refactored benchmark registration, argument parsing, and state tests.

const auto name = "advanced/execute/" + case_name;
RegisterBenchmark(name, [&vm = *advanced_vm, &b, &input](State& state) {
bench_advanced_execute(state, vm, b.code, input.input, input.expected_output);
RegisterBenchmark("advanced/analyse/" + b.name, [code, &rev](State& state) {

Copilot AI May 20, 2025

Consider capturing 'rev' by value instead of by reference in the lambda to ensure that its value is preserved correctly in the benchmark callback.

Suggested change
- RegisterBenchmark("advanced/analyse/" + b.name, [code, &rev](State& state) {
+ RegisterBenchmark("advanced/analyse/" + b.name, [code, rev](State& state) {
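
The difference between the two capture modes can be seen in a small standalone sketch. It does not use the benchmark API: `run_deferred` and its integer `rev` are hypothetical stand-ins for a callback that is registered now and invoked later. Whether the real `rev` changes or goes out of scope before the benchmarks run is not shown in this thread, so this only illustrates the general hazard.

```cpp
#include <functional>
#include <initializer_list>
#include <vector>

// Registers one callback per revision value and runs all of them afterwards,
// the way benchmark callbacks run only after registration has finished.
std::vector<int> run_deferred(bool by_value)
{
    std::vector<std::function<int()>> callbacks;
    int rev = 0;  // Stand-in for the revision; reused across registrations.
    for (const int r : {1, 2, 3})
    {
        rev = r;
        if (by_value)
            callbacks.emplace_back([rev] { return rev; });   // Copies the current value.
        else
            callbacks.emplace_back([&rev] { return rev; });  // Refers to the shared variable.
    }

    std::vector<int> seen;
    for (const auto& cb : callbacks)
        seen.push_back(cb());  // `rev` is still alive here, so both modes are well defined.
    return seen;
}
```

With by-value capture each callback sees the value at registration time (1, 2, 3); with by-reference capture all of them see the final value (3, 3, 3). If the referenced variable had already gone out of scope when a callback ran, the by-reference version would be undefined behavior.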

constexpr auto bench_baseline_execute =
bench_execute<ExecutionState, baseline::CodeAnalysis, baseline_execute, baseline_analyse>;
using benchmark::Counter;
state.counters["gas_used"] = Counter(static_cast<double>(iteration_gas_used));

Copilot AI May 20, 2025

[nitpick] Review whether the 'gas_used' counter should accumulate the total gas from all iterations rather than only reflecting the gas from the final iteration. Clarify the intention to prevent any misinterpretation of benchmark results.

Suggested change
- state.counters["gas_used"] = Counter(static_cast<double>(iteration_gas_used));
+ state.counters["gas_used"] = Counter(static_cast<double>(total_gas_used));
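
To make the two interpretations concrete, here is a small standalone sketch with hypothetical names (`GasCounters`, `measure`). It models the benchmark loop as a plain loop over per-iteration gas values and is not taken from the PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pair of counters: one overwritten per iteration, one accumulated.
struct GasCounters
{
    std::int64_t last_iteration = 0;
    std::int64_t total = 0;
};

// Simulates a benchmark loop where every iteration reports the gas it used.
GasCounters measure(const std::vector<std::int64_t>& gas_per_iteration)
{
    GasCounters c;
    for (const auto gas : gas_per_iteration)
    {
        c.last_iteration = gas;  // Reflects only the most recent iteration.
        c.total += gas;          // Grows with the number of iterations.
    }
    return c;
}
```

When every iteration executes the same transaction, the per-iteration value stays constant across runs, while the total depends on how many iterations the benchmark framework decides to run. Which of the two is the intended meaning of `gas_used` is the question raised above.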
