@JulianGCalderon JulianGCalderon commented Sep 23, 2025

Optimize Cairo 0 execution

Description

This PR includes 2 minor optimizations, mainly targeted at Cairo 0 execution. I benchmarked block 10000 and compared the execution with v2.5.0. All benchmarks were run on my M4 MacBook Pro.

  • a8daa0a: Use with_capacity to avoid reallocations when inserting into the HashMap in get_ids_data.

  • b4c8768: Use insert_all to load contiguous memory cells all at once, instead of one at a time.
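
A minimal sketch of the first optimization, assuming a simplified map from names to offsets (the real get_ids_data has a different signature and value types; `build_ids_data` and `references` are hypothetical names used only for illustration). Pre-sizing the map with HashMap::with_capacity means the insert loop should not need to grow and rehash the table:

```rust
use std::collections::HashMap;

/// Builds a name -> offset map, pre-sized for the number of entries.
/// `references` is a hypothetical stand-in for the hint's reference data.
fn build_ids_data(references: &[(String, usize)]) -> HashMap<String, usize> {
    // Allocate room for every entry up front instead of growing on insert.
    let mut ids_data = HashMap::with_capacity(references.len());
    for (name, offset) in references {
        ids_data.insert(name.clone(), *offset);
    }
    ids_data
}
```

The benefit depends on how many entries are inserted per call, so the gain for any given workload would need to be measured.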

Benchmarks

I replayed multiple block ranges on my M4 MacBook Pro:

  • Mainnet 10000 - 5% improvement
  • Mainnet 20000 to 20010 - 5% improvement
  • Mainnet 2000000 to 2000010 - 2% improvement

By changing the compile_hint parameter type to `Arc` instead of `Rc`, we
can reuse the constants that are already included in `Program`. This
avoids cloning all the constants.
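
A rough sketch of the idea, assuming simplified stand-in types (`Program`, `CompiledHint`, and the `u64` constant values here are hypothetical; the real types are richer). Sharing the map through an `Arc` means each hint holds a reference-counted pointer rather than its own copy:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for the real program and hint types.
struct Program {
    constants: Arc<HashMap<String, u64>>,
}

struct CompiledHint {
    constants: Arc<HashMap<String, u64>>,
}

/// Hypothetical compile step: takes the shared constants by reference.
fn compile_hint(constants: &Arc<HashMap<String, u64>>) -> CompiledHint {
    // Arc::clone only bumps a reference count; the map itself is not copied.
    CompiledHint {
        constants: Arc::clone(constants),
    }
}
```

With this shape, compiling many hints against the same program costs one reference-count increment per hint instead of one map clone per hint.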

With this commit, there is a 9.8% performance increase when replaying mainnet block 10000, compared to 2.5.0.

With this commit, there is a 10.5% performance improvement while executing block 1000, compared to 2.5.0.

With this commit, there is a 14.4% improvement while executing mainnet block 10000, compared to 2.5.0.

github-actions bot commented Sep 23, 2025

Benchmark Results for unmodified programs 🚀

Command | Unit | Mean | Min | Max | Relative
base big_factorial | s | 2.135 ± 0.018 | 2.113 | 2.168 | 1.00 ± 0.01
head big_factorial | s | 2.133 ± 0.008 | 2.124 | 2.152 | 1.00
base big_fibonacci | s | 2.058 ± 0.006 | 2.047 | 2.066 | 1.00
head big_fibonacci | s | 2.062 ± 0.016 | 2.044 | 2.096 | 1.00 ± 0.01
base blake2s_integration_benchmark | s | 7.681 ± 0.063 | 7.612 | 7.818 | 1.00
head blake2s_integration_benchmark | s | 7.701 ± 0.153 | 7.596 | 8.105 | 1.00 ± 0.02
base compare_arrays_200000 | s | 2.193 ± 0.010 | 2.179 | 2.207 | 1.01 ± 0.01
head compare_arrays_200000 | s | 2.173 ± 0.009 | 2.159 | 2.183 | 1.00
base dict_integration_benchmark | s | 1.426 ± 0.006 | 1.420 | 1.437 | 1.00
head dict_integration_benchmark | s | 1.431 ± 0.018 | 1.417 | 1.480 | 1.00 ± 0.01
base field_arithmetic_get_square_benchmark | s | 1.232 ± 0.008 | 1.220 | 1.245 | 1.01 ± 0.01
head field_arithmetic_get_square_benchmark | s | 1.223 ± 0.006 | 1.210 | 1.231 | 1.00
base integration_builtins | s | 7.764 ± 0.034 | 7.706 | 7.806 | 1.00
head integration_builtins | s | 7.775 ± 0.020 | 7.742 | 7.807 | 1.00 ± 0.01
base keccak_integration_benchmark | s | 8.055 ± 0.145 | 7.922 | 8.327 | 1.00 ± 0.04
head keccak_integration_benchmark | s | 8.018 ± 0.240 | 7.895 | 8.695 | 1.00
base linear_search | s | 2.163 ± 0.009 | 2.145 | 2.174 | 1.00
head linear_search | s | 2.167 ± 0.038 | 2.140 | 2.268 | 1.00 ± 0.02
base math_cmp_and_pow_integration_benchmark | s | 1.516 ± 0.006 | 1.508 | 1.528 | 1.00 ± 0.01
head math_cmp_and_pow_integration_benchmark | s | 1.514 ± 0.021 | 1.500 | 1.572 | 1.00
base math_integration_benchmark | s | 1.469 ± 0.009 | 1.459 | 1.486 | 1.01 ± 0.01
head math_integration_benchmark | s | 1.459 ± 0.006 | 1.447 | 1.467 | 1.00
base memory_integration_benchmark | s | 1.224 ± 0.003 | 1.219 | 1.229 | 1.01 ± 0.01
head memory_integration_benchmark | s | 1.211 ± 0.006 | 1.202 | 1.219 | 1.00
base operations_with_data_structures_benchmarks | s | 1.569 ± 0.008 | 1.562 | 1.587 | 1.00
head operations_with_data_structures_benchmarks | s | 1.578 ± 0.012 | 1.568 | 1.613 | 1.01 ± 0.01
base pedersen | ms | 530.4 ± 3.3 | 526.9 | 537.6 | 1.00
head pedersen | ms | 531.4 ± 3.1 | 528.0 | 536.5 | 1.00 ± 0.01
base poseidon_integration_benchmark | ms | 637.7 ± 7.7 | 625.5 | 654.8 | 1.01 ± 0.01
head poseidon_integration_benchmark | ms | 629.4 ± 3.9 | 622.5 | 637.1 | 1.00
base secp_integration_benchmark | s | 1.853 ± 0.013 | 1.843 | 1.883 | 1.01 ± 0.01
head secp_integration_benchmark | s | 1.832 ± 0.017 | 1.814 | 1.869 | 1.00
base set_integration_benchmark | ms | 633.2 ± 2.1 | 629.8 | 637.1 | 1.04 ± 0.01
head set_integration_benchmark | ms | 610.4 ± 2.7 | 604.6 | 613.4 | 1.00
base uint256_integration_benchmark | s | 4.281 ± 0.067 | 4.243 | 4.465 | 1.01 ± 0.02
head uint256_integration_benchmark | s | 4.241 ± 0.014 | 4.225 | 4.270 | 1.00


codecov bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.66%. Comparing base (065c8f4) to head (4612ef6).

Additional details and impacted files
@@           Coverage Diff           @@
##            2.x.y    #2206   +/-   ##
=======================================
  Coverage   96.66%   96.66%           
=======================================
  Files         103      103           
  Lines       43646    43683   +37     
=======================================
+ Hits        42191    42228   +37     
  Misses       1455     1455           


@JulianGCalderon JulianGCalderon marked this pull request as ready for review September 23, 2025 22:23
Comment on lines +764 to +766
if segment.len() < value_offset + vals.len() {
    segment.reserve(value_offset + vals.len() - segment.len());
}

Contributor

nit: I think we can remove the if since the documentation of reserve() says:

Does nothing if capacity is already sufficient

Contributor Author

I believe they refer to different things:

  • segment.len() < value_offset + vals.len() checks whether the length of the segment is enough to hold the new elements. If not, it reserves capacity for the additional elements.
  • The check inside reserve() is about whether the capacity of the segment is enough to hold the new elements. The capacity refers to the vector's allocated memory.

The if is required because we only need to reserve space when the segment's length is not already sufficient. Without the condition, we would sometimes be calling reserve with a negative argument (which would underflow, since the operands are usize).
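
To make the underflow concern concrete, here is a small sketch using `Option<u64>` as a stand-in for the real cell type (`reserve_for` and `additional_needed` are hypothetical helpers, not part of this PR). It shows the subtraction failing when the segment is already long enough, and one possible alternative to the explicit `if`, namely `saturating_sub`:

```rust
/// Hypothetical helper: makes sure `segment` can hold `needed_len` elements.
/// saturating_sub yields 0 instead of underflowing when the segment is
/// already long enough, and reserve(0) does nothing.
fn reserve_for(segment: &mut Vec<Option<u64>>, needed_len: usize) {
    segment.reserve(needed_len.saturating_sub(segment.len()));
}

/// The plain subtraction from the snippet, made explicit with checked_sub:
/// returns None exactly in the cases where the `if` guard skips the call.
fn additional_needed(len: usize, needed_len: usize) -> Option<usize> {
    needed_len.checked_sub(len)
}
```

Either form avoids the underflow; the explicit `if` in the PR has the advantage of stating the intent directly.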

Contributor

...we would sometimes be calling reserve with a negative argument.

I thought that value_offset is always higher than segment.len().

Maybe I have the wrong understanding of segments, but if the length of a segment is 5, doesn't that mean it has 5 allocated elements? If that is the case, having an offset lower than the length means it would want to write to already used memory, which cannot be done, right?

Contributor Author

I thought that value_offset is always higher than segment.len().

I think that in load_data that is usually the case, but I'm not sure it would always happen. Consider the following segment:

[NONE, NONE, NONE, 10, 20]

We may want to call insert_all to insert 3 elements at the start of the segment. In that case, there is no need to reserve more space. Note that having NONE in a segment is completely valid; these are commonly known as "memory gaps".
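
The gap-filling case can be sketched as follows, using `Option<u64>` as a simplified stand-in for MemoryCell (an assumption; this version also overwrites cells blindly, whereas the real insert_all may validate existing values). It follows the reserve, pad, then splice shape of the snippets under review:

```rust
// Simplified stand-in for the real cell type; None plays the role of NONE.
type Cell = Option<u64>;

/// Sketch of inserting `vals` at `value_offset`, filling gaps or appending.
fn insert_all(segment: &mut Vec<Cell>, value_offset: usize, vals: &[u64]) {
    let needed_len = value_offset + vals.len();
    // Only reserve when the segment is too short; this avoids usize underflow.
    if segment.len() < needed_len {
        segment.reserve(needed_len - segment.len());
    }
    // Pad with gaps when the write starts past the current end.
    if segment.len() < value_offset {
        segment.resize(value_offset, None);
    }
    // Replace only cells that already exist; splice appends the remainder.
    let last_element_to_replace = segment.len().min(needed_len);
    drop(segment.splice(
        value_offset..last_element_to_replace,
        vals.iter().copied().map(Some),
    ));
}
```

Calling `insert_all(&mut segment, 0, &[1, 2, 3])` on the segment above fills the three gaps and leaves the length at 5, so no extra capacity is needed.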

Contributor

Perfect, I forgot you could have those. Thanks!

Contributor Author

Currently, insert_all is generic and supports the use case of inserting multiple elements at the middle of a segment.

If we make sure that load_data can only insert elements at the end of a segment, we could add another method (e.g. extend_at) used only for inserting elements at the end of a segment. This could improve performance.
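
One possible shape for such a method, purely as a sketch: `extend_at` does not exist in this PR, the `Option<u64>` cell type is a stand-in for MemoryCell, and the bool return is an illustrative choice. Because it only appends, it can skip the splice range computation entirely:

```rust
// Simplified stand-in for the real cell type; None plays the role of NONE.
type Cell = Option<u64>;

/// Hypothetical append-only variant: valid only when `value_offset` is at or
/// past the current end. Returns false and leaves the segment untouched
/// otherwise, so a caller could fall back to the generic insert_all.
fn extend_at(segment: &mut Vec<Cell>, value_offset: usize, vals: &[u64]) -> bool {
    if value_offset < segment.len() {
        return false;
    }
    // Room for the padding gaps plus the new values.
    segment.reserve(value_offset - segment.len() + vals.len());
    // Pad up to the offset with gaps, then append the values.
    segment.resize(value_offset, None);
    segment.extend(vals.iter().copied().map(Some));
    true
}
```

Whether this is measurably faster than the splice-based path is untested here and would need benchmarking.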

segment.resize(value_offset, MemoryCell::NONE);
}
// Insert new elements.
let last_element_to_replace = segment.len().min(value_offset + vals.len());

Contributor

Isn't the last index always value_offset + vals.len()? I don't see in which case the segment's length would be higher than that.

Contributor Author

The behavior of splice is a bit tricky.

It receives two arguments:

  • The range to replace.
  • The elements to replace it with.

The length of the range and the length of the replacement do not need to match. For example, consider the following array:

[0, 1, 2, 3, 4, 5]

If we want to insert [6,7,8] at index 4, we would be inserting 3 elements, but replacing only 2. The splice call would look like this:

splice(4..6, [6,7,8])

The result would look like this:

[0, 1, 2, 3, 6, 7, 8]

The following, instead, panics with an out-of-bounds error, because the range includes an element that does not exist.

splice(4..7, [6,7,8])
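
The same example written against Vec::splice from the standard library (the `splice_demo` wrapper is just for illustration). The call also returns the elements it removed, which makes the "replace 2, insert 3" behavior visible:

```rust
/// Replaces the 2 elements in 4..6 with 3 new ones and returns both the
/// resulting vector and the elements that were removed.
fn splice_demo() -> (Vec<i32>, Vec<i32>) {
    let mut v = vec![0, 1, 2, 3, 4, 5];
    // The range end (6) equals v.len(), so it is in bounds.
    // Using 4..7 here would panic, because index 6 does not exist yet.
    let removed: Vec<i32> = v.splice(4..6, [6, 7, 8]).collect();
    (v, removed)
}
```

This is why the range end is clamped with `segment.len().min(value_offset + vals.len())` before calling splice.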

Contributor

Ohh I see. Awesome, thanks!

Contributor

@DiegoCivi DiegoCivi left a comment

Looks good!
