-
Notifications
You must be signed in to change notification settings - Fork 196
Optimize Cairo 0 execution #2206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.x.y
Are you sure you want to change the base?
Conversation
By changing the compile_hint parameter type to `Arc` instead of `Rc`, we can reuse the constants that are already included in `Program`. This avoids cloning all the constants. With this commit, there is a 9.8% performance increase when replaying mainnet block 10000, compared to 2.5.0.
With this commit, there is a 10.5% performance improvement while executing block 1000, compared to 2.5.0
With this commit, there is a 14.4% improvement while executing mainnet block 10000, compared to 2.5.0
Benchmark Results for unmodified programs 🚀
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 2.x.y #2206 +/- ##
=======================================
Coverage 96.66% 96.66%
=======================================
Files 103 103
Lines 43646 43683 +37
=======================================
+ Hits 42191 42228 +37
Misses 1455 1455 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
if segment.len() < value_offset + vals.len() { | ||
segment.reserve(value_offset + vals.len() - segment.len()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can remove the if since the documentation of reserve()
says:
Does nothing if capacity is already sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe they refer to different things:
segment.len() < value_offset + vals.len()
checks whether the length of the segment is enough for holding the new elements. If not, it reserves capacity for the additional elements.- the inner function checks whether the capacity of the segment is enough for holding the new elements. The capacity refers to the vector's allocated memory.
The if is required, as we only need to reserve new elements if the segment's length is not enough already. Without the condition, we would be sometimes be calling reserve
with a negative argument (which would cause underflow as we are using a usize
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...we would be sometimes be calling reserve with a negative argument.
I thought that value_offset
is always higher than segment.len()
.
Maybe I have the wrong understanding about segments, but if the lenght of a segment is 5. Doesn't it mean it has 5 allocated elements? If that is the case, having an offset lower than the lenght means it would want to write on already used memory which is something that it cannot be done, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that value_offset is always higher than segment.len().
I think that in load_data
that is usually the case, but I'm not sure it would happen always. Consider the following segment:
[NONE, NONE, NONE, 10, 20]
We may want to call insert_all
to insert 3 elements at the start of the segment. In that case, there is no need to reserve more space. Note that having NONE
is completely valid in a segment, those are commonly known as "memory gaps".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, I forgot you could have those. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, insert_all
is generic and supports the use case of inserting multiple elements at the middle of a segment.
If we make sure that load_data
can only insert elements at the end of a segment, we could have another method (i.e. extend_at
), only used for when inserting elements at the end of a segment. This could improve performance.
segment.resize(value_offset, MemoryCell::NONE); | ||
} | ||
// Insert new elements. | ||
let last_element_to_replace = segment.len().min(value_offset + vals.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn´t the last index always value_offset + vals.len()
? I don´t get in which case the segments len would be higher than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of splice
is a bit tricky.
It receives two arguments:
- The range to replace.
- the elements to replace it with.
The length of the range and the length of the replacement does not need to coincide. For example, consider the following array:
[0, 1, 2, 3, 4, 5]
If we want to insert [6,7,8]
at index 4
, we would be inserting 3 elements, but replacing only 2. The splice call would look like this:
splice(4..6, [6,7,8])
The result would look like this:
[0, 1, 2, 3, 6, 7, 8]
The following, instead, fails with index out of bounds, because we are replacing an element that does not exist.
splice(4..7, [6,7,8])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh I see. Awesome, thanks!
This reverts commit 27d314d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Optimize Cairo 0 execution
Description
This PR includes 2 minor optimizations, mainly targeted to Cairo 0 executions. I benchmarked block 10000 and compared the execution with v2.5.0. All benchmarks were ran in my M4 Macbook Pro.
a8daa0a: Using with_capacity to avoid reallocs when inserting to HashMap, in
get_ids_data
.b4c8768: Use insert_all to load contiguous memory cells all at once, instead of one at a time.
Benchmarks
I replayed multiple block ranges with my Macbook M4 Pro: