If your work is part of a larger effort, please discuss your general plans on Discourse first to align your vision with maintainers.
https://yosyshq.discourse.group/t/parallel-optmergepass-implementation/
What are the reasons/motivation for this change?
Massive speedups on large modules. On a particular real-world flattened design with millions of cells, I get a 14.5x speedup at 20 cores on an `opt_merge` that takes 15 iterations internally. On a 48-core machine the speedup levels out at 25x.
Another way to get massive speedups would be to make `opt_merge` incremental, i.e. at each step only consider merging cells whose state or connected cells have changed since the last iteration of `opt_merge`. That would be more efficient, but also more invasive, since information about what has changed would have to be kept up to date across passes. Anyway, see the discussion in the Discourse thread.
Explain how this is achieved.
This is the same algorithm presented in the Discourse thread, but over the last few months we've upgraded RTLIL to be thread-safe for read-only access, so there is no extra TRTLIL layer anymore and a lot less code needs to change here.
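To illustrate why thread-safe read-only access matters here, below is a minimal sketch of one general pattern: hash cells in parallel without mutating anything, then decide merges sequentially. This is not the code in this PR, and the actual algorithm is the one described in the Discourse thread. `ToyCell`, `hash_cell`, `same_function` and `find_merges` are hypothetical names invented for this example and are not part of the RTLIL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a netlist cell; NOT the RTLIL::Cell API.
struct ToyCell {
    std::string type;         // e.g. "$and"
    std::vector<int> inputs;  // IDs of the signals driving the cell, in port order
};

// Structural hash: cells with equal type and equal inputs get equal hashes.
static std::size_t hash_cell(const ToyCell &c) {
    std::size_t h = std::hash<std::string>{}(c.type);
    for (int sig : c.inputs)
        h = h * 1000003u + std::hash<int>{}(sig);
    return h;
}

// Exact comparison used to confirm a hash match.
static bool same_function(const ToyCell &a, const ToyCell &b) {
    return a.type == b.type && a.inputs == b.inputs;
}

// Returns, for each cell, the index of the cell it should be merged into
// (its own index if the cell is kept).
std::vector<std::size_t> find_merges(const std::vector<ToyCell> &cells,
                                     unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::size_t> hashes(cells.size());

    // Phase 1 (parallel): threads only read `cells`, and each one writes
    // a disjoint slice of `hashes`, so no locking is needed.
    std::vector<std::thread> workers;
    std::size_t chunk = (cells.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = std::min(cells.size(), t * chunk);
        std::size_t end = std::min(cells.size(), begin + chunk);
        workers.emplace_back([&cells, &hashes, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                hashes[i] = hash_cell(cells[i]);
        });
    }
    for (auto &w : workers)
        w.join();

    // Phase 2 (sequential): bucket by hash, confirm equality, and map each
    // duplicate to the first equivalent cell seen.
    std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
    std::vector<std::size_t> merge_into(cells.size());
    for (std::size_t i = 0; i < cells.size(); ++i) {
        merge_into[i] = i;
        auto &bucket = buckets[hashes[i]];
        for (std::size_t rep : bucket) {
            if (same_function(cells[rep], cells[i])) {
                merge_into[i] = rep;
                break;
            }
        }
        if (merge_into[i] == i)
            bucket.push_back(i);
    }
    return merge_into;
}
```

The result does not depend on the thread count, because the parallel phase only computes per-cell hashes and all merge decisions happen in a fixed sequential order.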
I'm not sure why, but this is actually faster than the existing `opt_merge` even with `YOSYS_MAX_THREADS=1` on the jpeg synthesis test: end-to-end synthesis takes 16.0s before and 15.5s after.