
[pull] master from verilog-to-routing:master #565

Open: wants to merge 9,384 commits into base: master from verilog-to-routing:master


pull[bot] commented Dec 2, 2020

See Commits and Changes for more details.


Created by pull[bot]


amin1377 and others added 30 commits April 2, 2025 18:44
The AP flow has many tunable knobs which trade-off quality and run time.
Went through each of the knobs to find a good combination.

Updates to the partial legalizer:
- Reversed the order in which unplaced large blocks are inserted into partitions.
- Increased the bin cluster gap from 1 to 2
On the largest VTR benchmarks, this decreased the number of overfilled
bins after legalization by 15% and the average overfill of each of those
bins by 40%.
On Titan, the number of overfilled bins decreased by 32% and the average
overfill decreased by 2.5%.
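The two legalizer metrics quoted above are the number of overfilled bins and the average overfill of those bins. The minimal C++ sketch below shows one way to compute them. It assumes each bin has a single scalar capacity. `Bin`, `OverfillStats` and `compute_overfill_stats` are hypothetical names, and the real partial legalizer may track multi-dimensional capacities instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bin: how much it holds vs. how much it can hold.
// Simplification: capacity is a single scalar, not a per-resource vector.
struct Bin {
    double capacity;
    double utilization;
};

struct OverfillStats {
    std::size_t num_overfilled = 0;  // bins whose utilization exceeds capacity
    double avg_overfill = 0.0;       // mean excess, over overfilled bins only
};

OverfillStats compute_overfill_stats(const std::vector<Bin>& bins) {
    OverfillStats stats;
    double total_overfill = 0.0;
    for (const Bin& b : bins) {
        const double excess = b.utilization - b.capacity;
        if (excess > 0.0) {
            ++stats.num_overfilled;
            total_overfill += excess;
        }
    }
    if (stats.num_overfilled > 0)
        stats.avg_overfill = total_overfill / static_cast<double>(stats.num_overfilled);
    return stats;
}
```

Averaging only over overfilled bins matches how the commit reports the two numbers separately: the count shows how many bins are still illegal, and the average shows how badly they are overfilled.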

Updates to the analytical solver and global placer:
- Allowed the B2B solver to stop early if it seems to be converging.
- Changed the anchor weights from a linearized term to a quadratic term.
- Decreased the distance epsilon from 0.5 to 0.01.
- Increased the max number of B2B solver iterations from 6 to 24.
- Decreased the CG iteration cap from 200 to 150.
- The global placer saves the best legalized placement it has seen and
  returns it as its final result.
On the largest VTR benchmarks, this decreased the post GP HPWL by 22%
and decreased the GP run time by 17%.
On Titan, the post GP HPWL decreased by 25%, and the GP run time
decreased by 19%.
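The sketch below illustrates two of the global-placer changes: an early exit for the B2B solver, and an outer loop that returns the best legalized placement rather than the last one. It is illustrative C++, not VPR's implementation. `Placement`, `b2b_converged`, `run_global_placer`, and the relative-HPWL convergence test with its tolerance are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical flat placement: one (x, y) coordinate pair per atom.
struct Placement {
    std::vector<double> x;
    std::vector<double> y;
};

// Assumed early-exit test for the B2B inner loop: treat the solver as
// converged when the relative HPWL change between iterations is below rel_tol.
bool b2b_converged(double prev_hpwl, double cur_hpwl, double rel_tol) {
    if (prev_hpwl <= 0.0) return false;
    return std::fabs(prev_hpwl - cur_hpwl) / prev_hpwl < rel_tol;
}

// Outer global-placement loop. Each iteration solves and legalizes; the loop
// keeps the lowest-cost legalized placement seen so far and returns it
// instead of whatever the final iteration produced.
Placement run_global_placer(
        int max_iters,
        const std::function<Placement(int)>& solve_and_legalize,
        const std::function<double(const Placement&)>& cost) {
    Placement best;
    double best_cost = std::numeric_limits<double>::infinity();
    for (int iter = 0; iter < max_iters; ++iter) {
        Placement legalized = solve_and_legalize(iter);
        const double c = cost(legalized);
        if (c < best_cost) {
            best_cost = c;
            best = legalized;
        }
    }
    return best;
}
```

Keeping the best placement matters because legalized quality does not always improve monotonically from one iteration to the next.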

Updates to APPack:
- Decreased the max candidate distance from 0.5 (W + H) to 0.1 (W + H)
  for logical blocks.
- Decreased the max candidate distance for all other blocks to 0.35 (W + H).
- Lowered the attenuation distance threshold from 2.0 to 1.75.
- Decreased the attenuation value at the distance threshold to 0.35.
- Increased the max unrelated clustering distance from 1 to 5.
- Increased the max number of unrelated clustering attempts from 2 to
  10.
- Turned off all APPack optimization for RAM blocks.
On the largest VTR benchmarks, this decreased the wirelength by 2% over
the un-tuned AP flow, with a 2.8% decrease in pack time.
On Titan, the post FL wirelength decreased by 6% and the post routing
wirelength decreased by 2.6%, with a 0.7% decrease in pack time.
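Here is a hypothetical C++ sketch of how the tuned max candidate distances scale with the device width W and height H. The names `BlockKind`, `max_candidate_distance` and `is_candidate_in_range` are invented for illustration. Measuring distance as the Manhattan distance between flat-placement positions is also an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical two-way split of block types for candidate selection.
enum class BlockKind { Logic, Other };

// Limit on how far (in grid units) a candidate's flat-placement position may
// be from the cluster being grown: 0.1 (W + H) for logic blocks and
// 0.35 (W + H) for all other blocks, per the tuned values above.
double max_candidate_distance(BlockKind kind, int W, int H) {
    const double scale = (kind == BlockKind::Logic) ? 0.1 : 0.35;
    return scale * static_cast<double>(W + H);
}

// Assumed metric: Manhattan distance between the two positions.
bool is_candidate_in_range(double seed_x, double seed_y,
                           double cand_x, double cand_y, double max_dist) {
    return std::fabs(seed_x - cand_x) + std::fabs(seed_y - cand_y) <= max_dist;
}
```

For example, on a 100 x 60 device this gives a limit of 16 grid units for logic blocks and 56 for other blocks.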

Updates to initial placement:
- Fixed oversight with how the centroid was being calculated.
- Increased the range limit used when searching for a nearby location
  (when the location a cluster wants is taken) from 15 to 60.
This further improved the post routing wirelength of Titan to 4.4%
better than the un-tuned AP flow.
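The commit does not say what the centroid oversight was. For reference, a cluster's centroid is commonly taken as the mean of its atoms' flat-placement positions, and the C++ sketch below assumes exactly that. The unweighted mean and the `Point` and `cluster_centroid` names are assumptions, not the AP flow's actual code.

```cpp
#include <cassert>
#include <vector>

struct Point {
    double x;
    double y;
};

// Unweighted centroid of the flat-placement positions of a cluster's atoms.
// Assumption: every atom counts equally; the real code may weight atoms.
Point cluster_centroid(const std::vector<Point>& atom_positions) {
    assert(!atom_positions.empty());
    Point c{0.0, 0.0};
    for (const Point& p : atom_positions) {
        c.x += p.x;
        c.y += p.y;
    }
    const double n = static_cast<double>(atom_positions.size());
    c.x /= n;
    c.y /= n;
    return c;
}
```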

I found many issues with the initial placement that may be blocking
significant gains. I will be investigating the initial placement code
soon.
…s/libs/EXTERNAL/libcatch2-76f70b1

Bump libs/EXTERNAL/libcatch2 from `914aeec` to `76f70b1`
The AP flow makes its own prepacker which it uses throughout. However, a
full legalizer in the AP flow (APPack) uses the try_pack method which
creates its own prepacker. This creates two independent prepacker
objects when only one is needed.

Move the construction of the prepacker object into vpr_api and have it
get passed into the try_pack function.
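The refactor is a dependency-injection change: the caller owns one prepacker and passes it down, so the packer no longer builds its own. The C++ sketch below illustrates the pattern. `Prepacker`, `try_pack` and `run_flow` here are simplified hypothetical stand-ins, not VPR's real types or signatures.

```cpp
#include <cassert>

// Hypothetical stand-in for the prepacker. The counter exists only so the
// "one prepacker per flow" property can be checked.
struct Prepacker {
    static inline int num_constructed = 0;
    Prepacker() { ++num_constructed; }
};

// After the change: the packer borrows a prepacker owned by its caller
// instead of constructing a second one (illustrative signature only).
bool try_pack(const Prepacker& prepacker) {
    (void)prepacker;  // real packing would query the shared prepacker here
    return true;
}

// Caller (the role vpr_api plays in the commit) builds the prepacker once
// and hands the same object to every stage that needs it.
bool run_flow() {
    Prepacker prepacker;
    // ... the AP global placer would also use `prepacker` here ...
    return try_pack(prepacker);
}
```

Passing by `const&` keeps ownership with the caller and avoids a copy.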
[VTR Script] Add Run Number to Parse Script
[Prepacker] Moved the Prepacker Out of Try Pack
[Place] rename get_initial_move_lim to get_place_inner_loop_num_move
…-run

[vtr_flow] Changed Triggers for Second Run
…format

Set InsertNewlineAtEOF in .clang-format file
Timing was intermixed into the packer. It appears as though the code
was originally designed to recalculate the timing information every so
often in the packer, but the idea was abandoned. This left timing code
scattered across the Packer, and the timing was being recomputed every
time clustering was restarted, which was unnecessary.

Collected all of the timing information from the Packer into a single
object, called PreClusterTimingManager, which abstracts all of the
timing info in the Packer.

The ultimate goal is to bring this Manager class into the AP flow to be
used together with the Global Placer. By sharing this manager class, the
AP flow may be able to update the timing info with flat placement
information to make the timing more accurate.
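The core idea is that timing is analyzed once and then only queried, including across clustering restarts. The C++ sketch below illustrates that. Only the class name PreClusterTimingManager comes from the commit. The constructor, the `Analyzer` callback and `calc_pin_criticality` are hypothetical and do not reflect the real class's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of a manager that owns pre-clustering timing results. The analysis
// callback runs once, in the constructor; later queries (including ones made
// after a clustering restart) only read the cached values.
class PreClusterTimingManager {
public:
    using Analyzer = std::function<std::vector<double>(std::size_t)>;

    PreClusterTimingManager(std::size_t num_pins, const Analyzer& analyze)
        : pin_criticality_(analyze(num_pins)) {}

    // Hypothetical query: cached criticality of one atom pin.
    double calc_pin_criticality(std::size_t pin) const {
        return pin_criticality_.at(pin);
    }

private:
    std::vector<double> pin_criticality_;
};
```

Because the results live in one object, the same manager could later be handed to the AP flow's global placer, as the commit intends.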
…ager

[Pack][Timing] Abstracted How Timing is Used in the Packer
Added basic timing awareness to the AP flow by weighting nets in the AP
solver by their criticality (the max criticality of all edges through
that net). This makes the solver prioritize minimizing the length of
more critical nets over less critical ones (according to the
pre-clustering timing analyzer).

Added a command-line option to trade off between timing and wirelength in
the AP flow.
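Below is a minimal C++ sketch of the net-weighting idea. A net's criticality is the max over its edges, as described above. The linear blend controlled by `tradeoff` is an assumed way the command-line knob could mix timing and wirelength, not the formula VPR uses. In a B2B (quadratic) formulation, a weight like this would typically scale the spring constants of the net's connections.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Net criticality: the max criticality over all edges through the net.
double net_criticality(const std::vector<double>& edge_criticalities) {
    double crit = 0.0;
    for (double c : edge_criticalities) crit = std::max(crit, c);
    return crit;
}

// Assumed blend: tradeoff = 0 gives every net weight 1 (pure wirelength);
// tradeoff = 1 weights each net purely by its criticality.
double net_weight(double criticality, double tradeoff) {
    return tradeoff * criticality + (1.0 - tradeoff);
}
```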
[AP][Timing] Added Basic Net Weighting
[AP][Test] Added Titan Nightly Test of WL-Driven AP Flow
Found that the Initial Placer stage of the AP flow (after APPack, but
before Detailed Placement) was not working as expected. The intention
was that clusters would be placed at their centroid location according to
the flat placement, and if that site was illegal or taken it would take
a nearby point instead (falling back on the original initial placer if
nothing can be found).

To achieve this, I was using a method called find_centroid_neighbor
which I thought would return the nearest legal location to the given
location. This was not correct. This method just creates a bounding-box
and tries to find a random point in that box around the given point.
This was causing our AP flow to move clusters WAY farther than they
wanted, which moved them into places other clusters wanted to go. This
was also not exhaustive, so it was often falling back on the original
approach which was putting clusters in practically random locations. All
of this was causing the post-FL placement from the AP flow to actually
have worse quality than the default AP flow!

To resolve this, I wrote the actual method I was intending. It performs
a BFS-style search from the src location to all legal locations and
returns the closest one. By doing this BFS on the compressed grid, I
found that this is actually quite efficient. With these changes, I found
that the quality of the post-FL placement more than doubled and the
average atom displacement from the GP solution decreased dramatically.
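The sketch below shows the BFS-style nearest-legal-location search described above, in C++. It assumes the compressed grid can be modeled as a 2D array of "legal and free" flags searched with 4-connected neighbors. `Grid`, `Loc` and `find_nearest_legal_loc` are hypothetical names. The real method works on VPR's compressed grid, and its exact distance notion may differ.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical compressed-grid view: legal_and_free[x][y] is true when the
// cluster could be placed at compressed location (x, y).
using Grid = std::vector<std::vector<bool>>;
using Loc = std::pair<int, int>;

// BFS outward from src over 4-connected neighbors. BFS visits locations in
// order of increasing step count, so the first legal location popped is a
// nearest one (in Manhattan distance on the grid).
std::optional<Loc> find_nearest_legal_loc(const Grid& legal_and_free, Loc src) {
    const int W = static_cast<int>(legal_and_free.size());
    if (W == 0) return std::nullopt;
    const int H = static_cast<int>(legal_and_free[0].size());
    if (src.first < 0 || src.first >= W || src.second < 0 || src.second >= H)
        return std::nullopt;
    std::vector<std::vector<bool>> visited(W, std::vector<bool>(H, false));
    std::queue<Loc> frontier;
    frontier.push(src);
    visited[src.first][src.second] = true;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        const Loc cur = frontier.front();
        frontier.pop();
        if (legal_and_free[cur.first][cur.second]) return cur;
        for (int d = 0; d < 4; ++d) {
            const int nx = cur.first + dx[d];
            const int ny = cur.second + dy[d];
            if (nx < 0 || nx >= W || ny < 0 || ny >= H || visited[nx][ny]) continue;
            visited[nx][ny] = true;
            frontier.push({nx, ny});
        }
    }
    return std::nullopt;  // nothing legal: caller falls back to another placer
}
```

Unlike a random pick inside a bounding box, this search is exhaustive. It only reports failure when no legal location exists anywhere on the grid.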
[AP][InitialPlacement] Improved Initial Placement