[AP][InitialPlacement] Improved Initial Placement #2975

@AlexandreSinger AlexandreSinger commented Apr 13, 2025

Found that the Initial Placer stage of the AP flow (after APPack, but before Detailed Placement) was not working as expected. The intention was that clusters would be placed at their centroid location according to the flat placement, and if that site was illegal or taken, a nearby site would be used instead (falling back on the original initial placer if nothing could be found).

To achieve this, I was using a method called find_centroid_neighbor, which I thought would return the nearest legal location to the given location. This was not correct. This method just creates a bounding box around the given point and tries to find a random point in that box. This was causing our AP flow to move clusters WAY farther than they wanted, which moved them into places other clusters wanted to go. The search was also not exhaustive, so it was often falling back on the original approach, which was putting clusters in practically random locations. All of this was causing the post-FL placement from the AP flow to actually have worse quality than the default non-AP flow!

To resolve this, I wrote the actual method I was intending. It performs a BFS-style search from the src location to all legal locations and returns the closest one. By doing this BFS on the compressed grid, I found that this is actually quite efficient. With these changes, I found that the quality of the post-FL placement more than doubled and the average atom displacement from the GP solution decreased dramatically.
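The idea of a BFS-style nearest-legal-location search can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Loc` struct, the `is_free` grid, and the function name `find_nearest_legal_loc` are all hypothetical, the grid is a plain 2D array rather than VPR's compressed grid, and "closest" here means the fewest 4-neighbor grid steps (Manhattan distance), which lets the search stop at the first free site it reaches.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical grid coordinate (not a VPR type).
struct Loc {
    int x;
    int y;
};

// Breadth-first search outward from src over a 2D grid.
// is_free[y][x] == true means the site is legal and unoccupied (assumption).
// Returns the first free site reached, i.e. one with the minimum number of
// 4-neighbor steps from src, or std::nullopt if no free site exists.
std::optional<Loc> find_nearest_legal_loc(const std::vector<std::vector<bool>>& is_free,
                                          Loc src) {
    assert(!is_free.empty() && !is_free[0].empty());
    const int height = static_cast<int>(is_free.size());
    const int width = static_cast<int>(is_free[0].size());
    assert(src.x >= 0 && src.x < width && src.y >= 0 && src.y < height);

    std::vector<std::vector<bool>> visited(height, std::vector<bool>(width, false));
    std::queue<Loc> frontier;
    frontier.push(src);
    visited[src.y][src.x] = true;

    const int dx[] = {1, -1, 0, 0};
    const int dy[] = {0, 0, 1, -1};

    while (!frontier.empty()) {
        const Loc cur = frontier.front();
        frontier.pop();

        // BFS visits sites in order of increasing distance, so the first
        // free site popped is a closest one.
        if (is_free[cur.y][cur.x]) {
            return cur;
        }

        for (int i = 0; i < 4; ++i) {
            const int nx = cur.x + dx[i];
            const int ny = cur.y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            if (visited[ny][nx]) continue;
            visited[ny][nx] = true;
            frontier.push(Loc{nx, ny});
        }
    }
    return std::nullopt; // Exhaustive: every site was checked and none was free.
}
```

Unlike sampling random points inside a bounding box, this search is exhaustive, so a fallback placer is only needed when the grid truly has no free site of the right kind.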

Results on the 9 largest VTR benchmarks (with fixed IO):

| Flow | Post-FL WL | Post-Route WL | Total Runtime |
| --- | --- | --- | --- |
| No AP | 1 | 1 | 1 |
| AP Baseline | 1.151589662 | 0.9310077442 | 1.477490535 |
| AP Improved IP | 0.5489232291 | 0.9195574479 | 1.352550022 |

As we can see, the post-FL placement of the AP baseline was actually 15% worse in wirelength on the VTR benchmarks than the non-AP flow. This did not make sense, since we are giving it more detailed information on where blocks should be placed relative to the objectives. After this change, things make a lot more sense: the post-FL wirelength is now about 0.55x that of the non-AP flow, or nearly 2x better. This translated to another 1% improvement in post-route WL. The run time improved slightly; however, I think this may just be machine load. When observing Titan (which I am running now), I find that the FL stage increases in run time (due to IP taking a bit longer) and the DP decreases very slightly.

Some more information comparing the old AP implementation to the new:

| Metric | AP Baseline | AP Improved IP | Change Over Baseline |
| --- | --- | --- | --- |
| Post-FL WL | 739521.555 | 352504.519 | 0.477 |
| Post-Route WL | 197028.628 | 194605.409 | 0.988 |
| Post-FL Avg. Atom Displ. | 29.324 | 10.492 | 0.358 |
| Post-FL Max Atom Displ. | 107.180 | 69.581 | 0.649 |
| FL Run Time | 42.818 | 40.380 | 0.943 |
| DP Run Time | 18.556 | 13.450 | 0.725 |
| Total Run Time | 106.627 | 97.611 | 0.915 |

We can see that the average atom displacement is almost 3x better now (10.5 tiles on average, which is much more reasonable). The max atom displacement also greatly decreased. For the VTR results the FL and DP run times appear to have decreased; however, I would put more trust in the Titan results once I get them!
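For reference, the "Change Over Baseline" column reads as improved divided by baseline, so values below 1 are improvements for these lower-is-better metrics. A small sketch of that arithmetic, with hypothetical helper names that are not part of VPR:

```cpp
#include <cassert>

// Ratio of the improved value to the baseline value.
// For lower-is-better metrics (wirelength, displacement, run time),
// a result below 1.0 means the change helped.
double change_over_baseline(double baseline, double improved) {
    assert(baseline > 0.0);
    return improved / baseline;
}

// The same comparison expressed as an "N times better" factor.
double improvement_factor(double baseline, double improved) {
    assert(improved > 0.0);
    return baseline / improved;
}
```

For the average atom displacement above, `change_over_baseline(29.324, 10.492)` is about 0.358, and `improvement_factor(29.324, 10.492)` is about 2.8.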

I think we can see even more gains if we tune the annealer for the AP flow. Even though the quality of the placement is 2x better, the runtime of the annealer did not change very much and the quality did not change as much as you would expect. This should be investigated.

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Apr 13, 2025
@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch 2 times, most recently from 76101c1 to 56a8b86 Compare April 13, 2025 20:29
@vaughnbetz vaughnbetz left a comment

Great results! A few minor comments on the code for you to consider.

@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch from 56a8b86 to fc34da3 Compare April 13, 2025 22:03
@AlexandreSinger

Titan Results (No fixed IO, channel width = 300):

| Flow | Post-FL WL | Post-Route WL | Total Runtime |
| --- | --- | --- | --- |
| No AP | 1.000 | 1.000 | 1.000 |
| AP Baseline | 1.176 | 0.923 | 1.495 |
| AP Improved IP | 0.356 | 0.905 | 1.485 |

More information on the AP improvements:

| Metric | AP Baseline | AP Improved IP | Change Over Baseline |
| --- | --- | --- | --- |
| Post-FL WL | 1.89E+07 | 5.73E+06 | 0.303 |
| Post-Route WL | 2.64E+06 | 2.59E+06 | 0.980 |
| Post-FL Avg. Atom Displ. | 63.735 | 14.084 | 0.221 |
| Post-FL Max Atom Displ. | 345.646 | 199.538 | 0.577 |
| FL Run Time | 659.955 | 667.575 | 1.012 |
| DP Run Time | 631.548 | 598.406 | 0.948 |
| Total Run Time | 2534.326 | 2517.297 | 0.993 |

On Titan, these changes made the post-FL solution about 3x better in wirelength, with the AP flow now outperforming the baseline non-AP flow by around 3x in quality before detailed placement. This further reduced the post-route WL by around 2%, making the AP flow almost 10% better in WL than the non-AP flow.

Similar to the VTR results, the average atom displacement has been reduced by over 4x and the max atom displacement was reduced by about 40%. The FL runtime increased slightly due to the increased logic in the initial placer, but looking through the results, directrf appears to have taken the longest for initial placement at ~160 seconds. I am not concerned about this since other large circuits like bitcoin_miner took only 4 seconds. I think that directrf may just have a very illegal global placement solution.

The detailed placement run time was reduced by 5%, which I find a bit surprising, since I would expect the DP runtime to drop a lot given the improved initial placement quality. For many of the circuits, I am noticing that the annealer tends to make the placement solution worse at first, not returning to the original quality until around 10 iterations in (fewer for some circuits). For example, bitcoin_miner looks like this:
image
I think the starting temperature may be too high which is costing run time and perhaps even quality.

Overall this change is a win across the board, which is good news!

@vaughnbetz

Looks good. I agree; the wirelength bump is big at the start of annealing. You could try quenching (for a long time) instead, or (as you suggest) a lower initial T.
Looks like these are non-timing driven, so the cost increase can't be due to the annealer trying to fix up timing.
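To make the two suggestions concrete, here is a sketch of the textbook Metropolis acceptance rule used in simulated annealing. It is a generic illustration under that assumption, not VPR's annealer code, and the function name `acceptance_probability` is hypothetical. A high starting temperature accepts most cost-increasing moves, which is consistent with the early wirelength bump, while quenching (temperature 0) accepts only moves that do not increase cost.

```cpp
#include <cassert>
#include <cmath>

// Probability of accepting a proposed move under the Metropolis rule.
// delta_cost < 0 means the move improves the placement cost.
// temperature == 0 models a quench: only non-worsening moves are accepted.
double acceptance_probability(double delta_cost, double temperature) {
    assert(temperature >= 0.0);
    if (delta_cost <= 0.0) {
        return 1.0; // Improving (or neutral) moves are always accepted.
    }
    if (temperature == 0.0) {
        return 0.0; // Quench: never accept a worsening move.
    }
    return std::exp(-delta_cost / temperature);
}
```

With a cost increase of 5, a temperature of 100 accepts the move about 95% of the time, while a temperature of 1 accepts it less than 1% of the time, so a good starting placement is much better preserved at low temperature.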

@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch from fc34da3 to 136f3b4 Compare April 14, 2025 19:04
@AlexandreSinger AlexandreSinger force-pushed the feature-ap-initial-placer branch from 136f3b4 to 2da44e5 Compare April 14, 2025 19:10
@AlexandreSinger AlexandreSinger merged commit 5090124 into verilog-to-routing:master Apr 14, 2025
36 checks passed
@AlexandreSinger AlexandreSinger deleted the feature-ap-initial-placer branch April 14, 2025 20:58