[AP][InitialPlacement] Improved Initial Placement #2975

AlexandreSinger · 2025-04-13T13:54:44Z

Found that the Initial Placer stage of the AP flow (after APPack, but before Detailed Placement) was not working as expected. The intention was that clusters would be placed at their centroid location according to the flat placement, and if that site was illegal or taken it would take a nearby point instead (falling back on the original initial placer if nothing can be found).

To achieve this, I was using a method called find_centroid_neighbor, which I thought would return the nearest legal location to the given location. This was not correct. This method just creates a bounding box around the given point and tries to find a random point in that box. This was causing our AP flow to move clusters WAY farther than they wanted, which moved them into places other clusters wanted to go. The search was also not exhaustive, so it was often falling back on the original approach, which was putting clusters in practically random locations. All of this was causing the post-FL placement from the AP flow to actually have worse quality than the default non-AP flow!

To resolve this, I wrote the actual method I was intending. It performs a BFS-style search from the src location to all legal locations and returns the closest one. By doing this BFS on the compressed grid, I found that this is actually quite efficient. With these changes, I found that the quality of the post-FL placement more than doubled and the average atom displacement from the GP solution decreased dramatically.
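To illustrate the idea, here is a minimal sketch of a BFS nearest-legal-location search on a plain 2D grid. It is not the code in this PR: `GridLoc`, `find_nearest_legal_loc`, and the `is_legal` mask are hypothetical names, the grid is a simple boolean matrix rather than VPR's compressed grid, and "closest" here means fewest 4-neighbour steps (Manhattan distance in grid units).

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical 2D grid location (illustration only, not a VPR type).
struct GridLoc {
    int x;
    int y;
};

// Breadth-first search outward from `src` over 4-connected neighbours.
// is_legal[y][x] is true when a cluster may be placed at (x, y).
// On success, writes a legal location at minimum step distance from `src`
// into `nearest` and returns true; returns false if no legal location exists.
bool find_nearest_legal_loc(const std::vector<std::vector<bool>>& is_legal,
                            GridLoc src,
                            GridLoc& nearest) {
    const int height = static_cast<int>(is_legal.size());
    const int width = height > 0 ? static_cast<int>(is_legal[0].size()) : 0;
    for (const auto& row : is_legal) {
        // The sketch assumes a rectangular grid.
        assert(static_cast<int>(row.size()) == width);
    }
    if (src.x < 0 || src.x >= width || src.y < 0 || src.y >= height) {
        return false;
    }

    std::vector<std::vector<bool>> visited(height, std::vector<bool>(width, false));
    std::queue<GridLoc> frontier;
    frontier.push(src);
    visited[src.y][src.x] = true;

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        const GridLoc loc = frontier.front();
        frontier.pop();
        // Locations leave the queue in order of increasing distance, so the
        // first legal one popped is a closest one.
        if (is_legal[loc.y][loc.x]) {
            nearest = loc;
            return true;
        }
        for (int i = 0; i < 4; ++i) {
            const int nx = loc.x + dx[i];
            const int ny = loc.y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            if (visited[ny][nx]) continue;
            visited[ny][nx] = true;
            frontier.push(GridLoc{nx, ny});
        }
    }
    return false; // Exhausted the grid: the caller falls back to another placer.
}
```

Unlike picking a random point in a bounding box, this search is exhaustive: it only reports failure once every reachable location has been checked, and it never returns a location farther away than necessary.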

Results on the 9 largest VTR benchmarks (with fixed IO):

| | Post-FL WL | Post-Route WL | Total Runtime |
|---|---|---|---|
| No AP | 1 | 1 | 1 |
| AP Baseline | 1.151589662 | 0.9310077442 | 1.477490535 |
| AP Improved IP | 0.5489232291 | 0.9195574479 | 1.352550022 |

As we can see, the quality of the post-FL placement was actually 15% worse on the VTR benchmarks than the non-AP flow. This did not make sense, since we are giving it more detailed information on where blocks should be placed relative to the objectives. After this change, things make a lot more sense: the quality of the post-FL placement is now about 2x better than the non-AP flow. This translated to another 1% improvement in post-route WL. The total run time improved slightly; however, I think this may just be machine load. When observing Titan (which I am running now), I find that the FL stage increases in run time (due to IP taking a bit longer) and the DP decreases very slightly.

Some more information comparing the old AP implementation to the new:

| | AP Baseline | AP Improved IP | Change Over Baseline |
|---|---|---|---|
| Post-FL WL | 739521.555 | 352504.519 | 0.477 |
| Post-Route WL | 197028.628 | 194605.409 | 0.988 |
| Post-FL Avg. Atom Displ. | 29.324 | 10.492 | 0.358 |
| Post-FL Max Atom Displ. | 107.180 | 69.581 | 0.649 |
| FL Run Time | 42.818 | 40.380 | 0.943 |
| DP Run Time | 18.556 | 13.450 | 0.725 |
| Total Run Time | 106.627 | 97.611 | 0.915 |

We can see that the average atom displacement is 3x better now (being 10.5 tiles on average which is much more reasonable). The max atom displacement also greatly decreased. For the VTR results the FL and DP run times appear to have decreased; however, I would trust the Titan results once I get them!
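For readers unfamiliar with the metric, a rough sketch of how average and maximum atom displacement could be computed is shown below. This is an assumption-laden illustration, not VPR's reporting code: `Point`, `DisplacementStats`, and `compute_displacement` are hypothetical names, and Manhattan distance in tile units is assumed as the distance measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Point {
    double x;
    double y;
};

struct DisplacementStats {
    double avg;
    double max;
};

// Compares each atom's global-placement (GP) position with its position after
// full legalization (FL). gp[i] and legal[i] must describe the same atom.
// Distance is assumed to be Manhattan distance in tiles.
DisplacementStats compute_displacement(const std::vector<Point>& gp,
                                       const std::vector<Point>& legal) {
    assert(gp.size() == legal.size());
    DisplacementStats stats{0.0, 0.0};
    if (gp.empty()) return stats;

    double total = 0.0;
    for (std::size_t i = 0; i < gp.size(); ++i) {
        const double d = std::abs(gp[i].x - legal[i].x)
                       + std::abs(gp[i].y - legal[i].y);
        total += d;
        stats.max = std::max(stats.max, d);
    }
    stats.avg = total / static_cast<double>(gp.size());
    return stats;
}
```

A lower average means the legalized placement stays closer to what the global placer asked for, which is why this metric tracks the post-FL wirelength so closely in the tables above.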

I think we can see even more gains if we tune the annealer for the AP flow. Even though the quality of the placement is 2x better, the runtime of the annealer did not change very much and the quality did not change as much as you would expect. This should be investigated.

vaughnbetz

Great results! A few minor comments on the code for you to consider.

vpr/src/place/initial_placement.cpp

AlexandreSinger · 2025-04-14T04:10:02Z

Titan Results (No fixed IO, channel width = 300):

| | Post-FL WL | Post-Route WL | Total Runtime |
|---|---|---|---|
| No AP | 1.000 | 1.000 | 1.000 |
| AP Baseline | 1.176 | 0.923 | 1.495 |
| AP Improved IP | 0.356 | 0.905 | 1.485 |

More information on the AP improvements:

| | AP Baseline | AP Improved IP | Change Over Baseline |
|---|---|---|---|
| Post-FL WL | 1.89E+07 | 5.73E+06 | 0.303 |
| Post-Route WL | 2.64E+06 | 2.59E+06 | 0.980 |
| Post-FL Avg. Atom Displ. | 63.735 | 14.084 | 0.221 |
| Post-FL Max Atom Displ. | 345.646 | 199.538 | 0.577 |
| FL Run Time | 659.955 | 667.575 | 1.012 |
| DP Run Time | 631.548 | 598.406 | 0.948 |
| Total Run Time | 2534.326 | 2517.297 | 0.993 |

On Titan, these changes made the quality of the post-legalized solution about 3x better, and the AP flow now outperforms the baseline non-AP flow by around 3x in quality before detailed placement. This further reduced the post-route WL by around 2%, making the AP flow almost 10% better in WL than non-AP.

Similar to the VTR results, the average atom displacement has been reduced by over 4x and the max atom displacement was reduced by 40%. The FL runtime increased slightly due to the increased logic in the initial placer, but looking through the results, directrf appears to have taken the longest for initial placement at ~160 seconds. I am not concerned about this since other large circuits like bitcoin_miner took only 4 seconds. I think that directrf may just have a very illegal global placement solution.

The detailed placement run time was reduced by 5%, which I find a bit surprising, since I would expect the DP runtime to drop a lot given the improved initial placement quality. For many of the circuits, I am noticing that the annealer tends to make the placement solution worse at first, and it does not return to the original quality until after around 10 iterations (fewer for some circuits). For example, bitcoin_miner looks like this:

I think the starting temperature may be too high which is costing run time and perhaps even quality.

Overall this change is a win across the board, which is good news!

vaughnbetz · 2025-04-14T04:14:39Z

Looks good. I agree; the wirelength bump is big at the start of annealing. You could try quenching (for a long time) instead, or (as you suggest) a lower initial T.
Looks like these are non-timing driven, so the cost increase can't be due to the annealer trying to fix up timing.
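As a sketch of why the starting temperature matters here, consider the textbook Metropolis acceptance rule for simulated annealing. This is assumed for illustration and is not copied from VPR's annealer; `accept_probability` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Probability of accepting a move that changes the cost by `delta_cost`
// at the given annealing temperature (textbook Metropolis criterion).
// Improving moves are always accepted. Worsening moves are accepted with
// probability exp(-delta_cost / T), so a high T accepts almost everything,
// while T = 0 (a quench) accepts only moves that do not increase the cost.
double accept_probability(double delta_cost, double temperature) {
    assert(temperature >= 0.0);
    if (delta_cost <= 0.0) return 1.0;
    if (temperature == 0.0) return 0.0;
    return std::exp(-delta_cost / temperature);
}
```

With a good starting placement, a high initial T accepts most cost-increasing moves and scrambles the solution before it can recover, which is consistent with the wirelength bump described above. A quench never accepts a worsening move, so under this rule it cannot degrade the starting placement.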


AlexandreSinger requested review from vaughnbetz and amin1377 April 13, 2025 13:54

github-actions bot added VPR lang-cpp labels Apr 13, 2025

AlexandreSinger force-pushed the feature-ap-initial-placer branch 2 times, most recently from 76101c1 to 56a8b86, April 13, 2025 20:29

vaughnbetz reviewed Apr 13, 2025


AlexandreSinger force-pushed the feature-ap-initial-placer branch from 56a8b86 to fc34da3, April 13, 2025 22:03

AlexandreSinger force-pushed the feature-ap-initial-placer branch from fc34da3 to 136f3b4, April 14, 2025 19:04

AlexandreSinger force-pushed the feature-ap-initial-placer branch from 136f3b4 to 2da44e5, April 14, 2025 19:10

AlexandreSinger merged commit 5090124 into verilog-to-routing:master Apr 14, 2025
36 checks passed

AlexandreSinger deleted the feature-ap-initial-placer branch April 14, 2025 20:58
