[AP][InitialPlacement] Improved Initial Placement #2975
Great results! A few minor comments on the code for you to consider.
Looks good. I agree; the wirelength bump is big at the start of annealing. You could try quenching (for a long time) instead, or (as you suggest) a lower initial T.
Found that the Initial Placer stage of the AP flow (after APPack, but before Detailed Placement) was not working as expected. The intention was that clusters would be placed at their centroid location according to the flat placement, and if that site was illegal or taken it would take a nearby point instead (falling back on the original initial placer if nothing can be found).
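As a rough illustration of the "centroid location according to the flat placement" idea, here is a minimal sketch. The `Point` type and the `cluster_centroid` function are hypothetical names for this example only, not VTR's actual data structures or API; it assumes each atom in the cluster has an (x, y) position from the flat placement.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type for illustration; not VTR's actual data structure.
struct Point {
    double x;
    double y;
};

// Returns the centroid (mean x, mean y) of the flat-placement positions
// of the atoms packed into one cluster.
Point cluster_centroid(const std::vector<Point>& atom_positions) {
    assert(!atom_positions.empty());
    double sum_x = 0.0;
    double sum_y = 0.0;
    for (const Point& p : atom_positions) {
        sum_x += p.x;
        sum_y += p.y;
    }
    const double n = static_cast<double>(atom_positions.size());
    return {sum_x / n, sum_y / n};
}
```

The cluster would then be placed at the site nearest this point, with the original initial placer as the fallback when no site can be found.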
To achieve this, I was using a method called find_centroid_neighbor which I thought would return the nearest legal location to the given location. This was not correct. This method just creates a bounding box around the given point and tries to find a random point in that box. This was causing our AP flow to move clusters WAY farther than they wanted, which moved them into places other clusters wanted to go. The search was also not exhaustive, so it was often falling back on the original approach, which was putting clusters in practically random locations. All of this was causing the post-FL placement from the AP flow to actually have worse quality than the default (non-AP) flow!
To resolve this, I wrote the actual method I was intending. It performs a BFS-style search from the src location to all legal locations and returns the closest one. By doing this BFS on the compressed grid, I found that this is actually quite efficient. With these changes, I found that the quality of the post-FL placement more than doubled and the average atom displacement from the GP solution decreased dramatically.
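The BFS-style search described above can be sketched roughly as follows. This is not the code from this PR: `GridLoc`, `find_nearest_free_loc`, and the `is_free` grid are hypothetical names, and the sketch assumes a plain 2D grid of booleans (true meaning the site is legal and unoccupied) with 4-connected neighbours. "Closest" here means fewest grid hops from the source.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical grid coordinate for illustration; not VTR's actual type.
struct GridLoc {
    int x;
    int y;
};

// Breadth-first search outward from src over a 4-connected grid.
// is_free[y][x] is true when a site is legal and unoccupied (assumed encoding).
// Returns the first free location reached, i.e. one with the fewest hops
// from src, or std::nullopt if the whole grid was searched without success.
std::optional<GridLoc> find_nearest_free_loc(
        const std::vector<std::vector<bool>>& is_free, GridLoc src) {
    const int height = static_cast<int>(is_free.size());
    if (height == 0) return std::nullopt;
    const int width = static_cast<int>(is_free[0].size());
    assert(src.x >= 0 && src.x < width && src.y >= 0 && src.y < height);

    std::vector<std::vector<bool>> visited(height, std::vector<bool>(width, false));
    std::queue<GridLoc> frontier;
    frontier.push(src);
    visited[src.y][src.x] = true;

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    while (!frontier.empty()) {
        const GridLoc loc = frontier.front();
        frontier.pop();
        if (is_free[loc.y][loc.x]) return loc;

        for (int i = 0; i < 4; ++i) {
            const int nx = loc.x + dx[i];
            const int ny = loc.y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            if (visited[ny][nx]) continue;
            visited[ny][nx] = true;  // mark on enqueue so each site is queued once
            frontier.push({nx, ny});
        }
    }
    return std::nullopt;  // exhaustive: no free site exists anywhere
}
```

Unlike picking a random point inside a bounding box, this search is exhaustive, so the caller only needs the fallback placer when there is truly no free site. Each location is visited at most once, so the cost is bounded by the number of grid locations, which is presumably part of why searching the smaller compressed grid is cheap.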
Results on the 9 largest VTR benchmarks (with fixed IO):
As we can see, the quality of the post-FL placement was actually 15% worse on the VTR benchmarks than the non-AP flow. This did not make sense since we are giving it more detailed information on where blocks should be placed relative to the objectives. After this change, things make a lot more sense. The quality of the post-FL placement is now 2x better than the non-AP flow. This translated to another 1% improvement in WL. The run time improved slightly; however, I think this may just be machine load. When observing Titan (which I am running now), I find that the FL stage increases in run time (due to IP taking a bit longer) and the DP run time decreases very slightly.
Some more information comparing the old AP implementation to the new:
We can see that the average atom displacement is 3x better now (10.5 tiles on average, which is much more reasonable). The max atom displacement also greatly decreased. For the VTR results the FL and DP run times appear to have decreased; however, I would put more trust in the Titan results once I get them!
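For reference, here is one way the average and max atom displacement could be computed. The distance metric is not stated above, so this sketch assumes Manhattan distance in tiles; `Pos`, `DisplacementStats`, and `atom_displacement` are hypothetical names for illustration, not VTR's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not VTR's actual data structures.
struct Pos {
    double x;
    double y;
};

struct DisplacementStats {
    double average;
    double max;
};

// Compares each atom's GP-solution position with its position after
// legalization. Assumes Manhattan distance in tiles (an assumption; the
// metric is not specified in the text) and that index i refers to the
// same atom in both vectors.
DisplacementStats atom_displacement(const std::vector<Pos>& gp_pos,
                                    const std::vector<Pos>& placed_pos) {
    assert(gp_pos.size() == placed_pos.size());
    assert(!gp_pos.empty());
    double total = 0.0;
    double max_disp = 0.0;
    for (std::size_t i = 0; i < gp_pos.size(); ++i) {
        const double d = std::abs(gp_pos[i].x - placed_pos[i].x)
                       + std::abs(gp_pos[i].y - placed_pos[i].y);
        total += d;
        max_disp = std::max(max_disp, d);
    }
    return {total / static_cast<double>(gp_pos.size()), max_disp};
}
```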
I think we can see even more gains if we tune the annealer for the AP flow. Even though the quality of the placement is 2x better, the runtime of the annealer did not change very much and the quality did not change as much as you would expect. This should be investigated.