diff --git a/doc/src/api/vprinternals/router_connection_router.rst b/doc/src/api/vprinternals/router_connection_router.rst new file mode 100644 index 00000000000..32a7c7dc673 --- /dev/null +++ b/doc/src/api/vprinternals/router_connection_router.rst @@ -0,0 +1,18 @@ +========== +Connection Router +========== + +ConnectionRouter +--------- +.. doxygenfile:: connection_router.h + :project: vpr + +SerialConnectionRouter +---------- +.. doxygenclass:: SerialConnectionRouter + :project: vpr + +ParallelConnectionRouter +---------- +.. doxygenclass:: ParallelConnectionRouter + :project: vpr diff --git a/doc/src/api/vprinternals/vpr_router.rst b/doc/src/api/vprinternals/vpr_router.rst index 63624cd8b39..5e72894aba7 100644 --- a/doc/src/api/vprinternals/vpr_router.rst +++ b/doc/src/api/vprinternals/vpr_router.rst @@ -9,3 +9,4 @@ VPR Router router_heap router_lookahead + router_connection_router diff --git a/doc/src/vpr/command_line_usage.rst b/doc/src/vpr/command_line_usage.rst index f21ee85f1eb..35be3118b40 100644 --- a/doc/src/vpr/command_line_usage.rst +++ b/doc/src/vpr/command_line_usage.rst @@ -47,12 +47,12 @@ By default VPR will perform a binary search routing to find the minimum channel Detailed Command-line Options ----------------------------- -VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information. +VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information. .. option:: -h, --help Display help message then exit. - + The options most people will be interested in are: * :option:`--route_chan_width` (route at a fixed channel width), and @@ -208,7 +208,7 @@ General Options * Any string matching ``name`` attribute of a device layout defined with a ```` tag in the :ref:`arch_grid_layout` section of the architecture file. If the value specified is neither ``auto`` nor matches the ``name`` attribute value of a ```` tag, VPR issues an error. - + .. note:: If the only layout in the architecture file is a single device specified using ````, it is recommended to always specify the ``--device`` option; this prevents the value ``--device auto`` from interfering with operations supported only for ```` grids. **Default:** ``auto`` @@ -900,7 +900,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe .. option:: --place_agent_algorithm {e_greedy | softmax} - Controls which placement RL agent is used. + Controls which placement RL agent is used. **Default:** ``softmax`` @@ -922,10 +922,10 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe .. option:: --place_reward_fun {basic | nonPenalizing_basic | runtime_aware | WLbiased_runtime_aware} - The reward function used by the placement RL agent to learn the best action at each anneal stage. + The reward function used by the placement RL agent to learn the best action at each anneal stage. + + .. note:: The latter two are only available for timing-driven placement. - .. note:: The latter two are only available for timing-driven placement. - **Default:** ``WLbiased_runtime_aware`` .. option:: --place_agent_space {move_type | move_block_type} @@ -935,20 +935,20 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe **Default:** ``move_block_type`` .. option:: --place_quench_only {on | off} - + If this option is set to ``on``, the placement will skip the annealing phase and only perform the placement quench. 
- This option is useful when the the quality of initial placement is good enough and there is no need to perform the + This option is useful when the the quality of initial placement is good enough and there is no need to perform the annealing phase. **Default:** ``off`` .. option:: --placer_debug_block - + .. note:: This option is likely only of interest to developers debugging the placement algorithm - Controls which block the placer produces detailed debug information for. - + Controls which block the placer produces detailed debug information for. + If the block being moved has the same ID as the number assigned to this parameter, the placer will print debugging information about it. * For values >= 0, the value is the block ID for which detailed placer debug information should be produced. @@ -960,7 +960,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe **Default:** ``-2`` .. option:: --placer_debug_net - + .. note:: This option is likely only of interest to developers debugging the placement algorithm Controls which net the placer produces detailed debug information for. @@ -1004,7 +1004,7 @@ The following options are only valid when the placement engine is in timing-driv .. option:: --quench_recompute_divider - Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench. + Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench. If unspecified, uses the value from --inner_loop_recompute_divider. **Default:** ``0`` @@ -1088,7 +1088,7 @@ The following options are only valid when the placement engine is in timing-driv NoC Options ^^^^^^^^^^^^^^ -The following options are only used when FPGA device and netlist contain a NoC router. +The following options are only used when FPGA device and netlist contain a NoC router. .. option:: --noc {on | off} @@ -1098,7 +1098,7 @@ The following options are only used when FPGA device and netlist contain a NoC r **Default:** ``off`` .. option:: --noc_flows_file - + XML file containing the list of traffic flows within the NoC (communication between routers). .. note:: noc_flows_file are required to specify if NoC optimization is turned on (--noc on). @@ -1106,7 +1106,7 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_routing_algorithm {xy_routing | bfs_routing | west_first_routing | north_last_routing | negative_first_routing | odd_even_routing} Controls the algorithm used by the NoC to route packets. - + * ``xy_routing`` Uses the direction oriented routing algorithm. This is recommended to be used with mesh NoC topologies. * ``bfs_routing`` Uses the breadth first search algorithm. The objective is to find a route that uses a minimum number of links. This algorithm is not guaranteed to generate deadlock-free traffic flow routes, but can be used with any NoC topology. * ``west_first_routing`` Uses the west-first routing algorithm. This is recommended to be used with mesh NoC topologies. @@ -1119,11 +1119,11 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_placement_weighting Controls the importance of the NoC placement parameters relative to timing and wirelength of the design. - + * ``noc_placement_weighting = 0`` means the placement is based solely on timing and wirelength. * ``noc_placement_weighting = 1`` means noc placement is considered equal to timing and wirelength. 
* ``noc_placement_weighting > 1`` means the placement is increasingly dominated by NoC parameters. - + **Default:** ``5.0`` .. option:: --noc_aggregate_bandwidth_weighting @@ -1141,7 +1141,7 @@ The following options are only used when FPGA device and netlist contain a NoC r Other positive numbers specify the importance of meeting latency constraints compared to other NoC-related cost terms. Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and only their relative ratios determine the importance of each cost term. - + **Default:** ``0.6`` .. option:: --noc_latency_weighting @@ -1151,7 +1151,7 @@ The following options are only used when FPGA device and netlist contain a NoC r Other positive numbers specify the importance of minimizing aggregate latency compared to other NoC-related cost terms. Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and only their relative ratios determine the importance of each cost term. - + **Default:** ``0.02`` .. option:: --noc_congestion_weighting @@ -1167,11 +1167,11 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_swap_percentage Sets the minimum fraction of swaps attempted by the placer that are NoC blocks. - This value is an integer ranging from [0-100]. - - * ``0`` means NoC blocks will be moved at the same rate as other blocks. + This value is an integer ranging from [0-100]. + + * ``0`` means NoC blocks will be moved at the same rate as other blocks. * ``100`` means all swaps attempted by the placer are NoC router blocks. - + **Default:** ``0`` .. option:: --noc_placement_file_name @@ -1257,7 +1257,7 @@ Analytical Placement is generally split into three stages: * ``none`` Do not use any Detailed Placer. - * ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer. + * ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer. **Default:** ``annealer`` @@ -1343,8 +1343,8 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout .. option:: --max_pres_fac - Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow. - Smaller values may help prevent circuitous routing in difficult routing problems, but may increase + Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow. + Smaller values may help prevent circuitous routing in difficult routing problems, but may increase the number of routing iterations needed and hence runtime. **Default:** ``1000.0`` @@ -1423,7 +1423,7 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout .. option:: --router_algorithm {timing_driven | parallel | parallel_decomp} - Selects which router algorithm to use. + Selects which router algorithm to use. * ``timing_driven`` is the default single-threaded PathFinder algorithm. @@ -1505,13 +1505,90 @@ The following options are only valid when the router is in timing-driven mode (t **Default:** ``0.0`` .. 
option:: --router_profiler_astar_fac - + Controls the directedness of the timing-driven router's exploration when doing router delay profiling of an architecture. The router delay profiling step is currently used to calculate the place delay matrix lookup. Values between 1 and 2 are resonable; higher values trade some quality for reduced run-time. **Default:** ``1.2`` +.. option:: --enable_parallel_connection_router {on | off} + + Controls whether the MultiQueue-based parallel connection router is used during a single connection routing. + + When enabled, the parallel connection router accelerates the path search for individual source-sink connections using + multi-threading without altering the net routing order. + + **Default:** ``off`` + +.. option:: --post_target_prune_fac + + Controls the post-target pruning heuristic calculation in the parallel connection router. + + This parameter is used as a multiplicative factor applied to the VPR heuristic (not guaranteed to be admissible, i.e., + might over-predict the cost to the sink) to calculate the 'stopping heuristic' when pruning nodes after the target has + been reached. The 'stopping heuristic' must be admissible for the path search algorithm to guarantee optimal paths and + be deterministic. + + Values of this parameter are architecture-specific and have to be empirically found. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``1.2`` + +.. option:: --post_target_prune_offset + + Controls the post-target pruning heuristic calculation in the parallel connection router. + + This parameter is used as a subtractive offset together with :option:`--post_target_prune_fac` to apply an affine + transformation on the VPR heuristic to calculate the 'stopping heuristic'. The 'stopping heuristic' must be admissible + for the path search algorithm to guarantee optimal paths and be deterministic. + + Values of this parameter are architecture-specific and have to be empirically found. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``0.0`` + +.. option:: --multi_queue_num_threads + + Controls the number of threads used by MultiQueue-based parallel connection router. + + If not explicitly specified, defaults to 1, implying the parallel connection router works in 'serial' mode using only + one main thread to route. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``1`` + +.. option:: --multi_queue_num_queues + + Controls the number of queues used by MultiQueue in the parallel connection router. + + Must be set >= 2. A common configuration for this parameter is the number of threads used by MultiQueue * 4 (the number + of queues per thread). + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``2`` + +.. option:: --multi_queue_direct_draining {on | off} + + Controls whether to enable queue draining optimization for MultiQueue-based parallel connection router. + + When enabled, queues can be emptied quickly by draining all elements if no further solutions need to be explored after + the target is reached in the path search. 
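+
+   To make the interplay of these options concrete, the sketch below shows one way the post-target pruning check
+   described under :option:`--post_target_prune_fac` and :option:`--post_target_prune_offset` could be expressed.
+   It is purely illustrative (hypothetical helper names, not the actual code in ``parallel_connection_router.h``):
+
+   .. code-block:: cpp
+
+      // 'Stopping heuristic': an affine transformation of the (possibly inadmissible) VPR lookahead estimate.
+      float stopping_heuristic(float lookahead_cost, float prune_fac, float prune_offset) {
+          return prune_fac * lookahead_cost - prune_offset;
+      }
+
+      // Once the target has been reached with cost 'best_sink_cost', a popped node can be pruned when even
+      // this optimistic completion estimate cannot improve on that cost; direct draining applies the same
+      // idea wholesale to the remaining queue contents.
+      bool prune_after_target(float backward_cost, float lookahead_cost, float best_sink_cost,
+                              float prune_fac, float prune_offset) {
+          return backward_cost + stopping_heuristic(lookahead_cost, prune_fac, prune_offset) >= best_sink_cost;
+      }
+
+   For example, a four-thread run following the queues-per-thread guideline above might use
+   ``--enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16``.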
+ + Note: For this optimization to maintain optimality and deterministic results, the 'ordering heuristic' (calculated by + :option:`--astar_fac` and :option:`--astar_offset`) must be admissible to ensure emptying queues of entries with higher + costs does not prune possibly superior solutions. However, you can still enable this optimization regardless of whether + optimality and determinism are required for your specific use case (in such cases, the 'ordering heuristic' can be + inadmissible). + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``off`` + .. option:: --max_criticality Sets the maximum fraction of routing cost that can come from delay (vs. coming from routability) for any net. diff --git a/utils/route_diag/src/main.cpp b/utils/route_diag/src/main.cpp index 5074d79cc09..7d4edbe603a 100644 --- a/utils/route_diag/src/main.cpp +++ b/utils/route_diag/src/main.cpp @@ -97,7 +97,7 @@ static void do_one_route(const Netlist<>& net_list, segment_inf, is_flat); - ConnectionRouter router( + SerialConnectionRouter router( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), diff --git a/vpr/src/base/SetupVPR.cpp b/vpr/src/base/SetupVPR.cpp index e645b35e538..bf978995134 100644 --- a/vpr/src/base/SetupVPR.cpp +++ b/vpr/src/base/SetupVPR.cpp @@ -434,6 +434,12 @@ static void SetupRouterOpts(const t_options& Options, t_router_opts* RouterOpts) RouterOpts->astar_fac = Options.astar_fac; RouterOpts->astar_offset = Options.astar_offset; RouterOpts->router_profiler_astar_fac = Options.router_profiler_astar_fac; + RouterOpts->enable_parallel_connection_router = Options.enable_parallel_connection_router; + RouterOpts->post_target_prune_fac = Options.post_target_prune_fac; + RouterOpts->post_target_prune_offset = Options.post_target_prune_offset; + RouterOpts->multi_queue_num_threads = Options.multi_queue_num_threads; + RouterOpts->multi_queue_num_queues = Options.multi_queue_num_queues; + RouterOpts->multi_queue_direct_draining = Options.multi_queue_direct_draining; RouterOpts->bb_factor = Options.bb_factor; RouterOpts->criticality_exp = Options.criticality_exp; RouterOpts->max_criticality = Options.max_criticality; diff --git a/vpr/src/base/ShowSetup.cpp b/vpr/src/base/ShowSetup.cpp index f21200e97ee..0a326265ed8 100644 --- a/vpr/src/base/ShowSetup.cpp +++ b/vpr/src/base/ShowSetup.cpp @@ -379,6 +379,12 @@ static void ShowRouterOpts(const t_router_opts& RouterOpts) { VTR_LOG("RouterOpts.astar_fac: %f\n", RouterOpts.astar_fac); VTR_LOG("RouterOpts.astar_offset: %f\n", RouterOpts.astar_offset); VTR_LOG("RouterOpts.router_profiler_astar_fac: %f\n", RouterOpts.router_profiler_astar_fac); + VTR_LOG("RouterOpts.enable_parallel_connection_router: %s\n", RouterOpts.enable_parallel_connection_router ? "true" : "false"); + VTR_LOG("RouterOpts.post_target_prune_fac: %f\n", RouterOpts.post_target_prune_fac); + VTR_LOG("RouterOpts.post_target_prune_offset: %f\n", RouterOpts.post_target_prune_offset); + VTR_LOG("RouterOpts.multi_queue_num_threads: %d\n", RouterOpts.multi_queue_num_threads); + VTR_LOG("RouterOpts.multi_queue_num_queues: %d\n", RouterOpts.multi_queue_num_queues); + VTR_LOG("RouterOpts.multi_queue_direct_draining: %s\n", RouterOpts.multi_queue_direct_draining ? 
"true" : "false"); VTR_LOG("RouterOpts.criticality_exp: %f\n", RouterOpts.criticality_exp); VTR_LOG("RouterOpts.max_criticality: %f\n", RouterOpts.max_criticality); VTR_LOG("RouterOpts.init_wirelength_abort_threshold: %f\n", RouterOpts.init_wirelength_abort_threshold); diff --git a/vpr/src/base/read_options.cpp b/vpr/src/base/read_options.cpp index 4451dd720cd..7d10ff7e74b 100644 --- a/vpr/src/base/read_options.cpp +++ b/vpr/src/base/read_options.cpp @@ -2716,6 +2716,66 @@ argparse::ArgumentParser create_arg_parser(const std::string& prog_name, t_optio .default_value("1.2") .show_in(argparse::ShowIn::HELP_ONLY); + route_timing_grp.add_argument(args.enable_parallel_connection_router, "--enable_parallel_connection_router") + .help( + "Controls whether the MultiQueue-based parallel connection router is used during a single connection" + " routing. When enabled, the parallel connection router accelerates the path search for individual" + " source-sink connections using multi-threading without altering the net routing order.") + .default_value("off") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.post_target_prune_fac, "--post_target_prune_fac") + .help( + "Controls the post-target pruning heuristic calculation in the parallel connection router." + " This parameter is used as a multiplicative factor applied to the VPR heuristic" + " (not guaranteed to be admissible, i.e., might over-predict the cost to the sink)" + " to calculate the 'stopping heuristic' when pruning nodes after the target has been" + " reached. The 'stopping heuristic' must be admissible for the path search algorithm" + " to guarantee optimal paths and be deterministic. Values of this parameter are" + " architecture-specific and have to be empirically found." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("1.2") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.post_target_prune_offset, "--post_target_prune_offset") + .help( + "Controls the post-target pruning heuristic calculation in the parallel connection router." + " This parameter is used as a subtractive offset together with --post_target_prune_fac" + " to apply an affine transformation on the VPR heuristic to calculate the 'stopping" + " heuristic'. The 'stopping heuristic' must be admissible for the path search" + " algorithm to guarantee optimal paths and be deterministic. Values of this" + " parameter are architecture-specific and have to be empirically found." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("0.0") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_num_threads, "--multi_queue_num_threads") + .help( + "Controls the number of threads used by MultiQueue-based parallel connection router." + " If not explicitly specified, defaults to 1, implying the parallel connection router" + " works in 'serial' mode using only one main thread to route." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("1") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_num_queues, "--multi_queue_num_queues") + .help( + "Controls the number of queues used by MultiQueue in the parallel connection router." + " Must be set >= 2. A common configuration for this parameter is the number of threads" + " used by MultiQueue * 4 (the number of queues per thread)." 
+ " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("2") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_direct_draining, "--multi_queue_direct_draining") + .help( + "Controls whether to enable queue draining optimization for MultiQueue-based parallel connection" + " router. When enabled, queues can be emptied quickly by draining all elements if no further" + " solutions need to be explored in the path search to guarantee optimality or determinism after" + " reaching the target. This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("off") + .show_in(argparse::ShowIn::HELP_ONLY); + route_timing_grp.add_argument(args.max_criticality, "--max_criticality") .help( "Sets the maximum fraction of routing cost derived from delay (vs routability) for any net." diff --git a/vpr/src/base/read_options.h b/vpr/src/base/read_options.h index a71ba63428a..591478b7cfd 100644 --- a/vpr/src/base/read_options.h +++ b/vpr/src/base/read_options.h @@ -233,6 +233,12 @@ struct t_options { argparse::ArgValue astar_fac; argparse::ArgValue astar_offset; argparse::ArgValue router_profiler_astar_fac; + argparse::ArgValue enable_parallel_connection_router; + argparse::ArgValue post_target_prune_fac; + argparse::ArgValue post_target_prune_offset; + argparse::ArgValue multi_queue_num_threads; + argparse::ArgValue multi_queue_num_queues; + argparse::ArgValue multi_queue_direct_draining; argparse::ArgValue max_criticality; argparse::ArgValue criticality_exp; argparse::ArgValue router_init_wirelength_abort_threshold; diff --git a/vpr/src/base/vpr_types.h b/vpr/src/base/vpr_types.h index ddbcb59b08e..985e7a9f58f 100644 --- a/vpr/src/base/vpr_types.h +++ b/vpr/src/base/vpr_types.h @@ -1213,6 +1213,12 @@ struct t_router_opts { float astar_fac; float astar_offset; float router_profiler_astar_fac; + bool enable_parallel_connection_router; + float post_target_prune_fac; + float post_target_prune_offset; + int multi_queue_num_threads; + int multi_queue_num_queues; + bool multi_queue_direct_draining; float max_criticality; float criticality_exp; float init_wirelength_abort_threshold; diff --git a/vpr/src/route/DecompNetlistRouter.h b/vpr/src/route/DecompNetlistRouter.h index a41d656c240..e670bc5597d 100644 --- a/vpr/src/route/DecompNetlistRouter.h +++ b/vpr/src/route/DecompNetlistRouter.h @@ -85,11 +85,11 @@ class DecompNetlistRouter : public NetlistRouter { /** A single task to route nets inside a PartitionTree node and add tasks for its child nodes to task group \p g. */ void route_partition_tree_node(tbb::task_group& g, PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + SerialConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( + return SerialConnectionRouter( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), @@ -101,8 +101,8 @@ class DecompNetlistRouter : public NetlistRouter { } /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ - /** Per-thread storage for ConnectionRouters. */ - tbb::enumerable_thread_specific> _routers_th; + /** Per-thread storage for SerialConnectionRouter. 
*/ + tbb::enumerable_thread_specific> _routers_th; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git a/vpr/src/route/DecompNetlistRouter.tpp b/vpr/src/route/DecompNetlistRouter.tpp index 228cf428ef6..21d800ec0b3 100644 --- a/vpr/src/route/DecompNetlistRouter.tpp +++ b/vpr/src/route/DecompNetlistRouter.tpp @@ -204,12 +204,12 @@ void DecompNetlistRouter::route_partition_tree_node(tbb::task_group& g route_ctx.route_bb[net_id], false); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ _results_th.local().is_routable = false; return; } if (flags.retry_with_full_bb) { - /* ConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ + /*SerialConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ route_ctx.route_bb[net_id] = full_device_bb(); _results_th.local().bb_updated_nets.push_back(net_id); /* Disable decomposition for nets like this: they're already problematic */ diff --git a/vpr/src/route/NestedNetlistRouter.h b/vpr/src/route/NestedNetlistRouter.h index 6870842af8f..e776d0a42da 100644 --- a/vpr/src/route/NestedNetlistRouter.h +++ b/vpr/src/route/NestedNetlistRouter.h @@ -4,6 +4,9 @@ #include "netlist_routers.h" #include "vtr_optional.h" #include "vtr_thread_pool.h" +#include "serial_connection_router.h" +#include "parallel_connection_router.h" +#include #include /* Add cmd line option for this later */ @@ -67,19 +70,38 @@ class NestedNetlistRouter : public NetlistRouter { /** Route all nets in a PartitionTree node and add its children to the task queue. */ void route_partition_tree_node(PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + std::unique_ptr _make_router(const RouterLookahead* router_lookahead, + const t_router_opts& router_opts, + bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - route_ctx.rr_node_route_inf, - is_flat); + if (!router_opts.enable_parallel_connection_router) { + // Serial Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat); + } else { + // Parallel Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat, + router_opts.multi_queue_num_threads, + router_opts.multi_queue_num_queues, + router_opts.multi_queue_direct_draining); + } } /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ @@ -109,19 +131,19 @@ class NestedNetlistRouter : public NetlistRouter { /* Thread-local storage. * These are maps because thread::id is a random integer instead of 1, 2, ... */ - std::unordered_map> _routers_th; + std::unordered_map> _routers_th; std::unordered_map _results_th; std::mutex _storage_mutex; /** Get a thread-local ConnectionRouter. 
We lock the id->router lookup, but this is * accessed once per partition so the overhead should be small */ - ConnectionRouter& get_thread_router() { + ConnectionRouterInterface& get_thread_router() { auto id = std::this_thread::get_id(); std::lock_guard lock(_storage_mutex); if (!_routers_th.count(id)) { - _routers_th.emplace(id, _make_router(_router_lookahead, _is_flat)); + _routers_th.emplace(id, _make_router(_router_lookahead, _router_opts, _is_flat)); } - return _routers_th.at(id); + return *_routers_th.at(id); } RouteIterResults& get_thread_results() { diff --git a/vpr/src/route/NestedNetlistRouter.tpp b/vpr/src/route/NestedNetlistRouter.tpp index 333be28ea3b..ec4b1fe0aa6 100644 --- a/vpr/src/route/NestedNetlistRouter.tpp +++ b/vpr/src/route/NestedNetlistRouter.tpp @@ -66,10 +66,9 @@ void NestedNetlistRouter::route_partition_tree_node(PartitionTreeNode& /* Route all nets in this node serially */ for (auto net_id : nets) { auto& results = get_thread_results(); - auto& router = get_thread_router(); auto flags = route_net( - router, + get_thread_router(), _net_list, net_id, _itry, @@ -131,7 +130,7 @@ void NestedNetlistRouter::handle_bb_updated_nets(const std::vector void NestedNetlistRouter::set_rcv_enabled(bool x) { for (auto& [_, router] : _routers_th) { - router.set_rcv_enabled(x); + router->set_rcv_enabled(x); } } diff --git a/vpr/src/route/ParallelNetlistRouter.h b/vpr/src/route/ParallelNetlistRouter.h index e77fdf8344e..68b240321b2 100644 --- a/vpr/src/route/ParallelNetlistRouter.h +++ b/vpr/src/route/ParallelNetlistRouter.h @@ -15,7 +15,7 @@ #include /** Parallel impl for NetlistRouter. - * Holds enough context members to glue together ConnectionRouter and net routing functions, + * Holds enough context members to glue together SerialConnectionRouter and net routing functions, * such as \ref route_net. Keeps the members in thread-local storage where needed, * i.e. ConnectionRouters and RouteIterResults-es. * See \ref route_net. */ @@ -62,11 +62,11 @@ class ParallelNetlistRouter : public NetlistRouter { /** A single task to route nets inside a PartitionTree node and add tasks for its child nodes to task group \p g. */ void route_partition_tree_node(tbb::task_group& g, PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + SerialConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( + return SerialConnectionRouter( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), @@ -79,7 +79,7 @@ class ParallelNetlistRouter : public NetlistRouter { /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ /** Per-thread storage for ConnectionRouters. 
*/ - tbb::enumerable_thread_specific> _routers_th; + tbb::enumerable_thread_specific> _routers_th; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git a/vpr/src/route/ParallelNetlistRouter.tpp b/vpr/src/route/ParallelNetlistRouter.tpp index c845be8518d..dfdbac0cc29 100644 --- a/vpr/src/route/ParallelNetlistRouter.tpp +++ b/vpr/src/route/ParallelNetlistRouter.tpp @@ -79,12 +79,12 @@ void ParallelNetlistRouter::route_partition_tree_node(tbb::task_group& route_ctx.route_bb[net_id]); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ _results_th.local().is_routable = false; return; } if (flags.retry_with_full_bb) { - /* ConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ + /* SerialConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ route_ctx.route_bb[net_id] = full_device_bb(); _results_th.local().bb_updated_nets.push_back(net_id); continue; diff --git a/vpr/src/route/SerialNetlistRouter.h b/vpr/src/route/SerialNetlistRouter.h index 352de125b68..d56414d00af 100644 --- a/vpr/src/route/SerialNetlistRouter.h +++ b/vpr/src/route/SerialNetlistRouter.h @@ -3,6 +3,8 @@ /** @file Serial case for \ref NetlistRouter: just loop through nets */ #include "netlist_routers.h" +#include "serial_connection_router.h" +#include "parallel_connection_router.h" template class SerialNetlistRouter : public NetlistRouter { @@ -20,7 +22,7 @@ class SerialNetlistRouter : public NetlistRouter { const RoutingPredictor& routing_predictor, const vtr::vector>>& choking_spots, bool is_flat) - : _router(_make_router(router_lookahead, is_flat)) + : _router(_make_router(router_lookahead, router_opts, is_flat)) , _net_list(net_list) , _router_opts(router_opts) , _connections_inf(connections_inf) @@ -40,22 +42,41 @@ class SerialNetlistRouter : public NetlistRouter { void set_timing_info(std::shared_ptr timing_info); private: - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + std::unique_ptr _make_router(const RouterLookahead* router_lookahead, + const t_router_opts& router_opts, + bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - route_ctx.rr_node_route_inf, - is_flat); + if (!router_opts.enable_parallel_connection_router) { + // Serial Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat); + } else { + // Parallel Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat, + router_opts.multi_queue_num_threads, + router_opts.multi_queue_num_queues, + router_opts.multi_queue_direct_draining); + } } /* Context fields */ - ConnectionRouter _router; + std::unique_ptr _router; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git 
a/vpr/src/route/SerialNetlistRouter.tpp b/vpr/src/route/SerialNetlistRouter.tpp index 63497d7d394..b84acfbd58f 100644 --- a/vpr/src/route/SerialNetlistRouter.tpp +++ b/vpr/src/route/SerialNetlistRouter.tpp @@ -22,7 +22,7 @@ inline RouteIterResults SerialNetlistRouter::route_netlist(int itry, f for (size_t inet = 0; inet < sorted_nets.size(); inet++) { ParentNetId net_id = sorted_nets[inet]; NetResultFlags flags = route_net( - _router, + *_router, _net_list, net_id, itry, @@ -42,7 +42,7 @@ inline RouteIterResults SerialNetlistRouter::route_netlist(int itry, f route_ctx.route_bb[net_id]); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ out.is_routable = false; return out; } @@ -74,7 +74,7 @@ void SerialNetlistRouter::handle_bb_updated_nets(const std::vector void SerialNetlistRouter::set_rcv_enabled(bool x) { - _router.set_rcv_enabled(x); + _router->set_rcv_enabled(x); } template diff --git a/vpr/src/route/connection_router.cpp b/vpr/src/route/connection_router.cpp deleted file mode 100644 index ee80073c3c6..00000000000 --- a/vpr/src/route/connection_router.cpp +++ /dev/null @@ -1,1121 +0,0 @@ -#include "connection_router.h" - -#include -#include "rr_graph.h" -#include "rr_graph_fwd.h" - -/** Used for the flat router. The node isn't relevant to the target if - * it is an intra-block node outside of our target block */ -static bool relevant_node_to_target(const RRGraphView* rr_graph, - RRNodeId node_to_add, - RRNodeId target_node); - -static void update_router_stats(RouterStats* router_stats, - bool is_push, - RRNodeId rr_node_id, - const RRGraphView* rr_graph); - -/** return tuple */ -template -std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - bool retry = false; - retry = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, bounding_box); - - if (!std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - // Only the `index`, `prev_edge`, and `rcv_path_backward_delay` fields of `out` - // are used after this function returns. - RTExploredNode out; - out.index = sink_node; - out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; - if (rcv_path_manager.is_enabled()) { - out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; - rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); - rcv_path_manager.empty_heap(); - } - heap_.empty_heap(); - return std::make_tuple(true, /*retry=*/false, out); - } else { - reset_path_costs(); - clear_modified_rr_node_info(); - heap_.empty_heap(); - rcv_path_manager.empty_heap(); - return std::make_tuple(false, retry, RTExploredNode()); - } -} - -/** Return whether to retry with full bb */ -template -bool ConnectionRouter::timing_driven_route_connection_common_setup( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - //Re-add route nodes from the existing route tree to the heap. - //They need to be repushed onto the heap since each node's cost is target specific. 
- - add_route_tree_to_heap(rt_root, sink_node, cost_params, bounding_box); - heap_.build_heap(); // via sifting down everything - - RRNodeId source_node = rt_root.inode; - - if (heap_.is_empty_heap()) { - VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return false; - } - - VTR_LOGV_DEBUG(router_debug_, " Routing to %d as normal net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, - bounding_box.layer_min, bounding_box.xmin, bounding_box.ymin, - bounding_box.layer_max, bounding_box.xmax, bounding_box.ymax); - - timing_driven_route_connection_from_heap(sink_node, - cost_params, - bounding_box); - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - // No path found within the current bounding box. - // - // If the bounding box is already max size, just fail - if (bounding_box.xmin == 0 - && bounding_box.ymin == 0 - && bounding_box.xmax == (int)(grid_.width() - 1) - && bounding_box.ymax == (int)(grid_.height() - 1) - && bounding_box.layer_min == 0 - && bounding_box.layer_max == (int)(grid_.get_num_layers() - 1)) { - VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return false; - } - - // Otherwise, leave unrouted and bubble up a signal to retry this net with a full-device bounding box - VTR_LOG_WARN("No routing path for connection to sink_rr %d, leaving unrouted to retry later\n", sink_node); - return true; - } - - return false; -} - -// Finds a path from the route tree rooted at rt_root to sink_node for a high fanout net. -// -// Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree -// which is spatially close to the sink is added to the heap. -// Returns a tuple of */ -template -std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree_high_fanout( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bounding_box, - const SpatialRouteTreeLookup& spatial_rt_lookup, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - // re-explore route tree from root to add any new nodes (buildheap afterwards) - // route tree needs to be repushed onto the heap since each node's cost is target specific - t_bb high_fanout_bb = add_high_fanout_route_tree_to_heap(rt_root, sink_node, cost_params, spatial_rt_lookup, net_bounding_box); - heap_.build_heap(); - - RRNodeId source_node = rt_root.inode; - - if (heap_.is_empty_heap()) { - VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return std::make_tuple(false, false, RTExploredNode()); - } - - VTR_LOGV_DEBUG(router_debug_, " Routing to %d as high fanout net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, - high_fanout_bb.layer_min, high_fanout_bb.xmin, high_fanout_bb.ymin, - high_fanout_bb.layer_max, high_fanout_bb.xmax, high_fanout_bb.ymax); - - bool retry_with_full_bb = false; - timing_driven_route_connection_from_heap(sink_node, - cost_params, - high_fanout_bb); - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - //Found no path, that may be due to an unlucky choice of existing route tree sub-set, - //try again with the full route tree to be sure this is not an artifact of high-fanout routing - VTR_LOG_WARN("No routing path found in high-fanout mode for net %zu connection (to sink_rr %d), retrying with full route tree\n", size_t(conn_params.net_id_), sink_node); - - //Reset 
any previously recorded node costs so timing_driven_route_connection() - //starts over from scratch. - reset_path_costs(); - clear_modified_rr_node_info(); - - retry_with_full_bb = timing_driven_route_connection_common_setup(rt_root, - sink_node, - cost_params, - net_bounding_box); - } - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - - heap_.empty_heap(); - rcv_path_manager.empty_heap(); - return std::make_tuple(false, retry_with_full_bb, RTExploredNode()); - } - - RTExploredNode out; - out.index = sink_node; - out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; - if (rcv_path_manager.is_enabled()) { - out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; - rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); - rcv_path_manager.empty_heap(); - } - heap_.empty_heap(); - - return std::make_tuple(true, retry_with_full_bb, out); -} - -// Finds a path to sink_node, starting from the elements currently in the heap. -// This is the core maze routing routine. -template -void ConnectionRouter::timing_driven_route_connection_from_heap(RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - VTR_ASSERT_SAFE(heap_.is_valid()); - - if (heap_.is_empty_heap()) { //No source - VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); - } - - const auto& device_ctx = g_vpr_ctx.device(); - auto& route_ctx = g_vpr_ctx.mutable_routing(); - - // Get bounding box for sink node used in timing_driven_expand_neighbour - VTR_ASSERT_SAFE(sink_node != RRNodeId::INVALID()); - - t_bb target_bb; - if (rr_graph_->node_type(sink_node) == SINK) { // We need to get a bounding box for the sink's entire tile - vtr::Rect tile_bb = grid_.get_tile_bb({rr_graph_->node_xlow(sink_node), - rr_graph_->node_ylow(sink_node), - rr_graph_->node_layer(sink_node)}); - - target_bb.xmin = tile_bb.xmin(); - target_bb.ymin = tile_bb.ymin(); - target_bb.xmax = tile_bb.xmax(); - target_bb.ymax = tile_bb.ymax(); - } else { - target_bb.xmin = rr_graph_->node_xlow(sink_node); - target_bb.ymin = rr_graph_->node_ylow(sink_node); - target_bb.xmax = rr_graph_->node_xhigh(sink_node); - target_bb.ymax = rr_graph_->node_yhigh(sink_node); - } - - target_bb.layer_min = rr_graph_->node_layer(RRNodeId(sink_node)); - target_bb.layer_max = rr_graph_->node_layer(RRNodeId(sink_node)); - - // Start measuring path search time - std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); - - HeapNode cheapest; - while (heap_.try_pop(cheapest)) { - // inode with the cheapest total cost in current route tree to be expanded on - const auto& [new_total_cost, inode] = cheapest; - update_router_stats(router_stats_, - /*is_push=*/false, - inode, - rr_graph_); - - VTR_LOGV_DEBUG(router_debug_, " Popping node %d (cost: %g)\n", - inode, new_total_cost); - - // Have we found the target? 
- if (inode == sink_node) { - // If we're running RCV, the path will be stored in the path_data->path_rr vector - // This is then placed into the traceback so that the correct path is returned - // TODO: This can be eliminated by modifying the actual traceback function in route_timing - if (rcv_path_manager.is_enabled()) { - rcv_path_manager.insert_backwards_path_into_traceback(rcv_path_data[inode], - rr_node_route_inf_[inode].path_cost, - rr_node_route_inf_[inode].backward_path_cost, - route_ctx); - } - VTR_LOGV_DEBUG(router_debug_, " Found target %8d (%s)\n", inode, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, is_flat_).c_str()); - break; - } - - // If not, keep searching - timing_driven_expand_cheapest(inode, - new_total_cost, - sink_node, - cost_params, - bounding_box, - target_bb); - } - - // Stop measuring path search time - std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); - path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); -} - -// Find shortest paths from specified route tree to all nodes in the RR graph -template -vtr::vector ConnectionRouter::timing_driven_find_all_shortest_paths_from_route_tree( - const RouteTreeNode& rt_root, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - // Add the route tree to the heap with no specific target node - RRNodeId target_node = RRNodeId::INVALID(); - add_route_tree_to_heap(rt_root, target_node, cost_params, bounding_box); - heap_.build_heap(); // via sifting down everything - - auto res = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box); - heap_.empty_heap(); - - return res; -} - -// Find shortest paths from current heap to all nodes in the RR graph -// -// Since there is no single *target* node this uses Dijkstra's algorithm -// with a modified exit condition (runs until heap is empty). -template -vtr::vector ConnectionRouter::timing_driven_find_all_shortest_paths_from_heap( - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - vtr::vector cheapest_paths(rr_nodes_.size()); - - VTR_ASSERT_SAFE(heap_.is_valid()); - - if (heap_.is_empty_heap()) { // No source - VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); - } - - // Start measuring path search time - std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); - - HeapNode cheapest; - while (heap_.try_pop(cheapest)) { - // inode with the cheapest total cost in current route tree to be expanded on - const auto& [new_total_cost, inode] = cheapest; - update_router_stats(router_stats_, - /*is_push=*/false, - inode, - rr_graph_); - - VTR_LOGV_DEBUG(router_debug_, " Popping node %d (cost: %g)\n", - inode, new_total_cost); - - // Since we want to find shortest paths to all nodes in the graph - // we do not specify a target node. 
- // - // By setting the target_node to INVALID in combination with the NoOp router - // lookahead we can re-use the node exploration code from the regular router - RRNodeId target_node = RRNodeId::INVALID(); - - timing_driven_expand_cheapest(inode, - new_total_cost, - target_node, - cost_params, - bounding_box, - t_bb()); - - if (cheapest_paths[inode].index == RRNodeId::INVALID() || cheapest_paths[inode].total_cost >= new_total_cost) { - VTR_LOGV_DEBUG(router_debug_, " Better cost to node %d: %g (was %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); - // Only the `index` and `prev_edge` fields of `cheapest_paths[inode]` are used after this function returns - cheapest_paths[inode].index = inode; - cheapest_paths[inode].prev_edge = rr_node_route_inf_[inode].prev_edge; - } else { - VTR_LOGV_DEBUG(router_debug_, " Worse cost to node %d: %g (better %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); - } - } - - // Stop measuring path search time - std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); - path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); - - return cheapest_paths; -} - -template -void ConnectionRouter::timing_driven_expand_cheapest(RRNodeId from_node, - float new_total_cost, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - const t_bb& target_bb) { - float best_total_cost = rr_node_route_inf_[from_node].path_cost; - if (best_total_cost == new_total_cost) { - // Explore from this node, since its total cost is exactly the same as - // the best total cost ever seen for this node. Otherwise, prune this node - // to reduce redundant work (i.e., unnecessary neighbor exploration). - // `new_total_cost` is used here as an identifier to detect if the pair - // (from_node or inode, new_total_cost) was the most recently pushed - // element for the corresponding node. - // - // Note: For RCV, it often isn't searching for a shortest path; it is - // searching for a path in the target delay range. So it might find a - // path to node n that has a higher `backward_path_cost` but the `total_cost` - // (including expected delay to sink, going through a cost function that - // checks that against the target delay) might be lower than the previously - // stored value. In that case we want to re-expand the node so long as - // it doesn't create a loop. That `rcv_path_manager` should store enough - // info for us to avoid loops. 
- RTExploredNode current; - current.index = from_node; - current.backward_path_cost = rr_node_route_inf_[from_node].backward_path_cost; - current.prev_edge = rr_node_route_inf_[from_node].prev_edge; - current.R_upstream = rr_node_route_inf_[from_node].R_upstream; - - VTR_LOGV_DEBUG(router_debug_, " Better cost to %d\n", from_node); - VTR_LOGV_DEBUG(router_debug_, " New total cost: %g\n", new_total_cost); - VTR_LOGV_DEBUG(router_debug_ && (current.prev_edge != RREdgeId::INVALID()), - " Setting path costs for associated node %d (from %d edge %zu)\n", - from_node, - static_cast(rr_graph_->edge_src_node(current.prev_edge)), - static_cast(current.prev_edge)); - - timing_driven_expand_neighbours(current, cost_params, bounding_box, target_node, target_bb); - } else { - // Post-heap prune, do not re-explore from the current/new partial path as it - // has worse cost than the best partial path to this node found so far - VTR_LOGV_DEBUG(router_debug_, " Worse cost to %d\n", from_node); - VTR_LOGV_DEBUG(router_debug_, " Old total cost: %g\n", best_total_cost); - VTR_LOGV_DEBUG(router_debug_, " New total cost: %g\n", new_total_cost); - } -} - -template -void ConnectionRouter::timing_driven_expand_neighbours(const RTExploredNode& current, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb) { - /* Puts all the rr_nodes adjacent to current on the heap. */ - - // For each node associated with the current heap element, expand all of it's neighbors - auto edges = rr_nodes_.edge_range(current.index); - - // This is a simple prefetch that prefetches: - // - RR node data reachable from this node - // - rr switch data to reach those nodes from this node. - // - // This code will be a NOP on compiler targets that do not have a - // builtin to emit prefetch instructions. - // - // This code will be a NOP on CPU targets that lack prefetch instructions. - // All modern x86 and ARM64 platforms provide prefetch instructions. - // - // This code delivers ~6-8% reduction in wallclock time when running Titan - // benchmarks, and was specifically measured against the gsm_switch and - // directrf vtr_reg_weekly running in high effort. - // - // - directrf_stratixiv_arch_timing.blif - // - gsm_switch_stratixiv_arch_timing.blif - // - for (RREdgeId from_edge : edges) { - RRNodeId to_node = rr_nodes_.edge_sink_node(from_edge); - rr_nodes_.prefetch_node(to_node); - - int switch_idx = rr_nodes_.edge_switch(from_edge); - VTR_PREFETCH(&rr_switch_inf_[switch_idx], 0, 0); - } - - for (RREdgeId from_edge : edges) { - RRNodeId to_node = rr_nodes_.edge_sink_node(from_edge); - timing_driven_expand_neighbour(current, - from_edge, - to_node, - cost_params, - bounding_box, - target_node, - target_bb); - } -} - -// Conditionally adds to_node to the router heap (via path from from_node via from_edge). -// RR nodes outside the expanded bounding box specified in bounding_box are not added -// to the heap. 
-template -void ConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, - RREdgeId from_edge, - RRNodeId to_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb) { - VTR_ASSERT(bounding_box.layer_max < g_vpr_ctx.device().grid.get_num_layers()); - - const RRNodeId& from_node = current.index; - - // BB-pruning - // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this - // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets - if (!inside_bb(to_node, bounding_box) - && !rcv_path_manager.is_enabled()) { - VTR_LOGV_DEBUG(router_debug_, - " Pruned expansion of node %d edge %zu -> %d" - " (to node location %d,%d,%d x %d,%d,%d outside of expanded" - " net bounding box %d,%d,%d x %d,%d,%d)\n", - from_node, size_t(from_edge), size_t(to_node), - rr_graph_->node_xlow(to_node), rr_graph_->node_ylow(to_node), rr_graph_->node_layer(to_node), - rr_graph_->node_xhigh(to_node), rr_graph_->node_yhigh(to_node), rr_graph_->node_layer(to_node), - bounding_box.xmin, bounding_box.ymin, bounding_box.layer_min, - bounding_box.xmax, bounding_box.ymax, bounding_box.layer_max); - return; /* Node is outside (expanded) bounding box. */ - } - - /* Prune away IPINs that lead to blocks other than the target one. Avoids * - * the issue of how to cost them properly so they don't get expanded before * - * more promising routes, but makes route-through (via CLBs) impossible. * - * Change this if you want to investigate route-throughs. */ - if (target_node != RRNodeId::INVALID()) { - t_rr_type to_type = rr_graph_->node_type(to_node); - if (to_type == IPIN) { - // Check if this IPIN leads to the target block - // IPIN's of the target block should be contained within it's bounding box - int to_xlow = rr_graph_->node_xlow(to_node); - int to_ylow = rr_graph_->node_ylow(to_node); - int to_layer = rr_graph_->node_layer(to_node); - int to_xhigh = rr_graph_->node_xhigh(to_node); - int to_yhigh = rr_graph_->node_yhigh(to_node); - if (to_xlow < target_bb.xmin - || to_ylow < target_bb.ymin - || to_xhigh > target_bb.xmax - || to_yhigh > target_bb.ymax - || to_layer < target_bb.layer_min - || to_layer > target_bb.layer_max) { - VTR_LOGV_DEBUG(router_debug_, - " Pruned expansion of node %d edge %zu -> %d" - " (to node is IPIN at %d,%d,%d x %d,%d,%d which does not" - " lead to target block %d,%d,%d x %d,%d,%d)\n", - from_node, size_t(from_edge), size_t(to_node), - to_xlow, to_ylow, to_layer, - to_xhigh, to_yhigh, to_layer, - target_bb.xmin, target_bb.ymin, target_bb.layer_min, - target_bb.xmax, target_bb.ymax, target_bb.layer_max); - return; - } - } - } - - VTR_LOGV_DEBUG(router_debug_, " Expanding node %d edge %zu -> %d\n", - from_node, size_t(from_edge), size_t(to_node)); - - // Check if the node exists in the route tree when RCV is enabled - // Other pruning methods have been disabled when RCV is on, so this method is required to prevent "loops" from being created - bool node_exists = false; - if (rcv_path_manager.is_enabled()) { - node_exists = rcv_path_manager.node_exists_in_tree(rcv_path_data[from_node], - to_node); - } - - if (!node_exists || !rcv_path_manager.is_enabled()) { - timing_driven_add_to_heap(cost_params, - current, - to_node, - from_edge, - target_node); - } -} - -// Add to_node to the heap, and also add any nodes which are connected by non-configurable edges -template -void ConnectionRouter::timing_driven_add_to_heap(const 
t_conn_cost_params& cost_params, - const RTExploredNode& current, - RRNodeId to_node, - const RREdgeId from_edge, - RRNodeId target_node) { - const auto& device_ctx = g_vpr_ctx.device(); - const RRNodeId& from_node = current.index; - - // Initialized to current - RTExploredNode next; - next.R_upstream = current.R_upstream; - next.index = to_node; - next.prev_edge = from_edge; - next.total_cost = std::numeric_limits::infinity(); // Not used directly - next.backward_path_cost = current.backward_path_cost; - - // Initalize RCV data struct if needed, otherwise it's set to nullptr - rcv_path_manager.alloc_path_struct(next.path_data); - // path_data variables are initialized to current values - if (rcv_path_manager.is_enabled() && rcv_path_data[from_node]) { - next.path_data->backward_cong = rcv_path_data[from_node]->backward_cong; - next.path_data->backward_delay = rcv_path_data[from_node]->backward_delay; - } - - evaluate_timing_driven_node_costs(&next, - cost_params, - from_node, - target_node); - - float best_total_cost = rr_node_route_inf_[to_node].path_cost; - float best_back_cost = rr_node_route_inf_[to_node].backward_path_cost; - - float new_total_cost = next.total_cost; - float new_back_cost = next.backward_path_cost; - - // We need to only expand this node if it is a better path. And we need to - // update its `rr_node_route_inf` data as we put it into the heap; there may - // be other (previously explored) paths to this node in the heap already, - // but they will be pruned when we pop those heap nodes later as we'll see - // they have inferior costs to what is in the `rr_node_route_inf` data for - // this node. - // FIXME: Adding a link to the FPT paper when it is public - // - // When RCV is enabled, prune based on the RCV-specific total path cost (see - // in `compute_node_cost_using_rcv` in `evaluate_timing_driven_node_costs`) - // to allow detours to get better QoR. - if ((!rcv_path_manager.is_enabled() && best_back_cost > new_back_cost) || (rcv_path_manager.is_enabled() && best_total_cost > new_total_cost)) { - VTR_LOGV_DEBUG(router_debug_, " Expanding to node %d (%s)\n", to_node, - describe_rr_node(device_ctx.rr_graph, - device_ctx.grid, - device_ctx.rr_indexed_data, - to_node, - is_flat_) - .c_str()); - VTR_LOGV_DEBUG(router_debug_, " New Total Cost %g New back Cost %g\n", new_total_cost, new_back_cost); - //Add node to the heap only if the cost via the current partial path is less than the - //best known cost, since there is no reason for the router to expand more expensive paths. - // - //Pre-heap prune to keep the heap small, by not putting paths which are known to be - //sub-optimal (at this point in time) into the heap. 
- - update_cheapest(next, from_node); - - heap_.add_to_heap({new_total_cost, to_node}); - update_router_stats(router_stats_, - /*is_push=*/true, - to_node, - rr_graph_); - - } else { - VTR_LOGV_DEBUG(router_debug_, " Didn't expand to %d (%s)\n", to_node, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to_node, is_flat_).c_str()); - VTR_LOGV_DEBUG(router_debug_, " Prev Total Cost %g Prev back Cost %g \n", best_total_cost, best_back_cost); - VTR_LOGV_DEBUG(router_debug_, " New Total Cost %g New back Cost %g \n", new_total_cost, new_back_cost); - } - - if (rcv_path_manager.is_enabled() && next.path_data != nullptr) { - rcv_path_manager.free_path_struct(next.path_data); - } -} - -#ifdef VTR_ASSERT_SAFE_ENABLED - -//Returns true if both nodes are part of the same non-configurable edge set -static bool same_non_config_node_set(RRNodeId from_node, RRNodeId to_node) { - auto& device_ctx = g_vpr_ctx.device(); - - auto from_itr = device_ctx.rr_node_to_non_config_node_set.find(from_node); - auto to_itr = device_ctx.rr_node_to_non_config_node_set.find(to_node); - - if (from_itr == device_ctx.rr_node_to_non_config_node_set.end() - || to_itr == device_ctx.rr_node_to_non_config_node_set.end()) { - return false; //Not part of a non-config node set - } - - return from_itr->second == to_itr->second; //Check for same non-config set IDs -} - -#endif - -template -float ConnectionRouter::compute_node_cost_using_rcv(const t_conn_cost_params cost_params, - RRNodeId to_node, - RRNodeId target_node, - float backwards_delay, - float backwards_cong, - float R_upstream) { - float expected_delay; - float expected_cong; - - const t_conn_delay_budget* delay_budget = cost_params.delay_budget; - // TODO: This function is not tested for is_flat == true - VTR_ASSERT(is_flat_ != true); - std::tie(expected_delay, expected_cong) = router_lookahead_.get_expected_delay_and_cong(to_node, target_node, cost_params, R_upstream); - - float expected_total_delay_cost; - float expected_total_cong_cost; - - float expected_total_cong = expected_cong + backwards_cong; - float expected_total_delay = expected_delay + backwards_delay; - - //If budgets specified calculate cost as described by RCV paper: - // R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While - // Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of - // Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008. 
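For reference, the budget-driven cost assembled by the compute_node_cost_using_rcv() lines that follow can be written compactly as below, with T and C the expected total delay and congestion, crit and crit_short the values of cost_params.criticality and delay_budget->short_path_criticality, and N the 100e-12 normalization constant:

    \[
      \mathrm{cost} \;=\; T
        \;+\; (\mathrm{crit}_{short} + \mathrm{crit})\cdot\max(0,\ T_{target} - T)
        \;+\; \frac{\max(0,\ T_{min} - T)^2}{N}
        \;+\; C
    \]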
- - // Normalization constant defined in RCV paper cited above - constexpr float NORMALIZATION_CONSTANT = 100e-12; - - expected_total_delay_cost = expected_total_delay; - expected_total_delay_cost += (delay_budget->short_path_criticality + cost_params.criticality) * std::max(0.f, delay_budget->target_delay - expected_total_delay); - // expected_total_delay_cost += std::pow(std::max(0.f, expected_total_delay - delay_budget->max_delay), 2) / NORMALIZATION_CONSTANT; - expected_total_delay_cost += std::pow(std::max(0.f, delay_budget->min_delay - expected_total_delay), 2) / NORMALIZATION_CONSTANT; - expected_total_cong_cost = expected_total_cong; - - float total_cost = expected_total_delay_cost + expected_total_cong_cost; - - return total_cost; -} - -// Empty the route tree set node, use this after each net is routed -template -void ConnectionRouter::empty_rcv_route_tree_set() { - rcv_path_manager.empty_route_tree_nodes(); -} - -// Enable or disable RCV -template -void ConnectionRouter::set_rcv_enabled(bool enable) { - rcv_path_manager.set_enabled(enable); - if (enable) { - rcv_path_data.resize(rr_node_route_inf_.size()); - } -} - -//Calculates the cost of reaching to_node (i.e., to->index) -template -void ConnectionRouter::evaluate_timing_driven_node_costs(RTExploredNode* to, - const t_conn_cost_params& cost_params, - RRNodeId from_node, - RRNodeId target_node) { - /* new_costs.backward_cost: is the "known" part of the cost to this node -- the - * congestion cost of all the routing resources back to the existing route - * plus the known delay of the total path back to the source. - * - * new_costs.total_cost: is this "known" backward cost + an expected cost to get to the target. - * - * new_costs.R_upstream: is the upstream resistance at the end of this node - */ - - //Info for the switch connecting from_node to_node (i.e., to->index) - int iswitch = rr_nodes_.edge_switch(to->prev_edge); - bool switch_buffered = rr_switch_inf_[iswitch].buffered(); - bool reached_configurably = rr_switch_inf_[iswitch].configurable(); - float switch_R = rr_switch_inf_[iswitch].R; - float switch_Tdel = rr_switch_inf_[iswitch].Tdel; - float switch_Cinternal = rr_switch_inf_[iswitch].Cinternal; - - //To node info - auto rc_index = rr_graph_->node_rc_index(to->index); - float node_C = rr_rc_data_[rc_index].C; - float node_R = rr_rc_data_[rc_index].R; - - //From node info - float from_node_R = rr_rc_data_[rr_graph_->node_rc_index(from_node)].R; - - //Update R_upstream - if (switch_buffered) { - to->R_upstream = 0.; //No upstream resistance - } else { - //R_Upstream already initialized - } - - to->R_upstream += switch_R; //Switch resistance - to->R_upstream += node_R; //Node resistance - - //Calculate delay - float Rdel = to->R_upstream - 0.5 * node_R; //Only consider half node's resistance for delay - float Tdel = switch_Tdel + Rdel * node_C; - - //Depending on the switch used, the Tdel of the upstream node (from_node) may change due to - //increased loading from the switch's internal capacitance. - // - //Even though this delay physically affects from_node, we make the adjustment (now) on the to_node, - //since only once we've reached to to_node do we know the connection used (and the switch enabled). - // - //To adjust for the time delay, we compute the product of the Rdel associated with from_node and - //the internal capacitance of the switch. - // - //First, we will calculate Rdel_adjust (just like in the computation for Rdel, we consider only - //half of from_node's resistance). 
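For reference, the incremental Elmore delay that evaluate_timing_driven_node_costs() builds up here (and which is re-added unchanged in connection_router.tpp later in this diff) amounts to:

    \[
      R_{up} \leftarrow \begin{cases} R_{sw} + R_{node}, & \text{buffered switch}\\
                                      R_{up} + R_{sw} + R_{node}, & \text{otherwise}\end{cases}
    \]
    \[
      T_{del} \;=\; T_{sw}
        \;+\; \bigl(R_{up} - \tfrac{1}{2}R_{node}\bigr)\,C_{node}
        \;+\; \bigl(R_{up} - \tfrac{1}{2}R_{from}\bigr)\,C_{internal}
    \]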
- float Rdel_adjust = to->R_upstream - 0.5 * from_node_R; - - //Second, we adjust the Tdel to account for the delay caused by the internal capacitance. - Tdel += Rdel_adjust * switch_Cinternal; - - float cong_cost = 0.; - if (reached_configurably) { - cong_cost = get_rr_cong_cost(to->index, cost_params.pres_fac); - } else { - //Reached by a non-configurable edge. - //Therefore the from_node and to_node are part of the same non-configurable node set. -#ifdef VTR_ASSERT_SAFE_ENABLED - VTR_ASSERT_SAFE_MSG(same_non_config_node_set(from_node, to->index), - "Non-configurably connected edges should be part of the same node set"); -#endif - - //The congestion cost of all nodes in the set has already been accounted for (when - //the current path first expanded a node in the set). Therefore do *not* re-add the congestion - //cost. - cong_cost = 0.; - } - if (conn_params_->router_opt_choke_points_ && is_flat_ && rr_graph_->node_type(to->index) == IPIN) { - auto find_res = conn_params_->connection_choking_spots_.find(to->index); - if (find_res != conn_params_->connection_choking_spots_.end()) { - cong_cost = cong_cost / pow(2, (float)find_res->second); - } - } - - //Update the backward cost (upstream already included) - to->backward_path_cost += (1. - cost_params.criticality) * cong_cost; //Congestion cost - to->backward_path_cost += cost_params.criticality * Tdel; //Delay cost - - if (cost_params.bend_cost != 0.) { - t_rr_type from_type = rr_graph_->node_type(from_node); - t_rr_type to_type = rr_graph_->node_type(to->index); - if ((from_type == CHANX && to_type == CHANY) || (from_type == CHANY && to_type == CHANX)) { - to->backward_path_cost += cost_params.bend_cost; //Bend cost - } - } - - float total_cost = 0.; - - if (rcv_path_manager.is_enabled() && to->path_data != nullptr) { - to->path_data->backward_delay += cost_params.criticality * Tdel; - to->path_data->backward_cong += (1. - cost_params.criticality) * get_rr_cong_cost(to->index, cost_params.pres_fac); - - total_cost = compute_node_cost_using_rcv(cost_params, to->index, target_node, to->path_data->backward_delay, to->path_data->backward_cong, to->R_upstream); - } else { - const auto& device_ctx = g_vpr_ctx.device(); - //Update total cost - float expected_cost = router_lookahead_.get_expected_cost(to->index, target_node, cost_params, to->R_upstream); - VTR_LOGV_DEBUG(router_debug_ && !std::isfinite(expected_cost), - " Lookahead from %s (%s) to %s (%s) is non-finite, expected_cost = %f, to->R_upstream = %f\n", - rr_node_arch_name(to->index, is_flat_).c_str(), - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to->index, is_flat_).c_str(), - rr_node_arch_name(target_node, is_flat_).c_str(), - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, target_node, is_flat_).c_str(), - expected_cost, to->R_upstream); - total_cost += to->backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); - } - to->total_cost = total_cost; -} - -//Adds the route tree rooted at rt_node to the heap, preparing it to be -//used as branch-points for further routing. -template -void ConnectionRouter::add_route_tree_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb) { - /* Puts the entire partial routing below and including rt_node onto the heap * - * (except for those parts marked as not to be expanded) by calling itself * - * recursively. 
*/ - - /* Pre-order depth-first traversal */ - // IPINs and SINKS are not re_expanded - if (rt_node.re_expand) { - add_route_tree_node_to_heap(rt_node, - target_node, - cost_params, - net_bb); - } - - for (const RouteTreeNode& child_node : rt_node.child_nodes()) { - if (is_flat_) { - if (relevant_node_to_target(rr_graph_, - child_node.inode, - target_node)) { - add_route_tree_to_heap(child_node, - target_node, - cost_params, - net_bb); - } - } else { - add_route_tree_to_heap(child_node, - target_node, - cost_params, - net_bb); - } - } -} - -//Unconditionally adds rt_node to the heap -// -//Note that if you want to respect rt_node.re_expand that is the caller's -//responsibility. -template -void ConnectionRouter::add_route_tree_node_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb) { - const auto& device_ctx = g_vpr_ctx.device(); - const RRNodeId inode = rt_node.inode; - float backward_path_cost = cost_params.criticality * rt_node.Tdel; - float R_upstream = rt_node.R_upstream; - - /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ - if (!inside_bb(rt_node.inode, net_bb)) - return; - - // after budgets are loaded, calculate delay cost as described by RCV paper - /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While - * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of - * Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008.*/ - // float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); - - if (!rcv_path_manager.is_enabled()) { - // tot_cost = backward_path_cost + cost_params.astar_fac * expected_cost; - float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); - float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); - VTR_LOGV_DEBUG(router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", - inode, - tot_cost, - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, is_flat_).c_str()); - - if (tot_cost > rr_node_route_inf_[inode].path_cost) { - return; - } - add_to_mod_list(inode); - rr_node_route_inf_[inode].path_cost = tot_cost; - rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); - rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; - rr_node_route_inf_[inode].R_upstream = R_upstream; - heap_.push_back({tot_cost, inode}); - - // push_back_node(&heap_, rr_node_route_inf_, - // inode, tot_cost, RREdgeId::INVALID(), - // backward_path_cost, R_upstream); - } else { - float expected_total_cost = compute_node_cost_using_rcv(cost_params, inode, target_node, rt_node.Tdel, 0, R_upstream); - - add_to_mod_list(inode); - rr_node_route_inf_[inode].path_cost = expected_total_cost; - rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); - rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; - rr_node_route_inf_[inode].R_upstream = R_upstream; - - rcv_path_manager.alloc_path_struct(rcv_path_data[inode]); - rcv_path_data[inode]->backward_delay = rt_node.Tdel; - - heap_.push_back({expected_total_cost, inode}); - - // push_back_node_with_info(&heap_, inode, expected_total_cost, - // backward_path_cost, R_upstream, rt_node.Tdel, &rcv_path_manager); - } - - update_router_stats(router_stats_, - /*is_push=*/true, - inode, - rr_graph_); 
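For reference, the non-RCV priority computed earlier in this function when seeding the heap from the route tree (the same form used when expanding neighbours) is:

    \[
      \mathrm{tot\_cost} \;=\; \mathrm{backward\_path\_cost}
        \;+\; \mathrm{astar\_fac}\cdot\max\bigl(0,\ \mathrm{expected\_cost} - \mathrm{astar\_offset}\bigr)
    \]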
- - if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { - router_stats_->rt_node_pushes[rr_graph_->node_type(inode)]++; - } -} - -/* Expand bb by inode's extents and clip against net_bb */ -inline void expand_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb, RRNodeId inode, const RRGraphView* rr_graph) { - bb.xmin = std::max(net_bb.xmin, std::min(bb.xmin, rr_graph->node_xlow(inode))); - bb.ymin = std::max(net_bb.ymin, std::min(bb.ymin, rr_graph->node_ylow(inode))); - bb.xmax = std::min(net_bb.xmax, std::max(bb.xmax, rr_graph->node_xhigh(inode))); - bb.ymax = std::min(net_bb.ymax, std::max(bb.ymax, rr_graph->node_yhigh(inode))); - bb.layer_min = std::min(bb.layer_min, rr_graph->node_layer(inode)); - bb.layer_max = std::max(bb.layer_max, rr_graph->node_layer(inode)); -} - -/* Expand bb by HIGH_FANOUT_BB_FAC and clip against net_bb */ -inline void adjust_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb) { - constexpr int HIGH_FANOUT_BB_FAC = 3; - - bb.xmin = std::max(net_bb.xmin, bb.xmin - HIGH_FANOUT_BB_FAC); - bb.ymin = std::max(net_bb.ymin, bb.ymin - HIGH_FANOUT_BB_FAC); - bb.xmax = std::min(net_bb.xmax, bb.xmax + HIGH_FANOUT_BB_FAC); - bb.ymax = std::min(net_bb.ymax, bb.ymax + HIGH_FANOUT_BB_FAC); - bb.layer_min = std::min(net_bb.layer_min, bb.layer_min); - bb.layer_max = std::max(net_bb.layer_max, bb.layer_max); -} - -template -t_bb ConnectionRouter::add_high_fanout_route_tree_to_heap( - const RouteTreeNode& rt_root, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const SpatialRouteTreeLookup& spatial_rt_lookup, - const t_bb& net_bounding_box) { - //For high fanout nets we only add those route tree nodes which are spatially close - //to the sink. - // - //Based on: - // J. Swartz, V. Betz, J. Rose, "A Fast Routability-Driven Router for FPGAs", FPGA, 1998 - // - //We rely on a grid-based spatial look-up which is maintained for high fanout nets by - //update_route_tree(), which allows us to add spatially close route tree nodes without traversing - //the entire route tree (which is likely large for a high fanout net). - - //Determine which bin the target node is located in - - int target_bin_x = grid_to_bin_x(rr_graph_->node_xlow(target_node), spatial_rt_lookup); - int target_bin_y = grid_to_bin_y(rr_graph_->node_ylow(target_node), spatial_rt_lookup); - - auto target_layer = rr_graph_->node_layer(target_node); - - int chan_nodes_added = 0; - - t_bb highfanout_bb; - highfanout_bb.xmin = rr_graph_->node_xlow(target_node); - highfanout_bb.xmax = rr_graph_->node_xhigh(target_node); - highfanout_bb.ymin = rr_graph_->node_ylow(target_node); - highfanout_bb.ymax = rr_graph_->node_yhigh(target_node); - highfanout_bb.layer_min = target_layer; - highfanout_bb.layer_max = target_layer; - - //Add existing routing starting from the target bin. 
- //If the target's bin has insufficient existing routing add from the surrounding bins - constexpr int SINGLE_BIN_MIN_NODES = 2; - bool done = false; - bool found_node_on_same_layer = false; - for (int dx : {0, -1, +1}) { - size_t bin_x = target_bin_x + dx; - - if (bin_x > spatial_rt_lookup.dim_size(0) - 1) continue; //Out of range - - for (int dy : {0, -1, +1}) { - size_t bin_y = target_bin_y + dy; - - if (bin_y > spatial_rt_lookup.dim_size(1) - 1) continue; //Out of range - - for (const RouteTreeNode& rt_node : spatial_rt_lookup[bin_x][bin_y]) { - if (!rt_node.re_expand) // Some nodes (like IPINs) shouldn't be re-expanded - continue; - RRNodeId rr_node_to_add = rt_node.inode; - - /* Flat router: don't go into clusters other than the target one */ - if (is_flat_) { - if (!relevant_node_to_target(rr_graph_, rr_node_to_add, target_node)) - continue; - } - - /* In case of the parallel router, we may be dealing with a virtual net - * so prune the nodes from the HF lookup against the bounding box just in case */ - if (!inside_bb(rr_node_to_add, net_bounding_box)) - continue; - - auto rt_node_layer_num = rr_graph_->node_layer(rr_node_to_add); - if (rt_node_layer_num == target_layer) - found_node_on_same_layer = true; - - // Put the node onto the heap - add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bounding_box); - - // Expand HF BB to include the node (clip by original BB) - expand_highfanout_bounding_box(highfanout_bb, net_bounding_box, rr_node_to_add, rr_graph_); - - if (rr_graph_->node_type(rr_node_to_add) == CHANY || rr_graph_->node_type(rr_node_to_add) == CHANX) { - chan_nodes_added++; - } - } - - if (dx == 0 && dy == 0 && chan_nodes_added > SINGLE_BIN_MIN_NODES && found_node_on_same_layer) { - //Target bin contained at least minimum amount of routing - // - //We require at least SINGLE_BIN_MIN_NODES to be added. - //This helps ensure we don't end up with, for example, a single - //routing wire running in the wrong direction which may not be - //able to reach the target within the bounding box. 
- done = true; - break; - } - } - if (done) break; - } - /* If we didn't find enough nodes to branch off near the target - * or they are on the wrong grid layer, just add the full route tree */ - if (chan_nodes_added <= SINGLE_BIN_MIN_NODES || !found_node_on_same_layer) { - add_route_tree_to_heap(rt_root, target_node, cost_params, net_bounding_box); - return net_bounding_box; - } else { - //We found nearby routing, replace original bounding box to be localized around that routing - adjust_highfanout_bounding_box(highfanout_bb, net_bounding_box); - return highfanout_bb; - } -} - -static inline bool relevant_node_to_target(const RRGraphView* rr_graph, - RRNodeId node_to_add, - RRNodeId target_node) { - VTR_ASSERT_SAFE(rr_graph->node_type(target_node) == t_rr_type::SINK); - auto node_to_add_type = rr_graph->node_type(node_to_add); - return node_to_add_type != t_rr_type::IPIN || node_in_same_physical_tile(node_to_add, target_node); -} - -static inline void update_router_stats(RouterStats* router_stats, - bool is_push, - RRNodeId rr_node_id, - const RRGraphView* rr_graph) { - if (is_push) { - router_stats->heap_pushes++; - } else { - router_stats->heap_pops++; - } - - if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { - auto node_type = rr_graph->node_type(rr_node_id); - VTR_ASSERT(node_type != NUM_RR_TYPES); - - if (is_inter_cluster_node(*rr_graph, rr_node_id)) { - if (is_push) { - router_stats->inter_cluster_node_pushes++; - router_stats->inter_cluster_node_type_cnt_pushes[node_type]++; - } else { - router_stats->inter_cluster_node_pops++; - router_stats->inter_cluster_node_type_cnt_pops[node_type]++; - } - } else { - if (is_push) { - router_stats->intra_cluster_node_pushes++; - router_stats->intra_cluster_node_type_cnt_pushes[node_type]++; - } else { - router_stats->intra_cluster_node_pops++; - router_stats->intra_cluster_node_type_cnt_pops[node_type]++; - } - } - } -} - -std::unique_ptr make_connection_router(e_heap_type heap_type, - const DeviceGrid& grid, - const RouterLookahead& router_lookahead, - const t_rr_graph_storage& rr_nodes, - const RRGraphView* rr_graph, - const std::vector& rr_rc_data, - const vtr::vector& rr_switch_inf, - vtr::vector& rr_node_route_inf, - bool is_flat) { - switch (heap_type) { - case e_heap_type::BINARY_HEAP: - return std::make_unique>( - grid, - router_lookahead, - rr_nodes, - rr_graph, - rr_rc_data, - rr_switch_inf, - rr_node_route_inf, - is_flat); - case e_heap_type::FOUR_ARY_HEAP: - return std::make_unique>( - grid, - router_lookahead, - rr_nodes, - rr_graph, - rr_rc_data, - rr_switch_inf, - rr_node_route_inf, - is_flat); - default: - VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", - heap_type); - } -} diff --git a/vpr/src/route/connection_router.h b/vpr/src/route/connection_router.h index 0de6d508991..f5bb7c57aa9 100644 --- a/vpr/src/route/connection_router.h +++ b/vpr/src/route/connection_router.h @@ -1,6 +1,26 @@ #ifndef _CONNECTION_ROUTER_H #define _CONNECTION_ROUTER_H +/** + * @file + * @brief This file defines the ConnectionRouter class. + * + * Overview + * ======== + * The ConnectionRouter represents the timing-driven connection routers, which + * route from some initial set of sources (via the input rt tree) to a particular + * sink. VPR supports two timing-driven connection routers, including the serial + * connection router and the MultiQueue-based parallel connection router. This + * class defines the interface for the two connection routers and encapsulates + * the common member variables and helper functions for them. 
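To make the interface concrete, a minimal caller-side sketch is shown below. Only the method name and the returned tuple come from this header; the `router` object (any ConnectionRouterInterface implementation) and the pre-built `rt_root`, `sink_node`, `cost_params`, `net_bb`, `router_stats`, and `conn_params` values are assumptions for illustration.

    // Illustrative sketch only; everything except the method and its return tuple is assumed.
    bool found_path = false;
    bool retry_with_full_bb = false;
    RTExploredNode cheapest_sink;
    std::tie(found_path, retry_with_full_bb, cheapest_sink) =
        router.timing_driven_route_connection_from_route_tree(rt_root, sink_node, cost_params,
                                                              net_bb, router_stats, conn_params);
    if (found_path) {
        // cheapest_sink.index / cheapest_sink.prev_edge now point into rr_node_route_inf,
        // from which the routed path can be traced back to the existing route tree.
    } else if (retry_with_full_bb) {
        // Parallel router only: the caller may re-route this connection with a
        // full-device bounding box.
    }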
+ *
+ * @note
+ * When the ConnectionRouter is used, it mutates the provided rr_node_route_inf.
+ * The routed path can be found by tracing from the sink node (which is returned)
+ * through the rr_node_route_inf. See update_traceback as an example of this tracing.
+ *
+ */
+
 #include "connection_router_interface.h"
 #include "rr_graph_storage.h"
 #include "route_common.h"
@@ -10,16 +30,10 @@
 #include "router_stats.h"
 #include "spatial_route_tree_lookup.h"
 
-#include "d_ary_heap.h"
-
-// This class encapsulates the timing driven connection router. This class
-// routes from some initial set of sources (via the input rt tree) to a
-// particular sink.
-//
-// When the ConnectionRouter is used, it mutates the provided
-// rr_node_route_inf. The routed path can be found by tracing from the sink
-// node (which is returned) through the rr_node_route_inf. See
-// update_traceback as an example of this tracing.
+/**
+ * @brief The ConnectionRouter class defines the interface for the serial and parallel connection
+ * routers and encapsulates the common variables and helper functions for the two routers.
+ */
 template<typename HeapImplementation>
 class ConnectionRouter : public ConnectionRouterInterface {
   public:
@@ -46,40 +60,36 @@ class ConnectionRouter : public ConnectionRouterInterface {
         , router_debug_(false)
         , path_search_cumulative_time(0) {
         heap_.init_heap(grid);
-        only_opin_inter_layer = (grid.get_num_layers() > 1) && inter_layer_connections_limited_to_opin(*rr_graph);
-    }
-
-    ~ConnectionRouter() {
-        VTR_LOG("Serial Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n",
-                std::chrono::duration<float>(path_search_cumulative_time).count());
-    }
-
-    // Clear's the modified list. Should be called after reset_path_costs
-    // have been called.
-    void clear_modified_rr_node_info() final {
-        modified_rr_node_inf_.clear();
-    }
-
-    // Reset modified data in rr_node_route_inf based on modified_rr_node_inf.
-    void reset_path_costs() final {
-        // Reset the node info stored in rr_node_route_inf variable
-        ::reset_path_costs(modified_rr_node_inf_);
-        // Reset the node info stored inside the connection router
-        if (rcv_path_manager.is_enabled()) {
-            for (const auto& node : modified_rr_node_inf_) {
-                rcv_path_data[node] = nullptr;
-            }
-        }
     }
 
-    /** Finds a path from the route tree rooted at rt_root to sink_node.
-     * This is used when you want to allow previous routing of the same net to
-     * serve as valid start locations for the current connection.
-     *
-     * Returns a tuple of:
-     * bool: path exists? (hard failure, rr graph disconnected)
-     * bool: should retry with full bounding box? (only used in parallel routing)
-     * RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing */
+    virtual ~ConnectionRouter() {}
+
+    /**
+     * @brief Clears the modified list
+     * @note Should be called after reset_path_costs have been called
+     */
+    virtual void clear_modified_rr_node_info() = 0;
+
+    /**
+     * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf
+     */
+    virtual void reset_path_costs() = 0;
+
+    /**
+     * @brief Finds a path from the route tree rooted at rt_root to sink_node
+     * @note This is used when you want to allow previous routing of the same
+     * net to serve as valid start locations for the current connection.
+     * @param rt_root RouteTreeNode describing the current routing state
+     * @param sink_node Sink node ID to route to
+     * @param cost_params Cost function parameters
+     * @param bounding_box Keep search confined to this bounding box
+     * @param router_stats Update router statistics
+     * @param conn_params Parameters to guide the routing of the given connection
+     * @return A tuple of:
+     * - bool: path exists? (hard failure, rr graph disconnected)
+     * - bool: should retry with full bounding box? (only used in parallel routing)
+     * - RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing
+     */
     std::tuple<bool, bool, RTExploredNode> timing_driven_route_connection_from_route_tree(
         const RouteTreeNode& rt_root,
         RRNodeId sink_node,
@@ -88,16 +98,22 @@ class ConnectionRouter : public ConnectionRouterInterface {
         RouterStats& router_stats,
         const ConnectionParameters& conn_params) final;
 
-    /** Finds a path from the route tree rooted at rt_root to sink_node for a
-     * high fanout net.
-     *
-     * Unlike timing_driven_route_connection_from_route_tree(), only part of
-     * the route tree which is spatially close to the sink is added to the heap.
-     *
-     * Returns a tuple of:
-     * bool: path exists? (hard failure, rr graph disconnected)
-     * bool: should retry with full bounding box? (only used in parallel routing)
-     * RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing */
+    /**
+     * @brief Finds a path from the route tree rooted at rt_root to sink_node for a high fanout net
+     * @note Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree which
+     * is spatially close to the sink is added to the heap.
+     * @param rt_root RouteTreeNode describing the current routing state
+     * @param sink_node Sink node ID to route to
+     * @param cost_params Cost function parameters
+     * @param net_bounding_box Keep search confined to this bounding box
+     * @param spatial_rt_lookup Route tree spatial lookup
+     * @param router_stats Update router statistics
+     * @param conn_params Parameters to guide the routing of the given connection
+     * @return A tuple of:
+     * - bool: path exists? (hard failure, rr graph disconnected)
+     * - bool: should retry with full bounding box? (only used in parallel routing)
+     * - RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing
+     */
     std::tuple<bool, bool, RTExploredNode> timing_driven_route_connection_from_route_tree_high_fanout(
         const RouteTreeNode& rt_root,
         RRNodeId sink_node,
@@ -107,159 +123,150 @@ class ConnectionRouter : public ConnectionRouterInterface {
         RouterStats& router_stats,
         const ConnectionParameters& conn_params) final;
 
-    // Finds a path from the route tree rooted at rt_root to all sinks
-    // available.
-    //
-    // Each element of the returned vector is a reachable sink.
-    //
-    // If cost_params.astar_fac is set to 0, this effectively becomes
-    // Dijkstra's algorithm with a modified exit condition (runs until heap is
-    // empty). When using cost_params.astar_fac = 0, for efficiency the
-    // RouterLookahead used should be the NoOpLookahead.
-    //
-    // Note: This routine is currently used only to generate information that
-    // may be helpful in debugging an architecture.
-    vtr::vector<RRNodeId, RTExploredNode> timing_driven_find_all_shortest_paths_from_route_tree(
+    /**
+     * @brief Finds shortest paths from the route tree rooted at rt_root to all sinks available
+     * @note Unlike timing_driven_route_connection_from_route_tree(), which routes to a single
+     * sink, this routine finds the shortest path to every reachable sink.
+ * @note If cost_params.astar_fac is set to 0, this effectively becomes Dijkstra's algorithm with a + * modified exit condition (runs until heap is empty). When using cost_params.astar_fac = 0, for + * efficiency the RouterLookahead used should be the NoOpLookahead. + * @note This routine is currently used only to generate information that may be helpful in debugging + * an architecture. + * @param rt_root RouteTreeNode describing the current routing state + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param router_stats Update router statistics + * @param conn_params Parameters to guide the routing of the given connection + * @return A vector where each element is a reachable sink + */ + virtual vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( const RouteTreeNode& rt_root, const t_conn_cost_params& cost_params, const t_bb& bounding_box, RouterStats& router_stats, - const ConnectionParameters& conn_params) final; + const ConnectionParameters& conn_params) = 0; + /** + * @brief Sets router debug option + * @param router_debug Router debug option + */ void set_router_debug(bool router_debug) final { router_debug_ = router_debug; } - // Empty the route tree set used for RCV node detection - // Will return if RCV is disabled - // Called after each net is finished routing to flush the set - void empty_rcv_route_tree_set() final; - - // Enable or disable RCV in connection router - // Enabling this will utilize extra path structures, as well as the RCV cost function - // - // Ensure route budgets have been calculated before enabling this - void set_rcv_enabled(bool enable) final; - - private: - // Mark that data associated with rr_node "inode" has been modified, and - // needs to be reset in reset_path_costs. - void add_to_mod_list(RRNodeId inode) { - if (std::isinf(rr_node_route_inf_[inode].path_cost)) { - modified_rr_node_inf_.push_back(inode); - } - } - - // Update the route path to the node `cheapest.index` via the path from - // `from_node` via `cheapest.prev_edge`. - inline void update_cheapest(RTExploredNode& cheapest, const RRNodeId& from_node) { - const RRNodeId& inode = cheapest.index; - add_to_mod_list(inode); - rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; - rr_node_route_inf_[inode].path_cost = cheapest.total_cost; - rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; - - // Use the already created next path structure pointer when RCV is enabled - if (rcv_path_manager.is_enabled()) { - rcv_path_manager.move(rcv_path_data[inode], cheapest.path_data); - - rcv_path_data[inode]->path_rr = rcv_path_data[from_node]->path_rr; - rcv_path_data[inode]->edge = rcv_path_data[from_node]->edge; - rcv_path_data[inode]->path_rr.push_back(from_node); - rcv_path_data[inode]->edge.push_back(cheapest.prev_edge); - } + /** + * @brief Empties the route tree set used for RCV node detection + * @note Will immediately return if RCV is disabled. Called after + * each net is finished routing to flush the set. + */ + void empty_rcv_route_tree_set() final { + rcv_path_manager.empty_route_tree_nodes(); } - /** Common logic from timing_driven_route_connection_from_route_tree and + /** + * @brief Enables or disables RCV in connection router + * @note Enabling this will utilize extra path structures, as well as + * the RCV cost function. Ensure route budgets have been calculated + * before enabling this. 
+ * @param enable Whether enabling RCV or not + */ + virtual void set_rcv_enabled(bool enable) = 0; + + protected: + /** + * @brief Common logic from timing_driven_route_connection_from_route_tree and * timing_driven_route_connection_from_route_tree_high_fanout for running * the connection router. - * @param[in] rt_root RouteTreeNode describing the current routing state - * @param[in] sink_node Sink node ID to route to - * @param[in] cost_params - * @param[in] bounding_box Keep search confined to this bounding box - * @return bool Signal to retry this connection with a full-device bounding box */ + * @param rt_root RouteTreeNode describing the current routing state + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return bool signal to retry this connection with a full-device bounding box + */ bool timing_driven_route_connection_common_setup( const RouteTreeNode& rt_root, RRNodeId sink_node, const t_conn_cost_params& cost_params, const t_bb& bounding_box); - // Finds a path to sink_node, starting from the elements currently in the - // heap. - // - // If the path is not found, which means that the path_cost of sink_node in - // RR node route info has never been updated, `rr_node_route_inf_[sink_node] - // .path_cost` will be the initial value (i.e., float infinity). This case - // can be detected by `std::isinf(rr_node_route_inf_[sink_node].path_cost)`. - // - // This is the core maze routing routine. - // - // Note: For understanding the connection router, start here. + /** + * @brief Finds a path to sink_node, starting from the elements currently in the heap + * @note If the path is not found, which means that the path_cost of sink_node in RR + * node route info has never been updated, `rr_node_route_inf_[sink_node].path_cost` + * will be the initial value (i.e., float infinity). This case can be detected by + * `std::isinf(rr_node_route_inf_[sink_node].path_cost)`. + * @note This is the core maze routing routine. For understanding the connection + * router, start here. + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + */ void timing_driven_route_connection_from_heap( RRNodeId sink_node, const t_conn_cost_params& cost_params, const t_bb& bounding_box); - // Expand this current node if it is a cheaper path. 
- void timing_driven_expand_cheapest( - RRNodeId from_node, - float new_total_cost, - RRNodeId target_node, + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + virtual void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) = 0; + + /** + * @brief Finds shortest paths from current heap to all nodes in the RR graph + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return A vector where each element contains the shortest route to a specific sink node + */ + virtual vtr::vector timing_driven_find_all_shortest_paths_from_heap( const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - const t_bb& target_bb); - - // Expand each neighbor of the current node. - void timing_driven_expand_neighbours( - const RTExploredNode& current, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, + const t_bb& bounding_box) = 0; + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + virtual void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, RRNodeId target_node, - const t_bb& target_bb); - - // Conditionally adds to_node to the router heap (via path from current.index - // via from_edge). - // - // RR nodes outside bounding box specified in bounding_box are not added - // to the heap. - void timing_driven_expand_neighbour( - const RTExploredNode& current, - RREdgeId from_edge, - RRNodeId to_node, const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb); - - // Add to_node to the heap, and also add any nodes which are connected by - // non-configurable edges - void timing_driven_add_to_heap( - const t_conn_cost_params& cost_params, - const RTExploredNode& current, - RRNodeId to_node, - RREdgeId from_edge, - RRNodeId target_node); - - // Calculates the cost of reaching to_node + const t_bb& net_bb) = 0; + + /** + * @brief Calculates the cost of reaching to_node + * @param to Neighbor node to calculate costs before being expanded + * @param cost_params Cost function parameters + * @param from_node Current node ID being explored + * @param target_node Target node ID to route to + */ void evaluate_timing_driven_node_costs( RTExploredNode* to, const t_conn_cost_params& cost_params, RRNodeId from_node, RRNodeId target_node); - // Find paths from current heap to all nodes in the RR graph - vtr::vector timing_driven_find_all_shortest_paths_from_heap( - const t_conn_cost_params& cost_params, - const t_bb& bounding_box); - - //Adds the route tree rooted at rt_node to the heap, preparing it to be - //used as branch-points for further routing. 
- void add_route_tree_to_heap(const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb); - - // Evaluate node costs using the RCV algorith + /** + * @brief Evaluate node costs using the RCV algorithm + * @param cost_params Cost function parameters + * @param to_node Neighbor node to calculate costs before being expanded + * @param target_node Target node ID to route to + * @param backwards_delay "Known" delay up to and including to_node + * @param backwards_cong "Known" congestion up to and including to_node + * @param R_upstream Upstream resistance to ground from to_node + * @return Node cost using RCV + */ float compute_node_cost_using_rcv(const t_conn_cost_params cost_params, RRNodeId to_node, RRNodeId target_node, @@ -267,16 +274,27 @@ class ConnectionRouter : public ConnectionRouterInterface { float backwards_cong, float R_upstream); - //Unconditionally adds rt_node to the heap - // - //Note that if you want to respect rt_node->re_expand that is the caller's - //responsibility. - void add_route_tree_node_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb); - + /** + * @brief Adds the route tree rooted at rt_node to the heap, preparing + * it to be used as branch-points for further routing + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_to_heap(const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb); + /** + * @brief For high fanout nets, adds only route tree nodes which are + * spatially close to the sink + * @param rt_root RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param spatial_route_tree_lookup Route tree spatial lookup + * @param net_bounding_box Do not push to heap if not in bounding box + */ t_bb add_high_fanout_route_tree_to_heap( const RouteTreeNode& rt_root, RRNodeId target_node, @@ -284,47 +302,59 @@ class ConnectionRouter : public ConnectionRouterInterface { const SpatialRouteTreeLookup& spatial_route_tree_lookup, const t_bb& net_bounding_box); + /** Device grid */ const DeviceGrid& grid_; + + /** Router lookahead */ const RouterLookahead& router_lookahead_; + + /** RR node data */ const t_rr_graph_view rr_nodes_; + + /** RR graph */ const RRGraphView* rr_graph_; + + /** RR node resistance/capacitance data */ vtr::array_view rr_rc_data_; + + /** RR switch data */ vtr::array_view rr_switch_inf_; + + //@{ + /** Net terminal groups */ const vtr::vector>>& net_terminal_groups; const vtr::vector>& net_terminal_group_num; + //@} + + /** RR node extra information needed during routing */ vtr::vector& rr_node_route_inf_; + + /** Is flat router enabled or not? 
*/ bool is_flat_; - std::vector modified_rr_node_inf_; + + /** Router statistics (e.g., heap push/pop counts) */ RouterStats* router_stats_; + + /** Parameters to guide the routing of the given connection */ const ConnectionParameters* conn_params_; + + /** Templated heap instance (e.g., binary heap, 4-ary heap, MultiQueue-based parallel heap) */ HeapImplementation heap_; - bool router_debug_; - bool only_opin_inter_layer; + /** Router debug option */ + bool router_debug_; - // Cumulative time spent in the path search part of the connection router. + /** Cumulative time spent in the path search part of the connection router */ std::chrono::microseconds path_search_cumulative_time; - // The path manager for RCV, keeps track of the route tree as a set, also - // manages the allocation of `rcv_path_data`. + //@{ + /** The path manager for RCV, keeps track of the route tree as a set, also + * manages the allocation of `rcv_path_data`. */ PathManager rcv_path_manager; vtr::vector rcv_path_data; + //@} }; -/** Construct a connection router that uses the specified heap type. - * This function is not used, but removing it will result in "undefined reference" - * errors since heap type specializations won't get emitted from connection_router.cpp - * without it. - * The alternative is moving all ConnectionRouter fn implementations into the header. */ -std::unique_ptr make_connection_router( - e_heap_type heap_type, - const DeviceGrid& grid, - const RouterLookahead& router_lookahead, - const t_rr_graph_storage& rr_nodes, - const RRGraphView* rr_graph, - const std::vector& rr_rc_data, - const vtr::vector& rr_switch_inf, - vtr::vector& rr_node_route_inf, - bool is_flat); +#include "connection_router.tpp" #endif /* _CONNECTION_ROUTER_H */ diff --git a/vpr/src/route/connection_router.tpp b/vpr/src/route/connection_router.tpp new file mode 100644 index 00000000000..e47fa0abceb --- /dev/null +++ b/vpr/src/route/connection_router.tpp @@ -0,0 +1,545 @@ +#pragma once + +#include "connection_router.h" + +#include +#include "rr_graph.h" +#include "rr_graph_fwd.h" + +/** Used for the flat router. The node isn't relevant to the target if + * it is an intra-block node outside of our target block */ +inline bool relevant_node_to_target(const RRGraphView* rr_graph, + RRNodeId node_to_add, + RRNodeId target_node); + +template +std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + router_stats_ = &router_stats; + conn_params_ = &conn_params; + + bool retry = false; + retry = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, bounding_box); + + if (!std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + // Only the `index`, `prev_edge`, and `rcv_path_backward_delay` fields of `out` + // are used after this function returns. 
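The header's @note points at update_traceback for the full tracing logic; a minimal sketch of the idea is shown below. It assumes the RR graph (or rr_nodes_) exposes an edge-to-source-node lookup, called edge_src_node() here purely for illustration.

    // Sketch only: trace the routed path back from the sink via prev_edge.
    std::vector<RRNodeId> path;
    RRNodeId inode = sink_node;
    while (true) {
        path.push_back(inode);
        RREdgeId prev_edge = rr_node_route_inf_[inode].prev_edge;
        if (prev_edge == RREdgeId::INVALID())
            break; // reached the existing route tree / source
        inode = rr_graph_->edge_src_node(prev_edge); // assumed edge-to-source-node accessor
    }
    // `path` now lists the nodes from the sink back to the branch point on the route tree.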
+ RTExploredNode out; + out.index = sink_node; + out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; + if (rcv_path_manager.is_enabled()) { + out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; + rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); + rcv_path_manager.empty_heap(); + } + heap_.empty_heap(); + return std::make_tuple(true, /*retry=*/false, out); + } else { + reset_path_costs(); + clear_modified_rr_node_info(); + heap_.empty_heap(); + rcv_path_manager.empty_heap(); + return std::make_tuple(false, retry, RTExploredNode()); + } +} + +template +std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree_high_fanout( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bounding_box, + const SpatialRouteTreeLookup& spatial_rt_lookup, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + router_stats_ = &router_stats; + conn_params_ = &conn_params; + + // re-explore route tree from root to add any new nodes (buildheap afterwards) + // route tree needs to be repushed onto the heap since each node's cost is target specific + t_bb high_fanout_bb = add_high_fanout_route_tree_to_heap(rt_root, sink_node, cost_params, spatial_rt_lookup, net_bounding_box); + heap_.build_heap(); + + RRNodeId source_node = rt_root.inode; + + if (heap_.is_empty_heap()) { + VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return std::make_tuple(false, false, RTExploredNode()); + } + + VTR_LOGV_DEBUG(router_debug_, " Routing to %d as high fanout net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, + high_fanout_bb.layer_min, high_fanout_bb.xmin, high_fanout_bb.ymin, + high_fanout_bb.layer_max, high_fanout_bb.xmax, high_fanout_bb.ymax); + + bool retry_with_full_bb = false; + timing_driven_route_connection_from_heap(sink_node, cost_params, high_fanout_bb); + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + //Found no path, that may be due to an unlucky choice of existing route tree sub-set, + //try again with the full route tree to be sure this is not an artifact of high-fanout routing + VTR_LOG_WARN("No routing path found in high-fanout mode for net %zu connection (to sink_rr %d), retrying with full route tree\n", size_t(conn_params.net_id_), sink_node); + + //Reset any previously recorded node costs so timing_driven_route_connection() + //starts over from scratch. 
+ reset_path_costs(); + clear_modified_rr_node_info(); + + retry_with_full_bb = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, net_bounding_box); + } + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + + heap_.empty_heap(); + rcv_path_manager.empty_heap(); + return std::make_tuple(false, retry_with_full_bb, RTExploredNode()); + } + + RTExploredNode out; + out.index = sink_node; + out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; + if (rcv_path_manager.is_enabled()) { + out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; + rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); + rcv_path_manager.empty_heap(); + } + heap_.empty_heap(); + + return std::make_tuple(true, retry_with_full_bb, out); +} + +template +bool ConnectionRouter::timing_driven_route_connection_common_setup( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + //Re-add route nodes from the existing route tree to the heap. + //They need to be repushed onto the heap since each node's cost is target specific. + + add_route_tree_to_heap(rt_root, sink_node, cost_params, bounding_box); + heap_.build_heap(); // via sifting down everything + + RRNodeId source_node = rt_root.inode; + + if (heap_.is_empty_heap()) { + VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return false; + } + + VTR_LOGV_DEBUG(router_debug_, " Routing to %d as normal net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, + bounding_box.layer_min, bounding_box.xmin, bounding_box.ymin, + bounding_box.layer_max, bounding_box.xmax, bounding_box.ymax); + + timing_driven_route_connection_from_heap(sink_node, cost_params, bounding_box); + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + // No path found within the current bounding box. 
+ // + // If the bounding box is already max size, just fail + if (bounding_box.xmin == 0 + && bounding_box.ymin == 0 + && bounding_box.xmax == (int)(grid_.width() - 1) + && bounding_box.ymax == (int)(grid_.height() - 1) + && bounding_box.layer_min == 0 + && bounding_box.layer_max == (int)(grid_.get_num_layers() - 1)) { + VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return false; + } + + // Otherwise, leave unrouted and bubble up a signal to retry this net with a full-device bounding box + VTR_LOG_WARN("No routing path for connection to sink_rr %d, leaving unrouted to retry later\n", sink_node); + return true; + } + + return false; +} + +template +void ConnectionRouter::timing_driven_route_connection_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + VTR_ASSERT_SAFE(heap_.is_valid()); + + if (heap_.is_empty_heap()) { //No source + VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); + } + + // Get bounding box for sink node used in timing_driven_expand_neighbour + VTR_ASSERT_SAFE(sink_node != RRNodeId::INVALID()); + + t_bb target_bb; + if (rr_graph_->node_type(sink_node) == SINK) { // We need to get a bounding box for the sink's entire tile + vtr::Rect tile_bb = grid_.get_tile_bb({rr_graph_->node_xlow(sink_node), + rr_graph_->node_ylow(sink_node), + rr_graph_->node_layer(sink_node)}); + + target_bb.xmin = tile_bb.xmin(); + target_bb.ymin = tile_bb.ymin(); + target_bb.xmax = tile_bb.xmax(); + target_bb.ymax = tile_bb.ymax(); + } else { + target_bb.xmin = rr_graph_->node_xlow(sink_node); + target_bb.ymin = rr_graph_->node_ylow(sink_node); + target_bb.xmax = rr_graph_->node_xhigh(sink_node); + target_bb.ymax = rr_graph_->node_yhigh(sink_node); + } + + target_bb.layer_min = rr_graph_->node_layer(RRNodeId(sink_node)); + target_bb.layer_max = rr_graph_->node_layer(RRNodeId(sink_node)); + + // Start measuring path search time + std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); + + timing_driven_find_single_shortest_path_from_heap(sink_node, cost_params, bounding_box, target_bb); + + // Stop measuring path search time + std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); + path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); +} + +#ifdef VTR_ASSERT_SAFE_ENABLED + +//Returns true if both nodes are part of the same non-configurable edge set +inline bool same_non_config_node_set(RRNodeId from_node, RRNodeId to_node) { + auto& device_ctx = g_vpr_ctx.device(); + + auto from_itr = device_ctx.rr_node_to_non_config_node_set.find(from_node); + auto to_itr = device_ctx.rr_node_to_non_config_node_set.find(to_node); + + if (from_itr == device_ctx.rr_node_to_non_config_node_set.end() + || to_itr == device_ctx.rr_node_to_non_config_node_set.end()) { + return false; //Not part of a non-config node set + } + + return from_itr->second == to_itr->second; //Check for same non-config set IDs +} + +#endif + +template +float ConnectionRouter::compute_node_cost_using_rcv(const t_conn_cost_params cost_params, + RRNodeId to_node, + RRNodeId target_node, + float backwards_delay, + float backwards_cong, + float R_upstream) { + float expected_delay; + float expected_cong; + + const t_conn_delay_budget* delay_budget = cost_params.delay_budget; + // TODO: This function is not tested for is_flat == true + VTR_ASSERT(is_flat_ != true); + std::tie(expected_delay, expected_cong) = 
router_lookahead_.get_expected_delay_and_cong(to_node, target_node, cost_params, R_upstream); + + float expected_total_delay_cost; + float expected_total_cong_cost; + + float expected_total_cong = expected_cong + backwards_cong; + float expected_total_delay = expected_delay + backwards_delay; + + //If budgets specified calculate cost as described by RCV paper: + // R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + // Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + // Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008. + + // Normalization constant defined in RCV paper cited above + constexpr float NORMALIZATION_CONSTANT = 100e-12; + + expected_total_delay_cost = expected_total_delay; + expected_total_delay_cost += (delay_budget->short_path_criticality + cost_params.criticality) * std::max(0.f, delay_budget->target_delay - expected_total_delay); + // expected_total_delay_cost += std::pow(std::max(0.f, expected_total_delay - delay_budget->max_delay), 2) / NORMALIZATION_CONSTANT; + expected_total_delay_cost += std::pow(std::max(0.f, delay_budget->min_delay - expected_total_delay), 2) / NORMALIZATION_CONSTANT; + expected_total_cong_cost = expected_total_cong; + + float total_cost = expected_total_delay_cost + expected_total_cong_cost; + + return total_cost; +} + +template +void ConnectionRouter::evaluate_timing_driven_node_costs(RTExploredNode* to, + const t_conn_cost_params& cost_params, + RRNodeId from_node, + RRNodeId target_node) { + /* new_costs.backward_cost: is the "known" part of the cost to this node -- the + * congestion cost of all the routing resources back to the existing route + * plus the known delay of the total path back to the source. + * + * new_costs.total_cost: is this "known" backward cost + an expected cost to get to the target. + * + * new_costs.R_upstream: is the upstream resistance at the end of this node + */ + + //Info for the switch connecting from_node to_node (i.e., to->index) + int iswitch = rr_nodes_.edge_switch(to->prev_edge); + bool switch_buffered = rr_switch_inf_[iswitch].buffered(); + bool reached_configurably = rr_switch_inf_[iswitch].configurable(); + float switch_R = rr_switch_inf_[iswitch].R; + float switch_Tdel = rr_switch_inf_[iswitch].Tdel; + float switch_Cinternal = rr_switch_inf_[iswitch].Cinternal; + + //To node info + auto rc_index = rr_graph_->node_rc_index(to->index); + float node_C = rr_rc_data_[rc_index].C; + float node_R = rr_rc_data_[rc_index].R; + + //From node info + float from_node_R = rr_rc_data_[rr_graph_->node_rc_index(from_node)].R; + + //Update R_upstream + if (switch_buffered) { + to->R_upstream = 0.; //No upstream resistance + } else { + //R_Upstream already initialized + } + + to->R_upstream += switch_R; //Switch resistance + to->R_upstream += node_R; //Node resistance + + //Calculate delay + float Rdel = to->R_upstream - 0.5 * node_R; //Only consider half node's resistance for delay + float Tdel = switch_Tdel + Rdel * node_C; + + //Depending on the switch used, the Tdel of the upstream node (from_node) may change due to + //increased loading from the switch's internal capacitance. + // + //Even though this delay physically affects from_node, we make the adjustment (now) on the to_node, + //since only once we've reached to to_node do we know the connection used (and the switch enabled). 
+ // + //To adjust for the time delay, we compute the product of the Rdel associated with from_node and + //the internal capacitance of the switch. + // + //First, we will calculate Rdel_adjust (just like in the computation for Rdel, we consider only + //half of from_node's resistance). + float Rdel_adjust = to->R_upstream - 0.5 * from_node_R; + + //Second, we adjust the Tdel to account for the delay caused by the internal capacitance. + Tdel += Rdel_adjust * switch_Cinternal; + + float cong_cost = 0.; + if (reached_configurably) { + cong_cost = get_rr_cong_cost(to->index, cost_params.pres_fac); + } else { + //Reached by a non-configurable edge. + //Therefore the from_node and to_node are part of the same non-configurable node set. +#ifdef VTR_ASSERT_SAFE_ENABLED + VTR_ASSERT_SAFE_MSG(same_non_config_node_set(from_node, to->index), + "Non-configurably connected edges should be part of the same node set"); +#endif + + //The congestion cost of all nodes in the set has already been accounted for (when + //the current path first expanded a node in the set). Therefore do *not* re-add the congestion + //cost. + cong_cost = 0.; + } + if (conn_params_->router_opt_choke_points_ && is_flat_ && rr_graph_->node_type(to->index) == IPIN) { + auto find_res = conn_params_->connection_choking_spots_.find(to->index); + if (find_res != conn_params_->connection_choking_spots_.end()) { + cong_cost = cong_cost / pow(2, (float)find_res->second); + } + } + + //Update the backward cost (upstream already included) + to->backward_path_cost += (1. - cost_params.criticality) * cong_cost; //Congestion cost + to->backward_path_cost += cost_params.criticality * Tdel; //Delay cost + + if (cost_params.bend_cost != 0.) { + t_rr_type from_type = rr_graph_->node_type(from_node); + t_rr_type to_type = rr_graph_->node_type(to->index); + if ((from_type == CHANX && to_type == CHANY) || (from_type == CHANY && to_type == CHANX)) { + to->backward_path_cost += cost_params.bend_cost; //Bend cost + } + } + + float total_cost = 0.; + + if (rcv_path_manager.is_enabled() && to->path_data != nullptr) { + to->path_data->backward_delay += cost_params.criticality * Tdel; + to->path_data->backward_cong += (1. 
- cost_params.criticality) * get_rr_cong_cost(to->index, cost_params.pres_fac); + + total_cost = compute_node_cost_using_rcv(cost_params, to->index, target_node, to->path_data->backward_delay, to->path_data->backward_cong, to->R_upstream); + } else { + const auto& device_ctx = g_vpr_ctx.device(); + //Update total cost + float expected_cost = router_lookahead_.get_expected_cost(to->index, target_node, cost_params, to->R_upstream); + VTR_LOGV_DEBUG(router_debug_ && !std::isfinite(expected_cost), + " Lookahead from %s (%s) to %s (%s) is non-finite, expected_cost = %f, to->R_upstream = %f\n", + rr_node_arch_name(to->index, is_flat_).c_str(), + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to->index, is_flat_).c_str(), + rr_node_arch_name(target_node, is_flat_).c_str(), + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, target_node, is_flat_).c_str(), + expected_cost, to->R_upstream); + total_cost += to->backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + } + to->total_cost = total_cost; +} + +template +void ConnectionRouter::add_route_tree_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + /* Puts the entire partial routing below and including rt_node onto the heap * + * (except for those parts marked as not to be expanded) by calling itself * + * recursively. */ + + /* Pre-order depth-first traversal */ + // IPINs and SINKS are not re_expanded + if (rt_node.re_expand) { + add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bb); + } + + for (const RouteTreeNode& child_node : rt_node.child_nodes()) { + if (is_flat_) { + if (relevant_node_to_target(rr_graph_, child_node.inode, target_node)) { + add_route_tree_to_heap(child_node, target_node, cost_params, net_bb); + } + } else { + add_route_tree_to_heap(child_node, target_node, cost_params, net_bb); + } + } +} + +/* Expand bb by inode's extents and clip against net_bb */ +inline void expand_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb, RRNodeId inode, const RRGraphView* rr_graph) { + bb.xmin = std::max(net_bb.xmin, std::min(bb.xmin, rr_graph->node_xlow(inode))); + bb.ymin = std::max(net_bb.ymin, std::min(bb.ymin, rr_graph->node_ylow(inode))); + bb.xmax = std::min(net_bb.xmax, std::max(bb.xmax, rr_graph->node_xhigh(inode))); + bb.ymax = std::min(net_bb.ymax, std::max(bb.ymax, rr_graph->node_yhigh(inode))); + bb.layer_min = std::min(bb.layer_min, rr_graph->node_layer(inode)); + bb.layer_max = std::max(bb.layer_max, rr_graph->node_layer(inode)); +} + +/* Expand bb by HIGH_FANOUT_BB_FAC and clip against net_bb */ +inline void adjust_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb) { + constexpr int HIGH_FANOUT_BB_FAC = 3; + + bb.xmin = std::max(net_bb.xmin, bb.xmin - HIGH_FANOUT_BB_FAC); + bb.ymin = std::max(net_bb.ymin, bb.ymin - HIGH_FANOUT_BB_FAC); + bb.xmax = std::min(net_bb.xmax, bb.xmax + HIGH_FANOUT_BB_FAC); + bb.ymax = std::min(net_bb.ymax, bb.ymax + HIGH_FANOUT_BB_FAC); + bb.layer_min = std::min(net_bb.layer_min, bb.layer_min); + bb.layer_max = std::max(net_bb.layer_max, bb.layer_max); +} + +template +t_bb ConnectionRouter::add_high_fanout_route_tree_to_heap( + const RouteTreeNode& rt_root, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const SpatialRouteTreeLookup& spatial_rt_lookup, + const t_bb& net_bounding_box) { + //For high fanout nets we only add those route tree nodes which are 
spatially close + //to the sink. + // + //Based on: + // J. Swartz, V. Betz, J. Rose, "A Fast Routability-Driven Router for FPGAs", FPGA, 1998 + // + //We rely on a grid-based spatial look-up which is maintained for high fanout nets by + //update_route_tree(), which allows us to add spatially close route tree nodes without traversing + //the entire route tree (which is likely large for a high fanout net). + + //Determine which bin the target node is located in + + int target_bin_x = grid_to_bin_x(rr_graph_->node_xlow(target_node), spatial_rt_lookup); + int target_bin_y = grid_to_bin_y(rr_graph_->node_ylow(target_node), spatial_rt_lookup); + + auto target_layer = rr_graph_->node_layer(target_node); + + int chan_nodes_added = 0; + + t_bb highfanout_bb; + highfanout_bb.xmin = rr_graph_->node_xlow(target_node); + highfanout_bb.xmax = rr_graph_->node_xhigh(target_node); + highfanout_bb.ymin = rr_graph_->node_ylow(target_node); + highfanout_bb.ymax = rr_graph_->node_yhigh(target_node); + highfanout_bb.layer_min = target_layer; + highfanout_bb.layer_max = target_layer; + + //Add existing routing starting from the target bin. + //If the target's bin has insufficient existing routing add from the surrounding bins + constexpr int SINGLE_BIN_MIN_NODES = 2; + bool done = false; + bool found_node_on_same_layer = false; + for (int dx : {0, -1, +1}) { + size_t bin_x = target_bin_x + dx; + + if (bin_x > spatial_rt_lookup.dim_size(0) - 1) continue; //Out of range + + for (int dy : {0, -1, +1}) { + size_t bin_y = target_bin_y + dy; + + if (bin_y > spatial_rt_lookup.dim_size(1) - 1) continue; //Out of range + + for (const RouteTreeNode& rt_node : spatial_rt_lookup[bin_x][bin_y]) { + if (!rt_node.re_expand) // Some nodes (like IPINs) shouldn't be re-expanded + continue; + RRNodeId rr_node_to_add = rt_node.inode; + + /* Flat router: don't go into clusters other than the target one */ + if (is_flat_) { + if (!relevant_node_to_target(rr_graph_, rr_node_to_add, target_node)) + continue; + } + + /* In case of the parallel router, we may be dealing with a virtual net + * so prune the nodes from the HF lookup against the bounding box just in case */ + if (!inside_bb(rr_node_to_add, net_bounding_box)) + continue; + + auto rt_node_layer_num = rr_graph_->node_layer(rr_node_to_add); + if (rt_node_layer_num == target_layer) + found_node_on_same_layer = true; + + // Put the node onto the heap + add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bounding_box); + + // Expand HF BB to include the node (clip by original BB) + expand_highfanout_bounding_box(highfanout_bb, net_bounding_box, rr_node_to_add, rr_graph_); + + if (rr_graph_->node_type(rr_node_to_add) == CHANY || rr_graph_->node_type(rr_node_to_add) == CHANX) { + chan_nodes_added++; + } + } + + if (dx == 0 && dy == 0 && chan_nodes_added > SINGLE_BIN_MIN_NODES && found_node_on_same_layer) { + //Target bin contained at least minimum amount of routing + // + //We require at least SINGLE_BIN_MIN_NODES to be added. + //This helps ensure we don't end up with, for example, a single + //routing wire running in the wrong direction which may not be + //able to reach the target within the bounding box. 
+ done = true; + break; + } + } + if (done) break; + } + /* If we didn't find enough nodes to branch off near the target + * or they are on the wrong grid layer, just add the full route tree */ + if (chan_nodes_added <= SINGLE_BIN_MIN_NODES || !found_node_on_same_layer) { + add_route_tree_to_heap(rt_root, target_node, cost_params, net_bounding_box); + return net_bounding_box; + } else { + //We found nearby routing, replace original bounding box to be localized around that routing + adjust_highfanout_bounding_box(highfanout_bb, net_bounding_box); + return highfanout_bb; + } +} + +/** Used for the flat router. The node isn't relevant to the target if + * it is an intra-block node outside of our target block */ +inline bool relevant_node_to_target(const RRGraphView* rr_graph, + RRNodeId node_to_add, + RRNodeId target_node) { + VTR_ASSERT_SAFE(rr_graph->node_type(target_node) == t_rr_type::SINK); + auto node_to_add_type = rr_graph->node_type(node_to_add); + return node_to_add_type != t_rr_type::IPIN || node_in_same_physical_tile(node_to_add, target_node); +} diff --git a/vpr/src/route/connection_router_interface.h b/vpr/src/route/connection_router_interface.h index 96ef278833a..178768bf5d5 100644 --- a/vpr/src/route/connection_router_interface.h +++ b/vpr/src/route/connection_router_interface.h @@ -24,6 +24,8 @@ struct t_conn_cost_params { float criticality = 1.; float astar_fac = 1.2; float astar_offset = 0.f; + float post_target_prune_fac = 1.2f; + float post_target_prune_offset = 0.f; float bend_cost = 1.; float pres_fac = 1.; const t_conn_delay_budget* delay_budget = nullptr; diff --git a/vpr/src/route/d_ary_heap.h b/vpr/src/route/d_ary_heap.h index c52cd702d13..ed10b0157bd 100644 --- a/vpr/src/route/d_ary_heap.h +++ b/vpr/src/route/d_ary_heap.h @@ -21,6 +21,8 @@ template class DAryHeap : public HeapInterface { public: + static constexpr unsigned arg_D = D; + using priority_queue = customized_d_ary_priority_queue, HeapNodeComparator>; DAryHeap() {} diff --git a/vpr/src/route/multi_queue_d_ary_heap.h b/vpr/src/route/multi_queue_d_ary_heap.h new file mode 100644 index 00000000000..5a49dadae50 --- /dev/null +++ b/vpr/src/route/multi_queue_d_ary_heap.h @@ -0,0 +1,133 @@ +/******************************************************************** + * MultiQueue Implementation + * + * Originally authored by Guozheng Zhang, Gilead Posluns, and Mark C. Jeffrey + * Published at the 36th ACM Symposium on Parallelism in Algorithms and + * Architectures (SPAA), June 2024 + * + * Original source: https://github.com/mcj-group/cps + * + * This implementation and interface has been modified from the original to: + * - Support queue draining functionality + * - Enable integration with the VTR project + * + * The MultiQueue data structure provides an efficient concurrent priority + * queue implementation designed for parallel processing applications. + * + * Modified: February 2025 + ********************************************************************/ + +#ifndef _MULTI_QUEUE_D_ARY_HEAP_H +#define _MULTI_QUEUE_D_ARY_HEAP_H + +#include "device_grid.h" +#include "heap_type.h" +#include "multi_queue_d_ary_heap.tpp" +#include +#include + +// FIXME: Use unified heap node struct (HeapNodeId) and comparator (HeapNodeComparator) +// defined in heap_type.h. Currently, the MQ_IO is not compatible with them. Need a lot +// of refactoring in MQ_IO to make it work, which is left for another PR to clean it up. 
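+
+// Illustrative usage of the MultiQueueDAryHeap defined below (a sketch, not part
+// of the build; template arguments are omitted and the thread/queue counts are
+// arbitrary):
+//
+//     MultiQueueDAryHeap heap(4 /*num_threads*/, 8 /*num_queues*/);
+//     HeapNode src;
+//     src.prio = 0.f;
+//     src.node = RRNodeId(0);
+//     heap.add_to_heap(src);          // any thread may push
+//     HeapNode popped;
+//     while (heap.try_pop(popped)) {  // threads pop concurrently until all queues drain
+//         // ... expand popped.node and push its neighbours ...
+//     }
+//     heap.reset();                   // prepare for the next connection
+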
+using MQHeapNode = std::tuple; + +struct MQHeapNodeTupleComparator /* FIXME: Use HeapNodeComparator */ { + bool operator()(const MQHeapNode& u, const MQHeapNode& v) { + return std::get<0>(u) > std::get<0>(v); + } +}; + +template +class MultiQueueDAryHeap { + public: + using MQ_IO = MultiQueueIO; + + MultiQueueDAryHeap() { + set_num_threads_and_queues(1, 2); // Serial (#threads=1, #queues=2) by default + } + + MultiQueueDAryHeap(size_t num_threads, size_t num_queues) { + set_num_threads_and_queues(num_threads, num_queues); + } + + ~MultiQueueDAryHeap() {} + + void set_num_threads_and_queues(size_t num_threads, size_t num_queues) { + pq_.reset(); + // Note: BE AWARE that in MQ_IO interface, `num_queues` comes first, then `num_threads`! + pq_ = std::make_unique(num_queues, num_threads, 0 /*Dont care (batch size for only popBatch)*/); + } + + void init_heap(const DeviceGrid& grid) { + (void)grid; + // TODO: Reserve storage for MQ_IO + // Note: This function could be called before setting num_threads/num_queues + } + + bool try_pop(HeapNode& heap_node) { + auto tmp = pq_->tryPop(); + if (!tmp.has_value()) { + return false; + } else { + uint32_t node_id; + std::tie(heap_node.prio, node_id) = tmp.value(); // FIXME: eliminate type cast by modifying MQ_IO + heap_node.node = RRNodeId(node_id); + return true; + } + } + + void add_to_heap(const HeapNode& heap_node) { + HeapNodePriority prio = heap_node.prio; + uint32_t node = size_t(heap_node.node); + pq_->push({prio, node}); + } + + void push_back(const HeapNode& heap_node) { + HeapNodePriority prio = heap_node.prio; + uint32_t node = size_t(heap_node.node); + pq_->push({prio, node}); // FIXME: add to heap without maintaining the heap property + } + + void build_heap() { + // FIXME: restore the heap property after pushing back nodes + } + + bool is_valid() const { + return true; // FIXME: checking if the heap property is maintained or not + } + + void empty_heap() { + pq_->reset(); // TODO: check if adding clear function for MQ_IO is necessary + } + + bool is_empty_heap() const { + return (bool)(pq_->empty()); + } + + uint64_t get_num_pushes() const { + return pq_->getNumPushes(); + } + + uint64_t get_num_pops() const { + return pq_->getNumPops(); + } + + uint64_t get_heap_occupancy() const { + return pq_->getQueueOccupancy(); + } + + void reset() { + pq_->reset(); + } + +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + void set_min_priority_for_pop(const HeapNodePriority& minPrio) { + pq_->setMinPrioForPop(minPrio); + } +#endif + + private: + std::unique_ptr pq_; +}; + +#endif diff --git a/vpr/src/route/multi_queue_d_ary_heap.tpp b/vpr/src/route/multi_queue_d_ary_heap.tpp new file mode 100644 index 00000000000..e7ed202a7e4 --- /dev/null +++ b/vpr/src/route/multi_queue_d_ary_heap.tpp @@ -0,0 +1,436 @@ +/******************************************************************** + * MultiQueue Implementation + * + * Originally authored by Guozheng Zhang, Gilead Posluns, and Mark C. Jeffrey + * Published at the 36th ACM Symposium on Parallelism in Algorithms and + * Architectures (SPAA), June 2024 + * + * Original source: https://github.com/mcj-group/cps + * + * This implementation and interface has been modified from the original to: + * - Support queue draining functionality + * - Enable integration with the VTR project + * + * The MultiQueue data structure provides an efficient concurrent priority + * queue implementation designed for parallel processing applications. 
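+ *
+ * In rough terms, the scheme implemented below works as follows (ignoring
+ * locking, statistics and termination detection):
+ *   - push(item): pick one of the internal d-ary queues at random and push
+ *     the item into it;
+ *   - pop(): pick two internal queues at random and pop from the one whose
+ *     current best (smallest) top priority is better.
+ * Randomizing the queue choice keeps contention low, at the cost of a slightly
+ * relaxed global priority order.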
+ * + * Modified: February 2025 + ********************************************************************/ + +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include +#include "d_ary_heap.tpp" + +#define CACHELINE 64 + +// #define PERF 1 +#define MQ_IO_ENABLE_CLEAR_FOR_POP + +template< + unsigned D, + typename PQElement, + typename Comparator, + typename PrioType> +class MultiQueueIO { + using PQ = customized_d_ary_priority_queue, Comparator>; + Comparator compare; + + // Special value used to signify that there is no 'min' element in a PQ + // container. The user should ensure that they do not use this priority + // while using the MQ. + static constexpr PrioType EMPTY_PRIO = std::numeric_limits::max(); + + struct PQContainer { + uint64_t pushes = 0; + uint64_t pops = 0; + PQ pq; + std::atomic_flag queueLock = ATOMIC_FLAG_INIT; + std::atomic min{EMPTY_PRIO}; + + void lock() { + while (queueLock.test_and_set(std::memory_order_acquire)) + ; + } + bool try_lock() { return queueLock.test_and_set(std::memory_order_acquire); } + void unlock() { queueLock.clear(std::memory_order_release); } + + } __attribute__((aligned(CACHELINE))); + + std::vector< + PQContainer + // FIXME: Disabled this due to VTR not using Boost. There is a C++ way + // of doing this, but it requires making an aligned allocator + // class. May be a good idea to add to VTR util in the future. + // Should profile for performance first; may not be worth it. + // , boost::alignment::aligned_allocator + > + queues; + uint64_t NUM_QUEUES; + + // Termination: + // - numIdle records the number of threads that believe + // there are no more work to do. + // -numEmpty records number of queues that are empty + uint64_t threadNum; + std::atomic numIdle{0}; + std::atomic numEmpty; +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + std::atomic minPrioForPop{std::numeric_limits::max()}; +#endif + + uint64_t batchSize; + + public: + MultiQueueIO(uint64_t numQueues, uint64_t numThreads, uint64_t batch) + : queues(numQueues) + , NUM_QUEUES(numQueues) + , threadNum(numThreads) + , numEmpty(numQueues) + , batchSize(batch) {} + +#ifdef PERF + uint64_t __attribute__((noinline)) ThreadLocalRandom() { +#else + uint64_t ThreadLocalRandom() { +#endif + // static thread_local std::mt19937_64 generator; + // std::uniform_real_distribution<> distribution(min,max); + // return distribution(generator); + static uint64_t modMask = NUM_QUEUES - 1; + static thread_local uint64_t x = pthread_self(); + uint64_t z = (x += UINT64_C(0x9E3779B97F4A7C15)); + z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9); + z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB); + return (z ^ (z >> 31)) & modMask; + } + +#ifdef PERF + void __attribute__((noinline)) pushInt(uint64_t queue, PQElement item) { + queues[queue].pq.push(item); + } +#endif + +#ifdef PERF + void __attribute__((noinline)) push(PQElement item) { +#else + inline void push(PQElement item) { +#endif + uint64_t queue; + while (true) { + queue = ThreadLocalRandom(); + if (!queues[queue].try_lock()) break; + } + auto& q = queues[queue]; + q.pushes++; + if (q.pq.empty()) + numEmpty.fetch_sub(1, std::memory_order_relaxed); +#ifdef PERF + pushInt(queue, item); +#else + q.pq.push(item); +#endif + q.min.store( + q.pq.size() > 0 + ? 
std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + } + +#ifdef PERF + void __attribute__((noinline)) pushBatch(uint64_t size, PQElement* items) { +#else + inline void pushBatch(uint64_t size, PQElement* items) { +#endif + uint64_t queue; + while (true) { + queue = ThreadLocalRandom(); + if (!queues[queue].try_lock()) break; + } + auto& q = queues[queue]; + q.pushes += size; + if (q.pq.empty()) + numEmpty.fetch_sub(1, std::memory_order_relaxed); + for (uint64_t i = 0; i < size; i++) { +#ifdef PERF + pushInt(queue, items[i]); +#else + q.pq.push(items[i]); +#endif + } + q.min.store( + q.pq.size() > 0 + ? std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + } + + // Simplified Termination detection idea from the 2021 MultiQueue paper: + // Repeatedly try popping and stop when numIdle >= threadNum, + // That is, stop when all threads agree that there are no more work +#ifdef PERF + boost::optional __attribute__((noinline)) tryPop() { +#else + inline std::optional tryPop() { +#endif + auto item = pop(); + if (item) return item; + + // increment count and keep on trying to pop + uint64_t num = numIdle.fetch_add(1, std::memory_order_relaxed) + 1; + do { + item = pop(); + if (item) break; + if (num >= threadNum) return {}; + + num = numIdle.load(std::memory_order_relaxed); + + } while (true); + + numIdle.fetch_sub(1, std::memory_order_relaxed); + return item; + } + +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + inline void setMinPrioForPop(PrioType newMinPrio) { + PrioType oldMinPrio = minPrioForPop.load(std::memory_order_relaxed); + while (compare({oldMinPrio, 0}, {newMinPrio, 0}) /* old > new */ && !minPrioForPop.compare_exchange_weak(oldMinPrio, newMinPrio)) + ; + } +#endif + +#ifdef PERF + boost::optional __attribute__((noinline)) pop() { +#else + inline std::optional pop() { +#endif + uint64_t poppingQueue = NUM_QUEUES; + while (true) { + // Pick the higher priority max of queue i and j + uint64_t i = ThreadLocalRandom(); + uint64_t j = ThreadLocalRandom(); + while (j == i) { + j = ThreadLocalRandom(); + } + + PrioType minI = queues[i].min.load(std::memory_order_acquire); + PrioType minJ = queues[j].min.load(std::memory_order_acquire); + + if (minI == EMPTY_PRIO && minJ == EMPTY_PRIO) { + uint64_t emptyQueues = numEmpty.load(std::memory_order_relaxed); + if (emptyQueues >= queues.size()) + break; + else + continue; + } + + if (minI != EMPTY_PRIO && minJ != EMPTY_PRIO) { + poppingQueue = compare({minJ, 0}, {minI, 0}) ? i : j; + } else if (minJ == EMPTY_PRIO) { + poppingQueue = i; + } else { + poppingQueue = j; + } + if (queues[poppingQueue].try_lock()) continue; + auto& q = queues[poppingQueue]; + if (!q.pq.empty()) { +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + PrioType minPrio = minPrioForPop.load(std::memory_order_acquire); + if (compare(q.pq.top(), {minPrio, 0})) { + q.pq.clear(); + // do not add `q.pops` on purpose + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store(EMPTY_PRIO, std::memory_order_release); + } else { +#endif + PQElement retItem = q.pq.top(); + q.pq.pop(); + q.pops++; + if (q.pq.empty()) + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store( + q.pq.size() > 0 + ? 
std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + return retItem; +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + } +#endif + } + q.unlock(); + } + return {}; + } + +#ifdef PERF + boost::optional __attribute__((noinline)) tryPopBatch(PQElement* ret) { +#else + inline std::optional tryPopBatch(PQElement* ret) { +#endif + auto item = popBatch(ret); + if (item) return item; + + // increment count and keep on trying to pop + uint64_t num = numIdle.fetch_add(1, std::memory_order_relaxed) + 1; + do { + item = popBatch(ret); + if (item) break; + if (num >= threadNum) return {}; + + num = numIdle.load(std::memory_order_relaxed); + + } while (true); + + numIdle.fetch_sub(1, std::memory_order_relaxed); + return item; + } + +#ifdef PERF + void __attribute__((noinline)) popInt(uint64_t queue, PQElement* ret) { + auto& q = queues[queue]; + *ret = q.pq.top(); + q.pq.pop(); + } +#endif + +#ifdef PERF + boost::optional __attribute__((noinline)) popBatch(PQElement* ret){ +#else + inline std::optional popBatch(PQElement* ret) { +#endif + uint64_t poppingQueue = NUM_QUEUES; + while (true) { + // Pick the higher priority max of queue i and j + uint64_t i = ThreadLocalRandom(); + uint64_t j = ThreadLocalRandom(); + while (j == i) { + j = ThreadLocalRandom(); + } + + PrioType minI = queues[i].min.load(std::memory_order_acquire); + PrioType minJ = queues[j].min.load(std::memory_order_acquire); + + if (minI == EMPTY_PRIO && minJ == EMPTY_PRIO) { + uint64_t emptyQueues = numEmpty.load(std::memory_order_relaxed); + if (emptyQueues >= queues.size()) + break; + else + continue; + } + + if (minI != EMPTY_PRIO && minJ != EMPTY_PRIO) { + poppingQueue = compare({minJ, 0}, {minI, 0}) ? i : j; + } else if (minJ == EMPTY_PRIO) { + poppingQueue = i; + } else { + poppingQueue = j; + } + if (queues[poppingQueue].try_lock()) continue; + auto& q = queues[poppingQueue]; + if (q.pq.empty()) { + q.unlock(); + continue; + } + + uint64_t num = 0; + for (num = 0; num < batchSize; num++) { + if (q.pq.empty()) break; +#ifdef PERF + popInt(poppingQueue, &ret[num]); +#else + ret[num] = q.pq.top(); + q.pq.pop(); +#endif + } + q.pops += num; + if (q.pq.empty()) + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store( + q.pq.size() > 0 + ? std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + if (num == 0) continue; + + return num; + } + return {}; +} + +inline uint64_t +getQueueOccupancy() const { + uint64_t maxOccupancy = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + maxOccupancy = std::max(maxOccupancy, queues[i].pq.size()); + } + return maxOccupancy; +} + +// Get the number of pushes to all queues. +// Note: this is not lock protected. +inline uint64_t getNumPushes() const { + uint64_t totalPushes = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + totalPushes += queues[i].pushes; + } + return totalPushes; +} + +// Get the number of pops to all queues. +// Note: this is not lock protected. +inline uint64_t getNumPops() const { + uint64_t totalPops = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + totalPops += queues[i].pops; + } + return totalPops; +} + +inline void stat() const { + std::cout << "total pushes " << getNumPushes() << "\n"; + std::cout << "total pops " << getNumPops() << "\n"; +} + +// Note: this is only called at the end of algorithm as a +// sanity check, therefore it is not lock protected. 
+inline bool empty() const { + for (uint i = 0; i < NUM_QUEUES; i++) { + if (!queues[i].pq.empty()) { + return false; + } + } + return true; +} + +// Resets the MultiQueue to a state as if it was reinitialized. +// This must be called before using the MQ again after using TypPop(). +// Note: this assumes the queues are already empty and unlocked. +inline void reset() { + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + assert(queues[i].pq.empty() && "reset() assumes empty queues"); + assert((queues[i].queueLock.test(std::memory_order_relaxed) == 0) + && "reset() assumes unlocked queues"); + queues[i].pushes = 0; + queues[i].pops = 0; + queues[i].min.store(EMPTY_PRIO, std::memory_order_relaxed); + } + numIdle.store(0, std::memory_order_relaxed); + numEmpty.store(NUM_QUEUES, std::memory_order_relaxed); +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + minPrioForPop.store(std::numeric_limits::max(), std::memory_order_relaxed); +#endif +} +} +; diff --git a/vpr/src/route/netlist_routers.h b/vpr/src/route/netlist_routers.h index d64477f03ad..eb8a220f51f 100644 --- a/vpr/src/route/netlist_routers.h +++ b/vpr/src/route/netlist_routers.h @@ -3,7 +3,7 @@ /** @file Interface for a netlist router. * * A NetlistRouter manages the required bits of state to complete the netlist routing process, - * which requires finding a path for every connection in the netlist using a ConnectionRouter. + * which requires finding a path for every connection in the netlist using a SerialConnectionRouter. * This needs to be an interface because there may be different netlist routing schedules, * i.e. parallel or net-decomposing routers. * @@ -19,7 +19,6 @@ #include "NetPinTimingInvalidator.h" #include "clustered_netlist_utils.h" #include "connection_based_routing_fwd.h" -#include "connection_router.h" #include "globals.h" #include "heap_type.h" #include "netlist_fwd.h" diff --git a/vpr/src/route/parallel_connection_router.cpp b/vpr/src/route/parallel_connection_router.cpp new file mode 100644 index 00000000000..59889204c23 --- /dev/null +++ b/vpr/src/route/parallel_connection_router.cpp @@ -0,0 +1,489 @@ +#include "parallel_connection_router.h" + +#include +#include "route_tree.h" +#include "rr_graph_fwd.h" + +/** Post-target pruning: Prune a given node (do not explore it) if the cost of + * the best possible path from the source, through the node, to the target is + * higher than the cost of the best path found to the target so far. Cited from + * the FPT'24 conference paper (more details can also be found there). */ +static inline bool post_target_prune_node(float new_total_cost, + float new_back_cost, + float best_back_cost_to_target, + const t_conn_cost_params& params) { + // Divide out the astar_fac, then multiply to get determinism + // This is a correction factor to the forward cost to make the total + // cost an under-estimate. + // TODO: Should investigate creating a heuristic function that is + // gaurenteed to be an under-estimate. + // NOTE: Found experimentally that using the original heuristic to order + // the nodes in the queue and then post-target pruning based on the + // under-estimating heuristic has better runtime. + float expected_cost = new_total_cost - new_back_cost; + float new_expected_cost = expected_cost; + // h1 = (h - offset) * fac + // Protection for division by zero + if (params.astar_fac > 0.001) + // To save time, does not recompute the heuristic, just divideds out + // the astar_fac. 
+ new_expected_cost /= params.astar_fac; + new_expected_cost = new_expected_cost - params.post_target_prune_offset; + // Max function to prevent the heuristic from going negative + new_expected_cost = std::max(0.f, new_expected_cost); + new_expected_cost *= params.post_target_prune_fac; + if ((new_back_cost + new_expected_cost) > best_back_cost_to_target) + return true; + // NOTE: we do NOT check for equality here. Equality does not matter for + // determinism when draining the queues (may just lead to a bit more work). + return false; +} + +/** Pre-push pruning: when iterating over the neighbors of u, this function + * determines whether a path through u to its neighbor node v has a better + * backward cost than the best path to v found so far (breaking ties if needed). + * Cited from the FPT'24 conference paper (more details can also be found there). + */ +// TODO: Once we have a heap node struct, clean this up! +static inline bool prune_node(RRNodeId inode, + float new_total_cost, + float new_back_cost, + RREdgeId new_prev_edge, + RRNodeId target_node, + vtr::vector& rr_node_route_inf_, + const t_conn_cost_params& params) { + // Post-target pruning: After the target is reached the first time, should + // use the heuristic to help drain the queues. + if (inode != target_node) { + t_rr_node_route_inf* target_route_inf = &rr_node_route_inf_[target_node]; + float best_back_cost_to_target = target_route_inf->backward_path_cost; + if (post_target_prune_node(new_total_cost, new_back_cost, best_back_cost_to_target, params)) + return true; + } + + // Backwards Pruning + // NOTE: When going to the target, we only want to prune on the truth. + // The queues handle using the heuristic to explore nodes faster. + t_rr_node_route_inf* route_inf = &rr_node_route_inf_[inode]; + float best_back_cost = route_inf->backward_path_cost; + if (new_back_cost > best_back_cost) + return true; + // In the case of a tie, need to be picky about whether to prune or not in + // order to get determinism. + // FIXME: This may not be thread safe. If the best node changes while this + // function is being called, we may have the new_back_cost and best + // prev_edge's being from different heap nodes! + // TODO: Move this to within the lock (the rest can stay for performance). + if (new_back_cost == best_back_cost) { +#ifndef NON_DETERMINISTIC_PRUNING + // With deterministic pruning, cannot always prune on ties. + // In the case of a true tie, just prune, no need to explore neightbors + RREdgeId best_prev_edge = route_inf->prev_edge; + if (new_prev_edge == best_prev_edge) + return true; + // When it comes to invalid edge IDs, in the case of a tied back cost, + // always try to keep the invalid edge ID (likely the start node). + // TODO: Verify this. + // If the best previous edge is invalid, prune + if (!best_prev_edge.is_valid()) + return true; + // If the new previous edge is invalid (assuming the best is not), accept + if (!new_prev_edge.is_valid()) + return false; + // Finally, if this node is not coming from a preferred edge, prune + // Deterministic version prefers a given EdgeID, so a unique path is returned since, + // in the case of a tie, a determinstic path wins. + // Is first preferred over second? + auto is_preferred_edge = [](RREdgeId first, RREdgeId second) { + return first < second; + }; + if (!is_preferred_edge(new_prev_edge, best_prev_edge)) + return true; +#else + std::ignore = new_prev_edge; + // When we do not care about determinism, always prune on equality. 
+ return true; +#endif + } + + // If all above passes, do not prune. + return false; +} + +/** Post-pop pruning: After node u is popped from the queue, this function + * decides whether to explore the neighbors of u or to prune. Initially, it + * performs Post-Target Pruning based on the stopping criterion. Then, the + * current total estimated cost of the path through node u (f_u) is compared + * to the best total cost so far (most recently pushed) for that node and, + * if the two are different, the node u is pruned. During the wave expansion, + * u may be pushed to the queue multiple times. For example, node u may be + * pushed to the queue and then, before u is popped from the queue, a better + * path to u may be found and pushed to the queue. Here we are using f_u as + * an optimistic identifier to check if the pair (u, f_u) is the most recently + * pushed element for node u. This reduces redundant work. + * Cited from the FPT'24 conference paper (more details can also be found there). + */ +static inline bool should_not_explore_neighbors(RRNodeId inode, + float new_total_cost, + float new_back_cost, + RRNodeId target_node, + vtr::vector& rr_node_route_inf_, + const t_conn_cost_params& params) { +#ifndef NON_DETERMINISTIC_PRUNING + // For deterministic pruning, cannot enforce anything on the total cost since + // traversal order is not gaurenteed. However, since total cost is used as a + // "key" to signify that this node is the last node that was pushed, we can + // just check for equality. There is a chance this may cause some duplicates + // for the deterministic case, but thats ok they will be handled. + // TODO: Maybe consider having the non-deterministic version do this too. + if (new_total_cost != rr_node_route_inf_[inode].path_cost) + return true; +#else + // For non-deterministic pruning, can greadily just ignore nodes with higher + // total cost. + if (new_total_cost > rr_node_route_inf_[inode].path_cost) + return true; +#endif + // Perform post-target pruning. If this is not done, there is a chance that + // several duplicates of a node is in the queue that will never reach the + // target better than what we found and they will explore all of their + // neighbors which is not good. This is done before obtaining the lock to + // prevent lock contention where possible. 
+ if (inode != target_node) { + float best_back_cost_to_target = rr_node_route_inf_[target_node].backward_path_cost; + if (post_target_prune_node(new_total_cost, new_back_cost, best_back_cost_to_target, params)) + return true; + } + return false; +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + // Assign the thread task function parameters to atomic variables + this->sink_node_ = &sink_node; + this->cost_params_ = const_cast(&cost_params); + this->bounding_box_ = const_cast(&bounding_box); + this->target_bb_ = const_cast(&target_bb); + + // Synchronize at the barrier before executing a new thread task + this->thread_barrier_.wait(); + + // Main thread executes a new thread task (helper threads are doing the same in the background) + this->timing_driven_find_single_shortest_path_from_heap_thread_func(*this->sink_node_, + *this->cost_params_, + *this->bounding_box_, + *this->target_bb_, 0); + + // Synchronize at the barrier before resetting the heap + this->thread_barrier_.wait(); + + // Collect the number of heap pushes and pops + this->router_stats_->heap_pushes += this->heap_.get_num_pushes(); + this->router_stats_->heap_pops += this->heap_.get_num_pops(); + + // Reset the heap for the next connection + this->heap_.reset(); +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper(const size_t thread_idx) { + this->thread_barrier_.init(); + while (true) { + this->thread_barrier_.wait(); + if (this->is_router_destroying_ == true) { + return; + } else { + timing_driven_find_single_shortest_path_from_heap_thread_func(*this->sink_node_, + *this->cost_params_, + *this->bounding_box_, + *this->target_bb_, + thread_idx); + } + this->thread_barrier_.wait(); + } +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_thread_func(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb, + const size_t thread_idx) { + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + + // Check if we should explore the neighbors of this node + if (should_not_explore_neighbors(inode, new_total_cost, this->rr_node_route_inf_[inode].backward_path_cost, sink_node, this->rr_node_route_inf_, cost_params)) { + continue; + } + + // Get the current RR node info within a critical section to prevent data races + obtainSpinLock(inode); + + RTExploredNode current; + current.index = inode; + current.backward_path_cost = this->rr_node_route_inf_[inode].backward_path_cost; + current.prev_edge = this->rr_node_route_inf_[inode].prev_edge; + current.R_upstream = this->rr_node_route_inf_[inode].R_upstream; + + releaseLock(inode); + + // Double check now just to be sure that we should still explore neighbors + // NOTE: A good question is what happened to the uniqueness pruning. The idea + // is that at this point it does not matter. Basically any duplicates + // will act like they were the last one pushed in. This may create some + // duplicates, but it is a simple way of handling this situation. + // It may be worth investigating a better way to do this in the future. + // TODO: This is still doing post-target pruning. May want to investigate + // if this is worth doing. 
+ // TODO: should try testing without the pruning below and see if anything changes. + if (should_not_explore_neighbors(inode, new_total_cost, current.backward_path_cost, sink_node, this->rr_node_route_inf_, cost_params)) { + continue; + } + + // Adding nodes to heap + timing_driven_expand_neighbours(current, cost_params, bounding_box, sink_node, target_bb, thread_idx); + } +} + +template +void ParallelConnectionRouter::timing_driven_expand_neighbours(const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx) { + /* Puts all the rr_nodes adjacent to current on the heap. */ + + // For each node associated with the current heap element, expand all of it's neighbors + auto edges = this->rr_nodes_.edge_range(current.index); + + // This is a simple prefetch that prefetches: + // - RR node data reachable from this node + // - rr switch data to reach those nodes from this node. + // + // This code will be a NOP on compiler targets that do not have a + // builtin to emit prefetch instructions. + // + // This code will be a NOP on CPU targets that lack prefetch instructions. + // All modern x86 and ARM64 platforms provide prefetch instructions. + // + // This code delivers ~6-8% reduction in wallclock time when running Titan + // benchmarks, and was specifically measured against the gsm_switch and + // directrf vtr_reg_weekly running in high effort. + // + // - directrf_stratixiv_arch_timing.blif + // - gsm_switch_stratixiv_arch_timing.blif + // + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + this->rr_nodes_.prefetch_node(to_node); + + int switch_idx = this->rr_nodes_.edge_switch(from_edge); + VTR_PREFETCH(&this->rr_switch_inf_[switch_idx], 0, 0); + } + + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + timing_driven_expand_neighbour(current, + from_edge, + to_node, + cost_params, + bounding_box, + target_node, + target_bb, + thread_idx); + } +} + +template +void ParallelConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx) { + // BB-pruning + // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this + // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets + if (!inside_bb(to_node, bounding_box)) { + // Note: Logging are disabled for parallel connection router + return; /* Node is outside (expanded) bounding box. */ + } + + /* Prune away IPINs that lead to blocks other than the target one. Avoids * + * the issue of how to cost them properly so they don't get expanded before * + * more promising routes, but makes route-through (via CLBs) impossible. * + * Change this if you want to investigate route-throughs. 
*/ + if (target_node != RRNodeId::INVALID()) { + t_rr_type to_type = this->rr_graph_->node_type(to_node); + if (to_type == IPIN) { + // Check if this IPIN leads to the target block + // IPIN's of the target block should be contained within it's bounding box + int to_xlow = this->rr_graph_->node_xlow(to_node); + int to_ylow = this->rr_graph_->node_ylow(to_node); + int to_layer = this->rr_graph_->node_layer(to_node); + int to_xhigh = this->rr_graph_->node_xhigh(to_node); + int to_yhigh = this->rr_graph_->node_yhigh(to_node); + if (to_xlow < target_bb.xmin + || to_ylow < target_bb.ymin + || to_xhigh > target_bb.xmax + || to_yhigh > target_bb.ymax + || to_layer < target_bb.layer_min + || to_layer > target_bb.layer_max) { + // Note: Logging are disabled for parallel connection router + return; + } + } + } + // Note: Logging are disabled for parallel connection router + + timing_driven_add_to_heap(cost_params, + current, + to_node, + from_edge, + target_node, + thread_idx); +} + +template +void ParallelConnectionRouter::timing_driven_add_to_heap(const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + const RREdgeId from_edge, + RRNodeId target_node, + size_t thread_idx) { + const RRNodeId& from_node = current.index; + + // Initialize the neighbor RTExploredNode + RTExploredNode next; + next.R_upstream = current.R_upstream; + next.index = to_node; + next.prev_edge = from_edge; + next.total_cost = std::numeric_limits::infinity(); // Not used directly + next.backward_path_cost = current.backward_path_cost; + + this->evaluate_timing_driven_node_costs(&next, cost_params, from_node, target_node); + + float new_total_cost = next.total_cost; + float new_back_cost = next.backward_path_cost; + + // To further reduce lock contention, we add a cheap read-only check before acquiring the lock, motivated by Shun et al. + if (prune_node(to_node, new_total_cost, new_back_cost, from_edge, target_node, this->rr_node_route_inf_, cost_params)) { + return; + } + + obtainSpinLock(to_node); + + if (prune_node(to_node, new_total_cost, new_back_cost, from_edge, target_node, this->rr_node_route_inf_, cost_params)) { + releaseLock(to_node); + return; + } + + update_cheapest(next, thread_idx); + + releaseLock(to_node); + + if (to_node == target_node) { +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + if (multi_queue_direct_draining_) { + this->heap_.set_min_priority_for_pop(new_total_cost); + } +#endif + return; + } + this->heap_.add_to_heap({new_total_cost, to_node}); +} + +template +void ParallelConnectionRouter::add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId inode = rt_node.inode; + float backward_path_cost = cost_params.criticality * rt_node.Tdel; + float R_upstream = rt_node.R_upstream; + + /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ + if (!inside_bb(rt_node.inode, net_bb)) + return; + + // After budgets are loaded, calculate delay cost as described by RCV paper + /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + * Integrated Circuits and Systems, vol. 27, no. 4, pp. 
686-697, April 2008.*/ + + if (!this->rcv_path_manager.is_enabled()) { + float expected_cost = this->router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + VTR_LOGV_DEBUG(this->router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", + inode, + tot_cost, + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + + if (prune_node(inode, tot_cost, backward_path_cost, RREdgeId::INVALID(), target_node, this->rr_node_route_inf_, cost_params)) { + return; + } + add_to_mod_list(inode, 0 /*main thread*/); + this->rr_node_route_inf_[inode].path_cost = tot_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + this->heap_.push_back({tot_cost, inode}); + } + // Note: RCV is not supported by parallel connection router +} + +std::unique_ptr make_parallel_connection_router(e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining) { + switch (heap_type) { + case e_heap_type::BINARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat, + multi_queue_num_threads, + multi_queue_num_queues, + multi_queue_direct_draining); + case e_heap_type::FOUR_ARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat, + multi_queue_num_threads, + multi_queue_num_queues, + multi_queue_direct_draining); + default: + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", + heap_type); + } +} diff --git a/vpr/src/route/parallel_connection_router.h b/vpr/src/route/parallel_connection_router.h new file mode 100644 index 00000000000..18d873e0c6e --- /dev/null +++ b/vpr/src/route/parallel_connection_router.h @@ -0,0 +1,443 @@ +#ifndef _PARALLEL_CONNECTION_ROUTER_H +#define _PARALLEL_CONNECTION_ROUTER_H + +#include "connection_router.h" + +#include "d_ary_heap.h" +#include "multi_queue_d_ary_heap.h" + +#include +#include +#include +#include + +/** + * @brief Spin lock implementation using std::atomic_flag + * + * It is used per RR node for protecting the update to node costs + * to prevent data races. Since different threads rarely work on + * the same node simultaneously, this fine-grained locking strategy + * of one lock per node reduces contention. 
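+ *
+ * A minimal usage sketch (illustrative only; within the router these calls are
+ * wrapped by the obtainSpinLock()/releaseLock() helpers):
+ * @code
+ * spin_lock_t lock;
+ * lock.acquire();
+ * // ... read/update the routing cost entry of a single RR node ...
+ * lock.release();
+ * @endcode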
+ */ +class spin_lock_t { + /** Atomic flag used for the lock implementation */ + std::atomic_flag lock_ = ATOMIC_FLAG_INIT; + + public: + /** + * @brief Acquires the spin lock, repeatedly attempting until successful + */ + void acquire() { + while (std::atomic_flag_test_and_set_explicit(&lock_, std::memory_order_acquire)) + ; + } + + /** + * @brief Releases the spin lock, allowing other threads to acquire it + */ + void release() { + std::atomic_flag_clear_explicit(&lock_, std::memory_order_release); + } +}; + +/** + * @brief Thread barrier implementation using std::mutex + * + * It ensures all participating threads reach a synchronization point + * before any are allowed to proceed further. It uses a mutex and + * condition variable to coordinate thread synchronization. + */ +class barrier_mutex_t { + // FIXME: Try std::barrier (since C++20) to replace this mutex barrier + std::mutex mutex_; + std::condition_variable cv_; + size_t count_; + size_t max_count_; + size_t generation_ = 0; + + public: + /** + * @brief Constructs a barrier for a specific number of threads + * @param num_threads Number of threads that must call wait() before + * any thread is allowed to proceed + */ + explicit barrier_mutex_t(size_t num_threads) + : count_(num_threads) + , max_count_(num_threads) {} + + /** + * @brief Blocks the calling thread until all threads have called wait() + * + * When the specified number of threads have called this method, all + * threads are unblocked and the barrier is reset for the next use. + */ + void wait() { + std::unique_lock lock{mutex_}; + size_t gen = generation_; + if (--count_ == 0) { + generation_++; + count_ = max_count_; + cv_.notify_all(); + } else { + cv_.wait(lock, [this, &gen] { return gen != generation_; }); + } + } +}; + +/** + * @brief Spin-based thread barrier implementation using std::atomic + * + * It ensures all participating threads reach a synchronization point + * before any are allowed to proceed further. It uses atomic operations + * to implement Sense-Reversing Centralized Barrier (from Section 5.2.1 + * of Michael L. Scott's textbook) without using mutex locks. + */ +class barrier_spin_t { + /** Number of threads that must reach the barrier */ + size_t num_threads_ = 1; + + /** Atomic counter tracking the number of threads that have arrived at the barrier */ + std::atomic count_ = 0; + + /** Global sense shared by all participating threads */ + std::atomic sense_ = false; + + /** Thread-local sense value for each participating thread */ + inline static thread_local bool local_sense_ = false; + + public: + /** + * @brief Constructs a barrier for a specific number of threads + * @param num_threads Number of threads that must call wait() before + * any thread is allowed to proceed + */ + explicit barrier_spin_t(size_t num_threads) { num_threads_ = num_threads; } + + /** + * @brief Initializes the thread-local sense flag + * @note Should be called by each thread before first using the barrier. + */ + void init() { + local_sense_ = false; + } + + /** + * @brief Blocks the calling thread until all threads have called wait() + * + * Uses a sense-reversing algorithm to synchronize threads. The last thread + * to arrive unblocks all waiting threads. This method avoids using locks or + * condition variables, making it potentially more efficient for short waits. 
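+     *
+     * Illustrative call pattern (a sketch of how the router's thread task
+     * wrappers use the barrier; names are placeholders):
+     * @code
+     * barrier_spin_t barrier(num_threads);
+     * // in every participating thread:
+     * barrier.init();
+     * while (more_tasks) {
+     *     barrier.wait(); // wait until a new task is published
+     *     do_task();
+     *     barrier.wait(); // wait until every thread has finished the task
+     * }
+     * @endcode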
+ */ + void wait() { + bool s = !local_sense_; + local_sense_ = s; + size_t num_arrivals = count_.fetch_add(1) + 1; + if (num_arrivals == num_threads_) { + count_.store(0); + sense_.store(s); + } else { + while (sense_.load() != s) + ; // spin until the last thread arrives + } + } +}; + +using barrier_t = barrier_spin_t; // Using the spin-based thread barrier + +/** + * @class ParallelConnectionRouter implements the MultiQueue-based parallel connection + * router (FPT'24) based on the ConnectionRouter interface. + * @details The details of the algorithm can be found from the conference paper: + * A. Singer, H. Yan, G. Zhang, M. Jeffrey, M. Stojilovic and V. Betz, "MultiQueue-Based FPGA Routing: + * Relaxed A* Priority Ordering for Improved Parallelism," Int. Conf. on Field-Programmable Technology, + * Dec. 2024. + */ +template +class ParallelConnectionRouter : public ConnectionRouter> { + public: + ParallelConnectionRouter( + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining) + : ConnectionRouter>(grid, router_lookahead, rr_nodes, rr_graph, rr_rc_data, rr_switch_inf, rr_node_route_inf, is_flat) + , modified_rr_node_inf_(multi_queue_num_threads) + , thread_barrier_(multi_queue_num_threads) + , is_router_destroying_(false) + , locks_(rr_node_route_inf.size()) + , multi_queue_direct_draining_(multi_queue_direct_draining) { + // Set the MultiQueue parameters + this->heap_.set_num_threads_and_queues(multi_queue_num_threads, multi_queue_num_queues); + // Initialize the thread barrier + this->thread_barrier_.init(); + // Instantiate (multi_queue_num_threads - 1) helper threads + this->sub_threads_.resize(multi_queue_num_threads - 1); + for (int i = 0; i < multi_queue_num_threads - 1; ++i) { + this->sub_threads_[i] = std::thread(&ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper, this, i + 1 /*0: main thread*/); + this->sub_threads_[i].detach(); + } + } + + ~ParallelConnectionRouter() { + this->is_router_destroying_ = true; // signal the helper threads to exit + this->thread_barrier_.wait(); // wait until all threads reach the barrier + + VTR_LOG("Parallel Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n", + std::chrono::duration(this->path_search_cumulative_time).count()); + } + + /** + * @brief Clears the modified list per thread + * @note Should be called after reset_path_costs have been called + */ + void clear_modified_rr_node_info() final { + for (auto& thread_visited_rr_nodes : this->modified_rr_node_inf_) { + thread_visited_rr_nodes.clear(); + } + } + + /** + * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf + */ + void reset_path_costs() final { + // Reset the node info stored in rr_node_route_inf variable + for (const auto& thread_visited_rr_nodes : this->modified_rr_node_inf_) { + ::reset_path_costs(thread_visited_rr_nodes); + } + } + + /** + * @brief [Not supported] Enables RCV feature + * @note RCV for parallel connection router has not been implemented yet. + * Thus this function is not expected to be called. + */ + void set_rcv_enabled(bool) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "RCV for parallel connection router not yet implemented. 
Not expected to be called."); + } + + /** + * @brief [Not supported] Finds shortest paths from the route tree rooted at rt_root to all sinks available + * @note This function has not been implemented yet and is not the focus of parallel connection router. + * Thus this function is not expected to be called. + */ + vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode&, + const t_conn_cost_params&, + const t_bb&, + RouterStats&, + const ConnectionParameters&) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "timing_driven_find_all_shortest_paths_from_route_tree not yet implemented (nor is the focus of the parallel connection router). Not expected to be called."); + } + + protected: + /** + * @brief Marks that data associated with rr_node 'inode' has + * been modified, and needs to be reset in reset_path_costs + */ + inline void add_to_mod_list(RRNodeId inode, size_t thread_idx) { + if (std::isinf(this->rr_node_route_inf_[inode].path_cost)) { + this->modified_rr_node_inf_[thread_idx].push_back(inode); + } + } + + /** + * @brief Updates the route path to the node `cheapest.index` + * via the path from `from_node` via `cheapest.prev_edge` + */ + inline void update_cheapest(RTExploredNode& cheapest, size_t thread_idx) { + const RRNodeId& inode = cheapest.index; + add_to_mod_list(inode, thread_idx); + this->rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; + this->rr_node_route_inf_[inode].path_cost = cheapest.total_cost; + this->rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; + } + + /** + * @brief Obtains the per-node spin locks for protecting node cost updates + */ + inline void obtainSpinLock(const RRNodeId& inode) { + this->locks_[size_t(inode)].acquire(); + } + + /** + * @brief Releases the per-node spin lock, allowing other + * threads working on the same node to obtain it + */ + inline void releaseLock(const RRNodeId& inode) { + this->locks_[size_t(inode)].release(); + } + + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) final; + + /** + * @brief Helper thread wrapper function, passed to std::thread instantiation and running a + * while-loop to obtain and execute new helper thread tasks until the main thread signals the + * threads to exit + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper( + const size_t thread_idx); + + /** + * @brief Helper thread task function to find the single shortest path from current heap to + * the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_find_single_shortest_path_from_heap_thread_func( + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + 
const t_bb& target_bb, + const size_t thread_idx); + + /** + * @brief Expands each neighbor of the current node in the wave expansion + * @param current Current node being explored + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_expand_neighbours( + const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx); + + /** + * @brief Conditionally adds to_node to the router heap (via path from current.index via from_edge) + * @note RR nodes outside bounding box specified in bounding_box are not added to the heap. + * @param current Current node being explored + * @param from_edge Edge between the current node and the neighbor node + * @param to_node Neighbor node to be expanded + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_expand_neighbour( + const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx); + + /** + * @brief Adds to_node to the heap, and also adds any nodes which are connected by non-configurable edges + * @param cost_params Cost function parameters + * @param current Current node being explored + * @param to_node Neighbor node to be expanded + * @param from_edge Edge between the current node and the neighbor node + * @param target_node Target node ID to route to + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_add_to_heap( + const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + RREdgeId from_edge, + RRNodeId target_node, + size_t thread_idx); + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) final; + + /** + * @brief [Not supported] Finds shortest paths from current heap to all nodes in the RR graph + * @note This function has not been implemented yet and is not the focus of parallel connection router. + * Thus this function is not expected to be called. 
+ */ + vtr::vector timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params&, + const t_bb&) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "timing_driven_find_all_shortest_paths_from_heap not yet implemented (nor is the focus of this project). Not expected to be called."); + } + + /** Node IDs of modified nodes in rr_node_route_inf for each thread*/ + std::vector> modified_rr_node_inf_; + + /** Helper threads */ + std::vector sub_threads_; + + /** Thread barrier for synchronization */ + barrier_t thread_barrier_; + + /** Signal for helper threads to exit */ + std::atomic is_router_destroying_; + + /** Fine-grained locks per RR node */ + std::vector locks_; + + /** Is queue draining optimization enabled? */ + bool multi_queue_direct_draining_; + + //@{ + /** Atomic parameters of thread task functions to pass from main thread to helper threads */ + std::atomic sink_node_; + std::atomic cost_params_; + std::atomic bounding_box_; + std::atomic target_bb_; + //@} +}; + +/** Construct a parallel connection router that uses the specified heap type. + * This function is not used, but removing it will result in "undefined reference" + * errors since heap type specializations won't get emitted from parallel_connection_router.cpp + * without it. + * The alternative is moving all ParallelConnectionRouter fn implementations into the header. */ +std::unique_ptr make_parallel_connection_router( + e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining); + +#endif /* _PARALLEL_CONNECTION_ROUTER_H */ diff --git a/vpr/src/route/partition_tree.cpp b/vpr/src/route/partition_tree.cpp index 38ee7abc2dd..497f887cf74 100644 --- a/vpr/src/route/partition_tree.cpp +++ b/vpr/src/route/partition_tree.cpp @@ -44,7 +44,7 @@ std::unique_ptr PartitionTree::build_helper(const Netlist<>& * Do this for every step with only given nets, because each cutline takes some nets out * of the game, so if we just built a global lookup it wouldn't yield accurate results. * - * VPR's bounding boxes include the borders (see ConnectionRouter::timing_driven_expand_neighbour()) + * VPR's bounding boxes include the borders (see SerialConnectionRouter::timing_driven_expand_neighbour()) * so try to include x=bb.xmax, y=bb.ymax etc. when calculating things. */ int width = x2 - x1 + 1; int height = y2 - y1 + 1; diff --git a/vpr/src/route/partition_tree.h b/vpr/src/route/partition_tree.h index 6bf68be04b8..d30d5121492 100644 --- a/vpr/src/route/partition_tree.h +++ b/vpr/src/route/partition_tree.h @@ -1,6 +1,6 @@ #pragma once -#include "connection_router.h" +#include "serial_connection_router.h" #include "netlist_fwd.h" #include "router_stats.h" @@ -27,7 +27,7 @@ inline Side operator!(const Side& rhs) { } /** Part of a net in the context of the \ref DecompNetlistRouter. 
Sinks and routing resources - * routable/usable by the \ref ConnectionRouter are constrained to ones inside clipped_bb + * routable/usable by the \ref SerialConnectionRouter are constrained to ones inside clipped_bb * (\see inside_bb()) */ class VirtualNet { public: diff --git a/vpr/src/route/route_net.tpp b/vpr/src/route/route_net.tpp index 0e8c4c268a5..1a5715b7341 100644 --- a/vpr/src/route/route_net.tpp +++ b/vpr/src/route/route_net.tpp @@ -17,7 +17,7 @@ /** Attempt to route a single net. * - * @param router The ConnectionRouter instance + * @param router The ConnectionRouterType instance * @param net_list Input netlist * @param net_id * @param itry # of iteration @@ -40,8 +40,8 @@ * @param should_setup Should we reset/prune the existing route tree first? * @param sink_mask Which sinks to route? Assumed all sinks if nullopt, otherwise a mask of [1..num_sinks+1] where set bits request the sink to be routed * @return NetResultFlags for this net */ -template -inline NetResultFlags route_net(ConnectionRouter& router, +template +inline NetResultFlags route_net(ConnectionRouterType& router, const Netlist<>& net_list, const ParentNetId& net_id, int itry, @@ -140,6 +140,8 @@ inline NetResultFlags route_net(ConnectionRouter& router, t_conn_cost_params cost_params; cost_params.astar_fac = router_opts.astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; cost_params.pres_fac = pres_fac; cost_params.delay_budget = ((budgeting_inf.if_set()) ? &conn_delay_budget : nullptr); @@ -285,8 +287,8 @@ inline NetResultFlags route_net(ConnectionRouter& router, /** Route to a "virtual sink" in the netlist which corresponds to the start point * of the global clock network. */ -template -inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, +template +inline NetResultFlags pre_route_to_clock_root(ConnectionRouterType& router, ParentNetId net_id, const Netlist<>& net_list, RRNodeId sink_node, @@ -382,7 +384,7 @@ inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, * In the process, update global pathfinder costs, rr_node_route_inf and extend the global RouteTree * for this net. 
* - * @param router The ConnectionRouter instance + * @param router The ConnectionRouterType instance * @param net_list Input netlist * @param net_id * @param itarget # of this connection in the net (only used for debug output) @@ -399,8 +401,8 @@ inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, * @param is_flat * @param net_bb Bounding box for the net (Routing resources outside net_bb will not be used) * @return NetResultFlags for this sink to be bubbled up through route_net */ -template -inline NetResultFlags route_sink(ConnectionRouter& router, +template +inline NetResultFlags route_sink(ConnectionRouterType& router, const Netlist<>& net_list, ParentNetId net_id, unsigned itarget, diff --git a/vpr/src/route/router_delay_profiling.cpp b/vpr/src/route/router_delay_profiling.cpp index 68fb441a369..257b35d20f6 100644 --- a/vpr/src/route/router_delay_profiling.cpp +++ b/vpr/src/route/router_delay_profiling.cpp @@ -88,6 +88,8 @@ bool RouterDelayProfiler::calculate_delay(RRNodeId source_node, cost_params.criticality = 1.; cost_params.astar_fac = router_opts.router_profiler_astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; route_budgets budgeting_inf(net_list_, is_flat_); @@ -163,6 +165,8 @@ vtr::vector calculate_all_path_delays_from_rr_node(RRNodeId src cost_params.criticality = 1.; cost_params.astar_fac = router_opts.astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; /* This function is called during placement. Thus, the flat routing option should be disabled. */ //TODO: Placement is run with is_flat=false. 
However, since is_flat is passed, det_routing_arch should @@ -174,7 +178,7 @@ vtr::vector calculate_all_path_delays_from_rr_node(RRNodeId src /*segment_inf=*/{}, is_flat); - ConnectionRouter router( + SerialConnectionRouter router( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), diff --git a/vpr/src/route/router_delay_profiling.h b/vpr/src/route/router_delay_profiling.h index ca855720d85..f137e143df9 100644 --- a/vpr/src/route/router_delay_profiling.h +++ b/vpr/src/route/router_delay_profiling.h @@ -2,7 +2,7 @@ #define ROUTER_DELAY_PROFILING_H_ #include "vpr_types.h" -#include "connection_router.h" +#include "serial_connection_router.h" #include @@ -43,7 +43,7 @@ class RouterDelayProfiler { private: const Netlist<>& net_list_; RouterStats router_stats_; - ConnectionRouter router_; + SerialConnectionRouter router_; vtr::NdMatrix min_delays_; // [physical_type_idx][from_layer][to_layer][dx][dy] bool is_flat_; }; diff --git a/vpr/src/route/serial_connection_router.cpp b/vpr/src/route/serial_connection_router.cpp new file mode 100644 index 00000000000..f5c3a1762e5 --- /dev/null +++ b/vpr/src/route/serial_connection_router.cpp @@ -0,0 +1,533 @@ +#include "serial_connection_router.h" + +#include +#include "rr_graph.h" +#include "rr_graph_fwd.h" + +/** Used to update router statistics for serial connection router */ +inline void update_serial_router_stats(RouterStats* router_stats, + bool is_push, + RRNodeId rr_node_id, + const RRGraphView* rr_graph); + +template +void SerialConnectionRouter::timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + auto& route_ctx = g_vpr_ctx.mutable_routing(); + + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + update_serial_router_stats(this->router_stats_, + /*is_push=*/false, + inode, + this->rr_graph_); + + VTR_LOGV_DEBUG(this->router_debug_, " Popping node %d (cost: %g)\n", + inode, new_total_cost); + + // Have we found the target? 
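+        // (Nodes come off the heap cheapest-first, so the search can stop as soon as the sink
+        //  itself is popped; the path is then recovered outside this loop, either from the
+        //  prev_edge entries written into rr_node_route_inf_ or, when RCV is enabled, from the
+        //  RCV path data handled just below.)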
+ if (inode == sink_node) { + // If we're running RCV, the path will be stored in the path_data->path_rr vector + // This is then placed into the traceback so that the correct path is returned + // TODO: This can be eliminated by modifying the actual traceback function in route_timing + if (this->rcv_path_manager.is_enabled()) { + this->rcv_path_manager.insert_backwards_path_into_traceback(this->rcv_path_data[inode], + this->rr_node_route_inf_[inode].path_cost, + this->rr_node_route_inf_[inode].backward_path_cost, + route_ctx); + } + VTR_LOGV_DEBUG(this->router_debug_, " Found target %8d (%s)\n", inode, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + break; + } + + // If not, keep searching + timing_driven_expand_cheapest(inode, + new_total_cost, + sink_node, + cost_params, + bounding_box, + target_bb); + } +} + +template +vtr::vector SerialConnectionRouter::timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode& rt_root, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + this->router_stats_ = &router_stats; + this->conn_params_ = &conn_params; + + // Add the route tree to the heap with no specific target node + RRNodeId target_node = RRNodeId::INVALID(); + this->add_route_tree_to_heap(rt_root, target_node, cost_params, bounding_box); + this->heap_.build_heap(); // via sifting down everything + + auto res = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box); + this->heap_.empty_heap(); + + return res; +} + +template +vtr::vector SerialConnectionRouter::timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + // Since there is no single *target* node this uses Dijkstra's algorithm + // with a modified exit condition (runs until heap is empty). + + vtr::vector cheapest_paths(this->rr_nodes_.size()); + + VTR_ASSERT_SAFE(this->heap_.is_valid()); + + if (this->heap_.is_empty_heap()) { // No source + VTR_LOGV_DEBUG(this->router_debug_, " Initial heap empty (no source)\n"); + } + + // Start measuring path search time + std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); + + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + update_serial_router_stats(this->router_stats_, + /*is_push=*/false, + inode, + this->rr_graph_); + + VTR_LOGV_DEBUG(this->router_debug_, " Popping node %d (cost: %g)\n", + inode, new_total_cost); + + // Since we want to find shortest paths to all nodes in the graph + // we do not specify a target node. 
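+        // (As a sketch of why this is plain Dijkstra ordering: the total cost pushed onto the
+        //  heap elsewhere in this file is
+        //      backward_path_cost + astar_fac * max(0, expected_cost - astar_offset)
+        //  and with the NoOp lookahead mentioned below the expected_cost term is zero, so nodes
+        //  are popped purely by their backward (already-incurred) path cost.)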
+ // + // By setting the target_node to INVALID in combination with the NoOp router + // lookahead we can re-use the node exploration code from the regular router + RRNodeId target_node = RRNodeId::INVALID(); + + timing_driven_expand_cheapest(inode, + new_total_cost, + target_node, + cost_params, + bounding_box, + t_bb()); + + if (cheapest_paths[inode].index == RRNodeId::INVALID() || cheapest_paths[inode].total_cost >= new_total_cost) { + VTR_LOGV_DEBUG(this->router_debug_, " Better cost to node %d: %g (was %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); + // Only the `index` and `prev_edge` fields of `cheapest_paths[inode]` are used after this function returns + cheapest_paths[inode].index = inode; + cheapest_paths[inode].prev_edge = this->rr_node_route_inf_[inode].prev_edge; + } else { + VTR_LOGV_DEBUG(this->router_debug_, " Worse cost to node %d: %g (better %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); + } + } + + // Stop measuring path search time + std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); + this->path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); + + return cheapest_paths; +} + +template +void SerialConnectionRouter::timing_driven_expand_cheapest(RRNodeId from_node, + float new_total_cost, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + float best_total_cost = this->rr_node_route_inf_[from_node].path_cost; + if (best_total_cost == new_total_cost) { + // Explore from this node, since its total cost is exactly the same as + // the best total cost ever seen for this node. Otherwise, prune this node + // to reduce redundant work (i.e., unnecessary neighbor exploration). + // `new_total_cost` is used here as an identifier to detect if the pair + // (from_node or inode, new_total_cost) was the most recently pushed + // element for the corresponding node. + // + // Note: For RCV, it often isn't searching for a shortest path; it is + // searching for a path in the target delay range. So it might find a + // path to node n that has a higher `backward_path_cost` but the `total_cost` + // (including expected delay to sink, going through a cost function that + // checks that against the target delay) might be lower than the previously + // stored value. In that case we want to re-expand the node so long as + // it doesn't create a loop. That `this->rcv_path_manager` should store enough + // info for us to avoid loops. 
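+        //
+        // (Put differently, the heap uses lazy deletion: stale entries for a node are never
+        // removed when a cheaper path to it is found; they are detected here when popped,
+        // because their recorded total cost no longer matches rr_node_route_inf_[node].path_cost,
+        // and are pruned in the else branch below.)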
+        RTExploredNode current;
+        current.index = from_node;
+        current.backward_path_cost = this->rr_node_route_inf_[from_node].backward_path_cost;
+        current.prev_edge = this->rr_node_route_inf_[from_node].prev_edge;
+        current.R_upstream = this->rr_node_route_inf_[from_node].R_upstream;
+
+        VTR_LOGV_DEBUG(this->router_debug_, " Better cost to %d\n", from_node);
+        VTR_LOGV_DEBUG(this->router_debug_, " New total cost: %g\n", new_total_cost);
+        VTR_LOGV_DEBUG(this->router_debug_ && (current.prev_edge != RREdgeId::INVALID()),
+                       " Setting path costs for associated node %d (from %d edge %zu)\n",
+                       from_node,
+                       static_cast<size_t>(this->rr_graph_->edge_src_node(current.prev_edge)),
+                       static_cast<size_t>(current.prev_edge));
+
+        timing_driven_expand_neighbours(current, cost_params, bounding_box, target_node, target_bb);
+    } else {
+        // Post-heap prune, do not re-explore from the current/new partial path as it
+        // has worse cost than the best partial path to this node found so far
+        VTR_LOGV_DEBUG(this->router_debug_, " Worse cost to %d\n", from_node);
+        VTR_LOGV_DEBUG(this->router_debug_, " Old total cost: %g\n", best_total_cost);
+        VTR_LOGV_DEBUG(this->router_debug_, " New total cost: %g\n", new_total_cost);
+    }
+}
+
+template<typename HeapImplementation>
+void SerialConnectionRouter<HeapImplementation>::timing_driven_expand_neighbours(const RTExploredNode& current,
+                                                                                 const t_conn_cost_params& cost_params,
+                                                                                 const t_bb& bounding_box,
+                                                                                 RRNodeId target_node,
+                                                                                 const t_bb& target_bb) {
+    /* Puts all the rr_nodes adjacent to current on the heap. */
+
+    // For each node associated with the current heap element, expand all of its neighbors
+    auto edges = this->rr_nodes_.edge_range(current.index);
+
+    // This is a simple prefetch that prefetches:
+    // - RR node data reachable from this node
+    // - rr switch data to reach those nodes from this node.
+    //
+    // This code will be a NOP on compiler targets that do not have a
+    // builtin to emit prefetch instructions.
+    //
+    // This code will be a NOP on CPU targets that lack prefetch instructions.
+    // All modern x86 and ARM64 platforms provide prefetch instructions.
+    //
+    // This code delivers ~6-8% reduction in wallclock time when running Titan
+    // benchmarks, and was specifically measured against the gsm_switch and
+    // directrf vtr_reg_weekly running in high effort.
+ // + // - directrf_stratixiv_arch_timing.blif + // - gsm_switch_stratixiv_arch_timing.blif + // + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + this->rr_nodes_.prefetch_node(to_node); + + int switch_idx = this->rr_nodes_.edge_switch(from_edge); + VTR_PREFETCH(&this->rr_switch_inf_[switch_idx], 0, 0); + } + + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + timing_driven_expand_neighbour(current, + from_edge, + to_node, + cost_params, + bounding_box, + target_node, + target_bb); + } +} + +template +void SerialConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb) { + VTR_ASSERT(bounding_box.layer_max < g_vpr_ctx.device().grid.get_num_layers()); + + const RRNodeId& from_node = current.index; + + // BB-pruning + // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this + // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets + if (!inside_bb(to_node, bounding_box) + && !this->rcv_path_manager.is_enabled()) { + VTR_LOGV_DEBUG(this->router_debug_, + " Pruned expansion of node %d edge %zu -> %d" + " (to node location %d,%d,%d x %d,%d,%d outside of expanded" + " net bounding box %d,%d,%d x %d,%d,%d)\n", + from_node, size_t(from_edge), size_t(to_node), + this->rr_graph_->node_xlow(to_node), this->rr_graph_->node_ylow(to_node), this->rr_graph_->node_layer(to_node), + this->rr_graph_->node_xhigh(to_node), this->rr_graph_->node_yhigh(to_node), this->rr_graph_->node_layer(to_node), + bounding_box.xmin, bounding_box.ymin, bounding_box.layer_min, + bounding_box.xmax, bounding_box.ymax, bounding_box.layer_max); + return; /* Node is outside (expanded) bounding box. */ + } + + /* Prune away IPINs that lead to blocks other than the target one. Avoids * + * the issue of how to cost them properly so they don't get expanded before * + * more promising routes, but makes route-through (via CLBs) impossible. * + * Change this if you want to investigate route-throughs. 
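+     * (Concretely, the check below keeps an IPIN only if its whole extent, xlow/xhigh,
+     * ylow/yhigh and layer, lies inside target_bb, the bounding box of the block that
+     * contains the target sink; any other IPIN is skipped without being costed.)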
*/ + if (target_node != RRNodeId::INVALID()) { + t_rr_type to_type = this->rr_graph_->node_type(to_node); + if (to_type == IPIN) { + // Check if this IPIN leads to the target block + // IPIN's of the target block should be contained within it's bounding box + int to_xlow = this->rr_graph_->node_xlow(to_node); + int to_ylow = this->rr_graph_->node_ylow(to_node); + int to_layer = this->rr_graph_->node_layer(to_node); + int to_xhigh = this->rr_graph_->node_xhigh(to_node); + int to_yhigh = this->rr_graph_->node_yhigh(to_node); + if (to_xlow < target_bb.xmin + || to_ylow < target_bb.ymin + || to_xhigh > target_bb.xmax + || to_yhigh > target_bb.ymax + || to_layer < target_bb.layer_min + || to_layer > target_bb.layer_max) { + VTR_LOGV_DEBUG(this->router_debug_, + " Pruned expansion of node %d edge %zu -> %d" + " (to node is IPIN at %d,%d,%d x %d,%d,%d which does not" + " lead to target block %d,%d,%d x %d,%d,%d)\n", + from_node, size_t(from_edge), size_t(to_node), + to_xlow, to_ylow, to_layer, + to_xhigh, to_yhigh, to_layer, + target_bb.xmin, target_bb.ymin, target_bb.layer_min, + target_bb.xmax, target_bb.ymax, target_bb.layer_max); + return; + } + } + } + + VTR_LOGV_DEBUG(this->router_debug_, " Expanding node %d edge %zu -> %d\n", + from_node, size_t(from_edge), size_t(to_node)); + + // Check if the node exists in the route tree when RCV is enabled + // Other pruning methods have been disabled when RCV is on, so this method is required to prevent "loops" from being created + bool node_exists = false; + if (this->rcv_path_manager.is_enabled()) { + node_exists = this->rcv_path_manager.node_exists_in_tree(this->rcv_path_data[from_node], + to_node); + } + + if (!node_exists || !this->rcv_path_manager.is_enabled()) { + timing_driven_add_to_heap(cost_params, + current, + to_node, + from_edge, + target_node); + } +} + +template +void SerialConnectionRouter::timing_driven_add_to_heap(const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + const RREdgeId from_edge, + RRNodeId target_node) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId& from_node = current.index; + + // Initialize the neighbor RTExploredNode + RTExploredNode next; + next.R_upstream = current.R_upstream; + next.index = to_node; + next.prev_edge = from_edge; + next.total_cost = std::numeric_limits::infinity(); // Not used directly + next.backward_path_cost = current.backward_path_cost; + + // Initialize RCV data struct if needed, otherwise it's set to nullptr + this->rcv_path_manager.alloc_path_struct(next.path_data); + // path_data variables are initialized to current values + if (this->rcv_path_manager.is_enabled() && this->rcv_path_data[from_node]) { + next.path_data->backward_cong = this->rcv_path_data[from_node]->backward_cong; + next.path_data->backward_delay = this->rcv_path_data[from_node]->backward_delay; + } + + this->evaluate_timing_driven_node_costs(&next, cost_params, from_node, target_node); + + float best_total_cost = this->rr_node_route_inf_[to_node].path_cost; + float best_back_cost = this->rr_node_route_inf_[to_node].backward_path_cost; + + float new_total_cost = next.total_cost; + float new_back_cost = next.backward_path_cost; + + // We need to only expand this node if it is a better path. 
And we need to + // update its `rr_node_route_inf` data as we put it into the heap; there may + // be other (previously explored) paths to this node in the heap already, + // but they will be pruned when we pop those heap nodes later as we'll see + // they have inferior costs to what is in the `rr_node_route_inf` data for + // this node. More details can be found from the FPT'24 parallel connection + // router paper. + // + // When RCV is enabled, prune based on the RCV-specific total path cost (see + // in `compute_node_cost_using_rcv` in `evaluate_timing_driven_node_costs`) + // to allow detours to get better QoR. + if ((!this->rcv_path_manager.is_enabled() && best_back_cost > new_back_cost) || (this->rcv_path_manager.is_enabled() && best_total_cost > new_total_cost)) { + VTR_LOGV_DEBUG(this->router_debug_, " Expanding to node %d (%s)\n", to_node, + describe_rr_node(device_ctx.rr_graph, + device_ctx.grid, + device_ctx.rr_indexed_data, + to_node, + this->is_flat_) + .c_str()); + VTR_LOGV_DEBUG(this->router_debug_, " New Total Cost %g New back Cost %g\n", new_total_cost, new_back_cost); + //Add node to the heap only if the cost via the current partial path is less than the + //best known cost, since there is no reason for the router to expand more expensive paths. + // + //Pre-heap prune to keep the heap small, by not putting paths which are known to be + //sub-optimal (at this point in time) into the heap. + + update_cheapest(next, from_node); + + this->heap_.add_to_heap({new_total_cost, to_node}); + update_serial_router_stats(this->router_stats_, + /*is_push=*/true, + to_node, + this->rr_graph_); + + } else { + VTR_LOGV_DEBUG(this->router_debug_, " Didn't expand to %d (%s)\n", to_node, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to_node, this->is_flat_).c_str()); + VTR_LOGV_DEBUG(this->router_debug_, " Prev Total Cost %g Prev back Cost %g \n", best_total_cost, best_back_cost); + VTR_LOGV_DEBUG(this->router_debug_, " New Total Cost %g New back Cost %g \n", new_total_cost, new_back_cost); + } + + if (this->rcv_path_manager.is_enabled() && next.path_data != nullptr) { + this->rcv_path_manager.free_path_struct(next.path_data); + } +} + +template +void SerialConnectionRouter::add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId inode = rt_node.inode; + float backward_path_cost = cost_params.criticality * rt_node.Tdel; + float R_upstream = rt_node.R_upstream; + + /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ + if (!inside_bb(rt_node.inode, net_bb)) + return; + + // After budgets are loaded, calculate delay cost as described by RCV paper + /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + * Integrated Circuits and Systems, vol. 27, no. 4, pp. 
686-697, April 2008.*/ + // float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + + if (!this->rcv_path_manager.is_enabled()) { + float expected_cost = this->router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + VTR_LOGV_DEBUG(this->router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", + inode, + tot_cost, + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + + if (tot_cost > this->rr_node_route_inf_[inode].path_cost) { + return; + } + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].path_cost = tot_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + this->heap_.push_back({tot_cost, inode}); + } else { + float expected_total_cost = this->compute_node_cost_using_rcv(cost_params, inode, target_node, rt_node.Tdel, 0, R_upstream); + + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].path_cost = expected_total_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + + this->rcv_path_manager.alloc_path_struct(this->rcv_path_data[inode]); + this->rcv_path_data[inode]->backward_delay = rt_node.Tdel; + + this->heap_.push_back({expected_total_cost, inode}); + } + + update_serial_router_stats(this->router_stats_, + /*is_push=*/true, + inode, + this->rr_graph_); + + if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { + this->router_stats_->rt_node_pushes[this->rr_graph_->node_type(inode)]++; + } +} + +std::unique_ptr make_serial_connection_router(e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat) { + switch (heap_type) { + case e_heap_type::BINARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat); + case e_heap_type::FOUR_ARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat); + default: + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", + heap_type); + } +} + +/** This function is only used for the serial connection router since some + * statistic variables in router_stats are not thread-safe for the parallel + * connection router. To update router_stats (more precisely heap_pushes/pops) + * for parallel connection router, we use the MultiQueue internal statistics + * method instead. 
+ */
+inline void update_serial_router_stats(RouterStats* router_stats,
+                                       bool is_push,
+                                       RRNodeId rr_node_id,
+                                       const RRGraphView* rr_graph) {
+    if (is_push) {
+        router_stats->heap_pushes++;
+    } else {
+        router_stats->heap_pops++;
+    }
+
+    if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) {
+        auto node_type = rr_graph->node_type(rr_node_id);
+        VTR_ASSERT(node_type != NUM_RR_TYPES);
+
+        if (is_inter_cluster_node(*rr_graph, rr_node_id)) {
+            if (is_push) {
+                router_stats->inter_cluster_node_pushes++;
+                router_stats->inter_cluster_node_type_cnt_pushes[node_type]++;
+            } else {
+                router_stats->inter_cluster_node_pops++;
+                router_stats->inter_cluster_node_type_cnt_pops[node_type]++;
+            }
+        } else {
+            if (is_push) {
+                router_stats->intra_cluster_node_pushes++;
+                router_stats->intra_cluster_node_type_cnt_pushes[node_type]++;
+            } else {
+                router_stats->intra_cluster_node_pops++;
+                router_stats->intra_cluster_node_type_cnt_pops[node_type]++;
+            }
+        }
+    }
+}
diff --git a/vpr/src/route/serial_connection_router.h b/vpr/src/route/serial_connection_router.h
new file mode 100644
index 00000000000..2cd23f1460e
--- /dev/null
+++ b/vpr/src/route/serial_connection_router.h
@@ -0,0 +1,255 @@
+#ifndef _SERIAL_CONNECTION_ROUTER_H
+#define _SERIAL_CONNECTION_ROUTER_H
+
+#include "connection_router.h"
+
+#include "d_ary_heap.h"
+
+/**
+ * @class SerialConnectionRouter
+ * @brief Implements AIR's serial timing-driven connection router
+ * @details This class routes from some initial set of sources (via the input rt tree) to a
+ * particular sink using a single thread.
+ */
+template<typename HeapImplementation>
+class SerialConnectionRouter : public ConnectionRouter<HeapImplementation> {
+  public:
+    SerialConnectionRouter(
+        const DeviceGrid& grid,
+        const RouterLookahead& router_lookahead,
+        const t_rr_graph_storage& rr_nodes,
+        const RRGraphView* rr_graph,
+        const std::vector<t_rr_rc_data>& rr_rc_data,
+        const vtr::vector<RRSwitchId, t_rr_switch_inf>& rr_switch_inf,
+        vtr::vector<RRNodeId, t_rr_node_route_inf>& rr_node_route_inf,
+        bool is_flat)
+        : ConnectionRouter<HeapImplementation>(grid, router_lookahead, rr_nodes, rr_graph, rr_rc_data, rr_switch_inf, rr_node_route_inf, is_flat) {
+    }
+
+    ~SerialConnectionRouter() {
+        VTR_LOG("Serial Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n",
+                std::chrono::duration<float>(this->path_search_cumulative_time).count());
+    }
+
+    /**
+     * @brief Clears the modified list
+     * @note Should be called after reset_path_costs has been called
+     */
+    void clear_modified_rr_node_info() final {
+        this->modified_rr_node_inf_.clear();
+    }
+
+    /**
+     * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf
+     */
+    void reset_path_costs() final {
+        // Reset the node info stored in rr_node_route_inf variable
+        ::reset_path_costs(this->modified_rr_node_inf_);
+        // Reset the node (RCV-related) info stored inside the connection router
+        if (this->rcv_path_manager.is_enabled()) {
+            for (const auto& node : this->modified_rr_node_inf_) {
+                this->rcv_path_data[node] = nullptr;
+            }
+        }
+    }
+
+    /**
+     * @brief Enables or disables RCV in the connection router
+     * @note Enabling this will utilize extra path structures, as well as
+     * the RCV cost function. Ensure route budgets have been calculated
+     * before enabling this.
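+     *
+     * A minimal usage sketch (illustrative only; it assumes budgets come from the
+     * route_budgets class used elsewhere in this patch):
+     * @code
+     *   route_budgets budgeting_inf(net_list, is_flat);
+     *   // ... load or calculate the route budgets ...
+     *   router.set_rcv_enabled(budgeting_inf.if_set());
+     * @endcode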
+ * @param enable Whether enabling RCV or not + */ + void set_rcv_enabled(bool enable) final { + this->rcv_path_manager.set_enabled(enable); + if (enable) { + this->rcv_path_data.resize(this->rr_node_route_inf_.size()); + } + } + + /** + * @brief Finds shortest paths from the route tree rooted at rt_root to all sinks available + * @note Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree which + * is spatially close to the sink is added to the heap. + * @note If cost_params.astar_fac is set to 0, this effectively becomes Dijkstra's algorithm with a + * modified exit condition (runs until heap is empty). When using cost_params.astar_fac = 0, for + * efficiency the RouterLookahead used should be the NoOpLookahead. + * @note This routine is currently used only to generate information that may be helpful in debugging + * an architecture. + * @param rt_root RouteTreeNode describing the current routing state + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param router_stats Update router statistics + * @param conn_params Parameters to guide the routing of the given connection + * @return A vector where each element is a reachable sink + */ + vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode& rt_root, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) final; + + protected: + /** + * @brief Marks that data associated with rr_node 'inode' has + * been modified, and needs to be reset in reset_path_costs + */ + inline void add_to_mod_list(RRNodeId inode) { + if (std::isinf(this->rr_node_route_inf_[inode].path_cost)) { + this->modified_rr_node_inf_.push_back(inode); + } + } + + /** + * @brief Updates the route path to the node `cheapest.index` + * via the path from `from_node` via `cheapest.prev_edge` + */ + inline void update_cheapest(RTExploredNode& cheapest, const RRNodeId& from_node) { + const RRNodeId& inode = cheapest.index; + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; + this->rr_node_route_inf_[inode].path_cost = cheapest.total_cost; + this->rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; + + // Use the already created next path structure pointer when RCV is enabled + if (this->rcv_path_manager.is_enabled()) { + this->rcv_path_manager.move(this->rcv_path_data[inode], cheapest.path_data); + + this->rcv_path_data[inode]->path_rr = this->rcv_path_data[from_node]->path_rr; + this->rcv_path_data[inode]->edge = this->rcv_path_data[from_node]->edge; + this->rcv_path_data[inode]->path_rr.push_back(from_node); + this->rcv_path_data[inode]->edge.push_back(cheapest.prev_edge); + } + } + + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) final; + + /** + * @brief Expands this current node if it is a cheaper path + * @param from_node Current node ID being explored + * @param new_total_cost Identifier popped from the heap to detect if the element 
(pair) + * (from_node, new_total_cost) was the most recently pushed element for from_node + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_cheapest( + RRNodeId from_node, + float new_total_cost, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb); + + /** + * @brief Expands each neighbor of the current node in the wave expansion + * @param current Current node being explored + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_neighbours( + const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb); + + /** + * @brief Conditionally adds to_node to the router heap (via path from current.index via from_edge) + * @note RR nodes outside bounding box specified in bounding_box are not added to the heap. + * @param current Current node being explored + * @param from_edge Edge between the current node and the neighbor node + * @param to_node Neighbor node to be expanded + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_neighbour( + const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb); + + /** + * @brief Adds to_node to the heap, and also adds any nodes which are connected by non-configurable edges + * @param cost_params Cost function parameters + * @param current Current node being explored + * @param to_node Neighbor node to be expanded + * @param from_edge Edge between the current node and the neighbor node + * @param target_node Target node ID to route to + */ + void timing_driven_add_to_heap( + const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + RREdgeId from_edge, + RRNodeId target_node); + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) final; + + /** + * @brief Finds shortest paths from current heap to all nodes in the RR graph + * + * Since there is no single *target* node this uses Dijkstra's algorithm with + * a modified exit condition (runs until heap is empty). 
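+     *
+     * A minimal sketch of how the result is read (illustrative only; per the implementation,
+     * only the index and prev_edge fields of each element are meaningful):
+     * @code
+     *   auto paths = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box);
+     *   // paths[inode].index == RRNodeId::INVALID() means inode was never reached;
+     *   // otherwise paths[inode].prev_edge is the last edge of the best path found to inode.
+     * @endcode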
+ * + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return A vector where each element contains the shortest route to a specific sink node + */ + vtr::vector timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) final; + + /** Node IDs of modified nodes in rr_node_route_inf */ + std::vector modified_rr_node_inf_; +}; + +/** Construct a serial connection router that uses the specified heap type. + * This function is not used, but removing it will result in "undefined reference" + * errors since heap type specializations won't get emitted from serial_connection_router.cpp + * without it. + * The alternative is moving all SerialConnectionRouter fn implementations into the header. */ +std::unique_ptr make_serial_connection_router( + e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat); + +#endif /* _SERIAL_CONNECTION_ROUTER_H */ diff --git a/vpr/test/test_connection_router.cpp b/vpr/test/test_connection_router.cpp deleted file mode 100644 index 138e003b04e..00000000000 --- a/vpr/test/test_connection_router.cpp +++ /dev/null @@ -1,194 +0,0 @@ -#include -#include "catch2/catch_test_macros.hpp" - -#include "route_net.h" -#include "rr_graph_fwd.h" -#include "vpr_api.h" -#include "vpr_signal_handler.h" -#include "globals.h" -#include "net_delay.h" -#include "place_and_route.h" -#include "connection_router.h" -#include "router_delay_profiling.h" - -static constexpr const char kArchFile[] = "../../vtr_flow/arch/timing/k6_frac_N10_mem32K_40nm.xml"; -static constexpr int kMaxHops = 10; - -namespace { - -// Route from source_node to sink_node, returning either the delay, or infinity if unroutable. -static float do_one_route(RRNodeId source_node, - RRNodeId sink_node, - const t_det_routing_arch& det_routing_arch, - const t_router_opts& router_opts, - const std::vector& segment_inf) { - bool is_flat = router_opts.flat_routing; - auto& device_ctx = g_vpr_ctx.device(); - - RouteTree tree((RRNodeId(source_node))); - - // Update base costs according to fanout and criticality rules. - update_rr_base_costs(1); - - // Bounding box includes the entire grid. - t_bb bounding_box; - bounding_box.xmin = 0; - bounding_box.xmax = device_ctx.grid.width() + 1; - bounding_box.ymin = 0; - bounding_box.ymax = device_ctx.grid.height() + 1; - bounding_box.layer_min = 0; - bounding_box.layer_max = device_ctx.grid.get_num_layers() - 1; - - t_conn_cost_params cost_params; - cost_params.criticality = router_opts.max_criticality; - cost_params.astar_fac = router_opts.astar_fac; - cost_params.astar_offset = router_opts.astar_offset; - cost_params.bend_cost = router_opts.bend_cost; - - const Netlist<>& net_list = is_flat ? 
(const Netlist<>&)g_vpr_ctx.atom().netlist() : (const Netlist<>&)g_vpr_ctx.clustering().clb_nlist; - route_budgets budgeting_inf(net_list, is_flat); - - RouterStats router_stats; - auto router_lookahead = make_router_lookahead(det_routing_arch, - router_opts.lookahead_type, - router_opts.write_router_lookahead, - router_opts.read_router_lookahead, - segment_inf, - is_flat); - - ConnectionRouter router( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - g_vpr_ctx.mutable_routing().rr_node_route_inf, - is_flat); - - // Find the cheapest route if possible. - bool found_path; - RTExploredNode cheapest; - ConnectionParameters conn_params(ParentNetId::INVALID(), - -1, - false, - std::unordered_map()); - std::tie(found_path, std::ignore, cheapest) = router.timing_driven_route_connection_from_route_tree(tree.root(), - sink_node, - cost_params, - bounding_box, - router_stats, - conn_params); - - // Default delay is infinity, which indicates that a route was not found. - float delay = std::numeric_limits::infinity(); - if (found_path) { - // Check that the route goes to the requested sink. - REQUIRE(RRNodeId(cheapest.index) == sink_node); - - // Get the delay - vtr::optional rt_node_of_sink; - std::tie(std::ignore, rt_node_of_sink) = tree.update_from_heap(&cheapest, OPEN, nullptr, router_opts.flat_routing); - delay = rt_node_of_sink.value().Tdel; - } - - // Reset for the next router call. - router.reset_path_costs(); - return delay; -} - -// Find a source and a sink by walking edges. -std::tuple find_source_and_sink() { - auto& device_ctx = g_vpr_ctx.device(); - auto& rr_graph = device_ctx.rr_graph; - - // Current longest walk - std::tuple longest = std::make_tuple(RRNodeId::INVALID(), RRNodeId::INVALID(), 0); - - // Start from each RR node - for (size_t id = 0; id < rr_graph.num_nodes(); id++) { - RRNodeId source(id), sink = source; - for (int hops = 0; hops < kMaxHops; hops++) { - // Take the first edge, if there is one. - auto edge = rr_graph.node_first_edge(sink); - if (edge == rr_graph.node_last_edge(sink)) { - break; - } - sink = rr_graph.rr_nodes().edge_sink_node(edge); - - // If this is the new longest walk, store it. - if (hops > std::get<2>(longest)) { - longest = std::make_tuple(source, sink, hops); - } - } - } - return longest; -} - -// Test that the router can route nets individually, not considering congestion. -// This is a minimal timing driven routing test that can be used as documentation, -// and as a starting point for experimentation. -TEST_CASE("connection_router", "[vpr]") { - // Minimal setup - auto options = t_options(); - auto arch = t_arch(); - auto vpr_setup = t_vpr_setup(); - - vpr_install_signal_handler(); - vpr_initialize_logging(); - - // Command line arguments - const char* argv[] = { - "test_vpr", - kArchFile, - "wire.eblif", - "--route_chan_width", "100"}; - vpr_init(sizeof(argv) / sizeof(argv[0]), argv, &options, &vpr_setup, &arch); - - vpr_create_device_grid(vpr_setup, arch); - vpr_setup_clock_networks(vpr_setup, arch); - auto det_routing_arch = &vpr_setup.RoutingArch; - auto& router_opts = vpr_setup.RouterOpts; - e_graph_type graph_directionality; - - if (router_opts.route_type == GLOBAL) { - graph_directionality = e_graph_type::BIDIR; - } else { - graph_directionality = (det_routing_arch->directionality == BI_DIRECTIONAL ? 
e_graph_type::BIDIR : e_graph_type::UNIDIR); - } - - auto chan_width = init_chan(vpr_setup.RouterOpts.fixed_channel_width, arch.Chans, graph_directionality); - - alloc_routing_structs( - chan_width, - vpr_setup.RouterOpts, - &vpr_setup.RoutingArch, - vpr_setup.Segments, - arch.directs, - router_opts.flat_routing); - - // Find a source and sink to route - RRNodeId source_rr_node, sink_rr_node; - int hops; - std::tie(source_rr_node, sink_rr_node, hops) = find_source_and_sink(); - - // Check that the route will be non-trivial - REQUIRE(source_rr_node != sink_rr_node); - REQUIRE(hops >= 3); - - // Find the route - float delay = do_one_route(source_rr_node, - sink_rr_node, - vpr_setup.RoutingArch, - vpr_setup.RouterOpts, - vpr_setup.Segments); - - // Check that a route was found - REQUIRE(delay < std::numeric_limits::infinity()); - - // Clean up - free_routing_structs(); - vpr_free_all(arch, vpr_setup); -} - -} // namespace diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt index 1ccd16490d7..3ca35cef4c4 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt @@ -38,3 +38,6 @@ pass_requirements_file=pass_requirements.txt script_params_common=-track_memory_usage script_params_list_add = script_params_list_add = --router_algorithm parallel +script_params_list_add = --enable_parallel_connection_router on +script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16 +script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt index 39aa722daca..fa43c27af55 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt @@ -1,3 +1,6 @@ - arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time 
min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time - k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common 9.38 vpr 77.35 MiB -1 -1 0.36 22280 1 0.10 -1 -1 35580 -1 -1 12 130 0 -1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 79208 130 40 596 562 1 356 185 14 14 196 dsp_top auto 38.5 MiB 0.18 1862 38583 13232 21153 4198 77.4 MiB 0.24 0.00 5.12303 -624.562 -5.12303 5.12303 0.45 0.00115671 0.00104931 0.13445 0.124537 -1 -1 -1 -1 64 3969 9 4.93594e+06 1.0962e+06 976140. 4980.31 5.77 0.971386 0.907233 31408 195022 -1 3606 8 821 857 201107 78801 4.57723 4.57723 -666.876 -4.57723 0 0 1.23909e+06 6321.90 0.06 0.12 0.38 -1 -1 0.06 0.0628918 0.0600921 - k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--router_algorithm_parallel 7.77 vpr 77.61 MiB -1 -1 0.36 22212 1 0.08 -1 -1 35140 -1 -1 12 130 0 -1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 79472 130 40 596 562 1 356 185 14 14 196 dsp_top auto 38.6 MiB 0.18 1862 38583 13232 21153 4198 77.6 MiB 0.37 0.00 5.12303 -624.562 -5.12303 5.12303 0.55 0.00210597 0.00194049 0.204405 0.191731 -1 -1 -1 -1 64 3993 10 4.93594e+06 1.0962e+06 976140. 
4980.31 3.98 0.785401 0.735059 31408 195022 -1 3592 9 794 830 166912 64369 4.57723 4.57723 -658.916 -4.57723 0 0 1.23909e+06 6321.90 0.07 0.13 0.32 -1 -1 0.07 0.068841 0.0645644 +arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common 4.41 vpr 75.36 MiB -1 -1 0.19 17940 1 0.05 -1 -1 31600 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77168 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000974867 0.000904716 0.077352 0.0718699 -1 -1 -1 -1 82 3601 9 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.25 0.377802 0.348243 33448 250998 -1 3687 9 800 863 234820 89374 4.57723 4.57723 -726.049 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.07 0.22 -1 -1 0.04 0.0332833 0.0315356 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--router_algorithm_parallel 4.54 vpr 75.36 MiB -1 -1 0.19 17936 1 0.05 -1 -1 31376 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77168 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.1 MiB 0.11 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 
-649.023 -5.12303 5.12303 0.23 0.000981081 0.000906865 0.0786348 0.0731135 -1 -1 -1 -1 82 3585 15 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.38 0.403967 0.372656 33448 250998 -1 3715 9 792 819 214644 81314 4.57723 4.57723 -685.291 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.06 0.21 -1 -1 0.04 0.0307442 0.0291164 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on 4.52 vpr 75.37 MiB -1 -1 0.20 17940 1 0.05 -1 -1 31712 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77176 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000979303 0.00090991 0.077946 0.0724613 -1 -1 -1 -1 82 3581 10 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.36 0.375357 0.346115 33448 250998 -1 3699 9 747 819 220831 220831 4.57723 4.57723 -679.037 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.08 0.21 -1 -1 0.04 0.0313828 0.0297241 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 5.06 vpr 74.98 MiB -1 -1 0.19 17956 1 0.05 -1 -1 31380 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 76780 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.0 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000974681 0.00090508 0.0774429 0.0719617 -1 -1 -1 -1 82 3638 20 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.83 0.404389 0.372638 33448 250998 -1 3485 9 735 762 269282 269282 4.57723 4.57723 -660.925 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.14 0.21 -1 -1 0.04 0.0317757 0.0300764 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 5.20 vpr 74.15 MiB -1 -1 0.22 17572 1 0.06 -1 -1 31392 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 75932 130 40 596 562 1 356 185 14 14 196 dsp_top auto 35.1 MiB 0.11 3253 1906 39109 13750 20961 4398 74.2 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.23 0.000986009 0.000916173 0.0784056 0.0728838 -1 -1 -1 -1 82 3602 9 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.86 0.418875 0.386466 33448 250998 -1 3679 10 722 785 234800 89746 4.57723 4.57723 -676.631 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.12 0.21 -1 -1 0.04 0.0325316 0.0307593 diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt index d59d17d4831..122e16a14a2 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt @@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements.txt script_params_common=-track_memory_usage --route_chan_width 100 --max_router_iterations 100 --router_lookahead map --flat_routing on script_params_list_add = 
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
index e37401667f7..d308c81afd1 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
- k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common 11.85 vpr 79.08 MiB -1 -1 3.58 35500 16 0.65 -1 -1 38580 -1 -1 60 45 3 1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 80980 45 32 1192 1151 1 782 141 14 14 196 memory auto 40.0 MiB 3.23 6742 28689 8224 17037 3428 79.1 MiB 0.65 0.01 10.7103 -7090.32 -10.7103 10.7103 0.00 0.00310914 0.00279648 0.314019 0.270375 -1 -1 -1 -1 -1 10349 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.50 0.423776 0.367585 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
- k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--router_algorithm_parallel_--num_workers_4 12.82 vpr 78.98 MiB -1 -1 3.48 35500 16 0.73 -1 -1 38088 -1 -1 60 45 3 1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 80880 45 32 1192 1151 1 782 141 14 14 196 memory auto 40.1 MiB 3.28 6742 28689 8224 17037 3428 79.0 MiB 0.59 0.01 10.7103 -7090.32 -10.7103 10.7103 0.00 0.00230907 0.0018852 0.209392 0.171163 -1 -1 -1 -1 -1 10313 15 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 2.42 0.342057 0.287674 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common 7.09 vpr 77.94 MiB -1 -1 1.87 32304 16 0.41 -1 -1 34724 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79808 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.5 MiB 1.86 9794 6883 28689 8164 16986 3539 77.9 MiB 0.45 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00305795 0.00281495 0.226764 0.201621 -1 -1 -1 -1 -1 10585 12 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.17 0.283544 0.251205 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--router_algorithm_parallel_--num_workers_4 7.22 vpr 77.26 MiB -1 -1 2.04 31928 16 0.39 -1 -1 33776 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79116 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.74 9794 6883 28689 8164 16986 3539 77.3 MiB 0.44 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00285411 0.002454 0.233988 0.206778 -1 -1 -1 -1 -1 10620 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.24 0.301013 0.263687 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on 7.01 vpr 77.55 MiB -1 -1 1.84 32308 16 0.38 -1 -1 34724 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79412 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.77 9794 6883 28689 8164 16986 3539 77.6 MiB 0.41 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00217684 0.0019326 0.199185 0.177858 -1 -1 -1 -1 -1 10546 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.30 0.25973 0.230708 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 8.02 vpr 77.44 MiB -1 -1 1.84 31552 16 0.39 -1 -1 34440 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79296 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.1 MiB 1.70 9794 6883 28689 8164 16986 3539 77.4 MiB 0.36 0.00 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00187804 0.00164622 0.167362 0.14756 -1 -1 -1 -1 -1 10692 11 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 2.43 0.218922 0.192418 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 7.25 vpr 77.56 MiB -1 -1 1.84 32324 16 0.39 -1 -1 34576 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79424 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.70 9794 6883 28689 8164 16986 3539 77.6 MiB 0.36 0.00 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00188091 0.00164873 0.16884 0.149107 -1 -1 -1 -1 -1 10708 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.64 0.225256 0.197955 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
index dbceb44a4dc..09855147b8b 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
@@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements_multiclock.txt
 script_params_common=-starting_stage vpr -sdc_file tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/multiclock.sdc
 script_params_list_add =
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
index 7e566048732..2f7e01b0b8e 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params crit_path_delay_mcw clk_to_clk_cpd clk_to_clk2_cpd clk_to_input_cpd clk_to_output_cpd clk2_to_clk2_cpd clk2_to_clk_cpd clk2_to_input_cpd clk2_to_output_cpd input_to_input_cpd input_to_clk_cpd input_to_clk2_cpd input_to_output_cpd output_to_output_cpd output_to_clk_cpd output_to_clk2_cpd output_to_input_cpd clk_to_clk_setup_slack clk_to_clk2_setup_slack clk_to_input_setup_slack clk_to_output_setup_slack clk2_to_clk2_setup_slack clk2_to_clk_setup_slack clk2_to_input_setup_slack clk2_to_output_setup_slack input_to_input_setup_slack input_to_clk_setup_slack input_to_clk2_setup_slack input_to_output_setup_slack output_to_output_setup_slack output_to_clk_setup_slack output_to_clk2_setup_slack output_to_input_setup_slack clk_to_clk_hold_slack clk_to_clk2_hold_slack clk_to_input_hold_slack clk_to_output_hold_slack clk2_to_clk2_hold_slack clk2_to_clk_hold_slack clk2_to_input_hold_slack clk2_to_output_hold_slack input_to_input_hold_slack input_to_clk_hold_slack input_to_clk2_hold_slack input_to_output_hold_slack output_to_output_hold_slack output_to_clk_hold_slack output_to_clk2_hold_slack output_to_input_hold_slack
- k6_frac_N10_mem32K_40nm.xml multiclock.blif common 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
- k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--router_algorithm_parallel_--num_workers_4 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.14847 -1 1.95678 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.38647 -1 -1.28959 -1 -1 -1 -1
+arch circuit script_params crit_path_delay_mcw clk_to_clk_cpd clk_to_clk2_cpd clk_to_input_cpd clk_to_output_cpd clk2_to_clk2_cpd clk2_to_clk_cpd clk2_to_input_cpd clk2_to_output_cpd input_to_input_cpd input_to_clk_cpd input_to_clk2_cpd input_to_output_cpd output_to_output_cpd output_to_clk_cpd output_to_clk2_cpd output_to_input_cpd clk_to_clk_setup_slack clk_to_clk2_setup_slack clk_to_input_setup_slack clk_to_output_setup_slack clk2_to_clk2_setup_slack clk2_to_clk_setup_slack clk2_to_input_setup_slack clk2_to_output_setup_slack input_to_input_setup_slack input_to_clk_setup_slack input_to_clk2_setup_slack input_to_output_setup_slack output_to_output_setup_slack output_to_clk_setup_slack output_to_clk2_setup_slack output_to_input_setup_slack clk_to_clk_hold_slack clk_to_clk2_hold_slack clk_to_input_hold_slack clk_to_output_hold_slack clk2_to_clk2_hold_slack clk2_to_clk_hold_slack clk2_to_input_hold_slack clk2_to_output_hold_slack input_to_input_hold_slack input_to_clk_hold_slack input_to_clk2_hold_slack input_to_output_hold_slack output_to_output_hold_slack output_to_clk_hold_slack output_to_clk2_hold_slack output_to_input_hold_slack
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--router_algorithm_parallel_--num_workers_4 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.14847 -1 1.95678 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.38647 -1 -1.28959 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
index dac263af64c..1ec5bc88ec3 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
@@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements.txt
 script_params_common = -track_memory_usage
 script_params_list_add =
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
index b003134057c..5cbe4ea049b 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
- k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common 2.63 vpr 68.02 MiB -1 -1 0.39 22168 3 0.11 -1 -1 36800 -1 -1 68 99 1 0 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 69656 99 130 344 474 1 227 298 12 12 144 clb auto 28.7 MiB 0.20 673 63978 19550 30341 14087 68.0 MiB 0.23 0.00 1.86472 -118.834 -1.86472 1.86472 0.15 0.000594963 0.000540506 0.0732034 0.0668337 -1 -1 -1 -1 38 1389 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.54 0.213559 0.195205 12522 62564 -1 1116 11 409 682 22304 6997 1.90702 1.90702 -133.281 -1.90702 -1.20917 -0.320482 406292. 2821.48 0.02 0.04 0.08 -1 -1 0.02 0.0300207 0.027912
- k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--router_algorithm_parallel_--num_workers_4 2.86 vpr 68.12 MiB -1 -1 0.35 22168 3 0.11 -1 -1 36740 -1 -1 68 99 1 0 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 69760 99 130 344 474 1 227 298 12 12 144 clb auto 28.7 MiB 0.20 673 63978 19550 30341 14087 68.1 MiB 0.27 0.00 1.86472 -118.834 -1.86472 1.86472 0.21 0.000644886 0.000574461 0.100184 0.0946805 -1 -1 -1 -1 38 1379 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.64 0.202724 0.187418 12522 62564 -1 1115 10 390 630 21561 6939 1.90702 1.90702 -131.117 -1.90702 -1.20917 -0.320482 406292. 2821.48 0.02 0.04 0.10 -1 -1 0.02 0.021384 0.0193317
+arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common 1.80 vpr 66.19 MiB -1 -1 0.22 18464 3 0.07 -1 -1 32740 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67776 99 130 344 474 1 227 298 12 12 144 clb auto 26.8 MiB 0.11 1695 684 72933 23047 34243 15643 66.2 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000573836 0.000534893 0.0449119 0.0418379 -1 -1 -1 -1 38 1437 13 5.66058e+06 4.21279e+06 319130. 2216.18 0.35 0.164006 0.149599 12522 62564 -1 1141 11 437 710 29360 10219 1.94502 1.94502 -130.926 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0193638 0.0180814
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--router_algorithm_parallel_--num_workers_4 1.78 vpr 66.56 MiB -1 -1 0.22 18460 3 0.07 -1 -1 33112 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 68160 99 130 344 474 1 227 298 12 12 144 clb auto 26.8 MiB 0.11 1695 684 72933 23047 34243 15643 66.6 MiB 0.14 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000665545 0.000617009 0.0528489 0.0485457 -1 -1 -1 -1 38 1420 13 5.66058e+06 4.21279e+06 319130. 2216.18 0.32 0.145628 0.130364 12522 62564 -1 1150 9 446 701 30426 10498 1.94502 1.94502 -131.108 -1.94502 -0.67939 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0162708 0.0148159
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on 1.83 vpr 65.93 MiB -1 -1 0.22 18452 3 0.07 -1 -1 32968 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67512 99 130 344 474 1 227 298 12 12 144 clb auto 26.1 MiB 0.11 1695 684 72933 23047 34243 15643 65.9 MiB 0.14 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.00057601 0.000537702 0.0453956 0.0423047 -1 -1 -1 -1 38 1417 9 5.66058e+06 4.21279e+06 319130. 2216.18 0.35 0.161904 0.148079 12522 62564 -1 1152 9 438 706 28653 28653 1.94502 1.94502 -129.801 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0178252 0.0167004
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 1.93 vpr 65.91 MiB -1 -1 0.23 18864 3 0.07 -1 -1 32740 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67492 99 130 344 474 1 227 298 12 12 144 clb auto 26.1 MiB 0.11 1695 684 72933 23047 34243 15643 65.9 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000571739 0.000533473 0.0449704 0.0418885 -1 -1 -1 -1 38 1403 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.44 0.164243 0.150129 12522 62564 -1 1126 9 422 667 50272 50272 1.94502 1.94502 -131.371 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.05 0.04 -1 -1 0.01 0.0176487 0.0165339
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 1.95 vpr 66.02 MiB -1 -1 0.22 18476 3 0.07 -1 -1 32744 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67608 99 130 344 474 1 227 298 12 12 144 clb auto 26.4 MiB 0.15 1695 684 72933 23047 34243 15643 66.0 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000573161 0.000534076 0.0449252 0.0418347 -1 -1 -1 -1 38 1408 11 5.66058e+06 4.21279e+06 319130. 2216.18 0.40 0.162733 0.14864 12522 62564 -1 1155 11 436 701 33997 11289 1.94502 1.94502 -131.251 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.05 0.05 -1 -1 0.01 0.022968 0.0212952
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
index 17b20f60f24..6af15346384 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
@@ -31,3 +31,9 @@ script_params_list_add = --timing_update_type incremental
 script_params_list_add = --timing_update_type incremental --quench_recompute_divider 999999999 #Do post-move incremental STA during quench
 script_params_list_add = --timing_update_type incremental --router_algorithm parallel --num_workers 4 # rarely exercised code path
 script_params_list_add = --timing_update_type full --router_algorithm parallel --num_workers 4
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on