diff --git a/doc/src/api/vprinternals/router_connection_router.rst b/doc/src/api/vprinternals/router_connection_router.rst new file mode 100644 index 00000000000..32a7c7dc673 --- /dev/null +++ b/doc/src/api/vprinternals/router_connection_router.rst @@ -0,0 +1,18 @@ +========== +Connection Router +========== + +ConnectionRouter +--------- +.. doxygenfile:: connection_router.h + :project: vpr + +SerialConnectionRouter +---------- +.. doxygenclass:: SerialConnectionRouter + :project: vpr + +ParallelConnectionRouter +---------- +.. doxygenclass:: ParallelConnectionRouter + :project: vpr diff --git a/doc/src/api/vprinternals/vpr_router.rst b/doc/src/api/vprinternals/vpr_router.rst index 63624cd8b39..5e72894aba7 100644 --- a/doc/src/api/vprinternals/vpr_router.rst +++ b/doc/src/api/vprinternals/vpr_router.rst @@ -9,3 +9,4 @@ VPR Router router_heap router_lookahead + router_connection_router diff --git a/doc/src/vpr/command_line_usage.rst b/doc/src/vpr/command_line_usage.rst index f21ee85f1eb..35be3118b40 100644 --- a/doc/src/vpr/command_line_usage.rst +++ b/doc/src/vpr/command_line_usage.rst @@ -47,12 +47,12 @@ By default VPR will perform a binary search routing to find the minimum channel Detailed Command-line Options ----------------------------- -VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information. +VPR has a lot of options. Running :option:`vpr --help` will display all the available options and their usage information. .. option:: -h, --help Display help message then exit. - + The options most people will be interested in are: * :option:`--route_chan_width` (route at a fixed channel width), and @@ -208,7 +208,7 @@ General Options * Any string matching ``name`` attribute of a device layout defined with a ```` tag in the :ref:`arch_grid_layout` section of the architecture file. If the value specified is neither ``auto`` nor matches the ``name`` attribute value of a ```` tag, VPR issues an error. - + .. note:: If the only layout in the architecture file is a single device specified using ````, it is recommended to always specify the ``--device`` option; this prevents the value ``--device auto`` from interfering with operations supported only for ```` grids. **Default:** ``auto`` @@ -900,7 +900,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe .. option:: --place_agent_algorithm {e_greedy | softmax} - Controls which placement RL agent is used. + Controls which placement RL agent is used. **Default:** ``softmax`` @@ -922,10 +922,10 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe .. option:: --place_reward_fun {basic | nonPenalizing_basic | runtime_aware | WLbiased_runtime_aware} - The reward function used by the placement RL agent to learn the best action at each anneal stage. + The reward function used by the placement RL agent to learn the best action at each anneal stage. + + .. note:: The latter two are only available for timing-driven placement. - .. note:: The latter two are only available for timing-driven placement. - **Default:** ``WLbiased_runtime_aware`` .. option:: --place_agent_space {move_type | move_block_type} @@ -935,20 +935,20 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe **Default:** ``move_block_type`` .. option:: --place_quench_only {on | off} - + If this option is set to ``on``, the placement will skip the annealing phase and only perform the placement quench. 
- This option is useful when the the quality of initial placement is good enough and there is no need to perform the + This option is useful when the the quality of initial placement is good enough and there is no need to perform the annealing phase. **Default:** ``off`` .. option:: --placer_debug_block - + .. note:: This option is likely only of interest to developers debugging the placement algorithm - Controls which block the placer produces detailed debug information for. - + Controls which block the placer produces detailed debug information for. + If the block being moved has the same ID as the number assigned to this parameter, the placer will print debugging information about it. * For values >= 0, the value is the block ID for which detailed placer debug information should be produced. @@ -960,7 +960,7 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe **Default:** ``-2`` .. option:: --placer_debug_net - + .. note:: This option is likely only of interest to developers debugging the placement algorithm Controls which net the placer produces detailed debug information for. @@ -1004,7 +1004,7 @@ The following options are only valid when the placement engine is in timing-driv .. option:: --quench_recompute_divider - Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench. + Controls how many times the placer performs a timing analysis to update its criticality estimates during a quench. If unspecified, uses the value from --inner_loop_recompute_divider. **Default:** ``0`` @@ -1088,7 +1088,7 @@ The following options are only valid when the placement engine is in timing-driv NoC Options ^^^^^^^^^^^^^^ -The following options are only used when FPGA device and netlist contain a NoC router. +The following options are only used when FPGA device and netlist contain a NoC router. .. option:: --noc {on | off} @@ -1098,7 +1098,7 @@ The following options are only used when FPGA device and netlist contain a NoC r **Default:** ``off`` .. option:: --noc_flows_file - + XML file containing the list of traffic flows within the NoC (communication between routers). .. note:: noc_flows_file are required to specify if NoC optimization is turned on (--noc on). @@ -1106,7 +1106,7 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_routing_algorithm {xy_routing | bfs_routing | west_first_routing | north_last_routing | negative_first_routing | odd_even_routing} Controls the algorithm used by the NoC to route packets. - + * ``xy_routing`` Uses the direction oriented routing algorithm. This is recommended to be used with mesh NoC topologies. * ``bfs_routing`` Uses the breadth first search algorithm. The objective is to find a route that uses a minimum number of links. This algorithm is not guaranteed to generate deadlock-free traffic flow routes, but can be used with any NoC topology. * ``west_first_routing`` Uses the west-first routing algorithm. This is recommended to be used with mesh NoC topologies. @@ -1119,11 +1119,11 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_placement_weighting Controls the importance of the NoC placement parameters relative to timing and wirelength of the design. - + * ``noc_placement_weighting = 0`` means the placement is based solely on timing and wirelength. * ``noc_placement_weighting = 1`` means noc placement is considered equal to timing and wirelength. 
* ``noc_placement_weighting > 1`` means the placement is increasingly dominated by NoC parameters. - + **Default:** ``5.0`` .. option:: --noc_aggregate_bandwidth_weighting @@ -1141,7 +1141,7 @@ The following options are only used when FPGA device and netlist contain a NoC r Other positive numbers specify the importance of meeting latency constraints compared to other NoC-related cost terms. Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and only their relative ratios determine the importance of each cost term. - + **Default:** ``0.6`` .. option:: --noc_latency_weighting @@ -1151,7 +1151,7 @@ The following options are only used when FPGA device and netlist contain a NoC r Other positive numbers specify the importance of minimizing aggregate latency compared to other NoC-related cost terms. Weighting factors for NoC-related cost terms are normalized internally. Therefore, their absolute values are not important, and only their relative ratios determine the importance of each cost term. - + **Default:** ``0.02`` .. option:: --noc_congestion_weighting @@ -1167,11 +1167,11 @@ The following options are only used when FPGA device and netlist contain a NoC r .. option:: --noc_swap_percentage Sets the minimum fraction of swaps attempted by the placer that are NoC blocks. - This value is an integer ranging from [0-100]. - - * ``0`` means NoC blocks will be moved at the same rate as other blocks. + This value is an integer ranging from [0-100]. + + * ``0`` means NoC blocks will be moved at the same rate as other blocks. * ``100`` means all swaps attempted by the placer are NoC router blocks. - + **Default:** ``0`` .. option:: --noc_placement_file_name @@ -1257,7 +1257,7 @@ Analytical Placement is generally split into three stages: * ``none`` Do not use any Detailed Placer. - * ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer. + * ``annealer`` Use the Annealer from the Placement stage as a Detailed Placer. This will use the same Placer Options from the Place stage to configure the annealer. **Default:** ``annealer`` @@ -1343,8 +1343,8 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout .. option:: --max_pres_fac - Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow. - Smaller values may help prevent circuitous routing in difficult routing problems, but may increase + Sets the maximum present overuse penalty factor that can ever result during routing. Should always be less than 1e25 or so to prevent overflow. + Smaller values may help prevent circuitous routing in difficult routing problems, but may increase the number of routing iterations needed and hence runtime. **Default:** ``1000.0`` @@ -1423,7 +1423,7 @@ VPR uses a negotiated congestion algorithm (based on Pathfinder) to perform rout .. option:: --router_algorithm {timing_driven | parallel | parallel_decomp} - Selects which router algorithm to use. + Selects which router algorithm to use. * ``timing_driven`` is the default single-threaded PathFinder algorithm. @@ -1505,13 +1505,90 @@ The following options are only valid when the router is in timing-driven mode (t **Default:** ``0.0`` .. 
option:: --router_profiler_astar_fac - + Controls the directedness of the timing-driven router's exploration when doing router delay profiling of an architecture. The router delay profiling step is currently used to calculate the place delay matrix lookup. Values between 1 and 2 are resonable; higher values trade some quality for reduced run-time. **Default:** ``1.2`` +.. option:: --enable_parallel_connection_router {on | off} + + Controls whether the MultiQueue-based parallel connection router is used during a single connection routing. + + When enabled, the parallel connection router accelerates the path search for individual source-sink connections using + multi-threading without altering the net routing order. + + **Default:** ``off`` + +.. option:: --post_target_prune_fac + + Controls the post-target pruning heuristic calculation in the parallel connection router. + + This parameter is used as a multiplicative factor applied to the VPR heuristic (not guaranteed to be admissible, i.e., + might over-predict the cost to the sink) to calculate the 'stopping heuristic' when pruning nodes after the target has + been reached. The 'stopping heuristic' must be admissible for the path search algorithm to guarantee optimal paths and + be deterministic. + + Values of this parameter are architecture-specific and have to be empirically found. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``1.2`` + +.. option:: --post_target_prune_offset + + Controls the post-target pruning heuristic calculation in the parallel connection router. + + This parameter is used as a subtractive offset together with :option:`--post_target_prune_fac` to apply an affine + transformation on the VPR heuristic to calculate the 'stopping heuristic'. The 'stopping heuristic' must be admissible + for the path search algorithm to guarantee optimal paths and be deterministic. + + Values of this parameter are architecture-specific and have to be empirically found. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``0.0`` + +.. option:: --multi_queue_num_threads + + Controls the number of threads used by MultiQueue-based parallel connection router. + + If not explicitly specified, defaults to 1, implying the parallel connection router works in 'serial' mode using only + one main thread to route. + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``1`` + +.. option:: --multi_queue_num_queues + + Controls the number of queues used by MultiQueue in the parallel connection router. + + Must be set >= 2. A common configuration for this parameter is the number of threads used by MultiQueue * 4 (the number + of queues per thread). + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``2`` + +.. option:: --multi_queue_direct_draining {on | off} + + Controls whether to enable queue draining optimization for MultiQueue-based parallel connection router. + + When enabled, queues can be emptied quickly by draining all elements if no further solutions need to be explored after + the target is reached in the path search. 
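+
+   To make the interplay of these options concrete, the sketch below shows one way the post-target pruning check
+   described under :option:`--post_target_prune_fac` and :option:`--post_target_prune_offset` could be expressed.
+   It is purely illustrative (hypothetical helper names, not the actual code in ``parallel_connection_router.h``):
+
+   .. code-block:: cpp
+
+      // 'Stopping heuristic': an affine transformation of the (possibly inadmissible) VPR lookahead estimate.
+      float stopping_heuristic(float lookahead_cost, float prune_fac, float prune_offset) {
+          return prune_fac * lookahead_cost - prune_offset;
+      }
+
+      // Once the target has been reached with cost 'best_sink_cost', a popped node can be pruned when even
+      // this optimistic completion estimate cannot improve on that cost; direct draining applies the same
+      // idea wholesale to the remaining queue contents.
+      bool prune_after_target(float backward_cost, float lookahead_cost, float best_sink_cost,
+                              float prune_fac, float prune_offset) {
+          return backward_cost + stopping_heuristic(lookahead_cost, prune_fac, prune_offset) >= best_sink_cost;
+      }
+
+   For example, a four-thread run following the queues-per-thread guideline above might use
+   ``--enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16``.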
+ + Note: For this optimization to maintain optimality and deterministic results, the 'ordering heuristic' (calculated by + :option:`--astar_fac` and :option:`--astar_offset`) must be admissible to ensure emptying queues of entries with higher + costs does not prune possibly superior solutions. However, you can still enable this optimization regardless of whether + optimality and determinism are required for your specific use case (in such cases, the 'ordering heuristic' can be + inadmissible). + + This parameter has no effect if :option:`--enable_parallel_connection_router` is not set. + + **Default:** ``off`` + .. option:: --max_criticality Sets the maximum fraction of routing cost that can come from delay (vs. coming from routability) for any net. diff --git a/utils/route_diag/src/main.cpp b/utils/route_diag/src/main.cpp index 5074d79cc09..7d4edbe603a 100644 --- a/utils/route_diag/src/main.cpp +++ b/utils/route_diag/src/main.cpp @@ -97,7 +97,7 @@ static void do_one_route(const Netlist<>& net_list, segment_inf, is_flat); - ConnectionRouter router( + SerialConnectionRouter router( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), diff --git a/vpr/src/base/SetupVPR.cpp b/vpr/src/base/SetupVPR.cpp index e645b35e538..bf978995134 100644 --- a/vpr/src/base/SetupVPR.cpp +++ b/vpr/src/base/SetupVPR.cpp @@ -434,6 +434,12 @@ static void SetupRouterOpts(const t_options& Options, t_router_opts* RouterOpts) RouterOpts->astar_fac = Options.astar_fac; RouterOpts->astar_offset = Options.astar_offset; RouterOpts->router_profiler_astar_fac = Options.router_profiler_astar_fac; + RouterOpts->enable_parallel_connection_router = Options.enable_parallel_connection_router; + RouterOpts->post_target_prune_fac = Options.post_target_prune_fac; + RouterOpts->post_target_prune_offset = Options.post_target_prune_offset; + RouterOpts->multi_queue_num_threads = Options.multi_queue_num_threads; + RouterOpts->multi_queue_num_queues = Options.multi_queue_num_queues; + RouterOpts->multi_queue_direct_draining = Options.multi_queue_direct_draining; RouterOpts->bb_factor = Options.bb_factor; RouterOpts->criticality_exp = Options.criticality_exp; RouterOpts->max_criticality = Options.max_criticality; diff --git a/vpr/src/base/ShowSetup.cpp b/vpr/src/base/ShowSetup.cpp index f21200e97ee..0a326265ed8 100644 --- a/vpr/src/base/ShowSetup.cpp +++ b/vpr/src/base/ShowSetup.cpp @@ -379,6 +379,12 @@ static void ShowRouterOpts(const t_router_opts& RouterOpts) { VTR_LOG("RouterOpts.astar_fac: %f\n", RouterOpts.astar_fac); VTR_LOG("RouterOpts.astar_offset: %f\n", RouterOpts.astar_offset); VTR_LOG("RouterOpts.router_profiler_astar_fac: %f\n", RouterOpts.router_profiler_astar_fac); + VTR_LOG("RouterOpts.enable_parallel_connection_router: %s\n", RouterOpts.enable_parallel_connection_router ? "true" : "false"); + VTR_LOG("RouterOpts.post_target_prune_fac: %f\n", RouterOpts.post_target_prune_fac); + VTR_LOG("RouterOpts.post_target_prune_offset: %f\n", RouterOpts.post_target_prune_offset); + VTR_LOG("RouterOpts.multi_queue_num_threads: %d\n", RouterOpts.multi_queue_num_threads); + VTR_LOG("RouterOpts.multi_queue_num_queues: %d\n", RouterOpts.multi_queue_num_queues); + VTR_LOG("RouterOpts.multi_queue_direct_draining: %s\n", RouterOpts.multi_queue_direct_draining ? 
"true" : "false"); VTR_LOG("RouterOpts.criticality_exp: %f\n", RouterOpts.criticality_exp); VTR_LOG("RouterOpts.max_criticality: %f\n", RouterOpts.max_criticality); VTR_LOG("RouterOpts.init_wirelength_abort_threshold: %f\n", RouterOpts.init_wirelength_abort_threshold); diff --git a/vpr/src/base/read_options.cpp b/vpr/src/base/read_options.cpp index 4451dd720cd..7d10ff7e74b 100644 --- a/vpr/src/base/read_options.cpp +++ b/vpr/src/base/read_options.cpp @@ -2716,6 +2716,66 @@ argparse::ArgumentParser create_arg_parser(const std::string& prog_name, t_optio .default_value("1.2") .show_in(argparse::ShowIn::HELP_ONLY); + route_timing_grp.add_argument(args.enable_parallel_connection_router, "--enable_parallel_connection_router") + .help( + "Controls whether the MultiQueue-based parallel connection router is used during a single connection" + " routing. When enabled, the parallel connection router accelerates the path search for individual" + " source-sink connections using multi-threading without altering the net routing order.") + .default_value("off") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.post_target_prune_fac, "--post_target_prune_fac") + .help( + "Controls the post-target pruning heuristic calculation in the parallel connection router." + " This parameter is used as a multiplicative factor applied to the VPR heuristic" + " (not guaranteed to be admissible, i.e., might over-predict the cost to the sink)" + " to calculate the 'stopping heuristic' when pruning nodes after the target has been" + " reached. The 'stopping heuristic' must be admissible for the path search algorithm" + " to guarantee optimal paths and be deterministic. Values of this parameter are" + " architecture-specific and have to be empirically found." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("1.2") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.post_target_prune_offset, "--post_target_prune_offset") + .help( + "Controls the post-target pruning heuristic calculation in the parallel connection router." + " This parameter is used as a subtractive offset together with --post_target_prune_fac" + " to apply an affine transformation on the VPR heuristic to calculate the 'stopping" + " heuristic'. The 'stopping heuristic' must be admissible for the path search" + " algorithm to guarantee optimal paths and be deterministic. Values of this" + " parameter are architecture-specific and have to be empirically found." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("0.0") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_num_threads, "--multi_queue_num_threads") + .help( + "Controls the number of threads used by MultiQueue-based parallel connection router." + " If not explicitly specified, defaults to 1, implying the parallel connection router" + " works in 'serial' mode using only one main thread to route." + " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("1") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_num_queues, "--multi_queue_num_queues") + .help( + "Controls the number of queues used by MultiQueue in the parallel connection router." + " Must be set >= 2. A common configuration for this parameter is the number of threads" + " used by MultiQueue * 4 (the number of queues per thread)." 
+ " This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("2") + .show_in(argparse::ShowIn::HELP_ONLY); + + route_timing_grp.add_argument(args.multi_queue_direct_draining, "--multi_queue_direct_draining") + .help( + "Controls whether to enable queue draining optimization for MultiQueue-based parallel connection" + " router. When enabled, queues can be emptied quickly by draining all elements if no further" + " solutions need to be explored in the path search to guarantee optimality or determinism after" + " reaching the target. This parameter has no effect if --enable_parallel_connection_router is not set.") + .default_value("off") + .show_in(argparse::ShowIn::HELP_ONLY); + route_timing_grp.add_argument(args.max_criticality, "--max_criticality") .help( "Sets the maximum fraction of routing cost derived from delay (vs routability) for any net." diff --git a/vpr/src/base/read_options.h b/vpr/src/base/read_options.h index a71ba63428a..591478b7cfd 100644 --- a/vpr/src/base/read_options.h +++ b/vpr/src/base/read_options.h @@ -233,6 +233,12 @@ struct t_options { argparse::ArgValue astar_fac; argparse::ArgValue astar_offset; argparse::ArgValue router_profiler_astar_fac; + argparse::ArgValue enable_parallel_connection_router; + argparse::ArgValue post_target_prune_fac; + argparse::ArgValue post_target_prune_offset; + argparse::ArgValue multi_queue_num_threads; + argparse::ArgValue multi_queue_num_queues; + argparse::ArgValue multi_queue_direct_draining; argparse::ArgValue max_criticality; argparse::ArgValue criticality_exp; argparse::ArgValue router_init_wirelength_abort_threshold; diff --git a/vpr/src/base/vpr_types.h b/vpr/src/base/vpr_types.h index ddbcb59b08e..985e7a9f58f 100644 --- a/vpr/src/base/vpr_types.h +++ b/vpr/src/base/vpr_types.h @@ -1213,6 +1213,12 @@ struct t_router_opts { float astar_fac; float astar_offset; float router_profiler_astar_fac; + bool enable_parallel_connection_router; + float post_target_prune_fac; + float post_target_prune_offset; + int multi_queue_num_threads; + int multi_queue_num_queues; + bool multi_queue_direct_draining; float max_criticality; float criticality_exp; float init_wirelength_abort_threshold; diff --git a/vpr/src/route/DecompNetlistRouter.h b/vpr/src/route/DecompNetlistRouter.h index a41d656c240..e670bc5597d 100644 --- a/vpr/src/route/DecompNetlistRouter.h +++ b/vpr/src/route/DecompNetlistRouter.h @@ -85,11 +85,11 @@ class DecompNetlistRouter : public NetlistRouter { /** A single task to route nets inside a PartitionTree node and add tasks for its child nodes to task group \p g. */ void route_partition_tree_node(tbb::task_group& g, PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + SerialConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( + return SerialConnectionRouter( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), @@ -101,8 +101,8 @@ class DecompNetlistRouter : public NetlistRouter { } /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ - /** Per-thread storage for ConnectionRouters. */ - tbb::enumerable_thread_specific> _routers_th; + /** Per-thread storage for SerialConnectionRouter. 
*/ + tbb::enumerable_thread_specific> _routers_th; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git a/vpr/src/route/DecompNetlistRouter.tpp b/vpr/src/route/DecompNetlistRouter.tpp index 228cf428ef6..21d800ec0b3 100644 --- a/vpr/src/route/DecompNetlistRouter.tpp +++ b/vpr/src/route/DecompNetlistRouter.tpp @@ -204,12 +204,12 @@ void DecompNetlistRouter::route_partition_tree_node(tbb::task_group& g route_ctx.route_bb[net_id], false); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ _results_th.local().is_routable = false; return; } if (flags.retry_with_full_bb) { - /* ConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ + /*SerialConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ route_ctx.route_bb[net_id] = full_device_bb(); _results_th.local().bb_updated_nets.push_back(net_id); /* Disable decomposition for nets like this: they're already problematic */ diff --git a/vpr/src/route/NestedNetlistRouter.h b/vpr/src/route/NestedNetlistRouter.h index 6870842af8f..e776d0a42da 100644 --- a/vpr/src/route/NestedNetlistRouter.h +++ b/vpr/src/route/NestedNetlistRouter.h @@ -4,6 +4,9 @@ #include "netlist_routers.h" #include "vtr_optional.h" #include "vtr_thread_pool.h" +#include "serial_connection_router.h" +#include "parallel_connection_router.h" +#include #include /* Add cmd line option for this later */ @@ -67,19 +70,38 @@ class NestedNetlistRouter : public NetlistRouter { /** Route all nets in a PartitionTree node and add its children to the task queue. */ void route_partition_tree_node(PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + std::unique_ptr _make_router(const RouterLookahead* router_lookahead, + const t_router_opts& router_opts, + bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - route_ctx.rr_node_route_inf, - is_flat); + if (!router_opts.enable_parallel_connection_router) { + // Serial Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat); + } else { + // Parallel Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat, + router_opts.multi_queue_num_threads, + router_opts.multi_queue_num_queues, + router_opts.multi_queue_direct_draining); + } } /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ @@ -109,19 +131,19 @@ class NestedNetlistRouter : public NetlistRouter { /* Thread-local storage. * These are maps because thread::id is a random integer instead of 1, 2, ... */ - std::unordered_map> _routers_th; + std::unordered_map> _routers_th; std::unordered_map _results_th; std::mutex _storage_mutex; /** Get a thread-local ConnectionRouter. 
We lock the id->router lookup, but this is * accessed once per partition so the overhead should be small */ - ConnectionRouter& get_thread_router() { + ConnectionRouterInterface& get_thread_router() { auto id = std::this_thread::get_id(); std::lock_guard lock(_storage_mutex); if (!_routers_th.count(id)) { - _routers_th.emplace(id, _make_router(_router_lookahead, _is_flat)); + _routers_th.emplace(id, _make_router(_router_lookahead, _router_opts, _is_flat)); } - return _routers_th.at(id); + return *_routers_th.at(id); } RouteIterResults& get_thread_results() { diff --git a/vpr/src/route/NestedNetlistRouter.tpp b/vpr/src/route/NestedNetlistRouter.tpp index 333be28ea3b..ec4b1fe0aa6 100644 --- a/vpr/src/route/NestedNetlistRouter.tpp +++ b/vpr/src/route/NestedNetlistRouter.tpp @@ -66,10 +66,9 @@ void NestedNetlistRouter::route_partition_tree_node(PartitionTreeNode& /* Route all nets in this node serially */ for (auto net_id : nets) { auto& results = get_thread_results(); - auto& router = get_thread_router(); auto flags = route_net( - router, + get_thread_router(), _net_list, net_id, _itry, @@ -131,7 +130,7 @@ void NestedNetlistRouter::handle_bb_updated_nets(const std::vector void NestedNetlistRouter::set_rcv_enabled(bool x) { for (auto& [_, router] : _routers_th) { - router.set_rcv_enabled(x); + router->set_rcv_enabled(x); } } diff --git a/vpr/src/route/ParallelNetlistRouter.h b/vpr/src/route/ParallelNetlistRouter.h index e77fdf8344e..68b240321b2 100644 --- a/vpr/src/route/ParallelNetlistRouter.h +++ b/vpr/src/route/ParallelNetlistRouter.h @@ -15,7 +15,7 @@ #include /** Parallel impl for NetlistRouter. - * Holds enough context members to glue together ConnectionRouter and net routing functions, + * Holds enough context members to glue together SerialConnectionRouter and net routing functions, * such as \ref route_net. Keeps the members in thread-local storage where needed, * i.e. ConnectionRouters and RouteIterResults-es. * See \ref route_net. */ @@ -62,11 +62,11 @@ class ParallelNetlistRouter : public NetlistRouter { /** A single task to route nets inside a PartitionTree node and add tasks for its child nodes to task group \p g. */ void route_partition_tree_node(tbb::task_group& g, PartitionTreeNode& node); - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + SerialConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( + return SerialConnectionRouter( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), @@ -79,7 +79,7 @@ class ParallelNetlistRouter : public NetlistRouter { /* Context fields. Most of them will be forwarded to route_net (see route_net.tpp) */ /** Per-thread storage for ConnectionRouters. 
*/ - tbb::enumerable_thread_specific> _routers_th; + tbb::enumerable_thread_specific> _routers_th; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git a/vpr/src/route/ParallelNetlistRouter.tpp b/vpr/src/route/ParallelNetlistRouter.tpp index c845be8518d..dfdbac0cc29 100644 --- a/vpr/src/route/ParallelNetlistRouter.tpp +++ b/vpr/src/route/ParallelNetlistRouter.tpp @@ -79,12 +79,12 @@ void ParallelNetlistRouter::route_partition_tree_node(tbb::task_group& route_ctx.route_bb[net_id]); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ _results_th.local().is_routable = false; return; } if (flags.retry_with_full_bb) { - /* ConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ + /* SerialConnectionRouter thinks we should grow the BB. Do that and leave this net unrouted for now */ route_ctx.route_bb[net_id] = full_device_bb(); _results_th.local().bb_updated_nets.push_back(net_id); continue; diff --git a/vpr/src/route/SerialNetlistRouter.h b/vpr/src/route/SerialNetlistRouter.h index 352de125b68..d56414d00af 100644 --- a/vpr/src/route/SerialNetlistRouter.h +++ b/vpr/src/route/SerialNetlistRouter.h @@ -3,6 +3,8 @@ /** @file Serial case for \ref NetlistRouter: just loop through nets */ #include "netlist_routers.h" +#include "serial_connection_router.h" +#include "parallel_connection_router.h" template class SerialNetlistRouter : public NetlistRouter { @@ -20,7 +22,7 @@ class SerialNetlistRouter : public NetlistRouter { const RoutingPredictor& routing_predictor, const vtr::vector>>& choking_spots, bool is_flat) - : _router(_make_router(router_lookahead, is_flat)) + : _router(_make_router(router_lookahead, router_opts, is_flat)) , _net_list(net_list) , _router_opts(router_opts) , _connections_inf(connections_inf) @@ -40,22 +42,41 @@ class SerialNetlistRouter : public NetlistRouter { void set_timing_info(std::shared_ptr timing_info); private: - ConnectionRouter _make_router(const RouterLookahead* router_lookahead, bool is_flat) { + std::unique_ptr _make_router(const RouterLookahead* router_lookahead, + const t_router_opts& router_opts, + bool is_flat) { auto& device_ctx = g_vpr_ctx.device(); auto& route_ctx = g_vpr_ctx.mutable_routing(); - return ConnectionRouter( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - route_ctx.rr_node_route_inf, - is_flat); + if (!router_opts.enable_parallel_connection_router) { + // Serial Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat); + } else { + // Parallel Connection Router + return std::make_unique>( + device_ctx.grid, + *router_lookahead, + device_ctx.rr_graph.rr_nodes(), + &device_ctx.rr_graph, + device_ctx.rr_rc_data, + device_ctx.rr_graph.rr_switch(), + route_ctx.rr_node_route_inf, + is_flat, + router_opts.multi_queue_num_threads, + router_opts.multi_queue_num_queues, + router_opts.multi_queue_direct_draining); + } } /* Context fields */ - ConnectionRouter _router; + std::unique_ptr _router; const Netlist<>& _net_list; const t_router_opts& _router_opts; CBRR& _connections_inf; diff --git 
a/vpr/src/route/SerialNetlistRouter.tpp b/vpr/src/route/SerialNetlistRouter.tpp index 63497d7d394..b84acfbd58f 100644 --- a/vpr/src/route/SerialNetlistRouter.tpp +++ b/vpr/src/route/SerialNetlistRouter.tpp @@ -22,7 +22,7 @@ inline RouteIterResults SerialNetlistRouter::route_netlist(int itry, f for (size_t inet = 0; inet < sorted_nets.size(); inet++) { ParentNetId net_id = sorted_nets[inet]; NetResultFlags flags = route_net( - _router, + *_router, _net_list, net_id, itry, @@ -42,7 +42,7 @@ inline RouteIterResults SerialNetlistRouter::route_netlist(int itry, f route_ctx.route_bb[net_id]); if (!flags.success && !flags.retry_with_full_bb) { - /* Disconnected RRG and ConnectionRouter doesn't think growing the BB will work */ + /* Disconnected RRG and SerialConnectionRouter doesn't think growing the BB will work */ out.is_routable = false; return out; } @@ -74,7 +74,7 @@ void SerialNetlistRouter::handle_bb_updated_nets(const std::vector void SerialNetlistRouter::set_rcv_enabled(bool x) { - _router.set_rcv_enabled(x); + _router->set_rcv_enabled(x); } template diff --git a/vpr/src/route/connection_router.cpp b/vpr/src/route/connection_router.cpp deleted file mode 100644 index ee80073c3c6..00000000000 --- a/vpr/src/route/connection_router.cpp +++ /dev/null @@ -1,1121 +0,0 @@ -#include "connection_router.h" - -#include -#include "rr_graph.h" -#include "rr_graph_fwd.h" - -/** Used for the flat router. The node isn't relevant to the target if - * it is an intra-block node outside of our target block */ -static bool relevant_node_to_target(const RRGraphView* rr_graph, - RRNodeId node_to_add, - RRNodeId target_node); - -static void update_router_stats(RouterStats* router_stats, - bool is_push, - RRNodeId rr_node_id, - const RRGraphView* rr_graph); - -/** return tuple */ -template -std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - bool retry = false; - retry = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, bounding_box); - - if (!std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - // Only the `index`, `prev_edge`, and `rcv_path_backward_delay` fields of `out` - // are used after this function returns. - RTExploredNode out; - out.index = sink_node; - out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; - if (rcv_path_manager.is_enabled()) { - out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; - rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); - rcv_path_manager.empty_heap(); - } - heap_.empty_heap(); - return std::make_tuple(true, /*retry=*/false, out); - } else { - reset_path_costs(); - clear_modified_rr_node_info(); - heap_.empty_heap(); - rcv_path_manager.empty_heap(); - return std::make_tuple(false, retry, RTExploredNode()); - } -} - -/** Return whether to retry with full bb */ -template -bool ConnectionRouter::timing_driven_route_connection_common_setup( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - //Re-add route nodes from the existing route tree to the heap. - //They need to be repushed onto the heap since each node's cost is target specific. 
- - add_route_tree_to_heap(rt_root, sink_node, cost_params, bounding_box); - heap_.build_heap(); // via sifting down everything - - RRNodeId source_node = rt_root.inode; - - if (heap_.is_empty_heap()) { - VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return false; - } - - VTR_LOGV_DEBUG(router_debug_, " Routing to %d as normal net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, - bounding_box.layer_min, bounding_box.xmin, bounding_box.ymin, - bounding_box.layer_max, bounding_box.xmax, bounding_box.ymax); - - timing_driven_route_connection_from_heap(sink_node, - cost_params, - bounding_box); - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - // No path found within the current bounding box. - // - // If the bounding box is already max size, just fail - if (bounding_box.xmin == 0 - && bounding_box.ymin == 0 - && bounding_box.xmax == (int)(grid_.width() - 1) - && bounding_box.ymax == (int)(grid_.height() - 1) - && bounding_box.layer_min == 0 - && bounding_box.layer_max == (int)(grid_.get_num_layers() - 1)) { - VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return false; - } - - // Otherwise, leave unrouted and bubble up a signal to retry this net with a full-device bounding box - VTR_LOG_WARN("No routing path for connection to sink_rr %d, leaving unrouted to retry later\n", sink_node); - return true; - } - - return false; -} - -// Finds a path from the route tree rooted at rt_root to sink_node for a high fanout net. -// -// Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree -// which is spatially close to the sink is added to the heap. -// Returns a tuple of */ -template -std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree_high_fanout( - const RouteTreeNode& rt_root, - RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bounding_box, - const SpatialRouteTreeLookup& spatial_rt_lookup, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - // re-explore route tree from root to add any new nodes (buildheap afterwards) - // route tree needs to be repushed onto the heap since each node's cost is target specific - t_bb high_fanout_bb = add_high_fanout_route_tree_to_heap(rt_root, sink_node, cost_params, spatial_rt_lookup, net_bounding_box); - heap_.build_heap(); - - RRNodeId source_node = rt_root.inode; - - if (heap_.is_empty_heap()) { - VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - return std::make_tuple(false, false, RTExploredNode()); - } - - VTR_LOGV_DEBUG(router_debug_, " Routing to %d as high fanout net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, - high_fanout_bb.layer_min, high_fanout_bb.xmin, high_fanout_bb.ymin, - high_fanout_bb.layer_max, high_fanout_bb.xmax, high_fanout_bb.ymax); - - bool retry_with_full_bb = false; - timing_driven_route_connection_from_heap(sink_node, - cost_params, - high_fanout_bb); - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - //Found no path, that may be due to an unlucky choice of existing route tree sub-set, - //try again with the full route tree to be sure this is not an artifact of high-fanout routing - VTR_LOG_WARN("No routing path found in high-fanout mode for net %zu connection (to sink_rr %d), retrying with full route tree\n", size_t(conn_params.net_id_), sink_node); - - //Reset 
any previously recorded node costs so timing_driven_route_connection() - //starts over from scratch. - reset_path_costs(); - clear_modified_rr_node_info(); - - retry_with_full_bb = timing_driven_route_connection_common_setup(rt_root, - sink_node, - cost_params, - net_bounding_box); - } - - if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { - VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); - - heap_.empty_heap(); - rcv_path_manager.empty_heap(); - return std::make_tuple(false, retry_with_full_bb, RTExploredNode()); - } - - RTExploredNode out; - out.index = sink_node; - out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; - if (rcv_path_manager.is_enabled()) { - out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; - rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); - rcv_path_manager.empty_heap(); - } - heap_.empty_heap(); - - return std::make_tuple(true, retry_with_full_bb, out); -} - -// Finds a path to sink_node, starting from the elements currently in the heap. -// This is the core maze routing routine. -template -void ConnectionRouter::timing_driven_route_connection_from_heap(RRNodeId sink_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - VTR_ASSERT_SAFE(heap_.is_valid()); - - if (heap_.is_empty_heap()) { //No source - VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); - } - - const auto& device_ctx = g_vpr_ctx.device(); - auto& route_ctx = g_vpr_ctx.mutable_routing(); - - // Get bounding box for sink node used in timing_driven_expand_neighbour - VTR_ASSERT_SAFE(sink_node != RRNodeId::INVALID()); - - t_bb target_bb; - if (rr_graph_->node_type(sink_node) == SINK) { // We need to get a bounding box for the sink's entire tile - vtr::Rect tile_bb = grid_.get_tile_bb({rr_graph_->node_xlow(sink_node), - rr_graph_->node_ylow(sink_node), - rr_graph_->node_layer(sink_node)}); - - target_bb.xmin = tile_bb.xmin(); - target_bb.ymin = tile_bb.ymin(); - target_bb.xmax = tile_bb.xmax(); - target_bb.ymax = tile_bb.ymax(); - } else { - target_bb.xmin = rr_graph_->node_xlow(sink_node); - target_bb.ymin = rr_graph_->node_ylow(sink_node); - target_bb.xmax = rr_graph_->node_xhigh(sink_node); - target_bb.ymax = rr_graph_->node_yhigh(sink_node); - } - - target_bb.layer_min = rr_graph_->node_layer(RRNodeId(sink_node)); - target_bb.layer_max = rr_graph_->node_layer(RRNodeId(sink_node)); - - // Start measuring path search time - std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); - - HeapNode cheapest; - while (heap_.try_pop(cheapest)) { - // inode with the cheapest total cost in current route tree to be expanded on - const auto& [new_total_cost, inode] = cheapest; - update_router_stats(router_stats_, - /*is_push=*/false, - inode, - rr_graph_); - - VTR_LOGV_DEBUG(router_debug_, " Popping node %d (cost: %g)\n", - inode, new_total_cost); - - // Have we found the target? 
- if (inode == sink_node) { - // If we're running RCV, the path will be stored in the path_data->path_rr vector - // This is then placed into the traceback so that the correct path is returned - // TODO: This can be eliminated by modifying the actual traceback function in route_timing - if (rcv_path_manager.is_enabled()) { - rcv_path_manager.insert_backwards_path_into_traceback(rcv_path_data[inode], - rr_node_route_inf_[inode].path_cost, - rr_node_route_inf_[inode].backward_path_cost, - route_ctx); - } - VTR_LOGV_DEBUG(router_debug_, " Found target %8d (%s)\n", inode, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, is_flat_).c_str()); - break; - } - - // If not, keep searching - timing_driven_expand_cheapest(inode, - new_total_cost, - sink_node, - cost_params, - bounding_box, - target_bb); - } - - // Stop measuring path search time - std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); - path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); -} - -// Find shortest paths from specified route tree to all nodes in the RR graph -template -vtr::vector ConnectionRouter::timing_driven_find_all_shortest_paths_from_route_tree( - const RouteTreeNode& rt_root, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RouterStats& router_stats, - const ConnectionParameters& conn_params) { - router_stats_ = &router_stats; - conn_params_ = &conn_params; - - // Add the route tree to the heap with no specific target node - RRNodeId target_node = RRNodeId::INVALID(); - add_route_tree_to_heap(rt_root, target_node, cost_params, bounding_box); - heap_.build_heap(); // via sifting down everything - - auto res = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box); - heap_.empty_heap(); - - return res; -} - -// Find shortest paths from current heap to all nodes in the RR graph -// -// Since there is no single *target* node this uses Dijkstra's algorithm -// with a modified exit condition (runs until heap is empty). -template -vtr::vector ConnectionRouter::timing_driven_find_all_shortest_paths_from_heap( - const t_conn_cost_params& cost_params, - const t_bb& bounding_box) { - vtr::vector cheapest_paths(rr_nodes_.size()); - - VTR_ASSERT_SAFE(heap_.is_valid()); - - if (heap_.is_empty_heap()) { // No source - VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); - } - - // Start measuring path search time - std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); - - HeapNode cheapest; - while (heap_.try_pop(cheapest)) { - // inode with the cheapest total cost in current route tree to be expanded on - const auto& [new_total_cost, inode] = cheapest; - update_router_stats(router_stats_, - /*is_push=*/false, - inode, - rr_graph_); - - VTR_LOGV_DEBUG(router_debug_, " Popping node %d (cost: %g)\n", - inode, new_total_cost); - - // Since we want to find shortest paths to all nodes in the graph - // we do not specify a target node. 
- // - // By setting the target_node to INVALID in combination with the NoOp router - // lookahead we can re-use the node exploration code from the regular router - RRNodeId target_node = RRNodeId::INVALID(); - - timing_driven_expand_cheapest(inode, - new_total_cost, - target_node, - cost_params, - bounding_box, - t_bb()); - - if (cheapest_paths[inode].index == RRNodeId::INVALID() || cheapest_paths[inode].total_cost >= new_total_cost) { - VTR_LOGV_DEBUG(router_debug_, " Better cost to node %d: %g (was %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); - // Only the `index` and `prev_edge` fields of `cheapest_paths[inode]` are used after this function returns - cheapest_paths[inode].index = inode; - cheapest_paths[inode].prev_edge = rr_node_route_inf_[inode].prev_edge; - } else { - VTR_LOGV_DEBUG(router_debug_, " Worse cost to node %d: %g (better %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); - } - } - - // Stop measuring path search time - std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); - path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); - - return cheapest_paths; -} - -template -void ConnectionRouter::timing_driven_expand_cheapest(RRNodeId from_node, - float new_total_cost, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - const t_bb& target_bb) { - float best_total_cost = rr_node_route_inf_[from_node].path_cost; - if (best_total_cost == new_total_cost) { - // Explore from this node, since its total cost is exactly the same as - // the best total cost ever seen for this node. Otherwise, prune this node - // to reduce redundant work (i.e., unnecessary neighbor exploration). - // `new_total_cost` is used here as an identifier to detect if the pair - // (from_node or inode, new_total_cost) was the most recently pushed - // element for the corresponding node. - // - // Note: For RCV, it often isn't searching for a shortest path; it is - // searching for a path in the target delay range. So it might find a - // path to node n that has a higher `backward_path_cost` but the `total_cost` - // (including expected delay to sink, going through a cost function that - // checks that against the target delay) might be lower than the previously - // stored value. In that case we want to re-expand the node so long as - // it doesn't create a loop. That `rcv_path_manager` should store enough - // info for us to avoid loops. 
- RTExploredNode current; - current.index = from_node; - current.backward_path_cost = rr_node_route_inf_[from_node].backward_path_cost; - current.prev_edge = rr_node_route_inf_[from_node].prev_edge; - current.R_upstream = rr_node_route_inf_[from_node].R_upstream; - - VTR_LOGV_DEBUG(router_debug_, " Better cost to %d\n", from_node); - VTR_LOGV_DEBUG(router_debug_, " New total cost: %g\n", new_total_cost); - VTR_LOGV_DEBUG(router_debug_ && (current.prev_edge != RREdgeId::INVALID()), - " Setting path costs for associated node %d (from %d edge %zu)\n", - from_node, - static_cast(rr_graph_->edge_src_node(current.prev_edge)), - static_cast(current.prev_edge)); - - timing_driven_expand_neighbours(current, cost_params, bounding_box, target_node, target_bb); - } else { - // Post-heap prune, do not re-explore from the current/new partial path as it - // has worse cost than the best partial path to this node found so far - VTR_LOGV_DEBUG(router_debug_, " Worse cost to %d\n", from_node); - VTR_LOGV_DEBUG(router_debug_, " Old total cost: %g\n", best_total_cost); - VTR_LOGV_DEBUG(router_debug_, " New total cost: %g\n", new_total_cost); - } -} - -template -void ConnectionRouter::timing_driven_expand_neighbours(const RTExploredNode& current, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb) { - /* Puts all the rr_nodes adjacent to current on the heap. */ - - // For each node associated with the current heap element, expand all of it's neighbors - auto edges = rr_nodes_.edge_range(current.index); - - // This is a simple prefetch that prefetches: - // - RR node data reachable from this node - // - rr switch data to reach those nodes from this node. - // - // This code will be a NOP on compiler targets that do not have a - // builtin to emit prefetch instructions. - // - // This code will be a NOP on CPU targets that lack prefetch instructions. - // All modern x86 and ARM64 platforms provide prefetch instructions. - // - // This code delivers ~6-8% reduction in wallclock time when running Titan - // benchmarks, and was specifically measured against the gsm_switch and - // directrf vtr_reg_weekly running in high effort. - // - // - directrf_stratixiv_arch_timing.blif - // - gsm_switch_stratixiv_arch_timing.blif - // - for (RREdgeId from_edge : edges) { - RRNodeId to_node = rr_nodes_.edge_sink_node(from_edge); - rr_nodes_.prefetch_node(to_node); - - int switch_idx = rr_nodes_.edge_switch(from_edge); - VTR_PREFETCH(&rr_switch_inf_[switch_idx], 0, 0); - } - - for (RREdgeId from_edge : edges) { - RRNodeId to_node = rr_nodes_.edge_sink_node(from_edge); - timing_driven_expand_neighbour(current, - from_edge, - to_node, - cost_params, - bounding_box, - target_node, - target_bb); - } -} - -// Conditionally adds to_node to the router heap (via path from from_node via from_edge). -// RR nodes outside the expanded bounding box specified in bounding_box are not added -// to the heap. 
-template -void ConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, - RREdgeId from_edge, - RRNodeId to_node, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb) { - VTR_ASSERT(bounding_box.layer_max < g_vpr_ctx.device().grid.get_num_layers()); - - const RRNodeId& from_node = current.index; - - // BB-pruning - // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this - // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets - if (!inside_bb(to_node, bounding_box) - && !rcv_path_manager.is_enabled()) { - VTR_LOGV_DEBUG(router_debug_, - " Pruned expansion of node %d edge %zu -> %d" - " (to node location %d,%d,%d x %d,%d,%d outside of expanded" - " net bounding box %d,%d,%d x %d,%d,%d)\n", - from_node, size_t(from_edge), size_t(to_node), - rr_graph_->node_xlow(to_node), rr_graph_->node_ylow(to_node), rr_graph_->node_layer(to_node), - rr_graph_->node_xhigh(to_node), rr_graph_->node_yhigh(to_node), rr_graph_->node_layer(to_node), - bounding_box.xmin, bounding_box.ymin, bounding_box.layer_min, - bounding_box.xmax, bounding_box.ymax, bounding_box.layer_max); - return; /* Node is outside (expanded) bounding box. */ - } - - /* Prune away IPINs that lead to blocks other than the target one. Avoids * - * the issue of how to cost them properly so they don't get expanded before * - * more promising routes, but makes route-through (via CLBs) impossible. * - * Change this if you want to investigate route-throughs. */ - if (target_node != RRNodeId::INVALID()) { - t_rr_type to_type = rr_graph_->node_type(to_node); - if (to_type == IPIN) { - // Check if this IPIN leads to the target block - // IPIN's of the target block should be contained within it's bounding box - int to_xlow = rr_graph_->node_xlow(to_node); - int to_ylow = rr_graph_->node_ylow(to_node); - int to_layer = rr_graph_->node_layer(to_node); - int to_xhigh = rr_graph_->node_xhigh(to_node); - int to_yhigh = rr_graph_->node_yhigh(to_node); - if (to_xlow < target_bb.xmin - || to_ylow < target_bb.ymin - || to_xhigh > target_bb.xmax - || to_yhigh > target_bb.ymax - || to_layer < target_bb.layer_min - || to_layer > target_bb.layer_max) { - VTR_LOGV_DEBUG(router_debug_, - " Pruned expansion of node %d edge %zu -> %d" - " (to node is IPIN at %d,%d,%d x %d,%d,%d which does not" - " lead to target block %d,%d,%d x %d,%d,%d)\n", - from_node, size_t(from_edge), size_t(to_node), - to_xlow, to_ylow, to_layer, - to_xhigh, to_yhigh, to_layer, - target_bb.xmin, target_bb.ymin, target_bb.layer_min, - target_bb.xmax, target_bb.ymax, target_bb.layer_max); - return; - } - } - } - - VTR_LOGV_DEBUG(router_debug_, " Expanding node %d edge %zu -> %d\n", - from_node, size_t(from_edge), size_t(to_node)); - - // Check if the node exists in the route tree when RCV is enabled - // Other pruning methods have been disabled when RCV is on, so this method is required to prevent "loops" from being created - bool node_exists = false; - if (rcv_path_manager.is_enabled()) { - node_exists = rcv_path_manager.node_exists_in_tree(rcv_path_data[from_node], - to_node); - } - - if (!node_exists || !rcv_path_manager.is_enabled()) { - timing_driven_add_to_heap(cost_params, - current, - to_node, - from_edge, - target_node); - } -} - -// Add to_node to the heap, and also add any nodes which are connected by non-configurable edges -template -void ConnectionRouter::timing_driven_add_to_heap(const 
t_conn_cost_params& cost_params, - const RTExploredNode& current, - RRNodeId to_node, - const RREdgeId from_edge, - RRNodeId target_node) { - const auto& device_ctx = g_vpr_ctx.device(); - const RRNodeId& from_node = current.index; - - // Initialized to current - RTExploredNode next; - next.R_upstream = current.R_upstream; - next.index = to_node; - next.prev_edge = from_edge; - next.total_cost = std::numeric_limits::infinity(); // Not used directly - next.backward_path_cost = current.backward_path_cost; - - // Initalize RCV data struct if needed, otherwise it's set to nullptr - rcv_path_manager.alloc_path_struct(next.path_data); - // path_data variables are initialized to current values - if (rcv_path_manager.is_enabled() && rcv_path_data[from_node]) { - next.path_data->backward_cong = rcv_path_data[from_node]->backward_cong; - next.path_data->backward_delay = rcv_path_data[from_node]->backward_delay; - } - - evaluate_timing_driven_node_costs(&next, - cost_params, - from_node, - target_node); - - float best_total_cost = rr_node_route_inf_[to_node].path_cost; - float best_back_cost = rr_node_route_inf_[to_node].backward_path_cost; - - float new_total_cost = next.total_cost; - float new_back_cost = next.backward_path_cost; - - // We need to only expand this node if it is a better path. And we need to - // update its `rr_node_route_inf` data as we put it into the heap; there may - // be other (previously explored) paths to this node in the heap already, - // but they will be pruned when we pop those heap nodes later as we'll see - // they have inferior costs to what is in the `rr_node_route_inf` data for - // this node. - // FIXME: Adding a link to the FPT paper when it is public - // - // When RCV is enabled, prune based on the RCV-specific total path cost (see - // in `compute_node_cost_using_rcv` in `evaluate_timing_driven_node_costs`) - // to allow detours to get better QoR. - if ((!rcv_path_manager.is_enabled() && best_back_cost > new_back_cost) || (rcv_path_manager.is_enabled() && best_total_cost > new_total_cost)) { - VTR_LOGV_DEBUG(router_debug_, " Expanding to node %d (%s)\n", to_node, - describe_rr_node(device_ctx.rr_graph, - device_ctx.grid, - device_ctx.rr_indexed_data, - to_node, - is_flat_) - .c_str()); - VTR_LOGV_DEBUG(router_debug_, " New Total Cost %g New back Cost %g\n", new_total_cost, new_back_cost); - //Add node to the heap only if the cost via the current partial path is less than the - //best known cost, since there is no reason for the router to expand more expensive paths. - // - //Pre-heap prune to keep the heap small, by not putting paths which are known to be - //sub-optimal (at this point in time) into the heap. 
- - update_cheapest(next, from_node); - - heap_.add_to_heap({new_total_cost, to_node}); - update_router_stats(router_stats_, - /*is_push=*/true, - to_node, - rr_graph_); - - } else { - VTR_LOGV_DEBUG(router_debug_, " Didn't expand to %d (%s)\n", to_node, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to_node, is_flat_).c_str()); - VTR_LOGV_DEBUG(router_debug_, " Prev Total Cost %g Prev back Cost %g \n", best_total_cost, best_back_cost); - VTR_LOGV_DEBUG(router_debug_, " New Total Cost %g New back Cost %g \n", new_total_cost, new_back_cost); - } - - if (rcv_path_manager.is_enabled() && next.path_data != nullptr) { - rcv_path_manager.free_path_struct(next.path_data); - } -} - -#ifdef VTR_ASSERT_SAFE_ENABLED - -//Returns true if both nodes are part of the same non-configurable edge set -static bool same_non_config_node_set(RRNodeId from_node, RRNodeId to_node) { - auto& device_ctx = g_vpr_ctx.device(); - - auto from_itr = device_ctx.rr_node_to_non_config_node_set.find(from_node); - auto to_itr = device_ctx.rr_node_to_non_config_node_set.find(to_node); - - if (from_itr == device_ctx.rr_node_to_non_config_node_set.end() - || to_itr == device_ctx.rr_node_to_non_config_node_set.end()) { - return false; //Not part of a non-config node set - } - - return from_itr->second == to_itr->second; //Check for same non-config set IDs -} - -#endif - -template -float ConnectionRouter::compute_node_cost_using_rcv(const t_conn_cost_params cost_params, - RRNodeId to_node, - RRNodeId target_node, - float backwards_delay, - float backwards_cong, - float R_upstream) { - float expected_delay; - float expected_cong; - - const t_conn_delay_budget* delay_budget = cost_params.delay_budget; - // TODO: This function is not tested for is_flat == true - VTR_ASSERT(is_flat_ != true); - std::tie(expected_delay, expected_cong) = router_lookahead_.get_expected_delay_and_cong(to_node, target_node, cost_params, R_upstream); - - float expected_total_delay_cost; - float expected_total_cong_cost; - - float expected_total_cong = expected_cong + backwards_cong; - float expected_total_delay = expected_delay + backwards_delay; - - //If budgets specified calculate cost as described by RCV paper: - // R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While - // Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of - // Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008. 
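For reference, the budget-driven cost assembled by the compute_node_cost_using_rcv() lines that follow can be written compactly as below, with T and C the expected total delay and congestion, crit and crit_short the values of cost_params.criticality and delay_budget->short_path_criticality, and N the 100e-12 normalization constant:

    \[
      \mathrm{cost} \;=\; T
        \;+\; (\mathrm{crit}_{short} + \mathrm{crit})\cdot\max(0,\ T_{target} - T)
        \;+\; \frac{\max(0,\ T_{min} - T)^2}{N}
        \;+\; C
    \]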
- - // Normalization constant defined in RCV paper cited above - constexpr float NORMALIZATION_CONSTANT = 100e-12; - - expected_total_delay_cost = expected_total_delay; - expected_total_delay_cost += (delay_budget->short_path_criticality + cost_params.criticality) * std::max(0.f, delay_budget->target_delay - expected_total_delay); - // expected_total_delay_cost += std::pow(std::max(0.f, expected_total_delay - delay_budget->max_delay), 2) / NORMALIZATION_CONSTANT; - expected_total_delay_cost += std::pow(std::max(0.f, delay_budget->min_delay - expected_total_delay), 2) / NORMALIZATION_CONSTANT; - expected_total_cong_cost = expected_total_cong; - - float total_cost = expected_total_delay_cost + expected_total_cong_cost; - - return total_cost; -} - -// Empty the route tree set node, use this after each net is routed -template -void ConnectionRouter::empty_rcv_route_tree_set() { - rcv_path_manager.empty_route_tree_nodes(); -} - -// Enable or disable RCV -template -void ConnectionRouter::set_rcv_enabled(bool enable) { - rcv_path_manager.set_enabled(enable); - if (enable) { - rcv_path_data.resize(rr_node_route_inf_.size()); - } -} - -//Calculates the cost of reaching to_node (i.e., to->index) -template -void ConnectionRouter::evaluate_timing_driven_node_costs(RTExploredNode* to, - const t_conn_cost_params& cost_params, - RRNodeId from_node, - RRNodeId target_node) { - /* new_costs.backward_cost: is the "known" part of the cost to this node -- the - * congestion cost of all the routing resources back to the existing route - * plus the known delay of the total path back to the source. - * - * new_costs.total_cost: is this "known" backward cost + an expected cost to get to the target. - * - * new_costs.R_upstream: is the upstream resistance at the end of this node - */ - - //Info for the switch connecting from_node to_node (i.e., to->index) - int iswitch = rr_nodes_.edge_switch(to->prev_edge); - bool switch_buffered = rr_switch_inf_[iswitch].buffered(); - bool reached_configurably = rr_switch_inf_[iswitch].configurable(); - float switch_R = rr_switch_inf_[iswitch].R; - float switch_Tdel = rr_switch_inf_[iswitch].Tdel; - float switch_Cinternal = rr_switch_inf_[iswitch].Cinternal; - - //To node info - auto rc_index = rr_graph_->node_rc_index(to->index); - float node_C = rr_rc_data_[rc_index].C; - float node_R = rr_rc_data_[rc_index].R; - - //From node info - float from_node_R = rr_rc_data_[rr_graph_->node_rc_index(from_node)].R; - - //Update R_upstream - if (switch_buffered) { - to->R_upstream = 0.; //No upstream resistance - } else { - //R_Upstream already initialized - } - - to->R_upstream += switch_R; //Switch resistance - to->R_upstream += node_R; //Node resistance - - //Calculate delay - float Rdel = to->R_upstream - 0.5 * node_R; //Only consider half node's resistance for delay - float Tdel = switch_Tdel + Rdel * node_C; - - //Depending on the switch used, the Tdel of the upstream node (from_node) may change due to - //increased loading from the switch's internal capacitance. - // - //Even though this delay physically affects from_node, we make the adjustment (now) on the to_node, - //since only once we've reached to to_node do we know the connection used (and the switch enabled). - // - //To adjust for the time delay, we compute the product of the Rdel associated with from_node and - //the internal capacitance of the switch. - // - //First, we will calculate Rdel_adjust (just like in the computation for Rdel, we consider only - //half of from_node's resistance). 
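For reference, the incremental Elmore delay that evaluate_timing_driven_node_costs() builds up here (and which is re-added unchanged in connection_router.tpp later in this diff) amounts to:

    \[
      R_{up} \leftarrow \begin{cases} R_{sw} + R_{node}, & \text{buffered switch}\\
                                      R_{up} + R_{sw} + R_{node}, & \text{otherwise}\end{cases}
    \]
    \[
      T_{del} \;=\; T_{sw}
        \;+\; \bigl(R_{up} - \tfrac{1}{2}R_{node}\bigr)\,C_{node}
        \;+\; \bigl(R_{up} - \tfrac{1}{2}R_{from}\bigr)\,C_{internal}
    \]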
- float Rdel_adjust = to->R_upstream - 0.5 * from_node_R; - - //Second, we adjust the Tdel to account for the delay caused by the internal capacitance. - Tdel += Rdel_adjust * switch_Cinternal; - - float cong_cost = 0.; - if (reached_configurably) { - cong_cost = get_rr_cong_cost(to->index, cost_params.pres_fac); - } else { - //Reached by a non-configurable edge. - //Therefore the from_node and to_node are part of the same non-configurable node set. -#ifdef VTR_ASSERT_SAFE_ENABLED - VTR_ASSERT_SAFE_MSG(same_non_config_node_set(from_node, to->index), - "Non-configurably connected edges should be part of the same node set"); -#endif - - //The congestion cost of all nodes in the set has already been accounted for (when - //the current path first expanded a node in the set). Therefore do *not* re-add the congestion - //cost. - cong_cost = 0.; - } - if (conn_params_->router_opt_choke_points_ && is_flat_ && rr_graph_->node_type(to->index) == IPIN) { - auto find_res = conn_params_->connection_choking_spots_.find(to->index); - if (find_res != conn_params_->connection_choking_spots_.end()) { - cong_cost = cong_cost / pow(2, (float)find_res->second); - } - } - - //Update the backward cost (upstream already included) - to->backward_path_cost += (1. - cost_params.criticality) * cong_cost; //Congestion cost - to->backward_path_cost += cost_params.criticality * Tdel; //Delay cost - - if (cost_params.bend_cost != 0.) { - t_rr_type from_type = rr_graph_->node_type(from_node); - t_rr_type to_type = rr_graph_->node_type(to->index); - if ((from_type == CHANX && to_type == CHANY) || (from_type == CHANY && to_type == CHANX)) { - to->backward_path_cost += cost_params.bend_cost; //Bend cost - } - } - - float total_cost = 0.; - - if (rcv_path_manager.is_enabled() && to->path_data != nullptr) { - to->path_data->backward_delay += cost_params.criticality * Tdel; - to->path_data->backward_cong += (1. - cost_params.criticality) * get_rr_cong_cost(to->index, cost_params.pres_fac); - - total_cost = compute_node_cost_using_rcv(cost_params, to->index, target_node, to->path_data->backward_delay, to->path_data->backward_cong, to->R_upstream); - } else { - const auto& device_ctx = g_vpr_ctx.device(); - //Update total cost - float expected_cost = router_lookahead_.get_expected_cost(to->index, target_node, cost_params, to->R_upstream); - VTR_LOGV_DEBUG(router_debug_ && !std::isfinite(expected_cost), - " Lookahead from %s (%s) to %s (%s) is non-finite, expected_cost = %f, to->R_upstream = %f\n", - rr_node_arch_name(to->index, is_flat_).c_str(), - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to->index, is_flat_).c_str(), - rr_node_arch_name(target_node, is_flat_).c_str(), - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, target_node, is_flat_).c_str(), - expected_cost, to->R_upstream); - total_cost += to->backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); - } - to->total_cost = total_cost; -} - -//Adds the route tree rooted at rt_node to the heap, preparing it to be -//used as branch-points for further routing. -template -void ConnectionRouter::add_route_tree_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb) { - /* Puts the entire partial routing below and including rt_node onto the heap * - * (except for those parts marked as not to be expanded) by calling itself * - * recursively. 
*/ - - /* Pre-order depth-first traversal */ - // IPINs and SINKS are not re_expanded - if (rt_node.re_expand) { - add_route_tree_node_to_heap(rt_node, - target_node, - cost_params, - net_bb); - } - - for (const RouteTreeNode& child_node : rt_node.child_nodes()) { - if (is_flat_) { - if (relevant_node_to_target(rr_graph_, - child_node.inode, - target_node)) { - add_route_tree_to_heap(child_node, - target_node, - cost_params, - net_bb); - } - } else { - add_route_tree_to_heap(child_node, - target_node, - cost_params, - net_bb); - } - } -} - -//Unconditionally adds rt_node to the heap -// -//Note that if you want to respect rt_node.re_expand that is the caller's -//responsibility. -template -void ConnectionRouter::add_route_tree_node_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb) { - const auto& device_ctx = g_vpr_ctx.device(); - const RRNodeId inode = rt_node.inode; - float backward_path_cost = cost_params.criticality * rt_node.Tdel; - float R_upstream = rt_node.R_upstream; - - /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ - if (!inside_bb(rt_node.inode, net_bb)) - return; - - // after budgets are loaded, calculate delay cost as described by RCV paper - /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While - * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of - * Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008.*/ - // float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); - - if (!rcv_path_manager.is_enabled()) { - // tot_cost = backward_path_cost + cost_params.astar_fac * expected_cost; - float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); - float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); - VTR_LOGV_DEBUG(router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", - inode, - tot_cost, - describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, is_flat_).c_str()); - - if (tot_cost > rr_node_route_inf_[inode].path_cost) { - return; - } - add_to_mod_list(inode); - rr_node_route_inf_[inode].path_cost = tot_cost; - rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); - rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; - rr_node_route_inf_[inode].R_upstream = R_upstream; - heap_.push_back({tot_cost, inode}); - - // push_back_node(&heap_, rr_node_route_inf_, - // inode, tot_cost, RREdgeId::INVALID(), - // backward_path_cost, R_upstream); - } else { - float expected_total_cost = compute_node_cost_using_rcv(cost_params, inode, target_node, rt_node.Tdel, 0, R_upstream); - - add_to_mod_list(inode); - rr_node_route_inf_[inode].path_cost = expected_total_cost; - rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); - rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; - rr_node_route_inf_[inode].R_upstream = R_upstream; - - rcv_path_manager.alloc_path_struct(rcv_path_data[inode]); - rcv_path_data[inode]->backward_delay = rt_node.Tdel; - - heap_.push_back({expected_total_cost, inode}); - - // push_back_node_with_info(&heap_, inode, expected_total_cost, - // backward_path_cost, R_upstream, rt_node.Tdel, &rcv_path_manager); - } - - update_router_stats(router_stats_, - /*is_push=*/true, - inode, - rr_graph_); 
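For reference, the non-RCV priority computed earlier in this function when seeding the heap from the route tree (the same form used when expanding neighbours) is:

    \[
      \mathrm{tot\_cost} \;=\; \mathrm{backward\_path\_cost}
        \;+\; \mathrm{astar\_fac}\cdot\max\bigl(0,\ \mathrm{expected\_cost} - \mathrm{astar\_offset}\bigr)
    \]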
- - if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { - router_stats_->rt_node_pushes[rr_graph_->node_type(inode)]++; - } -} - -/* Expand bb by inode's extents and clip against net_bb */ -inline void expand_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb, RRNodeId inode, const RRGraphView* rr_graph) { - bb.xmin = std::max(net_bb.xmin, std::min(bb.xmin, rr_graph->node_xlow(inode))); - bb.ymin = std::max(net_bb.ymin, std::min(bb.ymin, rr_graph->node_ylow(inode))); - bb.xmax = std::min(net_bb.xmax, std::max(bb.xmax, rr_graph->node_xhigh(inode))); - bb.ymax = std::min(net_bb.ymax, std::max(bb.ymax, rr_graph->node_yhigh(inode))); - bb.layer_min = std::min(bb.layer_min, rr_graph->node_layer(inode)); - bb.layer_max = std::max(bb.layer_max, rr_graph->node_layer(inode)); -} - -/* Expand bb by HIGH_FANOUT_BB_FAC and clip against net_bb */ -inline void adjust_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb) { - constexpr int HIGH_FANOUT_BB_FAC = 3; - - bb.xmin = std::max(net_bb.xmin, bb.xmin - HIGH_FANOUT_BB_FAC); - bb.ymin = std::max(net_bb.ymin, bb.ymin - HIGH_FANOUT_BB_FAC); - bb.xmax = std::min(net_bb.xmax, bb.xmax + HIGH_FANOUT_BB_FAC); - bb.ymax = std::min(net_bb.ymax, bb.ymax + HIGH_FANOUT_BB_FAC); - bb.layer_min = std::min(net_bb.layer_min, bb.layer_min); - bb.layer_max = std::max(net_bb.layer_max, bb.layer_max); -} - -template -t_bb ConnectionRouter::add_high_fanout_route_tree_to_heap( - const RouteTreeNode& rt_root, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const SpatialRouteTreeLookup& spatial_rt_lookup, - const t_bb& net_bounding_box) { - //For high fanout nets we only add those route tree nodes which are spatially close - //to the sink. - // - //Based on: - // J. Swartz, V. Betz, J. Rose, "A Fast Routability-Driven Router for FPGAs", FPGA, 1998 - // - //We rely on a grid-based spatial look-up which is maintained for high fanout nets by - //update_route_tree(), which allows us to add spatially close route tree nodes without traversing - //the entire route tree (which is likely large for a high fanout net). - - //Determine which bin the target node is located in - - int target_bin_x = grid_to_bin_x(rr_graph_->node_xlow(target_node), spatial_rt_lookup); - int target_bin_y = grid_to_bin_y(rr_graph_->node_ylow(target_node), spatial_rt_lookup); - - auto target_layer = rr_graph_->node_layer(target_node); - - int chan_nodes_added = 0; - - t_bb highfanout_bb; - highfanout_bb.xmin = rr_graph_->node_xlow(target_node); - highfanout_bb.xmax = rr_graph_->node_xhigh(target_node); - highfanout_bb.ymin = rr_graph_->node_ylow(target_node); - highfanout_bb.ymax = rr_graph_->node_yhigh(target_node); - highfanout_bb.layer_min = target_layer; - highfanout_bb.layer_max = target_layer; - - //Add existing routing starting from the target bin. 
- //If the target's bin has insufficient existing routing add from the surrounding bins - constexpr int SINGLE_BIN_MIN_NODES = 2; - bool done = false; - bool found_node_on_same_layer = false; - for (int dx : {0, -1, +1}) { - size_t bin_x = target_bin_x + dx; - - if (bin_x > spatial_rt_lookup.dim_size(0) - 1) continue; //Out of range - - for (int dy : {0, -1, +1}) { - size_t bin_y = target_bin_y + dy; - - if (bin_y > spatial_rt_lookup.dim_size(1) - 1) continue; //Out of range - - for (const RouteTreeNode& rt_node : spatial_rt_lookup[bin_x][bin_y]) { - if (!rt_node.re_expand) // Some nodes (like IPINs) shouldn't be re-expanded - continue; - RRNodeId rr_node_to_add = rt_node.inode; - - /* Flat router: don't go into clusters other than the target one */ - if (is_flat_) { - if (!relevant_node_to_target(rr_graph_, rr_node_to_add, target_node)) - continue; - } - - /* In case of the parallel router, we may be dealing with a virtual net - * so prune the nodes from the HF lookup against the bounding box just in case */ - if (!inside_bb(rr_node_to_add, net_bounding_box)) - continue; - - auto rt_node_layer_num = rr_graph_->node_layer(rr_node_to_add); - if (rt_node_layer_num == target_layer) - found_node_on_same_layer = true; - - // Put the node onto the heap - add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bounding_box); - - // Expand HF BB to include the node (clip by original BB) - expand_highfanout_bounding_box(highfanout_bb, net_bounding_box, rr_node_to_add, rr_graph_); - - if (rr_graph_->node_type(rr_node_to_add) == CHANY || rr_graph_->node_type(rr_node_to_add) == CHANX) { - chan_nodes_added++; - } - } - - if (dx == 0 && dy == 0 && chan_nodes_added > SINGLE_BIN_MIN_NODES && found_node_on_same_layer) { - //Target bin contained at least minimum amount of routing - // - //We require at least SINGLE_BIN_MIN_NODES to be added. - //This helps ensure we don't end up with, for example, a single - //routing wire running in the wrong direction which may not be - //able to reach the target within the bounding box. 
- done = true; - break; - } - } - if (done) break; - } - /* If we didn't find enough nodes to branch off near the target - * or they are on the wrong grid layer, just add the full route tree */ - if (chan_nodes_added <= SINGLE_BIN_MIN_NODES || !found_node_on_same_layer) { - add_route_tree_to_heap(rt_root, target_node, cost_params, net_bounding_box); - return net_bounding_box; - } else { - //We found nearby routing, replace original bounding box to be localized around that routing - adjust_highfanout_bounding_box(highfanout_bb, net_bounding_box); - return highfanout_bb; - } -} - -static inline bool relevant_node_to_target(const RRGraphView* rr_graph, - RRNodeId node_to_add, - RRNodeId target_node) { - VTR_ASSERT_SAFE(rr_graph->node_type(target_node) == t_rr_type::SINK); - auto node_to_add_type = rr_graph->node_type(node_to_add); - return node_to_add_type != t_rr_type::IPIN || node_in_same_physical_tile(node_to_add, target_node); -} - -static inline void update_router_stats(RouterStats* router_stats, - bool is_push, - RRNodeId rr_node_id, - const RRGraphView* rr_graph) { - if (is_push) { - router_stats->heap_pushes++; - } else { - router_stats->heap_pops++; - } - - if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { - auto node_type = rr_graph->node_type(rr_node_id); - VTR_ASSERT(node_type != NUM_RR_TYPES); - - if (is_inter_cluster_node(*rr_graph, rr_node_id)) { - if (is_push) { - router_stats->inter_cluster_node_pushes++; - router_stats->inter_cluster_node_type_cnt_pushes[node_type]++; - } else { - router_stats->inter_cluster_node_pops++; - router_stats->inter_cluster_node_type_cnt_pops[node_type]++; - } - } else { - if (is_push) { - router_stats->intra_cluster_node_pushes++; - router_stats->intra_cluster_node_type_cnt_pushes[node_type]++; - } else { - router_stats->intra_cluster_node_pops++; - router_stats->intra_cluster_node_type_cnt_pops[node_type]++; - } - } - } -} - -std::unique_ptr make_connection_router(e_heap_type heap_type, - const DeviceGrid& grid, - const RouterLookahead& router_lookahead, - const t_rr_graph_storage& rr_nodes, - const RRGraphView* rr_graph, - const std::vector& rr_rc_data, - const vtr::vector& rr_switch_inf, - vtr::vector& rr_node_route_inf, - bool is_flat) { - switch (heap_type) { - case e_heap_type::BINARY_HEAP: - return std::make_unique>( - grid, - router_lookahead, - rr_nodes, - rr_graph, - rr_rc_data, - rr_switch_inf, - rr_node_route_inf, - is_flat); - case e_heap_type::FOUR_ARY_HEAP: - return std::make_unique>( - grid, - router_lookahead, - rr_nodes, - rr_graph, - rr_rc_data, - rr_switch_inf, - rr_node_route_inf, - is_flat); - default: - VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", - heap_type); - } -} diff --git a/vpr/src/route/connection_router.h b/vpr/src/route/connection_router.h index 0de6d508991..f5bb7c57aa9 100644 --- a/vpr/src/route/connection_router.h +++ b/vpr/src/route/connection_router.h @@ -1,6 +1,26 @@ #ifndef _CONNECTION_ROUTER_H #define _CONNECTION_ROUTER_H +/** + * @file + * @brief This file defines the ConnectionRouter class. + * + * Overview + * ======== + * The ConnectionRouter represents the timing-driven connection routers, which + * route from some initial set of sources (via the input rt tree) to a particular + * sink. VPR supports two timing-driven connection routers, including the serial + * connection router and the MultiQueue-based parallel connection router. This + * class defines the interface for the two connection routers and encapsulates + * the common member variables and helper functions for them. 
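To make the interface concrete, a minimal caller-side sketch is shown below. Only the method name and the returned tuple come from this header; the `router` object (any ConnectionRouterInterface implementation) and the pre-built `rt_root`, `sink_node`, `cost_params`, `net_bb`, `router_stats`, and `conn_params` values are assumptions for illustration.

    // Illustrative sketch only; everything except the method and its return tuple is assumed.
    bool found_path = false;
    bool retry_with_full_bb = false;
    RTExploredNode cheapest_sink;
    std::tie(found_path, retry_with_full_bb, cheapest_sink) =
        router.timing_driven_route_connection_from_route_tree(rt_root, sink_node, cost_params,
                                                              net_bb, router_stats, conn_params);
    if (found_path) {
        // cheapest_sink.index / cheapest_sink.prev_edge now point into rr_node_route_inf,
        // from which the routed path can be traced back to the existing route tree.
    } else if (retry_with_full_bb) {
        // Parallel router only: the caller may re-route this connection with a
        // full-device bounding box.
    }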
+ *
+ * @note
+ * When the ConnectionRouter is used, it mutates the provided rr_node_route_inf.
+ * The routed path can be found by tracing from the sink node (which is returned)
+ * through the rr_node_route_inf. See update_traceback as an example of this tracing.
+ *
+ */
+
 #include "connection_router_interface.h"
 #include "rr_graph_storage.h"
 #include "route_common.h"
@@ -10,16 +30,10 @@
 #include "router_stats.h"
 #include "spatial_route_tree_lookup.h"
 
-#include "d_ary_heap.h"
-
-// This class encapsulates the timing driven connection router. This class
-// routes from some initial set of sources (via the input rt tree) to a
-// particular sink.
-//
-// When the ConnectionRouter is used, it mutates the provided
-// rr_node_route_inf. The routed path can be found by tracing from the sink
-// node (which is returned) through the rr_node_route_inf. See
-// update_traceback as an example of this tracing.
+/**
+ * @brief The ConnectionRouter class defines the interface for the serial and parallel connection
+ * routers and encapsulates the common variables and helper functions for the two routers.
+ */
 template<typename HeapImplementation>
 class ConnectionRouter : public ConnectionRouterInterface {
   public:
@@ -46,40 +60,36 @@ class ConnectionRouter : public ConnectionRouterInterface {
         , router_debug_(false)
         , path_search_cumulative_time(0) {
         heap_.init_heap(grid);
-        only_opin_inter_layer = (grid.get_num_layers() > 1) && inter_layer_connections_limited_to_opin(*rr_graph);
-    }
-
-    ~ConnectionRouter() {
-        VTR_LOG("Serial Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n",
-                std::chrono::duration<float>(path_search_cumulative_time).count());
-    }
-
-    // Clear's the modified list. Should be called after reset_path_costs
-    // have been called.
-    void clear_modified_rr_node_info() final {
-        modified_rr_node_inf_.clear();
-    }
-
-    // Reset modified data in rr_node_route_inf based on modified_rr_node_inf.
-    void reset_path_costs() final {
-        // Reset the node info stored in rr_node_route_inf variable
-        ::reset_path_costs(modified_rr_node_inf_);
-        // Reset the node info stored inside the connection router
-        if (rcv_path_manager.is_enabled()) {
-            for (const auto& node : modified_rr_node_inf_) {
-                rcv_path_data[node] = nullptr;
-            }
-        }
     }
 
-    /** Finds a path from the route tree rooted at rt_root to sink_node.
-     * This is used when you want to allow previous routing of the same net to
-     * serve as valid start locations for the current connection.
-     *
-     * Returns a tuple of:
-     * bool: path exists? (hard failure, rr graph disconnected)
-     * bool: should retry with full bounding box? (only used in parallel routing)
-     * RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing */
+    virtual ~ConnectionRouter() {}
+
+    /**
+     * @brief Clears the modified list
+     * @note Should be called after reset_path_costs have been called
+     */
+    virtual void clear_modified_rr_node_info() = 0;
+
+    /**
+     * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf
+     */
+    virtual void reset_path_costs() = 0;
+
+    /**
+     * @brief Finds a path from the route tree rooted at rt_root to sink_node
+     * @note This is used when you want to allow previous routing of the same
+     * net to serve as valid start locations for the current connection.
+     * @param rt_root RouteTreeNode describing the current routing state
+     * @param sink_node Sink node ID to route to
+     * @param cost_params Cost function parameters
+     * @param bounding_box Keep search confined to this bounding box
+     * @param router_stats Update router statistics
+     * @param conn_params Parameters to guide the routing of the given connection
+     * @return A tuple of:
+     * - bool: path exists? (hard failure, rr graph disconnected)
+     * - bool: should retry with full bounding box? (only used in parallel routing)
+     * - RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing
+     */
     std::tuple<bool, bool, RTExploredNode> timing_driven_route_connection_from_route_tree(
         const RouteTreeNode& rt_root,
         RRNodeId sink_node,
@@ -88,16 +98,22 @@ class ConnectionRouter : public ConnectionRouterInterface {
         RouterStats& router_stats,
         const ConnectionParameters& conn_params) final;
 
-    /** Finds a path from the route tree rooted at rt_root to sink_node for a
-     * high fanout net.
-     *
-     * Unlike timing_driven_route_connection_from_route_tree(), only part of
-     * the route tree which is spatially close to the sink is added to the heap.
-     *
-     * Returns a tuple of:
-     * bool: path exists? (hard failure, rr graph disconnected)
-     * bool: should retry with full bounding box? (only used in parallel routing)
-     * RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing */
+    /**
+     * @brief Finds a path from the route tree rooted at rt_root to sink_node for a high fanout net
+     * @note Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree which
+     * is spatially close to the sink is added to the heap.
+     * @param rt_root RouteTreeNode describing the current routing state
+     * @param sink_node Sink node ID to route to
+     * @param cost_params Cost function parameters
+     * @param net_bounding_box Keep search confined to this bounding box
+     * @param spatial_rt_lookup Route tree spatial lookup
+     * @param router_stats Update router statistics
+     * @param conn_params Parameters to guide the routing of the given connection
+     * @return A tuple of:
+     * - bool: path exists? (hard failure, rr graph disconnected)
+     * - bool: should retry with full bounding box? (only used in parallel routing)
+     * - RTExploredNode: the explored sink node, from which the cheapest path can be found via back-tracing
+     */
     std::tuple<bool, bool, RTExploredNode> timing_driven_route_connection_from_route_tree_high_fanout(
         const RouteTreeNode& rt_root,
         RRNodeId sink_node,
@@ -107,159 +123,150 @@ class ConnectionRouter : public ConnectionRouterInterface {
         RouterStats& router_stats,
         const ConnectionParameters& conn_params) final;
 
-    // Finds a path from the route tree rooted at rt_root to all sinks
-    // available.
-    //
-    // Each element of the returned vector is a reachable sink.
-    //
-    // If cost_params.astar_fac is set to 0, this effectively becomes
-    // Dijkstra's algorithm with a modified exit condition (runs until heap is
-    // empty). When using cost_params.astar_fac = 0, for efficiency the
-    // RouterLookahead used should be the NoOpLookahead.
-    //
-    // Note: This routine is currently used only to generate information that
-    // may be helpful in debugging an architecture.
-    vtr::vector<RRNodeId, RTExploredNode> timing_driven_find_all_shortest_paths_from_route_tree(
+    /**
+     * @brief Finds shortest paths from the route tree rooted at rt_root to all sinks available
+     * @note Unlike timing_driven_route_connection_from_route_tree(), which routes to a single
+     * sink, this routine finds the shortest path to every reachable sink.
+ * @note If cost_params.astar_fac is set to 0, this effectively becomes Dijkstra's algorithm with a + * modified exit condition (runs until heap is empty). When using cost_params.astar_fac = 0, for + * efficiency the RouterLookahead used should be the NoOpLookahead. + * @note This routine is currently used only to generate information that may be helpful in debugging + * an architecture. + * @param rt_root RouteTreeNode describing the current routing state + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param router_stats Update router statistics + * @param conn_params Parameters to guide the routing of the given connection + * @return A vector where each element is a reachable sink + */ + virtual vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( const RouteTreeNode& rt_root, const t_conn_cost_params& cost_params, const t_bb& bounding_box, RouterStats& router_stats, - const ConnectionParameters& conn_params) final; + const ConnectionParameters& conn_params) = 0; + /** + * @brief Sets router debug option + * @param router_debug Router debug option + */ void set_router_debug(bool router_debug) final { router_debug_ = router_debug; } - // Empty the route tree set used for RCV node detection - // Will return if RCV is disabled - // Called after each net is finished routing to flush the set - void empty_rcv_route_tree_set() final; - - // Enable or disable RCV in connection router - // Enabling this will utilize extra path structures, as well as the RCV cost function - // - // Ensure route budgets have been calculated before enabling this - void set_rcv_enabled(bool enable) final; - - private: - // Mark that data associated with rr_node "inode" has been modified, and - // needs to be reset in reset_path_costs. - void add_to_mod_list(RRNodeId inode) { - if (std::isinf(rr_node_route_inf_[inode].path_cost)) { - modified_rr_node_inf_.push_back(inode); - } - } - - // Update the route path to the node `cheapest.index` via the path from - // `from_node` via `cheapest.prev_edge`. - inline void update_cheapest(RTExploredNode& cheapest, const RRNodeId& from_node) { - const RRNodeId& inode = cheapest.index; - add_to_mod_list(inode); - rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; - rr_node_route_inf_[inode].path_cost = cheapest.total_cost; - rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; - - // Use the already created next path structure pointer when RCV is enabled - if (rcv_path_manager.is_enabled()) { - rcv_path_manager.move(rcv_path_data[inode], cheapest.path_data); - - rcv_path_data[inode]->path_rr = rcv_path_data[from_node]->path_rr; - rcv_path_data[inode]->edge = rcv_path_data[from_node]->edge; - rcv_path_data[inode]->path_rr.push_back(from_node); - rcv_path_data[inode]->edge.push_back(cheapest.prev_edge); - } + /** + * @brief Empties the route tree set used for RCV node detection + * @note Will immediately return if RCV is disabled. Called after + * each net is finished routing to flush the set. + */ + void empty_rcv_route_tree_set() final { + rcv_path_manager.empty_route_tree_nodes(); } - /** Common logic from timing_driven_route_connection_from_route_tree and + /** + * @brief Enables or disables RCV in connection router + * @note Enabling this will utilize extra path structures, as well as + * the RCV cost function. Ensure route budgets have been calculated + * before enabling this. 
+ * @param enable Whether enabling RCV or not + */ + virtual void set_rcv_enabled(bool enable) = 0; + + protected: + /** + * @brief Common logic from timing_driven_route_connection_from_route_tree and * timing_driven_route_connection_from_route_tree_high_fanout for running * the connection router. - * @param[in] rt_root RouteTreeNode describing the current routing state - * @param[in] sink_node Sink node ID to route to - * @param[in] cost_params - * @param[in] bounding_box Keep search confined to this bounding box - * @return bool Signal to retry this connection with a full-device bounding box */ + * @param rt_root RouteTreeNode describing the current routing state + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return bool signal to retry this connection with a full-device bounding box + */ bool timing_driven_route_connection_common_setup( const RouteTreeNode& rt_root, RRNodeId sink_node, const t_conn_cost_params& cost_params, const t_bb& bounding_box); - // Finds a path to sink_node, starting from the elements currently in the - // heap. - // - // If the path is not found, which means that the path_cost of sink_node in - // RR node route info has never been updated, `rr_node_route_inf_[sink_node] - // .path_cost` will be the initial value (i.e., float infinity). This case - // can be detected by `std::isinf(rr_node_route_inf_[sink_node].path_cost)`. - // - // This is the core maze routing routine. - // - // Note: For understanding the connection router, start here. + /** + * @brief Finds a path to sink_node, starting from the elements currently in the heap + * @note If the path is not found, which means that the path_cost of sink_node in RR + * node route info has never been updated, `rr_node_route_inf_[sink_node].path_cost` + * will be the initial value (i.e., float infinity). This case can be detected by + * `std::isinf(rr_node_route_inf_[sink_node].path_cost)`. + * @note This is the core maze routing routine. For understanding the connection + * router, start here. + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + */ void timing_driven_route_connection_from_heap( RRNodeId sink_node, const t_conn_cost_params& cost_params, const t_bb& bounding_box); - // Expand this current node if it is a cheaper path. 
- void timing_driven_expand_cheapest( - RRNodeId from_node, - float new_total_cost, - RRNodeId target_node, + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + virtual void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) = 0; + + /** + * @brief Finds shortest paths from current heap to all nodes in the RR graph + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return A vector where each element contains the shortest route to a specific sink node + */ + virtual vtr::vector timing_driven_find_all_shortest_paths_from_heap( const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - const t_bb& target_bb); - - // Expand each neighbor of the current node. - void timing_driven_expand_neighbours( - const RTExploredNode& current, - const t_conn_cost_params& cost_params, - const t_bb& bounding_box, + const t_bb& bounding_box) = 0; + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + virtual void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, RRNodeId target_node, - const t_bb& target_bb); - - // Conditionally adds to_node to the router heap (via path from current.index - // via from_edge). - // - // RR nodes outside bounding box specified in bounding_box are not added - // to the heap. - void timing_driven_expand_neighbour( - const RTExploredNode& current, - RREdgeId from_edge, - RRNodeId to_node, const t_conn_cost_params& cost_params, - const t_bb& bounding_box, - RRNodeId target_node, - const t_bb& target_bb); - - // Add to_node to the heap, and also add any nodes which are connected by - // non-configurable edges - void timing_driven_add_to_heap( - const t_conn_cost_params& cost_params, - const RTExploredNode& current, - RRNodeId to_node, - RREdgeId from_edge, - RRNodeId target_node); - - // Calculates the cost of reaching to_node + const t_bb& net_bb) = 0; + + /** + * @brief Calculates the cost of reaching to_node + * @param to Neighbor node to calculate costs before being expanded + * @param cost_params Cost function parameters + * @param from_node Current node ID being explored + * @param target_node Target node ID to route to + */ void evaluate_timing_driven_node_costs( RTExploredNode* to, const t_conn_cost_params& cost_params, RRNodeId from_node, RRNodeId target_node); - // Find paths from current heap to all nodes in the RR graph - vtr::vector timing_driven_find_all_shortest_paths_from_heap( - const t_conn_cost_params& cost_params, - const t_bb& bounding_box); - - //Adds the route tree rooted at rt_node to the heap, preparing it to be - //used as branch-points for further routing. 
- void add_route_tree_to_heap(const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb); - - // Evaluate node costs using the RCV algorith + /** + * @brief Evaluate node costs using the RCV algorithm + * @param cost_params Cost function parameters + * @param to_node Neighbor node to calculate costs before being expanded + * @param target_node Target node ID to route to + * @param backwards_delay "Known" delay up to and including to_node + * @param backwards_cong "Known" congestion up to and including to_node + * @param R_upstream Upstream resistance to ground from to_node + * @return Node cost using RCV + */ float compute_node_cost_using_rcv(const t_conn_cost_params cost_params, RRNodeId to_node, RRNodeId target_node, @@ -267,16 +274,27 @@ class ConnectionRouter : public ConnectionRouterInterface { float backwards_cong, float R_upstream); - //Unconditionally adds rt_node to the heap - // - //Note that if you want to respect rt_node->re_expand that is the caller's - //responsibility. - void add_route_tree_node_to_heap( - const RouteTreeNode& rt_node, - RRNodeId target_node, - const t_conn_cost_params& cost_params, - const t_bb& net_bb); - + /** + * @brief Adds the route tree rooted at rt_node to the heap, preparing + * it to be used as branch-points for further routing + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_to_heap(const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb); + /** + * @brief For high fanout nets, adds only route tree nodes which are + * spatially close to the sink + * @param rt_root RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param spatial_route_tree_lookup Route tree spatial lookup + * @param net_bounding_box Do not push to heap if not in bounding box + */ t_bb add_high_fanout_route_tree_to_heap( const RouteTreeNode& rt_root, RRNodeId target_node, @@ -284,47 +302,59 @@ class ConnectionRouter : public ConnectionRouterInterface { const SpatialRouteTreeLookup& spatial_route_tree_lookup, const t_bb& net_bounding_box); + /** Device grid */ const DeviceGrid& grid_; + + /** Router lookahead */ const RouterLookahead& router_lookahead_; + + /** RR node data */ const t_rr_graph_view rr_nodes_; + + /** RR graph */ const RRGraphView* rr_graph_; + + /** RR node resistance/capacitance data */ vtr::array_view rr_rc_data_; + + /** RR switch data */ vtr::array_view rr_switch_inf_; + + //@{ + /** Net terminal groups */ const vtr::vector>>& net_terminal_groups; const vtr::vector>& net_terminal_group_num; + //@} + + /** RR node extra information needed during routing */ vtr::vector& rr_node_route_inf_; + + /** Is flat router enabled or not? 
*/ bool is_flat_; - std::vector modified_rr_node_inf_; + + /** Router statistics (e.g., heap push/pop counts) */ RouterStats* router_stats_; + + /** Parameters to guide the routing of the given connection */ const ConnectionParameters* conn_params_; + + /** Templated heap instance (e.g., binary heap, 4-ary heap, MultiQueue-based parallel heap) */ HeapImplementation heap_; - bool router_debug_; - bool only_opin_inter_layer; + /** Router debug option */ + bool router_debug_; - // Cumulative time spent in the path search part of the connection router. + /** Cumulative time spent in the path search part of the connection router */ std::chrono::microseconds path_search_cumulative_time; - // The path manager for RCV, keeps track of the route tree as a set, also - // manages the allocation of `rcv_path_data`. + //@{ + /** The path manager for RCV, keeps track of the route tree as a set, also + * manages the allocation of `rcv_path_data`. */ PathManager rcv_path_manager; vtr::vector rcv_path_data; + //@} }; -/** Construct a connection router that uses the specified heap type. - * This function is not used, but removing it will result in "undefined reference" - * errors since heap type specializations won't get emitted from connection_router.cpp - * without it. - * The alternative is moving all ConnectionRouter fn implementations into the header. */ -std::unique_ptr make_connection_router( - e_heap_type heap_type, - const DeviceGrid& grid, - const RouterLookahead& router_lookahead, - const t_rr_graph_storage& rr_nodes, - const RRGraphView* rr_graph, - const std::vector& rr_rc_data, - const vtr::vector& rr_switch_inf, - vtr::vector& rr_node_route_inf, - bool is_flat); +#include "connection_router.tpp" #endif /* _CONNECTION_ROUTER_H */ diff --git a/vpr/src/route/connection_router.tpp b/vpr/src/route/connection_router.tpp new file mode 100644 index 00000000000..e47fa0abceb --- /dev/null +++ b/vpr/src/route/connection_router.tpp @@ -0,0 +1,545 @@ +#pragma once + +#include "connection_router.h" + +#include +#include "rr_graph.h" +#include "rr_graph_fwd.h" + +/** Used for the flat router. The node isn't relevant to the target if + * it is an intra-block node outside of our target block */ +inline bool relevant_node_to_target(const RRGraphView* rr_graph, + RRNodeId node_to_add, + RRNodeId target_node); + +template +std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + router_stats_ = &router_stats; + conn_params_ = &conn_params; + + bool retry = false; + retry = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, bounding_box); + + if (!std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + // Only the `index`, `prev_edge`, and `rcv_path_backward_delay` fields of `out` + // are used after this function returns. 
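The header's @note points at update_traceback for the full tracing logic; a minimal sketch of the idea is shown below. It assumes the RR graph (or rr_nodes_) exposes an edge-to-source-node lookup, called edge_src_node() here purely for illustration.

    // Sketch only: trace the routed path back from the sink via prev_edge.
    std::vector<RRNodeId> path;
    RRNodeId inode = sink_node;
    while (true) {
        path.push_back(inode);
        RREdgeId prev_edge = rr_node_route_inf_[inode].prev_edge;
        if (prev_edge == RREdgeId::INVALID())
            break; // reached the existing route tree / source
        inode = rr_graph_->edge_src_node(prev_edge); // assumed edge-to-source-node accessor
    }
    // `path` now lists the nodes from the sink back to the branch point on the route tree.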
+ RTExploredNode out; + out.index = sink_node; + out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; + if (rcv_path_manager.is_enabled()) { + out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; + rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); + rcv_path_manager.empty_heap(); + } + heap_.empty_heap(); + return std::make_tuple(true, /*retry=*/false, out); + } else { + reset_path_costs(); + clear_modified_rr_node_info(); + heap_.empty_heap(); + rcv_path_manager.empty_heap(); + return std::make_tuple(false, retry, RTExploredNode()); + } +} + +template +std::tuple ConnectionRouter::timing_driven_route_connection_from_route_tree_high_fanout( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bounding_box, + const SpatialRouteTreeLookup& spatial_rt_lookup, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + router_stats_ = &router_stats; + conn_params_ = &conn_params; + + // re-explore route tree from root to add any new nodes (buildheap afterwards) + // route tree needs to be repushed onto the heap since each node's cost is target specific + t_bb high_fanout_bb = add_high_fanout_route_tree_to_heap(rt_root, sink_node, cost_params, spatial_rt_lookup, net_bounding_box); + heap_.build_heap(); + + RRNodeId source_node = rt_root.inode; + + if (heap_.is_empty_heap()) { + VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return std::make_tuple(false, false, RTExploredNode()); + } + + VTR_LOGV_DEBUG(router_debug_, " Routing to %d as high fanout net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, + high_fanout_bb.layer_min, high_fanout_bb.xmin, high_fanout_bb.ymin, + high_fanout_bb.layer_max, high_fanout_bb.xmax, high_fanout_bb.ymax); + + bool retry_with_full_bb = false; + timing_driven_route_connection_from_heap(sink_node, cost_params, high_fanout_bb); + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + //Found no path, that may be due to an unlucky choice of existing route tree sub-set, + //try again with the full route tree to be sure this is not an artifact of high-fanout routing + VTR_LOG_WARN("No routing path found in high-fanout mode for net %zu connection (to sink_rr %d), retrying with full route tree\n", size_t(conn_params.net_id_), sink_node); + + //Reset any previously recorded node costs so timing_driven_route_connection() + //starts over from scratch. 
+ reset_path_costs(); + clear_modified_rr_node_info(); + + retry_with_full_bb = timing_driven_route_connection_common_setup(rt_root, sink_node, cost_params, net_bounding_box); + } + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + + heap_.empty_heap(); + rcv_path_manager.empty_heap(); + return std::make_tuple(false, retry_with_full_bb, RTExploredNode()); + } + + RTExploredNode out; + out.index = sink_node; + out.prev_edge = rr_node_route_inf_[sink_node].prev_edge; + if (rcv_path_manager.is_enabled()) { + out.rcv_path_backward_delay = rcv_path_data[sink_node]->backward_delay; + rcv_path_manager.update_route_tree_set(rcv_path_data[sink_node]); + rcv_path_manager.empty_heap(); + } + heap_.empty_heap(); + + return std::make_tuple(true, retry_with_full_bb, out); +} + +template +bool ConnectionRouter::timing_driven_route_connection_common_setup( + const RouteTreeNode& rt_root, + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + //Re-add route nodes from the existing route tree to the heap. + //They need to be repushed onto the heap since each node's cost is target specific. + + add_route_tree_to_heap(rt_root, sink_node, cost_params, bounding_box); + heap_.build_heap(); // via sifting down everything + + RRNodeId source_node = rt_root.inode; + + if (heap_.is_empty_heap()) { + VTR_LOG("No source in route tree: %s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return false; + } + + VTR_LOGV_DEBUG(router_debug_, " Routing to %d as normal net (BB: %d,%d,%d x %d,%d,%d)\n", sink_node, + bounding_box.layer_min, bounding_box.xmin, bounding_box.ymin, + bounding_box.layer_max, bounding_box.xmax, bounding_box.ymax); + + timing_driven_route_connection_from_heap(sink_node, cost_params, bounding_box); + + if (std::isinf(rr_node_route_inf_[sink_node].path_cost)) { + // No path found within the current bounding box. 
+ // + // If the bounding box is already max size, just fail + if (bounding_box.xmin == 0 + && bounding_box.ymin == 0 + && bounding_box.xmax == (int)(grid_.width() - 1) + && bounding_box.ymax == (int)(grid_.height() - 1) + && bounding_box.layer_min == 0 + && bounding_box.layer_max == (int)(grid_.get_num_layers() - 1)) { + VTR_LOG("%s\n", describe_unrouteable_connection(source_node, sink_node, is_flat_).c_str()); + return false; + } + + // Otherwise, leave unrouted and bubble up a signal to retry this net with a full-device bounding box + VTR_LOG_WARN("No routing path for connection to sink_rr %d, leaving unrouted to retry later\n", sink_node); + return true; + } + + return false; +} + +template +void ConnectionRouter::timing_driven_route_connection_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + VTR_ASSERT_SAFE(heap_.is_valid()); + + if (heap_.is_empty_heap()) { //No source + VTR_LOGV_DEBUG(router_debug_, " Initial heap empty (no source)\n"); + } + + // Get bounding box for sink node used in timing_driven_expand_neighbour + VTR_ASSERT_SAFE(sink_node != RRNodeId::INVALID()); + + t_bb target_bb; + if (rr_graph_->node_type(sink_node) == SINK) { // We need to get a bounding box for the sink's entire tile + vtr::Rect tile_bb = grid_.get_tile_bb({rr_graph_->node_xlow(sink_node), + rr_graph_->node_ylow(sink_node), + rr_graph_->node_layer(sink_node)}); + + target_bb.xmin = tile_bb.xmin(); + target_bb.ymin = tile_bb.ymin(); + target_bb.xmax = tile_bb.xmax(); + target_bb.ymax = tile_bb.ymax(); + } else { + target_bb.xmin = rr_graph_->node_xlow(sink_node); + target_bb.ymin = rr_graph_->node_ylow(sink_node); + target_bb.xmax = rr_graph_->node_xhigh(sink_node); + target_bb.ymax = rr_graph_->node_yhigh(sink_node); + } + + target_bb.layer_min = rr_graph_->node_layer(RRNodeId(sink_node)); + target_bb.layer_max = rr_graph_->node_layer(RRNodeId(sink_node)); + + // Start measuring path search time + std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); + + timing_driven_find_single_shortest_path_from_heap(sink_node, cost_params, bounding_box, target_bb); + + // Stop measuring path search time + std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); + path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); +} + +#ifdef VTR_ASSERT_SAFE_ENABLED + +//Returns true if both nodes are part of the same non-configurable edge set +inline bool same_non_config_node_set(RRNodeId from_node, RRNodeId to_node) { + auto& device_ctx = g_vpr_ctx.device(); + + auto from_itr = device_ctx.rr_node_to_non_config_node_set.find(from_node); + auto to_itr = device_ctx.rr_node_to_non_config_node_set.find(to_node); + + if (from_itr == device_ctx.rr_node_to_non_config_node_set.end() + || to_itr == device_ctx.rr_node_to_non_config_node_set.end()) { + return false; //Not part of a non-config node set + } + + return from_itr->second == to_itr->second; //Check for same non-config set IDs +} + +#endif + +template +float ConnectionRouter::compute_node_cost_using_rcv(const t_conn_cost_params cost_params, + RRNodeId to_node, + RRNodeId target_node, + float backwards_delay, + float backwards_cong, + float R_upstream) { + float expected_delay; + float expected_cong; + + const t_conn_delay_budget* delay_budget = cost_params.delay_budget; + // TODO: This function is not tested for is_flat == true + VTR_ASSERT(is_flat_ != true); + std::tie(expected_delay, expected_cong) = 
router_lookahead_.get_expected_delay_and_cong(to_node, target_node, cost_params, R_upstream); + + float expected_total_delay_cost; + float expected_total_cong_cost; + + float expected_total_cong = expected_cong + backwards_cong; + float expected_total_delay = expected_delay + backwards_delay; + + //If budgets specified calculate cost as described by RCV paper: + // R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + // Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + // Integrated Circuits and Systems, vol. 27, no. 4, pp. 686-697, April 2008. + + // Normalization constant defined in RCV paper cited above + constexpr float NORMALIZATION_CONSTANT = 100e-12; + + expected_total_delay_cost = expected_total_delay; + expected_total_delay_cost += (delay_budget->short_path_criticality + cost_params.criticality) * std::max(0.f, delay_budget->target_delay - expected_total_delay); + // expected_total_delay_cost += std::pow(std::max(0.f, expected_total_delay - delay_budget->max_delay), 2) / NORMALIZATION_CONSTANT; + expected_total_delay_cost += std::pow(std::max(0.f, delay_budget->min_delay - expected_total_delay), 2) / NORMALIZATION_CONSTANT; + expected_total_cong_cost = expected_total_cong; + + float total_cost = expected_total_delay_cost + expected_total_cong_cost; + + return total_cost; +} + +template +void ConnectionRouter::evaluate_timing_driven_node_costs(RTExploredNode* to, + const t_conn_cost_params& cost_params, + RRNodeId from_node, + RRNodeId target_node) { + /* new_costs.backward_cost: is the "known" part of the cost to this node -- the + * congestion cost of all the routing resources back to the existing route + * plus the known delay of the total path back to the source. + * + * new_costs.total_cost: is this "known" backward cost + an expected cost to get to the target. + * + * new_costs.R_upstream: is the upstream resistance at the end of this node + */ + + //Info for the switch connecting from_node to_node (i.e., to->index) + int iswitch = rr_nodes_.edge_switch(to->prev_edge); + bool switch_buffered = rr_switch_inf_[iswitch].buffered(); + bool reached_configurably = rr_switch_inf_[iswitch].configurable(); + float switch_R = rr_switch_inf_[iswitch].R; + float switch_Tdel = rr_switch_inf_[iswitch].Tdel; + float switch_Cinternal = rr_switch_inf_[iswitch].Cinternal; + + //To node info + auto rc_index = rr_graph_->node_rc_index(to->index); + float node_C = rr_rc_data_[rc_index].C; + float node_R = rr_rc_data_[rc_index].R; + + //From node info + float from_node_R = rr_rc_data_[rr_graph_->node_rc_index(from_node)].R; + + //Update R_upstream + if (switch_buffered) { + to->R_upstream = 0.; //No upstream resistance + } else { + //R_Upstream already initialized + } + + to->R_upstream += switch_R; //Switch resistance + to->R_upstream += node_R; //Node resistance + + //Calculate delay + float Rdel = to->R_upstream - 0.5 * node_R; //Only consider half node's resistance for delay + float Tdel = switch_Tdel + Rdel * node_C; + + //Depending on the switch used, the Tdel of the upstream node (from_node) may change due to + //increased loading from the switch's internal capacitance. + // + //Even though this delay physically affects from_node, we make the adjustment (now) on the to_node, + //since only once we've reached to to_node do we know the connection used (and the switch enabled). 
+ // + //To adjust for the time delay, we compute the product of the Rdel associated with from_node and + //the internal capacitance of the switch. + // + //First, we will calculate Rdel_adjust (just like in the computation for Rdel, we consider only + //half of from_node's resistance). + float Rdel_adjust = to->R_upstream - 0.5 * from_node_R; + + //Second, we adjust the Tdel to account for the delay caused by the internal capacitance. + Tdel += Rdel_adjust * switch_Cinternal; + + float cong_cost = 0.; + if (reached_configurably) { + cong_cost = get_rr_cong_cost(to->index, cost_params.pres_fac); + } else { + //Reached by a non-configurable edge. + //Therefore the from_node and to_node are part of the same non-configurable node set. +#ifdef VTR_ASSERT_SAFE_ENABLED + VTR_ASSERT_SAFE_MSG(same_non_config_node_set(from_node, to->index), + "Non-configurably connected edges should be part of the same node set"); +#endif + + //The congestion cost of all nodes in the set has already been accounted for (when + //the current path first expanded a node in the set). Therefore do *not* re-add the congestion + //cost. + cong_cost = 0.; + } + if (conn_params_->router_opt_choke_points_ && is_flat_ && rr_graph_->node_type(to->index) == IPIN) { + auto find_res = conn_params_->connection_choking_spots_.find(to->index); + if (find_res != conn_params_->connection_choking_spots_.end()) { + cong_cost = cong_cost / pow(2, (float)find_res->second); + } + } + + //Update the backward cost (upstream already included) + to->backward_path_cost += (1. - cost_params.criticality) * cong_cost; //Congestion cost + to->backward_path_cost += cost_params.criticality * Tdel; //Delay cost + + if (cost_params.bend_cost != 0.) { + t_rr_type from_type = rr_graph_->node_type(from_node); + t_rr_type to_type = rr_graph_->node_type(to->index); + if ((from_type == CHANX && to_type == CHANY) || (from_type == CHANY && to_type == CHANX)) { + to->backward_path_cost += cost_params.bend_cost; //Bend cost + } + } + + float total_cost = 0.; + + if (rcv_path_manager.is_enabled() && to->path_data != nullptr) { + to->path_data->backward_delay += cost_params.criticality * Tdel; + to->path_data->backward_cong += (1. 
- cost_params.criticality) * get_rr_cong_cost(to->index, cost_params.pres_fac); + + total_cost = compute_node_cost_using_rcv(cost_params, to->index, target_node, to->path_data->backward_delay, to->path_data->backward_cong, to->R_upstream); + } else { + const auto& device_ctx = g_vpr_ctx.device(); + //Update total cost + float expected_cost = router_lookahead_.get_expected_cost(to->index, target_node, cost_params, to->R_upstream); + VTR_LOGV_DEBUG(router_debug_ && !std::isfinite(expected_cost), + " Lookahead from %s (%s) to %s (%s) is non-finite, expected_cost = %f, to->R_upstream = %f\n", + rr_node_arch_name(to->index, is_flat_).c_str(), + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to->index, is_flat_).c_str(), + rr_node_arch_name(target_node, is_flat_).c_str(), + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, target_node, is_flat_).c_str(), + expected_cost, to->R_upstream); + total_cost += to->backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + } + to->total_cost = total_cost; +} + +template +void ConnectionRouter::add_route_tree_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + /* Puts the entire partial routing below and including rt_node onto the heap * + * (except for those parts marked as not to be expanded) by calling itself * + * recursively. */ + + /* Pre-order depth-first traversal */ + // IPINs and SINKS are not re_expanded + if (rt_node.re_expand) { + add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bb); + } + + for (const RouteTreeNode& child_node : rt_node.child_nodes()) { + if (is_flat_) { + if (relevant_node_to_target(rr_graph_, child_node.inode, target_node)) { + add_route_tree_to_heap(child_node, target_node, cost_params, net_bb); + } + } else { + add_route_tree_to_heap(child_node, target_node, cost_params, net_bb); + } + } +} + +/* Expand bb by inode's extents and clip against net_bb */ +inline void expand_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb, RRNodeId inode, const RRGraphView* rr_graph) { + bb.xmin = std::max(net_bb.xmin, std::min(bb.xmin, rr_graph->node_xlow(inode))); + bb.ymin = std::max(net_bb.ymin, std::min(bb.ymin, rr_graph->node_ylow(inode))); + bb.xmax = std::min(net_bb.xmax, std::max(bb.xmax, rr_graph->node_xhigh(inode))); + bb.ymax = std::min(net_bb.ymax, std::max(bb.ymax, rr_graph->node_yhigh(inode))); + bb.layer_min = std::min(bb.layer_min, rr_graph->node_layer(inode)); + bb.layer_max = std::max(bb.layer_max, rr_graph->node_layer(inode)); +} + +/* Expand bb by HIGH_FANOUT_BB_FAC and clip against net_bb */ +inline void adjust_highfanout_bounding_box(t_bb& bb, const t_bb& net_bb) { + constexpr int HIGH_FANOUT_BB_FAC = 3; + + bb.xmin = std::max(net_bb.xmin, bb.xmin - HIGH_FANOUT_BB_FAC); + bb.ymin = std::max(net_bb.ymin, bb.ymin - HIGH_FANOUT_BB_FAC); + bb.xmax = std::min(net_bb.xmax, bb.xmax + HIGH_FANOUT_BB_FAC); + bb.ymax = std::min(net_bb.ymax, bb.ymax + HIGH_FANOUT_BB_FAC); + bb.layer_min = std::min(net_bb.layer_min, bb.layer_min); + bb.layer_max = std::max(net_bb.layer_max, bb.layer_max); +} + +template +t_bb ConnectionRouter::add_high_fanout_route_tree_to_heap( + const RouteTreeNode& rt_root, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const SpatialRouteTreeLookup& spatial_rt_lookup, + const t_bb& net_bounding_box) { + //For high fanout nets we only add those route tree nodes which are 
spatially close + //to the sink. + // + //Based on: + // J. Swartz, V. Betz, J. Rose, "A Fast Routability-Driven Router for FPGAs", FPGA, 1998 + // + //We rely on a grid-based spatial look-up which is maintained for high fanout nets by + //update_route_tree(), which allows us to add spatially close route tree nodes without traversing + //the entire route tree (which is likely large for a high fanout net). + + //Determine which bin the target node is located in + + int target_bin_x = grid_to_bin_x(rr_graph_->node_xlow(target_node), spatial_rt_lookup); + int target_bin_y = grid_to_bin_y(rr_graph_->node_ylow(target_node), spatial_rt_lookup); + + auto target_layer = rr_graph_->node_layer(target_node); + + int chan_nodes_added = 0; + + t_bb highfanout_bb; + highfanout_bb.xmin = rr_graph_->node_xlow(target_node); + highfanout_bb.xmax = rr_graph_->node_xhigh(target_node); + highfanout_bb.ymin = rr_graph_->node_ylow(target_node); + highfanout_bb.ymax = rr_graph_->node_yhigh(target_node); + highfanout_bb.layer_min = target_layer; + highfanout_bb.layer_max = target_layer; + + //Add existing routing starting from the target bin. + //If the target's bin has insufficient existing routing add from the surrounding bins + constexpr int SINGLE_BIN_MIN_NODES = 2; + bool done = false; + bool found_node_on_same_layer = false; + for (int dx : {0, -1, +1}) { + size_t bin_x = target_bin_x + dx; + + if (bin_x > spatial_rt_lookup.dim_size(0) - 1) continue; //Out of range + + for (int dy : {0, -1, +1}) { + size_t bin_y = target_bin_y + dy; + + if (bin_y > spatial_rt_lookup.dim_size(1) - 1) continue; //Out of range + + for (const RouteTreeNode& rt_node : spatial_rt_lookup[bin_x][bin_y]) { + if (!rt_node.re_expand) // Some nodes (like IPINs) shouldn't be re-expanded + continue; + RRNodeId rr_node_to_add = rt_node.inode; + + /* Flat router: don't go into clusters other than the target one */ + if (is_flat_) { + if (!relevant_node_to_target(rr_graph_, rr_node_to_add, target_node)) + continue; + } + + /* In case of the parallel router, we may be dealing with a virtual net + * so prune the nodes from the HF lookup against the bounding box just in case */ + if (!inside_bb(rr_node_to_add, net_bounding_box)) + continue; + + auto rt_node_layer_num = rr_graph_->node_layer(rr_node_to_add); + if (rt_node_layer_num == target_layer) + found_node_on_same_layer = true; + + // Put the node onto the heap + add_route_tree_node_to_heap(rt_node, target_node, cost_params, net_bounding_box); + + // Expand HF BB to include the node (clip by original BB) + expand_highfanout_bounding_box(highfanout_bb, net_bounding_box, rr_node_to_add, rr_graph_); + + if (rr_graph_->node_type(rr_node_to_add) == CHANY || rr_graph_->node_type(rr_node_to_add) == CHANX) { + chan_nodes_added++; + } + } + + if (dx == 0 && dy == 0 && chan_nodes_added > SINGLE_BIN_MIN_NODES && found_node_on_same_layer) { + //Target bin contained at least minimum amount of routing + // + //We require at least SINGLE_BIN_MIN_NODES to be added. + //This helps ensure we don't end up with, for example, a single + //routing wire running in the wrong direction which may not be + //able to reach the target within the bounding box. 
+ done = true; + break; + } + } + if (done) break; + } + /* If we didn't find enough nodes to branch off near the target + * or they are on the wrong grid layer, just add the full route tree */ + if (chan_nodes_added <= SINGLE_BIN_MIN_NODES || !found_node_on_same_layer) { + add_route_tree_to_heap(rt_root, target_node, cost_params, net_bounding_box); + return net_bounding_box; + } else { + //We found nearby routing, replace original bounding box to be localized around that routing + adjust_highfanout_bounding_box(highfanout_bb, net_bounding_box); + return highfanout_bb; + } +} + +/** Used for the flat router. The node isn't relevant to the target if + * it is an intra-block node outside of our target block */ +inline bool relevant_node_to_target(const RRGraphView* rr_graph, + RRNodeId node_to_add, + RRNodeId target_node) { + VTR_ASSERT_SAFE(rr_graph->node_type(target_node) == t_rr_type::SINK); + auto node_to_add_type = rr_graph->node_type(node_to_add); + return node_to_add_type != t_rr_type::IPIN || node_in_same_physical_tile(node_to_add, target_node); +} diff --git a/vpr/src/route/connection_router_interface.h b/vpr/src/route/connection_router_interface.h index 96ef278833a..178768bf5d5 100644 --- a/vpr/src/route/connection_router_interface.h +++ b/vpr/src/route/connection_router_interface.h @@ -24,6 +24,8 @@ struct t_conn_cost_params { float criticality = 1.; float astar_fac = 1.2; float astar_offset = 0.f; + float post_target_prune_fac = 1.2f; + float post_target_prune_offset = 0.f; float bend_cost = 1.; float pres_fac = 1.; const t_conn_delay_budget* delay_budget = nullptr; diff --git a/vpr/src/route/d_ary_heap.h b/vpr/src/route/d_ary_heap.h index c52cd702d13..ed10b0157bd 100644 --- a/vpr/src/route/d_ary_heap.h +++ b/vpr/src/route/d_ary_heap.h @@ -21,6 +21,8 @@ template class DAryHeap : public HeapInterface { public: + static constexpr unsigned arg_D = D; + using priority_queue = customized_d_ary_priority_queue, HeapNodeComparator>; DAryHeap() {} diff --git a/vpr/src/route/multi_queue_d_ary_heap.h b/vpr/src/route/multi_queue_d_ary_heap.h new file mode 100644 index 00000000000..5a49dadae50 --- /dev/null +++ b/vpr/src/route/multi_queue_d_ary_heap.h @@ -0,0 +1,133 @@ +/******************************************************************** + * MultiQueue Implementation + * + * Originally authored by Guozheng Zhang, Gilead Posluns, and Mark C. Jeffrey + * Published at the 36th ACM Symposium on Parallelism in Algorithms and + * Architectures (SPAA), June 2024 + * + * Original source: https://github.com/mcj-group/cps + * + * This implementation and interface has been modified from the original to: + * - Support queue draining functionality + * - Enable integration with the VTR project + * + * The MultiQueue data structure provides an efficient concurrent priority + * queue implementation designed for parallel processing applications. + * + * Modified: February 2025 + ********************************************************************/ + +#ifndef _MULTI_QUEUE_D_ARY_HEAP_H +#define _MULTI_QUEUE_D_ARY_HEAP_H + +#include "device_grid.h" +#include "heap_type.h" +#include "multi_queue_d_ary_heap.tpp" +#include +#include + +// FIXME: Use unified heap node struct (HeapNodeId) and comparator (HeapNodeComparator) +// defined in heap_type.h. Currently, the MQ_IO is not compatible with them. Need a lot +// of refactoring in MQ_IO to make it work, which is left for another PR to clean it up. 
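+
+// Illustrative usage of the MultiQueueDAryHeap defined below (a sketch, not part
+// of the build; template arguments are omitted and the thread/queue counts are
+// arbitrary):
+//
+//     MultiQueueDAryHeap heap(4 /*num_threads*/, 8 /*num_queues*/);
+//     HeapNode src;
+//     src.prio = 0.f;
+//     src.node = RRNodeId(0);
+//     heap.add_to_heap(src);          // any thread may push
+//     HeapNode popped;
+//     while (heap.try_pop(popped)) {  // threads pop concurrently until all queues drain
+//         // ... expand popped.node and push its neighbours ...
+//     }
+//     heap.reset();                   // prepare for the next connection
+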
+using MQHeapNode = std::tuple; + +struct MQHeapNodeTupleComparator /* FIXME: Use HeapNodeComparator */ { + bool operator()(const MQHeapNode& u, const MQHeapNode& v) { + return std::get<0>(u) > std::get<0>(v); + } +}; + +template +class MultiQueueDAryHeap { + public: + using MQ_IO = MultiQueueIO; + + MultiQueueDAryHeap() { + set_num_threads_and_queues(1, 2); // Serial (#threads=1, #queues=2) by default + } + + MultiQueueDAryHeap(size_t num_threads, size_t num_queues) { + set_num_threads_and_queues(num_threads, num_queues); + } + + ~MultiQueueDAryHeap() {} + + void set_num_threads_and_queues(size_t num_threads, size_t num_queues) { + pq_.reset(); + // Note: BE AWARE that in MQ_IO interface, `num_queues` comes first, then `num_threads`! + pq_ = std::make_unique(num_queues, num_threads, 0 /*Dont care (batch size for only popBatch)*/); + } + + void init_heap(const DeviceGrid& grid) { + (void)grid; + // TODO: Reserve storage for MQ_IO + // Note: This function could be called before setting num_threads/num_queues + } + + bool try_pop(HeapNode& heap_node) { + auto tmp = pq_->tryPop(); + if (!tmp.has_value()) { + return false; + } else { + uint32_t node_id; + std::tie(heap_node.prio, node_id) = tmp.value(); // FIXME: eliminate type cast by modifying MQ_IO + heap_node.node = RRNodeId(node_id); + return true; + } + } + + void add_to_heap(const HeapNode& heap_node) { + HeapNodePriority prio = heap_node.prio; + uint32_t node = size_t(heap_node.node); + pq_->push({prio, node}); + } + + void push_back(const HeapNode& heap_node) { + HeapNodePriority prio = heap_node.prio; + uint32_t node = size_t(heap_node.node); + pq_->push({prio, node}); // FIXME: add to heap without maintaining the heap property + } + + void build_heap() { + // FIXME: restore the heap property after pushing back nodes + } + + bool is_valid() const { + return true; // FIXME: checking if the heap property is maintained or not + } + + void empty_heap() { + pq_->reset(); // TODO: check if adding clear function for MQ_IO is necessary + } + + bool is_empty_heap() const { + return (bool)(pq_->empty()); + } + + uint64_t get_num_pushes() const { + return pq_->getNumPushes(); + } + + uint64_t get_num_pops() const { + return pq_->getNumPops(); + } + + uint64_t get_heap_occupancy() const { + return pq_->getQueueOccupancy(); + } + + void reset() { + pq_->reset(); + } + +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + void set_min_priority_for_pop(const HeapNodePriority& minPrio) { + pq_->setMinPrioForPop(minPrio); + } +#endif + + private: + std::unique_ptr pq_; +}; + +#endif diff --git a/vpr/src/route/multi_queue_d_ary_heap.tpp b/vpr/src/route/multi_queue_d_ary_heap.tpp new file mode 100644 index 00000000000..e7ed202a7e4 --- /dev/null +++ b/vpr/src/route/multi_queue_d_ary_heap.tpp @@ -0,0 +1,436 @@ +/******************************************************************** + * MultiQueue Implementation + * + * Originally authored by Guozheng Zhang, Gilead Posluns, and Mark C. Jeffrey + * Published at the 36th ACM Symposium on Parallelism in Algorithms and + * Architectures (SPAA), June 2024 + * + * Original source: https://github.com/mcj-group/cps + * + * This implementation and interface has been modified from the original to: + * - Support queue draining functionality + * - Enable integration with the VTR project + * + * The MultiQueue data structure provides an efficient concurrent priority + * queue implementation designed for parallel processing applications. 
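+ *
+ * In rough terms, the scheme implemented below works as follows (ignoring
+ * locking, statistics and termination detection):
+ *   - push(item): pick one of the internal d-ary queues at random and push
+ *     the item into it;
+ *   - pop(): pick two internal queues at random and pop from the one whose
+ *     current best (smallest) top priority is better.
+ * Randomizing the queue choice keeps contention low, at the cost of a slightly
+ * relaxed global priority order.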
+ * + * Modified: February 2025 + ********************************************************************/ + +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include +#include "d_ary_heap.tpp" + +#define CACHELINE 64 + +// #define PERF 1 +#define MQ_IO_ENABLE_CLEAR_FOR_POP + +template< + unsigned D, + typename PQElement, + typename Comparator, + typename PrioType> +class MultiQueueIO { + using PQ = customized_d_ary_priority_queue, Comparator>; + Comparator compare; + + // Special value used to signify that there is no 'min' element in a PQ + // container. The user should ensure that they do not use this priority + // while using the MQ. + static constexpr PrioType EMPTY_PRIO = std::numeric_limits::max(); + + struct PQContainer { + uint64_t pushes = 0; + uint64_t pops = 0; + PQ pq; + std::atomic_flag queueLock = ATOMIC_FLAG_INIT; + std::atomic min{EMPTY_PRIO}; + + void lock() { + while (queueLock.test_and_set(std::memory_order_acquire)) + ; + } + bool try_lock() { return queueLock.test_and_set(std::memory_order_acquire); } + void unlock() { queueLock.clear(std::memory_order_release); } + + } __attribute__((aligned(CACHELINE))); + + std::vector< + PQContainer + // FIXME: Disabled this due to VTR not using Boost. There is a C++ way + // of doing this, but it requires making an aligned allocator + // class. May be a good idea to add to VTR util in the future. + // Should profile for performance first; may not be worth it. + // , boost::alignment::aligned_allocator + > + queues; + uint64_t NUM_QUEUES; + + // Termination: + // - numIdle records the number of threads that believe + // there are no more work to do. + // -numEmpty records number of queues that are empty + uint64_t threadNum; + std::atomic numIdle{0}; + std::atomic numEmpty; +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + std::atomic minPrioForPop{std::numeric_limits::max()}; +#endif + + uint64_t batchSize; + + public: + MultiQueueIO(uint64_t numQueues, uint64_t numThreads, uint64_t batch) + : queues(numQueues) + , NUM_QUEUES(numQueues) + , threadNum(numThreads) + , numEmpty(numQueues) + , batchSize(batch) {} + +#ifdef PERF + uint64_t __attribute__((noinline)) ThreadLocalRandom() { +#else + uint64_t ThreadLocalRandom() { +#endif + // static thread_local std::mt19937_64 generator; + // std::uniform_real_distribution<> distribution(min,max); + // return distribution(generator); + static uint64_t modMask = NUM_QUEUES - 1; + static thread_local uint64_t x = pthread_self(); + uint64_t z = (x += UINT64_C(0x9E3779B97F4A7C15)); + z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9); + z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB); + return (z ^ (z >> 31)) & modMask; + } + +#ifdef PERF + void __attribute__((noinline)) pushInt(uint64_t queue, PQElement item) { + queues[queue].pq.push(item); + } +#endif + +#ifdef PERF + void __attribute__((noinline)) push(PQElement item) { +#else + inline void push(PQElement item) { +#endif + uint64_t queue; + while (true) { + queue = ThreadLocalRandom(); + if (!queues[queue].try_lock()) break; + } + auto& q = queues[queue]; + q.pushes++; + if (q.pq.empty()) + numEmpty.fetch_sub(1, std::memory_order_relaxed); +#ifdef PERF + pushInt(queue, item); +#else + q.pq.push(item); +#endif + q.min.store( + q.pq.size() > 0 + ? 
std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + } + +#ifdef PERF + void __attribute__((noinline)) pushBatch(uint64_t size, PQElement* items) { +#else + inline void pushBatch(uint64_t size, PQElement* items) { +#endif + uint64_t queue; + while (true) { + queue = ThreadLocalRandom(); + if (!queues[queue].try_lock()) break; + } + auto& q = queues[queue]; + q.pushes += size; + if (q.pq.empty()) + numEmpty.fetch_sub(1, std::memory_order_relaxed); + for (uint64_t i = 0; i < size; i++) { +#ifdef PERF + pushInt(queue, items[i]); +#else + q.pq.push(items[i]); +#endif + } + q.min.store( + q.pq.size() > 0 + ? std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + } + + // Simplified Termination detection idea from the 2021 MultiQueue paper: + // Repeatedly try popping and stop when numIdle >= threadNum, + // That is, stop when all threads agree that there are no more work +#ifdef PERF + boost::optional __attribute__((noinline)) tryPop() { +#else + inline std::optional tryPop() { +#endif + auto item = pop(); + if (item) return item; + + // increment count and keep on trying to pop + uint64_t num = numIdle.fetch_add(1, std::memory_order_relaxed) + 1; + do { + item = pop(); + if (item) break; + if (num >= threadNum) return {}; + + num = numIdle.load(std::memory_order_relaxed); + + } while (true); + + numIdle.fetch_sub(1, std::memory_order_relaxed); + return item; + } + +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + inline void setMinPrioForPop(PrioType newMinPrio) { + PrioType oldMinPrio = minPrioForPop.load(std::memory_order_relaxed); + while (compare({oldMinPrio, 0}, {newMinPrio, 0}) /* old > new */ && !minPrioForPop.compare_exchange_weak(oldMinPrio, newMinPrio)) + ; + } +#endif + +#ifdef PERF + boost::optional __attribute__((noinline)) pop() { +#else + inline std::optional pop() { +#endif + uint64_t poppingQueue = NUM_QUEUES; + while (true) { + // Pick the higher priority max of queue i and j + uint64_t i = ThreadLocalRandom(); + uint64_t j = ThreadLocalRandom(); + while (j == i) { + j = ThreadLocalRandom(); + } + + PrioType minI = queues[i].min.load(std::memory_order_acquire); + PrioType minJ = queues[j].min.load(std::memory_order_acquire); + + if (minI == EMPTY_PRIO && minJ == EMPTY_PRIO) { + uint64_t emptyQueues = numEmpty.load(std::memory_order_relaxed); + if (emptyQueues >= queues.size()) + break; + else + continue; + } + + if (minI != EMPTY_PRIO && minJ != EMPTY_PRIO) { + poppingQueue = compare({minJ, 0}, {minI, 0}) ? i : j; + } else if (minJ == EMPTY_PRIO) { + poppingQueue = i; + } else { + poppingQueue = j; + } + if (queues[poppingQueue].try_lock()) continue; + auto& q = queues[poppingQueue]; + if (!q.pq.empty()) { +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + PrioType minPrio = minPrioForPop.load(std::memory_order_acquire); + if (compare(q.pq.top(), {minPrio, 0})) { + q.pq.clear(); + // do not add `q.pops` on purpose + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store(EMPTY_PRIO, std::memory_order_release); + } else { +#endif + PQElement retItem = q.pq.top(); + q.pq.pop(); + q.pops++; + if (q.pq.empty()) + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store( + q.pq.size() > 0 + ? 
std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + return retItem; +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + } +#endif + } + q.unlock(); + } + return {}; + } + +#ifdef PERF + boost::optional __attribute__((noinline)) tryPopBatch(PQElement* ret) { +#else + inline std::optional tryPopBatch(PQElement* ret) { +#endif + auto item = popBatch(ret); + if (item) return item; + + // increment count and keep on trying to pop + uint64_t num = numIdle.fetch_add(1, std::memory_order_relaxed) + 1; + do { + item = popBatch(ret); + if (item) break; + if (num >= threadNum) return {}; + + num = numIdle.load(std::memory_order_relaxed); + + } while (true); + + numIdle.fetch_sub(1, std::memory_order_relaxed); + return item; + } + +#ifdef PERF + void __attribute__((noinline)) popInt(uint64_t queue, PQElement* ret) { + auto& q = queues[queue]; + *ret = q.pq.top(); + q.pq.pop(); + } +#endif + +#ifdef PERF + boost::optional __attribute__((noinline)) popBatch(PQElement* ret){ +#else + inline std::optional popBatch(PQElement* ret) { +#endif + uint64_t poppingQueue = NUM_QUEUES; + while (true) { + // Pick the higher priority max of queue i and j + uint64_t i = ThreadLocalRandom(); + uint64_t j = ThreadLocalRandom(); + while (j == i) { + j = ThreadLocalRandom(); + } + + PrioType minI = queues[i].min.load(std::memory_order_acquire); + PrioType minJ = queues[j].min.load(std::memory_order_acquire); + + if (minI == EMPTY_PRIO && minJ == EMPTY_PRIO) { + uint64_t emptyQueues = numEmpty.load(std::memory_order_relaxed); + if (emptyQueues >= queues.size()) + break; + else + continue; + } + + if (minI != EMPTY_PRIO && minJ != EMPTY_PRIO) { + poppingQueue = compare({minJ, 0}, {minI, 0}) ? i : j; + } else if (minJ == EMPTY_PRIO) { + poppingQueue = i; + } else { + poppingQueue = j; + } + if (queues[poppingQueue].try_lock()) continue; + auto& q = queues[poppingQueue]; + if (q.pq.empty()) { + q.unlock(); + continue; + } + + uint64_t num = 0; + for (num = 0; num < batchSize; num++) { + if (q.pq.empty()) break; +#ifdef PERF + popInt(poppingQueue, &ret[num]); +#else + ret[num] = q.pq.top(); + q.pq.pop(); +#endif + } + q.pops += num; + if (q.pq.empty()) + numEmpty.fetch_add(1, std::memory_order_relaxed); + q.min.store( + q.pq.size() > 0 + ? std::get<0>(q.pq.top()) + : EMPTY_PRIO, + std::memory_order_release); + q.unlock(); + if (num == 0) continue; + + return num; + } + return {}; +} + +inline uint64_t +getQueueOccupancy() const { + uint64_t maxOccupancy = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + maxOccupancy = std::max(maxOccupancy, queues[i].pq.size()); + } + return maxOccupancy; +} + +// Get the number of pushes to all queues. +// Note: this is not lock protected. +inline uint64_t getNumPushes() const { + uint64_t totalPushes = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + totalPushes += queues[i].pushes; + } + return totalPushes; +} + +// Get the number of pops to all queues. +// Note: this is not lock protected. +inline uint64_t getNumPops() const { + uint64_t totalPops = 0; + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + totalPops += queues[i].pops; + } + return totalPops; +} + +inline void stat() const { + std::cout << "total pushes " << getNumPushes() << "\n"; + std::cout << "total pops " << getNumPops() << "\n"; +} + +// Note: this is only called at the end of algorithm as a +// sanity check, therefore it is not lock protected. 
+inline bool empty() const { + for (uint i = 0; i < NUM_QUEUES; i++) { + if (!queues[i].pq.empty()) { + return false; + } + } + return true; +} + +// Resets the MultiQueue to a state as if it was reinitialized. +// This must be called before using the MQ again after using TypPop(). +// Note: this assumes the queues are already empty and unlocked. +inline void reset() { + for (uint64_t i = 0; i < NUM_QUEUES; i++) { + assert(queues[i].pq.empty() && "reset() assumes empty queues"); + assert((queues[i].queueLock.test(std::memory_order_relaxed) == 0) + && "reset() assumes unlocked queues"); + queues[i].pushes = 0; + queues[i].pops = 0; + queues[i].min.store(EMPTY_PRIO, std::memory_order_relaxed); + } + numIdle.store(0, std::memory_order_relaxed); + numEmpty.store(NUM_QUEUES, std::memory_order_relaxed); +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + minPrioForPop.store(std::numeric_limits::max(), std::memory_order_relaxed); +#endif +} +} +; diff --git a/vpr/src/route/netlist_routers.h b/vpr/src/route/netlist_routers.h index d64477f03ad..eb8a220f51f 100644 --- a/vpr/src/route/netlist_routers.h +++ b/vpr/src/route/netlist_routers.h @@ -3,7 +3,7 @@ /** @file Interface for a netlist router. * * A NetlistRouter manages the required bits of state to complete the netlist routing process, - * which requires finding a path for every connection in the netlist using a ConnectionRouter. + * which requires finding a path for every connection in the netlist using a SerialConnectionRouter. * This needs to be an interface because there may be different netlist routing schedules, * i.e. parallel or net-decomposing routers. * @@ -19,7 +19,6 @@ #include "NetPinTimingInvalidator.h" #include "clustered_netlist_utils.h" #include "connection_based_routing_fwd.h" -#include "connection_router.h" #include "globals.h" #include "heap_type.h" #include "netlist_fwd.h" diff --git a/vpr/src/route/parallel_connection_router.cpp b/vpr/src/route/parallel_connection_router.cpp new file mode 100644 index 00000000000..59889204c23 --- /dev/null +++ b/vpr/src/route/parallel_connection_router.cpp @@ -0,0 +1,489 @@ +#include "parallel_connection_router.h" + +#include +#include "route_tree.h" +#include "rr_graph_fwd.h" + +/** Post-target pruning: Prune a given node (do not explore it) if the cost of + * the best possible path from the source, through the node, to the target is + * higher than the cost of the best path found to the target so far. Cited from + * the FPT'24 conference paper (more details can also be found there). */ +static inline bool post_target_prune_node(float new_total_cost, + float new_back_cost, + float best_back_cost_to_target, + const t_conn_cost_params& params) { + // Divide out the astar_fac, then multiply to get determinism + // This is a correction factor to the forward cost to make the total + // cost an under-estimate. + // TODO: Should investigate creating a heuristic function that is + // gaurenteed to be an under-estimate. + // NOTE: Found experimentally that using the original heuristic to order + // the nodes in the queue and then post-target pruning based on the + // under-estimating heuristic has better runtime. + float expected_cost = new_total_cost - new_back_cost; + float new_expected_cost = expected_cost; + // h1 = (h - offset) * fac + // Protection for division by zero + if (params.astar_fac > 0.001) + // To save time, does not recompute the heuristic, just divideds out + // the astar_fac. 
+ new_expected_cost /= params.astar_fac; + new_expected_cost = new_expected_cost - params.post_target_prune_offset; + // Max function to prevent the heuristic from going negative + new_expected_cost = std::max(0.f, new_expected_cost); + new_expected_cost *= params.post_target_prune_fac; + if ((new_back_cost + new_expected_cost) > best_back_cost_to_target) + return true; + // NOTE: we do NOT check for equality here. Equality does not matter for + // determinism when draining the queues (may just lead to a bit more work). + return false; +} + +/** Pre-push pruning: when iterating over the neighbors of u, this function + * determines whether a path through u to its neighbor node v has a better + * backward cost than the best path to v found so far (breaking ties if needed). + * Cited from the FPT'24 conference paper (more details can also be found there). + */ +// TODO: Once we have a heap node struct, clean this up! +static inline bool prune_node(RRNodeId inode, + float new_total_cost, + float new_back_cost, + RREdgeId new_prev_edge, + RRNodeId target_node, + vtr::vector& rr_node_route_inf_, + const t_conn_cost_params& params) { + // Post-target pruning: After the target is reached the first time, should + // use the heuristic to help drain the queues. + if (inode != target_node) { + t_rr_node_route_inf* target_route_inf = &rr_node_route_inf_[target_node]; + float best_back_cost_to_target = target_route_inf->backward_path_cost; + if (post_target_prune_node(new_total_cost, new_back_cost, best_back_cost_to_target, params)) + return true; + } + + // Backwards Pruning + // NOTE: When going to the target, we only want to prune on the truth. + // The queues handle using the heuristic to explore nodes faster. + t_rr_node_route_inf* route_inf = &rr_node_route_inf_[inode]; + float best_back_cost = route_inf->backward_path_cost; + if (new_back_cost > best_back_cost) + return true; + // In the case of a tie, need to be picky about whether to prune or not in + // order to get determinism. + // FIXME: This may not be thread safe. If the best node changes while this + // function is being called, we may have the new_back_cost and best + // prev_edge's being from different heap nodes! + // TODO: Move this to within the lock (the rest can stay for performance). + if (new_back_cost == best_back_cost) { +#ifndef NON_DETERMINISTIC_PRUNING + // With deterministic pruning, cannot always prune on ties. + // In the case of a true tie, just prune, no need to explore neightbors + RREdgeId best_prev_edge = route_inf->prev_edge; + if (new_prev_edge == best_prev_edge) + return true; + // When it comes to invalid edge IDs, in the case of a tied back cost, + // always try to keep the invalid edge ID (likely the start node). + // TODO: Verify this. + // If the best previous edge is invalid, prune + if (!best_prev_edge.is_valid()) + return true; + // If the new previous edge is invalid (assuming the best is not), accept + if (!new_prev_edge.is_valid()) + return false; + // Finally, if this node is not coming from a preferred edge, prune + // Deterministic version prefers a given EdgeID, so a unique path is returned since, + // in the case of a tie, a determinstic path wins. + // Is first preferred over second? + auto is_preferred_edge = [](RREdgeId first, RREdgeId second) { + return first < second; + }; + if (!is_preferred_edge(new_prev_edge, best_prev_edge)) + return true; +#else + std::ignore = new_prev_edge; + // When we do not care about determinism, always prune on equality. 
+ return true; +#endif + } + + // If all above passes, do not prune. + return false; +} + +/** Post-pop pruning: After node u is popped from the queue, this function + * decides whether to explore the neighbors of u or to prune. Initially, it + * performs Post-Target Pruning based on the stopping criterion. Then, the + * current total estimated cost of the path through node u (f_u) is compared + * to the best total cost so far (most recently pushed) for that node and, + * if the two are different, the node u is pruned. During the wave expansion, + * u may be pushed to the queue multiple times. For example, node u may be + * pushed to the queue and then, before u is popped from the queue, a better + * path to u may be found and pushed to the queue. Here we are using f_u as + * an optimistic identifier to check if the pair (u, f_u) is the most recently + * pushed element for node u. This reduces redundant work. + * Cited from the FPT'24 conference paper (more details can also be found there). + */ +static inline bool should_not_explore_neighbors(RRNodeId inode, + float new_total_cost, + float new_back_cost, + RRNodeId target_node, + vtr::vector& rr_node_route_inf_, + const t_conn_cost_params& params) { +#ifndef NON_DETERMINISTIC_PRUNING + // For deterministic pruning, cannot enforce anything on the total cost since + // traversal order is not gaurenteed. However, since total cost is used as a + // "key" to signify that this node is the last node that was pushed, we can + // just check for equality. There is a chance this may cause some duplicates + // for the deterministic case, but thats ok they will be handled. + // TODO: Maybe consider having the non-deterministic version do this too. + if (new_total_cost != rr_node_route_inf_[inode].path_cost) + return true; +#else + // For non-deterministic pruning, can greadily just ignore nodes with higher + // total cost. + if (new_total_cost > rr_node_route_inf_[inode].path_cost) + return true; +#endif + // Perform post-target pruning. If this is not done, there is a chance that + // several duplicates of a node is in the queue that will never reach the + // target better than what we found and they will explore all of their + // neighbors which is not good. This is done before obtaining the lock to + // prevent lock contention where possible. 
+ if (inode != target_node) { + float best_back_cost_to_target = rr_node_route_inf_[target_node].backward_path_cost; + if (post_target_prune_node(new_total_cost, new_back_cost, best_back_cost_to_target, params)) + return true; + } + return false; +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + // Assign the thread task function parameters to atomic variables + this->sink_node_ = &sink_node; + this->cost_params_ = const_cast(&cost_params); + this->bounding_box_ = const_cast(&bounding_box); + this->target_bb_ = const_cast(&target_bb); + + // Synchronize at the barrier before executing a new thread task + this->thread_barrier_.wait(); + + // Main thread executes a new thread task (helper threads are doing the same in the background) + this->timing_driven_find_single_shortest_path_from_heap_thread_func(*this->sink_node_, + *this->cost_params_, + *this->bounding_box_, + *this->target_bb_, 0); + + // Synchronize at the barrier before resetting the heap + this->thread_barrier_.wait(); + + // Collect the number of heap pushes and pops + this->router_stats_->heap_pushes += this->heap_.get_num_pushes(); + this->router_stats_->heap_pops += this->heap_.get_num_pops(); + + // Reset the heap for the next connection + this->heap_.reset(); +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper(const size_t thread_idx) { + this->thread_barrier_.init(); + while (true) { + this->thread_barrier_.wait(); + if (this->is_router_destroying_ == true) { + return; + } else { + timing_driven_find_single_shortest_path_from_heap_thread_func(*this->sink_node_, + *this->cost_params_, + *this->bounding_box_, + *this->target_bb_, + thread_idx); + } + this->thread_barrier_.wait(); + } +} + +template +void ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_thread_func(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb, + const size_t thread_idx) { + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + + // Check if we should explore the neighbors of this node + if (should_not_explore_neighbors(inode, new_total_cost, this->rr_node_route_inf_[inode].backward_path_cost, sink_node, this->rr_node_route_inf_, cost_params)) { + continue; + } + + // Get the current RR node info within a critical section to prevent data races + obtainSpinLock(inode); + + RTExploredNode current; + current.index = inode; + current.backward_path_cost = this->rr_node_route_inf_[inode].backward_path_cost; + current.prev_edge = this->rr_node_route_inf_[inode].prev_edge; + current.R_upstream = this->rr_node_route_inf_[inode].R_upstream; + + releaseLock(inode); + + // Double check now just to be sure that we should still explore neighbors + // NOTE: A good question is what happened to the uniqueness pruning. The idea + // is that at this point it does not matter. Basically any duplicates + // will act like they were the last one pushed in. This may create some + // duplicates, but it is a simple way of handling this situation. + // It may be worth investigating a better way to do this in the future. + // TODO: This is still doing post-target pruning. May want to investigate + // if this is worth doing. 
+ // TODO: should try testing without the pruning below and see if anything changes. + if (should_not_explore_neighbors(inode, new_total_cost, current.backward_path_cost, sink_node, this->rr_node_route_inf_, cost_params)) { + continue; + } + + // Adding nodes to heap + timing_driven_expand_neighbours(current, cost_params, bounding_box, sink_node, target_bb, thread_idx); + } +} + +template +void ParallelConnectionRouter::timing_driven_expand_neighbours(const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx) { + /* Puts all the rr_nodes adjacent to current on the heap. */ + + // For each node associated with the current heap element, expand all of it's neighbors + auto edges = this->rr_nodes_.edge_range(current.index); + + // This is a simple prefetch that prefetches: + // - RR node data reachable from this node + // - rr switch data to reach those nodes from this node. + // + // This code will be a NOP on compiler targets that do not have a + // builtin to emit prefetch instructions. + // + // This code will be a NOP on CPU targets that lack prefetch instructions. + // All modern x86 and ARM64 platforms provide prefetch instructions. + // + // This code delivers ~6-8% reduction in wallclock time when running Titan + // benchmarks, and was specifically measured against the gsm_switch and + // directrf vtr_reg_weekly running in high effort. + // + // - directrf_stratixiv_arch_timing.blif + // - gsm_switch_stratixiv_arch_timing.blif + // + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + this->rr_nodes_.prefetch_node(to_node); + + int switch_idx = this->rr_nodes_.edge_switch(from_edge); + VTR_PREFETCH(&this->rr_switch_inf_[switch_idx], 0, 0); + } + + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + timing_driven_expand_neighbour(current, + from_edge, + to_node, + cost_params, + bounding_box, + target_node, + target_bb, + thread_idx); + } +} + +template +void ParallelConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx) { + // BB-pruning + // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this + // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets + if (!inside_bb(to_node, bounding_box)) { + // Note: Logging are disabled for parallel connection router + return; /* Node is outside (expanded) bounding box. */ + } + + /* Prune away IPINs that lead to blocks other than the target one. Avoids * + * the issue of how to cost them properly so they don't get expanded before * + * more promising routes, but makes route-through (via CLBs) impossible. * + * Change this if you want to investigate route-throughs. 
*/ + if (target_node != RRNodeId::INVALID()) { + t_rr_type to_type = this->rr_graph_->node_type(to_node); + if (to_type == IPIN) { + // Check if this IPIN leads to the target block + // IPIN's of the target block should be contained within it's bounding box + int to_xlow = this->rr_graph_->node_xlow(to_node); + int to_ylow = this->rr_graph_->node_ylow(to_node); + int to_layer = this->rr_graph_->node_layer(to_node); + int to_xhigh = this->rr_graph_->node_xhigh(to_node); + int to_yhigh = this->rr_graph_->node_yhigh(to_node); + if (to_xlow < target_bb.xmin + || to_ylow < target_bb.ymin + || to_xhigh > target_bb.xmax + || to_yhigh > target_bb.ymax + || to_layer < target_bb.layer_min + || to_layer > target_bb.layer_max) { + // Note: Logging are disabled for parallel connection router + return; + } + } + } + // Note: Logging are disabled for parallel connection router + + timing_driven_add_to_heap(cost_params, + current, + to_node, + from_edge, + target_node, + thread_idx); +} + +template +void ParallelConnectionRouter::timing_driven_add_to_heap(const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + const RREdgeId from_edge, + RRNodeId target_node, + size_t thread_idx) { + const RRNodeId& from_node = current.index; + + // Initialize the neighbor RTExploredNode + RTExploredNode next; + next.R_upstream = current.R_upstream; + next.index = to_node; + next.prev_edge = from_edge; + next.total_cost = std::numeric_limits::infinity(); // Not used directly + next.backward_path_cost = current.backward_path_cost; + + this->evaluate_timing_driven_node_costs(&next, cost_params, from_node, target_node); + + float new_total_cost = next.total_cost; + float new_back_cost = next.backward_path_cost; + + // To further reduce lock contention, we add a cheap read-only check before acquiring the lock, motivated by Shun et al. + if (prune_node(to_node, new_total_cost, new_back_cost, from_edge, target_node, this->rr_node_route_inf_, cost_params)) { + return; + } + + obtainSpinLock(to_node); + + if (prune_node(to_node, new_total_cost, new_back_cost, from_edge, target_node, this->rr_node_route_inf_, cost_params)) { + releaseLock(to_node); + return; + } + + update_cheapest(next, thread_idx); + + releaseLock(to_node); + + if (to_node == target_node) { +#ifdef MQ_IO_ENABLE_CLEAR_FOR_POP + if (multi_queue_direct_draining_) { + this->heap_.set_min_priority_for_pop(new_total_cost); + } +#endif + return; + } + this->heap_.add_to_heap({new_total_cost, to_node}); +} + +template +void ParallelConnectionRouter::add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId inode = rt_node.inode; + float backward_path_cost = cost_params.criticality * rt_node.Tdel; + float R_upstream = rt_node.R_upstream; + + /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ + if (!inside_bb(rt_node.inode, net_bb)) + return; + + // After budgets are loaded, calculate delay cost as described by RCV paper + /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + * Integrated Circuits and Systems, vol. 27, no. 4, pp. 
686-697, April 2008.*/ + + if (!this->rcv_path_manager.is_enabled()) { + float expected_cost = this->router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + VTR_LOGV_DEBUG(this->router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", + inode, + tot_cost, + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + + if (prune_node(inode, tot_cost, backward_path_cost, RREdgeId::INVALID(), target_node, this->rr_node_route_inf_, cost_params)) { + return; + } + add_to_mod_list(inode, 0 /*main thread*/); + this->rr_node_route_inf_[inode].path_cost = tot_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + this->heap_.push_back({tot_cost, inode}); + } + // Note: RCV is not supported by parallel connection router +} + +std::unique_ptr make_parallel_connection_router(e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining) { + switch (heap_type) { + case e_heap_type::BINARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat, + multi_queue_num_threads, + multi_queue_num_queues, + multi_queue_direct_draining); + case e_heap_type::FOUR_ARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat, + multi_queue_num_threads, + multi_queue_num_queues, + multi_queue_direct_draining); + default: + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", + heap_type); + } +} diff --git a/vpr/src/route/parallel_connection_router.h b/vpr/src/route/parallel_connection_router.h new file mode 100644 index 00000000000..18d873e0c6e --- /dev/null +++ b/vpr/src/route/parallel_connection_router.h @@ -0,0 +1,443 @@ +#ifndef _PARALLEL_CONNECTION_ROUTER_H +#define _PARALLEL_CONNECTION_ROUTER_H + +#include "connection_router.h" + +#include "d_ary_heap.h" +#include "multi_queue_d_ary_heap.h" + +#include +#include +#include +#include + +/** + * @brief Spin lock implementation using std::atomic_flag + * + * It is used per RR node for protecting the update to node costs + * to prevent data races. Since different threads rarely work on + * the same node simultaneously, this fine-grained locking strategy + * of one lock per node reduces contention. 
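+ *
+ * A minimal usage sketch (illustrative only; within the router these calls are
+ * wrapped by the obtainSpinLock()/releaseLock() helpers):
+ * @code
+ * spin_lock_t lock;
+ * lock.acquire();
+ * // ... read/update the routing cost entry of a single RR node ...
+ * lock.release();
+ * @endcode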
+ */ +class spin_lock_t { + /** Atomic flag used for the lock implementation */ + std::atomic_flag lock_ = ATOMIC_FLAG_INIT; + + public: + /** + * @brief Acquires the spin lock, repeatedly attempting until successful + */ + void acquire() { + while (std::atomic_flag_test_and_set_explicit(&lock_, std::memory_order_acquire)) + ; + } + + /** + * @brief Releases the spin lock, allowing other threads to acquire it + */ + void release() { + std::atomic_flag_clear_explicit(&lock_, std::memory_order_release); + } +}; + +/** + * @brief Thread barrier implementation using std::mutex + * + * It ensures all participating threads reach a synchronization point + * before any are allowed to proceed further. It uses a mutex and + * condition variable to coordinate thread synchronization. + */ +class barrier_mutex_t { + // FIXME: Try std::barrier (since C++20) to replace this mutex barrier + std::mutex mutex_; + std::condition_variable cv_; + size_t count_; + size_t max_count_; + size_t generation_ = 0; + + public: + /** + * @brief Constructs a barrier for a specific number of threads + * @param num_threads Number of threads that must call wait() before + * any thread is allowed to proceed + */ + explicit barrier_mutex_t(size_t num_threads) + : count_(num_threads) + , max_count_(num_threads) {} + + /** + * @brief Blocks the calling thread until all threads have called wait() + * + * When the specified number of threads have called this method, all + * threads are unblocked and the barrier is reset for the next use. + */ + void wait() { + std::unique_lock lock{mutex_}; + size_t gen = generation_; + if (--count_ == 0) { + generation_++; + count_ = max_count_; + cv_.notify_all(); + } else { + cv_.wait(lock, [this, &gen] { return gen != generation_; }); + } + } +}; + +/** + * @brief Spin-based thread barrier implementation using std::atomic + * + * It ensures all participating threads reach a synchronization point + * before any are allowed to proceed further. It uses atomic operations + * to implement Sense-Reversing Centralized Barrier (from Section 5.2.1 + * of Michael L. Scott's textbook) without using mutex locks. + */ +class barrier_spin_t { + /** Number of threads that must reach the barrier */ + size_t num_threads_ = 1; + + /** Atomic counter tracking the number of threads that have arrived at the barrier */ + std::atomic count_ = 0; + + /** Global sense shared by all participating threads */ + std::atomic sense_ = false; + + /** Thread-local sense value for each participating thread */ + inline static thread_local bool local_sense_ = false; + + public: + /** + * @brief Constructs a barrier for a specific number of threads + * @param num_threads Number of threads that must call wait() before + * any thread is allowed to proceed + */ + explicit barrier_spin_t(size_t num_threads) { num_threads_ = num_threads; } + + /** + * @brief Initializes the thread-local sense flag + * @note Should be called by each thread before first using the barrier. + */ + void init() { + local_sense_ = false; + } + + /** + * @brief Blocks the calling thread until all threads have called wait() + * + * Uses a sense-reversing algorithm to synchronize threads. The last thread + * to arrive unblocks all waiting threads. This method avoids using locks or + * condition variables, making it potentially more efficient for short waits. 
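+     *
+     * Illustrative call pattern (a sketch of how the router's thread task
+     * wrappers use the barrier; names are placeholders):
+     * @code
+     * barrier_spin_t barrier(num_threads);
+     * // in every participating thread:
+     * barrier.init();
+     * while (more_tasks) {
+     *     barrier.wait(); // wait until a new task is published
+     *     do_task();
+     *     barrier.wait(); // wait until every thread has finished the task
+     * }
+     * @endcode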
+ */ + void wait() { + bool s = !local_sense_; + local_sense_ = s; + size_t num_arrivals = count_.fetch_add(1) + 1; + if (num_arrivals == num_threads_) { + count_.store(0); + sense_.store(s); + } else { + while (sense_.load() != s) + ; // spin until the last thread arrives + } + } +}; + +using barrier_t = barrier_spin_t; // Using the spin-based thread barrier + +/** + * @class ParallelConnectionRouter implements the MultiQueue-based parallel connection + * router (FPT'24) based on the ConnectionRouter interface. + * @details The details of the algorithm can be found from the conference paper: + * A. Singer, H. Yan, G. Zhang, M. Jeffrey, M. Stojilovic and V. Betz, "MultiQueue-Based FPGA Routing: + * Relaxed A* Priority Ordering for Improved Parallelism," Int. Conf. on Field-Programmable Technology, + * Dec. 2024. + */ +template +class ParallelConnectionRouter : public ConnectionRouter> { + public: + ParallelConnectionRouter( + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining) + : ConnectionRouter>(grid, router_lookahead, rr_nodes, rr_graph, rr_rc_data, rr_switch_inf, rr_node_route_inf, is_flat) + , modified_rr_node_inf_(multi_queue_num_threads) + , thread_barrier_(multi_queue_num_threads) + , is_router_destroying_(false) + , locks_(rr_node_route_inf.size()) + , multi_queue_direct_draining_(multi_queue_direct_draining) { + // Set the MultiQueue parameters + this->heap_.set_num_threads_and_queues(multi_queue_num_threads, multi_queue_num_queues); + // Initialize the thread barrier + this->thread_barrier_.init(); + // Instantiate (multi_queue_num_threads - 1) helper threads + this->sub_threads_.resize(multi_queue_num_threads - 1); + for (int i = 0; i < multi_queue_num_threads - 1; ++i) { + this->sub_threads_[i] = std::thread(&ParallelConnectionRouter::timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper, this, i + 1 /*0: main thread*/); + this->sub_threads_[i].detach(); + } + } + + ~ParallelConnectionRouter() { + this->is_router_destroying_ = true; // signal the helper threads to exit + this->thread_barrier_.wait(); // wait until all threads reach the barrier + + VTR_LOG("Parallel Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n", + std::chrono::duration(this->path_search_cumulative_time).count()); + } + + /** + * @brief Clears the modified list per thread + * @note Should be called after reset_path_costs have been called + */ + void clear_modified_rr_node_info() final { + for (auto& thread_visited_rr_nodes : this->modified_rr_node_inf_) { + thread_visited_rr_nodes.clear(); + } + } + + /** + * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf + */ + void reset_path_costs() final { + // Reset the node info stored in rr_node_route_inf variable + for (const auto& thread_visited_rr_nodes : this->modified_rr_node_inf_) { + ::reset_path_costs(thread_visited_rr_nodes); + } + } + + /** + * @brief [Not supported] Enables RCV feature + * @note RCV for parallel connection router has not been implemented yet. + * Thus this function is not expected to be called. + */ + void set_rcv_enabled(bool) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "RCV for parallel connection router not yet implemented. 
Not expected to be called."); + } + + /** + * @brief [Not supported] Finds shortest paths from the route tree rooted at rt_root to all sinks available + * @note This function has not been implemented yet and is not the focus of parallel connection router. + * Thus this function is not expected to be called. + */ + vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode&, + const t_conn_cost_params&, + const t_bb&, + RouterStats&, + const ConnectionParameters&) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "timing_driven_find_all_shortest_paths_from_route_tree not yet implemented (nor is the focus of the parallel connection router). Not expected to be called."); + } + + protected: + /** + * @brief Marks that data associated with rr_node 'inode' has + * been modified, and needs to be reset in reset_path_costs + */ + inline void add_to_mod_list(RRNodeId inode, size_t thread_idx) { + if (std::isinf(this->rr_node_route_inf_[inode].path_cost)) { + this->modified_rr_node_inf_[thread_idx].push_back(inode); + } + } + + /** + * @brief Updates the route path to the node `cheapest.index` + * via the path from `from_node` via `cheapest.prev_edge` + */ + inline void update_cheapest(RTExploredNode& cheapest, size_t thread_idx) { + const RRNodeId& inode = cheapest.index; + add_to_mod_list(inode, thread_idx); + this->rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; + this->rr_node_route_inf_[inode].path_cost = cheapest.total_cost; + this->rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; + } + + /** + * @brief Obtains the per-node spin locks for protecting node cost updates + */ + inline void obtainSpinLock(const RRNodeId& inode) { + this->locks_[size_t(inode)].acquire(); + } + + /** + * @brief Releases the per-node spin lock, allowing other + * threads working on the same node to obtain it + */ + inline void releaseLock(const RRNodeId& inode) { + this->locks_[size_t(inode)].release(); + } + + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) final; + + /** + * @brief Helper thread wrapper function, passed to std::thread instantiation and running a + * while-loop to obtain and execute new helper thread tasks until the main thread signals the + * threads to exit + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_find_single_shortest_path_from_heap_sub_thread_wrapper( + const size_t thread_idx); + + /** + * @brief Helper thread task function to find the single shortest path from current heap to + * the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_find_single_shortest_path_from_heap_thread_func( + RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + 
const t_bb& target_bb, + const size_t thread_idx); + + /** + * @brief Expands each neighbor of the current node in the wave expansion + * @param current Current node being explored + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_expand_neighbours( + const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx); + + /** + * @brief Conditionally adds to_node to the router heap (via path from current.index via from_edge) + * @note RR nodes outside bounding box specified in bounding_box are not added to the heap. + * @param current Current node being explored + * @param from_edge Edge between the current node and the neighbor node + * @param to_node Neighbor node to be expanded + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_expand_neighbour( + const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb, + size_t thread_idx); + + /** + * @brief Adds to_node to the heap, and also adds any nodes which are connected by non-configurable edges + * @param cost_params Cost function parameters + * @param current Current node being explored + * @param to_node Neighbor node to be expanded + * @param from_edge Edge between the current node and the neighbor node + * @param target_node Target node ID to route to + * @param thread_idx Thread ID (0 means main thread; 1 to #threads-1 means helper threads) + */ + void timing_driven_add_to_heap( + const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + RREdgeId from_edge, + RRNodeId target_node, + size_t thread_idx); + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) final; + + /** + * @brief [Not supported] Finds shortest paths from current heap to all nodes in the RR graph + * @note This function has not been implemented yet and is not the focus of parallel connection router. + * Thus this function is not expected to be called. 
+ */ + vtr::vector timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params&, + const t_bb&) final { + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "timing_driven_find_all_shortest_paths_from_heap not yet implemented (nor is the focus of this project). Not expected to be called."); + } + + /** Node IDs of modified nodes in rr_node_route_inf for each thread*/ + std::vector> modified_rr_node_inf_; + + /** Helper threads */ + std::vector sub_threads_; + + /** Thread barrier for synchronization */ + barrier_t thread_barrier_; + + /** Signal for helper threads to exit */ + std::atomic is_router_destroying_; + + /** Fine-grained locks per RR node */ + std::vector locks_; + + /** Is queue draining optimization enabled? */ + bool multi_queue_direct_draining_; + + //@{ + /** Atomic parameters of thread task functions to pass from main thread to helper threads */ + std::atomic sink_node_; + std::atomic cost_params_; + std::atomic bounding_box_; + std::atomic target_bb_; + //@} +}; + +/** Construct a parallel connection router that uses the specified heap type. + * This function is not used, but removing it will result in "undefined reference" + * errors since heap type specializations won't get emitted from parallel_connection_router.cpp + * without it. + * The alternative is moving all ParallelConnectionRouter fn implementations into the header. */ +std::unique_ptr make_parallel_connection_router( + e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat, + int multi_queue_num_threads, + int multi_queue_num_queues, + bool multi_queue_direct_draining); + +#endif /* _PARALLEL_CONNECTION_ROUTER_H */ diff --git a/vpr/src/route/partition_tree.cpp b/vpr/src/route/partition_tree.cpp index 38ee7abc2dd..497f887cf74 100644 --- a/vpr/src/route/partition_tree.cpp +++ b/vpr/src/route/partition_tree.cpp @@ -44,7 +44,7 @@ std::unique_ptr PartitionTree::build_helper(const Netlist<>& * Do this for every step with only given nets, because each cutline takes some nets out * of the game, so if we just built a global lookup it wouldn't yield accurate results. * - * VPR's bounding boxes include the borders (see ConnectionRouter::timing_driven_expand_neighbour()) + * VPR's bounding boxes include the borders (see SerialConnectionRouter::timing_driven_expand_neighbour()) * so try to include x=bb.xmax, y=bb.ymax etc. when calculating things. */ int width = x2 - x1 + 1; int height = y2 - y1 + 1; diff --git a/vpr/src/route/partition_tree.h b/vpr/src/route/partition_tree.h index 6bf68be04b8..d30d5121492 100644 --- a/vpr/src/route/partition_tree.h +++ b/vpr/src/route/partition_tree.h @@ -1,6 +1,6 @@ #pragma once -#include "connection_router.h" +#include "serial_connection_router.h" #include "netlist_fwd.h" #include "router_stats.h" @@ -27,7 +27,7 @@ inline Side operator!(const Side& rhs) { } /** Part of a net in the context of the \ref DecompNetlistRouter. 
Sinks and routing resources - * routable/usable by the \ref ConnectionRouter are constrained to ones inside clipped_bb + * routable/usable by the \ref SerialConnectionRouter are constrained to ones inside clipped_bb * (\see inside_bb()) */ class VirtualNet { public: diff --git a/vpr/src/route/route_net.tpp b/vpr/src/route/route_net.tpp index 0e8c4c268a5..1a5715b7341 100644 --- a/vpr/src/route/route_net.tpp +++ b/vpr/src/route/route_net.tpp @@ -17,7 +17,7 @@ /** Attempt to route a single net. * - * @param router The ConnectionRouter instance + * @param router The ConnectionRouterType instance * @param net_list Input netlist * @param net_id * @param itry # of iteration @@ -40,8 +40,8 @@ * @param should_setup Should we reset/prune the existing route tree first? * @param sink_mask Which sinks to route? Assumed all sinks if nullopt, otherwise a mask of [1..num_sinks+1] where set bits request the sink to be routed * @return NetResultFlags for this net */ -template -inline NetResultFlags route_net(ConnectionRouter& router, +template +inline NetResultFlags route_net(ConnectionRouterType& router, const Netlist<>& net_list, const ParentNetId& net_id, int itry, @@ -140,6 +140,8 @@ inline NetResultFlags route_net(ConnectionRouter& router, t_conn_cost_params cost_params; cost_params.astar_fac = router_opts.astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; cost_params.pres_fac = pres_fac; cost_params.delay_budget = ((budgeting_inf.if_set()) ? &conn_delay_budget : nullptr); @@ -285,8 +287,8 @@ inline NetResultFlags route_net(ConnectionRouter& router, /** Route to a "virtual sink" in the netlist which corresponds to the start point * of the global clock network. */ -template -inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, +template +inline NetResultFlags pre_route_to_clock_root(ConnectionRouterType& router, ParentNetId net_id, const Netlist<>& net_list, RRNodeId sink_node, @@ -382,7 +384,7 @@ inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, * In the process, update global pathfinder costs, rr_node_route_inf and extend the global RouteTree * for this net. 
* - * @param router The ConnectionRouter instance + * @param router The ConnectionRouterType instance * @param net_list Input netlist * @param net_id * @param itarget # of this connection in the net (only used for debug output) @@ -399,8 +401,8 @@ inline NetResultFlags pre_route_to_clock_root(ConnectionRouter& router, * @param is_flat * @param net_bb Bounding box for the net (Routing resources outside net_bb will not be used) * @return NetResultFlags for this sink to be bubbled up through route_net */ -template -inline NetResultFlags route_sink(ConnectionRouter& router, +template +inline NetResultFlags route_sink(ConnectionRouterType& router, const Netlist<>& net_list, ParentNetId net_id, unsigned itarget, diff --git a/vpr/src/route/router_delay_profiling.cpp b/vpr/src/route/router_delay_profiling.cpp index 68fb441a369..257b35d20f6 100644 --- a/vpr/src/route/router_delay_profiling.cpp +++ b/vpr/src/route/router_delay_profiling.cpp @@ -88,6 +88,8 @@ bool RouterDelayProfiler::calculate_delay(RRNodeId source_node, cost_params.criticality = 1.; cost_params.astar_fac = router_opts.router_profiler_astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; route_budgets budgeting_inf(net_list_, is_flat_); @@ -163,6 +165,8 @@ vtr::vector calculate_all_path_delays_from_rr_node(RRNodeId src cost_params.criticality = 1.; cost_params.astar_fac = router_opts.astar_fac; cost_params.astar_offset = router_opts.astar_offset; + cost_params.post_target_prune_fac = router_opts.post_target_prune_fac; + cost_params.post_target_prune_offset = router_opts.post_target_prune_offset; cost_params.bend_cost = router_opts.bend_cost; /* This function is called during placement. Thus, the flat routing option should be disabled. */ //TODO: Placement is run with is_flat=false. 
However, since is_flat is passed, det_routing_arch should @@ -174,7 +178,7 @@ vtr::vector calculate_all_path_delays_from_rr_node(RRNodeId src /*segment_inf=*/{}, is_flat); - ConnectionRouter router( + SerialConnectionRouter router( device_ctx.grid, *router_lookahead, device_ctx.rr_graph.rr_nodes(), diff --git a/vpr/src/route/router_delay_profiling.h b/vpr/src/route/router_delay_profiling.h index ca855720d85..f137e143df9 100644 --- a/vpr/src/route/router_delay_profiling.h +++ b/vpr/src/route/router_delay_profiling.h @@ -2,7 +2,7 @@ #define ROUTER_DELAY_PROFILING_H_ #include "vpr_types.h" -#include "connection_router.h" +#include "serial_connection_router.h" #include @@ -43,7 +43,7 @@ class RouterDelayProfiler { private: const Netlist<>& net_list_; RouterStats router_stats_; - ConnectionRouter router_; + SerialConnectionRouter router_; vtr::NdMatrix min_delays_; // [physical_type_idx][from_layer][to_layer][dx][dy] bool is_flat_; }; diff --git a/vpr/src/route/serial_connection_router.cpp b/vpr/src/route/serial_connection_router.cpp new file mode 100644 index 00000000000..f5c3a1762e5 --- /dev/null +++ b/vpr/src/route/serial_connection_router.cpp @@ -0,0 +1,533 @@ +#include "serial_connection_router.h" + +#include +#include "rr_graph.h" +#include "rr_graph_fwd.h" + +/** Used to update router statistics for serial connection router */ +inline void update_serial_router_stats(RouterStats* router_stats, + bool is_push, + RRNodeId rr_node_id, + const RRGraphView* rr_graph); + +template +void SerialConnectionRouter::timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + auto& route_ctx = g_vpr_ctx.mutable_routing(); + + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + update_serial_router_stats(this->router_stats_, + /*is_push=*/false, + inode, + this->rr_graph_); + + VTR_LOGV_DEBUG(this->router_debug_, " Popping node %d (cost: %g)\n", + inode, new_total_cost); + + // Have we found the target? 
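+        // (Nodes come off the heap cheapest-first, so the search can stop as soon as the sink
+        //  itself is popped; the path is then recovered outside this loop, either from the
+        //  prev_edge entries written into rr_node_route_inf_ or, when RCV is enabled, from the
+        //  RCV path data handled just below.)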
+ if (inode == sink_node) { + // If we're running RCV, the path will be stored in the path_data->path_rr vector + // This is then placed into the traceback so that the correct path is returned + // TODO: This can be eliminated by modifying the actual traceback function in route_timing + if (this->rcv_path_manager.is_enabled()) { + this->rcv_path_manager.insert_backwards_path_into_traceback(this->rcv_path_data[inode], + this->rr_node_route_inf_[inode].path_cost, + this->rr_node_route_inf_[inode].backward_path_cost, + route_ctx); + } + VTR_LOGV_DEBUG(this->router_debug_, " Found target %8d (%s)\n", inode, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + break; + } + + // If not, keep searching + timing_driven_expand_cheapest(inode, + new_total_cost, + sink_node, + cost_params, + bounding_box, + target_bb); + } +} + +template +vtr::vector SerialConnectionRouter::timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode& rt_root, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) { + this->router_stats_ = &router_stats; + this->conn_params_ = &conn_params; + + // Add the route tree to the heap with no specific target node + RRNodeId target_node = RRNodeId::INVALID(); + this->add_route_tree_to_heap(rt_root, target_node, cost_params, bounding_box); + this->heap_.build_heap(); // via sifting down everything + + auto res = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box); + this->heap_.empty_heap(); + + return res; +} + +template +vtr::vector SerialConnectionRouter::timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) { + // Since there is no single *target* node this uses Dijkstra's algorithm + // with a modified exit condition (runs until heap is empty). + + vtr::vector cheapest_paths(this->rr_nodes_.size()); + + VTR_ASSERT_SAFE(this->heap_.is_valid()); + + if (this->heap_.is_empty_heap()) { // No source + VTR_LOGV_DEBUG(this->router_debug_, " Initial heap empty (no source)\n"); + } + + // Start measuring path search time + std::chrono::steady_clock::time_point begin_time = std::chrono::steady_clock::now(); + + HeapNode cheapest; + while (this->heap_.try_pop(cheapest)) { + // Pop a new inode with the cheapest total cost in current route tree to be expanded on + const auto& [new_total_cost, inode] = cheapest; + update_serial_router_stats(this->router_stats_, + /*is_push=*/false, + inode, + this->rr_graph_); + + VTR_LOGV_DEBUG(this->router_debug_, " Popping node %d (cost: %g)\n", + inode, new_total_cost); + + // Since we want to find shortest paths to all nodes in the graph + // we do not specify a target node. 
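+        // (As a sketch of why this is plain Dijkstra ordering: the total cost pushed onto the
+        //  heap elsewhere in this file is
+        //      backward_path_cost + astar_fac * max(0, expected_cost - astar_offset)
+        //  and with the NoOp lookahead mentioned below the expected_cost term is zero, so nodes
+        //  are popped purely by their backward (already-incurred) path cost.)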
+ // + // By setting the target_node to INVALID in combination with the NoOp router + // lookahead we can re-use the node exploration code from the regular router + RRNodeId target_node = RRNodeId::INVALID(); + + timing_driven_expand_cheapest(inode, + new_total_cost, + target_node, + cost_params, + bounding_box, + t_bb()); + + if (cheapest_paths[inode].index == RRNodeId::INVALID() || cheapest_paths[inode].total_cost >= new_total_cost) { + VTR_LOGV_DEBUG(this->router_debug_, " Better cost to node %d: %g (was %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); + // Only the `index` and `prev_edge` fields of `cheapest_paths[inode]` are used after this function returns + cheapest_paths[inode].index = inode; + cheapest_paths[inode].prev_edge = this->rr_node_route_inf_[inode].prev_edge; + } else { + VTR_LOGV_DEBUG(this->router_debug_, " Worse cost to node %d: %g (better %g)\n", inode, new_total_cost, cheapest_paths[inode].total_cost); + } + } + + // Stop measuring path search time + std::chrono::steady_clock::time_point end_time = std::chrono::steady_clock::now(); + this->path_search_cumulative_time += std::chrono::duration_cast(end_time - begin_time); + + return cheapest_paths; +} + +template +void SerialConnectionRouter::timing_driven_expand_cheapest(RRNodeId from_node, + float new_total_cost, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) { + float best_total_cost = this->rr_node_route_inf_[from_node].path_cost; + if (best_total_cost == new_total_cost) { + // Explore from this node, since its total cost is exactly the same as + // the best total cost ever seen for this node. Otherwise, prune this node + // to reduce redundant work (i.e., unnecessary neighbor exploration). + // `new_total_cost` is used here as an identifier to detect if the pair + // (from_node or inode, new_total_cost) was the most recently pushed + // element for the corresponding node. + // + // Note: For RCV, it often isn't searching for a shortest path; it is + // searching for a path in the target delay range. So it might find a + // path to node n that has a higher `backward_path_cost` but the `total_cost` + // (including expected delay to sink, going through a cost function that + // checks that against the target delay) might be lower than the previously + // stored value. In that case we want to re-expand the node so long as + // it doesn't create a loop. That `this->rcv_path_manager` should store enough + // info for us to avoid loops. 
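+        //
+        // (Put differently, the heap uses lazy deletion: stale entries for a node are never
+        // removed when a cheaper path to it is found; they are detected here when popped,
+        // because their recorded total cost no longer matches rr_node_route_inf_[node].path_cost,
+        // and are pruned in the else branch below.)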
+        RTExploredNode current;
+        current.index = from_node;
+        current.backward_path_cost = this->rr_node_route_inf_[from_node].backward_path_cost;
+        current.prev_edge = this->rr_node_route_inf_[from_node].prev_edge;
+        current.R_upstream = this->rr_node_route_inf_[from_node].R_upstream;
+
+        VTR_LOGV_DEBUG(this->router_debug_, " Better cost to %d\n", from_node);
+        VTR_LOGV_DEBUG(this->router_debug_, " New total cost: %g\n", new_total_cost);
+        VTR_LOGV_DEBUG(this->router_debug_ && (current.prev_edge != RREdgeId::INVALID()),
+                       " Setting path costs for associated node %d (from %d edge %zu)\n",
+                       from_node,
+                       static_cast<size_t>(this->rr_graph_->edge_src_node(current.prev_edge)),
+                       static_cast<size_t>(current.prev_edge));
+
+        timing_driven_expand_neighbours(current, cost_params, bounding_box, target_node, target_bb);
+    } else {
+        // Post-heap prune, do not re-explore from the current/new partial path as it
+        // has worse cost than the best partial path to this node found so far
+        VTR_LOGV_DEBUG(this->router_debug_, " Worse cost to %d\n", from_node);
+        VTR_LOGV_DEBUG(this->router_debug_, " Old total cost: %g\n", best_total_cost);
+        VTR_LOGV_DEBUG(this->router_debug_, " New total cost: %g\n", new_total_cost);
+    }
+}
+
+template<typename HeapImplementation>
+void SerialConnectionRouter<HeapImplementation>::timing_driven_expand_neighbours(const RTExploredNode& current,
+                                                                                 const t_conn_cost_params& cost_params,
+                                                                                 const t_bb& bounding_box,
+                                                                                 RRNodeId target_node,
+                                                                                 const t_bb& target_bb) {
+    /* Puts all the rr_nodes adjacent to current on the heap. */
+
+    // For each node associated with the current heap element, expand all of its neighbors
+    auto edges = this->rr_nodes_.edge_range(current.index);
+
+    // This is a simple prefetch that prefetches:
+    // - RR node data reachable from this node
+    // - rr switch data to reach those nodes from this node.
+    //
+    // This code will be a NOP on compiler targets that do not have a
+    // builtin to emit prefetch instructions.
+    //
+    // This code will be a NOP on CPU targets that lack prefetch instructions.
+    // All modern x86 and ARM64 platforms provide prefetch instructions.
+    //
+    // This code delivers ~6-8% reduction in wallclock time when running Titan
+    // benchmarks, and was specifically measured against the gsm_switch and
+    // directrf vtr_reg_weekly running in high effort.
+ // + // - directrf_stratixiv_arch_timing.blif + // - gsm_switch_stratixiv_arch_timing.blif + // + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + this->rr_nodes_.prefetch_node(to_node); + + int switch_idx = this->rr_nodes_.edge_switch(from_edge); + VTR_PREFETCH(&this->rr_switch_inf_[switch_idx], 0, 0); + } + + for (RREdgeId from_edge : edges) { + RRNodeId to_node = this->rr_nodes_.edge_sink_node(from_edge); + timing_driven_expand_neighbour(current, + from_edge, + to_node, + cost_params, + bounding_box, + target_node, + target_bb); + } +} + +template +void SerialConnectionRouter::timing_driven_expand_neighbour(const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb) { + VTR_ASSERT(bounding_box.layer_max < g_vpr_ctx.device().grid.get_num_layers()); + + const RRNodeId& from_node = current.index; + + // BB-pruning + // Disable BB-pruning if RCV is enabled, as this can make it harder for circuits with high negative hold slack to resolve this + // TODO: Only disable pruning if the net has negative hold slack, maybe go off budgets + if (!inside_bb(to_node, bounding_box) + && !this->rcv_path_manager.is_enabled()) { + VTR_LOGV_DEBUG(this->router_debug_, + " Pruned expansion of node %d edge %zu -> %d" + " (to node location %d,%d,%d x %d,%d,%d outside of expanded" + " net bounding box %d,%d,%d x %d,%d,%d)\n", + from_node, size_t(from_edge), size_t(to_node), + this->rr_graph_->node_xlow(to_node), this->rr_graph_->node_ylow(to_node), this->rr_graph_->node_layer(to_node), + this->rr_graph_->node_xhigh(to_node), this->rr_graph_->node_yhigh(to_node), this->rr_graph_->node_layer(to_node), + bounding_box.xmin, bounding_box.ymin, bounding_box.layer_min, + bounding_box.xmax, bounding_box.ymax, bounding_box.layer_max); + return; /* Node is outside (expanded) bounding box. */ + } + + /* Prune away IPINs that lead to blocks other than the target one. Avoids * + * the issue of how to cost them properly so they don't get expanded before * + * more promising routes, but makes route-through (via CLBs) impossible. * + * Change this if you want to investigate route-throughs. 
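+     * (Concretely, the check below keeps an IPIN only if its whole extent, xlow/xhigh,
+     * ylow/yhigh and layer, lies inside target_bb, the bounding box of the block that
+     * contains the target sink; any other IPIN is skipped without being costed.)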
*/ + if (target_node != RRNodeId::INVALID()) { + t_rr_type to_type = this->rr_graph_->node_type(to_node); + if (to_type == IPIN) { + // Check if this IPIN leads to the target block + // IPIN's of the target block should be contained within it's bounding box + int to_xlow = this->rr_graph_->node_xlow(to_node); + int to_ylow = this->rr_graph_->node_ylow(to_node); + int to_layer = this->rr_graph_->node_layer(to_node); + int to_xhigh = this->rr_graph_->node_xhigh(to_node); + int to_yhigh = this->rr_graph_->node_yhigh(to_node); + if (to_xlow < target_bb.xmin + || to_ylow < target_bb.ymin + || to_xhigh > target_bb.xmax + || to_yhigh > target_bb.ymax + || to_layer < target_bb.layer_min + || to_layer > target_bb.layer_max) { + VTR_LOGV_DEBUG(this->router_debug_, + " Pruned expansion of node %d edge %zu -> %d" + " (to node is IPIN at %d,%d,%d x %d,%d,%d which does not" + " lead to target block %d,%d,%d x %d,%d,%d)\n", + from_node, size_t(from_edge), size_t(to_node), + to_xlow, to_ylow, to_layer, + to_xhigh, to_yhigh, to_layer, + target_bb.xmin, target_bb.ymin, target_bb.layer_min, + target_bb.xmax, target_bb.ymax, target_bb.layer_max); + return; + } + } + } + + VTR_LOGV_DEBUG(this->router_debug_, " Expanding node %d edge %zu -> %d\n", + from_node, size_t(from_edge), size_t(to_node)); + + // Check if the node exists in the route tree when RCV is enabled + // Other pruning methods have been disabled when RCV is on, so this method is required to prevent "loops" from being created + bool node_exists = false; + if (this->rcv_path_manager.is_enabled()) { + node_exists = this->rcv_path_manager.node_exists_in_tree(this->rcv_path_data[from_node], + to_node); + } + + if (!node_exists || !this->rcv_path_manager.is_enabled()) { + timing_driven_add_to_heap(cost_params, + current, + to_node, + from_edge, + target_node); + } +} + +template +void SerialConnectionRouter::timing_driven_add_to_heap(const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + const RREdgeId from_edge, + RRNodeId target_node) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId& from_node = current.index; + + // Initialize the neighbor RTExploredNode + RTExploredNode next; + next.R_upstream = current.R_upstream; + next.index = to_node; + next.prev_edge = from_edge; + next.total_cost = std::numeric_limits::infinity(); // Not used directly + next.backward_path_cost = current.backward_path_cost; + + // Initialize RCV data struct if needed, otherwise it's set to nullptr + this->rcv_path_manager.alloc_path_struct(next.path_data); + // path_data variables are initialized to current values + if (this->rcv_path_manager.is_enabled() && this->rcv_path_data[from_node]) { + next.path_data->backward_cong = this->rcv_path_data[from_node]->backward_cong; + next.path_data->backward_delay = this->rcv_path_data[from_node]->backward_delay; + } + + this->evaluate_timing_driven_node_costs(&next, cost_params, from_node, target_node); + + float best_total_cost = this->rr_node_route_inf_[to_node].path_cost; + float best_back_cost = this->rr_node_route_inf_[to_node].backward_path_cost; + + float new_total_cost = next.total_cost; + float new_back_cost = next.backward_path_cost; + + // We need to only expand this node if it is a better path. 
And we need to + // update its `rr_node_route_inf` data as we put it into the heap; there may + // be other (previously explored) paths to this node in the heap already, + // but they will be pruned when we pop those heap nodes later as we'll see + // they have inferior costs to what is in the `rr_node_route_inf` data for + // this node. More details can be found from the FPT'24 parallel connection + // router paper. + // + // When RCV is enabled, prune based on the RCV-specific total path cost (see + // in `compute_node_cost_using_rcv` in `evaluate_timing_driven_node_costs`) + // to allow detours to get better QoR. + if ((!this->rcv_path_manager.is_enabled() && best_back_cost > new_back_cost) || (this->rcv_path_manager.is_enabled() && best_total_cost > new_total_cost)) { + VTR_LOGV_DEBUG(this->router_debug_, " Expanding to node %d (%s)\n", to_node, + describe_rr_node(device_ctx.rr_graph, + device_ctx.grid, + device_ctx.rr_indexed_data, + to_node, + this->is_flat_) + .c_str()); + VTR_LOGV_DEBUG(this->router_debug_, " New Total Cost %g New back Cost %g\n", new_total_cost, new_back_cost); + //Add node to the heap only if the cost via the current partial path is less than the + //best known cost, since there is no reason for the router to expand more expensive paths. + // + //Pre-heap prune to keep the heap small, by not putting paths which are known to be + //sub-optimal (at this point in time) into the heap. + + update_cheapest(next, from_node); + + this->heap_.add_to_heap({new_total_cost, to_node}); + update_serial_router_stats(this->router_stats_, + /*is_push=*/true, + to_node, + this->rr_graph_); + + } else { + VTR_LOGV_DEBUG(this->router_debug_, " Didn't expand to %d (%s)\n", to_node, describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, to_node, this->is_flat_).c_str()); + VTR_LOGV_DEBUG(this->router_debug_, " Prev Total Cost %g Prev back Cost %g \n", best_total_cost, best_back_cost); + VTR_LOGV_DEBUG(this->router_debug_, " New Total Cost %g New back Cost %g \n", new_total_cost, new_back_cost); + } + + if (this->rcv_path_manager.is_enabled() && next.path_data != nullptr) { + this->rcv_path_manager.free_path_struct(next.path_data); + } +} + +template +void SerialConnectionRouter::add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) { + const auto& device_ctx = g_vpr_ctx.device(); + const RRNodeId inode = rt_node.inode; + float backward_path_cost = cost_params.criticality * rt_node.Tdel; + float R_upstream = rt_node.R_upstream; + + /* Don't push to heap if not in bounding box: no-op for serial router, important for parallel router */ + if (!inside_bb(rt_node.inode, net_bb)) + return; + + // After budgets are loaded, calculate delay cost as described by RCV paper + /* R. Fung, V. Betz and W. Chow, "Slack Allocation and Routing to Improve FPGA Timing While + * Repairing Short-Path Violations," in IEEE Transactions on Computer-Aided Design of + * Integrated Circuits and Systems, vol. 27, no. 4, pp. 
686-697, April 2008.*/ + // float expected_cost = router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + + if (!this->rcv_path_manager.is_enabled()) { + float expected_cost = this->router_lookahead_.get_expected_cost(inode, target_node, cost_params, R_upstream); + float tot_cost = backward_path_cost + cost_params.astar_fac * std::max(0.f, expected_cost - cost_params.astar_offset); + VTR_LOGV_DEBUG(this->router_debug_, " Adding node %8d to heap from init route tree with cost %g (%s)\n", + inode, + tot_cost, + describe_rr_node(device_ctx.rr_graph, device_ctx.grid, device_ctx.rr_indexed_data, inode, this->is_flat_).c_str()); + + if (tot_cost > this->rr_node_route_inf_[inode].path_cost) { + return; + } + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].path_cost = tot_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + this->heap_.push_back({tot_cost, inode}); + } else { + float expected_total_cost = this->compute_node_cost_using_rcv(cost_params, inode, target_node, rt_node.Tdel, 0, R_upstream); + + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].path_cost = expected_total_cost; + this->rr_node_route_inf_[inode].prev_edge = RREdgeId::INVALID(); + this->rr_node_route_inf_[inode].backward_path_cost = backward_path_cost; + this->rr_node_route_inf_[inode].R_upstream = R_upstream; + + this->rcv_path_manager.alloc_path_struct(this->rcv_path_data[inode]); + this->rcv_path_data[inode]->backward_delay = rt_node.Tdel; + + this->heap_.push_back({expected_total_cost, inode}); + } + + update_serial_router_stats(this->router_stats_, + /*is_push=*/true, + inode, + this->rr_graph_); + + if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) { + this->router_stats_->rt_node_pushes[this->rr_graph_->node_type(inode)]++; + } +} + +std::unique_ptr make_serial_connection_router(e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat) { + switch (heap_type) { + case e_heap_type::BINARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat); + case e_heap_type::FOUR_ARY_HEAP: + return std::make_unique>( + grid, + router_lookahead, + rr_nodes, + rr_graph, + rr_rc_data, + rr_switch_inf, + rr_node_route_inf, + is_flat); + default: + VPR_FATAL_ERROR(VPR_ERROR_ROUTE, "Unknown heap_type %d", + heap_type); + } +} + +/** This function is only used for the serial connection router since some + * statistic variables in router_stats are not thread-safe for the parallel + * connection router. To update router_stats (more precisely heap_pushes/pops) + * for parallel connection router, we use the MultiQueue internal statistics + * method instead. 
+ */
+inline void update_serial_router_stats(RouterStats* router_stats,
+                                       bool is_push,
+                                       RRNodeId rr_node_id,
+                                       const RRGraphView* rr_graph) {
+    if (is_push) {
+        router_stats->heap_pushes++;
+    } else {
+        router_stats->heap_pops++;
+    }
+
+    if constexpr (VTR_ENABLE_DEBUG_LOGGING_CONST_EXPR) {
+        auto node_type = rr_graph->node_type(rr_node_id);
+        VTR_ASSERT(node_type != NUM_RR_TYPES);
+
+        if (is_inter_cluster_node(*rr_graph, rr_node_id)) {
+            if (is_push) {
+                router_stats->inter_cluster_node_pushes++;
+                router_stats->inter_cluster_node_type_cnt_pushes[node_type]++;
+            } else {
+                router_stats->inter_cluster_node_pops++;
+                router_stats->inter_cluster_node_type_cnt_pops[node_type]++;
+            }
+        } else {
+            if (is_push) {
+                router_stats->intra_cluster_node_pushes++;
+                router_stats->intra_cluster_node_type_cnt_pushes[node_type]++;
+            } else {
+                router_stats->intra_cluster_node_pops++;
+                router_stats->intra_cluster_node_type_cnt_pops[node_type]++;
+            }
+        }
+    }
+}
diff --git a/vpr/src/route/serial_connection_router.h b/vpr/src/route/serial_connection_router.h
new file mode 100644
index 00000000000..2cd23f1460e
--- /dev/null
+++ b/vpr/src/route/serial_connection_router.h
@@ -0,0 +1,255 @@
+#ifndef _SERIAL_CONNECTION_ROUTER_H
+#define _SERIAL_CONNECTION_ROUTER_H
+
+#include "connection_router.h"
+
+#include "d_ary_heap.h"
+
+/**
+ * @class SerialConnectionRouter
+ * @brief Implements AIR's serial timing-driven connection router
+ * @details This class routes from some initial set of sources (via the input rt tree) to a
+ * particular sink using a single thread.
+ */
+template<typename HeapImplementation>
+class SerialConnectionRouter : public ConnectionRouter<HeapImplementation> {
+  public:
+    SerialConnectionRouter(
+        const DeviceGrid& grid,
+        const RouterLookahead& router_lookahead,
+        const t_rr_graph_storage& rr_nodes,
+        const RRGraphView* rr_graph,
+        const std::vector<t_rr_rc_data>& rr_rc_data,
+        const vtr::vector<RRSwitchId, t_rr_switch_inf>& rr_switch_inf,
+        vtr::vector<RRNodeId, t_rr_node_route_inf>& rr_node_route_inf,
+        bool is_flat)
+        : ConnectionRouter<HeapImplementation>(grid, router_lookahead, rr_nodes, rr_graph, rr_rc_data, rr_switch_inf, rr_node_route_inf, is_flat) {
+    }
+
+    ~SerialConnectionRouter() {
+        VTR_LOG("Serial Connection Router is being destroyed. Time spent on path search: %.3f seconds.\n",
+                std::chrono::duration<float>(this->path_search_cumulative_time).count());
+    }
+
+    /**
+     * @brief Clears the modified list
+     * @note Should be called after reset_path_costs has been called
+     */
+    void clear_modified_rr_node_info() final {
+        this->modified_rr_node_inf_.clear();
+    }
+
+    /**
+     * @brief Resets modified data in rr_node_route_inf based on modified_rr_node_inf
+     */
+    void reset_path_costs() final {
+        // Reset the node info stored in rr_node_route_inf variable
+        ::reset_path_costs(this->modified_rr_node_inf_);
+        // Reset the node (RCV-related) info stored inside the connection router
+        if (this->rcv_path_manager.is_enabled()) {
+            for (const auto& node : this->modified_rr_node_inf_) {
+                this->rcv_path_data[node] = nullptr;
+            }
+        }
+    }
+
+    /**
+     * @brief Enables or disables RCV in the connection router
+     * @note Enabling this will utilize extra path structures, as well as
+     * the RCV cost function. Ensure route budgets have been calculated
+     * before enabling this.
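+     *
+     * A minimal usage sketch (illustrative only; it assumes budgets come from the
+     * route_budgets class used elsewhere in this patch):
+     * @code
+     *   route_budgets budgeting_inf(net_list, is_flat);
+     *   // ... load or calculate the route budgets ...
+     *   router.set_rcv_enabled(budgeting_inf.if_set());
+     * @endcode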
+ * @param enable Whether enabling RCV or not + */ + void set_rcv_enabled(bool enable) final { + this->rcv_path_manager.set_enabled(enable); + if (enable) { + this->rcv_path_data.resize(this->rr_node_route_inf_.size()); + } + } + + /** + * @brief Finds shortest paths from the route tree rooted at rt_root to all sinks available + * @note Unlike timing_driven_route_connection_from_route_tree(), only part of the route tree which + * is spatially close to the sink is added to the heap. + * @note If cost_params.astar_fac is set to 0, this effectively becomes Dijkstra's algorithm with a + * modified exit condition (runs until heap is empty). When using cost_params.astar_fac = 0, for + * efficiency the RouterLookahead used should be the NoOpLookahead. + * @note This routine is currently used only to generate information that may be helpful in debugging + * an architecture. + * @param rt_root RouteTreeNode describing the current routing state + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param router_stats Update router statistics + * @param conn_params Parameters to guide the routing of the given connection + * @return A vector where each element is a reachable sink + */ + vtr::vector timing_driven_find_all_shortest_paths_from_route_tree( + const RouteTreeNode& rt_root, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RouterStats& router_stats, + const ConnectionParameters& conn_params) final; + + protected: + /** + * @brief Marks that data associated with rr_node 'inode' has + * been modified, and needs to be reset in reset_path_costs + */ + inline void add_to_mod_list(RRNodeId inode) { + if (std::isinf(this->rr_node_route_inf_[inode].path_cost)) { + this->modified_rr_node_inf_.push_back(inode); + } + } + + /** + * @brief Updates the route path to the node `cheapest.index` + * via the path from `from_node` via `cheapest.prev_edge` + */ + inline void update_cheapest(RTExploredNode& cheapest, const RRNodeId& from_node) { + const RRNodeId& inode = cheapest.index; + add_to_mod_list(inode); + this->rr_node_route_inf_[inode].prev_edge = cheapest.prev_edge; + this->rr_node_route_inf_[inode].path_cost = cheapest.total_cost; + this->rr_node_route_inf_[inode].backward_path_cost = cheapest.backward_path_cost; + + // Use the already created next path structure pointer when RCV is enabled + if (this->rcv_path_manager.is_enabled()) { + this->rcv_path_manager.move(this->rcv_path_data[inode], cheapest.path_data); + + this->rcv_path_data[inode]->path_rr = this->rcv_path_data[from_node]->path_rr; + this->rcv_path_data[inode]->edge = this->rcv_path_data[from_node]->edge; + this->rcv_path_data[inode]->path_rr.push_back(from_node); + this->rcv_path_data[inode]->edge.push_back(cheapest.prev_edge); + } + } + + /** + * @brief Finds the single shortest path from current heap to the sink node in the RR graph + * @param sink_node Sink node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_find_single_shortest_path_from_heap(RRNodeId sink_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb) final; + + /** + * @brief Expands this current node if it is a cheaper path + * @param from_node Current node ID being explored + * @param new_total_cost Identifier popped from the heap to detect if the element 
(pair) + * (from_node, new_total_cost) was the most recently pushed element for from_node + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_cheapest( + RRNodeId from_node, + float new_total_cost, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + const t_bb& target_bb); + + /** + * @brief Expands each neighbor of the current node in the wave expansion + * @param current Current node being explored + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_neighbours( + const RTExploredNode& current, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb); + + /** + * @brief Conditionally adds to_node to the router heap (via path from current.index via from_edge) + * @note RR nodes outside bounding box specified in bounding_box are not added to the heap. + * @param current Current node being explored + * @param from_edge Edge between the current node and the neighbor node + * @param to_node Neighbor node to be expanded + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @param target_node Target node ID to route to + * @param target_bb Prune IPINs that lead to blocks other than the target block + */ + void timing_driven_expand_neighbour( + const RTExploredNode& current, + RREdgeId from_edge, + RRNodeId to_node, + const t_conn_cost_params& cost_params, + const t_bb& bounding_box, + RRNodeId target_node, + const t_bb& target_bb); + + /** + * @brief Adds to_node to the heap, and also adds any nodes which are connected by non-configurable edges + * @param cost_params Cost function parameters + * @param current Current node being explored + * @param to_node Neighbor node to be expanded + * @param from_edge Edge between the current node and the neighbor node + * @param target_node Target node ID to route to + */ + void timing_driven_add_to_heap( + const t_conn_cost_params& cost_params, + const RTExploredNode& current, + RRNodeId to_node, + RREdgeId from_edge, + RRNodeId target_node); + + /** + * @brief Unconditionally adds rt_node to the heap + * @note If you want to respect rt_node->re_expand that is the caller's responsibility. + * @todo Consider moving this function into the ConnectionRouter class after checking + * the different prune functions of the serial and parallel connection routers. + * @param rt_node RouteTreeNode to be added to the heap + * @param target_node Target node ID to route to + * @param cost_params Cost function parameters + * @param net_bb Do not push to heap if not in bounding box + */ + void add_route_tree_node_to_heap( + const RouteTreeNode& rt_node, + RRNodeId target_node, + const t_conn_cost_params& cost_params, + const t_bb& net_bb) final; + + /** + * @brief Finds shortest paths from current heap to all nodes in the RR graph + * + * Since there is no single *target* node this uses Dijkstra's algorithm with + * a modified exit condition (runs until heap is empty). 
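+     *
+     * A minimal sketch of how the result is read (illustrative only; per the implementation,
+     * only the index and prev_edge fields of each element are meaningful):
+     * @code
+     *   auto paths = timing_driven_find_all_shortest_paths_from_heap(cost_params, bounding_box);
+     *   // paths[inode].index == RRNodeId::INVALID() means inode was never reached;
+     *   // otherwise paths[inode].prev_edge is the last edge of the best path found to inode.
+     * @endcode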
+ * + * @param cost_params Cost function parameters + * @param bounding_box Keep search confined to this bounding box + * @return A vector where each element contains the shortest route to a specific sink node + */ + vtr::vector timing_driven_find_all_shortest_paths_from_heap( + const t_conn_cost_params& cost_params, + const t_bb& bounding_box) final; + + /** Node IDs of modified nodes in rr_node_route_inf */ + std::vector modified_rr_node_inf_; +}; + +/** Construct a serial connection router that uses the specified heap type. + * This function is not used, but removing it will result in "undefined reference" + * errors since heap type specializations won't get emitted from serial_connection_router.cpp + * without it. + * The alternative is moving all SerialConnectionRouter fn implementations into the header. */ +std::unique_ptr make_serial_connection_router( + e_heap_type heap_type, + const DeviceGrid& grid, + const RouterLookahead& router_lookahead, + const t_rr_graph_storage& rr_nodes, + const RRGraphView* rr_graph, + const std::vector& rr_rc_data, + const vtr::vector& rr_switch_inf, + vtr::vector& rr_node_route_inf, + bool is_flat); + +#endif /* _SERIAL_CONNECTION_ROUTER_H */ diff --git a/vpr/test/test_connection_router.cpp b/vpr/test/test_connection_router.cpp deleted file mode 100644 index 138e003b04e..00000000000 --- a/vpr/test/test_connection_router.cpp +++ /dev/null @@ -1,194 +0,0 @@ -#include -#include "catch2/catch_test_macros.hpp" - -#include "route_net.h" -#include "rr_graph_fwd.h" -#include "vpr_api.h" -#include "vpr_signal_handler.h" -#include "globals.h" -#include "net_delay.h" -#include "place_and_route.h" -#include "connection_router.h" -#include "router_delay_profiling.h" - -static constexpr const char kArchFile[] = "../../vtr_flow/arch/timing/k6_frac_N10_mem32K_40nm.xml"; -static constexpr int kMaxHops = 10; - -namespace { - -// Route from source_node to sink_node, returning either the delay, or infinity if unroutable. -static float do_one_route(RRNodeId source_node, - RRNodeId sink_node, - const t_det_routing_arch& det_routing_arch, - const t_router_opts& router_opts, - const std::vector& segment_inf) { - bool is_flat = router_opts.flat_routing; - auto& device_ctx = g_vpr_ctx.device(); - - RouteTree tree((RRNodeId(source_node))); - - // Update base costs according to fanout and criticality rules. - update_rr_base_costs(1); - - // Bounding box includes the entire grid. - t_bb bounding_box; - bounding_box.xmin = 0; - bounding_box.xmax = device_ctx.grid.width() + 1; - bounding_box.ymin = 0; - bounding_box.ymax = device_ctx.grid.height() + 1; - bounding_box.layer_min = 0; - bounding_box.layer_max = device_ctx.grid.get_num_layers() - 1; - - t_conn_cost_params cost_params; - cost_params.criticality = router_opts.max_criticality; - cost_params.astar_fac = router_opts.astar_fac; - cost_params.astar_offset = router_opts.astar_offset; - cost_params.bend_cost = router_opts.bend_cost; - - const Netlist<>& net_list = is_flat ? 
(const Netlist<>&)g_vpr_ctx.atom().netlist() : (const Netlist<>&)g_vpr_ctx.clustering().clb_nlist; - route_budgets budgeting_inf(net_list, is_flat); - - RouterStats router_stats; - auto router_lookahead = make_router_lookahead(det_routing_arch, - router_opts.lookahead_type, - router_opts.write_router_lookahead, - router_opts.read_router_lookahead, - segment_inf, - is_flat); - - ConnectionRouter router( - device_ctx.grid, - *router_lookahead, - device_ctx.rr_graph.rr_nodes(), - &device_ctx.rr_graph, - device_ctx.rr_rc_data, - device_ctx.rr_graph.rr_switch(), - g_vpr_ctx.mutable_routing().rr_node_route_inf, - is_flat); - - // Find the cheapest route if possible. - bool found_path; - RTExploredNode cheapest; - ConnectionParameters conn_params(ParentNetId::INVALID(), - -1, - false, - std::unordered_map()); - std::tie(found_path, std::ignore, cheapest) = router.timing_driven_route_connection_from_route_tree(tree.root(), - sink_node, - cost_params, - bounding_box, - router_stats, - conn_params); - - // Default delay is infinity, which indicates that a route was not found. - float delay = std::numeric_limits::infinity(); - if (found_path) { - // Check that the route goes to the requested sink. - REQUIRE(RRNodeId(cheapest.index) == sink_node); - - // Get the delay - vtr::optional rt_node_of_sink; - std::tie(std::ignore, rt_node_of_sink) = tree.update_from_heap(&cheapest, OPEN, nullptr, router_opts.flat_routing); - delay = rt_node_of_sink.value().Tdel; - } - - // Reset for the next router call. - router.reset_path_costs(); - return delay; -} - -// Find a source and a sink by walking edges. -std::tuple find_source_and_sink() { - auto& device_ctx = g_vpr_ctx.device(); - auto& rr_graph = device_ctx.rr_graph; - - // Current longest walk - std::tuple longest = std::make_tuple(RRNodeId::INVALID(), RRNodeId::INVALID(), 0); - - // Start from each RR node - for (size_t id = 0; id < rr_graph.num_nodes(); id++) { - RRNodeId source(id), sink = source; - for (int hops = 0; hops < kMaxHops; hops++) { - // Take the first edge, if there is one. - auto edge = rr_graph.node_first_edge(sink); - if (edge == rr_graph.node_last_edge(sink)) { - break; - } - sink = rr_graph.rr_nodes().edge_sink_node(edge); - - // If this is the new longest walk, store it. - if (hops > std::get<2>(longest)) { - longest = std::make_tuple(source, sink, hops); - } - } - } - return longest; -} - -// Test that the router can route nets individually, not considering congestion. -// This is a minimal timing driven routing test that can be used as documentation, -// and as a starting point for experimentation. -TEST_CASE("connection_router", "[vpr]") { - // Minimal setup - auto options = t_options(); - auto arch = t_arch(); - auto vpr_setup = t_vpr_setup(); - - vpr_install_signal_handler(); - vpr_initialize_logging(); - - // Command line arguments - const char* argv[] = { - "test_vpr", - kArchFile, - "wire.eblif", - "--route_chan_width", "100"}; - vpr_init(sizeof(argv) / sizeof(argv[0]), argv, &options, &vpr_setup, &arch); - - vpr_create_device_grid(vpr_setup, arch); - vpr_setup_clock_networks(vpr_setup, arch); - auto det_routing_arch = &vpr_setup.RoutingArch; - auto& router_opts = vpr_setup.RouterOpts; - e_graph_type graph_directionality; - - if (router_opts.route_type == GLOBAL) { - graph_directionality = e_graph_type::BIDIR; - } else { - graph_directionality = (det_routing_arch->directionality == BI_DIRECTIONAL ? 
e_graph_type::BIDIR : e_graph_type::UNIDIR); - } - - auto chan_width = init_chan(vpr_setup.RouterOpts.fixed_channel_width, arch.Chans, graph_directionality); - - alloc_routing_structs( - chan_width, - vpr_setup.RouterOpts, - &vpr_setup.RoutingArch, - vpr_setup.Segments, - arch.directs, - router_opts.flat_routing); - - // Find a source and sink to route - RRNodeId source_rr_node, sink_rr_node; - int hops; - std::tie(source_rr_node, sink_rr_node, hops) = find_source_and_sink(); - - // Check that the route will be non-trivial - REQUIRE(source_rr_node != sink_rr_node); - REQUIRE(hops >= 3); - - // Find the route - float delay = do_one_route(source_rr_node, - sink_rr_node, - vpr_setup.RoutingArch, - vpr_setup.RouterOpts, - vpr_setup.Segments); - - // Check that a route was found - REQUIRE(delay < std::numeric_limits::infinity()); - - // Clean up - free_routing_structs(); - vpr_free_all(arch, vpr_setup); -} - -} // namespace diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt index 1ccd16490d7..3ca35cef4c4 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/config.txt @@ -38,3 +38,6 @@ pass_requirements_file=pass_requirements.txt script_params_common=-track_memory_usage script_params_list_add = script_params_list_add = --router_algorithm parallel +script_params_list_add = --enable_parallel_connection_router on +script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16 +script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt index 39aa722daca..fa43c27af55 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/koios_test/config/golden_results.txt @@ -1,3 +1,6 @@ - arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time 
min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time - k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common 9.38 vpr 77.35 MiB -1 -1 0.36 22280 1 0.10 -1 -1 35580 -1 -1 12 130 0 -1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 79208 130 40 596 562 1 356 185 14 14 196 dsp_top auto 38.5 MiB 0.18 1862 38583 13232 21153 4198 77.4 MiB 0.24 0.00 5.12303 -624.562 -5.12303 5.12303 0.45 0.00115671 0.00104931 0.13445 0.124537 -1 -1 -1 -1 64 3969 9 4.93594e+06 1.0962e+06 976140. 4980.31 5.77 0.971386 0.907233 31408 195022 -1 3606 8 821 857 201107 78801 4.57723 4.57723 -666.876 -4.57723 0 0 1.23909e+06 6321.90 0.06 0.12 0.38 -1 -1 0.06 0.0628918 0.0600921 - k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--router_algorithm_parallel 7.77 vpr 77.61 MiB -1 -1 0.36 22212 1 0.08 -1 -1 35140 -1 -1 12 130 0 -1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 79472 130 40 596 562 1 356 185 14 14 196 dsp_top auto 38.6 MiB 0.18 1862 38583 13232 21153 4198 77.6 MiB 0.37 0.00 5.12303 -624.562 -5.12303 5.12303 0.55 0.00210597 0.00194049 0.204405 0.191731 -1 -1 -1 -1 64 3993 10 4.93594e+06 1.0962e+06 976140. 
4980.31 3.98 0.785401 0.735059 31408 195022 -1 3592 9 794 830 166912 64369 4.57723 4.57723 -658.916 -4.57723 0 0 1.23909e+06 6321.90 0.07 0.13 0.32 -1 -1 0.07 0.068841 0.0645644 +arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common 4.41 vpr 75.36 MiB -1 -1 0.19 17940 1 0.05 -1 -1 31600 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77168 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000974867 0.000904716 0.077352 0.0718699 -1 -1 -1 -1 82 3601 9 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.25 0.377802 0.348243 33448 250998 -1 3687 9 800 863 234820 89374 4.57723 4.57723 -726.049 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.07 0.22 -1 -1 0.04 0.0332833 0.0315356 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--router_algorithm_parallel 4.54 vpr 75.36 MiB -1 -1 0.19 17936 1 0.05 -1 -1 31376 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77168 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.1 MiB 0.11 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 
-649.023 -5.12303 5.12303 0.23 0.000981081 0.000906865 0.0786348 0.0731135 -1 -1 -1 -1 82 3585 15 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.38 0.403967 0.372656 33448 250998 -1 3715 9 792 819 214644 81314 4.57723 4.57723 -685.291 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.06 0.21 -1 -1 0.04 0.0307442 0.0291164 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on 4.52 vpr 75.37 MiB -1 -1 0.20 17940 1 0.05 -1 -1 31712 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 77176 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.4 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000979303 0.00090991 0.077946 0.0724613 -1 -1 -1 -1 82 3581 10 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.36 0.375357 0.346115 33448 250998 -1 3699 9 747 819 220831 220831 4.57723 4.57723 -679.037 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.08 0.21 -1 -1 0.04 0.0313828 0.0297241 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 5.06 vpr 74.98 MiB -1 -1 0.19 17956 1 0.05 -1 -1 31380 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 76780 130 40 596 562 1 356 185 14 14 196 dsp_top auto 36.3 MiB 0.10 3253 1906 39109 13750 20961 4398 75.0 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.22 0.000974681 0.00090508 0.0774429 0.0719617 -1 -1 -1 -1 82 3638 20 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.83 0.404389 0.372638 33448 250998 -1 3485 9 735 762 269282 269282 4.57723 4.57723 -660.925 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.14 0.21 -1 -1 0.04 0.0317757 0.0300764 +k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml test.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 5.20 vpr 74.15 MiB -1 -1 0.22 17572 1 0.06 -1 -1 31392 -1 -1 12 130 0 -1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 75932 130 40 596 562 1 356 185 14 14 196 dsp_top auto 35.1 MiB 0.11 3253 1906 39109 13750 20961 4398 74.2 MiB 0.14 0.00 5.12303 5.12303 -649.023 -5.12303 5.12303 0.23 0.000986009 0.000916173 0.0784056 0.0728838 -1 -1 -1 -1 82 3602 9 4.93594e+06 1.0962e+06 1.23902e+06 6321.54 2.86 0.418875 0.386466 33448 250998 -1 3679 10 722 785 234800 89746 4.57723 4.57723 -676.631 -4.57723 0 0 1.53308e+06 7821.82 0.04 0.12 0.21 -1 -1 0.04 0.0325316 0.0307593 diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt index d59d17d4831..122e16a14a2 100644 --- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt +++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/config.txt @@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements.txt script_params_common=-track_memory_usage --route_chan_width 100 --max_router_iterations 100 --router_lookahead map --flat_routing on script_params_list_add = 
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
index e37401667f7..d308c81afd1 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_flat_router/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
- k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common 11.85 vpr 79.08 MiB -1 -1 3.58 35500 16 0.65 -1 -1 38580 -1 -1 60 45 3 1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 80980 45 32 1192 1151 1 782 141 14 14 196 memory auto 40.0 MiB 3.23 6742 28689 8224 17037 3428 79.1 MiB 0.65 0.01 10.7103 -7090.32 -10.7103 10.7103 0.00 0.00310914 0.00279648 0.314019 0.270375 -1 -1 -1 -1 -1 10349 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.50 0.423776 0.367585 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
- k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--router_algorithm_parallel_--num_workers_4 12.82 vpr 78.98 MiB -1 -1 3.48 35500 16 0.73 -1 -1 38088 -1 -1 60 45 3 1 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 80880 45 32 1192 1151 1 782 141 14 14 196 memory auto 40.1 MiB 3.28 6742 28689 8224 17037 3428 79.0 MiB 0.59 0.01 10.7103 -7090.32 -10.7103 10.7103 0.00 0.00230907 0.0018852 0.209392 0.171163 -1 -1 -1 -1 -1 10313 15 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 2.42 0.342057 0.287674 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common 7.09 vpr 77.94 MiB -1 -1 1.87 32304 16 0.41 -1 -1 34724 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79808 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.5 MiB 1.86 9794 6883 28689 8164 16986 3539 77.9 MiB 0.45 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00305795 0.00281495 0.226764 0.201621 -1 -1 -1 -1 -1 10585 12 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.17 0.283544 0.251205 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--router_algorithm_parallel_--num_workers_4 7.22 vpr 77.26 MiB -1 -1 2.04 31928 16 0.39 -1 -1 33776 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79116 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.74 9794 6883 28689 8164 16986 3539 77.3 MiB 0.44 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00285411 0.002454 0.233988 0.206778 -1 -1 -1 -1 -1 10620 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.24 0.301013 0.263687 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on 7.01 vpr 77.55 MiB -1 -1 1.84 32308 16 0.38 -1 -1 34724 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79412 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.77 9794 6883 28689 8164 16986 3539 77.6 MiB 0.41 0.01 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00217684 0.0019326 0.199185 0.177858 -1 -1 -1 -1 -1 10546 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.30 0.25973 0.230708 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 8.02 vpr 77.44 MiB -1 -1 1.84 31552 16 0.39 -1 -1 34440 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79296 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.1 MiB 1.70 9794 6883 28689 8164 16986 3539 77.4 MiB 0.36 0.00 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00187804 0.00164622 0.167362 0.14756 -1 -1 -1 -1 -1 10692 11 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 2.43 0.218922 0.192418 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
+k6_frac_N10_frac_chain_mem32K_40nm.xml spree.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 7.25 vpr 77.56 MiB -1 -1 1.84 32324 16 0.39 -1 -1 34576 -1 -1 60 45 3 1 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 79424 45 32 1192 1151 1 782 141 14 14 196 memory auto 39.2 MiB 1.70 9794 6883 28689 8164 16986 3539 77.6 MiB 0.36 0.00 11.8719 10.9558 -7219.74 -10.9558 10.9558 0.00 0.00188091 0.00164873 0.16884 0.149107 -1 -1 -1 -1 -1 10708 13 9.20055e+06 5.27364e+06 1.47691e+06 7535.23 1.64 0.225256 0.197955 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
index dbceb44a4dc..09855147b8b 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/config.txt
@@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements_multiclock.txt
 script_params_common=-starting_stage vpr -sdc_file tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/multiclock.sdc
 script_params_list_add =
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
index 7e566048732..2f7e01b0b8e 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_multiclock/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params crit_path_delay_mcw clk_to_clk_cpd clk_to_clk2_cpd clk_to_input_cpd clk_to_output_cpd clk2_to_clk2_cpd clk2_to_clk_cpd clk2_to_input_cpd clk2_to_output_cpd input_to_input_cpd input_to_clk_cpd input_to_clk2_cpd input_to_output_cpd output_to_output_cpd output_to_clk_cpd output_to_clk2_cpd output_to_input_cpd clk_to_clk_setup_slack clk_to_clk2_setup_slack clk_to_input_setup_slack clk_to_output_setup_slack clk2_to_clk2_setup_slack clk2_to_clk_setup_slack clk2_to_input_setup_slack clk2_to_output_setup_slack input_to_input_setup_slack input_to_clk_setup_slack input_to_clk2_setup_slack input_to_output_setup_slack output_to_output_setup_slack output_to_clk_setup_slack output_to_clk2_setup_slack output_to_input_setup_slack clk_to_clk_hold_slack clk_to_clk2_hold_slack clk_to_input_hold_slack clk_to_output_hold_slack clk2_to_clk2_hold_slack clk2_to_clk_hold_slack clk2_to_input_hold_slack clk2_to_output_hold_slack input_to_input_hold_slack input_to_clk_hold_slack input_to_clk2_hold_slack input_to_output_hold_slack output_to_output_hold_slack output_to_clk_hold_slack output_to_clk2_hold_slack output_to_input_hold_slack
- k6_frac_N10_mem32K_40nm.xml multiclock.blif common 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
- k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--router_algorithm_parallel_--num_workers_4 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.14847 -1 1.95678 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.38647 -1 -1.28959 -1 -1 -1 -1
+arch circuit script_params crit_path_delay_mcw clk_to_clk_cpd clk_to_clk2_cpd clk_to_input_cpd clk_to_output_cpd clk2_to_clk2_cpd clk2_to_clk_cpd clk2_to_input_cpd clk2_to_output_cpd input_to_input_cpd input_to_clk_cpd input_to_clk2_cpd input_to_output_cpd output_to_output_cpd output_to_clk_cpd output_to_clk2_cpd output_to_input_cpd clk_to_clk_setup_slack clk_to_clk2_setup_slack clk_to_input_setup_slack clk_to_output_setup_slack clk2_to_clk2_setup_slack clk2_to_clk_setup_slack clk2_to_input_setup_slack clk2_to_output_setup_slack input_to_input_setup_slack input_to_clk_setup_slack input_to_clk2_setup_slack input_to_output_setup_slack output_to_output_setup_slack output_to_clk_setup_slack output_to_clk2_setup_slack output_to_input_setup_slack clk_to_clk_hold_slack clk_to_clk2_hold_slack clk_to_input_hold_slack clk_to_output_hold_slack clk2_to_clk2_hold_slack clk2_to_clk_hold_slack clk2_to_input_hold_slack clk2_to_output_hold_slack input_to_input_hold_slack input_to_clk_hold_slack input_to_clk2_hold_slack input_to_output_hold_slack output_to_output_hold_slack output_to_clk_hold_slack output_to_clk2_hold_slack output_to_input_hold_slack
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--router_algorithm_parallel_--num_workers_4 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.14847 -1 1.95678 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.38647 -1 -1.28959 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
+k6_frac_N10_mem32K_40nm.xml multiclock.blif common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 1.59919 0.595 0.841581 -1 -1 0.57 0.814813 -1 1.59919 -1 1.1662 -1 1.8371 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0.243 1.71958 -1 -1 0.268 3.24281 -1 1.44782 -1 3.4042 -1 -1.40928 -1 -1 -1 -1
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
index dac263af64c..1ec5bc88ec3 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/config.txt
@@ -27,3 +27,6 @@ pass_requirements_file=pass_requirements.txt
 script_params_common = -track_memory_usage
 script_params_list_add =
 script_params_list_add = --router_algorithm parallel --num_workers 4
+script_params_list_add = --enable_parallel_connection_router on
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
index b003134057c..5cbe4ea049b 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing/config/golden_results.txt
@@ -1,3 +1,6 @@
- arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
- k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common 2.63 vpr 68.02 MiB -1 -1 0.39 22168 3 0.11 -1 -1 36800 -1 -1 68 99 1 0 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 69656 99 130 344 474 1 227 298 12 12 144 clb auto 28.7 MiB 0.20 673 63978 19550 30341 14087 68.0 MiB 0.23 0.00 1.86472 -118.834 -1.86472 1.86472 0.15 0.000594963 0.000540506 0.0732034 0.0668337 -1 -1 -1 -1 38 1389 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.54 0.213559 0.195205 12522 62564 -1 1116 11 409 682 22304 6997 1.90702 1.90702 -133.281 -1.90702 -1.20917 -0.320482 406292. 2821.48 0.02 0.04 0.08 -1 -1 0.02 0.0300207 0.027912
- k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--router_algorithm_parallel_--num_workers_4 2.86 vpr 68.12 MiB -1 -1 0.35 22168 3 0.11 -1 -1 36740 -1 -1 68 99 1 0 success v8.0.0-12163-g0dba7016b-dirty Release VTR_ASSERT_LEVEL=2 GNU 11.4.0 on Linux-6.8.0-51-generic x86_64 2025-02-19T17:54:19 haydar-Precision-5820-Tower /home/haydar/vtr-verilog-to-routing 69760 99 130 344 474 1 227 298 12 12 144 clb auto 28.7 MiB 0.20 673 63978 19550 30341 14087 68.1 MiB 0.27 0.00 1.86472 -118.834 -1.86472 1.86472 0.21 0.000644886 0.000574461 0.100184 0.0946805 -1 -1 -1 -1 38 1379 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.64 0.202724 0.187418 12522 62564 -1 1115 10 390 630 21561 6939 1.90702 1.90702 -131.117 -1.90702 -1.20917 -0.320482 406292. 2821.48 0.02 0.04 0.10 -1 -1 0.02 0.021384 0.0193317
+arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time initial_placed_CPD_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time ap_mem ap_time ap_full_legalizer_mem ap_full_legalizer_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_num_rr_graph_nodes crit_path_num_rr_graph_edges crit_path_collapsed_nodes crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_create_rr_graph_time crit_path_create_intra_cluster_rr_graph_time crit_path_tile_lookahead_computation_time crit_path_router_lookahead_computation_time crit_path_total_timing_analysis_time crit_path_total_sta_time
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common 1.80 vpr 66.19 MiB -1 -1 0.22 18464 3 0.07 -1 -1 32740 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67776 99 130 344 474 1 227 298 12 12 144 clb auto 26.8 MiB 0.11 1695 684 72933 23047 34243 15643 66.2 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000573836 0.000534893 0.0449119 0.0418379 -1 -1 -1 -1 38 1437 13 5.66058e+06 4.21279e+06 319130. 2216.18 0.35 0.164006 0.149599 12522 62564 -1 1141 11 437 710 29360 10219 1.94502 1.94502 -130.926 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0193638 0.0180814
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--router_algorithm_parallel_--num_workers_4 1.78 vpr 66.56 MiB -1 -1 0.22 18460 3 0.07 -1 -1 33112 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 68160 99 130 344 474 1 227 298 12 12 144 clb auto 26.8 MiB 0.11 1695 684 72933 23047 34243 15643 66.6 MiB 0.14 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000665545 0.000617009 0.0528489 0.0485457 -1 -1 -1 -1 38 1420 13 5.66058e+06 4.21279e+06 319130. 2216.18 0.32 0.145628 0.130364 12522 62564 -1 1150 9 446 701 30426 10498 1.94502 1.94502 -131.108 -1.94502 -0.67939 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0162708 0.0148159
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on 1.83 vpr 65.93 MiB -1 -1 0.22 18452 3 0.07 -1 -1 32968 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67512 99 130 344 474 1 227 298 12 12 144 clb auto 26.1 MiB 0.11 1695 684 72933 23047 34243 15643 65.9 MiB 0.14 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.00057601 0.000537702 0.0453956 0.0423047 -1 -1 -1 -1 38 1417 9 5.66058e+06 4.21279e+06 319130. 2216.18 0.35 0.161904 0.148079 12522 62564 -1 1152 9 438 706 28653 28653 1.94502 1.94502 -129.801 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.03 0.04 -1 -1 0.01 0.0178252 0.0167004
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_4_--multi_queue_num_queues_16 1.93 vpr 65.91 MiB -1 -1 0.23 18864 3 0.07 -1 -1 32740 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67492 99 130 344 474 1 227 298 12 12 144 clb auto 26.1 MiB 0.11 1695 684 72933 23047 34243 15643 65.9 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000571739 0.000533473 0.0449704 0.0418885 -1 -1 -1 -1 38 1403 12 5.66058e+06 4.21279e+06 319130. 2216.18 0.44 0.164243 0.150129 12522 62564 -1 1126 9 422 667 50272 50272 1.94502 1.94502 -131.371 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.05 0.04 -1 -1 0.01 0.0176487 0.0165339
+k6_frac_N10_mem32K_40nm.xml ch_intrinsics.v common_--enable_parallel_connection_router_on_--multi_queue_num_threads_2_--multi_queue_num_queues_4_--multi_queue_direct_draining_on 1.95 vpr 66.02 MiB -1 -1 0.22 18476 3 0.07 -1 -1 32744 -1 -1 68 99 1 0 success cdda01bb5 release IPO VTR_ASSERT_LEVEL=2 GNU 13.3.0 on Linux-6.8.0-58-generic x86_64 2025-04-25T09:43:13 betzgrp-wintermute /home/yanhang1/parallel-router/vtr-verilog-to-routing/vtr_flow/tasks 67608 99 130 344 474 1 227 298 12 12 144 clb auto 26.4 MiB 0.15 1695 684 72933 23047 34243 15643 66.0 MiB 0.13 0.00 1.98228 1.86362 -118.513 -1.86362 1.86362 0.11 0.000573161 0.000534076 0.0449252 0.0418347 -1 -1 -1 -1 38 1408 11 5.66058e+06 4.21279e+06 319130. 2216.18 0.40 0.162733 0.14864 12522 62564 -1 1155 11 436 701 33997 11289 1.94502 1.94502 -131.251 -1.94502 -0.717819 -0.29768 406292. 2821.48 0.01 0.05 0.05 -1 -1 0.01 0.022968 0.0212952
diff --git a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
index 17b20f60f24..6af15346384 100644
--- a/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
+++ b/vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_timing_update_type/config/config.txt
@@ -31,3 +31,9 @@ script_params_list_add = --timing_update_type incremental
 script_params_list_add = --timing_update_type incremental --quench_recompute_divider 999999999 #Do post-move incremental STA during quench
 script_params_list_add = --timing_update_type incremental --router_algorithm parallel --num_workers 4 # rarely exercised code path
 script_params_list_add = --timing_update_type full --router_algorithm parallel --num_workers 4
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --timing_update_type incremental --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on --multi_queue_num_threads 4 --multi_queue_num_queues 16
+script_params_list_add = --timing_update_type full --enable_parallel_connection_router on --multi_queue_num_threads 2 --multi_queue_num_queues 4 --multi_queue_direct_draining on