⚡️ Speed up function get_root_nodes by 140%
#126
📄 140% (1.40x) speedup for get_root_nodes in llama-index-core/llama_index/core/node_parser/relational/hierarchical.py
⏱️ Runtime: 859 microseconds → 358 microseconds (best of 217 runs)
📝 Explanation and details
The optimization achieves a 140% speedup through two changes:
1. Eliminated repeated attribute lookups: The original code looked up NodeRelationship.PARENT inside the loop for every node (10,127 times according to the profiler). The optimized version stores it once in a local variable, parent_rel, which reduces attribute access overhead from 40.4% to just 3.6% of total runtime.
2. Replaced the imperative loop with a list comprehension: In CPython, a list comprehension appends through a dedicated bytecode instruction instead of looking up and calling append() on every iteration. This change eliminated the 34% of runtime spent on append operations.
Performance characteristics by test case:
Small inputs (zero to four nodes) range from roughly break-even to about 55% slower, but the absolute difference is well under a microsecond (for example, 380ns → 832ns for an empty list).
Large inputs (1,000 nodes) are 143% to 170% faster, dropping from roughly 80-88μs to 30-34μs per call.
Why this matters: Node parsing is typically performed on substantial document hierarchies, where the function processes hundreds or thousands of nodes. The profiler results show the optimization is most effective at that scale. The sub-microsecond overhead on tiny inputs is outweighed by the gains on realistic workloads.
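The diff itself is not reproduced in this description, so the following is only a minimal sketch of the two versions as the explanation above describes them. The Node class, the enum values, and both function names are stand-ins chosen for illustration, not the library's actual code.

```python
from enum import Enum
from typing import Dict, List, Optional


class NodeRelationship(str, Enum):
    # Stand-in for the library enum; the values here are illustrative.
    PARENT = "parent"
    CHILD = "child"


class Node:
    # Hypothetical minimal node: only the relationships mapping matters here.
    def __init__(
        self,
        node_id: str,
        relationships: Optional[Dict[NodeRelationship, str]] = None,
    ) -> None:
        self.node_id = node_id
        self.relationships = relationships if relationships is not None else {}


def get_root_nodes_loop(nodes: List[Node]) -> List[Node]:
    """Loop form: resolves NodeRelationship.PARENT and calls append() per node."""
    root_nodes = []
    for node in nodes:
        if NodeRelationship.PARENT not in node.relationships:
            root_nodes.append(node)
    return root_nodes


def get_root_nodes_comprehension(nodes: List[Node]) -> List[Node]:
    """Comprehension form: hoists the enum lookup into a local variable."""
    parent_rel = NodeRelationship.PARENT  # resolved once, outside the loop
    return [node for node in nodes if parent_rel not in node.relationships]


nodes = [
    Node("a"),
    Node("b", {NodeRelationship.PARENT: "a"}),
    Node("c", {NodeRelationship.CHILD: "d"}),
]
roots = get_root_nodes_comprehension(nodes)
```

Both forms return the nodes whose relationships mapping has no PARENT key, in input order. The comprehension pays a small fixed setup cost, which is consistent with the slower timings reported for tiny inputs.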
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
node_parser/test_hierarchical.py::test_get_root_nodes
node_parser/test_hierarchical.py::test_get_root_nodes_empty
🌀 Generated Regression Tests and Runtime
from typing import Any, Dict, List

# imports
import pytest  # used for our unit tests
from llama_index.core.node_parser.relational.hierarchical import get_root_nodes


# Minimal stubs for BaseNode and NodeRelationship to make tests self-contained
class NodeRelationship:
    PARENT = "parent"
    CHILD = "child"
    SIBLING = "sibling"
    # Add more relationships if needed for edge cases


class BaseNode:
    def __init__(self, id: Any, relationships: Dict[str, Any] = None):
        self.id = id
        # relationships: dict mapping relationship type to node(s)
        self.relationships = relationships if relationships is not None else {}


# unit tests

# --- Basic Test Cases ---
def test_single_node_no_relationships_is_root():
    # Single node, no relationships
    node = BaseNode(id=1)
    codeflash_output = get_root_nodes([node])  # 940ns -> 1.21μs (22.2% slower)


def test_single_node_with_parent_not_root():
    # Single node with a parent relationship
    node = BaseNode(id=1, relationships={NodeRelationship.PARENT: 2})
    codeflash_output = get_root_nodes([node])  # 869ns -> 1.16μs (25.3% slower)


def test_multiple_nodes_some_with_parent():
    # Multiple nodes, some with parent, some without
    node1 = BaseNode(id=1)
    node2 = BaseNode(id=2, relationships={NodeRelationship.PARENT: 1})
    node3 = BaseNode(id=3)
    node4 = BaseNode(id=4, relationships={NodeRelationship.PARENT: 3})
    codeflash_output = get_root_nodes([node1, node2, node3, node4]); result = codeflash_output  # 1.37μs -> 1.38μs (0.795% slower)


def test_nodes_with_other_relationships_are_root():
    # Node with relationships other than PARENT should be root
    node1 = BaseNode(id=1, relationships={NodeRelationship.CHILD: 2})
    node2 = BaseNode(id=2, relationships={NodeRelationship.SIBLING: 1})
    codeflash_output = get_root_nodes([node1, node2]); result = codeflash_output  # 972ns -> 1.20μs (19.0% slower)

# --- Edge Test Cases ---
def test_empty_node_list_returns_empty():
    # No nodes provided
    codeflash_output = get_root_nodes([])  # 380ns -> 832ns (54.3% slower)


def test_node_with_empty_relationships_dict_is_root():
    # Node with explicit empty relationships dict
    node = BaseNode(id=1, relationships={})
    codeflash_output = get_root_nodes([node])  # 831ns -> 1.06μs (21.4% slower)


def test_node_with_none_relationships_is_root():
    # Node with relationships=None
    node = BaseNode(id=1, relationships=None)
    codeflash_output = get_root_nodes([node])  # 796ns -> 1.08μs (26.1% slower)


def test_node_with_multiple_relationships_including_parent_not_root():
    # Node with multiple relationships including parent
    node = BaseNode(id=1, relationships={NodeRelationship.PARENT: 2, NodeRelationship.CHILD: 3})
    codeflash_output = get_root_nodes([node])  # 850ns -> 1.01μs (15.8% slower)


def test_node_with_multiple_relationships_excluding_parent_is_root():
    # Node with multiple relationships, none are parent
    node = BaseNode(id=1, relationships={NodeRelationship.CHILD: 2, NodeRelationship.SIBLING: 3})
    codeflash_output = get_root_nodes([node])  # 803ns -> 1.06μs (24.0% slower)


def test_node_with_parent_relationship_value_none_not_root():
    # Node with parent relationship value None (still present)
    node = BaseNode(id=1, relationships={NodeRelationship.PARENT: None})
    codeflash_output = get_root_nodes([node])  # 703ns -> 1.01μs (30.7% slower)


def test_node_with_parent_relationship_value_empty_list_not_root():
    # Node with parent relationship value as empty list
    node = BaseNode(id=1, relationships={NodeRelationship.PARENT: []})
    codeflash_output = get_root_nodes([node])  # 801ns -> 1.05μs (23.8% slower)


def test_node_with_parent_relationship_key_case_sensitive():
    # Node with relationship key 'Parent' (wrong case) should be root
    node = BaseNode(id=1, relationships={"Parent": 2})
    codeflash_output = get_root_nodes([node])  # 844ns -> 1.01μs (16.2% slower)


def test_node_with_unexpected_relationship_key_is_root():
    # Node with unexpected relationship key
    node = BaseNode(id=1, relationships={"ancestor": 2})
    codeflash_output = get_root_nodes([node])  # 746ns -> 959ns (22.2% slower)


def test_duplicate_nodes_without_parent_are_all_roots():
    # Multiple identical nodes without parent
    node1 = BaseNode(id=1)
    node2 = BaseNode(id=1)
    codeflash_output = get_root_nodes([node1, node2])  # 1.04μs -> 1.13μs (8.56% slower)


def test_duplicate_nodes_with_parent_are_not_roots():
    # Multiple identical nodes with parent
    node1 = BaseNode(id=1, relationships={NodeRelationship.PARENT: 2})
    node2 = BaseNode(id=1, relationships={NodeRelationship.PARENT: 2})
    codeflash_output = get_root_nodes([node1, node2])  # 960ns -> 1.19μs (19.5% slower)

# --- Large Scale Test Cases ---
def test_large_number_of_nodes_all_roots():
    # All nodes have no parent relationship
    nodes = [BaseNode(id=i) for i in range(1000)]
    codeflash_output = get_root_nodes(nodes); result = codeflash_output  # 81.4μs -> 31.4μs (159% faster)


def test_large_number_of_nodes_all_have_parent():
    # All nodes have parent relationship
    nodes = [BaseNode(id=i, relationships={NodeRelationship.PARENT: i-1}) for i in range(1000)]
    codeflash_output = get_root_nodes(nodes); result = codeflash_output  # 80.2μs -> 32.2μs (149% faster)


def test_large_mixed_nodes_some_roots_some_not():
    # Half nodes have parent, half don't
    nodes = []
    roots = []
    for i in range(1000):
        if i % 2 == 0:
            node = BaseNode(id=i)
            roots.append(node)
        else:
            node = BaseNode(id=i, relationships={NodeRelationship.PARENT: i-1})
        nodes.append(node)
    codeflash_output = get_root_nodes(nodes); result = codeflash_output  # 81.4μs -> 31.6μs (158% faster)


def test_large_nodes_with_various_relationships():
    # Nodes with random relationships, only those without parent are roots
    nodes = []
    roots = []
    for i in range(1000):
        if i % 3 == 0:
            node = BaseNode(id=i, relationships={NodeRelationship.CHILD: i+1})
            roots.append(node)
        elif i % 3 == 1:
            node = BaseNode(id=i, relationships={NodeRelationship.PARENT: i-1})
        else:
            node = BaseNode(id=i)
            roots.append(node)
        nodes.append(node)
    codeflash_output = get_root_nodes(nodes); result = codeflash_output  # 87.7μs -> 32.5μs (170% faster)


def test_large_nodes_with_empty_and_none_relationships():
    # Some nodes with relationships=None, some with {}, some with parent
    nodes = []
    roots = []
    for i in range(1000):
        if i % 4 == 0:
            node = BaseNode(id=i, relationships=None)
            roots.append(node)
        elif i % 4 == 1:
            node = BaseNode(id=i, relationships={})
            roots.append(node)
        elif i % 4 == 2:
            node = BaseNode(id=i, relationships={NodeRelationship.PARENT: i-1})
        else:
            node = BaseNode(id=i, relationships={NodeRelationship.CHILD: i+1})
            roots.append(node)
        nodes.append(node)
    codeflash_output = get_root_nodes(nodes); result = codeflash_output  # 84.8μs -> 32.2μs (164% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import List

# imports
import pytest
from llama_index.core.node_parser.relational.hierarchical import get_root_nodes


# --- Mocks for llama_index.core.schema.BaseNode and NodeRelationship ---
class NodeRelationship:
    PARENT = "parent"
    CHILD = "child"
    SIBLING = "sibling"
    # Add more relationships if needed


class BaseNode:
    def __init__(self, node_id, relationships=None):
        self.node_id = node_id
        # relationships: dict[str, str] or set[str]
        if relationships is None:
            self.relationships = {}
        elif isinstance(relationships, dict):
            self.relationships = relationships
        elif isinstance(relationships, set):
            # Convert set to dict for compatibility
            self.relationships = {rel: None for rel in relationships}
        else:
            raise ValueError("relationships must be dict or set")


# unit tests

# ----------- Basic Test Cases -----------
def test_single_node_no_relationships():
    # Test a single node with no relationships
    node = BaseNode("A")
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 1.13μs -> 1.65μs (31.8% slower)


def test_single_node_with_parent_relationship():
    # Test a single node with a parent relationship
    node = BaseNode("A", relationships={NodeRelationship.PARENT: "B"})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 903ns -> 1.18μs (23.2% slower)


def test_multiple_nodes_some_roots():
    # Test multiple nodes, some with parent, some without
    nodeA = BaseNode("A")  # root
    nodeB = BaseNode("B", relationships={NodeRelationship.PARENT: "A"})
    nodeC = BaseNode("C")  # root
    nodeD = BaseNode("D", relationships={NodeRelationship.PARENT: "C"})
    codeflash_output = get_root_nodes([nodeA, nodeB, nodeC, nodeD]); result = codeflash_output  # 1.37μs -> 1.35μs (1.11% faster)


def test_all_nodes_have_parent():
    # All nodes have parent relationships: no roots
    nodeA = BaseNode("A", relationships={NodeRelationship.PARENT: "B"})
    nodeB = BaseNode("B", relationships={NodeRelationship.PARENT: "C"})
    nodeC = BaseNode("C", relationships={NodeRelationship.PARENT: "A"})
    codeflash_output = get_root_nodes([nodeA, nodeB, nodeC]); result = codeflash_output  # 1.11μs -> 1.22μs (9.09% slower)


def test_nodes_with_other_relationships():
    # Nodes with relationships, but not parent
    nodeA = BaseNode("A", relationships={NodeRelationship.CHILD: "B"})
    nodeB = BaseNode("B", relationships={NodeRelationship.SIBLING: "A"})
    codeflash_output = get_root_nodes([nodeA, nodeB]); result = codeflash_output  # 973ns -> 1.20μs (18.9% slower)


def test_nodes_with_multiple_relationships():
    # Nodes with multiple relationships, including parent
    nodeA = BaseNode("A", relationships={NodeRelationship.PARENT: "B", NodeRelationship.CHILD: "C"})
    nodeB = BaseNode("B", relationships={NodeRelationship.CHILD: "A"})
    nodeC = BaseNode("C")
    codeflash_output = get_root_nodes([nodeA, nodeB, nodeC]); result = codeflash_output  # 1.15μs -> 1.28μs (9.70% slower)

# ----------- Edge Test Cases -----------
def test_empty_list():
    # Edge case: empty input
    codeflash_output = get_root_nodes([]); result = codeflash_output  # 400ns -> 882ns (54.6% slower)


def test_node_with_empty_relationships_dict():
    # Node with empty relationships dict
    node = BaseNode("A", relationships={})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 812ns -> 1.12μs (27.8% slower)


def test_node_with_relationships_set_type():
    # Node with relationships as a set (should be handled)
    node = BaseNode("A", relationships={NodeRelationship.CHILD})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 806ns -> 1.03μs (21.8% slower)


def test_node_with_none_relationships():
    # Node with relationships=None
    node = BaseNode("A", relationships=None)
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 840ns -> 1.10μs (23.4% slower)


def test_node_with_parent_relationship_value_none():
    # Node with parent relationship but value is None
    node = BaseNode("A", relationships={NodeRelationship.PARENT: None})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 768ns -> 1.06μs (27.8% slower)


def test_node_with_parent_relationship_other_value():
    # Node with parent relationship value as empty string
    node = BaseNode("A", relationships={NodeRelationship.PARENT: ""})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 800ns -> 1.04μs (22.9% slower)


def test_node_with_multiple_same_relationships():
    # Node with multiple relationships of the same type (shouldn't happen with dict, but test)
    node = BaseNode("A", relationships={NodeRelationship.PARENT: "B", NodeRelationship.PARENT: "C"})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 772ns -> 1.03μs (25.0% slower)


def test_node_with_unusual_relationship_keys():
    # Node with relationships having unexpected keys
    node = BaseNode("A", relationships={"random": "X"})
    codeflash_output = get_root_nodes([node]); result = codeflash_output  # 772ns -> 969ns (20.3% slower)


def test_duplicate_nodes_in_input():
    # Duplicate nodes in input
    nodeA = BaseNode("A")
    codeflash_output = get_root_nodes([nodeA, nodeA]); result = codeflash_output  # 1.01μs -> 1.21μs (16.4% slower)

# ----------- Large Scale Test Cases -----------
def test_large_linear_chain():
    # Create a linear chain: node0 is root, node1's parent is node0, ..., node999's parent is node998
    nodes = []
    for i in range(1000):
        if i == 0:
            nodes.append(BaseNode(f"N{i}"))
        else:
            nodes.append(BaseNode(f"N{i}", relationships={NodeRelationship.PARENT: f"N{i-1}"}))
    codeflash_output = get_root_nodes(nodes); roots = codeflash_output  # 83.5μs -> 31.8μs (163% faster)


def test_large_forest():
    # 10 trees, each with 100 nodes, roots at indices 0, 100, ..., 900
    nodes = []
    for t in range(10):
        root_idx = t * 100
        nodes.append(BaseNode(f"R{root_idx}"))  # root
        for i in range(1, 100):
            nodes.append(BaseNode(f"N{root_idx+i}", relationships={NodeRelationship.PARENT: f"N{root_idx+i-1}"}))
    codeflash_output = get_root_nodes(nodes); roots = codeflash_output  # 82.0μs -> 33.7μs (143% faster)
    expected_roots = [nodes[i*100] for i in range(10)]


def test_large_all_roots():
    # 1000 nodes, none have parent relationships
    nodes = [BaseNode(str(i)) for i in range(1000)]
    codeflash_output = get_root_nodes(nodes); roots = codeflash_output  # 84.9μs -> 32.0μs (165% faster)


def test_large_all_have_parents():
    # 1000 nodes, all have parent relationships
    nodes = [BaseNode(str(i), relationships={NodeRelationship.PARENT: str(i-1)}) for i in range(1000)]
    codeflash_output = get_root_nodes(nodes); roots = codeflash_output  # 81.5μs -> 30.4μs (168% faster)


def test_large_mixed_relationships():
    # 500 nodes with parent, 500 without
    nodes_with_parent = [BaseNode(f"P{i}", relationships={NodeRelationship.PARENT: f"X{i}"}) for i in range(500)]
    nodes_without_parent = [BaseNode(f"R{i}") for i in range(500)]
    nodes = nodes_with_parent + nodes_without_parent
    codeflash_output = get_root_nodes(nodes); roots = codeflash_output  # 81.9μs -> 32.9μs (149% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
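An output comparison and per-call timing of this kind could be reproduced by hand roughly as follows. This is not Codeflash's harness: the dict-based nodes, the PARENT key, and the helpers same_output and best_per_call are all hypothetical names introduced for this sketch.

```python
import timeit
from typing import Callable, Dict, List, Sequence

PARENT = "parent"  # illustrative key; stands in for NodeRelationship.PARENT

# Hypothetical node shape for this sketch: {"relationships": {...}}
Node = Dict[str, Dict[str, str]]
RootFinder = Callable[[List[Node]], List[Node]]


def roots_loop(nodes: List[Node]) -> List[Node]:
    # Loop-and-append form, standing in for the original implementation.
    result = []
    for node in nodes:
        if PARENT not in node["relationships"]:
            result.append(node)
    return result


def roots_comprehension(nodes: List[Node]) -> List[Node]:
    # Comprehension form, standing in for the optimized implementation.
    return [node for node in nodes if PARENT not in node["relationships"]]


def same_output(original: RootFinder, optimized: RootFinder,
                cases: Sequence[List[Node]]) -> bool:
    # True when both implementations return equal lists for every input case.
    return all(original(case) == optimized(case) for case in cases)


def best_per_call(func: RootFinder, nodes: List[Node],
                  repeat: int = 5, number: int = 200) -> float:
    # Best-of-N timing in seconds per call, using the minimum across repeats.
    timings = timeit.repeat(lambda: func(nodes), repeat=repeat, number=number)
    return min(timings) / number


small = [{"relationships": {}}]
# Odd-indexed nodes get a parent entry; even-indexed nodes are roots.
large = [{"relationships": {PARENT: str(i - 1)} if i % 2 else {}} for i in range(1000)]

equivalent = same_output(roots_loop, roots_comprehension, [[], small, large])
loop_seconds = best_per_call(roots_loop, large)
comprehension_seconds = best_per_call(roots_comprehension, large)
```

Taking the minimum over several repeats mirrors the "best of N runs" figure quoted in the report. Absolute numbers depend on the machine and Python version, so none are asserted here.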
To edit these changes, run git checkout codeflash/optimize-get_root_nodes-mhv9dwbz and push.