⚡️ Speed up function find_last_node by 21,010% #40

Closed

@codeflash-ai codeflash-ai bot commented Jun 27, 2025

📄 21,010% (210.10x) speedup for find_last_node in src/dsa/nodes.py

⏱️ Runtime : 31.5 milliseconds → 149 microseconds (best of 664 runs)

📝 Explanation and details

This change makes `find_last_node` faster without changing its behavior.
The main optimization is to build the set of edge sources once with a set comprehension, so that each node is tested with an O(1) average-case set lookup.
The nodes are then iterated with a plain for-loop instead of a generator expression passed to `next`, which avoids the generator and `next()` overhead for each node tested.
If the same edges are queried repeatedly, the edge-source set could also be built once outside the function, but that changes the function signature, so it is kept as is.
The logic is the same: return the first node in `nodes` that is not the source of any edge, or None if there is no such node.

On its own, the for-loop change is only marginally faster.
The large test cases show much bigger gains (the 1000-node linear chain drops from 24.5ms to 60.0μs), which is consistent with the set lookup replacing repeated scans over the edges.

The set is what provides the O(1) lookups, so it is the essential part of the change.
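The optimized function body is not shown in this description. A minimal sketch that matches the explanation above might look like the following. It is reconstructed from the description and the tests, so treat the exact variable names (such as `edge_sources`) as assumptions rather than the code in the PR.

```python
def find_last_node(nodes, edges):
    """Return the first node that is not the source of any edge, or None."""
    # Build the set of source ids once: O(len(edges)).
    edge_sources = {edge["source"] for edge in edges}

    # Plain for-loop: each membership test is O(1) on average.
    for node in nodes:
        if node["id"] not in edge_sources:
            return node

    # Every node is a source (for example, a cycle), or nodes is empty.
    return None


if __name__ == "__main__":
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    print(find_last_node(nodes, edges))
```

One design consequence of the set: every `source` value must be hashable. The string, integer, and tuple ids used in the tests all are, but a list-valued id would raise a `TypeError` when the set is built.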

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 20 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from src.dsa.nodes import find_last_node

# unit tests

# ------------------------------
# Basic Test Cases
# ------------------------------

def test_single_node_no_edges():
    # Single node, no edges: node is the last node
    nodes = [{"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges) # 1.62μs -> 542ns (200% faster)

def test_two_nodes_one_edge():
    # Two nodes, one edge: last node is the one not a source
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges) # 2.46μs -> 666ns (269% faster)

def test_three_nodes_linear_chain():
    # Three nodes in a linear chain: last node is at the end
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.12μs -> 708ns (341% faster)

def test_three_nodes_branching():
    # Branching: two nodes point to one last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "C"},
        {"source": "B", "target": "C"},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.08μs -> 750ns (311% faster)

def test_multiple_nodes_multiple_leaves():
    # Multiple nodes, several last nodes (should return the first one found)
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "A", "target": "C"},
    ]
    # "B", "C", and "D" are all non-sources; function returns the first one in nodes order ("B")
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.46μs -> 708ns (247% faster)

# ------------------------------
# Edge Test Cases
# ------------------------------

def test_empty_nodes_and_edges():
    # Both nodes and edges empty: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges) # 708ns -> 375ns (88.8% faster)

def test_nodes_but_no_edges():
    # Multiple nodes, no edges: all nodes are "last", returns first node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges) # 1.62μs -> 500ns (225% faster)

def test_edges_with_nonexistent_nodes():
    # Edges refer to sources/targets not in nodes: should still return first node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [
        {"source": "X", "target": "A"},
        {"source": "Y", "target": "B"},
    ]
    # Both nodes are not sources, so first node returned
    codeflash_output = find_last_node(nodes, edges) # 1.96μs -> 584ns (235% faster)

def test_cycle_graph():
    # Cycle: all nodes are sources, so no last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "A"},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.17μs -> 750ns (322% faster)

def test_duplicate_edges():
    # Duplicate edges: should not affect result
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "A", "target": "B"},
    ]
    codeflash_output = find_last_node(nodes, edges) # 2.54μs -> 709ns (258% faster)

def test_node_with_self_loop():
    # Node with self-loop: node is a source, so not last node
    nodes = [{"id": "A"}]
    edges = [{"source": "A", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges) # 1.71μs -> 583ns (193% faster)

def test_disconnected_nodes():
    # Some nodes not connected by any edge: should return first such node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
    ]
    # "B" and "C" are not sources, so first in nodes order is "B"
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 2.42μs -> 708ns (241% faster)

def test_multiple_edges_to_same_target():
    # Multiple sources to same target: only target is leaf
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "C"},
        {"source": "B", "target": "C"},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.12μs -> 708ns (341% faster)

def test_node_id_is_integer():
    # Node ids are integers instead of strings
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.04μs -> 792ns (284% faster)

def test_node_id_is_tuple():
    # Node ids are tuples
    nodes = [{"id": (1, "A")}, {"id": (2, "B")}, {"id": (3, "C")}]
    edges = [
        {"source": (1, "A"), "target": (2, "B")},
        {"source": (2, "B"), "target": (3, "C")},
    ]
    codeflash_output = find_last_node(nodes, edges) # 3.21μs -> 833ns (285% faster)

# ------------------------------
# Large Scale Test Cases
# ------------------------------

def test_large_linear_chain():
    # Large linear chain: 1000 nodes, last node is at the end
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges) # 24.5ms -> 60.0μs (40786% faster)

def test_large_star_graph():
    # Large star: one central node points to 999 leaves
    N = 1000
    nodes = [{"id": 0}] + [{"id": i} for i in range(1, N)]
    edges = [{"source": 0, "target": i} for i in range(1, N)]
    # All leaves are last nodes; function returns first one in nodes order, which is id==1
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 50.7μs -> 23.0μs (121% faster)

def test_large_disconnected_graph():
    # 500 disconnected pairs: each pair is two nodes, one edge
    N = 500
    nodes = [{"id": f"A{i}"} for i in range(N)] + [{"id": f"B{i}"} for i in range(N)]
    edges = [{"source": f"A{i}", "target": f"B{i}"} for i in range(N)]
    # All "B" nodes are leaves; function returns first one in nodes order, which is "B0"
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 6.54ms -> 48.3μs (13438% faster)

def test_large_all_sources():
    # All nodes are sources (cycle): should return None
    N = 100
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges) # 286μs -> 6.50μs (4312% faster)

def test_large_sparse_graph():
    # 1000 nodes, only 10 edges: most nodes are last nodes
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(10)]
    # Sources are nodes 0..9, so nodes 10..999 are not sources
    # So function returns first not in edge_sources, which is node 10
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output # 9.38μs -> 1.33μs (603% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
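
The equivalence check described in the comment above can be sketched as follows. This is an illustration only: `find_last_node_reference` is a hypothetical baseline that scans the edges for every node (the original implementation is not shown on this page), and `check_equivalent` is a hypothetical helper, not part of Codeflash.

```python
def find_last_node_reference(nodes, edges):
    # Hypothetical baseline: for each node, scan every edge. O(nodes * edges).
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Set-based version, as described in the PR explanation.
    sources = {e["source"] for e in edges}
    for n in nodes:
        if n["id"] not in sources:
            return n
    return None


def check_equivalent(nodes, edges):
    """Run both versions on the same input and require identical output."""
    expected = find_last_node_reference(nodes, edges)
    actual = find_last_node_optimized(nodes, edges)
    assert actual == expected, f"mismatch: {actual!r} != {expected!r}"
    return actual


if __name__ == "__main__":
    # Inputs taken from three of the large-scale tests above.
    N = 1000
    chain = ([{"id": i} for i in range(N)],
             [{"source": i, "target": i + 1} for i in range(N - 1)])
    cycle = ([{"id": i} for i in range(100)],
             [{"source": i, "target": (i + 1) % 100} for i in range(100)])
    sparse = ([{"id": i} for i in range(N)],
              [{"source": i, "target": i + 1} for i in range(10)])
    for name, (nodes, edges) in [("chain", chain), ("cycle", cycle), ("sparse", sparse)]:
        print(name, check_equivalent(nodes, edges))
```

Comparing outputs against a baseline only shows that behavior is preserved. Explicit expected values (for instance, node 999 for the linear chain) would additionally pin down what the correct behavior is.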

To edit these changes, `git checkout codeflash/optimize-find_last_node-mce35kmx` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 27, 2025
@codeflash-ai codeflash-ai bot requested a review from KRRT7 June 27, 2025 00:40
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-find_last_node-mce35kmx branch June 27, 2025 01:58