⚡️ Speed up method AgentInputComponent._validate_component_outputs by 13%
#139
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 13% (0.13x) speedup for
AgentInputComponent._validate_component_outputsinllama-index-core/llama_index/core/query_pipeline/components/agent.py⏱️ Runtime :
17.5 microseconds→15.5 microseconds(best of250runs)📝 Explanation and details
The optimization applies conditional lazy evaluation to the
get_parameters(fn)call in the__init__method.Key Change:
get_parameters(fn)upfront, then conditionally overwrites with provided valuesget_parameters(fn)when eitherreq_paramsoropt_paramsisNone, avoiding unnecessary computation when both parameters are explicitly providedWhy This is Faster:
The
get_parameters()function uses Python'sinspect.signature()to introspect the callable, which involves parsing the function signature and iterating through parameters. This is computationally expensive compared to simpleNonechecks. When users provide bothreq_paramsandopt_paramsexplicitly, the optimization completely skips this introspection overhead.Performance Impact:
The 13% speedup is achieved by eliminating redundant work. The annotated tests show consistent improvements ranging from 3.98% to 35.3% across different scenarios, with the largest gains occurring in simpler test cases where the optimization has the most relative impact. Tests with empty dictionaries and fewer parameters benefit most (up to 35.3% improvement), while complex scenarios still see meaningful gains (9-22%).
Real-world Benefit:
Since
AgentInputComponentis likely instantiated frequently in query pipeline workflows, this optimization reduces initialization overhead whenever developers provide explicit parameter sets, making component creation more efficient in production scenarios.✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
from typing import Any, Callable, Dict, Optional, Set, Tuple
imports
import pytest
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent
def get_parameters(fn: Callable) -> Tuple[Set[str], Set[str]]:
"""Get parameters from function.
Dummy QueryComponent and Field/PrivateAttr to allow standalone testing
class QueryComponent:
def init(self, **kwargs): pass
def Field(*args, **kwargs): return None
def PrivateAttr(): return None
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent
---------------------- UNIT TESTS ----------------------
1. Basic Test Cases
def test_returns_input_unchanged_simple():
"""Test that the function returns the input dict unchanged for a simple case."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": 1}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 327ns (3.98% faster)
def test_returns_input_unchanged_multiple_keys():
"""Test with multiple keys in the input dict."""
def fn(a, b, c=3): return a + b + c
comp = AgentInputComponent(fn)
input_dict = {"a": 10, "b": 20, "c": 30}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 351ns -> 321ns (9.35% faster)
def test_returns_input_unchanged_empty_dict():
"""Test with an empty input dict."""
def fn(): return 42
comp = AgentInputComponent(fn)
input_dict = {}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 360ns -> 266ns (35.3% faster)
def test_returns_input_unchanged_with_optional_params():
"""Test with optional parameters in the input dict."""
def fn(a, b=2): return a + b
comp = AgentInputComponent(fn)
input_dict = {"a": 7, "b": 9}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 305ns (11.5% faster)
def test_returns_input_unchanged_with_extra_keys():
"""Test with extra keys not in the function signature."""
def fn(x): return x
comp = AgentInputComponent(fn)
input_dict = {"x": 5, "y": 10, "z": 15}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 342ns -> 272ns (25.7% faster)
2. Edge Test Cases
def test_returns_input_unchanged_with_none_values():
"""Test with None values in the input dict."""
def fn(a, b=None): return a
comp = AgentInputComponent(fn)
input_dict = {"a": None, "b": None}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 360ns -> 323ns (11.5% faster)
def test_returns_input_unchanged_with_nested_dict():
"""Test with nested dictionaries as input values."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": {"nested": {"key": "value"}}}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 326ns (9.51% faster)
def test_returns_input_unchanged_with_list_values():
"""Test with lists as input values."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 320ns -> 299ns (7.02% faster)
def test_returns_input_unchanged_with_tuple_and_set():
"""Test with tuple and set as input values."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": (1, 2), "b": {3, 4}}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 366ns -> 308ns (18.8% faster)
def test_returns_input_unchanged_with_non_string_keys():
"""Test with non-string keys in the input dict."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {1: "one", 2: "two"}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 344ns -> 310ns (11.0% faster)
def test_returns_input_unchanged_with_large_integers_and_floats():
"""Test with very large integer and float values."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": 10**100, "b": 1.79e308}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 352ns -> 307ns (14.7% faster)
def test_returns_input_unchanged_with_special_values():
"""Test with special values like True, False, and None."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": True, "b": False, "c": None}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 295ns (21.0% faster)
def test_returns_input_unchanged_with_unicode_keys_and_values():
"""Test with unicode keys and values."""
def fn(α, β): return α
comp = AgentInputComponent(fn)
input_dict = {"α": "αλφα", "β": "βήτα"}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 349ns -> 302ns (15.6% faster)
def test_returns_input_unchanged_with_mutable_values():
"""Test with mutable objects as input values."""
def fn(a): return a
comp = AgentInputComponent(fn)
class Mutable:
def init(self): self.x = 1
def eq(self, other): return isinstance(other, Mutable) and self.x == other.x
m = Mutable()
input_dict = {"a": m}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 372ns -> 333ns (11.7% faster)
def test_returns_input_unchanged_with_callable_values():
"""Test with callables as values in the input dict."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": lambda x: x + 1}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 319ns (11.9% faster)
3. Large Scale Test Cases
def test_returns_input_unchanged_large_dict():
"""Test with a large input dictionary."""
def fn(**kwargs): return sum(kwargs.values())
comp = AgentInputComponent(fn)
input_dict = {f"key_{i}": i for i in range(1000)}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 333ns -> 305ns (9.18% faster)
def test_returns_input_unchanged_large_nested_dict():
"""Test with a large, deeply nested input dictionary."""
def fn(a): return a
comp = AgentInputComponent(fn)
nested = {"level1": {"level2": {"level3": {"level4": "deep"}}}}
input_dict = {"a": nested, "b": [nested for _ in range(100)]}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 375ns -> 331ns (13.3% faster)
def test_returns_input_unchanged_large_varied_types():
"""Test with a large dict containing varied types."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
input_dict = {}
for i in range(300):
input_dict[f"int_{i}"] = i
input_dict[f"float_{i}"] = float(i)
input_dict[f"str_{i}"] = str(i)
input_dict["list"] = list(range(100))
input_dict["dict"] = {str(i): i for i in range(100)}
input_dict["set"] = set(range(100))
input_dict["tuple"] = tuple(range(100))
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 332ns -> 271ns (22.5% faster)
def test_returns_input_unchanged_large_keys_and_values():
"""Test with very large string keys and values."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
big_key = "k" * 100
big_val = "v" * 1000
input_dict = {big_key + str(i): big_val * (i+1) for i in range(10)}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 299ns (13.7% faster)
def test_returns_input_unchanged_with_many_types():
"""Test with a dict containing all major Python built-in types."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
input_dict = {
"int": 1,
"float": 1.23,
"str": "string",
"bytes": b"bytes",
"list": [1, 2, 3],
"tuple": (4, 5),
"set": {6, 7},
"frozenset": frozenset([8, 9]),
"dict": {"a": 10},
"bool": True,
"none": None,
"complex": 1+2j,
"callable": lambda x: x,
"range": range(10),
}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 349ns -> 311ns (12.2% faster)
--------------- Mutation Testing: Negative Test ---------------
def test_mutation_would_fail():
"""This test would fail if the function mutates or filters the input dict."""
def fn(a, b): return a + b
comp = AgentInputComponent(fn)
input_dict = {"a": 1, "b": 2, "c": 3}
# Simulate a mutated version of the function that removes key 'c'
# This test ensures the function must return the input dict unchanged
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 358ns -> 303ns (18.2% faster)
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from inspect import signature
from typing import Any, Callable, Dict, Optional, Set
imports
import pytest # used for our unit tests
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent
def get_parameters(fn: Callable) -> tuple[Set[str], Set[str]]:
"""Get parameters from function.
class QueryComponent:
# Minimal stub for inheritance
def init(self, fn: Callable, async_fn: Optional[Callable] = None, **kwargs: Any):
self.fn = fn
self.async_fn = async_fn
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent
unit tests
--- Basic Test Cases ---
def test_basic_empty_dict():
# Test with an empty dictionary (should return the same empty dict)
comp = AgentInputComponent(lambda: None)
inp = {}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 354ns -> 297ns (19.2% faster)
def test_basic_single_key():
# Test with a single key-value pair
comp = AgentInputComponent(lambda x: x)
inp = {'foo': 123}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 299ns (16.4% faster)
def test_basic_multiple_keys():
# Test with multiple key-value pairs
comp = AgentInputComponent(lambda x, y: x + y)
inp = {'x': 1, 'y': 2, 'z': 3}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 343ns -> 311ns (10.3% faster)
def test_basic_various_types():
# Test with values of various types
comp = AgentInputComponent(lambda a, b: a)
inp = {'a': [1,2,3], 'b': {'nested': True}, 'c': (4,5), 'd': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 287ns (21.3% faster)
--- Edge Test Cases ---
def test_edge_special_keys():
# Test with keys that are unusual (empty string, special characters, unicode)
comp = AgentInputComponent(lambda: None)
inp = {'': 'empty', '@!#': 42, '你好': 'hello', '\n': 'newline'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 344ns -> 308ns (11.7% faster)
def test_edge_mutable_values():
# Test with mutable objects as values (lists, dicts, sets)
comp = AgentInputComponent(lambda: None)
inp = {'list': [1,2], 'dict': {'a':1}, 'set': {1,2}}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 357ns -> 313ns (14.1% faster)
# Mutating the input after output should reflect in output (since same object)
inp['list'].append(3)
def test_edge_nested_dict():
# Test with nested dictionaries
comp = AgentInputComponent(lambda: None)
inp = {'outer': {'inner': {'key': 'value'}}}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 374ns -> 312ns (19.9% faster)
def test_edge_large_keys():
# Test with very long string keys
comp = AgentInputComponent(lambda: None)
long_key = 'x'*500
inp = {long_key: 'value'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 356ns -> 293ns (21.5% faster)
def test_edge_none_value():
# Test with None as a value
comp = AgentInputComponent(lambda: None)
inp = {'key': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 341ns -> 309ns (10.4% faster)
def test_edge_key_collision():
# Test with keys that could collide in other systems (but not in Python dict)
comp = AgentInputComponent(lambda: None)
inp = {'True': 'string_true', True: 'bool_true'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 329ns -> 316ns (4.11% faster)
def test_edge_key_types():
# Test with non-string keys (int, float, tuple)
comp = AgentInputComponent(lambda: None)
inp = {1: 'int', 2.5: 'float', (1,2): 'tuple'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 333ns -> 290ns (14.8% faster)
def test_edge_empty_string_value():
# Test with empty string as value
comp = AgentInputComponent(lambda: None)
inp = {'key': ''}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 276ns (26.1% faster)
def test_edge_only_none_dict():
# Test with all values None
comp = AgentInputComponent(lambda: None)
inp = {'a': None, 'b': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 337ns -> 313ns (7.67% faster)
--- Large Scale Test Cases ---
def test_large_scale_1000_keys():
# Test with a dictionary with 1000 keys
comp = AgentInputComponent(lambda **kwargs: kwargs)
inp = {f'key{i}': i for i in range(1000)}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 327ns -> 295ns (10.8% faster)
def test_large_scale_large_values():
# Test with a dictionary with a single key and a large value (list of 1000 elements)
comp = AgentInputComponent(lambda x: x)
inp = {'biglist': list(range(1000))}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 360ns -> 321ns (12.1% faster)
def test_large_scale_nested_large_dict():
# Test with a dictionary with a key whose value is a large nested dictionary
comp = AgentInputComponent(lambda x: x)
nested = {f'inner{i}': i for i in range(500)}
inp = {'outer': nested}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 334ns -> 299ns (11.7% faster)
def test_large_scale_deeply_nested_dict():
# Test with a deeply nested dictionary (depth 10)
comp = AgentInputComponent(lambda x: x)
d = {'val': 0}
for i in range(10):
d = {'nest': d}
inp = {'root': d}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 373ns -> 340ns (9.71% faster)
# Traverse down to the innermost value
current = out['root']
for i in range(10):
current = current['nest']
def test_large_scale_many_types():
# Test with a large dictionary with many types of values
comp = AgentInputComponent(lambda **kwargs: kwargs)
inp = {f'key{i}': i if i%5==0 else [i] if i%5==1 else {i: str(i)} if i%5==2 else str(i) if i%5==3 else None for i in range(500)}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 384ns -> 338ns (13.6% faster)
--- Determinism and Object Identity ---
def test_object_identity():
# Ensure the returned object is the same as the input
comp = AgentInputComponent(lambda: None)
inp = {'x': 1}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 346ns -> 305ns (13.4% faster)
def test_determinism():
# Ensure the function is deterministic for repeated calls
comp = AgentInputComponent(lambda: None)
inp = {'y': 2}
codeflash_output = comp._validate_component_outputs(inp); out1 = codeflash_output # 350ns -> 304ns (15.1% faster)
codeflash_output = comp._validate_component_outputs(inp); out2 = codeflash_output # 158ns -> 172ns (8.14% slower)
--- Defensive: Mutability ---
def test_mutability_after_return():
# Ensure mutating the input after return affects the output
comp = AgentInputComponent(lambda: None)
inp = {'a': [1]}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 312ns -> 311ns (0.322% faster)
inp['a'].append(2)
--- Defensive: No Side Effects ---
def test_no_side_effects_on_input():
# Ensure the function does not modify the input dict
comp = AgentInputComponent(lambda: None)
inp = {'x': 1, 'y': 2}
inp_copy = dict(inp)
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 336ns -> 301ns (11.6% faster)
--- Defensive: Accepts Any Dict ---
@pytest.mark.parametrize("inp", [
{}, # empty dict
{'a': 1}, # single key
{'a': None}, # None value
{'a': [1,2,3]}, # list value
{'a': {'b': 2}}, # nested dict
{1: 'int key', (1,2): 'tuple key'}, # non-string keys
{'': 'empty key', ' ': 'space key'}, # special string keys
])
def test_accepts_various_dicts(inp):
comp = AgentInputComponent(lambda: None)
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 2.38μs -> 2.14μs (11.3% faster)
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes
git checkout codeflash/optimize-AgentInputComponent._validate_component_outputs-mhvhhcktand push.