Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 13% (0.13x) speedup for AgentInputComponent._validate_component_outputs in llama-index-core/llama_index/core/query_pipeline/components/agent.py

⏱️ Runtime : 17.5 microseconds 15.5 microseconds (best of 250 runs)

📝 Explanation and details

The optimization applies conditional lazy evaluation to the get_parameters(fn) call in the __init__ method.

Key Change:

  • Original: Always calls get_parameters(fn) upfront, then conditionally overwrites with provided values
  • Optimized: Only calls get_parameters(fn) when either req_params or opt_params is None, avoiding unnecessary computation when both parameters are explicitly provided

Why This is Faster:
The get_parameters() function uses Python's inspect.signature() to introspect the callable, which involves parsing the function signature and iterating through parameters. This is computationally expensive compared to simple None checks. When users provide both req_params and opt_params explicitly, the optimization completely skips this introspection overhead.

Performance Impact:
The 13% speedup is achieved by eliminating redundant work. The annotated tests show consistent improvements ranging from 3.98% to 35.3% across different scenarios, with the largest gains occurring in simpler test cases where the optimization has the most relative impact. Tests with empty dictionaries and fewer parameters benefit most (up to 35.3% improvement), while complex scenarios still see meaningful gains (9-22%).

Real-world Benefit:
Since AgentInputComponent is likely instantiated frequently in query pipeline workflows, this optimization reduces initialization overhead whenever developers provide explicit parameter sets, making component creation more efficient in production scenarios.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 101 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime

from typing import Any, Callable, Dict, Optional, Set, Tuple

imports

import pytest
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent

def get_parameters(fn: Callable) -> Tuple[Set[str], Set[str]]:
"""Get parameters from function.

Returns:
    Tuple[Set[str], Set[str]]: required and optional parameters

"""
from inspect import signature
params = signature(fn).parameters
required_params = set()
optional_params = set()
for param_name in params:
    param_default = params[param_name].default
    if param_default is params[param_name].empty:
        required_params.add(param_name)
    else:
        optional_params.add(param_name)
return required_params, optional_params

Dummy QueryComponent and Field/PrivateAttr to allow standalone testing

class QueryComponent:
def init(self, **kwargs): pass

def Field(*args, **kwargs): return None
def PrivateAttr(): return None
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent

---------------------- UNIT TESTS ----------------------

1. Basic Test Cases

def test_returns_input_unchanged_simple():
"""Test that the function returns the input dict unchanged for a simple case."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": 1}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 327ns (3.98% faster)

def test_returns_input_unchanged_multiple_keys():
"""Test with multiple keys in the input dict."""
def fn(a, b, c=3): return a + b + c
comp = AgentInputComponent(fn)
input_dict = {"a": 10, "b": 20, "c": 30}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 351ns -> 321ns (9.35% faster)

def test_returns_input_unchanged_empty_dict():
"""Test with an empty input dict."""
def fn(): return 42
comp = AgentInputComponent(fn)
input_dict = {}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 360ns -> 266ns (35.3% faster)

def test_returns_input_unchanged_with_optional_params():
"""Test with optional parameters in the input dict."""
def fn(a, b=2): return a + b
comp = AgentInputComponent(fn)
input_dict = {"a": 7, "b": 9}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 305ns (11.5% faster)

def test_returns_input_unchanged_with_extra_keys():
"""Test with extra keys not in the function signature."""
def fn(x): return x
comp = AgentInputComponent(fn)
input_dict = {"x": 5, "y": 10, "z": 15}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 342ns -> 272ns (25.7% faster)

2. Edge Test Cases

def test_returns_input_unchanged_with_none_values():
"""Test with None values in the input dict."""
def fn(a, b=None): return a
comp = AgentInputComponent(fn)
input_dict = {"a": None, "b": None}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 360ns -> 323ns (11.5% faster)

def test_returns_input_unchanged_with_nested_dict():
"""Test with nested dictionaries as input values."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": {"nested": {"key": "value"}}}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 326ns (9.51% faster)

def test_returns_input_unchanged_with_list_values():
"""Test with lists as input values."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 320ns -> 299ns (7.02% faster)

def test_returns_input_unchanged_with_tuple_and_set():
"""Test with tuple and set as input values."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": (1, 2), "b": {3, 4}}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 366ns -> 308ns (18.8% faster)

def test_returns_input_unchanged_with_non_string_keys():
"""Test with non-string keys in the input dict."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {1: "one", 2: "two"}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 344ns -> 310ns (11.0% faster)

def test_returns_input_unchanged_with_large_integers_and_floats():
"""Test with very large integer and float values."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": 10**100, "b": 1.79e308}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 352ns -> 307ns (14.7% faster)

def test_returns_input_unchanged_with_special_values():
"""Test with special values like True, False, and None."""
def fn(a, b): return a
comp = AgentInputComponent(fn)
input_dict = {"a": True, "b": False, "c": None}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 295ns (21.0% faster)

def test_returns_input_unchanged_with_unicode_keys_and_values():
"""Test with unicode keys and values."""
def fn(α, β): return α
comp = AgentInputComponent(fn)
input_dict = {"α": "αλφα", "β": "βήτα"}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 349ns -> 302ns (15.6% faster)

def test_returns_input_unchanged_with_mutable_values():
"""Test with mutable objects as input values."""
def fn(a): return a
comp = AgentInputComponent(fn)
class Mutable:
def init(self): self.x = 1
def eq(self, other): return isinstance(other, Mutable) and self.x == other.x
m = Mutable()
input_dict = {"a": m}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 372ns -> 333ns (11.7% faster)

def test_returns_input_unchanged_with_callable_values():
"""Test with callables as values in the input dict."""
def fn(a): return a
comp = AgentInputComponent(fn)
input_dict = {"a": lambda x: x + 1}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 357ns -> 319ns (11.9% faster)

3. Large Scale Test Cases

def test_returns_input_unchanged_large_dict():
"""Test with a large input dictionary."""
def fn(**kwargs): return sum(kwargs.values())
comp = AgentInputComponent(fn)
input_dict = {f"key_{i}": i for i in range(1000)}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 333ns -> 305ns (9.18% faster)

def test_returns_input_unchanged_large_nested_dict():
"""Test with a large, deeply nested input dictionary."""
def fn(a): return a
comp = AgentInputComponent(fn)
nested = {"level1": {"level2": {"level3": {"level4": "deep"}}}}
input_dict = {"a": nested, "b": [nested for _ in range(100)]}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 375ns -> 331ns (13.3% faster)

def test_returns_input_unchanged_large_varied_types():
"""Test with a large dict containing varied types."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
input_dict = {}
for i in range(300):
input_dict[f"int_{i}"] = i
input_dict[f"float_{i}"] = float(i)
input_dict[f"str_{i}"] = str(i)
input_dict["list"] = list(range(100))
input_dict["dict"] = {str(i): i for i in range(100)}
input_dict["set"] = set(range(100))
input_dict["tuple"] = tuple(range(100))
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 332ns -> 271ns (22.5% faster)

def test_returns_input_unchanged_large_keys_and_values():
"""Test with very large string keys and values."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
big_key = "k" * 100
big_val = "v" * 1000
input_dict = {big_key + str(i): big_val * (i+1) for i in range(10)}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 340ns -> 299ns (13.7% faster)

def test_returns_input_unchanged_with_many_types():
"""Test with a dict containing all major Python built-in types."""
def fn(**kwargs): return kwargs
comp = AgentInputComponent(fn)
input_dict = {
"int": 1,
"float": 1.23,
"str": "string",
"bytes": b"bytes",
"list": [1, 2, 3],
"tuple": (4, 5),
"set": {6, 7},
"frozenset": frozenset([8, 9]),
"dict": {"a": 10},
"bool": True,
"none": None,
"complex": 1+2j,
"callable": lambda x: x,
"range": range(10),
}
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 349ns -> 311ns (12.2% faster)

--------------- Mutation Testing: Negative Test ---------------

def test_mutation_would_fail():
"""This test would fail if the function mutates or filters the input dict."""
def fn(a, b): return a + b
comp = AgentInputComponent(fn)
input_dict = {"a": 1, "b": 2, "c": 3}
# Simulate a mutated version of the function that removes key 'c'
# This test ensures the function must return the input dict unchanged
codeflash_output = comp._validate_component_outputs(input_dict); output = codeflash_output # 358ns -> 303ns (18.2% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

#------------------------------------------------
from inspect import signature
from typing import Any, Callable, Dict, Optional, Set

imports

import pytest # used for our unit tests
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent

def get_parameters(fn: Callable) -> tuple[Set[str], Set[str]]:
"""Get parameters from function.

Returns:
    Tuple[Set[str], Set[str]]: required and optional parameters

"""
params = signature(fn).parameters
required_params = set()
optional_params = set()
for param_name in params:
    param_default = params[param_name].default
    if param_default is params[param_name].empty:
        required_params.add(param_name)
    else:
        optional_params.add(param_name)
return required_params, optional_params

class QueryComponent:
# Minimal stub for inheritance
def init(self, fn: Callable, async_fn: Optional[Callable] = None, **kwargs: Any):
self.fn = fn
self.async_fn = async_fn
from llama_index.core.query_pipeline.components.agent import
AgentInputComponent

unit tests

--- Basic Test Cases ---

def test_basic_empty_dict():
# Test with an empty dictionary (should return the same empty dict)
comp = AgentInputComponent(lambda: None)
inp = {}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 354ns -> 297ns (19.2% faster)

def test_basic_single_key():
# Test with a single key-value pair
comp = AgentInputComponent(lambda x: x)
inp = {'foo': 123}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 299ns (16.4% faster)

def test_basic_multiple_keys():
# Test with multiple key-value pairs
comp = AgentInputComponent(lambda x, y: x + y)
inp = {'x': 1, 'y': 2, 'z': 3}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 343ns -> 311ns (10.3% faster)

def test_basic_various_types():
# Test with values of various types
comp = AgentInputComponent(lambda a, b: a)
inp = {'a': [1,2,3], 'b': {'nested': True}, 'c': (4,5), 'd': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 287ns (21.3% faster)

--- Edge Test Cases ---

def test_edge_special_keys():
# Test with keys that are unusual (empty string, special characters, unicode)
comp = AgentInputComponent(lambda: None)
inp = {'': 'empty', '@!#': 42, '你好': 'hello', '\n': 'newline'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 344ns -> 308ns (11.7% faster)

def test_edge_mutable_values():
# Test with mutable objects as values (lists, dicts, sets)
comp = AgentInputComponent(lambda: None)
inp = {'list': [1,2], 'dict': {'a':1}, 'set': {1,2}}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 357ns -> 313ns (14.1% faster)
# Mutating the input after output should reflect in output (since same object)
inp['list'].append(3)

def test_edge_nested_dict():
# Test with nested dictionaries
comp = AgentInputComponent(lambda: None)
inp = {'outer': {'inner': {'key': 'value'}}}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 374ns -> 312ns (19.9% faster)

def test_edge_large_keys():
# Test with very long string keys
comp = AgentInputComponent(lambda: None)
long_key = 'x'*500
inp = {long_key: 'value'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 356ns -> 293ns (21.5% faster)

def test_edge_none_value():
# Test with None as a value
comp = AgentInputComponent(lambda: None)
inp = {'key': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 341ns -> 309ns (10.4% faster)

def test_edge_key_collision():
# Test with keys that could collide in other systems (but not in Python dict)
comp = AgentInputComponent(lambda: None)
inp = {'True': 'string_true', True: 'bool_true'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 329ns -> 316ns (4.11% faster)

def test_edge_key_types():
# Test with non-string keys (int, float, tuple)
comp = AgentInputComponent(lambda: None)
inp = {1: 'int', 2.5: 'float', (1,2): 'tuple'}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 333ns -> 290ns (14.8% faster)

def test_edge_empty_string_value():
# Test with empty string as value
comp = AgentInputComponent(lambda: None)
inp = {'key': ''}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 348ns -> 276ns (26.1% faster)

def test_edge_only_none_dict():
# Test with all values None
comp = AgentInputComponent(lambda: None)
inp = {'a': None, 'b': None}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 337ns -> 313ns (7.67% faster)

--- Large Scale Test Cases ---

def test_large_scale_1000_keys():
# Test with a dictionary with 1000 keys
comp = AgentInputComponent(lambda **kwargs: kwargs)
inp = {f'key{i}': i for i in range(1000)}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 327ns -> 295ns (10.8% faster)

def test_large_scale_large_values():
# Test with a dictionary with a single key and a large value (list of 1000 elements)
comp = AgentInputComponent(lambda x: x)
inp = {'biglist': list(range(1000))}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 360ns -> 321ns (12.1% faster)

def test_large_scale_nested_large_dict():
# Test with a dictionary with a key whose value is a large nested dictionary
comp = AgentInputComponent(lambda x: x)
nested = {f'inner{i}': i for i in range(500)}
inp = {'outer': nested}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 334ns -> 299ns (11.7% faster)

def test_large_scale_deeply_nested_dict():
# Test with a deeply nested dictionary (depth 10)
comp = AgentInputComponent(lambda x: x)
d = {'val': 0}
for i in range(10):
d = {'nest': d}
inp = {'root': d}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 373ns -> 340ns (9.71% faster)
# Traverse down to the innermost value
current = out['root']
for i in range(10):
current = current['nest']

def test_large_scale_many_types():
# Test with a large dictionary with many types of values
comp = AgentInputComponent(lambda **kwargs: kwargs)
inp = {f'key{i}': i if i%5==0 else [i] if i%5==1 else {i: str(i)} if i%5==2 else str(i) if i%5==3 else None for i in range(500)}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 384ns -> 338ns (13.6% faster)

--- Determinism and Object Identity ---

def test_object_identity():
# Ensure the returned object is the same as the input
comp = AgentInputComponent(lambda: None)
inp = {'x': 1}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 346ns -> 305ns (13.4% faster)

def test_determinism():
# Ensure the function is deterministic for repeated calls
comp = AgentInputComponent(lambda: None)
inp = {'y': 2}
codeflash_output = comp._validate_component_outputs(inp); out1 = codeflash_output # 350ns -> 304ns (15.1% faster)
codeflash_output = comp._validate_component_outputs(inp); out2 = codeflash_output # 158ns -> 172ns (8.14% slower)

--- Defensive: Mutability ---

def test_mutability_after_return():
# Ensure mutating the input after return affects the output
comp = AgentInputComponent(lambda: None)
inp = {'a': [1]}
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 312ns -> 311ns (0.322% faster)
inp['a'].append(2)

--- Defensive: No Side Effects ---

def test_no_side_effects_on_input():
# Ensure the function does not modify the input dict
comp = AgentInputComponent(lambda: None)
inp = {'x': 1, 'y': 2}
inp_copy = dict(inp)
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 336ns -> 301ns (11.6% faster)

--- Defensive: Accepts Any Dict ---

@pytest.mark.parametrize("inp", [
{}, # empty dict
{'a': 1}, # single key
{'a': None}, # None value
{'a': [1,2,3]}, # list value
{'a': {'b': 2}}, # nested dict
{1: 'int key', (1,2): 'tuple key'}, # non-string keys
{'': 'empty key', ' ': 'space key'}, # special string keys
])
def test_accepts_various_dicts(inp):
comp = AgentInputComponent(lambda: None)
codeflash_output = comp._validate_component_outputs(inp); out = codeflash_output # 2.38μs -> 2.14μs (11.3% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-AgentInputComponent._validate_component_outputs-mhvhhckt and push.

Codeflash Static Badge

The optimization applies **conditional lazy evaluation** to the `get_parameters(fn)` call in the `__init__` method. 

**Key Change:**
- **Original**: Always calls `get_parameters(fn)` upfront, then conditionally overwrites with provided values
- **Optimized**: Only calls `get_parameters(fn)` when either `req_params` or `opt_params` is `None`, avoiding unnecessary computation when both parameters are explicitly provided

**Why This is Faster:**
The `get_parameters()` function uses Python's `inspect.signature()` to introspect the callable, which involves parsing the function signature and iterating through parameters. This is computationally expensive compared to simple `None` checks. When users provide both `req_params` and `opt_params` explicitly, the optimization completely skips this introspection overhead.

**Performance Impact:**
The 13% speedup is achieved by eliminating redundant work. The annotated tests show consistent improvements ranging from 3.98% to 35.3% across different scenarios, with the largest gains occurring in simpler test cases where the optimization has the most relative impact. Tests with empty dictionaries and fewer parameters benefit most (up to 35.3% improvement), while complex scenarios still see meaningful gains (9-22%).

**Real-world Benefit:**
Since `AgentInputComponent` is likely instantiated frequently in query pipeline workflows, this optimization reduces initialization overhead whenever developers provide explicit parameter sets, making component creation more efficient in production scenarios.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 04:12
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant