@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 32% (0.32x) speedup for create_format_table in src/bokeh/models/formatters.py

⏱️ Runtime : 4.00 milliseconds → 3.04 milliseconds (best of 143 runs)
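The headline percentage is consistent with the two runtimes if speedup is taken as old runtime divided by new runtime, minus one. That formula is an assumption inferred from the reported numbers, not something stated in this comment, and `speedup` is a hypothetical helper name:

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Relative speedup, assuming the definition original / optimized - 1."""
    return original_ms / optimized_ms - 1.0


# Runtimes reported above, in milliseconds.
overall = speedup(4.00, 3.04)
print(f"{overall:.1%}")
```

The same formula roughly matches the per-test annotations further down, for example 622μs -> 473μs for the 500-field test.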

📝 Explanation and details

The optimized code achieves a 32% speedup (4.00 ms to 3.04 ms) by eliminating redundant attribute lookups and streamlining string operations. Here's what changed:

Key Optimizations:

  1. Precomputed Field Values: The original code called getattr(primary, f), getattr(secondary, f), and getattr(tertiary, f) multiple times - once during length calculation (lens computation) and again during row construction. The optimization caches these values in a field_values list, reducing ~12,000+ attribute lookups to just ~3,000 in the 1000-field test case.

  2. Eliminated extended_join Function: Replaced the nested helper function with direct string joining ('|'.join(...) and '+'.join(...)), removing function call overhead and simplifying the string construction logic.

  3. Streamlined Row Building: Instead of calling add_row_item function for each cell (which internally calls getattr and formats), the optimization directly formats pre-cached values, eliminating both function calls and redundant attribute access.
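The three changes listed above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `build_table_cached` is a hypothetical name, `SimpleNamespace` objects stand in for formatters, and the table layout is simplified.

```python
from types import SimpleNamespace


def build_table_cached(fields, primary):
    """Hypothetical sketch: build a 4-column table with one getattr per cell."""
    secondary = primary.context
    tertiary = secondary.context
    headers = ("Scale", "Format", "1st Context", "2nd Context")

    # Read every attribute exactly once and keep the results.
    field_values = [
        (f, getattr(primary, f), getattr(secondary, f), getattr(tertiary, f))
        for f in fields
    ]

    # Column widths are derived from the cached values, not from new lookups.
    widths = [len(h) for h in headers]
    for row in field_values:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Plain str.join calls build the separator and each row.
    separator = "+" + "+".join("-" * w for w in widths) + "+"
    lines = [separator]
    for cells in (headers, *field_values):
        lines.append("|" + "|".join(f"{c:<{w}}" for c, w in zip(cells, widths)) + "|")
        if cells is headers:
            lines.append(separator)
    lines.append(separator)
    return "\n".join(lines)


# Stand-in formatter chain: primary -> secondary -> tertiary.
tertiary = SimpleNamespace(year="Y", month="Mo", context=None)
secondary = SimpleNamespace(year="YY", month="M", context=tertiary)
primary = SimpleNamespace(year="YYYY", month="MM", context=secondary)
table = build_table_cached(("year", "month"), primary)
print(table)
```

Caching the values as tuples keeps each row's cells together, so the width pass and the rendering pass iterate over the same data.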

Performance Impact:

  • Small inputs (1-3 fields): Modest 1-6% improvement due to reduced function overhead
  • Large inputs (500-1000 fields): Substantial 31-35% improvement due to eliminated O(n) redundant attribute lookups
  • The optimization scales particularly well with field count, as evidenced by the test results showing increasing speedup percentages for larger datasets
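One way to check the scaling trend locally is a small `timeit` comparison of the two lookup patterns. This is a rough sketch, not the Codeflash benchmark: `two_pass`, `cached`, and `compare` are hypothetical names, a single `SimpleNamespace` replaces the three-formatter chain, and absolute timings will vary by machine.

```python
import timeit
from types import SimpleNamespace


def two_pass(fields, obj):
    # Looks up each attribute twice: once for the width, once for the row.
    width = max(len(getattr(obj, f)) for f in fields)
    return [f"{getattr(obj, f):<{width}}" for f in fields]


def cached(fields, obj):
    # Looks up each attribute once and reuses the value.
    values = [getattr(obj, f) for f in fields]
    width = max(map(len, values))
    return [f"{v:<{width}}" for v in values]


def compare(n, repeats=50):
    """Return (two_pass_seconds, cached_seconds) for n fields."""
    fields = tuple(f"f{i}" for i in range(n))
    obj = SimpleNamespace(**{f: f"v_{f}" for f in fields})
    # Both patterns must produce identical rows before timing them.
    assert two_pass(fields, obj) == cached(fields, obj)
    slow = min(timeit.repeat(lambda: two_pass(fields, obj), number=repeats, repeat=3))
    fast = min(timeit.repeat(lambda: cached(fields, obj), number=repeats, repeat=3))
    return slow, fast


for n in (3, 500, 1000):
    slow, fast = compare(n)
    print(f"{n:>5} fields: two-pass {slow:.4f}s, cached {fast:.4f}s")
```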

Why This Works:
In Python, attribute access (getattr) and function calls carry noticeable per-call overhead. The original code's map(lambda f: len(getattr(...)), fields) pattern invoked a lambda once per field for each column and called getattr twice per field per formatter (once for the length calculation, once for display). The optimization reduces this to a single getattr per field per formatter by caching the results.

The performance gains are most significant for large field sets, making this optimization valuable for formatting operations on extensive data structures.
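The difference in lookup counts can be made visible with a small counting object. This is an illustrative sketch only: `CountingFormatter` is a hypothetical stand-in, not a Bokeh class, and it models a single formatter rather than the full context chain.

```python
class CountingFormatter:
    """Hypothetical stand-in that counts reads of its format fields."""

    def __init__(self, **formats):
        self._formats = formats
        self.lookups = 0

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for the format fields.
        formats = self.__dict__.get("_formats", {})
        if name in formats:
            self.lookups += 1
            return formats[name]
        raise AttributeError(name)


fields = tuple(f"f{i}" for i in range(1000))
formats = {f: f"v_{f}" for f in fields}

# Two-pass pattern: one lookup for the width, another for the row.
two_pass = CountingFormatter(**formats)
width = max(map(lambda f: len(getattr(two_pass, f)), fields))
rows = [f"{getattr(two_pass, f):<{width}}" for f in fields]

# Cached pattern: look up once, reuse for both the width and the rows.
cached = CountingFormatter(**formats)
values = [getattr(cached, f) for f in fields]
cached_width = max(map(len, values))
cached_rows = [f"{v:<{cached_width}}" for v in values]

print(two_pass.lookups, cached.lookups)
```

Both patterns produce the same rows; only the number of attribute reads differs.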

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 20 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest
from bokeh.models.formatters import create_format_table


# Helper class for testing (simulates TickFormatter with context chain and format fields)
class DummyFormatter:
    def __init__(self, **kwargs):
        # Accepts format fields as kwargs
        self.__dict__.update(kwargs)
        self.context = None  # will be set externally

def make_context_chain(fields, values1, values2, values3):
    # Helper to build formatter/context/context chain with specified values for fields
    tertiary = DummyFormatter(**dict(zip(fields, values3)))
    secondary = DummyFormatter(**dict(zip(fields, values2)))
    primary = DummyFormatter(**dict(zip(fields, values1)))
    primary.context = secondary
    secondary.context = tertiary
    return primary

# ------------------- BASIC TEST CASES -------------------

def test_single_field_basic():
    # Test with one field and simple values
    fields = ("year",)
    primary = make_context_chain(fields, ["YYYY"], ["YY"], ["Y"])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 11.4μs -> 11.7μs (3.13% slower)

def test_multiple_fields_basic():
    # Test with multiple fields
    fields = ("year", "month", "day")
    primary = make_context_chain(fields, ["YYYY", "MM", "DD"], ["YY", "M", "D"], ["Y", "Mo", "Da"])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 13.9μs -> 13.5μs (3.31% faster)
    # Check that all fields and formats appear
    for f in fields:
        assert f in table
    for fmt in ["YYYY", "MM", "DD", "YY", "M", "D", "Y", "Mo", "Da"]:
        assert fmt in table

def test_column_widths_basic():
    # Test that column widths adjust for longest field/format
    fields = ("short", "muchlongerfieldname", "mid")
    values1 = ["A", "BBBBBBBBBBBBBBBBBBBB", "CCC"]
    values2 = ["1", "22", "333"]
    values3 = ["x", "y", "z"]
    primary = make_context_chain(fields, values1, values2, values3)
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 13.8μs -> 13.5μs (1.89% faster)
    # Check that columns are aligned (by checking substring positions)
    lines = table.split("\n")
    header_line = [line for line in lines if "Scale" in line][0]
    scale_idx = header_line.index("Scale")
    format_idx = header_line.index("Format")
    context1_idx = header_line.index("1st Context")
    context2_idx = header_line.index("2nd Context")
    # Check that all data lines have values in the correct columns
    for line in lines:
        if "|" in line and ("Scale" not in line):
            pass

# ------------------- EDGE TEST CASES -------------------




def test_field_with_empty_string():
    # Test where some format values are empty strings
    fields = ("foo", "bar")
    primary = make_context_chain(fields, ["", "B"], ["C", ""], ["", ""])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 14.6μs -> 14.1μs (3.68% faster)

def test_fields_with_special_characters():
    # Test fields and formats with special characters
    fields = ("f@o", "b$r", "b a z")
    primary = make_context_chain(fields, ["A!", "B#", "C "], ["D$", "E%", "F^"], ["G*", "H(", "I)"])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 14.1μs -> 13.4μs (4.51% faster)
    for f in fields:
        pass
    for fmt in ["A!", "B#", "C ", "D$", "E%", "F^", "G*", "H(", "I)"]:
        pass

def test_fields_with_numeric_and_unicode():
    # Test fields with numbers and unicode characters
    fields = ("α", "β2", "γ_3")
    primary = make_context_chain(fields, ["Ω", "π", "μ"], ["Σ", "λ", "θ"], ["δ", "φ", "ξ"])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 15.4μs -> 14.4μs (7.33% faster)
    for f in fields:
        pass
    for fmt in ["Ω", "π", "μ", "Σ", "λ", "θ", "δ", "φ", "ξ"]:
        pass

# ------------------- LARGE SCALE TEST CASES -------------------

def test_many_fields_large_scale():
    # Test with many fields (up to 1000)
    N = 1000
    fields = tuple(f"f{i}" for i in range(N))
    values1 = [f"v1_{i}" for i in range(N)]
    values2 = [f"v2_{i}" for i in range(N)]
    values3 = [f"v3_{i}" for i in range(N)]
    primary = make_context_chain(fields, values1, values2, values3)
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 1.28ms -> 950μs (34.4% faster)
    # Check that table does not crash or hang (should complete)

def test_max_length_field_and_format():
    # Test with fields and formats of maximum reasonable length
    long_field = "f" * 100
    long_format = "X" * 200
    fields = (long_field,)
    primary = make_context_chain(fields, [long_format], ["Y"*150], ["Z"*120])
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 12.4μs -> 12.4μs (0.064% slower)

def test_large_scale_performance():
    # Test with 500 fields, check that table is not excessively slow
    import time
    N = 500
    fields = tuple(f"field_{i}" for i in range(N))
    values1 = [f"A{i}" for i in range(N)]
    values2 = [f"B{i}" for i in range(N)]
    values3 = [f"C{i}" for i in range(N)]
    primary = make_context_chain(fields, values1, values2, values3)
    start = time.time()
    codeflash_output = create_format_table(fields, primary); table = codeflash_output # 622μs -> 473μs (31.3% faster)
    duration = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest
from bokeh.models.formatters import create_format_table


# Helper class to mimic TickFormatter structure for tests
class DummyFormatter:
    def __init__(self, **kwargs):
        # Accepts arbitrary fields as attributes
        for k, v in kwargs.items():
            setattr(self, k, v)
        self.context = None  # Will be set externally for context chain

def make_formatter(fields, values):
    """Create a DummyFormatter with given fields and values."""
    return DummyFormatter(**dict(zip(fields, values)))

# ------------------ UNIT TESTS ------------------

# Basic Test Cases

def test_single_field_basic():
    # Single field, all formatters have values
    fields = ("year",)
    primary = make_formatter(fields, ["YYYY"])
    secondary = make_formatter(fields, ["YY"])
    tertiary = make_formatter(fields, ["Y"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 10.9μs -> 10.7μs (1.53% faster)
    # Check correct alignment
    lines = result.split('\n')

def test_multiple_fields_basic():
    # Multiple fields, all formatters have values
    fields = ("year", "month", "day")
    primary = make_formatter(fields, ["YYYY", "MM", "DD"])
    secondary = make_formatter(fields, ["YY", "M", "D"])
    tertiary = make_formatter(fields, ["Y", "Mo", "Da"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 13.5μs -> 12.7μs (5.85% faster)
    # Check all fields and values present
    for field in fields:
        assert field in result
    for fmt in ["YYYY", "MM", "DD", "YY", "M", "D", "Y", "Mo", "Da"]:
        assert fmt in result

def test_empty_field_values():
    # Fields present, but some formatters have empty string values
    fields = ("year", "month")
    primary = make_formatter(fields, ["", "MM"])
    secondary = make_formatter(fields, ["YY", ""])
    tertiary = make_formatter(fields, ["Y", "Mo"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 11.3μs -> 11.3μs (0.027% faster)

def test_fields_with_longer_names():
    # Fields with longer names than header
    fields = ("verylongfieldname", "short")
    primary = make_formatter(fields, ["A", "B"])
    secondary = make_formatter(fields, ["C", "D"])
    tertiary = make_formatter(fields, ["E", "F"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 11.8μs -> 11.4μs (3.89% faster)

# Edge Test Cases


def test_field_with_empty_string_name():
    # Field name is empty string
    fields = ("",)
    primary = make_formatter(fields, ["VAL"])
    secondary = make_formatter(fields, ["CTX1"])
    tertiary = make_formatter(fields, ["CTX2"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 12.9μs -> 12.6μs (2.46% faster)


def test_context_chain_is_none():
    # primary.context is None, so there is no secondary formatter
    fields = ("year",)
    primary = make_formatter(fields, ["YYYY"])  # DummyFormatter sets context = None

    # create_format_table reads secondary.context, which fails when secondary is None
    with pytest.raises(AttributeError):
        create_format_table(fields, primary) # 2.13μs -> 1.64μs (29.8% faster)

def test_fields_with_special_characters():
    # Field names and values with special characters
    fields = ("y@ar", "m#nth")
    primary = make_formatter(fields, ["Y!Y", "M$M"])
    secondary = make_formatter(fields, ["Y^", "M*"])
    tertiary = make_formatter(fields, ["Y(", "M)"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 14.5μs -> 13.9μs (4.77% faster)
    # All special characters should appear as-is
    for val in ["y@ar", "m#nth", "Y!Y", "M$M", "Y^", "M*", "Y(", "M)"]:
        assert val in result

def test_fields_with_numbers_and_mixed_types():
    # Field names and values containing digits; all values are still strings
    fields = ("field1", "field2")
    primary = make_formatter(fields, ["123", "456"])
    secondary = make_formatter(fields, ["789", "0"])
    tertiary = make_formatter(fields, ["", "999"])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 12.4μs -> 12.2μs (1.44% faster)
    for val in ["123", "456", "789", "0", "999"]:
        pass

# Large Scale Test Cases

def test_large_number_of_fields():
    # Many fields (500)
    N = 500
    fields = tuple(f"field{i}" for i in range(N))
    primary = make_formatter(fields, [f"p{i}" for i in range(N)])
    secondary = make_formatter(fields, [f"s{i}" for i in range(N)])
    tertiary = make_formatter(fields, [f"t{i}" for i in range(N)])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 628μs -> 474μs (32.4% faster)
    # Check that a few sample fields and values are present
    for i in [0, N//2, N-1]:
        assert f"field{i}" in result
        assert f"p{i}" in result

def test_large_field_names_and_values():
    # Large field names and values
    fields = ("x"*100, "y"*200)
    primary = make_formatter(fields, ["A"*50, "B"*100])
    secondary = make_formatter(fields, ["C"*60, "D"*120])
    tertiary = make_formatter(fields, ["E"*70, "F"*140])
    primary.context = secondary
    secondary.context = tertiary

    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 14.2μs -> 13.9μs (2.71% faster)

def test_performance_with_many_fields():
    # Performance: ensure function runs quickly for 1000 fields
    import time
    N = 1000
    fields = tuple(f"f{i}" for i in range(N))
    primary = make_formatter(fields, [str(i) for i in range(N)])
    secondary = make_formatter(fields, [str(i*2) for i in range(N)])
    tertiary = make_formatter(fields, [str(i*3) for i in range(N)])
    primary.context = secondary
    secondary.context = tertiary

    start = time.time()
    codeflash_output = create_format_table(fields, primary); result = codeflash_output # 1.27ms -> 946μs (34.7% faster)
    duration = time.time() - start
    # Spot check a few values
    for i in [0, N//2, N-1]:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-create_format_table-mhwh2dd8 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 20:48
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025