@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 227% (2.27x) speedup for indent in src/bokeh/util/strings.py

⏱️ Runtime : 461 microseconds → 141 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 227% speedup (461 μs down to 141 μs) by replacing the split("\n") + generator expression + join() pattern with a direct str.replace() operation.

Key optimizations:

  1. Empty string fast path: Added if not text: return padding to handle empty strings immediately, avoiding unnecessary string operations (238% faster for empty strings).

  2. Direct string replacement: Changed from "\n".join(padding + line for line in text.split("\n")) to padding + text.replace("\n", f"\n{padding}"). This eliminates:

    • String splitting into a list of lines
    • Generator expression iteration
    • String joining operation

    Instead, it performs a single replace operation that inserts padding after each newline, then prepends padding to the entire text.
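The two forms described above can be placed side by side. This is a minimal sketch reconstructed from the expressions quoted in this comment, not a copy of src/bokeh/util/strings.py; the names indent_original and indent_optimized are hypothetical, and the defaults (n=2, ch=" ") are taken from the generated tests.

```python
def indent_original(text: str, n: int = 2, ch: str = " ") -> str:
    """Pre-optimization form: split into lines, pad each one, re-join."""
    padding = ch * n
    return "\n".join(padding + line for line in text.split("\n"))


def indent_optimized(text: str, n: int = 2, ch: str = " ") -> str:
    """Post-optimization form: one replace call plus a leading pad."""
    padding = ch * n
    if not text:
        # Fast path: an empty string indents to the padding alone.
        return padding
    # Every "\n" becomes "\n" + padding, which pads each line after the first;
    # the leading `padding +` pads the first line.
    return padding + text.replace("\n", f"\n{padding}")


sample = "line1\n\nline3\n"
print(repr(indent_original(sample)))   # '  line1\n  \n  line3\n  '
print(repr(indent_optimized(sample)))  # '  line1\n  \n  line3\n  '
```

Both forms pad empty lines and the empty segment after a trailing newline, so the rewrite preserves the existing behavior for string inputs.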

Why this is faster:

  • str.replace() is a highly optimized C-level operation in Python
  • Eliminates memory allocation for the intermediate list from split()
  • Removes the overhead of iterating through lines and concatenating strings
  • Reduces function call overhead from the generator expression

Performance characteristics:

  • Small text: typically 60-90% faster due to reduced overhead; small inputs with non-ASCII padding (😀, 你) gain only 5-18%
  • Large text with many lines: Up to 518% faster (1000-line test case) because the optimization scales better with line count
  • Single lines: Still roughly 50-90% faster due to eliminated split/join operations
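One way to check how the gain scales with line count is a small timeit harness such as the sketch below. The function names, the helper best_time, and the repeat/number settings are illustrative assumptions, not Codeflash's measurement setup, and absolute timings will differ by machine and Python version.

```python
import timeit


def indent_split_join(text: str, n: int = 2, ch: str = " ") -> str:
    # Form quoted as the original in this comment.
    padding = ch * n
    return "\n".join(padding + line for line in text.split("\n"))


def indent_replace(text: str, n: int = 2, ch: str = " ") -> str:
    # Form quoted as the optimized version in this comment.
    padding = ch * n
    if not text:
        return padding
    return padding + text.replace("\n", f"\n{padding}")


def best_time(fn, text: str, repeat: int = 5, number: int = 200) -> float:
    """Best-of-`repeat` average seconds per call (hypothetical helper)."""
    runs = timeit.repeat(lambda: fn(text), repeat=repeat, number=number)
    return min(runs) / number


for line_count in (1, 10, 1000):
    text = "\n".join(f"line{i}" for i in range(line_count))
    # Guard against benchmarking two functions that disagree.
    assert indent_split_join(text) == indent_replace(text)
    old = best_time(indent_split_join, text)
    new = best_time(indent_replace, text)
    print(f"{line_count:>5} lines: {old * 1e6:8.2f} us -> {new * 1e6:8.2f} us")
```

Taking the minimum over several repeats mirrors the "best of N runs" reporting used in this comment and reduces noise from other processes.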

Impact on workloads: Based on function_references, this function is used in Bokeh's JavaScript code generation (wrap_in_onload, wrap_in_safely, wrap_in_script_tag). Since these likely run during web page rendering or visualization generation, the improvement should reduce latency when generating embedded Bokeh content, particularly for complex visualizations with substantial JavaScript code.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 59 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from bokeh.util.strings import indent

# unit tests

# -------------------
# Basic Test Cases
# -------------------

def test_indent_basic_default():
    # Single line, default indentation (2 spaces)
    codeflash_output = indent("hello") # 1.53μs -> 795ns (92.1% faster)

def test_indent_basic_custom_n():
    # Single line, custom indentation width
    codeflash_output = indent("hello", n=4) # 1.76μs -> 1.01μs (74.0% faster)

def test_indent_basic_custom_ch():
    # Single line, custom indentation character
    codeflash_output = indent("hello", n=3, ch="-") # 1.71μs -> 1.04μs (64.7% faster)

def test_indent_multiline_default():
    # Multiple lines, default indentation
    text = "line1\nline2\nline3"
    expected = "  line1\n  line2\n  line3"
    codeflash_output = indent(text) # 1.87μs -> 1.10μs (69.5% faster)

def test_indent_multiline_custom():
    # Multiple lines, custom indentation
    text = "a\nb\nc"
    expected = "--a\n--b\n--c"
    codeflash_output = indent(text, n=2, ch="-") # 2.10μs -> 1.28μs (63.6% faster)

def test_indent_empty_line():
    # Empty string input
    codeflash_output = indent("") # 1.47μs -> 437ns (236% faster)

def test_indent_multiline_with_empty_lines():
    # Multiline with empty lines in between
    text = "foo\n\nbar"
    expected = "  foo\n  \n  bar"
    codeflash_output = indent(text) # 1.95μs -> 1.03μs (88.8% faster)

# -------------------
# Edge Test Cases
# -------------------

def test_indent_zero_n():
    # Indentation width zero, should not add any padding
    codeflash_output = indent("abc", n=0) # 1.75μs -> 1.02μs (72.5% faster)
    codeflash_output = indent("a\nb", n=0) # 979ns -> 415ns (136% faster)

def test_indent_negative_n():
    # Negative indentation width, should produce empty padding (no error)
    # This is a design decision: ch * n == "" for negative n in Python
    codeflash_output = indent("abc", n=-3) # 1.48μs -> 893ns (66.1% faster)
    codeflash_output = indent("a\nb", n=-1, ch="*") # 1.18μs -> 651ns (81.3% faster)

def test_indent_empty_ch():
    # Empty indentation character, so padding is always empty
    codeflash_output = indent("abc", n=3, ch="") # 1.58μs -> 990ns (60.1% faster)
    codeflash_output = indent("a\nb", n=2, ch="") # 972ns -> 421ns (131% faster)

def test_indent_multichar_ch():
    # Indentation character is more than one character
    # Should repeat the string n times
    codeflash_output = indent("abc", n=2, ch="xy") # 1.60μs -> 910ns (75.6% faster)
    codeflash_output = indent("a\nb", n=3, ch="*#") # 1.13μs -> 693ns (63.6% faster)

def test_indent_unicode_ch():
    # Indentation character is a Unicode character
    codeflash_output = indent("abc", n=2, ch="😀") # 1.82μs -> 1.54μs (18.0% faster)
    codeflash_output = indent("a\nb", n=1, ch="你") # 1.53μs -> 1.32μs (16.0% faster)

def test_indent_multiline_trailing_newline():
    # Text ends with a newline; should preserve number of lines
    text = "foo\nbar\n"
    expected = "  foo\n  bar\n  "
    codeflash_output = indent(text) # 1.95μs -> 1.04μs (87.1% faster)

def test_indent_multiline_leading_newline():
    # Text starts with a newline; should indent empty line
    text = "\nfoo\nbar"
    expected = "  \n  foo\n  bar"
    codeflash_output = indent(text) # 1.87μs -> 1.00μs (87.4% faster)

def test_indent_multiline_only_newlines():
    # Text is only newlines
    codeflash_output = indent("\n\n") # 1.72μs -> 1.01μs (70.6% faster)

def test_indent_multiline_mixed_whitespace():
    # Lines with spaces and tabs
    text = "a\n b\n\tc"
    expected = "  a\n   b\n  \tc"
    codeflash_output = indent(text) # 1.89μs -> 1.02μs (84.6% faster)

def test_indent_multiline_long_lines():
    # Lines with long content
    text = "x" * 100 + "\n" + "y" * 200
    expected = "  " + "x" * 100 + "\n  " + "y" * 200
    codeflash_output = indent(text) # 2.04μs -> 1.27μs (60.4% faster)

# -------------------
# Large Scale Test Cases
# -------------------

def test_indent_large_number_of_lines():
    # Indent 1000 lines, each with a number
    lines = [str(i) for i in range(1000)]
    text = "\n".join(lines)
    expected = "\n".join("  " + str(i) for i in range(1000))
    codeflash_output = indent(text) # 71.6μs -> 11.6μs (518% faster)

def test_indent_large_line_length():
    # Indent a single line with 1000 characters
    text = "a" * 1000
    expected = "  " + "a" * 1000
    codeflash_output = indent(text) # 2.01μs -> 1.33μs (50.9% faster)

def test_indent_large_multiline_custom_ch():
    # Indent 500 lines with long custom ch
    lines = ["foo"] * 500
    text = "\n".join(lines)
    ch = "xyz"
    n = 2
    expected = "\n".join((ch * n) + "foo" for _ in range(500))
    codeflash_output = indent(text, n=n, ch=ch) # 30.2μs -> 6.70μs (351% faster)

def test_indent_large_multiline_empty_lines():
    # Indent 1000 lines, alternating empty and non-empty
    lines = ["", "foo"] * 500
    text = "\n".join(lines)
    expected = "\n".join("  " + line for line in lines)
    codeflash_output = indent(text) # 46.5μs -> 10.3μs (352% faster)

def test_indent_large_multiline_unicode():
    # Indent 1000 lines with Unicode character
    lines = ["bar"] * 1000
    text = "\n".join(lines)
    ch = "你"
    n = 3
    expected = "\n".join(ch * n + "bar" for _ in range(1000))
    codeflash_output = indent(text, n=n, ch=ch) # 62.6μs -> 17.1μs (267% faster)

# -------------------
# Determinism Test
# -------------------

def test_indent_deterministic():
    # Multiple calls with same input should always give same output
    text = "foo\nbar"
    codeflash_output = indent(text, n=3, ch="."); result1 = codeflash_output # 2.06μs -> 1.29μs (59.6% faster)
    codeflash_output = indent(text, n=3, ch="."); result2 = codeflash_output # 868ns -> 462ns (87.9% faster)

# -------------------
# Type and Value Error Test Cases
# -------------------

def test_indent_non_string_text():
    # Should raise AttributeError if text is not a string (int has no split/replace method)
    with pytest.raises(AttributeError):
        indent(123) # 1.60μs -> 1.35μs (18.4% faster)

def test_indent_non_int_n():
    # Should raise TypeError if n is not an integer (ch * n fails)
    with pytest.raises(TypeError):
        indent("foo", n="bar") # 1.26μs -> 1.22μs (3.86% faster)

def test_indent_non_string_ch():
    # Should raise TypeError if ch is not a string (ch * n yields an int, and int + str fails)
    with pytest.raises(TypeError):
        indent("foo", n=2, ch=5) # 2.85μs -> 2.10μs (35.6% faster)

def test_indent_negative_large_n():
    # Negative large n, should produce empty padding
    codeflash_output = indent("foo", n=-1000) # 1.83μs -> 1.11μs (65.4% faster)

def test_indent_zero_length_ch():
    # ch is empty string, n > 0, should produce no padding
    codeflash_output = indent("foo", n=10, ch="") # 1.70μs -> 1.08μs (56.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
from bokeh.util.strings import indent

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_indent_single_line_default():
    # Indenting a single line with default settings
    codeflash_output = indent("hello") # 1.67μs -> 914ns (82.8% faster)

def test_indent_single_line_custom_n():
    # Indenting a single line with n=4 spaces
    codeflash_output = indent("world", n=4) # 1.79μs -> 1.07μs (67.7% faster)

def test_indent_single_line_custom_char():
    # Indenting a single line with n=3 and ch="*"
    codeflash_output = indent("foo", n=3, ch="*") # 1.79μs -> 1.04μs (72.6% faster)

def test_indent_multi_line_default():
    # Indenting multiple lines with default settings
    input_text = "line1\nline2\nline3"
    expected = "  line1\n  line2\n  line3"
    codeflash_output = indent(input_text) # 2.00μs -> 1.28μs (55.7% faster)

def test_indent_multi_line_custom():
    # Indenting multiple lines with n=1 and ch="-"
    input_text = "a\nb\nc"
    expected = "-a\n-b\n-c"
    codeflash_output = indent(input_text, n=1, ch="-") # 2.23μs -> 1.33μs (67.1% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_indent_empty_string():
    # Indenting an empty string should return just the padding
    codeflash_output = indent("") # 1.53μs -> 452ns (238% faster)

def test_indent_empty_lines():
    # Indenting a string with empty lines
    input_text = "\n\n"
    expected = "  \n  \n  "
    codeflash_output = indent(input_text) # 1.89μs -> 1.08μs (75.2% faster)

def test_indent_zero_indent():
    # Indenting with n=0 should add no padding
    input_text = "foo\nbar"
    expected = "foo\nbar"
    codeflash_output = indent(input_text, n=0) # 1.99μs -> 1.09μs (82.9% faster)

def test_indent_negative_indent():
    # Negative n should result in empty padding (since ch * n == "")
    input_text = "test"
    expected = "test"
    codeflash_output = indent(input_text, n=-2) # 1.71μs -> 968ns (76.1% faster)

def test_indent_empty_char():
    # Indenting with empty ch should add no padding
    input_text = "abc\ndef"
    expected = "abc\ndef"
    codeflash_output = indent(input_text, n=3, ch="") # 1.93μs -> 1.15μs (67.8% faster)

def test_indent_multichar_ch():
    # Indenting with multi-character ch
    input_text = "x\ny"
    expected = "--x\n--y"
    codeflash_output = indent(input_text, n=1, ch="--") # 1.97μs -> 1.24μs (59.3% faster)

def test_indent_unicode_char():
    # Indenting with a unicode character
    input_text = "hello\nworld"
    expected = "😀😀hello\n😀😀world"
    codeflash_output = indent(input_text, n=2, ch="😀") # 2.66μs -> 2.53μs (5.06% faster)

def test_indent_newline_only():
    # Indenting a string with only newlines
    input_text = "\n"
    expected = "  \n  "
    codeflash_output = indent(input_text) # 1.83μs -> 1.04μs (76.3% faster)

def test_indent_lines_with_whitespace():
    # Indenting lines that already have leading whitespace
    input_text = "  a\n\tb"
    expected = "  " + "  a\n  \tb"
    codeflash_output = indent(input_text) # 1.83μs -> 1.04μs (75.7% faster)

def test_indent_large_n():
    # Indenting with a large n value
    input_text = "z"
    expected = ("#" * 100) + "z"
    codeflash_output = indent(input_text, n=100, ch="#") # 1.84μs -> 1.03μs (78.3% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_indent_large_text():
    # Indenting a large block of text (1000 lines)
    input_text = "\n".join(f"line{i}" for i in range(1000))
    expected = "\n".join("  " + f"line{i}" for i in range(1000))
    codeflash_output = indent(input_text) # 62.3μs -> 12.6μs (394% faster)

def test_indent_large_text_custom():
    # Indenting a large block of text with custom n and ch
    input_text = "\n".join("x" * 10 for _ in range(500))
    expected = "\n".join("**" + "x" * 10 for _ in range(500))
    codeflash_output = indent(input_text, n=2, ch="*") # 31.2μs -> 7.93μs (293% faster)

def test_indent_large_lines_and_large_n():
    # Indenting large lines with large n
    input_text = "\n".join("abc" * 100 for _ in range(10))
    expected = "\n".join(" " * 50 + "abc" * 100 for _ in range(10))
    codeflash_output = indent(input_text, n=50) # 3.96μs -> 2.68μs (47.7% faster)

def test_indent_performance():
    # Indenting with a large number of lines and characters, checking output length
    lines = 1000
    n = 5
    ch = "x"
    input_text = "\n".join("y" * 10 for _ in range(lines))
    codeflash_output = indent(input_text, n=n, ch=ch); result = codeflash_output # 60.6μs -> 14.8μs (309% faster)
    # Each line should start with 5 'x' followed by 10 'y'
    expected_line = "x" * n + "y" * 10
    output_lines = result.split("\n")

# ------------------------
# Additional Edge Cases
# ------------------------

def test_indent_n_is_none():
    # Passing None for n should raise a TypeError
    with pytest.raises(TypeError):
        indent("abc", n=None) # 1.35μs -> 1.25μs (8.09% faster)

def test_indent_ch_is_none():
    # Passing None for ch should raise a TypeError
    with pytest.raises(TypeError):
        indent("abc", ch=None) # 1.58μs -> 1.54μs (2.79% faster)

def test_indent_non_string_text():
    # Passing non-string text should raise an AttributeError
    with pytest.raises(AttributeError):
        indent(123) # 1.55μs -> 1.30μs (19.6% faster)

def test_indent_non_string_ch():
    # Passing non-string ch should raise a TypeError
    with pytest.raises(TypeError):
        indent("abc", n=2, ch=5) # 2.91μs -> 2.16μs (34.8% faster)

def test_indent_multiline_with_trailing_newline():
    # Indenting text with trailing newline
    input_text = "abc\ndef\n"
    expected = "  abc\n  def\n  "
    codeflash_output = indent(input_text) # 2.06μs -> 1.14μs (80.4% faster)

def test_indent_multiline_with_leading_newline():
    # Indenting text with leading newline
    input_text = "\nabc\ndef"
    expected = "  \n  abc\n  def"
    codeflash_output = indent(input_text) # 1.88μs -> 1.05μs (78.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from bokeh.util.strings import indent

def test_indent():
    indent('', n=0, ch='')
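The listings above show no assertions; the codeflash_output comments say the outputs of the original and optimized code are compared. A minimal sketch of such a comparison follows, assuming the two forms quoted in the explanation; indent_original, indent_optimized, and outcome are hypothetical names, and this is not Codeflash's actual harness. It also probes one input class the generated tests do not cover: falsy non-string values, where the fast path as described would return the padding instead of raising.

```python
def indent_original(text, n=2, ch=" "):
    padding = ch * n
    return "\n".join(padding + line for line in text.split("\n"))


def indent_optimized(text, n=2, ch=" "):
    padding = ch * n
    if not text:
        return padding
    return padding + text.replace("\n", f"\n{padding}")


def outcome(fn, *args):
    """Return the result, or the exception class name if the call raises."""
    try:
        return fn(*args)
    except Exception as exc:
        return type(exc).__name__


# String inputs drawn from the generated tests: both forms should agree.
cases = [
    ("hello", 2, " "),
    ("a\nb\nc", 2, "-"),
    ("", 2, " "),
    ("\n\n", 2, " "),
    ("foo\nbar\n", 0, " "),
    ("abc", -3, "*"),
    ("x\ny", 1, "--"),
]
for text, n, ch in cases:
    assert indent_original(text, n, ch) == indent_optimized(text, n, ch)

# Falsy non-string input: the two forms diverge under this sketch.
print(outcome(indent_original, None))         # AttributeError
print(repr(outcome(indent_optimized, None)))  # '  '
```

If the real function is only ever called with str arguments, as its callers named in this comment suggest, the divergence is harmless; otherwise an isinstance check or `if text == "":` would keep the old failure mode.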
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_sstvtaha/tmpcy6l8blp/test_concolic_coverage.py::test_indent | 1.70μs | 723ns | 135%✅ |

To edit these changes, run git checkout codeflash/optimize-indent-mhwbeqig and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 18:09
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025