@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 23% (0.23x) speedup for _signature in src/bokeh/util/token.py

⏱️ Runtime : 4.82 milliseconds → 3.91 milliseconds (best of 125 runs)
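
The headline figure is consistent with computing speedup as original runtime divided by optimized runtime, minus one. A minimal sketch of that arithmetic, assuming this is the formula behind the reported numbers (the helper name `speedup` is illustrative and not part of Codeflash):

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup: how much faster the optimized run is than the original."""
    return original / optimized - 1.0


# Total runtime reported above, in milliseconds.
total = speedup(4.82, 3.91)
print(f"{total:.2f}x, {total:.0%}")  # prints "0.23x, 23%"
```

The same formula reproduces the per-test percentages further down to within rounding of the displayed timings.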

📝 Explanation and details

The optimization achieves a 23% speedup by replacing higher-level codecs module functions with direct string/bytes method calls, eliminating unnecessary function lookups and indirection.

Key optimizations applied:

  1. Direct string encoding: Replaced codecs.encode(secret_key, 'utf-8') with secret_key.encode('utf-8') in _ensure_bytes(). This eliminates the overhead of looking up the codecs.encode function and its internal dispatch logic.

  2. Direct bytes decoding: Replaced codecs.decode(base64.urlsafe_b64encode(...), 'ascii') with base64.urlsafe_b64encode(...).decode('ascii') in _base64_encode(). This removes an extra layer of function indirection.

  3. Streamlined type handling: In _base64_encode(), the input conversion is now a single inline conditional expression instead of calling _ensure_bytes(), reducing function call overhead.

  4. Consistent direct encoding: In _signature(), replaced codecs.encode(base_id, "utf-8") with base_id.encode('utf-8') for consistency.
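
The four changes above can be sketched side by side. This is a hedged reconstruction, not the Bokeh source: the function names `signature_codecs` and `signature_direct` are hypothetical, and the HMAC-SHA256 plus padding-stripped URL-safe base64 scheme is taken from the expected values computed in the generated tests in this PR.

```python
import base64
import codecs
import hashlib
import hmac
from typing import Union


def signature_codecs(base_id: str, secret_key: Union[str, bytes]) -> str:
    """'Before' style: encode and decode through the generic codecs functions."""
    key = secret_key if isinstance(secret_key, bytes) else codecs.encode(secret_key, "utf-8")
    digest = hmac.new(key, codecs.encode(base_id, "utf-8"), hashlib.sha256).digest()
    encoded = codecs.decode(base64.urlsafe_b64encode(digest), "ascii")
    return encoded.rstrip("=")  # drop base64 padding


def signature_direct(base_id: str, secret_key: Union[str, bytes]) -> str:
    """'After' style: call str.encode / bytes.decode directly."""
    key = secret_key if isinstance(secret_key, bytes) else secret_key.encode("utf-8")
    digest = hmac.new(key, base_id.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# Both styles must agree byte for byte; only the call path differs.
assert signature_codecs("abc123", "key") == signature_direct("abc123", "key")
print(len(signature_direct("abc123", "key")))  # prints 43
```

A 32-byte SHA-256 digest encodes to 44 base64 characters including one `=`, so the unpadded result is always 43 characters long.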

Why this matters for performance:

  • The codecs module functions are generic and handle many encoding types, adding dispatch overhead
  • Direct method calls on strings/bytes objects are faster as they bypass this generic layer
  • These functions are called in hot paths: session ID generation, JWT token creation, and signature validation happen frequently in web applications
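
The dispatch overhead described above can be measured in isolation. A minimal micro-benchmark sketch, assuming a short ASCII input similar to a session ID (the sample string and iteration count are arbitrary choices, and absolute timings vary by machine and Python version):

```python
import codecs
import timeit

text = "session-id-0123456789"

# Time the generic module-level function against the direct str method.
via_codecs = timeit.timeit(lambda: codecs.encode(text, "utf-8"), number=200_000)
via_method = timeit.timeit(lambda: text.encode("utf-8"), number=200_000)

# The two calls return identical bytes; only the call path differs.
assert codecs.encode(text, "utf-8") == text.encode("utf-8")
print(f"codecs.encode: {via_codecs:.4f}s  str.encode: {via_method:.4f}s")
```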

Impact on workloads:
Based on the function references, _signature() is called during:

  • Session ID generation for each browser tab connection
  • JWT token creation and validation
  • Session signature verification on every request
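
To illustrate why a signature function sits on the request path, here is a minimal sign-and-verify round trip. Everything in it is an assumption for illustration: the helpers `sign`, `make_signed_id`, and `is_valid` are hypothetical, and the `<base_id>.<signature>` layout is not claimed to be Bokeh's actual session ID or token format.

```python
import base64
import hashlib
import hmac


def sign(base_id: str, secret_key: bytes) -> str:
    """HMAC-SHA256 over the ID, URL-safe base64 with padding stripped."""
    digest = hmac.new(secret_key, base_id.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_signed_id(base_id: str, secret_key: bytes) -> str:
    # Hypothetical layout: "<base_id>.<signature>"
    return f"{base_id}.{sign(base_id, secret_key)}"


def is_valid(signed_id: str, secret_key: bytes) -> bool:
    base_id, sep, provided = signed_id.rpartition(".")
    if not sep:
        return False  # no signature part present
    expected = sign(base_id, secret_key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("ascii"))


token = make_signed_id("abc123", b"key")
print(is_valid(token, b"key"), is_valid(token, b"wrong"))  # prints "True False"
```

Each verification recomputes the signature once, so any per-call saving in the signing helper applies to every checked request.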

Across the generated tests, successful calls improve by roughly 11-48%: up to 48% for short ASCII inputs and 30-41% for Unicode and large inputs, while the early-exit path for a None key improves by 53-117%. In absolute terms the saving is on the order of a microsecond or two per call, which matters mainly for web application workloads where session management happens frequently.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 22 Passed |
| 🌀 Generated Regression Tests | 1633 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| unit/bokeh/util/test_token.py::TestSessionId.test_signature | 16.4μs | 14.1μs | 16.6%✅ |

🌀 Generated Regression Tests and Runtime
import base64
import codecs
import hashlib
import hmac

# imports
import pytest  # used for our unit tests
from bokeh.util.token import _signature

# unit tests

# ---------------------------
# Basic Test Cases
# ---------------------------




def test_signature_is_deterministic():
    # The same inputs always produce the same output
    base_id = "abc123"
    secret_key = "key"
    codeflash_output = _signature(base_id, secret_key); sig1 = codeflash_output # 18.8μs -> 15.6μs (20.3% faster)
    codeflash_output = _signature(base_id, secret_key); sig2 = codeflash_output # 4.38μs -> 3.62μs (21.1% faster)
    assert sig1 == sig2

def test_signature_differs_with_different_base_id():
    # Different base_id yields different signature
    base_id1 = "id1"
    base_id2 = "id2"
    secret_key = "key"
    codeflash_output = _signature(base_id1, secret_key); sig1 = codeflash_output # 9.44μs -> 8.08μs (16.9% faster)
    codeflash_output = _signature(base_id2, secret_key); sig2 = codeflash_output # 3.81μs -> 3.27μs (16.5% faster)
    assert sig1 != sig2

def test_signature_differs_with_different_secret_key():
    # Different secret_key yields different signature
    base_id = "id"
    secret_key1 = "key1"
    secret_key2 = "key2"
    codeflash_output = _signature(base_id, secret_key1); sig1 = codeflash_output # 8.82μs -> 7.53μs (17.2% faster)
    codeflash_output = _signature(base_id, secret_key2); sig2 = codeflash_output # 3.69μs -> 3.10μs (19.2% faster)
    assert sig1 != sig2

# ---------------------------
# Edge Test Cases
# ---------------------------




def test_signature_with_none_secret_key_raises():
    # secret_key=None should raise AssertionError
    base_id = "id"
    with pytest.raises(AssertionError):
        _signature(base_id, None) # 2.98μs -> 1.40μs (113% faster)







def test_signature_performance_many_calls():
    # Run 1000 signatures to check performance and determinism
    base_id = "id"
    secret_key = "key"
    sigs = set()
    for i in range(1000):
        codeflash_output = _signature(base_id + str(i), secret_key); sig = codeflash_output # 2.85ms -> 2.32ms (22.9% faster)
        sigs.add(sig)
    # Every distinct base_id should yield a distinct signature
    assert len(sigs) == 1000


def test_signature_padding_removed():
    # The output should never have '=' padding at the end
    base_id = "padtest"
    secret_key = "padkey"
    codeflash_output = _signature(base_id, secret_key); sig = codeflash_output # 18.6μs -> 15.5μs (20.0% faster)
    assert not sig.endswith("=")

def test_signature_output_is_ascii():
    # The signature should only contain URL-safe base64 characters
    base_id = "ascii"
    secret_key = "ascii"
    codeflash_output = _signature(base_id, secret_key); sig = codeflash_output # 10.5μs -> 8.49μs (23.8% faster)
    allowed = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
    for c in sig:
        assert c in allowed

def test_signature_changes_with_case():
    # Changing the case of base_id or secret_key changes the signature
    base_id = "CaseTest"
    secret_key = "Secret"
    codeflash_output = _signature(base_id, secret_key); sig1 = codeflash_output # 9.34μs -> 8.43μs (10.9% faster)
    codeflash_output = _signature(base_id.lower(), secret_key); sig2 = codeflash_output # 4.03μs -> 3.37μs (19.6% faster)
    codeflash_output = _signature(base_id, secret_key.lower()); sig3 = codeflash_output # 2.97μs -> 2.55μs (16.8% faster)
    assert sig1 != sig2
    assert sig1 != sig3
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import base64
import codecs
import hashlib
import hmac

# imports
import pytest
from bokeh.util.token import _signature

# unit tests

# --- Basic Test Cases ---

def test_signature_basic_ascii():
    # Basic test with ascii string and ascii key
    base_id = "test"
    secret_key = "key"
    # Compute expected signature manually
    expected = base64.urlsafe_b64encode(
        hmac.new(b"key", b"test", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 7.13μs -> 4.82μs (48.0% faster)
    assert codeflash_output == expected

def test_signature_bytes_key():
    # Test with bytes secret key
    base_id = "hello"
    secret_key = b"mysecret"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"mysecret", b"hello", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.83μs -> 4.20μs (39.0% faster)
    assert codeflash_output == expected

def test_signature_unicode_base_id():
    # Test with unicode base_id
    base_id = "héllo世界"
    secret_key = "unicodekey"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"unicodekey", base_id.encode("utf-8"), hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.98μs -> 4.31μs (38.6% faster)
    assert codeflash_output == expected

def test_signature_empty_base_id():
    # Test with empty base_id
    base_id = ""
    secret_key = "empty"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"empty", b"", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.71μs -> 4.16μs (37.3% faster)
    assert codeflash_output == expected

def test_signature_empty_secret_key():
    # Test with empty secret_key
    base_id = "something"
    secret_key = ""
    expected = base64.urlsafe_b64encode(
        hmac.new(b"", b"something", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.85μs -> 4.09μs (43.1% faster)
    assert codeflash_output == expected

# --- Edge Test Cases ---

def test_signature_secret_key_none():
    # Should raise AssertionError if secret_key is None
    with pytest.raises(AssertionError):
        _signature("data", None) # 1.67μs -> 1.09μs (53.1% faster)

def test_signature_base_id_non_ascii_bytes_key():
    # base_id is unicode, key is bytes with non-ascii values
    base_id = "emoji😊"
    secret_key = b"\xff\xfe\xfd"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"\xff\xfe\xfd", base_id.encode("utf-8"), hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.95μs -> 4.22μs (41.0% faster)
    assert codeflash_output == expected

def test_signature_long_secret_key():
    # Very long secret key
    base_id = "short"
    secret_key = "a" * 512
    expected = base64.urlsafe_b64encode(
        hmac.new(b"a" * 512, b"short", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 6.64μs -> 4.81μs (37.9% faster)
    assert codeflash_output == expected

def test_signature_long_base_id():
    # Very long base_id
    base_id = "b" * 512
    secret_key = "key"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"key", b"b" * 512, hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 6.25μs -> 4.47μs (39.8% faster)
    assert codeflash_output == expected

def test_signature_secret_key_bytes_vs_str_equivalence():
    # str and bytes keys with same value should produce same result
    base_id = "abc"
    key_str = "samekey"
    key_bytes = b"samekey"
    codeflash_output = _signature(base_id, key_str) # 8.28μs -> 6.85μs (20.8% faster)
    assert codeflash_output == _signature(base_id, key_bytes)

def test_signature_base_id_with_null_bytes():
    # base_id contains null bytes
    base_id = "abc\x00def"
    secret_key = "key"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"key", b"abc\x00def", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.81μs -> 4.14μs (40.2% faster)
    assert codeflash_output == expected

def test_signature_secret_key_with_null_bytes():
    # secret_key contains null bytes
    base_id = "test"
    secret_key = b"key\x00withnull"
    expected = base64.urlsafe_b64encode(
        hmac.new(b"key\x00withnull", b"test", hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 5.40μs -> 3.99μs (35.3% faster)
    assert codeflash_output == expected

# --- Large Scale Test Cases ---

def test_signature_large_base_id_and_key():
    # Large base_id and key (up to 1000 chars)
    base_id = "x" * 1000
    secret_key = "y" * 1000
    expected = base64.urlsafe_b64encode(
        hmac.new(b"y" * 1000, b"x" * 1000, hashlib.sha256).digest()
    ).decode("ascii").rstrip("=")
    codeflash_output = _signature(base_id, secret_key) # 7.49μs -> 5.75μs (30.2% faster)
    assert codeflash_output == expected

def test_signature_many_unique_inputs():
    # Test many unique combinations to check determinism and uniqueness
    key = "fixedkey"
    results = set()
    for i in range(100):
        base_id = f"id_{i}"
        codeflash_output = _signature(base_id, key); sig = codeflash_output # 292μs -> 238μs (22.8% faster)
        results.add(sig)
    # Every distinct base_id should yield a distinct signature
    assert len(results) == 100

def test_signature_performance_large_batch():
    # Test that function can handle a batch of 500 signatures quickly
    key = "batchkey"
    for i in range(500):
        base_id = "item" + str(i)
        codeflash_output = _signature(base_id, key); sig = codeflash_output # 1.42ms -> 1.15ms (23.5% faster)

def test_signature_output_length():
    # The output length should be 43 chars (base64url of 32 bytes, unpadded)
    base_id = "sample"
    secret_key = "lenkey"
    codeflash_output = _signature(base_id, secret_key); sig = codeflash_output # 11.6μs -> 9.84μs (17.5% faster)
    assert len(sig) == 43

def test_signature_output_charset():
    # Output should only contain base64url-safe chars
    base_id = "check"
    secret_key = "safe"
    codeflash_output = _signature(base_id, secret_key); sig = codeflash_output # 8.47μs -> 7.26μs (16.6% faster)
    allowed = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_")
    assert set(sig) <= allowed

# --- Determinism Test Case ---

def test_signature_determinism():
    # The same input always produces the same output
    base_id = "repeat"
    secret_key = "repeatkey"
    codeflash_output = _signature(base_id, secret_key); sig1 = codeflash_output # 8.38μs -> 7.41μs (13.0% faster)
    codeflash_output = _signature(base_id, secret_key); sig2 = codeflash_output # 3.78μs -> 3.09μs (22.3% faster)
    assert sig1 == sig2

# --- Mutation Testing Catchers ---

def test_signature_changes_with_base_id():
    # Changing base_id should change the signature
    key = "mutkey"
    codeflash_output = _signature("foo", key); sig1 = codeflash_output # 8.34μs -> 7.10μs (17.4% faster)
    codeflash_output = _signature("bar", key); sig2 = codeflash_output # 3.95μs -> 3.40μs (16.2% faster)
    assert sig1 != sig2

def test_signature_changes_with_secret_key():
    # Changing secret_key should change the signature
    base_id = "foobar"
    codeflash_output = _signature(base_id, "key1"); sig1 = codeflash_output # 8.18μs -> 7.07μs (15.7% faster)
    codeflash_output = _signature(base_id, "key2"); sig2 = codeflash_output # 3.86μs -> 3.21μs (20.1% faster)
    assert sig1 != sig2
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from bokeh.util.token import _signature
import pytest

def test__signature():
    with pytest.raises(TypeError, match="a\\ bytes\\-like\\ object\\ is\\ required,\\ not\\ 'SymbolicBytes'"):
        _signature('', b'')

def test__signature_2():
    with pytest.raises(AssertionError):
        _signature('', None)
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| codeflash_concolic_sstvtaha/tmpoxdl72nn/test_concolic_coverage.py::test__signature_2 | 2.97μs | 1.37μs | 117%✅ |

To edit these changes, `git checkout codeflash/optimize-_signature-mhw7idzv` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 16:20
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Nov 12, 2025