⚡️ Speed up function `_ext_use_mathjax` by 2,554% #119

codeflash-ai · 2025-11-12T15:47:03Z

📄 2,554% (25.54x) speedup for `_ext_use_mathjax` in `src/bokeh/embed/bundle.py`

⏱️ Runtime : 86.8 milliseconds → 3.27 milliseconds (best of 250 runs)

📝 Explanation and details

The optimization significantly improves performance by precomputing a lookup map that groups models by their top-level package name, eliminating a costly nested loop structure.

Key optimization:

Before: For each object in all_objs, the code iterated through ALL models in HasProps.model_class_reverse_map.values() (~15,000+ models based on profiler data) and checked model.__module__.startswith(name) for each one
After: Models are preprocessed once into a dictionary model_map keyed by top-level package name, so the inner loop only processes relevant models for each package

Performance impact from profiler data:

The original nested loop executed 1.5 million iterations (1,500,520 hits on startswith check)
The optimized version reduces this to just 3,220 iterations in the inner loop
Total runtime drops from 650ms to 17.6ms in _query_extensions (37x faster)

Why this works:

The original startswith() check on every model is expensive when repeated across thousands of models
Dictionary lookup model_map.get(name, ()) is O(1) vs O(N) iteration
Most objects share the same top-level package names, so the precomputation amortizes well

Impact on workloads:
Based on function_references, this function is called by _use_mathjax() which determines MathJax requirements for Bokeh document embedding. The optimization particularly benefits scenarios with:

Large documents: Test cases show 70-90x speedups for 500+ objects
Many extension packages: Each unique package triggers model scanning, now much faster
Repeated calls: The optimization scales better as document complexity grows

The optimization maintains identical behavior while dramatically reducing computational complexity from O(objects × total_models) to O(objects + total_models).

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 34 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	88.2%

🌀 Generated Regression Tests and Runtime

from __future__ import annotations

from typing import Callable

# imports
import pytest  # used for our unit tests
from bokeh.embed.bundle import _ext_use_mathjax


# --- Minimal stubs to allow testing ---
class HasProps:
    # Simulate Bokeh's model_class_reverse_map
    model_class_reverse_map = {}

    def __init_subclass__(cls, **kwargs):
        # Register subclass in reverse map
        HasProps.model_class_reverse_map[cls.__name__] = cls

# Simulate bokeh.models.text.MathText
class MathText(HasProps):
    pass

# Simulate other model classes for test purposes
class SomeOtherModel(HasProps):
    pass

class AnotherModel(HasProps):
    pass

# Helper function to simulate __view_module__ attribute
def make_obj(cls, view_module):
    obj = cls.__new__(cls)
    obj.__view_module__ = view_module
    return obj
from bokeh.embed.bundle import _ext_use_mathjax

# --- Unit tests ---

# Basic Test Cases

def test_empty_set_returns_false():
    # No objects: should not require MathJax
    codeflash_output = _ext_use_mathjax(set()) # 4.45μs -> 85.6μs (94.8% slower)

def test_no_mathtext_objects_returns_false():
    # Only non-MathText objects
    obj1 = make_obj(SomeOtherModel, "external_pkg.models")
    obj2 = make_obj(AnotherModel, "external_pkg.models")
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 76.9μs -> 70.2μs (9.55% faster)

def test_mathtext_object_triggers_true():
    # At least one object from a module that includes MathText
    obj1 = make_obj(MathText, "external_pkg.models")
    codeflash_output = _ext_use_mathjax({obj1}) # 55.3μs -> 66.3μs (16.6% slower)

def test_mixed_objects_one_mathtext_triggers_true():
    # Mixed objects, one is MathText
    obj1 = make_obj(SomeOtherModel, "external_pkg.models")
    obj2 = make_obj(MathText, "external_pkg.models")
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 52.9μs -> 64.4μs (17.9% slower)

def test_multiple_mathtext_objects_only_one_needed():
    # Multiple MathText objects, should still return True
    obj1 = make_obj(MathText, "external_pkg.models")
    obj2 = make_obj(MathText, "external_pkg.models")
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 52.3μs -> 64.7μs (19.0% slower)

# Edge Test Cases

def test_object_with_implementation_skipped():
    # Object with __implementation__ should be skipped
    obj = make_obj(MathText, "external_pkg.models")
    obj.__implementation__ = "dummy"
    codeflash_output = _ext_use_mathjax({obj}) # 3.81μs -> 63.6μs (94.0% slower)

def test_object_from_bokeh_module_skipped():
    # Object from 'bokeh' module should be skipped
    obj = make_obj(MathText, "bokeh.models")
    codeflash_output = _ext_use_mathjax({obj}) # 3.92μs -> 64.0μs (93.9% slower)

def test_duplicate_module_names_only_checked_once():
    # Two objects from same module, only one check should occur
    obj1 = make_obj(SomeOtherModel, "external_pkg.models")
    obj2 = make_obj(MathText, "external_pkg.models")
    # Should still return True, even if module is duplicated
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 60.5μs -> 65.1μs (6.96% slower)

def test_object_with_nonstandard_view_module():
    # __view_module__ with unusual format
    obj = make_obj(MathText, "external_pkg")
    codeflash_output = _ext_use_mathjax({obj}) # 53.0μs -> 63.4μs (16.5% slower)

def test_object_with_view_module_with_extra_dots():
    # __view_module__ with extra dots
    obj = make_obj(MathText, "external_pkg.models.text")
    codeflash_output = _ext_use_mathjax({obj}) # 51.2μs -> 63.6μs (19.5% slower)

def test_no_models_in_module_returns_false():
    # Object from module with no registered models
    obj = make_obj(SomeOtherModel, "unknown_pkg.models")
    # Remove all models from 'unknown_pkg'
    # (simulate by ensuring no model class has __module__ starting with 'unknown_pkg')
    codeflash_output = _ext_use_mathjax({obj}) # 50.7μs -> 63.2μs (19.8% slower)

def test_non_mathtext_subclass_model_returns_false():
    # Add a model in a module, but not a MathText subclass
    class NotMathText(HasProps):
        pass
    obj = make_obj(NotMathText, "external_pkg.models")
    codeflash_output = _ext_use_mathjax({obj}) # 51.8μs -> 65.1μs (20.4% slower)

# Large Scale Test Cases

def test_large_number_of_non_mathtext_objects_returns_false():
    # Create 500 objects, none are MathText
    objs = {make_obj(SomeOtherModel, f"external_pkg{i}.models") for i in range(500)}
    codeflash_output = _ext_use_mathjax(objs) # 13.3ms -> 183μs (7191% faster)

def test_large_number_of_objects_one_mathtext_returns_true():
    # Create 499 non-MathText objects, and one MathText object
    objs = {make_obj(SomeOtherModel, f"external_pkg{i}.models") for i in range(499)}
    mathtext_obj = make_obj(MathText, "external_pkg500.models")
    objs.add(mathtext_obj)
    codeflash_output = _ext_use_mathjax(objs) # 13.3ms -> 183μs (7179% faster)

def test_many_modules_some_with_mathtext_returns_true():
    # Create 100 modules, half with MathText, half without
    objs = set()
    for i in range(50):
        objs.add(make_obj(MathText, f"ext_pkg{i}.models"))
    for i in range(50, 100):
        objs.add(make_obj(SomeOtherModel, f"ext_pkg{i}.models"))
    codeflash_output = _ext_use_mathjax(objs) # 2.70ms -> 97.2μs (2679% faster)

def test_many_modules_none_with_mathtext_returns_false():
    # Create 100 modules, none with MathText
    objs = {make_obj(SomeOtherModel, f"ext_pkg{i}.models") for i in range(100)}
    codeflash_output = _ext_use_mathjax(objs) # 2.68ms -> 91.0μs (2849% faster)

def test_performance_with_maximum_objects():
    # 999 objects, only last is MathText
    objs = {make_obj(SomeOtherModel, f"ext_pkg{i}.models") for i in range(998)}
    mathtext_obj = make_obj(MathText, "ext_pkg999.models")
    objs.add(mathtext_obj)
    codeflash_output = _ext_use_mathjax(objs) # 26.6ms -> 286μs (9204% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

from typing import Callable

# imports
import pytest  # used for our unit tests
from bokeh.embed.bundle import _ext_use_mathjax


# Minimal HasProps stub for testing
class HasProps:
    model_class_reverse_map = {}

    def __init__(self):
        pass

# Simulate the MathText class for testing
class MathText(HasProps):
    pass

# Simulate other model classes
class NotMathText(HasProps):
    pass

class SubMathText(MathText):
    pass

class AnotherClass(HasProps):
    pass

# Simulate __view_module__ property for objects
class ObjWithViewModule(HasProps):
    def __init__(self, module_name, implementation=False):
        self.__view_module__ = module_name
        if implementation:
            self.__implementation__ = True

# Patch HasProps.model_class_reverse_map for tests
def patch_model_class_reverse_map(classes):
    HasProps.model_class_reverse_map = {cls.__name__: cls for cls in classes}
from bokeh.embed.bundle import _ext_use_mathjax

# -----------------
# unit tests
# -----------------

# Basic Test Cases

def test_empty_objs_returns_false():
    """Test with empty input set: should return False."""
    patch_model_class_reverse_map([MathText, NotMathText])
    codeflash_output = _ext_use_mathjax(set()) # 4.87μs -> 81.3μs (94.0% slower)

def test_no_mathtext_model_returns_false():
    """Test with objects not referencing MathText: should return False."""
    patch_model_class_reverse_map([NotMathText, AnotherClass])
    objs = {ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 72.3μs -> 72.9μs (0.751% slower)

def test_single_mathtext_model_returns_true():
    """Test with one object referencing MathText: should return True."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 62.4μs -> 70.9μs (11.9% slower)

def test_multiple_objs_one_mathtext_returns_true():
    """Test with multiple objects, one referencing MathText: should return True."""
    patch_model_class_reverse_map([MathText, NotMathText, AnotherClass])
    objs = {ObjWithViewModule("externalmod"), ObjWithViewModule("anothermod")}
    codeflash_output = _ext_use_mathjax(objs) # 85.7μs -> 71.5μs (19.9% faster)

def test_multiple_objs_no_mathtext_returns_false():
    """Test with multiple objects, none referencing MathText: should return False."""
    patch_model_class_reverse_map([NotMathText, AnotherClass])
    objs = {ObjWithViewModule("externalmod"), ObjWithViewModule("anothermod")}
    codeflash_output = _ext_use_mathjax(objs) # 82.7μs -> 72.5μs (14.1% faster)

# Edge Test Cases

def test_obj_with_implementation_skipped():
    """Test that objects with __implementation__ are skipped."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod", implementation=True)}
    codeflash_output = _ext_use_mathjax(objs) # 4.29μs -> 71.9μs (94.0% slower)

def test_obj_with_bokeh_module_skipped():
    """Test that objects with __view_module__ 'bokeh' are skipped."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("bokeh")}
    codeflash_output = _ext_use_mathjax(objs) # 4.43μs -> 72.4μs (93.9% slower)

def test_duplicate_module_names():
    """Test that duplicate module names do not cause false positives."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod"), ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 68.0μs -> 72.4μs (5.97% slower)

def test_subclass_of_mathtext_returns_true():
    """Test that subclasses of MathText are detected."""
    patch_model_class_reverse_map([SubMathText, NotMathText])
    objs = {ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 60.1μs -> 71.8μs (16.3% slower)

def test_module_name_with_dot():
    """Test module names with dots are parsed correctly."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod.submod")}
    codeflash_output = _ext_use_mathjax(objs) # 60.4μs -> 70.0μs (13.6% slower)

def test_model_class_module_name_mismatch():
    """Test that models with module names not matching __view_module__ are ignored."""
    class OffModuleMathText(MathText):
        pass
    OffModuleMathText.__module__ = "othermod"
    patch_model_class_reverse_map([OffModuleMathText, NotMathText])
    objs = {ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 59.1μs -> 72.0μs (17.9% slower)

def test_model_class_module_name_partial_match():
    """Test that models with module names starting with __view_module__ are detected."""
    class PartialModuleMathText(MathText):
        pass
    PartialModuleMathText.__module__ = "externalmod.submod"
    patch_model_class_reverse_map([PartialModuleMathText, NotMathText])
    objs = {ObjWithViewModule("externalmod")}
    codeflash_output = _ext_use_mathjax(objs) # 59.2μs -> 74.1μs (20.1% slower)

def test_obj_with_missing_view_module_attribute():
    """Test objects missing __view_module__ attribute raise AttributeError."""
    patch_model_class_reverse_map([MathText, NotMathText])
    class BadObj(HasProps):
        pass
    objs = {BadObj()}
    with pytest.raises(AttributeError):
        _ext_use_mathjax(objs) # 5.18μs -> 72.8μs (92.9% slower)

# Large Scale Test Cases

def test_large_number_of_objs_and_models():
    """Test with a large number of objects and model classes."""
    # Create 500 dummy model classes, only one is a MathText subclass
    models = []
    for i in range(499):
        cls = type(f"DummyModel{i}", (HasProps,), {})
        cls.__module__ = f"mod{i}"
        models.append(cls)
    mathtext_cls = type("LargeMathText", (MathText,), {})
    mathtext_cls.__module__ = "bigmod"
    models.append(mathtext_cls)
    patch_model_class_reverse_map(models)
    # Create 500 objects, only one with bigmod
    objs = {ObjWithViewModule(f"mod{i}") for i in range(499)}
    objs.add(ObjWithViewModule("bigmod"))
    codeflash_output = _ext_use_mathjax(objs) # 13.4ms -> 190μs (6911% faster)

def test_large_number_of_objs_no_mathtext():
    """Test with a large number of objects, none referencing MathText."""
    models = []
    for i in range(500):
        cls = type(f"DummyModel{i}", (HasProps,), {})
        cls.__module__ = f"mod{i}"
        models.append(cls)
    patch_model_class_reverse_map(models)
    objs = {ObjWithViewModule(f"mod{i}") for i in range(500)}
    codeflash_output = _ext_use_mathjax(objs) # 13.4ms -> 187μs (7031% faster)

def test_large_number_of_duplicate_module_names():
    """Test with many objects having duplicate module names."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod") for _ in range(1000)}
    codeflash_output = _ext_use_mathjax(objs) # 161μs -> 187μs (13.9% slower)

def test_large_number_of_implementation_objs():
    """Test with many objects, all with __implementation__, should skip all."""
    patch_model_class_reverse_map([MathText, NotMathText])
    objs = {ObjWithViewModule("externalmod", implementation=True) for _ in range(1000)}
    codeflash_output = _ext_use_mathjax(objs) # 32.0μs -> 117μs (72.8% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-_ext_use_mathjax-mhw6azz9 and push.

The optimization significantly improves performance by **precomputing a lookup map** that groups models by their top-level package name, eliminating a costly nested loop structure. **Key optimization:** - **Before**: For each object in `all_objs`, the code iterated through ALL models in `HasProps.model_class_reverse_map.values()` (~15,000+ models based on profiler data) and checked `model.__module__.startswith(name)` for each one - **After**: Models are preprocessed once into a dictionary `model_map` keyed by top-level package name, so the inner loop only processes relevant models for each package **Performance impact from profiler data:** - The original nested loop executed **1.5 million iterations** (1,500,520 hits on `startswith` check) - The optimized version reduces this to just **3,220 iterations** in the inner loop - Total runtime drops from 650ms to 17.6ms in `_query_extensions` (**37x faster**) **Why this works:** - The original `startswith()` check on every model is expensive when repeated across thousands of models - Dictionary lookup `model_map.get(name, ())` is O(1) vs O(N) iteration - Most objects share the same top-level package names, so the precomputation amortizes well **Impact on workloads:** Based on `function_references`, this function is called by `_use_mathjax()` which determines MathJax requirements for Bokeh document embedding. The optimization particularly benefits scenarios with: - **Large documents**: Test cases show 70-90x speedups for 500+ objects - **Many extension packages**: Each unique package triggers model scanning, now much faster - **Repeated calls**: The optimization scales better as document complexity grows The optimization maintains identical behavior while dramatically reducing computational complexity from O(objects × total_models) to O(objects + total_models).

codeflash-ai bot requested a review from mashraf-222 November 12, 2025 15:47

codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up function `_ext_use_mathjax` by 2,554% #119

⚡️ Speed up function `_ext_use_mathjax` by 2,554% #119

Uh oh!

codeflash-ai bot commented Nov 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

⚡️ Speed up function _ext_use_mathjax by 2,554% #119

Are you sure you want to change the base?

⚡️ Speed up function _ext_use_mathjax by 2,554% #119

Uh oh!

Conversation

codeflash-ai bot commented Nov 12, 2025

📄 2,554% (25.54x) speedup for _ext_use_mathjax in src/bokeh/embed/bundle.py

📝 Explanation and details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

⚡️ Speed up function `_ext_use_mathjax` by 2,554% #119

⚡️ Speed up function `_ext_use_mathjax` by 2,554% #119

📄 2,554% (25.54x) speedup for `_ext_use_mathjax` in `src/bokeh/embed/bundle.py`