⚡️ Speed up function speedup_critic by 15% in PR #555 (refinement) #557

Open · wants to merge 1 commit into base branch `refinement`

Conversation

codeflash-ai[bot] (Contributor) commented on Jul 17, 2025

⚡️ This pull request contains optimizations for PR #555

If you approve this dependent PR, these changes will be merged into the original PR branch refinement.

This PR will be automatically closed if the original PR is merged.


📄 15% (0.15x) speedup for speedup_critic in codeflash/result/critic.py

⏱️ Runtime: 1.84 milliseconds → 1.60 milliseconds (best of 56 runs)

📝 Explanation and details

Here’s an optimized version that preserves all existing function signatures, logic, and return values but reduces unnecessary overhead, short-circuits early, and eliminates redundant object lookups and function calls.

Key Optimizations:

  • Bind get_cached_gh_event_data to a local variable early in get_pr_number to avoid repeated imports and global lookups.
  • Import get_cached_gh_event_data once at module top level; importing it inside the function on every call is much slower.
  • Use early returns in speedup_critic after fast checks to avoid unnecessary branches and function calls.
  • Remove unneeded bool() wrappers where the result is already bool.
  • Use direct access to already-imported functions instead of accessing via module (inlining env_utils.get_pr_number).
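
Taken together, those bullets suggest a function shaped roughly like the sketch below. This is a reconstruction from the description and the regression tests, not the actual codeflash source; the constant, the noise-floor rules, and the `get_pr_number` stand-in are all assumptions:

```python
from __future__ import annotations

# Hypothetical reconstruction of speedup_critic; names and thresholds are
# assumptions inferred from the PR description, not the real implementation.
MIN_IMPROVEMENT_THRESHOLD = 0.01  # assumed 1% floor


def get_pr_number() -> int | None:
    return None  # stand-in for env_utils.get_pr_number (None = not in GH Actions)


def speedup_critic_sketch(candidate_runtime: int, original_runtime: int,
                          best_runtime_until_now: int | None,
                          disable_gh_action_noise: bool = False) -> bool:
    # Short runtimes are dominated by timing jitter, so use a larger floor.
    noise_floor = (3 * MIN_IMPROVEMENT_THRESHOLD
                   if original_runtime < 10_000 else MIN_IMPROVEMENT_THRESHOLD)
    # GH Actions runners are noisier still: double the floor unless disabled.
    if not disable_gh_action_noise and get_pr_number() is not None:
        noise_floor *= 2
    perf_gain = (original_runtime - candidate_runtime) / candidate_runtime
    if perf_gain <= noise_floor:
        return False  # early return: improvement is within noise
    if best_runtime_until_now is not None:
        return candidate_runtime < best_runtime_until_now
    return True
```

The early `return False` after the noise-floor check is the fast-path short-circuit the bullets describe.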

Summary:
All function return values and signatures are preserved. Redundant lookups are eliminated, external calls are reduced, and fast-path branches short-circuit unnecessary logic to reduce overall runtime and memory allocations. Comments are preserved unless the associated code was optimized.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 6 Passed |
| 🌀 Generated Regression Tests | 4018 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| test_critic.py::test_speedup_critic | 3.69μs | 3.48μs | ✅ 6.07% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests

from codeflash.result.critic import speedup_critic

# Dummy MIN_IMPROVEMENT_THRESHOLD for testing
MIN_IMPROVEMENT_THRESHOLD = 0.01

# Dummy env_utils for testing
class DummyEnvUtils:
    _pr_number = None

    @classmethod
    def set_pr_number(cls, value):
        cls._pr_number = value

    @classmethod
    def get_pr_number(cls):
        return cls._pr_number

# Stand-in for the env_utils module used by these tests. Note: speedup_critic
# reads the real env_utils, so set_pr_number here only takes effect if this
# dummy is patched into that module.
env_utils = DummyEnvUtils

# Dummy OptimizedCandidateResult for testing
class OptimizedCandidateResult:
    def __init__(self, best_test_runtime: int):
        self.best_test_runtime = best_test_runtime


def test_basic_negative_improvement():
    """
    Test case where the optimization is actually slower.
    """
    orig = 100_000
    opt = 120_000
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.88μs -> 1.61μs (16.8% faster)
    assert not codeflash_output  # a slower candidate must be rejected

def test_best_runtime_until_now_none_vs_value():
    """
    Test that best_runtime_until_now is handled correctly.
    """
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(opt)
    # best_runtime_until_now is None, so only perf_gain > noise_floor matters
    codeflash_output = speedup_critic(candidate, orig, None) # 1.81μs -> 1.62μs (11.8% faster)
    assert codeflash_output
    # best_runtime_until_now is worse (higher), so should still be True
    codeflash_output = speedup_critic(candidate, orig, 90_000) # 942ns -> 832ns (13.2% faster)
    assert codeflash_output
    # best_runtime_until_now is better (lower), so should be False
    codeflash_output = speedup_critic(candidate, orig, 70_000) # 601ns -> 460ns (30.7% faster)
    assert not codeflash_output

def test_disable_gh_action_noise_flag():
    """
    Test that disabling the GH Action noise disables the noise floor doubling.
    """
    orig = 5_000  # < 10_000, so noise_floor = 3*MIN_IMPROVEMENT_THRESHOLD
    opt = 4_950  # Less than 1% improvement
    candidate = OptimizedCandidateResult(opt)
    env_utils.set_pr_number(123)  # Simulate GH Actions
    # With disable_gh_action_noise, should use lower noise floor
    codeflash_output = speedup_critic(candidate, orig, None, disable_gh_action_noise=True) # 1.79μs -> 1.80μs (0.555% slower)
    # Without disabling, noise floor is doubled, so improvement is not enough
    codeflash_output = speedup_critic(candidate, orig, None, disable_gh_action_noise=False) # 1.44μs -> 1.22μs (18.0% faster)

# ------------------------
# Edge Test Cases
# ------------------------



def test_edge_both_runtimes_zero():
    """
    Test edge case where both runtimes are zero.
    """
    orig = 0
    opt = 0
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.67μs -> 1.51μs (10.6% faster)

def test_edge_runtime_just_below_noise_floor():
    """
    Test where improvement is just below the noise floor.
    """
    orig = 10_000
    # MIN_IMPROVEMENT_THRESHOLD = 0.01, so need perf_gain <= 0.01 to fail
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD)) + 1  # slightly worse than threshold
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.64μs -> 1.45μs (13.1% faster)
    assert not codeflash_output  # improvement is below the noise floor

def test_edge_runtime_just_above_noise_floor():
    """
    Test where improvement is just above the noise floor.
    """
    orig = 10_000
    # MIN_IMPROVEMENT_THRESHOLD = 0.01, so need perf_gain > 0.01 to pass
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD))  # just at threshold
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.64μs -> 1.30μs (26.1% faster)
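
The integer boundary arithmetic in these two threshold tests can be checked directly; this assumes perf_gain = (original - optimized) / optimized, which is what the comments imply:

```python
# Verifying the "just at" / "just below" threshold boundaries used above.
orig = 10_000
threshold = 0.01  # MIN_IMPROVEMENT_THRESHOLD

opt_at = int(orig / (1 + threshold))  # 9900 -> gain = 100/9900 ≈ 0.0101
opt_below = opt_at + 1                # 9901 -> gain =  99/9901 ≈ 0.0100

gain_at = (orig - opt_at) / opt_at
gain_below = (orig - opt_below) / opt_below
print(gain_at > threshold, gain_below > threshold)  # True False
```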

def test_edge_small_original_runtime_noise_floor():
    """
    Test noise floor logic when original runtime is below 10_000.
    """
    orig = 9_000  # < 10_000
    # noise_floor = 3*MIN_IMPROVEMENT_THRESHOLD = 0.03
    opt = int(orig / (1 + 0.03))  # just at threshold
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.80μs -> 1.64μs (9.80% faster)

def test_edge_github_actions_noise_floor():
    """
    Test that noise floor is doubled in GH Actions mode.
    """
    orig = 9_000
    # noise_floor = 3*MIN_IMPROVEMENT_THRESHOLD = 0.03, doubled = 0.06
    env_utils.set_pr_number(42)
    opt = int(orig / (1 + 0.06))  # just at threshold
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.84μs -> 1.56μs (18.0% faster)
    # Now, make improvement just below doubled noise floor
    opt = int(orig / (1 + 0.059)) + 1
    candidate = OptimizedCandidateResult(opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 872ns -> 822ns (6.08% faster)
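
Working through the doubled-floor arithmetic in this GH Actions test (same assumed perf_gain formula; 3 × 1% floor, doubled to 6%):

```python
# Checking the "just above" / "just below" boundaries around the doubled floor.
orig = 9_000
doubled_floor = 0.06  # 3 * MIN_IMPROVEMENT_THRESHOLD, doubled in GH Actions mode

opt_at = int(orig / (1 + 0.06))          # 8490 -> gain = 510/8490 ≈ 0.0601
opt_below = int(orig / (1 + 0.059)) + 1  # 8499 -> gain = 501/8499 ≈ 0.0589

gain_at = (orig - opt_at) / opt_at
gain_below = (orig - opt_below) / opt_below
```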

def test_edge_best_runtime_equal_to_candidate():
    """
    Test when best_runtime_until_now equals candidate's best_test_runtime.
    """
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(opt)
    # perf_gain is above threshold, but candidate is not better than best_runtime_until_now
    codeflash_output = speedup_critic(candidate, orig, 80_000) # 1.88μs -> 1.69μs (11.2% faster)
    assert not codeflash_output  # equal to the best runtime so far, so not an improvement

# ------------------------
# Large Scale Test Cases
# ------------------------


def test_large_scale_runtime_near_threshold():
    """
    Test with many candidates whose improvements are near the threshold.
    """
    orig = 1_000_000
    threshold = MIN_IMPROVEMENT_THRESHOLD
    # Candidate runtimes just at and just below the threshold
    opt_just_above = int(orig / (1 + threshold))
    opt_just_below = int(orig / (1 + threshold)) + 2
    candidates = [OptimizedCandidateResult(opt_just_above) for _ in range(500)]
    candidates += [OptimizedCandidateResult(opt_just_below) for _ in range(500)]
    for candidate in candidates:
        codeflash_output = speedup_critic(candidate, orig, None)

def test_large_scale_github_actions_mode():
    """
    Test large scale with GH Actions noise floor enabled.
    """
    env_utils.set_pr_number(100)
    orig = 1_000_000
    # noise_floor = 0.01 * 2 = 0.02
    opt_just_above = int(orig / (1 + 0.021))
    opt_just_below = int(orig / (1 + 0.019)) + 1
    candidates = [OptimizedCandidateResult(opt_just_above) for _ in range(500)]
    candidates += [OptimizedCandidateResult(opt_just_below) for _ in range(500)]
    for candidate in candidates:
        codeflash_output = speedup_critic(candidate, orig, None)



from __future__ import annotations

# imports
import pytest
from codeflash.result.critic import speedup_critic
# NOTE: module path assumed from the optimization notes above
from codeflash.code_utils import env_utils

# --- Begin stubs and constants for isolated testing ---
# Minimal local stand-ins, so the threshold arithmetic in the tests below is explicit

# Simulate the MIN_IMPROVEMENT_THRESHOLD constant
MIN_IMPROVEMENT_THRESHOLD = 0.01  # 1% improvement threshold

# Simulate OptimizedCandidateResult dataclass
class OptimizedCandidateResult:
    def __init__(self, best_test_runtime: int):
        self.best_test_runtime = best_test_runtime

# unit tests

# --- Basic Test Cases ---

def test_edge_disable_gh_action_noise_small_runtime(monkeypatch):
    """Test: disable_gh_action_noise disables doubling even for small runtimes."""
    monkeypatch.setattr(env_utils, "get_pr_number", lambda: 123)
    orig = 9_000
    opt = int(orig * 0.96)  # 4% improvement, noise floor is 3% (not doubled)
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None, disable_gh_action_noise=True)
    assert codeflash_output  # 4% gain clears the 3% floor once GH Actions doubling is disabled

# --- Large Scale Test Cases ---

@pytest.mark.parametrize("orig,opt,prev_best,expected", [
    # representative scenarios around the threshold
    (100_000, 90_000, None, True),
    (100_000, 89_000, 90_000, True),
    (100_000, 89_000, 89_500, True),
    (100_000, 95_000, 94_000, False),  # Not better than prev_best
    (100_000, 99_500, None, False),    # Below threshold
])
def test_large_scale_varied(monkeypatch, orig, opt, prev_best, expected):
    """Test: Large scale, multiple scenarios in one parametrized test."""
    monkeypatch.setattr(env_utils, "get_pr_number", lambda: None)
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, prev_best)
    assert bool(codeflash_output) == expected

def test_large_scale_many_candidates(monkeypatch):
    """Test: Simulate a batch of 500 candidates, all with small random improvements."""
    import random
    monkeypatch.setattr(env_utils, "get_pr_number", lambda: None)
    orig = 100_000
    # Generate 500 candidates with improvement between 0.5% and 2%
    count_pass = 0
    for i in range(500):
        improvement = random.uniform(0.005, 0.02)
        opt = int(orig * (1 - improvement))
        candidate = OptimizedCandidateResult(best_test_runtime=opt)
        codeflash_output = speedup_critic(candidate, orig, None); result = codeflash_output
        # Improvements clearly above MIN_IMPROVEMENT_THRESHOLD (1%) should pass
        if result:
            count_pass += 1
    assert 0 <= count_pass <= 500

To edit these changes, run `git checkout codeflash/optimize-pr555-2025-07-17T21.19.50` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 17, 2025
@codeflash-ai codeflash-ai bot mentioned this pull request Jul 17, 2025