
Conversation

Contributor

@mohammedahmed18 mohammedahmed18 commented Dec 18, 2025

User description

fixes CF-938


PR Type

Enhancement


Description

  • Introduce effort-based optimization controls

  • Replace fixed constants with dynamic effort values

  • Add CLI flag for effort selection

  • Align API payload keys and defaults


Diagram Walkthrough

```mermaid
flowchart LR
  CLI["CLI --effort flag"] -- "sets effort level" --> Optimizer["Function optimizer"]
  Config["Effort config (values, keys)"] -- "get_effort_value()" --> Optimizer
  Optimizer -- "n_candidates / tests / repairs / refine" --> AISvc["AI service payloads"]
  AISvc -- "n_candidates / n_candidates_lp" --> Server["Backend endpoints"]
```

File Walkthrough

Relevant files
Enhancement
aiservice.py
Align API payload keys and candidate defaults                       

codeflash/api/aiservice.py

  • Rename payload keys to n_candidates and n_candidates_lp
  • Remove deprecated effective candidate constants from payload
  • Adjust default LP candidates to 8
  • Maintain metadata fields unchanged
+3/-6     
cli.py
Add CLI effort level option                                                           

codeflash/cli_cmds/cli.py

  • Add --effort CLI flag
  • Provide choices: low, medium, high
  • Default effort level set to medium
+3/-0     
config_consts.py
Introduce effort-based configuration system                           

codeflash/code_utils/config_consts.py

  • Remove many fixed candidate/test constants
  • Add EffortLevel, EffortKeys, and values map
  • Implement get_effort_value helper
  • Keep effective looping time and review limit
+50/-19 
git_utils.py
Remove worktree utilities and tidy imports                             

codeflash/code_utils/git_utils.py

  • Remove unused worktree management helpers
  • Drop dependency on candidate count constant
  • Cleanup unused imports
+0/-34   
function_optimizer.py
Apply effort-based tuning throughout optimizer                     

codeflash/optimization/function_optimizer.py

  • Replace constants with get_effort_value across flows
  • Effort-based counts for candidates, LP, tests, repairs
  • Effort-driven refinement thresholds and top selection
  • Update API calls to pass effort-derived values
+20/-17 
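The --effort option described in the cli.py entry above can be sketched with argparse. This is a hypothetical reconstruction of the flag from the walkthrough (choices low/medium/high, default medium); the actual wiring in codeflash/cli_cmds/cli.py may differ in help text and parser setup.

```python
import argparse

# Hypothetical sketch of the --effort flag described in the walkthrough;
# the real parser in codeflash/cli_cmds/cli.py may differ.
parser = argparse.ArgumentParser(prog="codeflash")
parser.add_argument(
    "--effort",
    choices=["low", "medium", "high"],
    default="medium",
    help="How much optimization effort to spend (default: medium)",
)

print(parser.parse_args([]).effort)                    # medium (default)
print(parser.parse_args(["--effort", "high"]).effort)  # high
```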

@github-actions

github-actions bot commented Dec 18, 2025

PR Reviewer Guide 🔍

(Review updated until commit 3e20a37)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The dict literal for effort values mixes key types: other entries use .value when indexing by EffortKeys, but EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT is used directly without .value. This likely makes get_effort_value fail to find the key at runtime for that entry.

    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
}
Type Hint

EFFORT_VALUES and get_effort_value use any rather than Any. This will not type-check and may fail linting; prefer typing.Any.

EFFORT_VALUES: dict[str, dict[EffortLevel, any]] = {
    EffortKeys.N_OPTIMIZER_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
    EffortKeys.N_OPTIMIZER_LP_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 5, EffortLevel.HIGH: 6},
    # we don't use effort with generated tests for now
    EffortKeys.N_GENERATED_TESTS.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 2, EffortLevel.HIGH: 2},
    # maximum number of repairs we will do for each function
    EffortKeys.MAX_CODE_REPAIRS_PER_TRACE.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
    # if the percentage of unmatched tests is greater than this, we won't fix it (lowering this value makes the repair more stricted)
    # on the low effort we lower the limit to 20% to be more strict (less repairs)
    EffortKeys.REPAIR_UNMATCHED_PERCENTAGE_LIMIT.value: {
        EffortLevel.LOW: 0.2,
        EffortLevel.MEDIUM: 0.4,
        EffortLevel.HIGH: 0.5,
    },
    # when valid optimizations count is N or less, refine all optimizations
    EffortKeys.REFINE_ALL_THRESHOLD.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
    # Top valid candidates percentage for refinements
    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
}


def get_effort_value(key: EffortKeys, effort: EffortLevel) -> any:
    key_str = key.value
    if key_str in EFFORT_VALUES:
        if effort in EFFORT_VALUES[key_str]:
            return EFFORT_VALUES[key_str][effort]
        msg = f"Invalid effort level: {effort}"
        raise ValueError(msg)
    msg = f"Invalid key: {key_str}"
    raise ValueError(msg)
Logic Change

Refinement selection switched from percentage-based to fixed top-N from effort config. Validate that low/medium/high values produce intended proportions across different candidate counts to avoid refining too few/many items.

top_n_candidates = int(
    min(
        get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort),
        len(self.all_refinements_data),
    )
)

if top_n_candidates == len(self.all_refinements_data) or len(self.all_refinements_data) <= get_effort_value(
    EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort
):
    for data in self.all_refinements_data:
        future_refinements.append(self.refine_optimizations([data]))  # noqa: PERF401
else:
    diff_lens_list = []
    runtimes_list = []
    for c in self.all_refinements_data:
        diff_lens_list.append(diff_length(c.original_source_code, c.optimized_source_code))
        runtimes_list.append(c.optimized_code_runtime)

    runtime_w, diff_w = REFINED_CANDIDATE_RANKING_WEIGHTS
    weights = choose_weights(runtime=runtime_w, diff=diff_w)

    runtime_norm = normalize_by_max(runtimes_list)
    diffs_norm = normalize_by_max(diff_lens_list)
    # the lower the better
    score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
    top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]

    for idx in top_indecies:
        data = self.all_refinements_data[idx]
        future_refinements.append(self.refine_optimizations([data]))
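The helpers choose_weights, normalize_by_max, and create_score_dictionary_from_metrics are not shown in this excerpt; the stand-ins below are guesses at their behavior, meant only to illustrate the lower-is-better top-N ranking the review asks to validate:

```python
def normalize_by_max(xs: list[float]) -> list[float]:
    # Scale values into [0, 1] by the maximum (guess at the real helper).
    m = max(xs) or 1.0
    return [x / m for x in xs]


def rank_top_n(runtimes: list[float], diffs: list[float],
               runtime_w: float, diff_w: float, n: int) -> list[int]:
    r, d = normalize_by_max(runtimes), normalize_by_max(diffs)
    # Lower combined score is better (faster runtime, smaller diff).
    scores = {i: runtime_w * r[i] + diff_w * d[i] for i in range(len(runtimes))}
    return sorted(scores, key=scores.get)[:n]


runtimes = [120.0, 80.0, 200.0]  # optimized runtimes, lower is better
diffs = [40.0, 90.0, 10.0]       # diff lengths, lower is better
print(rank_top_n(runtimes, diffs, runtime_w=0.7, diff_w=0.3, n=2))  # [0, 1]
```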

@github-actions

github-actions bot commented Dec 18, 2025

PR Code Suggestions ✨

Latest suggestions up to 3e20a37
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Maintain API payload compatibility

The API payload key was renamed from num_variants to n_candidates. Ensure the server
expects this new key, or include both for backward compatibility to avoid breaking
requests. This prevents request failures due to mismatched parameter names.

codeflash/api/aiservice.py [140-152]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
+    "num_variants": num_candidates,  # backward compatibility
     "n_candidates": num_candidates,
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "current_username": get_last_commit_author_if_pr_exists(None),
     "repo_owner": git_repo_owner,
     "repo_name": git_repo_name,
     "is_async": is_async,
 }
Suggestion importance[1-10]: 7


Why: The PR changes payload key to n_candidates; adding num_variants alongside it can prevent breaking older servers. It's accurate to the diff and improves backward compatibility, though not strictly necessary if server is updated.

Medium
Fix invalid typing and keys

The EFFORT_VALUES annotation uses the builtin any, which is not a valid type annotation and will be rejected by type checkers. Use Any from typing, and reference .value consistently so every key is present as a plain string. This keeps the mapping well-typed and the lookups uniform.

codeflash/code_utils/config_consts.py [33-67]

+from typing import Any
+
 class EffortKeys(StrEnum):
     N_OPTIMIZER_CANDIDATES = auto()
     N_OPTIMIZER_LP_CANDIDATES = auto()
     N_GENERATED_TESTS = auto()
     MAX_CODE_REPAIRS_PER_TRACE = auto()
     REPAIR_UNMATCHED_PERCENTAGE_LIMIT = auto()
     REFINE_ALL_THRESHOLD = auto()
     TOP_VALID_CANDIDATES_FOR_REFINEMENT = auto()
 
-...
-EFFORT_VALUES: dict[str, dict[EffortLevel, any]] = {
+EFFORT_VALUES: dict[str, dict[EffortLevel, Any]] = {
     EffortKeys.N_OPTIMIZER_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
     EffortKeys.N_OPTIMIZER_LP_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 5, EffortLevel.HIGH: 6},
-    # we don't use effort with generated tests for now
     EffortKeys.N_GENERATED_TESTS.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 2, EffortLevel.HIGH: 2},
-    # maximum number of repairs we will do for each function
     EffortKeys.MAX_CODE_REPAIRS_PER_TRACE.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
-    # if the percentage of unmatched tests is greater than this, we won't fix it (lowering this value makes the repair more stricted)
-    # on the low effort we lower the limit to 20% to be more strict (less repairs)
     EffortKeys.REPAIR_UNMATCHED_PERCENTAGE_LIMIT.value: {
         EffortLevel.LOW: 0.2,
         EffortLevel.MEDIUM: 0.4,
         EffortLevel.HIGH: 0.5,
     },
-    # when valid optimizations count is N or less, refine all optimizations
     EffortKeys.REFINE_ALL_THRESHOLD.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
-    # Top valid candidates percentage for refinements
-    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
+    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT.value: {
+        EffortLevel.LOW: 2,
+        EffortLevel.MEDIUM: 3,
+        EffortLevel.HIGH: 4,
+    },
 }
Suggestion importance[1-10]: 6


Why: Using the builtin any as a type annotation is incorrect; switching to typing.Any improves type safety and satisfies type checkers. Also fixing the missing .value on the last key aligns with surrounding usage; a modest, correct maintainability improvement.

Low
General
Correct refinement candidate selection

TOP_VALID_CANDIDATES_FOR_REFINEMENT is used as an absolute count, but its comment suggests a percentage/ratio selection. Clarify the intent: if a percentage is meant, compute top_n_candidates as a proportion of the valid candidates; otherwise rename the key. This prevents refining too few or too many candidates.

codeflash/optimization/function_optimizer.py [191-221]

-top_n_candidates = int(
-    min(
-        get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort),
-        len(self.all_refinements_data),
-    )
-)
+# Interpret effort value as a fraction of total valid candidates
+fraction = float(get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort))
+total = len(self.all_refinements_data)
+top_n_candidates = max(1, int(round(min(1.0, fraction) * total)))
 
-if top_n_candidates == len(self.all_refinements_data) or len(self.all_refinements_data) <= get_effort_value(
-    EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort
-):
+if top_n_candidates >= total or total <= get_effort_value(EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort):
     for data in self.all_refinements_data:
         future_refinements.append(self.refine_optimizations([data]))  # noqa: PERF401
 else:
     ...
     score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
     top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]
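If the fractional interpretation proposed above were adopted, the clamping arithmetic behaves as follows (a standalone illustration of the suggested expression, not code from the PR):

```python
def top_n_from_fraction(fraction: float, total: int) -> int:
    # Clamp the fraction to at most 1.0, round to the nearest count,
    # and always refine at least one candidate.
    return max(1, int(round(min(1.0, fraction) * total)))


print(top_n_from_fraction(0.3, 10))  # 3
print(top_n_from_fraction(0.5, 1))   # 1 (floor of at least one)
print(top_n_from_fraction(2.0, 4))   # 4 (clamped to refine all)
```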
Suggestion importance[1-10]: 4


Why: The code treats TOP_VALID_CANDIDATES_FOR_REFINEMENT as a count; the suggestion asserts it should be a fraction, but the PR sets integer values and uses them as counts, so this is speculative. It’s a reasonable consideration but not clearly required by the PR.

Low

Previous suggestions

Suggestions up to commit 1018a69
Category | Suggestion | Impact
General
Default missing effort safely

Accessing args.effort assumes the attribute is always present; older code paths or
programmatic invocations may omit it, causing AttributeError. Default to "medium"
when missing or falsy to ensure stable behavior.

codeflash/optimization/function_optimizer.py [240-243]

-n_tests = Effort.get_number_of_generated_tests(args.effort)
+effort_level = getattr(args, "effort", None) or "medium"
+n_tests = Effort.get_number_of_generated_tests(effort_level)
Suggestion importance[1-10]: 7


Why: Using args.effort directly can raise AttributeError in older or programmatic paths; defaulting to "medium" improves robustness with minimal risk and aligns with CLI default.

Medium
Possible issue
Preserve API request compatibility

Ensure the request keys match the backend API contract. The key was changed from
num_variants to n_candidates; if the server still expects num_variants, this will
break requests. Include both keys mapped to the same value to maintain backward
compatibility.

codeflash/api/aiservice.py [129-141]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
     "n_candidates": num_candidates,
+    "num_variants": num_candidates,  # backward compatibility
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "current_username": get_last_commit_author_if_pr_exists(None),
     "repo_owner": git_repo_owner,
     "repo_name": git_repo_name,
     "is_async": is_async,
 }
Suggestion importance[1-10]: 6


Why: The suggestion is contextually valid: the PR changes the payload key from num_variants to n_candidates. Proposing to send both keys can prevent backend mismatches during rollout, though it assumes server behavior and adds redundancy, so impact is moderate.

Low
Avoid API field mismatch

The request key was changed from num_variants to n_candidates_lp. If the backend
still accepts num_variants (or n_candidates_lp is not recognized), requests will
fail. Send both keys with the same value for safe rollout.

codeflash/api/aiservice.py [191-201]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
     "n_candidates_lp": num_candidates,
+    "num_variants": num_candidates,  # backward compatibility
     "line_profiler_results": line_profiler_results,
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "lsp_mode": is_LSP_enabled(),
 }
Suggestion importance[1-10]: 6

__

Why: Similar to suggestion 1, this guards against backend expecting num_variants when the PR sends n_candidates_lp. It’s reasonable for backward compatibility, but speculative about server requirements and adds duplicate data.

Low

@mohammedahmed18 mohammedahmed18 marked this pull request as ready for review December 19, 2025 14:49
@github-actions

Persistent review updated to latest commit 3e20a37

