
Conversation

Contributor

@mohammedahmed18 mohammedahmed18 commented Dec 18, 2025

User description

fixes CF-938


PR Type

Enhancement


Description

  • Introduce effort-based optimization controls

  • Replace fixed constants with dynamic effort values

  • Add CLI flag for effort selection

  • Align API payload keys and defaults


Diagram Walkthrough

```mermaid
flowchart LR
  CLI["CLI --effort flag"] -- "sets effort level" --> Optimizer["Function optimizer"]
  Config["Effort config (values, keys)"] -- "get_effort_value()" --> Optimizer
  Optimizer -- "n_candidates / tests / repairs / refine" --> AISvc["AI service payloads"]
  AISvc -- "n_candidates / n_candidates_lp" --> Server["Backend endpoints"]
```

File Walkthrough

Relevant files
Enhancement
aiservice.py
Align API payload keys and candidate defaults                       

codeflash/api/aiservice.py

  • Rename payload keys to n_candidates and n_candidates_lp
  • Remove deprecated effective candidate constants from payload
  • Adjust default LP candidates to 8
  • Maintain metadata fields unchanged
+3/-6     
cli.py
Add CLI effort level option                                                           

codeflash/cli_cmds/cli.py

  • Add --effort CLI flag
  • Provide choices: low, medium, high
  • Default effort level set to medium
+3/-0     
config_consts.py
Introduce effort-based configuration system                           

codeflash/code_utils/config_consts.py

  • Remove many fixed candidate/test constants
  • Add EffortLevel, EffortKeys, and values map
  • Implement get_effort_value helper
  • Keep effective looping time and review limit
+50/-19 
git_utils.py
Remove worktree utilities and tidy imports                             

codeflash/code_utils/git_utils.py

  • Remove unused worktree management helpers
  • Drop dependency on candidate count constant
  • Cleanup unused imports
+0/-34   
function_optimizer.py
Apply effort-based tuning throughout optimizer                     

codeflash/optimization/function_optimizer.py

  • Replace constants with get_effort_value across flows
  • Effort-based counts for candidates, LP, tests, repairs
  • Effort-driven refinement thresholds and top selection
  • Update API calls to pass effort-derived values
+20/-17 
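The --effort option described in the cli.py entry above can be sketched with argparse. This is a hypothetical reconstruction of the flag from the walkthrough (choices low/medium/high, default medium); the actual wiring in codeflash/cli_cmds/cli.py may differ in help text and parser setup.

```python
import argparse

# Hypothetical sketch of the --effort flag described in the walkthrough;
# the real parser in codeflash/cli_cmds/cli.py may differ.
parser = argparse.ArgumentParser(prog="codeflash")
parser.add_argument(
    "--effort",
    choices=["low", "medium", "high"],
    default="medium",
    help="How much optimization effort to spend (default: medium)",
)

print(parser.parse_args([]).effort)                    # medium (default)
print(parser.parse_args(["--effort", "high"]).effort)  # high
```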

@github-actions

github-actions bot commented Dec 18, 2025

PR Reviewer Guide 🔍

(Review updated until commit 3e20a37)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The dict literal for effort values mixes key types: other entries use .value when indexing by EffortKeys, but EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT is used directly without .value. This likely makes get_effort_value fail to find the key at runtime for that entry.

    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
}
Type Hint

EFFORT_VALUES and get_effort_value use any rather than Any. This will not type-check and may fail linting; prefer typing.Any.

EFFORT_VALUES: dict[str, dict[EffortLevel, any]] = {
    EffortKeys.N_OPTIMIZER_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
    EffortKeys.N_OPTIMIZER_LP_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 5, EffortLevel.HIGH: 6},
    # we don't use effort with generated tests for now
    EffortKeys.N_GENERATED_TESTS.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 2, EffortLevel.HIGH: 2},
    # maximum number of repairs we will do for each function
    EffortKeys.MAX_CODE_REPAIRS_PER_TRACE.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
    # if the percentage of unmatched tests is greater than this, we won't fix it (lowering this value makes the repair more stricted)
    # on the low effort we lower the limit to 20% to be more strict (less repairs)
    EffortKeys.REPAIR_UNMATCHED_PERCENTAGE_LIMIT.value: {
        EffortLevel.LOW: 0.2,
        EffortLevel.MEDIUM: 0.4,
        EffortLevel.HIGH: 0.5,
    },
    # when valid optimizations count is N or less, refine all optimizations
    EffortKeys.REFINE_ALL_THRESHOLD.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
    # Top valid candidates percentage for refinements
    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
}


def get_effort_value(key: EffortKeys, effort: EffortLevel) -> any:
    key_str = key.value
    if key_str in EFFORT_VALUES:
        if effort in EFFORT_VALUES[key_str]:
            return EFFORT_VALUES[key_str][effort]
        msg = f"Invalid effort level: {effort}"
        raise ValueError(msg)
    msg = f"Invalid key: {key_str}"
    raise ValueError(msg)
Logic Change

Refinement selection switched from percentage-based to fixed top-N from effort config. Validate that low/medium/high values produce intended proportions across different candidate counts to avoid refining too few/many items.

top_n_candidates = int(
    min(
        get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort),
        len(self.all_refinements_data),
    )
)

if top_n_candidates == len(self.all_refinements_data) or len(self.all_refinements_data) <= get_effort_value(
    EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort
):
    for data in self.all_refinements_data:
        future_refinements.append(self.refine_optimizations([data]))  # noqa: PERF401
else:
    diff_lens_list = []
    runtimes_list = []
    for c in self.all_refinements_data:
        diff_lens_list.append(diff_length(c.original_source_code, c.optimized_source_code))
        runtimes_list.append(c.optimized_code_runtime)

    runtime_w, diff_w = REFINED_CANDIDATE_RANKING_WEIGHTS
    weights = choose_weights(runtime=runtime_w, diff=diff_w)

    runtime_norm = normalize_by_max(runtimes_list)
    diffs_norm = normalize_by_max(diff_lens_list)
    # the lower the better
    score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
    top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]

    for idx in top_indecies:
        data = self.all_refinements_data[idx]
        future_refinements.append(self.refine_optimizations([data]))
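The helpers choose_weights, normalize_by_max, and create_score_dictionary_from_metrics are not shown in this excerpt; the stand-ins below are guesses at their behavior, meant only to illustrate the lower-is-better top-N ranking the review asks to validate:

```python
def normalize_by_max(xs: list[float]) -> list[float]:
    # Scale values into [0, 1] by the maximum (guess at the real helper).
    m = max(xs) or 1.0
    return [x / m for x in xs]


def rank_top_n(runtimes: list[float], diffs: list[float],
               runtime_w: float, diff_w: float, n: int) -> list[int]:
    r, d = normalize_by_max(runtimes), normalize_by_max(diffs)
    # Lower combined score is better (faster runtime, smaller diff).
    scores = {i: runtime_w * r[i] + diff_w * d[i] for i in range(len(runtimes))}
    return sorted(scores, key=scores.get)[:n]


runtimes = [120.0, 80.0, 200.0]  # optimized runtimes, lower is better
diffs = [40.0, 90.0, 10.0]       # diff lengths, lower is better
print(rank_top_n(runtimes, diffs, runtime_w=0.7, diff_w=0.3, n=2))  # [0, 1]
```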

@github-actions

github-actions bot commented Dec 18, 2025

PR Code Suggestions ✨

Latest suggestions up to 3e20a37
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Maintain API payload compatibility

The API payload key was renamed from num_variants to n_candidates. Ensure the server
expects this new key, or include both for backward compatibility to avoid breaking
requests. This prevents request failures due to mismatched parameter names.

codeflash/api/aiservice.py [140-152]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
+    "num_variants": num_candidates,  # backward compatibility
     "n_candidates": num_candidates,
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "current_username": get_last_commit_author_if_pr_exists(None),
     "repo_owner": git_repo_owner,
     "repo_name": git_repo_name,
     "is_async": is_async,
 }
Suggestion importance[1-10]: 7


Why: The PR changes payload key to n_candidates; adding num_variants alongside it can prevent breaking older servers. It's accurate to the diff and improves backward compatibility, though not strictly necessary if server is updated.

Medium
Fix invalid typing and keys

The EFFORT_VALUES annotation uses the builtin any, which is not a valid type annotation and will be rejected by type checkers. Use Any from typing, and reference .value consistently so every key is present as a plain string. This keeps the mapping well-typed and the lookups uniform.

codeflash/code_utils/config_consts.py [33-67]

+from typing import Any
+
 class EffortKeys(StrEnum):
     N_OPTIMIZER_CANDIDATES = auto()
     N_OPTIMIZER_LP_CANDIDATES = auto()
     N_GENERATED_TESTS = auto()
     MAX_CODE_REPAIRS_PER_TRACE = auto()
     REPAIR_UNMATCHED_PERCENTAGE_LIMIT = auto()
     REFINE_ALL_THRESHOLD = auto()
     TOP_VALID_CANDIDATES_FOR_REFINEMENT = auto()
 
-...
-EFFORT_VALUES: dict[str, dict[EffortLevel, any]] = {
+EFFORT_VALUES: dict[str, dict[EffortLevel, Any]] = {
     EffortKeys.N_OPTIMIZER_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
     EffortKeys.N_OPTIMIZER_LP_CANDIDATES.value: {EffortLevel.LOW: 3, EffortLevel.MEDIUM: 5, EffortLevel.HIGH: 6},
-    # we don't use effort with generated tests for now
     EffortKeys.N_GENERATED_TESTS.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 2, EffortLevel.HIGH: 2},
-    # maximum number of repairs we will do for each function
     EffortKeys.MAX_CODE_REPAIRS_PER_TRACE.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 4, EffortLevel.HIGH: 5},
-    # if the percentage of unmatched tests is greater than this, we won't fix it (lowering this value makes the repair more stricted)
-    # on the low effort we lower the limit to 20% to be more strict (less repairs)
     EffortKeys.REPAIR_UNMATCHED_PERCENTAGE_LIMIT.value: {
         EffortLevel.LOW: 0.2,
         EffortLevel.MEDIUM: 0.4,
         EffortLevel.HIGH: 0.5,
     },
-    # when valid optimizations count is N or less, refine all optimizations
     EffortKeys.REFINE_ALL_THRESHOLD.value: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
-    # Top valid candidates percentage for refinements
-    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT: {EffortLevel.LOW: 2, EffortLevel.MEDIUM: 3, EffortLevel.HIGH: 4},
+    EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT.value: {
+        EffortLevel.LOW: 2,
+        EffortLevel.MEDIUM: 3,
+        EffortLevel.HIGH: 4,
+    },
 }
Suggestion importance[1-10]: 6


Why: Using the builtin any as a type annotation is incorrect; switching to typing.Any improves type safety and satisfies type checkers. Also fixing the missing .value on the last key aligns with surrounding usage; a modest, correct maintainability improvement.

Low
General
Correct refinement candidate selection

TOP_VALID_CANDIDATES_FOR_REFINEMENT is used as an absolute count, but its comment suggests a percentage/ratio selection. Clarify the intent: if a percentage is meant, compute top_n_candidates as a proportion of the valid candidates; otherwise rename the key. This prevents refining too few or too many candidates.

codeflash/optimization/function_optimizer.py [191-221]

-top_n_candidates = int(
-    min(
-        get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort),
-        len(self.all_refinements_data),
-    )
-)
+# Interpret effort value as a fraction of total valid candidates
+fraction = float(get_effort_value(EffortKeys.TOP_VALID_CANDIDATES_FOR_REFINEMENT, self.args.effort))
+total = len(self.all_refinements_data)
+top_n_candidates = max(1, int(round(min(1.0, fraction) * total)))
 
-if top_n_candidates == len(self.all_refinements_data) or len(self.all_refinements_data) <= get_effort_value(
-    EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort
-):
+if top_n_candidates >= total or total <= get_effort_value(EffortKeys.REFINE_ALL_THRESHOLD, self.args.effort):
     for data in self.all_refinements_data:
         future_refinements.append(self.refine_optimizations([data]))  # noqa: PERF401
 else:
     ...
     score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)
     top_indecies = sorted(score_dict, key=score_dict.get)[:top_n_candidates]
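If the fractional interpretation proposed above were adopted, the clamping arithmetic behaves as follows (a standalone illustration of the suggested expression, not code from the PR):

```python
def top_n_from_fraction(fraction: float, total: int) -> int:
    # Clamp the fraction to at most 1.0, round to the nearest count,
    # and always refine at least one candidate.
    return max(1, int(round(min(1.0, fraction) * total)))


print(top_n_from_fraction(0.3, 10))  # 3
print(top_n_from_fraction(0.5, 1))   # 1 (floor of at least one)
print(top_n_from_fraction(2.0, 4))   # 4 (clamped to refine all)
```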
Suggestion importance[1-10]: 4


Why: The code treats TOP_VALID_CANDIDATES_FOR_REFINEMENT as a count; the suggestion asserts it should be a fraction, but the PR sets integer values and uses them as counts, so this is speculative. It’s a reasonable consideration but not clearly required by the PR.

Low

Previous suggestions

Suggestions up to commit 1018a69
Category | Suggestion | Impact
General
Default missing effort safely

Accessing args.effort assumes the attribute is always present; older code paths or
programmatic invocations may omit it, causing AttributeError. Default to "medium"
when missing or falsy to ensure stable behavior.

codeflash/optimization/function_optimizer.py [240-243]

-n_tests = Effort.get_number_of_generated_tests(args.effort)
+effort_level = getattr(args, "effort", None) or "medium"
+n_tests = Effort.get_number_of_generated_tests(effort_level)
Suggestion importance[1-10]: 7


Why: Using args.effort directly can raise AttributeError in older or programmatic paths; defaulting to "medium" improves robustness with minimal risk and aligns with CLI default.

Medium
Possible issue
Preserve API request compatibility

Ensure the request keys match the backend API contract. The key was changed from
num_variants to n_candidates; if the server still expects num_variants, this will
break requests. Include both keys mapped to the same value to maintain backward
compatibility.

codeflash/api/aiservice.py [129-141]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
     "n_candidates": num_candidates,
+    "num_variants": num_candidates,  # backward compatibility
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "current_username": get_last_commit_author_if_pr_exists(None),
     "repo_owner": git_repo_owner,
     "repo_name": git_repo_name,
     "is_async": is_async,
 }
Suggestion importance[1-10]: 6


Why: The suggestion is contextually valid: the PR changes the payload key from num_variants to n_candidates. Proposing to send both keys can prevent backend mismatches during rollout, though it assumes server behavior and adds redundancy, so impact is moderate.

Low
Avoid API field mismatch

The request key was changed from num_variants to n_candidates_lp. If the backend
still accepts num_variants (or n_candidates_lp is not recognized), requests will
fail. Send both keys with the same value for safe rollout.

codeflash/api/aiservice.py [191-201]

 payload = {
     "source_code": source_code,
     "dependency_code": dependency_code,
     "n_candidates_lp": num_candidates,
+    "num_variants": num_candidates,  # backward compatibility
     "line_profiler_results": line_profiler_results,
     "trace_id": trace_id,
     "python_version": platform.python_version(),
     "experiment_metadata": experiment_metadata,
     "codeflash_version": codeflash_version,
     "lsp_mode": is_LSP_enabled(),
 }
Suggestion importance[1-10]: 6

__

Why: Similar to suggestion 1, this guards against backend expecting num_variants when the PR sends n_candidates_lp. It’s reasonable for backward compatibility, but speculative about server requirements and adds duplicate data.

Low

@mohammedahmed18 mohammedahmed18 marked this pull request as ready for review December 19, 2025 14:49
@github-actions

Persistent review updated to latest commit 3e20a37

