start using markdown representation for read writable context (CF-687) #553

Draft
wants to merge 7 commits into
base: main
from

Conversation

mohammedahmed18
Contributor

@mohammedahmed18 mohammedahmed18 commented Jul 16, 2025

This is still a work in progress; most tests are expected to fail.

PR Type

Enhancement, Tests


Description

  • Switch read/write context extraction to markdown

  • Introduce CodeStringsMarkdown with splitter and cache

  • Update CodeOptimizationContext model type

  • Adjust tests for markdown splitter and __str__


Changes diagram

flowchart LR
  A["get_code_optimization_context()"] --> B["extract_code_markdown_context_from_files()"]
  B --> C["CodeStringsMarkdown (with splitter)"]
  C --> D["encoded_tokens_len(__str__)"]
  C --> E["find_preexisting_objects(__str__)"]

Changes walkthrough 📝

Relevant files

Enhancement

code_context_extractor.py: Switch to markdown context extraction
codeflash/context/code_context_extractor.py (+6/-5)

  • Replaced extract_code_string_context_from_files call
  • Now uses extract_code_markdown_context_from_files
  • Token length uses .__str__ of markdown context
  • Preexisting objects extracted via .__str__

models.py: Add markdown code strings model
codeflash/models/models.py (+16/-2)

  • Added get_code_block_splitter helper
  • Enhanced CodeStringsMarkdown with cache and __str__
  • Changed read_writable_code type to CodeStringsMarkdown
  • Removed unused Field import

Tests

test_code_context_extractor.py: Update tests for markdown splitter
tests/test_code_context_extractor.py (+4/-3)

  • Imported get_code_block_splitter
  • Updated expected context with splitter f-string
  • Assert on read_writable_code.__str__


    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Magic method misuse

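    Defining __str__ as a property breaks Python's built-in string conversion: str(obj) looks up __str__ on the type and calls the result, but the property getter has already returned a string, so the call raises TypeError. Implement __str__ as a regular method instead of a property.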

    @property
    def __str__(self) -> str:
        if self.cached_code is not None:
            return self.cached_code
        self.cached_code = "\n\n".join(
            get_code_block_splitter(block.file_path) + "\n" + block.code for block in self.code_strings
        )
        return self.cached_code
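The effect of the property can be reproduced with a minimal sketch. `PropertyStr` is a hypothetical plain class used only for illustration; the real `CodeStringsMarkdown` is a model class in `codeflash/models/models.py` and may behave differently in other respects.

```python
class PropertyStr:
    """Hypothetical stand-in: __str__ declared as a property, as in the excerpt."""

    def __init__(self, code: str) -> None:
        self.code = code

    @property
    def __str__(self) -> str:  # mirrors the excerpt; not a recommended pattern
        return self.code


obj = PropertyStr("x = 1")

# Attribute access runs the property getter and yields the string itself.
via_attribute = obj.__str__

# str() looks up __str__ on the type and then calls the result. The getter
# has already produced a str, so the call fails with TypeError.
try:
    str(obj)
    str_call_failed = False
except TypeError:
    str_call_failed = True
```

Under these assumptions, `obj.__str__` works as a plain attribute while `str(obj)` does not, which is why the two forms are not interchangeable here.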
    Property invocation

    The code passes final_read_writable_code.__str__ to functions without calling it. This yields a string only while __str__ is a property; if __str__ becomes a regular method, these call sites would pass a bound method instead. Obtain the string explicitly, e.g. via str(...).

    final_read_writable_tokens = encoded_tokens_len(final_read_writable_code.__str__)
    if final_read_writable_tokens > optim_token_limit:
        raise ValueError("Read-writable code has exceeded token limit, cannot proceed")
    
    # Setup preexisting objects for code replacer
    preexisting_objects = set(
        chain(
            find_preexisting_objects(final_read_writable_code.__str__),
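A rough sketch of the call-site concern, assuming `__str__` is later turned into a regular method. `count_tokens` and `Context` are hypothetical stand-ins for `encoded_tokens_len` and the context object; they are not the project's actual code.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in for encoded_tokens_len: counts whitespace-separated words."""
    return len(text.split())


class Context:
    """Hypothetical stand-in for the context object, with __str__ as a regular method."""

    def __init__(self, code: str) -> None:
        self.code = code

    def __str__(self) -> str:
        return self.code


ctx = Context("def f(): return 1")

# Explicit conversion always hands the callee a real string.
tokens = count_tokens(str(ctx))

# Without parentheses, ctx.__str__ is a bound method, not a string.
passes_method = callable(ctx.__str__) and not isinstance(ctx.__str__, str)
```

Calling `str(ctx)` at the call site keeps it correct regardless of how the string is produced internally.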


    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category    Suggestion    Impact
    General
    Normalize splitter path

    Use file_path.as_posix() in the f-string to ensure the splitter uses a consistent
    POSIX path, avoiding backslashes on Windows and matching test expectations.

    codeflash/models/models.py [142-143]

     def get_code_block_splitter(file_path: Path) -> str:
    -    return f"# codeflash-splitter__{file_path}"
    +    return f"# codeflash-splitter__{file_path.as_posix()}"
    Suggestion importance[1-10]: 7


    Why: Using file_path.as_posix() in the splitter guarantees forward-slash paths on all OSes, avoiding backslashes on Windows and preventing cross-platform test failures.
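The difference can be illustrated on any OS with pathlib's pure path classes. This is a sketch: the helper below copies the suggested one-line change, and the example paths are chosen only for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def get_code_block_splitter(file_path) -> str:
    # Same shape as the suggested change: always render with forward slashes.
    return f"# codeflash-splitter__{file_path.as_posix()}"


windows_path = PureWindowsPath("codeflash", "models", "models.py")
posix_path = PurePosixPath("codeflash", "models", "models.py")

# Formatting the Windows path directly keeps its backslash separators.
raw_windows = f"# codeflash-splitter__{windows_path}"

# With as_posix(), both flavours produce the same splitter line.
normalized_windows = get_code_block_splitter(windows_path)
normalized_posix = get_code_block_splitter(posix_path)
```

`PureWindowsPath` and `PurePosixPath` do not touch the filesystem, so the comparison behaves the same on Windows, Linux, and macOS.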

    Medium
    Properly implement __str__

    Replace the @property on __str__ with a standard special method override so calling
    str(obj) invokes it correctly and doesn’t shadow built-in behavior.

    codeflash/models/models.py [150-157]

    -@property
     def __str__(self) -> str:
         if self.cached_code is not None:
             return self.cached_code
         self.cached_code = "\n\n".join(
             get_code_block_splitter(block.file_path) + "\n" + block.code for block in self.code_strings
         )
         return self.cached_code
    Suggestion importance[1-10]: 6


    Why: Defining __str__ with @property prevents str(obj) from invoking this logic; converting it to a normal special method override ensures the custom formatter is used when calling str(), improving usability.
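A minimal sketch of the suggested shape, with `__str__` as a regular method that still caches its result. The dataclasses here are hypothetical stand-ins for the project's model classes, and `PurePosixPath` is used only to keep the output deterministic.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional


def get_code_block_splitter(file_path: PurePosixPath) -> str:
    return f"# codeflash-splitter__{file_path}"


@dataclass
class CodeString:  # stand-in for the per-file code block model
    code: str
    file_path: PurePosixPath


@dataclass
class CodeStringsMarkdown:  # stand-in; the real class lives in models.py
    code_strings: List[CodeString] = field(default_factory=list)
    cached_code: Optional[str] = None

    def __str__(self) -> str:
        # Build the joined text once, then reuse the cached value.
        if self.cached_code is None:
            self.cached_code = "\n\n".join(
                get_code_block_splitter(block.file_path) + "\n" + block.code
                for block in self.code_strings
            )
        return self.cached_code


markdown = CodeStringsMarkdown(
    [
        CodeString("a = 1", PurePosixPath("pkg/a.py")),
        CodeString("b = 2", PurePosixPath("pkg/b.py")),
    ]
)
text = str(markdown)
```

One caveat with this design: the cache is never invalidated, so mutating `code_strings` after the first `str()` call would leave a stale value.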

    Low

    @mohammedahmed18 mohammedahmed18 changed the title from "start using markdown representation for read writable context" to "start using markdown representation for read writable context (CF-687)" on Jul 16, 2025