
Conversation


xesws commented Dec 3, 2025

No description provided.

tsljgj and others added 13 commits December 1, 2025 21:55
Added a note about test-build edits to the README.
Updated workflow to build SWE-Bench Lite images with new dataset and streamlined job steps.
Added missing descriptions for dataset and split inputs. Changed single quotes to double quotes for variable usage in build commands.
Fixed missing descriptions for dataset and split inputs, improved formatting for YAML syntax, and ensured variable usage is consistent with double quotes.
Summary:
This commit introduces the complete implementation of the Code Agent Workflow Memory (CAWM) system, a modular framework for extracting, compressing, clustering, and inducing reusable workflows from agent execution trajectories. It covers both Stage 1 (Foundation & Modules) and Stage 2 (Enhancements & Testing).

Detailed Changes:

Stage 1: Core Architecture & Foundation
1.  Data Models (CAWM/models.py):
    -   Defined core data structures: `Trajectory`, `Workflow`, `WorkflowStep`, `TrajectoryEvent`, and `TrajectoryCluster`.
    -   Introduced `ActionType` enum for classifying agent actions (e.g., EXPLORATION, FILE_EDIT, TESTING).
    -   Implemented helper functions for path abstraction and action classification.
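
    For illustration, the models might be sketched roughly like this (field names beyond those listed above, e.g. `events` and `observation`, are assumptions rather than the actual CAWM definitions):

    ```python
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List


    class ActionType(Enum):
        EXPLORATION = "exploration"
        FILE_EDIT = "file_edit"
        TESTING = "testing"


    @dataclass
    class TrajectoryEvent:
        action: str                      # raw agent action (e.g. a shell command)
        action_type: ActionType          # classification of the action
        observation: str = ""            # environment response, if any


    @dataclass
    class Trajectory:
        instance_id: str
        events: List[TrajectoryEvent] = field(default_factory=list)


    @dataclass
    class WorkflowStep:
        description: str
        action_type: ActionType


    @dataclass
    class Workflow:
        name: str
        steps: List[WorkflowStep] = field(default_factory=list)


    @dataclass
    class TrajectoryCluster:
        trajectories: List[Trajectory] = field(default_factory=list)
    ```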

2.  LLM Client (CAWM/llm_client.py):
    -   Created a unified `LLMClient` wrapper supporting OpenRouter, OpenAI, and Anthropic providers.
    -   Implemented standard completion and structured JSON parsing methods.
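
    A minimal sketch of such a wrapper, assuming the OpenAI SDK is used for both OpenAI and OpenRouter (which exposes an OpenAI-compatible endpoint); the Anthropic branch is omitted, and the class/method names and default model slug are illustrative:

    ```python
    import json
    import os

    from openai import OpenAI


    class LLMClient:
        def __init__(self, provider: str = "openrouter",
                     model: str = "moonshotai/kimi-k2"):  # default slug is an assumption
            if provider == "openrouter":
                self.client = OpenAI(
                    api_key=os.environ["OPENROUTER_API_KEY"],
                    base_url="https://openrouter.ai/api/v1",
                )
            else:  # plain OpenAI
                self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
            self.model = model

        def complete(self, prompt: str, max_tokens: int = 1024) -> str:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens,
            )
            return resp.choices[0].message.content

        def complete_json(self, prompt: str, max_tokens: int = 1024) -> dict:
            # Structured parsing: request JSON in the prompt and parse the reply.
            return json.loads(self.complete(prompt, max_tokens=max_tokens))
    ```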

3.  Compression Module (CAWM/compression.py):
    -   Implemented `CompressionModule` with multiple strategies:
        -   `KEY_STEP_EXTRACTION`: Heuristic-based retention of edit/test steps.
        -   `ACTION_TYPE_FILTERING`: Filter events by specific action types.
        -   `HIERARCHICAL_SUMMARIZATION`: (Skeleton) LLM-based summarization.
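
    For example, key-step extraction could be as simple as the following filter (reusing the illustrative models above; this is a sketch, not the actual `CompressionModule` code):

    ```python
    from typing import List

    # Heuristic: edit and test events are the "key steps"; exploration is dropped.
    KEY_TYPES = {ActionType.FILE_EDIT, ActionType.TESTING}


    def extract_key_steps(events: List[TrajectoryEvent]) -> List[TrajectoryEvent]:
        return [e for e in events if e.action_type in KEY_TYPES]
    ```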

4.  Clustering Module (CAWM/clustering.py):
    -   Implemented `ClusteringModule` for grouping similar trajectories.
    -   Added support for `ACTION_SEQUENCE` similarity (Jaccard index of action types).
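
    The similarity measure might look roughly like this (again reusing the illustrative models; threshold handling is omitted):

    ```python
    def action_type_jaccard(a: Trajectory, b: Trajectory) -> float:
        # Jaccard index over the sets of action types each trajectory contains.
        sa = {e.action_type for e in a.events}
        sb = {e.action_type for e in b.events}
        if not sa and not sb:
            return 1.0
        return len(sa & sb) / len(sa | sb)
    ```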

5.  Induction Module (CAWM/induction.py):
    -   Developed `InductionModule` to extract abstract workflows from trajectory clusters using LLMs.
    -   Supports extracting workflows at `GENERAL` (cross-project) and `SPECIFIC` levels.
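
    A hedged sketch of what the induction call could look like, reusing the illustrative `LLMClient` above; the prompt wording and the `level` argument are assumptions about the interface:

    ```python
    def induce_workflow(client: LLMClient, cluster: TrajectoryCluster,
                        level: str = "GENERAL") -> dict:
        # Flatten the cluster's events into a step list for the prompt.
        steps = "\n".join(
            f"- {e.action_type.value}: {e.action}"
            for t in cluster.trajectories
            for e in t.events
        )
        prompt = (
            f"Given these agent trajectory steps, induce one reusable {level} workflow "
            f"as JSON with fields 'name' and 'steps':\n{steps}"
        )
        return client.complete_json(prompt)
    ```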

6.  Pipeline Orchestration (CAWM/pipeline.py):
    -   Created `CAWMPipeline` to orchestrate the full flow: Load -> Compress -> Cluster -> Induce -> Save.
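
    A sketch of that orchestration, under the assumption that each module exposes a single `compress`/`cluster`/`induce` method:

    ```python
    import json


    class CAWMPipeline:
        def __init__(self, compressor, clusterer, inducer):
            self.compressor = compressor
            self.clusterer = clusterer
            self.inducer = inducer

        def run(self, trajectories, output_path="workflows.json"):
            compressed = [self.compressor.compress(t) for t in trajectories]  # Compress
            clusters = self.clusterer.cluster(compressed)                     # Cluster
            workflows = [self.inducer.induce(c) for c in clusters]            # Induce
            with open(output_path, "w") as f:                                 # Save
                json.dump(workflows, f, indent=2)
            return workflows
    ```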

Stage 2: Enhancements & Robustness
1.  Enhanced LLM Client:
    -   Integrated `tenacity` for robust retry logic (exponential backoff) handling rate limits and server errors.
    -   Added configurable timeouts and improved provider error handling.
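
    The retry pattern with `tenacity` typically looks like the following; the exception types and backoff bounds here are illustrative choices, not necessarily the ones used in CAWM:

    ```python
    from openai import APIError, RateLimitError
    from tenacity import (retry, retry_if_exception_type, stop_after_attempt,
                          wait_exponential)


    @retry(
        retry=retry_if_exception_type((RateLimitError, APIError)),
        wait=wait_exponential(multiplier=1, min=2, max=60),  # exponential backoff
        stop=stop_after_attempt(5),
    )
    def call_with_retries(client, prompt: str) -> str:
        return client.complete(prompt)
    ```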

2.  Advanced Compression:
    -   Implemented the `_compress_summarization` logic in `CompressionModule`, enabling LLM-based hierarchical summarization of trajectory chunks into high-level thoughts.
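
    Conceptually, the summarization pass might look like this sketch (chunk size and prompt wording assumed, reusing the illustrative `LLMClient`):

    ```python
    def summarize_hierarchically(client: LLMClient, events, chunk_size: int = 10) -> list:
        summaries = []
        for i in range(0, len(events), chunk_size):
            chunk = events[i:i + chunk_size]
            text = "\n".join(f"{e.action_type.value}: {e.action}" for e in chunk)
            # Condense each chunk of low-level events into one high-level thought.
            summaries.append(
                client.complete(f"Summarize these agent steps in one sentence:\n{text}")
            )
        return summaries
    ```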

3.  Advanced Clustering:
    -   Implemented `_cluster_problem_description` using Jaccard similarity on tokenized instruction text.
    -   Implemented `_cluster_code_modification` using `unidiff` to analyze git patches and cluster based on modified file paths.
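
    A sketch of both similarity signals under these assumptions (illustrative function names; `unidiff`'s `PatchSet.from_string` is used to read the patch):

    ```python
    from unidiff import PatchSet


    def instruction_similarity(a: str, b: str) -> float:
        # Jaccard index over whitespace-tokenized instruction text.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0


    def modified_paths(patch_text: str) -> set:
        # Paths of all files touched by a git patch.
        return {pf.path for pf in PatchSet.from_string(patch_text)}


    def patch_similarity(patch_a: str, patch_b: str) -> float:
        pa, pb = modified_paths(patch_a), modified_paths(patch_b)
        return len(pa & pb) / len(pa | pb) if (pa or pb) else 1.0
    ```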

4.  Testing & Verification:
    -   Added comprehensive unit tests in `tests/test_stage_2.py` covering new compression and clustering features.
    -   Created an end-to-end integration test script `tests/run_cawm_demo.py` that:
        -   Loads real trajectory data.
        -   Executes the full pipeline with Stage 2 features (Summarization + Clustering).
        -   Saves induced workflows to `tests/results/induced_workflows.json`.

Infrastructure & Dependencies:
-   Added `openai`, `anthropic`, and `tenacity` to `pyproject.toml`.
-   Created detailed development plans and documentation:
    -   `GEMINI.md`: Project overview and context.
    -   `CAWM/README.md`: Comprehensive documentation for the CAWM system, including configuration, usage patterns, and API key setup (defaulting to Kimi via OpenRouter).
    -   `dev_plans/`: Stored detailed stage plans (`stage-1-overall.md`, `stage-2-plan.md`, etc.).
Summary:
This commit consolidates improvements across Stages 3, 4, and 5, focusing on robust OpenRouter integration, pipeline validation, and clustering logic fixes.

Detailed Changes:

Stage 3: Provider & Integration Enhancements
- **LLM Client (`CAWM/llm_client.py`)**:
    - Fixed OpenRouter provider integration.
    - Enhanced error handling for API connectivity.
- **Pipeline (`CAWM/pipeline.py`)**:
    - Refined pipeline execution flow for better reliability.

Stage 4: Testing & Validation Framework
- **Test Artifacts**:
    - Added structured test directories.
    - Included pipeline statistics tracking.
- **Documentation**:
    - Added a document detailing validation results.

Stage 5: Clustering Logic Fixes
- **Clustering Module (`CAWM/clustering.py`)**:
    - Fixed logic errors in two clustering methods.
    - Addressed previously identified issues.
- **Induction Module (`CAWM/induction.py`)**:
    - Minor adjustments to support improved cluster input.

Documentation Updates:
- **README (`CAWM/README.md`)**:
    - Updated configuration examples to use OpenRouter.
    - Removed legacy direct API references (Anthropic/OpenAI) in favor of OpenRouter.
    - Updated environment variable documentation.

Misc:
- Added an entry-point script.
Summary:
This commit fixes a bug in `CAWM/llm_base.py` in how the token limit (`max_tokens`) is passed to the LLM client.

Detailed Changes:
- **LLM Base (`CAWM/llm_base.py`)**:
    - Corrected the variable name or method signature for passing `max_tokens` to ensure the LLM respects the configured output length.
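
    For illustration only, the shape of such a fix is simply making sure the configured value reaches the provider call (method and attribute names are assumptions, not the real `CAWM/llm_base.py` code):

    ```python
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,  # the configured limit must be forwarded here
        )
        return resp.choices[0].message.content
    ```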

- **Test Artifacts**:
    - Updated test outputs in `CAWM/workflow/testing/` (clusters, induction details, pipeline stats, summary, workflows) reflecting the results of the latest run with the fix.

- **Documentation**:
    - Minor update to `CAWM/README.md`.
