
Conversation


xesws commented Dec 3, 2025

No description provided.

tsljgj and others added 13 commits December 1, 2025 21:55
Added a note about test-build edits to the README.
Updated workflow to build SWE-Bench Lite images with new dataset and streamlined job steps.
Added missing descriptions for dataset and split inputs. Changed single quotes to double quotes for variable usage in build commands.
Fixed missing descriptions for dataset and split inputs, improved formatting for YAML syntax, and ensured variable usage is consistent with double quotes.
Summary:
This commit introduces the complete implementation of the Code Agent Workflow Memory (CAWM) system, a modular framework for extracting, compressing, clustering, and inducing reusable workflows from agent execution trajectories. It covers both Stage 1 (Foundation & Modules) and Stage 2 (Enhancements & Testing).

Detailed Changes:

Stage 1: Core Architecture & Foundation
1.  Data Models (CAWM/models.py):
    -   Defined core data structures: `Trajectory`, `Workflow`, `WorkflowStep`, `TrajectoryEvent`, and `TrajectoryCluster`.
    -   Introduced `ActionType` enum for classifying agent actions (e.g., EXPLORATION, FILE_EDIT, TESTING).
    -   Implemented helper functions for path abstraction and action classification.
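
    For illustration, the models might be sketched roughly like this (field names beyond those listed above, e.g. `events` and `observation`, are assumptions rather than the actual CAWM definitions):

    ```python
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List


    class ActionType(Enum):
        EXPLORATION = "exploration"
        FILE_EDIT = "file_edit"
        TESTING = "testing"


    @dataclass
    class TrajectoryEvent:
        action: str                      # raw agent action (e.g. a shell command)
        action_type: ActionType          # classification of the action
        observation: str = ""            # environment response, if any


    @dataclass
    class Trajectory:
        instance_id: str
        events: List[TrajectoryEvent] = field(default_factory=list)


    @dataclass
    class WorkflowStep:
        description: str
        action_type: ActionType


    @dataclass
    class Workflow:
        name: str
        steps: List[WorkflowStep] = field(default_factory=list)


    @dataclass
    class TrajectoryCluster:
        trajectories: List[Trajectory] = field(default_factory=list)
    ```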

2.  LLM Client (CAWM/llm_client.py):
    -   Created a unified `LLMClient` wrapper supporting OpenRouter, OpenAI, and Anthropic providers.
    -   Implemented standard completion and structured JSON parsing methods.
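
    A minimal sketch of such a wrapper, assuming the OpenAI SDK is used for both OpenAI and OpenRouter (which exposes an OpenAI-compatible endpoint); the Anthropic branch is omitted, and the class/method names and default model slug are illustrative:

    ```python
    import json
    import os

    from openai import OpenAI


    class LLMClient:
        def __init__(self, provider: str = "openrouter",
                     model: str = "moonshotai/kimi-k2"):  # default slug is an assumption
            if provider == "openrouter":
                self.client = OpenAI(
                    api_key=os.environ["OPENROUTER_API_KEY"],
                    base_url="https://openrouter.ai/api/v1",
                )
            else:  # plain OpenAI
                self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
            self.model = model

        def complete(self, prompt: str, max_tokens: int = 1024) -> str:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens,
            )
            return resp.choices[0].message.content

        def complete_json(self, prompt: str, max_tokens: int = 1024) -> dict:
            # Structured parsing: request JSON in the prompt and parse the reply.
            return json.loads(self.complete(prompt, max_tokens=max_tokens))
    ```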

3.  Compression Module (CAWM/compression.py):
    -   Implemented `CompressionModule` with multiple strategies:
        -   `KEY_STEP_EXTRACTION`: Heuristic-based retention of edit/test steps.
        -   `ACTION_TYPE_FILTERING`: Filter events by specific action types.
        -   `HIERARCHICAL_SUMMARIZATION`: (Skeleton) LLM-based summarization.
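
    For example, key-step extraction could be as simple as the following filter (reusing the illustrative models above; this is a sketch, not the actual `CompressionModule` code):

    ```python
    from typing import List

    # Heuristic: edit and test events are the "key steps"; exploration is dropped.
    KEY_TYPES = {ActionType.FILE_EDIT, ActionType.TESTING}


    def extract_key_steps(events: List[TrajectoryEvent]) -> List[TrajectoryEvent]:
        return [e for e in events if e.action_type in KEY_TYPES]
    ```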

4.  Clustering Module (CAWM/clustering.py):
    -   Implemented `ClusteringModule` for grouping similar trajectories.
    -   Added support for `ACTION_SEQUENCE` similarity (Jaccard index of action types).
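
    The similarity measure might look roughly like this (again reusing the illustrative models; threshold handling is omitted):

    ```python
    def action_type_jaccard(a: Trajectory, b: Trajectory) -> float:
        # Jaccard index over the sets of action types each trajectory contains.
        sa = {e.action_type for e in a.events}
        sb = {e.action_type for e in b.events}
        if not sa and not sb:
            return 1.0
        return len(sa & sb) / len(sa | sb)
    ```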

5.  Induction Module (CAWM/induction.py):
    -   Developed `InductionModule` to extract abstract workflows from trajectory clusters using LLMs.
    -   Supports extracting workflows at `GENERAL` (cross-project) and `SPECIFIC` levels.
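
    A hedged sketch of what the induction call could look like, reusing the illustrative `LLMClient` above; the prompt wording and the `level` argument are assumptions about the interface:

    ```python
    def induce_workflow(client: LLMClient, cluster: TrajectoryCluster,
                        level: str = "GENERAL") -> dict:
        # Flatten the cluster's events into a step list for the prompt.
        steps = "\n".join(
            f"- {e.action_type.value}: {e.action}"
            for t in cluster.trajectories
            for e in t.events
        )
        prompt = (
            f"Given these agent trajectory steps, induce one reusable {level} workflow "
            f"as JSON with fields 'name' and 'steps':\n{steps}"
        )
        return client.complete_json(prompt)
    ```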

6.  Pipeline Orchestration (CAWM/pipeline.py):
    -   Created `CAWMPipeline` to orchestrate the full flow: Load -> Compress -> Cluster -> Induce -> Save.
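
    A sketch of that orchestration, under the assumption that each module exposes a single `compress`/`cluster`/`induce` method:

    ```python
    import json


    class CAWMPipeline:
        def __init__(self, compressor, clusterer, inducer):
            self.compressor = compressor
            self.clusterer = clusterer
            self.inducer = inducer

        def run(self, trajectories, output_path="workflows.json"):
            compressed = [self.compressor.compress(t) for t in trajectories]  # Compress
            clusters = self.clusterer.cluster(compressed)                     # Cluster
            workflows = [self.inducer.induce(c) for c in clusters]            # Induce
            with open(output_path, "w") as f:                                 # Save
                json.dump(workflows, f, indent=2)
            return workflows
    ```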

Stage 2: Enhancements & Robustness
1.  Enhanced LLM Client:
    -   Integrated `tenacity` for robust retry logic (exponential backoff) handling rate limits and server errors.
    -   Added configurable timeouts and improved provider error handling.
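
    The retry pattern with `tenacity` typically looks like the following; the exception types and backoff bounds here are illustrative choices, not necessarily the ones used in CAWM:

    ```python
    from openai import APIError, RateLimitError
    from tenacity import (retry, retry_if_exception_type, stop_after_attempt,
                          wait_exponential)


    @retry(
        retry=retry_if_exception_type((RateLimitError, APIError)),
        wait=wait_exponential(multiplier=1, min=2, max=60),  # exponential backoff
        stop=stop_after_attempt(5),
    )
    def call_with_retries(client, prompt: str) -> str:
        return client.complete(prompt)
    ```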

2.  Advanced Compression:
    -   Implemented the `_compress_summarization` logic in `CompressionModule`, enabling LLM-based hierarchical summarization of trajectory chunks into high-level thoughts.
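
    Conceptually, the summarization pass might look like this sketch (chunk size and prompt wording assumed, reusing the illustrative `LLMClient`):

    ```python
    def summarize_hierarchically(client: LLMClient, events, chunk_size: int = 10) -> list:
        summaries = []
        for i in range(0, len(events), chunk_size):
            chunk = events[i:i + chunk_size]
            text = "\n".join(f"{e.action_type.value}: {e.action}" for e in chunk)
            # Condense each chunk of low-level events into one high-level thought.
            summaries.append(
                client.complete(f"Summarize these agent steps in one sentence:\n{text}")
            )
        return summaries
    ```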

3.  Advanced Clustering:
    -   Implemented `_cluster_problem_description` using Jaccard similarity on tokenized instruction text.
    -   Implemented `_cluster_code_modification` using `unidiff` to analyze git patches and cluster based on modified file paths.
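
    A sketch of both similarity signals under these assumptions (illustrative function names; `unidiff`'s `PatchSet.from_string` is used to read the patch):

    ```python
    from unidiff import PatchSet


    def instruction_similarity(a: str, b: str) -> float:
        # Jaccard index over whitespace-tokenized instruction text.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0


    def modified_paths(patch_text: str) -> set:
        # Paths of all files touched by a git patch.
        return {pf.path for pf in PatchSet.from_string(patch_text)}


    def patch_similarity(patch_a: str, patch_b: str) -> float:
        pa, pb = modified_paths(patch_a), modified_paths(patch_b)
        return len(pa & pb) / len(pa | pb) if (pa or pb) else 1.0
    ```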

4.  Testing & Verification:
    -   Added comprehensive unit tests in `tests/test_stage_2.py` covering new compression and clustering features.
    -   Created an end-to-end integration test script `tests/run_cawm_demo.py` that:
        -   Loads real trajectory data.
        -   Executes the full pipeline with Stage 2 features (Summarization + Clustering).
        -   Saves induced workflows to `tests/results/induced_workflows.json`.

Infrastructure & Dependencies:
-   Added `openai`, `anthropic`, and `tenacity` to `pyproject.toml`.
-   Created detailed development plans and documentation:
    -   `GEMINI.md`: Project overview and context.
    -   `CAWM/README.md`: Comprehensive documentation for the CAWM system, including configuration, usage patterns, and API key setup (defaulting to Kimi via OpenRouter).
    -   `dev_plans/`: Stored detailed stage plans (`stage-1-overall.md`, `stage-2-plan.md`, etc.).
Summary:
This commit consolidates improvements across Stages 3, 4, and 5, focusing on robust OpenRouter integration, pipeline validation, and clustering logic fixes.

Detailed Changes:

Stage 3: Provider & Integration Enhancements
- **LLM Client (`CAWM/llm_client.py`)**:
    - Fixed OpenRouter provider integration.
    - Enhanced error handling for API connectivity.
- **Pipeline (`CAWM/pipeline.py`)**:
    - Refined pipeline execution flow for better reliability.

Stage 4: Testing & Validation Framework
- **Test Artifacts**:
    - Added structured test directories.
    - Included pipeline statistics tracking.
- **Documentation**:
    - Added a document detailing validation results.

Stage 5: Clustering Logic Fixes
- **Clustering Module (`CAWM/clustering.py`)**:
    - Fixed logic errors in two clustering methods.
    - Addressed previously identified issues.
- **Induction Module (`CAWM/induction.py`)**:
    - Minor adjustments to support improved cluster input.

Documentation Updates:
- **README (`CAWM/README.md`)**:
    - Updated configuration examples to use OpenRouter.
    - Removed legacy direct API references (Anthropic/OpenAI) in favor of OpenRouter.
    - Updated environment variable documentation.

Misc:
- Added an entry-point script.
Summary:
This commit fixes a bug in `CAWM/llm_base.py` in how the token limit (`max_tokens`) is passed to the LLM client.

Detailed Changes:
- **LLM Base (`CAWM/llm_base.py`)**:
    - Corrected the variable name or method signature for passing `max_tokens` to ensure the LLM respects the configured output length.
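
    For illustration only, the shape of such a fix is simply making sure the configured value reaches the provider call (method and attribute names are assumptions, not the real `CAWM/llm_base.py` code):

    ```python
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,  # the configured limit must be forwarded here
        )
        return resp.choices[0].message.content
    ```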

- **Test Artifacts**:
    - Updated test outputs in `CAWM/workflow/testing/` (clusters, induction details, pipeline stats, summary, workflows) reflecting the results of the latest run with the fix.

- **Documentation**:
    - Minor update to `CAWM/README.md`.
