Mada2aa commented Dec 31, 2025

Description

This pull request addresses several critical memory leaks and state persistence issues identified during interactive or long-running agent sessions. These issues caused the agent's memory
usage to grow without bound and led to context pollution, where state from previous tasks (e.g., tool history, shell environment) would bleed into new tasks.

Detailed Changes

  1. Fix Unbounded Tool List Growth (sketch below)
  • Issue: In interactive mode, the TraeAgent would re-initialize MCP tools for every new task using initialise_mcp(). This method appended new tool instances to self._tools without
    removing the old ones.
  • Fix:
    • Added self._base_tools to BaseAgent to store the initial, immutable set of tools.
    • In TraeAgent.reset(), self._tools is now restored from self._base_tools, effectively discarding any accumulated MCP tools from the previous session.
    • self.mcp_tools list is explicitly cleared during reset.
  2. Fix Tool State Persistence (sketch below)
  • Issue: Tools like SequentialThinkingTool and BashTool maintained their internal state (e.g., thought history, active subprocesses) indefinitely because the tool instances were reused
    across tasks.
  • Fix:
    • Introduced an async reset() method in the Tool base class.
    • SequentialThinkingTool: Implemented reset() to clear thought_history and branches.
    • BashTool: Implemented reset() to properly close the bash session.
    • BaseAgent: Iterates through all tools and calls their reset() method when a task is reset.
  3. Fix LLM Context Window Leak (sketch below)
  • Issue: The LLM client's message_history was never explicitly cleared between tasks. While new_task reset the initial messages, the underlying client kept appending to the same history
    list, eventually exhausting the context window.
  • Fix:
    • Added an abstract clear_history() method to BaseLLMClient.
    • Implemented clear_history() for all providers (OpenAI, Anthropic, Google, Ollama) to empty their internal message buffers.
    • BaseAgent.reset() now calls self._llm_client.clear_history() to ensure a fresh context for each new task.
  4. Prevent Docker Context Pollution (sketch below)
  • Issue: In Docker mode, the persistent shell (used for interactive commands) maintained its environment variables and working directory across tasks. This meant a cd /tmp or export
    VAR=foo in Task A would affect Task B.
  • Fix:
    • Added restart_shell() to DockerManager, which kills and respawns the persistent shell session.
    • BaseAgent.reset() triggers this restart if a Docker manager is active, ensuring a clean shell environment for every new task while keeping the container filesystem intact.
  5. Lifecycle Management (sketch below)
  • Issue: The reset logic was scattered or missing.
  • Fix:
    • Centralized cleanup logic in BaseAgent.reset() and TraeAgent.reset().
    • Updated Agent.run() to explicitly call await self.agent.reset() before starting any new task.
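
A minimal sketch of the base-tool snapshot from change 1. The names _base_tools, _tools, mcp_tools, and initialise_mcp() come from this PR; _create_tools() is a hypothetical helper standing in for the real construction logic:

class BaseAgent:
    def __init__(self, config):
        self._tools = self._create_tools(config)  # hypothetical helper
        # Snapshot the initial, immutable tool set for later restoration.
        self._base_tools = list(self._tools)

class TraeAgent(BaseAgent):
    async def reset(self):
        await super().reset()
        # Discard any MCP tools appended by initialise_mcp() in prior tasks.
        self._tools = list(self._base_tools)
        self.mcp_tools = []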
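
Change 2's per-tool reset hook, sketched with assumed internals: thought_history and branches are named in this PR, while _session and close() are illustrative stand-ins for BashTool's real session handling:

class Tool:
    async def reset(self) -> None:
        # Default no-op: stateless tools have nothing to clear.
        pass

class SequentialThinkingTool(Tool):
    async def reset(self) -> None:
        self.thought_history.clear()
        self.branches.clear()

class BashTool(Tool):
    async def reset(self) -> None:
        # Close the live bash session so no subprocess outlives the task.
        if self._session is not None:    # attribute name assumed
            await self._session.close()  # method name assumed
            self._session = None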
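
A sketch of the clear_history() contract from change 3; message_history is the buffer named in this PR, and the OpenAI client stands in for the other three providers:

from abc import ABC, abstractmethod

class BaseLLMClient(ABC):
    @abstractmethod
    def clear_history(self) -> None:
        """Empty the internal message buffer before a new task."""

class OpenAIClient(BaseLLMClient):
    def clear_history(self) -> None:
        self.message_history = []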
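
Change 4's shell restart, sketched under the assumption of a subprocess-backed persistent shell; restart_shell() is the method this PR adds, while _shell_proc and _spawn_shell() are illustrative:

class DockerManager:
    def restart_shell(self) -> None:
        # Kill the persistent shell and spawn a fresh one in the same
        # container: env vars and CWD are reset, the filesystem survives.
        if self._shell_proc is not None:
            self._shell_proc.kill()
        self._shell_proc = self._spawn_shell()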
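
Finally, a sketch of how change 5 ties the pieces together; reset(), clear_history(), and restart_shell() are from this PR, and _docker_manager is an assumed attribute name:

class BaseAgent:
    async def reset(self) -> None:
        for tool in self._tools:
            await tool.reset()                    # change 2
        self._llm_client.clear_history()          # change 3
        if self._docker_manager is not None:
            self._docker_manager.restart_shell()  # change 4

class Agent:
    async def run(self, task, extra_args=None):
        await self.agent.reset()  # always start from a clean slate
        # ... existing task-execution logic follows ...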

Verification

A reproduction script (reproduce_leaks.py) was used to simulate multiple consecutive tasks in a single agent lifecycle.

Before Fix:

  • Tool Count: Unbounded growth (e.g., 2 -> 4 -> 7 tools after 3 tasks).
  • Sequential Thinking: History from Task 1 (e.g., "Thought A") was present in Task 2.
  • LLM History: Previous task messages remained in the context.

After Fix:

  • Tool Count: Constant (remains at 2).
  • Sequential Thinking: History is empty at the start of each task.
  • LLM History: Context is fresh for each task.
  • Docker Shell: Environment variables and CWD are reset.

Verification script (verify_integrated_fix.py):

import asyncio
import sys
from unittest.mock import MagicMock

# Mock heavy/external dependencies
sys.modules["mcp"] = MagicMock()
sys.modules["mcp.client.stdio"] = MagicMock()
sys.modules["tree_sitter"] = MagicMock()
sys.modules["tree_sitter_languages"] = MagicMock()

from trae_agent.agent.agent import Agent, AgentType
from trae_agent.utils.config import Config, ModelProvider, ModelConfig, TraeAgentConfig, MCPServerConfig
from trae_agent.tools.base import Tool, ToolExecResult
from trae_agent.utils.llm_clients.llm_basics import LLMResponse

# --- Mocks ---
class MockTool(Tool):
    def get_name(self):
        return "mock_tool"
    def get_description(self):
        return "A mock tool"
    def get_parameters(self):
        return []
    async def execute(self, args):
        return ToolExecResult(output="done")

class MockMCPClient:
    def __init__(self):
        self.session = MagicMock()
    async def connect_and_discover(self, name, config, container, provider):
        # Simulate finding a tool
        container.append(MockTool())
    async def cleanup(self, name):
        pass

# --- Configuration ---
provider = ModelProvider(api_key="sk-test", provider="openai", base_url="http://test")
m_config = ModelConfig(model="gpt-4", model_provider=provider, temperature=0.5, top_p=1.0, top_k=50, parallel_tool_calls=True, max_retries=3)
trae_config = TraeAgentConfig(model=m_config, max_steps=1, tools=["sequentialthinking"], mcp_servers_config={"test_server": MCPServerConfig(command="echo", args=[])}, allow_mcp_servers=["test_server"])
full_config = Config(trae_agent=trae_config)

async def main():
    print("=== Integrated Fix Verification ===\n")

    # Patch MCPClient in the specific module
    import trae_agent.agent.trae_agent as trae_agent_module
    trae_agent_module.MCPClient = MockMCPClient

    # Initialize the high-level Agent
    agent_wrapper = Agent(agent_type=AgentType.TraeAgent, config=full_config)
    
    # Access the underlying TraeAgent instance
    trae_agent = agent_wrapper.agent
    
    # Mock LLM Client to prevent network calls and auto-complete task
    trae_agent._llm_client = MagicMock()
    trae_agent._llm_client.provider.value = "openai"
    trae_agent._llm_client.chat = MagicMock(return_value=LLMResponse(content="done", model="gpt-4", finish_reason="stop", tool_calls=[]))
    trae_agent._llm_client.clear_history = MagicMock()

    for i in range(1, 4):
        print(f"--- Task {i} ---")
        
        # [INTEGRATION TEST]: We ONLY call run(). 
        # It should call reset() automatically inside.
        task_args = {"project_path": "/tmp", "issue": f"Task {i}", "must_patch": "false"}
        await agent_wrapper.run(f"Task {i}", extra_args=task_args)
        
        # Check Results
        mcp_tools = [t for t in trae_agent.tools if isinstance(t, MockTool)]
        print(f"Tools list size: {len(trae_agent.tools)}")
        print(f"MockTool instances: {len(mcp_tools)}")
        
        if len(mcp_tools) > 1:
            print("!! FAIL: MCP Tools accumulated. Reset did not work.")
        else:
            print(">>> PASS: Tools reset successfully.")

        seq_tool = next(t for t in trae_agent.tools if t.name == "sequentialthinking")
        
        # run() begins with reset(), so if the fix works the history is
        # empty at this point. We deliberately append one "garbage" entry
        # at the end of every iteration below; if reset() ever failed,
        # those entries would accumulate and the size would exceed 1.
        print(f"SequentialThinking history size: {len(seq_tool.thought_history)}")
        if len(seq_tool.thought_history) > 1:
            print("!! FAIL: State leaked from previous task.")
        
        # Add history for the NEXT iteration to find and clean
        from trae_agent.tools.sequential_thinking_tool import ThoughtData
        seq_tool.thought_history.append(ThoughtData(thought=f"Garbage from Task {i}", thought_number=1, total_thoughts=1, next_thought_needed=False))
        print("")

    print("=== Verification Finished ===")

if __name__ == "__main__":
    asyncio.run(main())

Before Fix:
[screenshot]

After Fix:
[screenshot]

Fixed unbounded growth of tool lists, implemented reset logic for tools/LLM clients, and ensured Docker shell restarts to prevent context leaks.
CLAassistant commented Dec 31, 2025

CLA assistant check: All committers have signed the CLA.
