
Conversation

@Varahiskillhub

…p pre-filter, memory compression

  • Add string-similarity pre-filter to vulnerability deduplication to limit LLM comparisons to the top 10 most similar reports instead of all reports (see the sketch after this list)
  • Replace per-request httpx.AsyncClient with persistent connection pool per sandbox, eliminating repeated TCP/TLS handshake overhead
  • Execute independent tools concurrently via asyncio.gather while keeping state-modifying tools sequential
  • Lower memory compression threshold from 100K to 60K tokens and cache token counts to avoid redundant litellm.token_counter calls
  • Double compression chunk size from 10 to 20 messages to halve LLM calls
  • Replace asyncio.sleep(0.5) polling with event-based wake signaling in agent state for immediate response to state changes
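
A minimal sketch of the deduplication pre-filter described above, using Python's difflib; the function name and report representation are assumptions, not the actual strix/llm/dedupe.py API:

```python
from difflib import SequenceMatcher


def top_similar_reports(new_report: str, existing_reports: list[str], k: int = 10) -> list[str]:
    """Return the k existing reports most similar to the new one.

    Only these candidates are then compared against the new report via the LLM,
    instead of issuing one LLM comparison per existing report.
    """
    return sorted(
        existing_reports,
        key=lambda report: SequenceMatcher(None, new_report, report).ratio(),
        reverse=True,
    )[:k]
```

SequenceMatcher.ratio() is quadratic in string length in the worst case, but it is still far cheaper than one LLM call per report pair, which is the point of the pre-filter.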

https://claude.ai/code/session_012JYGtxVh4zRbzXKarNmb11

greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR implements several performance optimizations to reduce latency and LLM API costs:

  • Connection pooling: Persistent HTTP clients per sandbox eliminate repeated TCP/TLS handshakes (executor.py:30-52)
  • Parallel tool execution: Independent tools run concurrently via asyncio.gather while state-modifying tools remain sequential (executor.py:336-406); see the sketch after this list
  • Event-based signaling: Replaced asyncio.sleep(0.5) polling with asyncio.Event for immediate wake on state changes (state.py:46, base_agent.py:278)
  • Deduplication pre-filter: String similarity limits LLM comparisons to top 10 most similar vulnerability reports instead of all reports (dedupe.py:144-194)
  • Memory compression tuning: Lowered threshold from 100K to 60K tokens, doubled chunk size from 10 to 20 messages, and cached token counts (memory_compressor.py:12-62)
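
To make the parallel-execution point concrete, here is a minimal sketch of splitting tool calls into concurrent and sequential groups. The tool-call representation and the STATE_MODIFYING_TOOLS set are illustrative assumptions, not the actual executor.py code:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical shape of a pending tool call: (tool_name, zero-arg async callable).
ToolCall = tuple[str, Callable[[], Awaitable[Any]]]

# Assumed set of tools that mutate agent/sandbox state and must stay ordered.
STATE_MODIFYING_TOOLS = {"write_file", "run_command", "finish_scan"}


async def run_tool_calls(calls: list[ToolCall]) -> list[Any]:
    results: list[Any] = []
    batch: list[Callable[[], Awaitable[Any]]] = []

    async def flush_batch() -> None:
        # Independent (read-only) tools run concurrently in one gather.
        if batch:
            results.extend(await asyncio.gather(*(fn() for fn in batch)))
            batch.clear()

    for name, fn in calls:
        if name in STATE_MODIFYING_TOOLS:
            await flush_batch()          # preserve ordering around state changes
            results.append(await fn())   # state-modifying tool runs alone, in order
        else:
            batch.append(fn)

    await flush_batch()
    return results
```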

Issues found:

  • close_sandbox_client function defined but never called, causing connection pool resource leaks when sandboxes are torn down
  • _token_cache in memory_compressor.py grows unbounded without eviction strategy

Confidence Score: 3/5

  • Generally safe performance improvements with two resource leak issues that need resolution before production use
  • The optimizations are well-designed and properly implement connection pooling, parallelization, and event-based signaling. However, the missing cleanup mechanism for HTTP connection pools and the unbounded token cache growth are production-readiness concerns that could cause memory or connection leaks in long-running systems.
  • Pay close attention to strix/tools/executor.py (connection pool cleanup) and strix/llm/memory_compressor.py (cache eviction)

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/tools/executor.py | Added HTTP connection pooling per sandbox and parallel tool execution. Missing cleanup mechanism for persistent connections. |
| strix/agents/state.py | Replaced polling with event-based signaling using asyncio.Event. Properly excluded from serialization. |
| strix/llm/dedupe.py | Added string similarity pre-filtering to limit LLM comparisons to top 10 candidates. Efficient and well-implemented. |
| strix/llm/memory_compressor.py | Added token count caching and increased compression chunk size. Cache grows unbounded without eviction strategy. |
| strix/agents/base_agent.py | Replaced sleep polling with event-based wait. Clean, minimal change that improves responsiveness. |
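
As a sketch of the polling-to-event change noted in the state.py and base_agent.py rows (class, attribute, and method names here are illustrative, not the actual strix classes), an asyncio.Event lets a waiter wake as soon as state changes instead of sleeping in 0.5 s increments:

```python
import asyncio


class AgentState:
    def __init__(self) -> None:
        self.status = "running"
        # Runtime-only; excluded from serialization and recreated on load.
        self._wake_event = asyncio.Event()

    def set_status(self, status: str) -> None:
        self.status = status
        self._wake_event.set()  # wake any waiter immediately

    async def wait_for_change(self, timeout: float | None = None) -> None:
        """Replaces a `while True: await asyncio.sleep(0.5)` polling loop."""
        try:
            await asyncio.wait_for(self._wake_event.wait(), timeout)
        except asyncio.TimeoutError:
            pass  # no change within the timeout; caller re-checks state
        self._wake_event.clear()
```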

greptile-apps bot left a comment

5 files reviewed, 2 comments


Comment on lines +48 to +52
async def close_sandbox_client(sandbox_id: str) -> None:
    """Close and remove the HTTP client for a sandbox when it's torn down."""
    client = _sandbox_clients.pop(sandbox_id, None)
    if client:
        await client.aclose()

close_sandbox_client is defined but never called in the codebase. Connection pool clients accumulate without cleanup when sandboxes are torn down, leading to resource leaks.
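
One concise resolution would be to call close_sandbox_client from whatever path tears a sandbox down. The function below is a hypothetical placeholder for that path, not an existing call site:

```python
from strix.tools.executor import close_sandbox_client


async def teardown_sandbox(sandbox_id: str) -> None:
    """Hypothetical teardown hook; the real shutdown path may look different."""
    try:
        ...  # existing sandbox shutdown steps (stop container, clean up volumes, etc.)
    finally:
        # Release the pooled HTTP client so connections don't accumulate.
        await close_sandbox_client(sandbox_id)
```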


Comment on lines +47 to +62
_token_cache: dict[int, int] = {}


def _count_tokens(text: str, model: str) -> int:
    cache_key = hash(text)
    if cache_key in _token_cache:
        return _token_cache[cache_key]

    try:
        count = int(litellm.token_counter(model=model, text=text))
    except Exception:
        logger.exception("Failed to count tokens")
        count = len(text) // 4  # Rough estimate

    _token_cache[cache_key] = count
    return count

_token_cache grows unbounded. For long-running agents with many unique messages, this will consume increasing memory. Consider adding LRU eviction or size limits.
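
A lightweight option, sketched here with an OrderedDict-based cap (the limit and helper name are arbitrary choices for illustration, not the project's actual fix):

```python
from collections import OrderedDict

_MAX_CACHE_ENTRIES = 10_000
_token_cache: OrderedDict[int, int] = OrderedDict()


def _cache_token_count(cache_key: int, count: int) -> None:
    # Insert or refresh the entry, then evict the oldest one once over the cap.
    _token_cache[cache_key] = count
    _token_cache.move_to_end(cache_key)
    if len(_token_cache) > _MAX_CACHE_ENTRIES:
        _token_cache.popitem(last=False)
```

For true LRU behavior, cache hits in _count_tokens would also need to call move_to_end before returning.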

