Improve tool efficiency: connection pooling, parallel execution, dedup pre-filter, memory compression #305
base: main
Conversation
- Add a string-similarity pre-filter to vulnerability deduplication, limiting LLM comparisons to the top 10 most similar reports instead of all reports
- Replace the per-request httpx.AsyncClient with a persistent connection pool per sandbox, eliminating repeated TCP/TLS handshake overhead
- Execute independent tools concurrently via asyncio.gather while keeping state-modifying tools sequential (see the sketch below)
- Lower the memory compression threshold from 100K to 60K tokens and cache token counts to avoid redundant litellm.token_counter calls
- Double the compression chunk size from 10 to 20 messages to halve LLM calls
- Replace asyncio.sleep(0.5) polling with event-based wake signaling in agent state for immediate response to state changes

https://claude.ai/code/session_012JYGtxVh4zRbzXKarNmb11
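To make the concurrency change concrete, here is a minimal sketch of the pattern described above: independent tool calls are batched through asyncio.gather, while state-modifying calls flush the batch and run alone. The names (`STATE_MODIFYING`, `run_tool`, `execute_batch`) are hypothetical stand-ins, not the PR's actual code in strix/tools/executor.py.

```python
import asyncio
from typing import Any

# Hypothetical tool set for illustration; the real executor's list differs.
STATE_MODIFYING = {"terminal_execute", "finish_scan"}

async def run_tool(name: str, args: dict[str, Any]) -> Any:
    """Stand-in for the real tool dispatch."""
    await asyncio.sleep(0)  # placeholder for actual tool work
    return f"{name}: ok"

async def execute_batch(calls: list[tuple[str, dict[str, Any]]]) -> list[Any]:
    results: list[Any] = []
    pending: list[Any] = []
    for name, args in calls:
        if name in STATE_MODIFYING:
            # Flush the independent batch first so result order is preserved,
            # then run the state-modifying tool on its own.
            if pending:
                results.extend(await asyncio.gather(*pending))
                pending = []
            results.append(await run_tool(name, args))
        else:
            pending.append(run_tool(name, args))
    if pending:
        results.extend(await asyncio.gather(*pending))
    return results
```

Under this scheme a contiguous run of read-only tools completes in roughly the time of the slowest one, while writes keep their original order.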
Greptile Summary

This PR implements several performance optimizations to reduce latency and LLM API costs.

Issues found:
- `close_sandbox_client` is defined but never called, so pooled clients leak (first comment below)
- `_token_cache` grows without bound (second comment below)

Confidence Score: 3/5
5 files reviewed, 2 comments
strix/tools/executor.py (lines 48-52):

```python
async def close_sandbox_client(sandbox_id: str) -> None:
    """Close and remove the HTTP client for a sandbox when it's torn down."""
    client = _sandbox_clients.pop(sandbox_id, None)
    if client:
        await client.aclose()
```
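The diff above is the cleanup half of the per-sandbox pool; for context, the acquisition half presumably looks something like the sketch below. Only `_sandbox_clients` appears in the PR; `get_sandbox_client`, the `base_url` parameter, and the timeout value are assumptions.

```python
import httpx

_sandbox_clients: dict[str, httpx.AsyncClient] = {}

def get_sandbox_client(sandbox_id: str, base_url: str) -> httpx.AsyncClient:
    """Return the persistent client for a sandbox, creating it on first use."""
    client = _sandbox_clients.get(sandbox_id)
    if client is None or client.is_closed:
        # One long-lived client per sandbox reuses TCP/TLS connections
        # across requests instead of handshaking on every call.
        client = httpx.AsyncClient(base_url=base_url, timeout=30.0)
        _sandbox_clients[sandbox_id] = client
    return client
```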
`close_sandbox_client` is defined but never called in the codebase. Connection pool clients accumulate without cleanup when sandboxes are torn down, leading to resource leaks.
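A concise resolution would be to invoke the function from the sandbox teardown path; a sketch, assuming teardown is funneled through a single hypothetical `teardown_sandbox` coroutine:

```python
async def teardown_sandbox(sandbox_id: str) -> None:
    # Release pooled HTTP connections before destroying the sandbox.
    await close_sandbox_client(sandbox_id)
    ...  # existing teardown steps (stop container, free resources)
```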
strix/llm/memory_compressor.py (lines 47-62):

```diff
 _token_cache: dict[int, int] = {}


 def _count_tokens(text: str, model: str) -> int:
     cache_key = hash(text)
     if cache_key in _token_cache:
         return _token_cache[cache_key]

     try:
-        count = litellm.token_counter(model=model, text=text)
-        return int(count)
+        count = int(litellm.token_counter(model=model, text=text))
     except Exception:
         logger.exception("Failed to count tokens")
-        return len(text) // 4  # Rough estimate
+        count = len(text) // 4  # Rough estimate

     _token_cache[cache_key] = count
     return count
```
`_token_cache` grows unbounded. For long-running agents with many unique messages, this will consume increasing memory. Consider adding LRU eviction or size limits.
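A minimal sketch of one mitigation, assuming an illustrative cap of 10,000 entries (the limit and the `_cache_token_count` helper are not part of the PR): keep the cache as an OrderedDict and evict the least recently used key on overflow.

```python
from collections import OrderedDict

_TOKEN_CACHE_MAX = 10_000  # illustrative cap, not from the PR
_token_cache: OrderedDict[int, int] = OrderedDict()

def _cache_token_count(cache_key: int, count: int) -> None:
    """Store a token count, evicting the least recently used entry if full."""
    _token_cache[cache_key] = count
    _token_cache.move_to_end(cache_key)
    if len(_token_cache) > _TOKEN_CACHE_MAX:
        _token_cache.popitem(last=False)
```

Alternatively, wrapping the counting helper with functools.lru_cache(maxsize=...) keyed on the text would give the same bound with less code.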