feat: add Anthropic SDK evaluator (#11)#19
Open
don-petry wants to merge 4 commits intoJoaolfelicio:mainfrom
Open
feat: add Anthropic SDK evaluator (#11)#19don-petry wants to merge 4 commits intoJoaolfelicio:mainfrom
don-petry wants to merge 4 commits intoJoaolfelicio:mainfrom
Conversation
Adds AnthropicEvaluator that calls the Anthropic API directly via the SDK instead of shelling out to the Claude CLI. Benefits: - Built-in retry/backoff from the SDK - No CLI installation required (just ANTHROPIC_API_KEY) - Proper error handling and model selection The SDK is an optional dependency (pip install context-scribe[anthropic]). Auto-detection falls back to the SDK evaluator when no CLI tools are available but ANTHROPIC_API_KEY is set. Closes Joaolfelicio#11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a new Anthropic SDK–backed evaluator to context-scribe so Claude rule extraction can run without shelling out to the claude CLI, and updates evaluator auto-detection accordingly.
Changes:
- Introduce
AnthropicEvaluatorthat calls Anthropic via the Python SDK and add tests for rule/NO_RULE/model/key behaviors. - Add an
anthropicoptional dependency extra inpyproject.toml. - Update evaluator auto-detection to fall back to the SDK evaluator when no CLIs are present but
ANTHROPIC_API_KEYis set.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
context_scribe/evaluator/anthropic_llm.py |
New SDK-based evaluator implementation. |
context_scribe/evaluator/__init__.py |
Conditionally registers the new evaluator in the factory/registry. |
context_scribe/main.py |
Adds auto-detection fallback to the Anthropic evaluator and updates the error message. |
pyproject.toml |
Adds anthropic as an optional dependency extra. |
tests/test_anthropic_evaluator.py |
New unit tests for Anthropic evaluator behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Move anthropic import to module level so try/except in __init__.py correctly skips registration when SDK is missing - Add 120s timeout to Anthropic client (matching other evaluators) - Update error message assertion in test_main.py - Fix test mocking to use sys.modules patching for optional dependency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The evaluator factory pattern removed direct imports of evaluator classes from main.py. Tests must patch get_evaluator instead. Found via full test suite run with Python 3.12 + mcp installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Remove unused evaluator_class from test_daemons parametrize - Remove unused importlib import from test_anthropic_evaluator - Add test for _detect_evaluator ANTHROPIC_API_KEY fallback Found via full test suite run with Python 3.12 + mcp installed (74/74 pass). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
ClaudeEvaluatorcurrently shells out to theclaudeCLI with a hard 120s timeout, no retry logic, and requires the CLI binary installed on the host. Users without the CLI installed cannot use Claude for evaluation at all. The Anthropic Python SDK provides built-in retry/backoff, proper error handling, and model selection — and only requires an API key, not a CLI installation.Summary
AnthropicEvaluatorthat calls the Anthropic API directly via the Python SDK instead of shelling out to the Claude CLI--evaluator anthropicANTHROPIC_API_KEYis setanthropicis an optional dependency:pip install context-scribe[anthropic]import anthropicso thetry/except ImportErrorin__init__.pycorrectly gates registrationCloses #11
Testing evidence (Python 3.12 + mcp + anthropic SDK installed)
'anthropic' not in EVALUATOR_REGISTRYwhen SDK absent'anthropic' in EVALUATOR_REGISTRY= Trueget_evaluator('anthropic')returnsAnthropicEvaluatorwith correct modelget_evaluatorimports fineanthropic>=0.40.0_detect_evaluatorfallback test: returns"anthropic"when API key setTest plan
🤖 Generated with Claude Code