feat: add Anthropic SDK evaluator (#11) by don-petry · Pull Request #19 · Joaolfelicio/context-scribe

don-petry · 2026-04-04T19:40:44Z

Why

ClaudeEvaluator currently shells out to the claude CLI with a hard 120s timeout, no retry logic, and requires the CLI binary installed on the host. Users without the CLI installed cannot use Claude for evaluation at all. The Anthropic Python SDK provides built-in retry/backoff, proper error handling, and model selection — and only requires an API key, not a CLI installation.

Summary

Add AnthropicEvaluator that calls the Anthropic API directly via the Python SDK instead of shelling out to the Claude CLI
Register it in the evaluator factory as --evaluator anthropic
Auto-detection falls back to the SDK when no CLI tools are found but ANTHROPIC_API_KEY is set
anthropic is an optional dependency: pip install context-scribe[anthropic]
Gracefully skips registration if the SDK isn't installed (existing evaluators unaffected)
Module-level import anthropic so the try/except ImportError in __init__.py correctly gates registration
120s client timeout matching other evaluators

Closes #11

Testing evidence (Python 3.12 + mcp + anthropic SDK installed)

Check	Result
Full test suite: 74/74 passed	✅ PASS
Lazy import: `'anthropic' not in EVALUATOR_REGISTRY` when SDK absent	✅ PASS
SDK installed: `'anthropic' in EVALUATOR_REGISTRY` = True	✅ PASS
`get_evaluator('anthropic')` returns `AnthropicEvaluator` with correct model	✅ PASS
Module works without SDK: `get_evaluator` imports fine	✅ PASS
Optional dep in pyproject.toml: `anthropic>=0.40.0`	✅ PASS
`_detect_evaluator` fallback test: returns `"anthropic"` when API key set	✅ PASS
Live API: real HTTP request to Anthropic API	✅ PASS
Live API: authenticates successfully (BadRequestError, not AuthenticationError)	✅ PASS
Live API: error handled gracefully (no crash, returns None)	✅ PASS

Live end-to-end rule extraction could not be verified — test API key had insufficient credits (billing error, not auth error). All transport, authentication, and error handling paths confirmed working.

Test plan

5 new tests (rule extraction, NO_RULE, model/params, missing key, SDK fallback)
Full suite: 74/74 pass
Live: SDK registers, evaluator constructs, API call reaches Anthropic, auth succeeds, errors handled
Blocked: full rule extraction (requires funded API key)

🤖 Generated with Claude Code

Adds AnthropicEvaluator that calls the Anthropic API directly via the SDK instead of shelling out to the Claude CLI. Benefits: - Built-in retry/backoff from the SDK - No CLI installation required (just ANTHROPIC_API_KEY) - Proper error handling and model selection The SDK is an optional dependency (pip install context-scribe[anthropic]). Auto-detection falls back to the SDK evaluator when no CLI tools are available but ANTHROPIC_API_KEY is set. Closes Joaolfelicio#11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new Anthropic SDK–backed evaluator to context-scribe so Claude rule extraction can run without shelling out to the claude CLI, and updates evaluator auto-detection accordingly.

Changes:

Introduce AnthropicEvaluator that calls Anthropic via the Python SDK and add tests for rule/NO_RULE/model/key behaviors.
Add an anthropic optional dependency extra in pyproject.toml.
Update evaluator auto-detection to fall back to the SDK evaluator when no CLIs are present but ANTHROPIC_API_KEY is set.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`context_scribe/evaluator/anthropic_llm.py`	New SDK-based evaluator implementation.
`context_scribe/evaluator/__init__.py`	Conditionally registers the new evaluator in the factory/registry.
`context_scribe/main.py`	Adds auto-detection fallback to the Anthropic evaluator and updates the error message.
`pyproject.toml`	Adds `anthropic` as an optional dependency extra.
`tests/test_anthropic_evaluator.py`	New unit tests for Anthropic evaluator behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

context_scribe/evaluator/__init__.py

context_scribe/main.py

context_scribe/evaluator/anthropic_llm.py

- Move anthropic import to module level so try/except in __init__.py correctly skips registration when SDK is missing - Add 120s timeout to Anthropic client (matching other evaluators) - Update error message assertion in test_main.py - Fix test mocking to use sys.modules patching for optional dependency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The evaluator factory pattern removed direct imports of evaluator classes from main.py. Tests must patch get_evaluator instead. Found via full test suite run with Python 3.12 + mcp installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

context_scribe/evaluator/anthropic_llm.py

context_scribe/main.py

tests/test_daemons.py

tests/test_anthropic_evaluator.py

- Remove unused evaluator_class from test_daemons parametrize - Remove unused importlib import from test_anthropic_evaluator - Add test for _detect_evaluator ANTHROPIC_API_KEY fallback Found via full test suite run with Python 3.12 + mcp installed (74/74 pass). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 4, 2026 19:40

Copilot started reviewing on behalf of don-petry April 4, 2026 19:41 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

context_scribe/evaluator/__init__.py Show resolved Hide resolved

context_scribe/main.py Show resolved Hide resolved

context_scribe/evaluator/anthropic_llm.py Show resolved Hide resolved

context_scribe/evaluator/anthropic_llm.py Outdated Show resolved Hide resolved

DJ and others added 2 commits April 4, 2026 12:47

Copilot AI review requested due to automatic review settings April 5, 2026 17:29

Copilot started reviewing on behalf of don-petry April 5, 2026 17:29 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

context_scribe/evaluator/anthropic_llm.py Show resolved Hide resolved

context_scribe/main.py Show resolved Hide resolved

tests/test_daemons.py Outdated Show resolved Hide resolved

tests/test_anthropic_evaluator.py Outdated Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Anthropic SDK evaluator (#11)#19

feat: add Anthropic SDK evaluator (#11)#19
don-petry wants to merge 4 commits intoJoaolfelicio:mainfrom
don-petry:feat/anthropic-sdk-evaluator

don-petry commented Apr 4, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

don-petry commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary

Testing evidence (Python 3.12 + mcp + anthropic SDK installed)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

don-petry commented Apr 4, 2026 •

edited

Loading