Implement a Claude Code + promptfoo integration test system (targeting article_guardrail_review.md) #4
Merged
64 tasks
- Create test directory structure (configs, providers, mocks, evaluators)
- Add promptfoo configuration and TypeScript setup
- Implement custom Claude Code provider for test execution
- Add package.json with required dependencies
- Add comprehensive README for test setup and usage

This provides the foundation for testing all Claude Code commands with promptfoo's evaluation framework.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Created MockDateProvider for consistent test dates (fixed to 2025-07-21)
- Added mock article files:
  - Clean article that should pass all checks
  - Article with multiple guardrail violations
  - Empty article for edge case testing
- Created mock resource files matching the expected digest output structure
- Updated ClaudeCodeProvider to support test mode with date mocking
- Created test configuration for article_guardrail_review command
- Added test runner script and npm script for easy execution
- Added comprehensive documentation for the mock environment

Test with: npm run test:article-guardrail

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

…view

- Add test suites: basic-scenarios, guardrail-scenarios, edge-cases
- Create extensive mock data variations with 9 violation categories
- Implement test execution utilities and validation scripts
- Add 37 test cases covering all guardrail categories
- Include edge cases and boundary condition testing
- Update package.json with required dependencies

…ardrail review

- Add 4 custom evaluators: article-approval, violation-detection, format-compliance, response-quality
- Create 3 evaluation utilities: response-parser, violation-classifier, metrics-calculator
- Implement sophisticated scoring algorithms with precision/recall metrics
- Add performance benchmarking and quality assessment capabilities
- Update test configurations with custom evaluators and thresholds
- Create comprehensive test runner scripts with multiple execution modes
- Add validation scripts for basic evaluator testing
- Include detailed documentation and usage instructions

Total implementation: 12 new files with comprehensive evaluation logic
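The commit above mentions scoring with precision/recall metrics but the PR does not show the code. As a rough illustration only, the sketch below shows one way such a score could be computed from the violation categories a test case expects and the categories the review actually reported. The function name `scoreViolationDetection`, the `DetectionMetrics` shape, and the category labels in the usage note are all hypothetical and are not taken from the PR's metrics-calculator.

```typescript
// Hypothetical result shape; not the PR's actual metrics-calculator output.
interface DetectionMetrics {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
  precision: number;
  recall: number;
  f1: number;
}

// Remove duplicate labels so each category is counted once.
function unique(labels: string[]): string[] {
  return labels.filter((label, index) => labels.indexOf(label) === index);
}

// Compare expected violation categories with the categories the review reported.
function scoreViolationDetection(
  expected: string[],
  detected: string[],
): DetectionMetrics {
  const expectedSet = unique(expected);
  const detectedSet = unique(detected);

  const truePositives = detectedSet.filter(
    (label) => expectedSet.indexOf(label) !== -1,
  ).length;
  const falsePositives = detectedSet.length - truePositives;
  const falseNegatives = expectedSet.length - truePositives;

  // Assumed convention: an empty denominator counts as a perfect score,
  // so a clean article with no reported violations scores 1.
  const precision =
    detectedSet.length === 0 ? 1 : truePositives / detectedSet.length;
  const recall =
    expectedSet.length === 0 ? 1 : truePositives / expectedSet.length;
  const f1 =
    precision + recall === 0
      ? 0
      : (2 * precision * recall) / (precision + recall);

  return { truePositives, falsePositives, falseNegatives, precision, recall, f1 };
}
```

For example, with made-up labels, `scoreViolationDetection(["pii", "profanity", "plagiarism"], ["pii", "profanity", "spam"])` counts two true positives, one false positive, and one false negative. An evaluator could then compare `precision` and `recall` against the thresholds configured for the test suite.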
…oo testing system

- Add GitHub Actions workflow for automated testing with PR comments
- Create comprehensive documentation suite (setup, test writing, troubleshooting)
- Implement maintenance utilities (mock updates, test reports, cleanup)
- Add configuration management (.env.example, default settings)
- Create quickstart.sh for one-command setup and execution
- Update main README with testing integration documentation
- Add npm scripts for all testing operations
- Make all scripts executable with proper error handling

This completes all 5 phases of the promptfoo testing system implementation.

…ail_review.md

- Remove generic claude-code-provider.ts, replace with focused article-guardrail-provider.ts
- Delete unnecessary generic utilities (cleanup, update-mocks, generate-test-report, etc.)
- Remove over-engineered documentation and scripts not specific to article guardrail testing
- Update all configurations to use the targeted provider
- Simplify package.json to focus on article guardrail review testing only
- Streamline npm scripts to essential article guardrail operations
- Validate setup passes (36/36 checks) with simplified architecture

The testing system now focuses exclusively on testing the article_guardrail_review.md command without unnecessary generic components.

…sting

- Delete mocks/resources/ directory (not needed for article review)
- Remove mock-date-provider.ts (article review doesn't need date mocking)
- Delete redundant shell scripts and demo files
- Update README files to accurately reflect article-only testing scope
- Clean up article-guardrail-provider.ts imports

The test system now contains only files needed for testing article_guardrail_review.md:

- Mock articles (with violations for testing)
- Custom evaluators for guardrail detection
- Basic test configurations

This focuses on the core requirement: testing generated article review functionality.

- Delete .github/workflows/promptfoo-tests.yml (GitHub Actions workflow)
- Delete .env.example (CI environment configuration)
- Delete quickstart.sh (CI automation script)
- Delete .claude/commands/create-command.md (unrelated command)

The testing system is now focused on local manual execution only, without CI/CD automation complexity.

…egration

- Replace direct Anthropic API calls with Claude Code execution via `claude -p`
- Create claude-code-provider.ts that executes article_guardrail_review.md locally
- Update all config files to use claude-code-provider instead of direct API
- Modify test approach: promptfoo → claude-code-provider → `claude -p` → article_guardrail_review.md
- Update README to clearly explain the integration architecture
- Ensure tests execute actual Claude Code commands with mock articles

This implements the correct 'Claude Code + promptfoo + mock environment' integration rather than simple Anthropic API testing.
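The chain promptfoo → claude-code-provider → `claude -p` described above could look roughly like the sketch below. It is not the PR's claude-code-provider.ts, which is not shown here. It assumes the general promptfoo custom-provider shape (an `id()` method and a `callApi()` that resolves to an object with `output` or `error`) and assumes the prompt text is passed as the argument following `-p`. The `ClaudeCodeProviderOptions` fields (`command`, `baseArgs`, `timeoutMs`) are hypothetical and exist mainly so the executable can be swapped for a stub when trying the code out.

```typescript
import { execFile } from "node:child_process";

// Result shape returned to promptfoo from callApi().
interface ProviderResponse {
  output?: string;
  error?: string;
}

// Hypothetical options; the real provider's configuration is not shown in this PR.
interface ClaudeCodeProviderOptions {
  command?: string; // executable to run; defaults to the `claude` CLI
  baseArgs?: string[]; // arguments placed before the prompt; defaults to ["-p"]
  timeoutMs?: number; // kill the child process after this many milliseconds
}

class ClaudeCodeProvider {
  private readonly command: string;
  private readonly baseArgs: string[];
  private readonly timeoutMs: number;

  constructor(options: ClaudeCodeProviderOptions = {}) {
    this.command = options.command ?? "claude";
    this.baseArgs = options.baseArgs ?? ["-p"];
    this.timeoutMs = options.timeoutMs ?? 120_000;
  }

  // Identifier that promptfoo shows in its results.
  id(): string {
    return "claude-code-provider";
  }

  // Run the CLI once per test case and hand its stdout back as the output.
  callApi(prompt: string): Promise<ProviderResponse> {
    return new Promise((resolve) => {
      execFile(
        this.command,
        [...this.baseArgs, prompt],
        { encoding: "utf8", timeout: this.timeoutMs, maxBuffer: 10 * 1024 * 1024 },
        (error, stdout, stderr) => {
          if (error) {
            // Report failures as an error result instead of rejecting.
            resolve({ error: `${error.message}\n${String(stderr)}`.trim() });
          } else {
            resolve({ output: String(stdout).trim() });
          }
        },
      );
    });
  }
}
```

To load a class like this from a promptfoo config it would also need to be exported from its file. Resolving with an `error` field rather than rejecting keeps a failed CLI call confined to the one test case that triggered it, so the rest of the run can continue.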
… integration

- Remove @anthropic-ai/sdk, typescript, tsx dependencies (Claude Code handles API calls)
- Delete tsconfig.json (no TypeScript compilation needed for simple spawn execution)
- Remove complex validation scripts (validate-setup.ts, simple-evaluator-test.js)
- Delete evaluator-validation.yaml (over-engineered validation config)
- Add simple check-setup.sh for basic Claude Code + promptfoo validation
- Simplify package.json to only essential promptfoo dependency and scripts

The integration now uses minimal dependencies for Claude Code execution via spawn.

- Move promptfoo-specific ignore patterns from tests/promptfoo/.gitignore to root .gitignore
- Delete redundant tests/promptfoo/.gitignore file
- Add specific paths for promptfoo test artifacts in root gitignore
- Maintain clean repository structure with single gitignore management
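Overview
Implements Issue #3: an integration test system combining Claude Code, promptfoo, and a mock environment.
🎯 Final architecture (fully minimal configuration)
Implementation ✅ Complete
Fully minimal integrated system
Unneeded files, fully removed
Dependencies (minimal)
{ "dependencies": { "promptfoo": "^0.49.0" } } (promptfoo only)
Git management cleanup
promptfoo-related ignore patterns consolidated into .gitignore
Running the tests (Claude Code integration)
🔍 Integration test flow
Run claude -p .claude/commands/article_guardrail_review.md
Goal achieved 🎯
The true integration test system for article_guardrail_review.md (fully minimal configuration) is complete!
Closes #3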