Implement a Claude Code + promptfoo integration test system (targeting article_guardrail_review.md) #4

pppp606 · 2025-07-21T13:32:39Z

Overview

Implements Issue #3: an integration test system that combines Claude Code, promptfoo, and a mock environment.

🎯 Final architecture (fully minimal configuration)

promptfoo → claude-code-provider → `claude -p` → article_guardrail_review.md → mock article
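The pipeline above can be sketched as a promptfoo custom provider. This is a minimal sketch, not the code from this PR: the `Runner` type, `buildInvocation`, and the idea of passing the mock article on stdin are assumptions made for illustration. Only the `id()` / `callApi()` provider shape and the `claude -p` command path come from promptfoo's provider interface and the description here. In a real provider the runner would wrap a child-process spawn.

```typescript
// Hypothetical sketch: a promptfoo-style provider that delegates to the Claude Code CLI.
// Runner is an assumed abstraction over process spawning so the logic stays testable.
type Runner = (cmd: string, args: string[], stdin: string) => Promise<string>;

interface Invocation {
  cmd: string;
  args: string[];
  stdin: string;
}

// Command path as written in the PR description.
const COMMAND_FILE = ".claude/commands/article_guardrail_review.md";

// Assumption: the mock article text is handed to the CLI on stdin.
function buildInvocation(article: string, commandFile: string = COMMAND_FILE): Invocation {
  return { cmd: "claude", args: ["-p", commandFile], stdin: article };
}

class ClaudeCodeProvider {
  private readonly run: Runner;

  constructor(run: Runner) {
    this.run = run;
  }

  id(): string {
    return "claude-code-provider";
  }

  // promptfoo passes the rendered prompt; here it is assumed to be the mock article.
  async callApi(prompt: string): Promise<{ output?: string; error?: string }> {
    const { cmd, args, stdin } = buildInvocation(prompt);
    try {
      const output = await this.run(cmd, args, stdin);
      return { output: output.trim() };
    } catch (err) {
      return { error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

Injecting the runner keeps the provider free of hard-coded process handling, so the same class can be exercised with a fake runner in unit tests and with a real spawn wrapper in the integration run.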

Implementation ✅ Done

Fully minimal integrated system

tests/promptfoo/
├── providers/claude-code-provider.ts     # runs `claude -p`
├── evaluators/                          # promptfoo evaluator functions
├── mocks/articles/                      # test articles only
├── configs/                             # base configuration
└── scripts/                             # environment check
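To show how these directories might fit together, here is a rough sketch of a test suite definition written as a TypeScript object in the shape of a promptfoo config (`providers`, `prompts`, `tests`, `vars`, `assert`). The local interfaces, the file name `clean-article.md`, and the assumption that paths resolve relative to `tests/promptfoo/` are all illustrative; the actual files under `configs/` are not shown in this PR description.

```typescript
// Illustrative only: mirrors the general shape of a promptfoo config.
interface Assertion {
  type: string;
  value: string;
}

interface TestCase {
  description: string;
  vars: Record<string, string>;
  assert: Assertion[];
}

interface SuiteConfig {
  description: string;
  providers: string[];
  prompts: string[];
  tests: TestCase[];
}

const basicSuite: SuiteConfig = {
  description: "article_guardrail_review: basic scenarios",
  // Custom provider from the directory tree above.
  providers: ["file://providers/claude-code-provider.ts"],
  // The prompt is just the article text injected through a variable.
  prompts: ["{{article}}"],
  tests: [
    {
      description: "clean article is approved",
      // Hypothetical mock file name under mocks/articles/.
      vars: { article: "file://mocks/articles/clean-article.md" },
      assert: [{ type: "contains", value: "APPROVED" }],
    },
  ],
};
```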

Unneeded files, fully removed

❌ Duplicate gitignore: tests/promptfoo/.gitignore → merged into the root .gitignore
❌ TypeScript tooling: tsconfig.json, @types/*, typescript, tsx
❌ Anthropic SDK: @anthropic-ai/sdk (Claude Code handles the API calls)
❌ CI/CD: GitHub Actions, .env, quickstart.sh
❌ Complex validation: validate-setup.ts, evaluator-validation.yaml
❌ Generic provider: article-guardrail-provider.ts
❌ Article-generation resources: mocks/resources/
❌ Date mock: mock-date-provider.ts

Dependencies (minimal)

{
  "dependencies": {
    "promptfoo": "^0.49.0"  // the only dependency
  }
}

Git management cleanup

promptfoo-related patterns merged into the root .gitignore
Duplicate files fully removed
Clean repository structure

Running the tests (Claude Code integration)

cd tests/promptfoo

# Environment check
npm run check

# Claude Code + promptfoo integration tests
npm test                    # basic functionality (APPROVED verdict)
npm run test:guardrails     # violation detection accuracy
npm run test:edge-cases     # error handling
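The environment check is implemented as a shell script in this PR (`check-setup.sh`, per the commit log). Its core idea, verifying that the required CLIs are present before running the suites, could be sketched roughly as follows. The function names, the `Lookup` predicate, and the exact tool list are assumptions for illustration.

```typescript
// Hypothetical sketch of the environment-check logic.
// Lookup abstracts "is this executable available?" so no process APIs are needed here.
type Lookup = (tool: string) => boolean;

// Assumed tool list: the Claude Code CLI and promptfoo.
const REQUIRED_TOOLS: string[] = ["claude", "promptfoo"];

// Return the subset of required tools that the lookup cannot find.
function missingTools(isAvailable: Lookup, required: string[] = REQUIRED_TOOLS): string[] {
  return required.filter((tool) => !isAvailable(tool));
}

// Turn the result into a one-line report.
function formatReport(missing: string[]): string {
  return missing.length === 0
    ? "OK: all required tools found"
    : `Missing: ${missing.join(", ")}`;
}
```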

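🔍 Integration test flow

promptfoo manages the test cases
claude-code-provider runs `claude -p .claude/commands/article_guardrail_review.md`
article_guardrail_review.md actually executes inside the Claude Code environment
promptfoo evaluator functions measure the accuracy of the results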
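As an illustration of the last step, an evaluator for the approval verdict might look like the sketch below. It returns a `{ pass, score, reason }` object, which is the grading-result shape promptfoo accepts from custom assertion functions. The function name `gradeApproval` and the assumption that the review output contains the literal word `APPROVED` are hypothetical; the real output format of article_guardrail_review.md is not shown here.

```typescript
// Grading result in the shape promptfoo accepts from custom assertions.
interface GradingResult {
  pass: boolean;
  score: number;
  reason: string;
}

// Assumption: an approving review contains the standalone word APPROVED.
// Limitation: a phrase such as "NOT APPROVED" would also match, so a real
// evaluator would need to parse the actual verdict format.
function gradeApproval(output: string, expectApproved: boolean): GradingResult {
  const approved = /\bAPPROVED\b/.test(output);
  const pass = approved === expectApproved;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: `expected approved=${expectApproved}, got approved=${approved}`,
  };
}
```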

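Goals achieved 🎯

✅ True integration: Claude Code + promptfoo + mock environment
✅ Fully minimal configuration: unneeded files removed, promptfoo is the only dependency
✅ Real-environment testing: runs the actual Claude Code command
✅ High-accuracy evaluation: measures precision/recall of guardrail violation detection
✅ Clean repository: gitignore consolidated, duplicates removed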
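For reference, precision and recall over detected violation categories can be computed as in the sketch below. This is a generic set-based calculation, not the PR's metrics code: the function name, the category strings, and the convention of scoring an empty set as 1 are assumptions.

```typescript
interface DetectionMetrics {
  precision: number;
  recall: number;
  f1: number;
}

// Compare the violation categories a test expects against those the review reported.
function scoreDetection(expected: string[], detected: string[]): DetectionMetrics {
  const expectedSet = new Set(expected);
  const detectedSet = new Set(detected);

  let truePositives = 0;
  detectedSet.forEach((category) => {
    if (expectedSet.has(category)) truePositives += 1;
  });

  // Convention assumed here: nothing detected means no false positives (precision 1),
  // and nothing expected means nothing was missed (recall 1).
  const precision = detectedSet.size === 0 ? 1 : truePositives / detectedSet.size;
  const recall = expectedSet.size === 0 ? 1 : truePositives / expectedSet.size;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);

  return { precision, recall, f1 };
}

// Hypothetical category names, for illustration only.
const metrics = scoreDetection(["hate-speech", "misinformation"], ["hate-speech", "spam"]);
console.log(metrics); // one of two detections is correct, one of two expected is found
```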

The true integration test system for article_guardrail_review.md (fully minimal configuration) is now complete!

Closes #3

- Create test directory structure (configs, providers, mocks, evaluators)
- Add promptfoo configuration and TypeScript setup
- Implement custom Claude Code provider for test execution
- Add package.json with required dependencies
- Add comprehensive README for test setup and usage

This provides the foundation for testing all Claude Code commands with promptfoo's evaluation framework.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Created MockDateProvider for consistent test dates (fixed to 2025-07-21)
- Added mock article files:
  - Clean article that should pass all checks
  - Article with multiple guardrail violations
  - Empty article for edge case testing
- Created mock resource files matching the expected digest output structure
- Updated ClaudeCodeProvider to support test mode with date mocking
- Created test configuration for article_guardrail_review command
- Added test runner script and npm script for easy execution
- Added comprehensive documentation for the mock environment

Test with: npm run test:article-guardrail

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

…view

- Add test suites: basic-scenarios, guardrail-scenarios, edge-cases
- Create extensive mock data variations with 9 violation categories
- Implement test execution utilities and validation scripts
- Add 37 test cases covering all guardrail categories
- Include edge cases and boundary condition testing
- Update package.json with required dependencies

…ardrail review

- Add 4 custom evaluators: article-approval, violation-detection, format-compliance, response-quality
- Create 3 evaluation utilities: response-parser, violation-classifier, metrics-calculator
- Implement sophisticated scoring algorithms with precision/recall metrics
- Add performance benchmarking and quality assessment capabilities
- Update test configurations with custom evaluators and thresholds
- Create comprehensive test runner scripts with multiple execution modes
- Add validation scripts for basic evaluator testing
- Include detailed documentation and usage instructions

Total implementation: 12 new files with comprehensive evaluation logic

…oo testing system

- Add GitHub Actions workflow for automated testing with PR comments
- Create comprehensive documentation suite (setup, test writing, troubleshooting)
- Implement maintenance utilities (mock updates, test reports, cleanup)
- Add configuration management (.env.example, default settings)
- Create quickstart.sh for one-command setup and execution
- Update main README with testing integration documentation
- Add npm scripts for all testing operations
- Make all scripts executable with proper error handling

This completes all 5 phases of the promptfoo testing system implementation.

…ail_review.md

- Remove generic claude-code-provider.ts, replace with focused article-guardrail-provider.ts
- Delete unnecessary generic utilities (cleanup, update-mocks, generate-test-report, etc.)
- Remove over-engineered documentation and scripts not specific to article guardrail testing
- Update all configurations to use the targeted provider
- Simplify package.json to focus on article guardrail review testing only
- Streamline npm scripts to essential article guardrail operations
- Validate setup passes (36/36 checks) with simplified architecture

The testing system now focuses exclusively on testing the article_guardrail_review.md command without unnecessary generic components.

…sting

- Delete mocks/resources/ directory (not needed for article review)
- Remove mock-date-provider.ts (article review doesn't need date mocking)
- Delete redundant shell scripts and demo files
- Update README files to accurately reflect article-only testing scope
- Clean up article-guardrail-provider.ts imports

The test system now contains only files needed for testing article_guardrail_review.md:

- Mock articles (with violations for testing)
- Custom evaluators for guardrail detection
- Basic test configurations

This focuses on the core requirement: testing generated article review functionality.

- Delete .github/workflows/promptfoo-tests.yml (GitHub Actions workflow)
- Delete .env.example (CI environment configuration)
- Delete quickstart.sh (CI automation script)
- Delete .claude/commands/create-command.md (unrelated command)

The testing system is now focused on local manual execution only, without CI/CD automation complexity.

…egration

- Replace direct Anthropic API calls with Claude Code execution via `claude -p`
- Create claude-code-provider.ts that executes article_guardrail_review.md locally
- Update all config files to use claude-code-provider instead of direct API
- Modify test approach: promptfoo → claude-code-provider → `claude -p` → article_guardrail_review.md
- Update README to clearly explain the integration architecture
- Ensure tests execute actual Claude Code commands with mock articles

This implements the correct 'Claude Code + promptfoo + mock environment' integration rather than simple Anthropic API testing.

… integration

- Remove @anthropic-ai/sdk, typescript, tsx dependencies (Claude Code handles API calls)
- Delete tsconfig.json (no TypeScript compilation needed for simple spawn execution)
- Remove complex validation scripts (validate-setup.ts, simple-evaluator-test.js)
- Delete evaluator-validation.yaml (over-engineered validation config)
- Add simple check-setup.sh for basic Claude Code + promptfoo validation
- Simplify package.json to only essential promptfoo dependency and scripts

The integration now uses minimal dependencies for Claude Code execution via spawn.

- Move promptfoo-specific ignore patterns from tests/promptfoo/.gitignore to root .gitignore
- Delete redundant tests/promptfoo/.gitignore file
- Add specific paths for promptfoo test artifacts in root gitignore
- Maintain clean repository structure with single gitignore management

[skip ci] Initial commit for issue #3

23a87b4

pppp606 mentioned this pull request Jul 21, 2025

Implement a Claude Code + promptfoo integration test system (targeting article_guardrail_review.md) #3

Closed

64 tasks

pppp606 and others added 11 commits July 21, 2025 22:35

pppp606 merged commit 8cfb996 into main Jul 27, 2025
