pppp606 commented on Jul 21, 2025

Summary

Implements Issue #3: an integration test system combining Claude Code, promptfoo, and a mock environment.

🎯 Final architecture (fully minimal setup)

promptfoo → claude-code-provider → `claude -p` → article_guardrail_review.md → モック記事
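The sketch below shows what a provider like claude-code-provider.ts could look like. It is a minimal sketch that assumes promptfoo's custom-provider shape: a class with `id()` and `callApi(prompt)` that resolves to `{ output }` or `{ error }`. The provider forwards the rendered promptfoo prompt to `claude -p`, which is Claude Code's non-interactive print mode. The prompt's contents (for example, a reference to `.claude/commands/article_guardrail_review.md` plus a mock article) are assumed to be defined in the promptfoo config. `buildClaudeArgs`, `toProviderResponse`, and the injectable `Runner` are hypothetical names, not the repository's actual code.

```typescript
import { spawn } from "node:child_process";

// Subset of the response object promptfoo expects from a provider call.
interface ProviderResponse {
  output?: string;
  error?: string;
}

// Outcome of running an external command to completion.
interface RunResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

// Hypothetical seam: injectable so the provider can be exercised without a real `claude` binary.
type Runner = (cmd: string, args: string[]) => Promise<RunResult>;

// Default runner: spawn without a shell and collect stdout/stderr as UTF-8 text.
const spawnRunner: Runner = (cmd, args) =>
  new Promise<RunResult>((resolve, reject) => {
    const child = spawn(cmd, args); // all three stdio streams are pipes
    child.stdin.end(); // close stdin so the CLI does not wait for piped input
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8"); // decodes multi-byte characters split across chunks
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr.on("data", (chunk: string) => {
      stderr += chunk;
    });
    child.on("error", reject); // e.g. `claude` is not installed or not on PATH
    child.on("close", (code) => resolve({ code, stdout, stderr }));
  });

// Arguments for Claude Code's print mode: `claude -p "<prompt>"`.
export function buildClaudeArgs(prompt: string): string[] {
  return ["-p", prompt];
}

// Map a finished process to promptfoo's response shape.
export function toProviderResponse(result: RunResult): ProviderResponse {
  if (result.code !== 0) {
    return { error: `claude exited with code ${result.code}: ${result.stderr.trim()}` };
  }
  return { output: result.stdout.trim() };
}

export default class ClaudeCodeProvider {
  // promptfoo passes provider options first; the runner is a test-only hook.
  constructor(_options: unknown = {}, private readonly run: Runner = spawnRunner) {}

  id(): string {
    return "claude-code-provider";
  }

  async callApi(prompt: string): Promise<ProviderResponse> {
    try {
      return toProviderResponse(await this.run("claude", buildClaudeArgs(prompt)));
    } catch (err) {
      return { error: String(err) };
    }
  }
}
```

Spawning without a shell passes the prompt as a single argument, so Japanese text, quotes, or newlines in the prompt need no escaping.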

Implementation ✅ Complete

Fully minimal integration system

tests/promptfoo/
├── providers/claude-code-provider.ts     # Runs `claude -p`
├── evaluators/                          # promptfoo evaluator functions
├── mocks/articles/                      # Test articles only
├── configs/                             # Basic configuration
└── scripts/                             # Environment check

Unneeded files removed entirely

  • Duplicate gitignore: tests/promptfoo/.gitignore → merged into the root .gitignore
  • TypeScript tooling: tsconfig.json, @types/*, typescript, tsx
  • Anthropic SDK: @anthropic-ai/sdk (Claude Code handles API calls)
  • CI/CD: GitHub Actions, .env, quickstart.sh
  • Complex validation: validate-setup.ts, evaluator-validation.yaml
  • Previous provider: article-guardrail-provider.ts (superseded by claude-code-provider.ts)
  • Article-generation resources: mocks/resources/
  • Date mocking: mock-date-provider.ts

Dependencies (minimal: promptfoo only)

{
  "dependencies": {
    "promptfoo": "^0.49.0"
  }
}

Git housekeeping

  • promptfoo-related ignore patterns merged into the root .gitignore
  • Duplicate files removed entirely
  • Clean repository structure

Running the tests (Claude Code integration)

cd tests/promptfoo

# Environment check
npm run check

# Claude Code + promptfoo integration tests
npm test                    # Basic functionality (APPROVED verdict)
npm run test:guardrails     # Violation-detection accuracy
npm run test:edge-cases     # Error handling

🔍 Integration test flow

  1. promptfoo manages the test cases.
  2. claude-code-provider runs `claude -p .claude/commands/article_guardrail_review.md`.
  3. article_guardrail_review.md actually executes inside the Claude Code environment.
  4. promptfoo evaluator functions measure the accuracy of the results.
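Step 4 can be illustrated with promptfoo's `javascript` assertion type. That type loads a function from a file, passes it the provider output and a context that includes the test's `vars`, and accepts a `{ pass, score, reason }` grading result. The following is a minimal sketch of an approval check in that style. It assumes two things: the review prints the token APPROVED for a clean article, and each test case sets a hypothetical boolean var `expectApproved`. `articleApproval` is an illustrative name, not the repository's article-approval evaluator.

```typescript
// Subset of promptfoo's grading result returned by a custom JavaScript assertion.
interface GradingResult {
  pass: boolean;
  score: number;
  reason: string;
}

// Subset of the assertion context; `expectApproved` is a hypothetical per-test var.
interface AssertionContext {
  vars: Record<string, unknown>;
}

// Assumption: the review prints APPROVED only when the article passes.
// Note: a phrase like "NOT APPROVED" would also match and needs a stricter format.
const APPROVED = /\bAPPROVED\b/;

export default function articleApproval(output: string, context: AssertionContext): GradingResult {
  const approved = APPROVED.test(output);
  const expected = context.vars.expectApproved === true;
  const pass = approved === expected;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? `verdict matched expectation (approved=${approved})`
      : `expected approved=${expected} but got approved=${approved}`,
  };
}
```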

Goals achieved 🎯

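  • True integration: Claude Code + promptfoo + mock environment
  • Fully minimal setup: unneeded files removed entirely, promptfoo as the only dependency
  • Real-environment testing: runs actual Claude Code commands
  • Quantitative evaluation: measures precision/recall of guardrail-violation detection
  • Clean repository: unified gitignore, duplicates removed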
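Precision and recall for guardrail-violation detection can be computed by comparing, per test case, the violation categories a mock article is expected to trigger with the categories the review reported. The sketch below pools the counts across cases (micro-averaging) and computes the ratios once. The category names in the usage are hypothetical, and the convention that an empty denominator yields 1 is an assumption, not something the PR specifies.

```typescript
// Per-test-case confusion counts over violation categories.
export interface DetectionCounts {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

export interface Metrics {
  precision: number;
  recall: number;
  f1: number;
}

// Compare expected categories with the categories the review reported (duplicates ignored).
export function countDetections(expected: string[], detected: string[]): DetectionCounts {
  const expectedSet = new Set(expected);
  const detectedSet = new Set(detected);
  const truePositives = Array.from(detectedSet).filter((c) => expectedSet.has(c)).length;
  return {
    truePositives,
    falsePositives: detectedSet.size - truePositives,
    falseNegatives: expectedSet.size - truePositives,
  };
}

// Micro-average: sum counts over all cases, then compute each ratio once.
// Assumed convention: an empty denominator yields 1 (nothing was gotten wrong).
export function microAverage(cases: DetectionCounts[]): Metrics {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const c of cases) {
    tp += c.truePositives;
    fp += c.falsePositives;
    fn += c.falseNegatives;
  }
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}
```

For example, one case expecting `["hype", "citation"]` that reports `["citation", "privacy"]`, plus one case expecting and reporting `["hype"]`, pools to 2 true positives, 1 false positive, and 1 false negative. That gives precision = recall = F1 = 2/3.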

The true integration test system for article_guardrail_review.md (fully minimal setup) is complete!

Closes #3

pppp606 and others added 11 commits July 21, 2025 22:35
- Create test directory structure (configs, providers, mocks, evaluators)
- Add promptfoo configuration and TypeScript setup
- Implement custom Claude Code provider for test execution
- Add package.json with required dependencies
- Add comprehensive README for test setup and usage

This provides the foundation for testing all Claude Code commands
with promptfoo's evaluation framework.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Created MockDateProvider for consistent test dates (fixed to 2025-07-21)
- Added mock article files:
  - Clean article that should pass all checks
  - Article with multiple guardrail violations
  - Empty article for edge case testing
- Created mock resource files matching the expected digest output structure
- Updated ClaudeCodeProvider to support test mode with date mocking
- Created test configuration for article_guardrail_review command
- Added test runner script and npm script for easy execution
- Added comprehensive documentation for the mock environment

Test with: npm run test:article-guardrail

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…view

- Add test suites: basic-scenarios, guardrail-scenarios, edge-cases
- Create extensive mock data variations with 9 violation categories
- Implement test execution utilities and validation scripts
- Add 37 test cases covering all guardrail categories
- Include edge cases and boundary condition testing
- Update package.json with required dependencies
…ardrail review

- Add 4 custom evaluators: article-approval, violation-detection, format-compliance, response-quality
- Create 3 evaluation utilities: response-parser, violation-classifier, metrics-calculator
- Implement sophisticated scoring algorithms with precision/recall metrics
- Add performance benchmarking and quality assessment capabilities
- Update test configurations with custom evaluators and thresholds
- Create comprehensive test runner scripts with multiple execution modes
- Add validation scripts for basic evaluator testing
- Include detailed documentation and usage instructions

Total implementation: 12 new files with comprehensive evaluation logic
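A response parser sits between the raw review text and the metrics. The sketch below is a minimal example under a loudly hypothetical output format: each violation is tagged as `[category-name]` somewhere in the review. The PR does not document the actual format of article_guardrail_review.md output. `parseReview` and `ParsedReview` are illustrative names, not the repository's response-parser. As a design choice, any tagged violation makes the article count as not approved, even if the word APPROVED also appears.

```typescript
export interface ParsedReview {
  approved: boolean;
  categories: string[]; // unique, lower-cased, sorted
}

// Hypothetical format: violations are tagged like "[hype]" or "[missing-citation]".
const CATEGORY_TAG = /\[([a-z0-9-]+)\]/gi;

export function parseReview(output: string): ParsedReview {
  const categories = new Set<string>();
  const re = new RegExp(CATEGORY_TAG.source, "gi"); // fresh regex: no shared lastIndex state
  let match: RegExpExecArray | null;
  while ((match = re.exec(output)) !== null) {
    categories.add(match[1].toLowerCase());
  }
  const list = Array.from(categories).sort();
  // Any tagged violation overrides an APPROVED token elsewhere in the text.
  return { approved: list.length === 0 && /\bAPPROVED\b/.test(output), categories: list };
}
```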
…oo testing system

- Add GitHub Actions workflow for automated testing with PR comments
- Create comprehensive documentation suite (setup, test writing, troubleshooting)
- Implement maintenance utilities (mock updates, test reports, cleanup)
- Add configuration management (.env.example, default settings)
- Create quickstart.sh for one-command setup and execution
- Update main README with testing integration documentation
- Add npm scripts for all testing operations
- Make all scripts executable with proper error handling

This completes all 5 phases of the promptfoo testing system implementation.
…ail_review.md

- Remove generic claude-code-provider.ts, replace with focused article-guardrail-provider.ts
- Delete unnecessary generic utilities (cleanup, update-mocks, generate-test-report, etc.)
- Remove over-engineered documentation and scripts not specific to article guardrail testing
- Update all configurations to use the targeted provider
- Simplify package.json to focus on article guardrail review testing only
- Streamline npm scripts to essential article guardrail operations
- Validate setup passes (36/36 checks) with simplified architecture

The testing system now focuses exclusively on testing the article_guardrail_review.md command without unnecessary generic components.
…sting

- Delete mocks/resources/ directory (not needed for article review)
- Remove mock-date-provider.ts (article review doesn't need date mocking)
- Delete redundant shell scripts and demo files
- Update README files to accurately reflect article-only testing scope
- Clean up article-guardrail-provider.ts imports

The test system now contains only files needed for testing article_guardrail_review.md:
- Mock articles (with violations for testing)
- Custom evaluators for guardrail detection
- Basic test configurations
This focuses on the core requirement: testing generated article review functionality.
- Delete .github/workflows/promptfoo-tests.yml (GitHub Actions workflow)
- Delete .env.example (CI environment configuration)
- Delete quickstart.sh (CI automation script)
- Delete .claude/commands/create-command.md (unrelated command)

The testing system is now focused on local manual execution only,
without CI/CD automation complexity.
…egration

- Replace direct Anthropic API calls with Claude Code execution via `claude -p`
- Create claude-code-provider.ts that executes article_guardrail_review.md locally
- Update all config files to use claude-code-provider instead of direct API
- Modify test approach: promptfoo → claude-code-provider → `claude -p` → article_guardrail_review.md
- Update README to clearly explain the integration architecture
- Ensure tests execute actual Claude Code commands with mock articles

This implements the correct 'Claude Code + promptfoo + mock environment' integration
rather than simple Anthropic API testing.
… integration

- Remove @anthropic-ai/sdk, typescript, tsx dependencies (Claude Code handles API calls)
- Delete tsconfig.json (no TypeScript compilation needed for simple spawn execution)
- Remove complex validation scripts (validate-setup.ts, simple-evaluator-test.js)
- Delete evaluator-validation.yaml (over-engineered validation config)
- Add simple check-setup.sh for basic Claude Code + promptfoo validation
- Simplify package.json to only essential promptfoo dependency and scripts

The integration now uses minimal dependencies for Claude Code execution via spawn.
- Move promptfoo-specific ignore patterns from tests/promptfoo/.gitignore to root .gitignore
- Delete redundant tests/promptfoo/.gitignore file
- Add specific paths for promptfoo test artifacts in root gitignore
- Maintain clean repository structure with single gitignore management
pppp606 merged commit 8cfb996 into main on Jul 27, 2025

Development

Successfully merging this pull request may close these issues.

Implement a Claude Code + promptfoo integration test system (targeting article_guardrail_review.md)

2 participants