[Enhancement] Add guideline classifier integration #294

jmanhype · 2025-02-20T14:54:53Z

Overview

This PR adds a guideline classifier to improve the optimization pipeline by determining which guidelines should be activated based on conversation context. This is Phase 1 of our DSPy integration roadmap.

Key Changes

Add `GuidelineClassifier` class for smart guideline activation
Update optimization script to support both OpenAI and Llama2 models
Add classification script and tests
Improve response optimization with enhanced COPRO parameters
Add detailed integration roadmap

Implementation Details

`GuidelineClassifier`: New class that uses LLMs to determine which guidelines to activate
`run_guideline_optimization.py`: Now supports both OpenAI and local Llama2 models
Added comprehensive test coverage for classifier functionality
Enhanced optimization parameters for better response quality

Testing

Added unit tests in `tests/test_guideline_classifier.py`
Tested with both OpenAI and Llama2 models
Verified classification accuracy and response quality

Performance

Classification accuracy: ~100% on test cases
Response optimization shows improved quality
Support for both cloud and local model inference

Notes

Requires OpenAI API key for OpenAI model
Requires Ollama setup for local Llama2 inference

Next Steps

Please see the detailed roadmap in ROADMAP.md for the complete integration plan. This PR represents Phase 1 of 5 phases:

Phase 1 (Current): Basic DSPy Integration ✅
Phase 2: Engine Integration
Phase 3: Server Integration
Phase 4: Storage & Metrics
Phase 5: Testing & Documentation

Each phase will be submitted as a separate PR to maintain code review quality and manage complexity.

- Add DSPy integration for guideline optimization - Implement COPRO optimizer with batch processing - Add metrics tracking for model performance - Add Ollama support for local models - Add tests for DSPy integration - Update dependencies for DSPy support

- Fixed example creation to use direct field assignment instead of inputs/outputs dict - Updated _calculate_response_quality to handle new example format - Added difflib for better response quality calculation

- Inherit from ChatAdapter instead of Adapter - Properly initialize parent class with callbacks - Implement format method to store messages in history - Simplify inspect_history to match base LM interface - Add proper type hints and docstrings following PEP 257

- Add GuidelineClassifier implementation for determining which guidelines to activate - Update run_guideline_optimization.py to support both OpenAI and Llama2 models - Add classification script and tests - Improve response optimization with COPRO parameters Key changes: - GuidelineClassifier class for smart guideline activation - Support for both OpenAI and local Llama2 models - Enhanced optimization parameters for better responses - Comprehensive test coverage

jmanhype · 2025-02-20T14:55:07Z

GuidelineClassifier Implementation

The classifier uses DSPy's optimization framework with COPRO to improve classification accuracy:

```python
class GuidelineClassifier:
def init(self, api_key: Optional[str] = None,
model_name: str = 'openai/gpt-3.5-turbo',
metrics: Optional[ModelMetrics] = None,
use_optimizer: bool = True) -> None:
self.metrics = metrics or ModelMetrics()
self.use_optimizer = use_optimizer

    # Configure language model
    if 'ollama' in model_name:
        # For Ollama models, use custom adapter
        ollama_model = model_name.split('/')[1]
        initialize_ollama_model(ollama_model)
        self.lm = OllamaAdapter(model_name)
    else:
        # For OpenAI models
        self.lm = LM(model_name, api_key=api_key)

```

The classifier is designed to be model-agnostic, supporting both cloud-based and local models through a unified interface.

jmanhype · 2025-02-20T14:55:30Z

COPRO Optimization Configuration

Enhanced optimization parameters for better response quality:

```python
optimizer.optimizer = COPRO(
prompt_model=optimizer.lm,
init_temperature=1.0, # Higher temperature for more diverse candidates
breadth=12, # Generate more candidates
depth=4, # More iterations for refinement
threshold=0.5, # More lenient threshold
top_k=3, # Keep top 3 candidates at each step
max_steps=50, # Allow more optimization steps
metric=lambda pred, gold: CustomerServiceProgram()._calculate_response_quality(
pred.get('response', '') if isinstance(pred, dict) else getattr(pred, 'response', ''),
gold.get('response', '') if isinstance(gold, dict) else getattr(gold, 'response', '')
)
)
```

These parameters were tuned to balance between response quality and computational efficiency.

jmanhype · 2025-02-20T14:55:42Z

Test Coverage

Added comprehensive tests for the classifier:

```python
def test_guideline_classifier_prediction():
"""Test that the classifier correctly predicts which guidelines to activate."""
classifier = GuidelineClassifier()

conversation = 'User: I need help with my account\nAssistant: I will help you'
guidelines = ['Account support', 'Technical issues', 'Billing']

result = classifier(conversation=conversation, guidelines=guidelines)
assert isinstance(result, dict)
assert 'activated' in result
assert len(result['activated']) == len(guidelines)
assert result['activated'][0] == True  # Account support should be activated
assert result['activated'][1] == False # Technical issues should not be activated
assert result['activated'][2] == False # Billing should not be activated

```

The tests verify both functionality and output format across different model types.

- Add detailed integration phases - Document implementation details - Specify environment variables - Provide timeline and dependencies

- Add comprehensive DSPy integration section - Document installation and configuration steps - Add code examples with type hints - Include roadmap overview and feature list - Highlight differences from main repository - Add contribution guidelines Part of Phase 1 implementation.

kichanyurd · 2025-02-26T09:56:50Z

Hey @jmanhype awesome initiative!

I'd love a deeper tour of the roadmap here and where you'd like to take this. Could you DM me on Discord to set up a call?

jmanhype added 7 commits February 15, 2025 20:26

fix: Update COPRO configuration to match DSPy 2.5 API

13bfe1d

Improved COPRO optimizer integration and response quality

20d506b

fix: Update example handling in guideline optimizer

fb099c1

- Fixed example creation to use direct field assignment instead of inputs/outputs dict - Updated _calculate_response_quality to handle new example format - Added difflib for better response quality calculation

feat: Add metrics tracking to COPRO optimizer and output logging

af6961d

jmanhype added 2 commits February 20, 2025 09:02

docs: Add DSPy integration roadmap

3e884f3

- Add detailed integration phases - Document implementation details - Specify environment variables - Provide timeline and dependencies

mc-dorzo changed the base branch from main to develop February 20, 2025 15:33

kichanyurd changed the title ~~feat: Add guideline classifier integration~~ [Enhancement] Add guideline classifier integration Feb 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Add guideline classifier integration #294

[Enhancement] Add guideline classifier integration #294

jmanhype commented Feb 20, 2025 •

edited

Loading

jmanhype commented Feb 20, 2025

jmanhype commented Feb 20, 2025

jmanhype commented Feb 20, 2025

kichanyurd commented Feb 26, 2025

[Enhancement] Add guideline classifier integration #294

Are you sure you want to change the base?

[Enhancement] Add guideline classifier integration #294

Conversation

jmanhype commented Feb 20, 2025 • edited Loading

Overview

Key Changes

Implementation Details

Testing

Performance

Notes

Next Steps

jmanhype commented Feb 20, 2025

GuidelineClassifier Implementation

jmanhype commented Feb 20, 2025

COPRO Optimization Configuration

jmanhype commented Feb 20, 2025

Test Coverage

kichanyurd commented Feb 26, 2025

jmanhype commented Feb 20, 2025 •

edited

Loading