UpGrade is an open-source A/B testing platform for education software. UpGradeAgent is a chatbot that makes requests to UpGrade's client API endpoints to test, simulate, and verify its functionality based on natural-language input.
This document describes the MVP design of the chatbot, which uses a streamlined 5-node architecture to provide reliable, context-aware conversations about A/B testing operations. The app is built in Python using the LangGraph library and Anthropic's Claude Sonnet model.
For the MVP, users interact with the chatbot in a terminal console (it may later become a Slack bot).
The app prioritizes:
- Accuracy over token cost - Reliable understanding and execution of A/B testing operations
- Intelligent clarification - Asks for clarification on ambiguous queries (e.g., "What's the status?") rather than making assumptions
- Progressive information gathering - Naturally collects missing information through conversation
- Safety first - Always confirms before potentially destructive actions
- Conversational memory - Retains and uses prior context within sessions
UpGradeAgent uses a streamlined 5-node architecture that handles all conversation flows intelligently:
- Conversation Analyzer (LLM) - Intent classification and orchestration
- Information Gatherer (LLM) - Data collection and validation
- Confirmation Handler (Non-LLM) - Safety confirmations for destructive actions
- Tool Executor (Non-LLM) - API execution layer
- Response Generator (LLM) - All user-facing communication
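A minimal sketch of how these five nodes might be wired together with LangGraph is shown below. The state fields, node function names, and routing logic are illustrative assumptions, not the actual implementation.

```python
# Sketch of the 5-node LangGraph wiring; names and routing are assumptions.
from typing import Optional, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict, total=False):
    user_input: str
    intent: Optional[str]        # set by the Conversation Analyzer
    gathered: dict               # auto-stored results from gatherer tools
    needs_confirmation: bool     # destructive actions pause for confirmation
    response: Optional[str]


# Placeholder node implementations; each would read and update AgentState.
def conversation_analyzer(state: AgentState) -> AgentState: return state  # LLM: intent + orchestration
def information_gatherer(state: AgentState) -> AgentState: return state   # LLM: data collection
def confirmation_handler(state: AgentState) -> AgentState: return state   # non-LLM: safety confirmations
def tool_executor(state: AgentState) -> AgentState: return state          # non-LLM: API execution
def response_generator(state: AgentState) -> AgentState: return state     # LLM: user-facing replies


def route_after_analysis(state: AgentState) -> str:
    """Hypothetical routing: destructive intents confirm first, others gather data."""
    if state.get("needs_confirmation"):
        return "confirmation_handler"
    if state.get("intent") in {"create", "update", "delete", "state_change"}:
        return "information_gatherer"
    return "response_generator"


graph = StateGraph(AgentState)
graph.add_node("conversation_analyzer", conversation_analyzer)
graph.add_node("information_gatherer", information_gatherer)
graph.add_node("confirmation_handler", confirmation_handler)
graph.add_node("tool_executor", tool_executor)
graph.add_node("response_generator", response_generator)

graph.set_entry_point("conversation_analyzer")
graph.add_conditional_edges("conversation_analyzer", route_after_analysis)
graph.add_edge("information_gatherer", "tool_executor")
graph.add_edge("confirmation_handler", "tool_executor")
graph.add_edge("tool_executor", "response_generator")
graph.add_edge("response_generator", END)

app = graph.compile()
```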
Each node has access to specific tools that match its responsibilities:
- Node-specific tool access: Enforces architectural boundaries and maintains separation of concerns
- Auto-storage pattern: Gatherer tools automatically store results in predictable locations
- Progressive parameter building: Schema tools guide the LLM through complex parameter collection
- Granular error handling: Specific error types (api, auth, validation, not_found, unknown) for better debugging
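As a sketch of the auto-storage pattern, a decorator can place each gatherer tool's result under a predictable key in shared state. The decorator name, storage key convention, and state shape below are assumptions for illustration only.

```python
# Sketch of the auto-storage pattern: tool results land under predictable keys.
# The decorator name and storage convention are illustrative assumptions.
import functools
from typing import Any, Awaitable, Callable


def auto_store(key: str) -> Callable:
    """Store an async tool's return value in state['gathered'][key]."""
    def decorator(tool: Callable[..., Awaitable[Any]]) -> Callable:
        @functools.wraps(tool)
        async def wrapper(state: dict, *args, **kwargs) -> Any:
            result = await tool(state, *args, **kwargs)
            state.setdefault("gathered", {})[key] = result
            return result
        return wrapper
    return decorator


@auto_store("context_metadata")
async def get_context_metadata(state: dict) -> dict:
    """Hypothetical gatherer tool; would call GET /experiments/contextMetaData."""
    ...
```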
- Graph Structure - Complete 5-node architecture design and implementation
- Tools - Comprehensive tool specifications and API integration patterns
- Core Terms - Essential UpGrade terminology and concepts
- Assignment Behavior - How assignment rules and consistency work together
- API Reference - Complete API endpoints and request/response examples
Note: Supporting documentation files contain the knowledge base content used by the Information Gatherer node.
- Explain Terms and Concepts - Answer questions about UpGrade terminology and A/B testing
- Assignment Behavior Guidance - Explain how different assignment rules interact
- Schema Information - Provide parameter requirements for various operations
- System Health - Check UpGrade service status and version information
- Context Discovery - List available app contexts and their supported values
- List and Search - Find experiments by name, context, or other criteria
- Create Experiments - Guide users through experiment creation with validation
- Update Experiments - Modify existing experiments with partial updates
- Delete Experiments - Safely remove experiments with confirmation
- Status Management - Start, stop, and change experiment states
- User Initialization - Set up users with group memberships for testing
- Condition Assignment - Simulate users getting assigned to experimental conditions
- Decision Point Tracking - Record when users visit experiment decision points
- Assignment Analysis - Verify condition balance and consistency rule behavior
User: "What contexts are available?"
Bot: Lists all available contexts from UpGrade
User: "Create an experiment called 'Math Hints' in assign-prog context"
Bot: Guides through parameter collection → Confirms creation → Executes
User: "What conditions did user123 get for that experiment?"
Bot: Simulates user assignment and shows conditions
User: "Delete the Math Hints experiment"
Bot: ⚠️ Shows destruction warning → Confirms → Deletes if approved
- GET / - Health check and version info
- GET /experiments/contextMetaData - Get available app contexts and their supported values
- GET /experiments/names - Get all experiment names and IDs
- GET /experiments - Get all experiments with optional filtering
- GET /experiments/single/<experiment_id> - Get detailed experiment configuration
- POST /experiments - Create new experiment
- PUT /experiments/<experiment_id> - Update experiment configuration (supports partial updates)
- POST /experiments/state - Update experiment status (start/stop)
- DELETE /experiments/<experiment_id> - Delete experiment
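A hedged sketch of an async client wrapper around these experiment endpoints using httpx follows; the class name, auth header scheme, and request body field names are assumptions rather than the verified UpGrade schema.

```python
# Sketch of an async UpGrade API client (httpx); names and auth scheme are assumptions.
import httpx


class UpGradeClient:
    def __init__(self, base_url: str, token: str) -> None:
        self._client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {token}"},
        )

    async def health(self) -> dict:
        resp = await self._client.get("/")  # health check + version info
        resp.raise_for_status()
        return resp.json()

    async def get_experiment(self, experiment_id: str) -> dict:
        resp = await self._client.get(f"/experiments/single/{experiment_id}")
        resp.raise_for_status()
        return resp.json()

    async def update_state(self, experiment_id: str, state: str) -> dict:
        # POST /experiments/state starts/stops an experiment;
        # the body field names here are placeholders — see the API Reference.
        payload = {"experimentId": experiment_id, "state": state}
        resp = await self._client.post("/experiments/state", json=payload)
        resp.raise_for_status()
        return resp.json()
```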
- POST /v6/init - Initialize users with group memberships
- POST /v6/assign - Get experiment condition assignments for users
- POST /v6/mark - Record decision point visits
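The sketch below shows how the three user-simulation calls could be chained for a single simulated user (init, then assign, then mark). The request body field names are placeholders based on the descriptions above, not the verified v6 schema.

```python
# Sketch of a user-simulation flow: init -> assign -> mark.
# Endpoint paths match the list above; body fields are placeholder assumptions.
import httpx


async def simulate_user(api: httpx.AsyncClient, user_id: str, context: str,
                        site: str, target: str):
    # 1. Initialize the user (optionally with group memberships)
    (await api.post("/v6/init", json={"id": user_id})).raise_for_status()

    # 2. Get condition assignments for the given app context
    assign = await api.post("/v6/assign", json={"context": context})
    assign.raise_for_status()

    # 3. Record the decision point visit so the assignment counts as enrolled
    mark = await api.post("/v6/mark", json={"site": site, "target": target})
    mark.raise_for_status()

    return assign.json()  # assigned conditions, used for balance/consistency checks
```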
The following UpGrade features and capabilities are out of scope for the MVP:
- Within-subjects Experiments (Unit of Assignment)
- Factorial Experiments (Design Type)
- Stratified Random Sampling Experiments (Assignment Algorithm)
- TS Configurable (MOOClet) Experiments (Assignment Algorithm)
- Payload Management - Viewing, adding, or updating payloads from experiments
- Metrics and Logging - Management and logging via the /v6/log endpoint
- Feature Flags - Preview users and feature flag functionality
- Public Segments - Creation, management, or assignment of public segments
- Advanced Analytics - Detailed reporting and statistical analysis
- Bulk Operations - Mass experiment creation or updates
- Import/Export - Experiment configuration import/export
- Terminal Interface Only - Web UI not supported in MVP
- Single Session Memory - No persistence across application restarts
- English Only - No internationalization support
- Basic Error Recovery - Limited retry logic for transient failures
The bot provides helpful error messages for common issues:
- Authentication errors - Clear guidance on token setup
- Validation errors - Specific parameter requirements and fixes
- API errors - User-friendly explanations of service issues
- Not found errors - Suggestions for finding the right resources
- Unknown errors - Graceful degradation with error reporting
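A sketch of how these categories could map onto the custom exception types mentioned under /exceptions/ is shown below; the class names and the hint attribute are assumptions, chosen to mirror the api/auth/validation/not_found/unknown categories.

```python
# Sketch of custom exception types for the error categories above (names assumed).
class UpGradeAgentError(Exception):
    """Base class; carries a user-facing hint for the Response Generator."""
    def __init__(self, message: str, hint: str = "") -> None:
        super().__init__(message)
        self.hint = hint


class AuthenticationError(UpGradeAgentError):
    """Raised on 401/403 responses — hint explains how to set up the auth token."""


class ValidationError(UpGradeAgentError):
    """Raised when request parameters fail validation — hint names the bad field."""


class NotFoundError(UpGradeAgentError):
    """Raised on 404 responses — hint suggests listing or searching for the resource."""


class ApiError(UpGradeAgentError):
    """Raised on other non-2xx responses from the UpGrade service."""
```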
/src/
/nodes/ # LangGraph node implementations
/tools/ # Tool functions organized by node
/api/ # UpGrade API client
/exceptions/ # Custom exception types
/models/ # Pydantic data models
/config/ # Configuration management
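As an illustration of the /config/ layer, settings could be read once from environment variables at startup; the variable names below are assumptions, not the actual configuration keys.

```python
# Sketch of configuration management; environment variable names are assumptions.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    upgrade_base_url: str
    upgrade_auth_token: str
    anthropic_api_key: str


def load_settings() -> Settings:
    """Read required settings from the environment; fail early if any are missing."""
    try:
        return Settings(
            upgrade_base_url=os.environ["UPGRADE_BASE_URL"],
            upgrade_auth_token=os.environ["UPGRADE_AUTH_TOKEN"],
            anthropic_api_key=os.environ["ANTHROPIC_API_KEY"],
        )
    except KeyError as missing:
        raise RuntimeError(f"Missing required environment variable: {missing}") from None
```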
- Async/await throughout - All API calls and tools are async
- Type safety - Extensive use of TypedDict and Pydantic models
- Error propagation - Structured error handling with specific types
- Auto-storage decorators - Automatic result caching with predictable keys
- Progressive disclosure - Complex operations broken into guided steps
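For example, progressive parameter building could rest on a Pydantic model in /models/ that tracks which experiment fields are still missing; the field names and defaults below follow the terminology in this document but are assumptions, not the real schema.

```python
# Sketch of a Pydantic model for progressively collected experiment parameters.
# Field names, defaults, and the missing-field logic are illustrative assumptions.
from typing import Optional

from pydantic import BaseModel, Field


class ExperimentDraft(BaseModel):
    name: str
    context: str                                   # app context, e.g. "assign-prog"
    unit_of_assignment: str = "individual"
    consistency_rule: str = "individual"
    decision_points: list[dict] = Field(default_factory=list)
    conditions: list[str] = Field(default_factory=list)
    description: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Lets the Information Gatherer ask only for what is still unknown."""
        missing = []
        if not self.decision_points:
            missing.append("decision_points")
        if len(self.conditions) < 2:
            missing.append("conditions")
        return missing
```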