Token Optimization
Advanced strategies for reducing API token usage while maintaining code quality and development speed.
Token optimization is a core feature of Agentwise that delivers a 30-40% reduction in API costs through intelligent compression, context sharing, and adaptive processing techniques. This guide covers the optimization strategies and best practices.
Tokens are the fundamental units of processing in language models:
- Input Tokens: Your prompts, context, and code
- Output Tokens: Generated responses and code
- Context Tokens: Shared information between agents
// Example token breakdown
{
"task": "Create React component",
"token_usage": {
"input": {
"prompt": 150,
"context": 800,
"code_examples": 400,
"total": 1350
},
"output": {
"generated_code": 600,
"comments": 100,
"total": 700
},
"total_tokens": 2050
}
}
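The totals in the breakdown above are plain sums of the per-category counts. A minimal Python sketch of that arithmetic, assuming the same field names as the example; the `summarize_usage` helper is hypothetical and not part of Agentwise:

```python
def summarize_usage(usage: dict) -> dict:
    """Sum per-category token counts for the input and output sides."""
    totals = {}
    for side in ("input", "output"):
        # Skip any precomputed "total" entry so it is not double-counted.
        totals[side] = sum(v for k, v in usage[side].items() if k != "total")
    totals["total"] = totals["input"] + totals["output"]
    return totals


usage = {
    "input": {"prompt": 150, "context": 800, "code_examples": 400},
    "output": {"generated_code": 600, "comments": 100},
}
print(summarize_usage(usage))  # {'input': 1350, 'output': 700, 'total': 2050}
```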
Several factors affect token usage:
- Prompt Complexity: Detailed prompts require more tokens
- Context Size: Large codebases increase context tokens
- Output Length: Complex code generates more output tokens
- Agent Communication: Inter-agent messaging uses tokens
- Error Recovery: Failed attempts consume additional tokens
Agentwise uses adaptive compression to reduce context size without losing essential information:
// Before compression (5,000 tokens)
{
"context": {
"full_codebase": "...[entire project files]...",
"dependencies": "...[all package.json content]...",
"documentation": "...[complete README files]...",
"history": "...[all previous conversations]..."
}
}
// After compression (2,000 tokens)
{
"context": {
"relevant_files": "...[only modified files]...",
"key_dependencies": "...[essential packages only]...",
"summary": "...[compressed documentation]...",
"recent_changes": "...[last 3 relevant interactions]..."
}
}
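A rough sketch of the selection step behind that reduction: keep only the modified files and the last few interactions. The function name, its arguments, and the file names are illustrative assumptions; this guide does not show Agentwise's actual compression logic.

```python
def compress_context(files: dict, modified: set, history: list, keep_last: int = 3) -> dict:
    """Keep only modified files and the most recent interactions."""
    return {
        "relevant_files": {path: text for path, text in files.items() if path in modified},
        # history[-0:] would return the whole list, so handle zero explicitly.
        "recent_changes": history[-keep_last:] if keep_last > 0 else [],
    }


files = {"src/App.tsx": "app", "src/api.ts": "api", "README.md": "docs"}
compressed = compress_context(files, {"src/api.ts"}, ["m1", "m2", "m3", "m4", "m5"])
print(compressed)
```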
Agents share a common context pool to eliminate redundant information:
// Traditional approach (per agent)
{
"frontend_agent": {
"context": "project_config + tech_stack + requirements" // 3,000 tokens
},
"backend_agent": {
"context": "project_config + tech_stack + requirements" // 3,000 tokens
},
"total_tokens": 6000
}
// Agentwise approach (shared)
{
"shared_context": "project_config + tech_stack + requirements", // 3,000 tokens
"agent_specific": {
"frontend_agent": "ui_requirements", // 500 tokens
"backend_agent": "api_requirements" // 500 tokens
},
"total_tokens": 4000,
"savings": "33%"
}
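The 33% figure follows from the accounting used in the two examples: the traditional total counts the common context once per agent, while the shared total counts it once plus each agent's specific context. A small sketch of that calculation (the helper name is hypothetical):

```python
def shared_context_savings(shared_tokens: int, agent_specific: dict) -> dict:
    """Compare per-agent duplication with a single shared context pool."""
    if not agent_specific:
        raise ValueError("at least one agent is required")
    traditional = shared_tokens * len(agent_specific)  # one full copy per agent
    pooled = shared_tokens + sum(agent_specific.values())
    savings_pct = round(100 * (traditional - pooled) / traditional)
    return {"traditional": traditional, "shared": pooled, "savings_pct": savings_pct}


result = shared_context_savings(3000, {"frontend_agent": 500, "backend_agent": 500})
print(result)  # {'traditional': 6000, 'shared': 4000, 'savings_pct': 33}
```

With this accounting the savings grow with the number of agents, because the shared context is paid for only once.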
Break large tasks into smaller, token-efficient chunks:
// Large task (high token usage)
{
"task": "Build complete e-commerce application",
"estimated_tokens": 50000,
"approach": "monolithic"
}
// Incremental approach (optimized)
{
"tasks": [
{ "task": "Setup project structure", "tokens": 3000 },
{ "task": "Create user authentication", "tokens": 4000 },
{ "task": "Build product catalog", "tokens": 5000 },
{ "task": "Add shopping cart", "tokens": 4000 }
],
"total_tokens": 16000,
"savings": "68%"
}
Agentwise automatically adjusts compression based on available budget:
{
"compression_strategy": {
"budget_remaining": 75000,
"current_task_complexity": "medium",
"compression_level": "light",
"techniques": ["summary", "deduplication"]
}
}
// When budget is low
{
"compression_strategy": {
"budget_remaining": 15000,
"current_task_complexity": "medium",
"compression_level": "aggressive",
"techniques": ["summary", "deduplication", "abstraction", "template_reuse"]
}
}
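One way to picture the adjustment is a function that maps the remaining budget to a compression level. The cut-off fractions, the `medium` tier, and the default total budget of 100,000 tokens in this sketch are assumptions chosen to reproduce the two examples above; the guide does not state the real thresholds.

```python
def choose_compression(budget_remaining: int, total_budget: int = 100_000) -> dict:
    """Pick a compression level from the share of budget still available."""
    fraction = budget_remaining / total_budget
    # Cut-offs below are assumptions, not documented Agentwise values.
    if fraction >= 0.5:
        level, techniques = "light", ["summary", "deduplication"]
    elif fraction >= 0.25:
        # Hypothetical middle tier, not shown in the guide.
        level, techniques = "medium", ["summary", "deduplication", "abstraction"]
    else:
        level, techniques = "aggressive", [
            "summary", "deduplication", "abstraction", "template_reuse",
        ]
    return {
        "budget_remaining": budget_remaining,
        "compression_level": level,
        "techniques": techniques,
    }


print(choose_compression(75_000)["compression_level"])  # light
print(choose_compression(15_000)["compression_level"])  # aggressive
```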
Common patterns are templated to reduce repeated generation:
// Template library
{
"templates": {
"react_component": {
"tokens_saved": 200,
"usage_count": 15,
"total_savings": 3000
},
"express_route": {
"tokens_saved": 150,
"usage_count": 8,
"total_savings": 1200
},
"database_model": {
"tokens_saved": 100,
"usage_count": 6,
"total_savings": 600
}
}
}
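Each `total_savings` value in the library above is `tokens_saved` multiplied by `usage_count`. A short sketch that recomputes the figures and the combined total (the helper name is hypothetical):

```python
TEMPLATES = {
    "react_component": {"tokens_saved": 200, "usage_count": 15},
    "express_route": {"tokens_saved": 150, "usage_count": 8},
    "database_model": {"tokens_saved": 100, "usage_count": 6},
}


def template_savings(templates: dict) -> dict:
    """Return per-template savings plus the combined total."""
    savings = {
        name: t["tokens_saved"] * t["usage_count"] for name, t in templates.items()
    }
    savings["all_templates"] = sum(savings.values())
    return savings


print(template_savings(TEMPLATES))
# {'react_component': 3000, 'express_route': 1200, 'database_model': 600, 'all_templates': 4800}
```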
Learn from past projects to optimize future token usage:
{
"optimization_patterns": {
"project_type": "web_application",
"learned_optimizations": [
{
"pattern": "React component creation",
"optimization": "Use component template",
"avg_savings": 180
},
{
"pattern": "API endpoint patterns",
"optimization": "Reuse route structure",
"avg_savings": 120
}
]
}
}
Organize context in layers of importance:
{
"context_hierarchy": {
"critical": {
"tokens": 1000,
"content": "core requirements, main tech stack"
},
"important": {
"tokens": 1500,
"content": "architectural decisions, key constraints"
},
"helpful": {
"tokens": 2000,
"content": "examples, documentation, preferences"
},
"optional": {
"tokens": 1000,
"content": "nice-to-have features, alternative approaches"
}
}
}
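A layered hierarchy like this lends itself to a simple packing rule: add layers in priority order until the next one no longer fits. The sketch below assumes that rule and a hypothetical `select_layers` helper; how Agentwise actually chooses layers is not described here.

```python
LAYERS = [  # ordered from most to least important, sizes from the example above
    ("critical", 1000),
    ("important", 1500),
    ("helpful", 2000),
    ("optional", 1000),
]


def select_layers(layers: list, budget: int) -> tuple:
    """Include layers in priority order; stop at the first one that does not fit."""
    chosen, used = [], 0
    for name, tokens in layers:
        if used + tokens > budget:
            break  # never let a lower-priority layer jump ahead of a skipped one
        chosen.append(name)
        used += tokens
    return chosen, used


print(select_layers(LAYERS, 3000))  # (['critical', 'important'], 2500)
```

Stopping at the first layer that does not fit keeps the selection strictly ordered by importance, at the cost of leaving some budget unused.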
Automatically reduce the relevance of old context:
{
"context_aging": {
"recent": { "age": "0-30min", "weight": 1.0 },
"medium": { "age": "30min-2h", "weight": 0.7 },
"old": { "age": "2h-6h", "weight": 0.4 },
"stale": { "age": ">6h", "weight": 0.1 }
}
}
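The aging table can be read as a step function from age to weight. In the sketch below, each bucket includes its upper bound (so exactly 6 hours still counts as "old"), which is an assumption because the table leaves the boundaries ambiguous. The relevance scores and item names are made up for illustration.

```python
def age_weight(age_minutes: float) -> float:
    """Map the age of a context item to the weight from the aging table."""
    if age_minutes <= 30:
        return 1.0   # recent
    if age_minutes <= 120:
        return 0.7   # medium
    if age_minutes <= 360:
        return 0.4   # old
    return 0.1       # stale


# (name, base relevance, age in minutes): illustrative values only
items = [("auth spec", 0.9, 400), ("db schema", 0.6, 10), ("style guide", 0.8, 90)]
ranked = sorted(items, key=lambda item: item[1] * age_weight(item[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['db schema', 'style guide', 'auth spec']
```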
Use semantic understanding to compress without losing meaning:
// Original context (roughly 45 tokens)
{
"requirement": "The application should have a user authentication system that allows users to register with email and password, login securely, logout, and reset their password if they forget it. The system should validate email formats and password strength."
}
// Compressed context (roughly 15 tokens)
{
"requirement": "User auth: email/password registration, login/logout, password reset, validation"
}
Frontend optimizations rely on component templates, style reuse, and state patterns:
{
"frontend_optimizations": {
"component_templates": {
"enabled": true,
"savings_per_component": 180,
"library": ["Button", "Input", "Modal", "Card"]
},
"style_reuse": {
"enabled": true,
"css_templates": ["layout", "theme", "responsive"],
"savings": 25
},
"state_patterns": {
"enabled": true,
"common_patterns": ["useState", "useEffect", "useContext"],
"savings": 50
}
}
}
Backend optimizations rely on route templates, middleware reuse, and database patterns:
{
"backend_optimizations": {
"route_templates": {
"enabled": true,
"patterns": ["CRUD", "auth", "middleware"],
"savings_per_route": 120
},
"middleware_reuse": {
"enabled": true,
"common_middleware": ["cors", "auth", "validation"],
"savings": 80
},
"database_patterns": {
"enabled": true,
"orm_templates": ["model", "migration", "seed"],
"savings": 100
}
}
}
Database optimizations rely on schema templates and migration patterns:
{
"database_optimizations": {
"schema_templates": {
"enabled": true,
"common_schemas": ["user", "product", "order"],
"savings_per_table": 90
},
"migration_patterns": {
"enabled": true,
"templates": ["add_column", "create_index", "foreign_key"],
"savings": 60
}
}
}
The token budget is split into allocations, with thresholds that trigger stronger optimization:
{
"budget_management": {
"total_budget": 100000,
"allocation": {
"reserved": 20000,
"active_tasks": 60000,
"optimization_buffer": 20000
},
"thresholds": {
"warning": 0.8,
"critical": 0.95,
"emergency_optimization": 0.98
}
}
}
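The thresholds can be read as fractions of the total budget already consumed. Under that assumption, a status check might look like the sketch below; `budget_status` is a hypothetical helper, not an Agentwise API.

```python
THRESHOLDS = {"warning": 0.8, "critical": 0.95, "emergency_optimization": 0.98}


def budget_status(tokens_used: int, total_budget: int = 100_000) -> str:
    """Return the highest threshold that the current usage has reached."""
    fraction_used = tokens_used / total_budget
    status = "ok"
    # Walk thresholds from lowest to highest so the last match wins.
    for name, limit in sorted(THRESHOLDS.items(), key=lambda item: item[1]):
        if fraction_used >= limit:
            status = name
    return status


print(budget_status(50_000))  # ok
print(budget_status(85_000))  # warning
print(budget_status(99_000))  # emergency_optimization
```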
When the token budget is critically low:
{
"emergency_mode": {
"triggered_at": 0.98,
"optimizations": [
"maximum_compression",
"template_only_responses",
"minimal_context",
"agent_pooling",
"task_deferral"
],
"quality_impact": "minimal",
"savings": "up_to_60%"
}
}
Optimization metrics for the current session:
{
"optimization_metrics": {
"current_session": {
"tokens_saved": 25000,
"savings_percentage": 35,
"techniques_used": [
"context_compression",
"template_reuse",
"smart_caching"
]
},
"by_technique": {
"context_compression": { "savings": 12000, "percentage": 48 },
"template_reuse": { "savings": 8000, "percentage": 32 },
"smart_caching": { "savings": 5000, "percentage": 20 }
}
}
}
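In the metrics above, each `percentage` under `by_technique` is that technique's share of the session's total `tokens_saved`. A quick check of the arithmetic (the helper name is hypothetical):

```python
SAVINGS = {"context_compression": 12000, "template_reuse": 8000, "smart_caching": 5000}


def savings_shares(savings: dict) -> dict:
    """Express each technique's savings as a percentage of the total saved."""
    total = sum(savings.values())
    return {name: round(100 * value / total) for name, value in savings.items()}


print(sum(SAVINGS.values()))    # 25000
print(savings_shares(SAVINGS))  # {'context_compression': 48, 'template_reuse': 32, 'smart_caching': 20}
```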
Historical metrics:
{
"historical_metrics": {
"last_30_days": {
"total_projects": 45,
"average_savings": 38,
"best_savings": 52,
"techniques_impact": {
"context_sharing": 15,
"compression": 12,
"templates": 8,
"caching": 3
}
}
}
}
Optimization settings:
{
"optimization": {
"enabled": true,
"aggressiveness": "adaptive",
"techniques": {
"context_compression": {
"enabled": true,
"level": "adaptive",
"preserve_quality": true
},
"template_reuse": {
"enabled": true,
"auto_create": true,
"minimum_usage": 3
},
"context_sharing": {
"enabled": true,
"max_shared_size": 10000
},
"smart_caching": {
"enabled": true,
"ttl": 3600,
"max_size": "100MB"
}
}
}
}
Budget alert settings:
{
"budget_alerts": {
"enabled": true,
"thresholds": {
"warning": {
"percentage": 75,
"action": "notify"
},
"critical": {
"percentage": 90,
"action": "enable_aggressive_optimization"
},
"emergency": {
"percentage": 95,
"action": "emergency_mode"
}
}
}
}
Break content into semantically meaningful chunks:
{
"semantic_chunking": {
"strategy": "function_based",
"chunk_size": "adaptive",
"overlap": "minimal",
"preservation": ["function_signatures", "key_variables", "comments"]
}
}
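To illustrate what a `function_based` strategy could mean, the sketch below splits a source file into one chunk per top-level function and keeps each signature alongside the body. It uses Python source and the standard `ast` module (Python 3.9+) purely as a stand-in; Agentwise's own chunker and the languages it supports are not described in this guide.

```python
import ast


def chunk_by_function(source: str) -> list:
    """Split Python source into one chunk per top-level function."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append({
                "name": node.name,
                # Preserve the signature separately so it survives later compression.
                "signature": f"def {node.name}({ast.unparse(node.args)})",
                "code": ast.get_source_segment(source, node),
            })
    return chunks


SOURCE = '''
def add(a, b):
    return a + b

def greet(name):
    return "hi " + name
'''

chunks = chunk_by_function(SOURCE)
print([c["signature"] for c in chunks])  # ['def add(a, b)', 'def greet(name)']
```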
Reveal information progressively as needed:
{
"progressive_disclosure": {
"initial_context": "project_overview + immediate_requirements",
"expansion_triggers": [
"complexity_increase",
"agent_request",
"error_occurrence"
],
"max_expansion": 3
}
}
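A minimal sketch of this behaviour: start with the initial context, add one more piece whenever a recognised trigger fires, and stop after `max_expansion` expansions. The class, its method names, and the extra context labels are hypothetical.

```python
class ProgressiveContext:
    """Reveal extra context one piece at a time, up to a fixed number of expansions."""

    TRIGGERS = {"complexity_increase", "agent_request", "error_occurrence"}

    def __init__(self, initial: list, extras: list, max_expansion: int = 3):
        self.parts = list(initial)
        self._extras = list(extras)  # queue of context not yet disclosed
        self.max_expansion = max_expansion
        self.expansions = 0

    def on_event(self, event: str) -> bool:
        """Expand the context if the event is a trigger and the limit allows it."""
        if event not in self.TRIGGERS:
            return False
        if self.expansions >= self.max_expansion or not self._extras:
            return False
        self.parts.append(self._extras.pop(0))
        self.expansions += 1
        return True


ctx = ProgressiveContext(
    ["project_overview", "immediate_requirements"],
    ["architecture", "api_docs", "history", "changelog"],  # illustrative labels
)
events = ["agent_request", "unknown_event", "error_occurrence",
          "complexity_increase", "agent_request"]
results = [ctx.on_event(e) for e in events]
print(results)  # [True, False, True, True, False]
```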
Remove duplicate information across agents:
{
"deduplication": {
"enabled": true,
"strategies": ["exact_match", "semantic_similarity", "hash_based"],
"threshold": 0.8,
"preserve_unique": true
}
}
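The sketch below combines two of the listed strategies: a hash check for exact duplicates and a similarity check against the `0.8` threshold. Note that it uses `difflib.SequenceMatcher`, which measures character-level similarity, as a simple stand-in for semantic similarity; a real implementation would more likely compare embeddings. The function name and sample snippets are illustrative.

```python
import hashlib
from difflib import SequenceMatcher


def deduplicate(snippets: list, threshold: float = 0.8) -> list:
    """Drop exact duplicates (by hash) and near-duplicates (by similarity ratio)."""
    kept, seen_hashes = [], set()
    for text in snippets:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact match
        if any(SequenceMatcher(None, text, other).ratio() >= threshold for other in kept):
            continue  # near-duplicate of something already kept
        seen_hashes.add(digest)
        kept.append(text)
    return kept


snippets = [
    "Use React 18 with TypeScript",
    "Use React 18 with TypeScript",   # exact duplicate
    "Use React 18 with TypeScript.",  # near-duplicate
    "PostgreSQL is the primary database",
]
unique = deduplicate(snippets)
print(unique)
```

Unique items are always kept because a snippet is only dropped when it matches one that has already been retained.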
Optimization should not compromise output quality:
{
"quality_preservation": {
"minimum_quality_score": 8.5,
"fallback_strategies": [
"reduce_compression",
"increase_context",
"use_full_templates"
],
"quality_metrics": [
"code_correctness",
"best_practices",
"completeness",
"documentation"
]
}
}
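One plausible reading of the fallback list is a loop: generate, score, and apply the next fallback while the score stays below the minimum. The sketch below assumes that reading. The `generate` and `score` callables are placeholders supplied by the caller, since the guide does not say how quality is measured.

```python
FALLBACKS = ["reduce_compression", "increase_context", "use_full_templates"]


def generate_with_fallbacks(generate, score, minimum: float = 8.5) -> tuple:
    """Regenerate with successive fallbacks until the quality score is acceptable."""
    applied = []
    output = generate(list(applied))
    for strategy in FALLBACKS:
        if score(output) >= minimum:
            break
        applied.append(strategy)
        output = generate(list(applied))
    return output, applied


# Stub generator and scorer for demonstration: each fallback adds 0.3 to the score.
def fake_generate(applied):
    return len(applied)


def fake_score(output):
    return 8.0 + 0.3 * output


output, applied = generate_with_fallbacks(fake_generate, fake_score)
print(applied)  # ['reduce_compression', 'increase_context']
```

If every fallback is exhausted, the sketch returns the last output even if it is still below the minimum; a caller may want to treat that case as a failure.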
Trade-off profiles balance expected savings against quality impact:
{
"trade_off_profiles": {
"maximum_quality": {
"optimization_level": "light",
"expected_savings": 15,
"quality_impact": "none"
},
"balanced": {
"optimization_level": "medium",
"expected_savings": 35,
"quality_impact": "minimal"
},
"maximum_efficiency": {
"optimization_level": "aggressive",
"expected_savings": 55,
"quality_impact": "slight"
}
}
}
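Given a savings target, a reasonable default is the least aggressive profile that still meets it. A small sketch of that selection, using the profile values above (the `pick_profile` helper is hypothetical):

```python
from typing import Optional

PROFILES = {
    "maximum_quality": {"expected_savings": 15, "quality_impact": "none"},
    "balanced": {"expected_savings": 35, "quality_impact": "minimal"},
    "maximum_efficiency": {"expected_savings": 55, "quality_impact": "slight"},
}


def pick_profile(target_savings: int) -> Optional[str]:
    """Return the least aggressive profile whose expected savings meet the target."""
    by_savings = sorted(PROFILES.items(), key=lambda item: item[1]["expected_savings"])
    for name, profile in by_savings:
        if profile["expected_savings"] >= target_savings:
            return name
    return None  # no profile reaches the requested savings


print(pick_profile(30))  # balanced
print(pick_profile(60))  # None
```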
Best practices:
- Start Conservative: Begin with light optimization and increase gradually
- Monitor Quality: Continuously check output quality metrics
- Use Templates: Create reusable templates for common patterns
- Share Context: Maximize context sharing between agents
- Cache Strategically: Cache frequently used components and patterns
Common pitfalls:
- Over-Optimization: Aggressive optimization that degrades quality
- Context Loss: Removing essential context information
- Template Overuse: Using templates inappropriately
- Premature Optimization: Optimizing before understanding patterns
A conservative configuration that keeps debugging information (for example, during development):
{
"optimization": {
"aggressiveness": "light",
"preserve_debugging_info": true,
"cache_templates": false
}
}
An adaptive configuration with template caching (for example, in production):
{
"optimization": {
"aggressiveness": "adaptive",
"preserve_debugging_info": false,
"cache_templates": true
}
}
Common issues:
- Quality Degradation: Optimization too aggressive
- Context Confusion: Important information compressed away
- Template Mismatches: Wrong templates applied
- Cache Invalidation: Stale cached results
Diagnostic commands:
# Check optimization metrics
npm run optimization:status
# Analyze token usage
npm run tokens:analyze
# Adjust optimization level
npm run optimization:set-level medium
# Clear optimization cache
npm run optimization:clear-cache
For more information, see Configuration, Performance Tuning, or Agent System.
Support
- Discord: @vibecodingwithphil
- GitHub: @VibeCodingWithPhil