
Token Optimization

Haveapp1 edited this page Aug 22, 2025 · 1 revision


Advanced strategies for reducing API token usage while maintaining code quality and development speed.

Overview

Token optimization is a core feature of Agentwise that delivers a 30-40% reduction in API costs through intelligent compression, context sharing, and adaptive processing. This guide covers each of these strategies along with best practices.

Understanding Token Usage

Token Basics

Tokens are the fundamental units of processing in language models:

  • Input Tokens: Your prompts, context, and code
  • Output Tokens: Generated responses and code
  • Context Tokens: Shared information between agents

// Example token breakdown
{
  "task": "Create React component",
  "token_usage": {
    "input": {
      "prompt": 150,
      "context": 800,
      "code_examples": 400,
      "total": 1350
    },
    "output": {
      "generated_code": 600,
      "comments": 100,
      "total": 700
    },
    "total_tokens": 2050 // input + output
  }
}
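The totals in the breakdown are plain sums of their components. A minimal TypeScript sketch of that arithmetic follows; the `TokenBreakdown` shape and the `totalTokens` helper are hypothetical and simply mirror the JSON above, not a documented Agentwise API.

```typescript
// Hypothetical shape mirroring the example breakdown; not an Agentwise type.
interface TokenBreakdown {
  input: Record<string, number>;
  output: Record<string, number>;
}

// Add up every component within one section (input or output).
function sectionTotal(section: Record<string, number>): number {
  return Object.values(section).reduce((sum, n) => sum + n, 0);
}

function totalTokens(b: TokenBreakdown): { input: number; output: number; total: number } {
  const input = sectionTotal(b.input);
  const output = sectionTotal(b.output);
  return { input, output, total: input + output };
}

const usage = totalTokens({
  input: { prompt: 150, context: 800, code_examples: 400 },
  output: { generated_code: 600, comments: 100 },
});
console.log(usage); // { input: 1350, output: 700, total: 2050 }
```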

Token Cost Factors

  1. Prompt Complexity: Detailed prompts require more tokens
  2. Context Size: Large codebases increase context tokens
  3. Output Length: Complex code generates more output tokens
  4. Agent Communication: Inter-agent messaging uses tokens
  5. Error Recovery: Failed attempts consume additional tokens

Core Optimization Strategies

1. Context Compression

Agentwise uses adaptive compression to reduce context size without losing essential information:

// Before compression (5,000 tokens)
{
  "context": {
    "full_codebase": "...[entire project files]...",
    "dependencies": "...[all package.json content]...",
    "documentation": "...[complete README files]...",
    "history": "...[all previous conversations]..."
  }
}

// After compression (2,000 tokens)
{
  "context": {
    "relevant_files": "...[only modified files]...",
    "key_dependencies": "...[essential packages only]...",
    "summary": "...[compressed documentation]...",
    "recent_changes": "...[last 3 relevant interactions]..."
  }
}
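The selection rules behind the "after" example (modified files only, essential packages only, the last three relevant interactions) can be sketched in TypeScript as below. All type and function names are hypothetical, and the documentation-summary step is left out because it would require a model call.

```typescript
// Hypothetical types; Agentwise's internal context format is not documented here.
interface FileEntry { path: string; content: string; modified: boolean }
interface Dependency { name: string; essential: boolean }
interface Interaction { text: string; relevant: boolean }

interface FullContext {
  files: FileEntry[];
  dependencies: Dependency[];
  history: Interaction[];
}

interface CompressedContext {
  relevantFiles: FileEntry[];
  keyDependencies: string[];
  recentChanges: Interaction[];
}

// Keep only the parts of the context the current task is likely to need.
function compressContext(ctx: FullContext, historyWindow = 3): CompressedContext {
  return {
    relevantFiles: ctx.files.filter((f) => f.modified),
    keyDependencies: ctx.dependencies.filter((d) => d.essential).map((d) => d.name),
    // The last `historyWindow` relevant interactions, oldest first.
    recentChanges: ctx.history.filter((h) => h.relevant).slice(-historyWindow),
  };
}

const compressed = compressContext({
  files: [
    { path: "src/App.tsx", content: "...", modified: true },
    { path: "src/legacy.ts", content: "...", modified: false },
  ],
  dependencies: [
    { name: "react", essential: true },
    { name: "lodash", essential: false },
  ],
  history: ["a", "b", "c", "d"].map((text) => ({ text, relevant: true })),
});
```

Here only `src/App.tsx`, `react`, and the last three interactions survive.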

2. Intelligent Context Sharing

Agents share a common context pool to eliminate redundant information:

// Traditional approach (per agent)
{
  "frontend_agent": {
    "context": "project_config + tech_stack + requirements" // 3,000 tokens
  },
  "backend_agent": {
    "context": "project_config + tech_stack + requirements" // 3,000 tokens
  },
  "total_tokens": 6000
}

// Agentwise approach (shared)
{
  "shared_context": "project_config + tech_stack + requirements", // 3,000 tokens
  "agent_specific": {
    "frontend_agent": "ui_requirements", // 500 tokens
    "backend_agent": "api_requirements"  // 500 tokens
  },
  "total_tokens": 4000,
  "savings": "33%"
}
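The saving comes from counting the common context once instead of once per agent, so it grows with the number of agents. The arithmetic as a small TypeScript sketch (helper names are hypothetical):

```typescript
// Traditional: every agent carries its own copy of the common context.
function perAgentTokens(commonTokens: number, agentCount: number): number {
  return commonTokens * agentCount;
}

// Shared pool: the common context is counted once, plus each agent's own slice.
function sharedPoolTokens(commonTokens: number, agentSpecific: number[]): number {
  return commonTokens + agentSpecific.reduce((sum, n) => sum + n, 0);
}

function savingsPercent(before: number, after: number): number {
  return Math.round((1 - after / before) * 100);
}

const before = perAgentTokens(3000, 2);           // 6000
const after = sharedPoolTokens(3000, [500, 500]); // 4000
console.log(savingsPercent(before, after));       // 33
```

Under the same assumptions with five agents, the comparison becomes 15,000 versus 5,500 tokens, a 63% saving.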

3. Incremental Processing

Break large tasks into smaller, token-efficient chunks:

// Large task (high token usage)
{
  "task": "Build complete e-commerce application",
  "estimated_tokens": 50000,
  "approach": "monolithic"
}

// Incremental approach (optimized)
{
  "tasks": [
    { "task": "Setup project structure", "tokens": 3000 },
    { "task": "Create user authentication", "tokens": 4000 },
    { "task": "Build product catalog", "tokens": 5000 },
    { "task": "Add shopping cart", "tokens": 4000 }
  ],
  "total_tokens": 16000,
  "savings": "68%"
}
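A plan like this can be checked mechanically: total the subtasks, flag any that exceed a per-chunk limit (candidates for further splitting), and compare against the monolithic estimate. A TypeScript sketch; `evaluatePlan` and the 4,500-token chunk limit are hypothetical choices for illustration.

```typescript
interface Subtask { task: string; tokens: number }

// Hypothetical helper: summarize an incremental plan against a monolithic estimate.
function evaluatePlan(subtasks: Subtask[], monolithicEstimate: number, maxChunkTokens: number) {
  const total = subtasks.reduce((sum, t) => sum + t.tokens, 0);
  // Chunks above the limit are candidates for further splitting.
  const oversized = subtasks.filter((t) => t.tokens > maxChunkTokens).map((t) => t.task);
  const savingsPercent = Math.round((1 - total / monolithicEstimate) * 100);
  return { total, oversized, savingsPercent };
}

const plan = evaluatePlan(
  [
    { task: "Setup project structure", tokens: 3000 },
    { task: "Create user authentication", tokens: 4000 },
    { task: "Build product catalog", tokens: 5000 },
    { task: "Add shopping cart", tokens: 4000 },
  ],
  50000,
  4500,
);
console.log(plan);
```

For the example plan this reports 16,000 tokens and 68% savings, and flags "Build product catalog" as the one chunk above the limit.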

Adaptive Optimization Techniques

1. Dynamic Compression Levels

Agentwise automatically adjusts compression based on available budget:

// When budget is healthy
{
  "compression_strategy": {
    "budget_remaining": 75000,
    "current_task_complexity": "medium",
    "compression_level": "light",
    "techniques": ["summary", "deduplication"]
  }
}

// When budget is low
{
  "compression_strategy": {
    "budget_remaining": 15000,
    "current_task_complexity": "medium", 
    "compression_level": "aggressive",
    "techniques": ["summary", "deduplication", "abstraction", "template_reuse"]
  }
}
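One way to reproduce both examples is to compare the remaining budget with the expected cost of the next task and pick a level from that ratio. This TypeScript sketch is an assumption, not Agentwise's documented logic: the per-complexity cost estimates and the ratio cut-offs are illustrative values chosen so that the two examples above come out as "light" and "aggressive".

```typescript
type Complexity = "low" | "medium" | "high";
type CompressionLevel = "light" | "medium" | "aggressive";

// Technique sets per level; "light" and "aggressive" match the examples above,
// "medium" is a hypothetical middle step.
const TECHNIQUES: Record<CompressionLevel, string[]> = {
  light: ["summary", "deduplication"],
  medium: ["summary", "deduplication", "abstraction"],
  aggressive: ["summary", "deduplication", "abstraction", "template_reuse"],
};

// Hypothetical token estimates for the next task, by complexity.
const EXPECTED_COST: Record<Complexity, number> = { low: 5000, medium: 10000, high: 20000 };

function chooseCompression(budgetRemaining: number, complexity: Complexity) {
  // How many tasks of this size the remaining budget could still cover.
  const headroom = budgetRemaining / EXPECTED_COST[complexity];
  const level: CompressionLevel = headroom >= 5 ? "light" : headroom >= 2 ? "medium" : "aggressive";
  return { compression_level: level, techniques: TECHNIQUES[level] };
}

console.log(chooseCompression(75000, "medium").compression_level); // light
console.log(chooseCompression(15000, "medium").compression_level); // aggressive
```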

2. Smart Template Reuse

Common patterns are templated to reduce repeated generation:

// Template library
{
  "templates": {
    "react_component": {
      "tokens_saved": 200,
      "usage_count": 15,
      "total_savings": 3000
    },
    "express_route": {
      "tokens_saved": 150,
      "usage_count": 8,
      "total_savings": 1200
    },
    "database_model": {
      "tokens_saved": 100,
      "usage_count": 6,
      "total_savings": 600
    }
  }
}
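Each entry's total_savings is tokens_saved multiplied by usage_count. A minimal in-memory registry that tracks this, as a TypeScript sketch (the `TemplateLibrary` class is hypothetical, not Agentwise's template store):

```typescript
// Hypothetical registry; not Agentwise's actual template store.
class TemplateLibrary {
  private stats = new Map<string, { tokensSaved: number; usageCount: number }>();

  // tokensSaved: estimated tokens avoided each time the template is reused.
  register(name: string, tokensSaved: number): void {
    this.stats.set(name, { tokensSaved, usageCount: 0 });
  }

  // Record one reuse of the template instead of generating the pattern afresh.
  use(name: string): void {
    const entry = this.stats.get(name);
    if (!entry) throw new Error(`Unknown template: ${name}`);
    entry.usageCount += 1;
  }

  totalSavings(name: string): number {
    const entry = this.stats.get(name);
    return entry ? entry.tokensSaved * entry.usageCount : 0;
  }
}

const library = new TemplateLibrary();
library.register("react_component", 200);
for (let i = 0; i < 15; i++) library.use("react_component");
console.log(library.totalSavings("react_component")); // 3000
```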

3. Predictive Optimization

Learn from past projects to optimize future token usage:

{
  "optimization_patterns": {
    "project_type": "web_application",
    "learned_optimizations": [
      {
        "pattern": "React component creation",
        "optimization": "Use component template",
        "avg_savings": 180
      },
      {
        "pattern": "API endpoint patterns",
        "optimization": "Reuse route structure",
        "avg_savings": 120
      }
    ]
  }
}
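The avg_savings figures imply averaging the recorded savings per pattern. A TypeScript sketch of that aggregation; the `Observation` type, `learnOptimizations`, and the sample observations are hypothetical, chosen only so the averages match the example above.

```typescript
interface Observation { pattern: string; optimization: string; tokensSaved: number }

// Hypothetical: average the tokens saved for each pattern across past projects.
function learnOptimizations(history: Observation[]) {
  const byPattern = new Map<string, { optimization: string; total: number; count: number }>();
  for (const obs of history) {
    const agg = byPattern.get(obs.pattern) ?? { optimization: obs.optimization, total: 0, count: 0 };
    agg.total += obs.tokensSaved;
    agg.count += 1;
    byPattern.set(obs.pattern, agg);
  }
  // Map preserves insertion order, so patterns come out in first-seen order.
  return Array.from(byPattern, ([pattern, agg]) => ({
    pattern,
    optimization: agg.optimization,
    avg_savings: Math.round(agg.total / agg.count),
  }));
}

const learned = learnOptimizations([
  { pattern: "React component creation", optimization: "Use component template", tokensSaved: 170 },
  { pattern: "React component creation", optimization: "Use component template", tokensSaved: 190 },
  { pattern: "API endpoint patterns", optimization: "Reuse route structure", tokensSaved: 100 },
  { pattern: "API endpoint patterns", optimization: "Reuse route structure", tokensSaved: 140 },
]);
```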

Context Management Strategies

1. Hierarchical Context

Organize context in layers of importance:

{
  "context_hierarchy": {
    "critical": {
      "tokens": 1000,
      "content": "core requirements, main tech stack"
    },
    "important": {
      "tokens": 1500,
      "content": "architectural decisions, key constraints"
    },
    "helpful": {
      "tokens": 2000,
      "content": "examples, documentation, preferences"
    },
    "optional": {
      "tokens": 1000,
      "content": "nice-to-have features, alternative approaches"
    }
  }
}
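Given a token budget, the layers can be filled in priority order. This TypeScript sketch (hypothetical `selectLayers`) stops at the first layer that does not fit, so a lower-priority layer is never included while a higher-priority one is left out.

```typescript
interface Layer { name: string; tokens: number; content: string }

// Include whole layers, highest priority first, until the next one would exceed the budget.
function selectLayers(layers: Layer[], budget: number): Layer[] {
  const selected: Layer[] = [];
  let used = 0;
  for (const layer of layers) {
    if (used + layer.tokens > budget) break; // this and all lower-priority layers are dropped
    selected.push(layer);
    used += layer.tokens;
  }
  return selected;
}

const hierarchy: Layer[] = [
  { name: "critical", tokens: 1000, content: "core requirements, main tech stack" },
  { name: "important", tokens: 1500, content: "architectural decisions, key constraints" },
  { name: "helpful", tokens: 2000, content: "examples, documentation, preferences" },
  { name: "optional", tokens: 1000, content: "nice-to-have features, alternative approaches" },
];
const chosen = selectLayers(hierarchy, 3000).map((l) => l.name); // critical, important
```

Using `continue` instead of `break` would pack more tokens into the budget (here it would add "optional"), at the cost of including optional material while "helpful" material is missing.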

2. Context Aging

Automatically reduce the weight of older context:

{
  "context_aging": {
    "recent": { "age": "0-30min", "weight": 1.0 },
    "medium": { "age": "30min-2h", "weight": 0.7 },
    "old": { "age": "2h-6h", "weight": 0.4 },
    "stale": { "age": ">6h", "weight": 0.1 }
  }
}
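The aging table maps directly onto a lookup function. A TypeScript sketch with hypothetical names; because the brackets share their endpoints (30 min, 2 h, 6 h), this version assigns each boundary to the newer bracket, which is an assumption.

```typescript
// Brackets from the context_aging table, in minutes. Boundaries belong to the newer bracket.
const AGING_BRACKETS: { maxAgeMinutes: number; weight: number }[] = [
  { maxAgeMinutes: 30, weight: 1.0 },  // recent: 0-30min
  { maxAgeMinutes: 120, weight: 0.7 }, // medium: 30min-2h
  { maxAgeMinutes: 360, weight: 0.4 }, // old: 2h-6h
];
const STALE_WEIGHT = 0.1;              // stale: >6h

function agingWeight(ageMinutes: number): number {
  for (const bracket of AGING_BRACKETS) {
    if (ageMinutes <= bracket.maxAgeMinutes) return bracket.weight;
  }
  return STALE_WEIGHT;
}

// Hypothetical use: scale an item's base relevance by its age weight before ranking.
function effectiveRelevance(baseRelevance: number, ageMinutes: number): number {
  return baseRelevance * agingWeight(ageMinutes);
}
```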

3. Semantic Compression

Use semantic understanding to compress without losing meaning:

// Original context (roughly 50 tokens)
{
  "requirement": "The application should have a user authentication system that allows users to register with email and password, login securely, logout, and reset their password if they forget it. The system should validate email formats and password strength."
}

// Compressed context (roughly 20 tokens)
{
  "requirement": "User auth: email/password registration, login/logout, password reset, validation"
}

Agent-Specific Optimizations

Frontend Agent Optimizations

{
  "frontend_optimizations": {
    "component_templates": {
      "enabled": true,
      "savings_per_component": 180,
      "library": ["Button", "Input", "Modal", "Card"]
    },
    "style_reuse": {
      "enabled": true,
      "css_templates": ["layout", "theme", "responsive"],
      "savings": 25
    },
    "state_patterns": {
      "enabled": true,
      "common_patterns": ["useState", "useEffect", "useContext"],
      "savings": 50
    }
  }
}

Backend Agent Optimizations

{
  "backend_optimizations": {
    "route_templates": {
      "enabled": true,
      "patterns": ["CRUD", "auth", "middleware"],
      "savings_per_route": 120
    },
    "middleware_reuse": {
      "enabled": true,
      "common_middleware": ["cors", "auth", "validation"],
      "savings": 80
    },
    "database_patterns": {
      "enabled": true,
      "orm_templates": ["model", "migration", "seed"],
      "savings": 100
    }
  }
}

Database Agent Optimizations

{
  "database_optimizations": {
    "schema_templates": {
      "enabled": true,
      "common_schemas": ["user", "product", "order"],
      "savings_per_table": 90
    },
    "migration_patterns": {
      "enabled": true,
      "templates": ["add_column", "create_index", "foreign_key"],
      "savings": 60
    }
  }
}

Budget Management

Dynamic Budget Allocation

{
  "budget_management": {
    "total_budget": 100000,
    "allocation": {
      "reserved": 20000,
      "active_tasks": 60000,
      "optimization_buffer": 20000
    },
    "thresholds": {
      "warning": 0.8,
      "critical": 0.95,
      "emergency_optimization": 0.98
    }
  }
}
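Reading the thresholds as fractions of total_budget already consumed (an interpretation, since the example does not say), the current state can be classified as follows. A TypeScript sketch with hypothetical names:

```typescript
type BudgetState = "normal" | "warning" | "critical" | "emergency_optimization";

// Thresholds from the budget_management example, read as the fraction of total_budget used.
const THRESHOLDS = { warning: 0.8, critical: 0.95, emergency_optimization: 0.98 };

function budgetState(tokensUsed: number, totalBudget: number): BudgetState {
  const ratio = tokensUsed / totalBudget;
  // Check the most severe threshold first.
  if (ratio >= THRESHOLDS.emergency_optimization) return "emergency_optimization";
  if (ratio >= THRESHOLDS.critical) return "critical";
  if (ratio >= THRESHOLDS.warning) return "warning";
  return "normal";
}
```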

Emergency Optimization Modes

When the token budget is critically low:

{
  "emergency_mode": {
    "triggered_at": 0.98,
    "optimizations": [
      "maximum_compression",
      "template_only_responses", 
      "minimal_context",
      "agent_pooling",
      "task_deferral"
    ],
    "quality_impact": "minimal",
    "savings": "up_to_60%"
  }
}

Optimization Metrics

Real-Time Monitoring

{
  "optimization_metrics": {
    "current_session": {
      "tokens_saved": 25000,
      "savings_percentage": 35,
      "techniques_used": [
        "context_compression",
        "template_reuse", 
        "smart_caching"
      ]
    },
    "by_technique": {
      "context_compression": { "savings": 12000, "percentage": 48 },
      "template_reuse": { "savings": 8000, "percentage": 32 },
      "smart_caching": { "savings": 5000, "percentage": 20 }
    }
  }
}
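In by_technique, each percentage is that technique's share of tokens_saved (12,000 of 25,000 is 48%). A hypothetical TypeScript tracker that aggregates savings events during a session and produces the same breakdown:

```typescript
// Hypothetical session tracker; not Agentwise's metrics implementation.
class SavingsTracker {
  private byTechnique = new Map<string, number>();

  record(technique: string, tokensSaved: number): void {
    this.byTechnique.set(technique, (this.byTechnique.get(technique) ?? 0) + tokensSaved);
  }

  report() {
    let total = 0;
    this.byTechnique.forEach((saved) => { total += saved; });
    const by_technique: Record<string, { savings: number; percentage: number }> = {};
    this.byTechnique.forEach((savings, technique) => {
      // Share of all savings in this session, rounded to a whole percent.
      by_technique[technique] = { savings, percentage: total === 0 ? 0 : Math.round((savings / total) * 100) };
    });
    return { tokens_saved: total, by_technique };
  }
}

const tracker = new SavingsTracker();
tracker.record("context_compression", 7000);
tracker.record("context_compression", 5000);
tracker.record("template_reuse", 8000);
tracker.record("smart_caching", 5000);
const report = tracker.report();
```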

Historical Analysis

{
  "historical_metrics": {
    "last_30_days": {
      "total_projects": 45,
      "average_savings": 38,
      "best_savings": 52,
      "techniques_impact": {
        "context_sharing": 15,
        "compression": 12,
        "templates": 8,
        "caching": 3
      }
    }
  }
}

Configuration Options

Optimization Settings

{
  "optimization": {
    "enabled": true,
    "aggressiveness": "adaptive",
    "techniques": {
      "context_compression": {
        "enabled": true,
        "level": "adaptive",
        "preserve_quality": true
      },
      "template_reuse": {
        "enabled": true,
        "auto_create": true,
        "minimum_usage": 3
      },
      "context_sharing": {
        "enabled": true,
        "max_shared_size": 10000
      },
      "smart_caching": {
        "enabled": true,
        "ttl": 3600,
        "max_size": "100MB"
      }
    }
  }
}
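The smart_caching settings describe a time-limited cache. A minimal TypeScript sketch, assuming ttl is in seconds; the `TtlCache` class is hypothetical, and a maximum entry count stands in for the byte-based max_size, which would require measuring the size of each entry.

```typescript
// Hypothetical TTL cache; ttl assumed to be in seconds, maxEntries stands in for max_size.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlSeconds: number,
    private maxEntries: number,
    private now: () => number = () => Date.now(), // injectable clock (milliseconds)
  ) {}

  set(key: string, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key); // re-insert so the key counts as newest
    } else if (this.entries.size >= this.maxEntries) {
      // Evict the oldest insertion; Map iterates in insertion order.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlSeconds * 1000 });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // stale: drop it so the caller regenerates
      return undefined;
    }
    return entry.value;
  }
}

let fakeNow = 0;
const cache = new TtlCache<string>(3600, 2, () => fakeNow);
cache.set("button", "<Button />");
const hit = cache.get("button");  // cached value
fakeNow = 3600 * 1000;            // one hour later
const miss = cache.get("button"); // undefined: entry expired
```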

Budget Alerts

{
  "budget_alerts": {
    "enabled": true,
    "thresholds": {
      "warning": {
        "percentage": 75,
        "action": "notify"
      },
      "critical": {
        "percentage": 90,
        "action": "enable_aggressive_optimization"
      },
      "emergency": {
        "percentage": 95,
        "action": "emergency_mode"
      }
    }
  }
}
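A monitor for these thresholds should fire each alert once as usage climbs, rather than on every update. A TypeScript sketch; `BudgetAlertMonitor` is hypothetical, and the percentages are read as the share of the total budget used.

```typescript
type AlertAction = "notify" | "enable_aggressive_optimization" | "emergency_mode";

// Thresholds and actions from the budget_alerts example above.
const ALERTS: { name: string; percentage: number; action: AlertAction }[] = [
  { name: "warning", percentage: 75, action: "notify" },
  { name: "critical", percentage: 90, action: "enable_aggressive_optimization" },
  { name: "emergency", percentage: 95, action: "emergency_mode" },
];

// Returns only the actions for thresholds newly crossed since the previous check.
class BudgetAlertMonitor {
  private fired = new Set<string>();

  constructor(private totalBudget: number) {}

  check(tokensUsed: number): AlertAction[] {
    const percent = (tokensUsed / this.totalBudget) * 100;
    const actions: AlertAction[] = [];
    for (const alert of ALERTS) {
      if (percent >= alert.percentage && !this.fired.has(alert.name)) {
        this.fired.add(alert.name);
        actions.push(alert.action);
      }
    }
    return actions;
  }
}

const monitor = new BudgetAlertMonitor(100000);
```

If usage jumps past several thresholds between checks, all newly crossed actions are returned together, in order of severity.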

Advanced Techniques

1. Semantic Chunking

Break content into semantically meaningful chunks:

{
  "semantic_chunking": {
    "strategy": "function_based",
    "chunk_size": "adaptive",
    "overlap": "minimal",
    "preservation": ["function_signatures", "key_variables", "comments"]
  }
}
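A function-based strategy starts a new chunk at each function boundary. The TypeScript sketch below is deliberately rough: it detects only top-level `function` declarations with a regular expression, where a real implementation would use a parser such as the TypeScript compiler API. The name `chunkByFunction` is hypothetical, and in this version a comment directly above a function stays attached to the previous chunk.

```typescript
// Matches lines that begin a top-level function declaration (no `g` flag, so test() is stateless).
const FUNCTION_START = /^(export\s+)?(async\s+)?function\s+\w+/;

// Split source text into chunks, starting a new chunk at each function declaration.
function chunkByFunction(source: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of source.split("\n")) {
    if (FUNCTION_START.test(line) && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

const chunks = chunkByFunction(
  ['import x from "y";', "function a() {", "  return 1;", "}", "export async function b() {}"].join("\n"),
);
```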

2. Progressive Disclosure

Reveal information progressively as needed:

{
  "progressive_disclosure": {
    "initial_context": "project_overview + immediate_requirements",
    "expansion_triggers": [
      "complexity_increase",
      "agent_request",
      "error_occurrence"
    ],
    "max_expansion": 3
  }
}
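Progressive disclosure amounts to starting from a minimal context and revealing one more section per trigger, up to max_expansion. A TypeScript sketch with a hypothetical `ProgressiveContext` class; the section names are illustrative.

```typescript
type Trigger = "complexity_increase" | "agent_request" | "error_occurrence";

// Hypothetical: begin with a minimal context and reveal one pending section per trigger.
class ProgressiveContext {
  private included: string[];
  private pending: string[];
  private triggers: Trigger[] = [];

  constructor(initial: string[], pending: string[], private maxExpansion = 3) {
    this.included = [...initial];
    this.pending = [...pending]; // copy so the caller's array is not mutated
  }

  // Returns false once the expansion limit is hit or nothing is left to reveal.
  expand(trigger: Trigger): boolean {
    if (this.triggers.length >= this.maxExpansion || this.pending.length === 0) return false;
    this.included.push(this.pending.shift()!);
    this.triggers.push(trigger); // record why each expansion happened
    return true;
  }

  get context(): string[] {
    return [...this.included];
  }
}

const ctx = new ProgressiveContext(
  ["project_overview", "immediate_requirements"],
  ["architecture", "examples", "documentation", "history"],
);
const results = [
  ctx.expand("agent_request"),
  ctx.expand("complexity_increase"),
  ctx.expand("error_occurrence"),
  ctx.expand("agent_request"), // refused: max_expansion (3) reached
];
```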

3. Context Deduplication

Remove duplicate information across agents:

{
  "deduplication": {
    "enabled": true,
    "strategies": ["exact_match", "semantic_similarity", "hash_based"],
    "threshold": 0.8,
    "preserve_unique": true
  }
}
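The exact_match and hash_based strategies can share one step: normalize each snippet and keep a Set (itself a hash-based structure) of what has been seen. For semantic_similarity, this TypeScript sketch substitutes word-overlap (Jaccard) similarity checked against the 0.8 threshold; that is a cheap stand-in, and a production system would more likely compare embeddings. The function names are hypothetical.

```typescript
// Normalize case and whitespace so trivially different copies compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Word-set Jaccard similarity: a cheap stand-in for embedding-based similarity.
function jaccard(a: string, b: string): number {
  const wa = new Set(normalize(a).split(" "));
  const wb = new Set(normalize(b).split(" "));
  let shared = 0;
  wa.forEach((w) => { if (wb.has(w)) shared += 1; });
  const union = wa.size + wb.size - shared;
  return union === 0 ? 1 : shared / union;
}

function deduplicate(snippets: string[], threshold = 0.8): string[] {
  const seen = new Set<string>(); // exact matches after normalization
  const kept: string[] = [];
  for (const s of snippets) {
    const key = normalize(s);
    if (seen.has(key)) continue;                                 // exact duplicate
    if (kept.some((k) => jaccard(k, s) >= threshold)) continue; // near-duplicate
    seen.add(key);
    kept.push(s);
  }
  return kept;
}

const unique = deduplicate([
  "Use React 18 with TypeScript",
  "use react 18   with typescript",       // exact match after normalization
  "Use React 18 with TypeScript strict",  // 5 of 6 words shared: similarity 0.83
  "Deploy to Vercel",
]);
```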

Quality Preservation

Maintaining Code Quality

Optimization should not compromise output quality:

{
  "quality_preservation": {
    "minimum_quality_score": 8.5,
    "fallback_strategies": [
      "reduce_compression",
      "increase_context",
      "use_full_templates"
    ],
    "quality_metrics": [
      "code_correctness",
      "best_practices",
      "completeness",
      "documentation"
    ]
  }
}
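The fallback strategies suggest a loop: generate, score, and if the score falls below the minimum, apply the next fallback and try again. A TypeScript sketch; `generate` and `score` are caller-supplied stand-ins, since how Agentwise computes its quality score is not described here, and `generateWithQualityFloor` is a hypothetical name.

```typescript
type Fallback = "reduce_compression" | "increase_context" | "use_full_templates";
const FALLBACKS: Fallback[] = ["reduce_compression", "increase_context", "use_full_templates"];

// Apply fallbacks one at a time, in order, until the output meets the minimum score
// or the fallbacks run out.
function generateWithQualityFloor<T>(
  generate: (applied: Fallback[]) => T,
  score: (output: T) => number,
  minimumScore = 8.5,
): { output: T; score: number; applied: Fallback[] } {
  const applied: Fallback[] = [];
  let output = generate([...applied]);
  let current = score(output);
  for (const fallback of FALLBACKS) {
    if (current >= minimumScore) break;
    applied.push(fallback);
    output = generate([...applied]);
    current = score(output);
  }
  return { output, score: current, applied };
}

// Toy stand-ins: each applied fallback raises the score by one point.
const outcome = generateWithQualityFloor(
  (applied) => applied.length,
  (fallbacksUsed) => 7 + fallbacksUsed,
);
```

With these toy stand-ins the loop stops after two fallbacks, at a score of 9.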

Quality vs. Efficiency Trade-offs

{
  "trade_off_profiles": {
    "maximum_quality": {
      "optimization_level": "light",
      "expected_savings": 15,
      "quality_impact": "none"
    },
    "balanced": {
      "optimization_level": "medium",
      "expected_savings": 35,
      "quality_impact": "minimal"
    },
    "maximum_efficiency": {
      "optimization_level": "aggressive",
      "expected_savings": 55,
      "quality_impact": "slight"
    }
  }
}

Best Practices

Optimization Guidelines

  1. Start Conservative: Begin with light optimization and increase gradually
  2. Monitor Quality: Continuously check output quality metrics
  3. Use Templates: Create reusable templates for common patterns
  4. Share Context: Maximize context sharing between agents
  5. Cache Strategically: Cache frequently used components and patterns

Common Pitfalls

  1. Over-Optimization: Aggressive optimization that degrades quality
  2. Context Loss: Removing essential context information
  3. Template Overuse: Using templates inappropriately
  4. Premature Optimization: Optimizing before understanding patterns

Recommended Settings

Development Environment

{
  "optimization": {
    "aggressiveness": "light",
    "preserve_debugging_info": true,
    "cache_templates": false
  }
}

Production Environment

{
  "optimization": {
    "aggressiveness": "adaptive",
    "preserve_debugging_info": false,
    "cache_templates": true
  }
}

Troubleshooting Optimization

Common Issues

  1. Quality Degradation: Optimization too aggressive
  2. Context Confusion: Important information compressed away
  3. Template Mismatches: Wrong templates applied
  4. Cache Invalidation: Stale cached results

Diagnostic Commands

# Check optimization metrics
npm run optimization:status

# Analyze token usage
npm run tokens:analyze

# Adjust optimization level
npm run optimization:set-level medium

# Clear optimization cache
npm run optimization:clear-cache

For more information, see Configuration, Performance Tuning, or Agent System.


Support

  • Discord: @vibecodingwithphil
  • GitHub: @VibeCodingWithPhil