Last Updated: October 12, 2025
Status: ✅ PRODUCTION READY - Dual-Mode Orchestration (Frontier + Lightning)
Version: V7.2.0
NEW CAPABILITIES:
- ✅ Dual-Mode Orchestration: Frontier (research-quality) + Lightning (fast answers)
- ✅ Frontier Mode: Claude Sonnet 4.5, 10k-25k chars, RAG + Perplexity + LLM, 1.0 questions, 60-90s
- ✅ Lightning Mode: GPT-4o-mini, 2k-5k chars, RAG + LLM only, 0.5 questions, <30s
- ✅ Fractional Question Deduction: 0.5 questions for Lightning, 1.0 for Frontier
- ✅ Performance Optimization: Lightning mode skips Perplexity for 2.4x speedup
- ✅ Dual System Messages: Separate prompts for Frontier (comprehensive) and Lightning (concise)
- ✅ Dynamic Model Selection: Admin configures both models, system selects based on mode
Previous Updates (v7.1):
- ✅ RAG Quality Verification: Complete quality scoring system for all documents (0-100 scale)
- ✅ Admin Quality Dashboard: Document management UI with quality badges and detailed metrics
- ✅ RAG Testing Interface: `/admin/rag-testing` for query testing with relevance scores
- ✅ Enhanced User Chat: Prominent sources display with type icons (RAG 📚, Internet 🌐, General 🧠)
- ✅ User Document Upload: Quality validation prevents bad files from entering RAG system
- ✅ Orchestration Transparency: Full visibility into tools used, sources, and processing time
See /docs/handoff/LIGHTNING_MODE_COMPLETE.md for complete dual-mode implementation details.
See /docs/architecture/multimodal-rag-architecture.md for RAG Quality Verification details.
The V7 AI Orchestration system is the core product value of Moj AI. It coordinates multiple AI models and tools to deliver two types of responses:
Frontier Mode 🚀 (research-quality):
- Model: Claude Sonnet 4.5 (admin-configurable)
- Tools: RAG + Perplexity + General Knowledge
- Response: 10,000-25,000 characters
- Time: 60-90 seconds
- Cost: 1.0 questions
- Use Case: Comprehensive research, legal analysis, detailed reports

Lightning Mode ⚡ (fast answers):
- Model: GPT-4o-mini (admin-configurable)
- Tools: RAG + General Knowledge (Perplexity skipped for speed)
- Response: 2,000-5,000 characters
- Time: <30 seconds
- Cost: 0.5 questions
- Use Case: Quick answers, simple queries, cost-effective responses
Key Principle: ONE main orchestrator LLM per mode with tools at its disposal:
- Perplexity - Internet research tool (Frontier only)
- RAG - Document knowledge base (both modes)
- Orchestrator's general knowledge - Legal reasoning and analysis (both modes)
NOT three separate LLMs in sequence - the orchestrator calls tools as needed.
User Query + Mode Selection (Frontier/Lightning)
↓
┌─────────────────────────────────────────────────────────┐
│ Query Analyzer V7 │
│ - Detects: RAG, Perplexity, Forms, Legal Reasoning │
│ - Lightning Mode: Skip Perplexity for speed ⚡ │
│ - Frontier Mode: Use all tools for quality 🚀 │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Tool Coordinator V7 │
│ - Executes tools in parallel/sequence │
│ - RAG Search (Weaviate) - BOTH MODES │
│ - Internet Research (Perplexity) - FRONTIER ONLY │
│ - Form Discovery - BOTH MODES │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Legal Reasoning Tool V7 │
│ - Frontier: Claude Sonnet 4.5 + Frontier System Msg │
│ - Lightning: GPT-4o-mini + Lightning System Msg │
│ - Selects config based on lightning_mode parameter │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Response Synthesizer V7 │
│ - Combines tool results │
│ - Formats final response │
│ - Adds source citations │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Usage Tracker │
│ - Frontier: Deduct 1.0 questions │
│ - Lightning: Deduct 0.5 questions │
│ - Atomic database-level increments │
└─────────────────────────────────────────────────────────┘
↓
Final Response + Metadata
↓
Response Synthesizer
↓
Final Response (5-10 pages, Slovenian legal format)
| Feature | Frontier Mode 🚀 | Lightning Mode ⚡ |
|---|---|---|
| Model | Claude Sonnet 4.5 | GPT-4o-mini |
| System Message | 3,773 chars (comprehensive) | 2,474 chars (concise) |
| Response Length | 10,000-25,000 chars | 2,000-5,000 chars |
| Processing Time | 60-90 seconds | <30 seconds |
| Tools Used | RAG + Perplexity + LLM | RAG + LLM only |
| Question Cost | 1.0 questions | 0.5 questions |
| Use Case | Research, analysis, reports | Quick answers, simple queries |
| Perplexity | ✅ Enabled | ❌ Skipped for speed |
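The comparison above can be captured as a small lookup structure. The following is a minimal sketch, not the project's actual code: the `ModeProfile` class and the helper names are hypothetical, and only the values (models, lengths, costs, Perplexity usage) come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    """Hypothetical static description of one orchestration mode."""
    name: str
    default_model: str
    min_chars: int
    max_chars: int
    question_cost: float
    uses_perplexity: bool


# Values taken from the comparison table; admins may configure other models.
FRONTIER = ModeProfile("frontier", "claude-sonnet-4-5-20250929", 10_000, 25_000, 1.0, True)
LIGHTNING = ModeProfile("lightning", "gpt-4o-mini", 2_000, 5_000, 0.5, False)


def profile_for(lightning_mode: bool) -> ModeProfile:
    """Return the profile matching the request's lightning_mode flag."""
    return LIGHTNING if lightning_mode else FRONTIER


def enabled_tools(profile: ModeProfile) -> list[str]:
    """RAG runs in both modes; internet research only when Perplexity is enabled."""
    tools = ["rag_search"]
    if profile.uses_perplexity:
        tools.append("internet_research")
    tools.append("legal_reasoning")
    return tools
```

Keeping the per-mode constants in one place like this would make it harder for the question cost and the tool list to drift apart.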
Admins can configure both modes independently via Admin UI:
Frontier Mode Settings:
- Model: `claude-sonnet-4-5-20250929` (or other Claude/GPT models)
- System Message: Stored in `admin_settings.frontier_system_message`
- Max Tokens: 4000
- Temperature: 0.7
Lightning Mode Settings:
- Model: `gpt-4o-mini` (or other fast models)
- System Message: Stored in `admin_settings.lightning_system_message`
- Max Tokens: 4000
- Temperature: 0.7
Database Storage:
-- admin_settings table (key-value JSONB)
{
"lead_orchestrator": {"provider": "anthropic", "model": "claude-sonnet-4-5-20250929"},
"lightning_orchestrator": {"provider": "openai", "model": "gpt-4o-mini"},
"frontier_system_message": "You are an expert AI orchestrator...",
"lightning_system_message": "You are an expert AI assistant optimized for fast...",
"api_keys": {"anthropic": "...", "openai": "...", "perplexity": "..."}
}

Query Analyzer V7
Purpose: Analyze the user query and determine which tools are needed
Capabilities:
- Detects need for internet research (keywords: "preišči" (search), "internet", "aktualno" (current))
- Detects need for RAG (cities, districts, legal keywords, uploaded documents)
- Detects need for form discovery (keywords: "obrazec" (form), "formular" (form))
- Always includes legal reasoning (lead orchestrator)
- Always includes quality validation
- NEW: Skips Perplexity in Lightning mode for speed optimization
Output: QueryRequirements object with:
- `needs`: List of tool names
- `complexity`: SIMPLE, MODERATE, COMPLEX
- `geographic_scope`: City/district if detected
- `legal_domain`: Type of legal question
- `confidence`: 0.0-1.0
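As a rough illustration of the keyword detection and the output object described above, here is a minimal sketch. The field names follow the list above; the `analyze` function, the `Complexity` enum, and the substring matching are assumptions. RAG detection, complexity scoring, and confidence are left at defaults because the source does not specify how they are computed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


@dataclass
class QueryRequirements:
    needs: list[str] = field(default_factory=list)
    complexity: Complexity = Complexity.SIMPLE
    geographic_scope: Optional[str] = None
    legal_domain: Optional[str] = None
    confidence: float = 0.0


# Keywords listed in this document; the matching strategy is an assumption.
INTERNET_KEYWORDS = ("preišči", "internet", "aktualno")
FORM_KEYWORDS = ("obrazec", "formular")


def analyze(query: str) -> QueryRequirements:
    """Hypothetical analyzer: case-insensitive substring match on the keywords."""
    text = query.lower()
    reqs = QueryRequirements()
    if any(keyword in text for keyword in INTERNET_KEYWORDS):
        reqs.needs.append("internet_research")
    if any(keyword in text for keyword in FORM_KEYWORDS):
        reqs.needs.append("form_discovery")
    # Legal reasoning and quality validation are always included.
    reqs.needs += ["legal_reasoning", "quality_validation"]
    return reqs
```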
Lightning Mode Optimization:
# Skip Perplexity in Lightning mode
lightning_mode = user_context.get("lightning_mode", False)
if internet_needed and not lightning_mode:
    needs.append("internet_research")
elif internet_needed and lightning_mode:
    logger.info("⚡ Internet research SKIPPED (Lightning mode)")

Tool Coordinator V7
Purpose: Execute tools in optimal order and manage context flow
Execution Strategy:
- Gathering Phase (parallel): Internet research, RAG search, form discovery
- Reasoning Phase (sequential): Legal reasoning with context from gathering tools
- Validation Phase: Quality validation of final response
Tools Available:
- `rag_search` - Search uploaded documents (Weaviate)
- `internet_research` - Search internet (Perplexity Sonar Pro)
- `form_discovery` - Find relevant forms
- `legal_reasoning` - Main orchestrator (Claude/GPT)
- `quality_validation` - Validate response quality
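The three-phase execution strategy can be sketched with `asyncio`. This is an illustrative outline only: `run_tool` is a hypothetical stand-in that returns canned results, and the real coordinator's interfaces are not shown in this document.

```python
import asyncio


async def run_tool(name: str, query: str) -> dict:
    """Hypothetical stand-in for a real tool call; returns a canned result."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"tool": name, "content": f"{name} result for: {query}"}


async def orchestrate(query: str, needs: list[str]) -> dict:
    gathering = [n for n in ("internet_research", "rag_search", "form_discovery") if n in needs]

    # Gathering phase: independent tools run concurrently.
    gathered = await asyncio.gather(*(run_tool(n, query) for n in gathering))
    context = {result["tool"]: result["content"] for result in gathered}

    # Reasoning phase: runs after gathering so it can use the collected context.
    answer = await run_tool("legal_reasoning", query)
    answer["context_tools"] = sorted(context)

    # Validation phase: runs last, on the final answer.
    validation = await run_tool("quality_validation", answer["content"])
    return {"answer": answer, "validation": validation}
```

A call such as `asyncio.run(orchestrate("...", ["rag_search"]))` would run only the RAG search in the gathering phase, which mirrors what Lightning mode does when Perplexity is skipped.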
Legal Reasoning Tool V7
Purpose: Main orchestrator that performs legal analysis with dual-mode support
Key Features:
- Dual Config Support: Frontier (Claude Sonnet 4.5) + Lightning (GPT-4o-mini)
- Mode Selection: Chooses config based on `lightning_mode` parameter
- Receives context from RAG and internet research
- Applies mode-specific system message
- Generates mode-appropriate responses (10k-25k for Frontier, 2k-5k for Lightning)
- Includes source citations
Mode Selection Logic:
# Select config and system message based on mode
if lightning_mode:
    active_config = self.lightning_config
    base_system_message = self.lightning_system_message
    mode_name = "LIGHTNING"
else:
    active_config = self.frontier_config
    base_system_message = self.frontier_system_message
    mode_name = "FRONTIER"
logger.info(f"🔧 Using {mode_name} mode: {active_config.provider}/{active_config.model}")

System Message Structure:
Mode-Specific System Message (Frontier or Lightning from admin UI)
+
RAG Context (if available)
+
Internet Research Context (if available, Frontier only)
+
Quality Requirements (mode-specific length targets)
Frontier System Message (3,773 chars):
- Comprehensive instructions for research-quality responses
- Target: 10,000-25,000 characters
- Detailed sourcing requirements
- Slovenian legal format (IZVRŠNI POVZETEK, KLJUČNE UGOTOVITVE, etc.)
Lightning System Message (2,474 chars):
- Concise instructions for fast responses
- Target: 2,000-5,000 characters
- Essential sourcing only
- Simplified format for quick answers
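The system message structure described above (base message, optional RAG context, optional internet context, length targets) could be assembled roughly as follows. The function name, section labels, and separator are assumptions made for illustration; only the ordering and the length targets come from this document.

```python
from typing import Optional


def build_system_message(
    base: str,
    rag_context: Optional[str] = None,
    internet_context: Optional[str] = None,
    lightning_mode: bool = False,
) -> str:
    """Hypothetical assembly of the final system message, in the documented order."""
    parts = [base]
    if rag_context:
        parts.append("RAG CONTEXT:\n" + rag_context)
    # Internet research context is only attached in Frontier mode.
    if internet_context and not lightning_mode:
        parts.append("INTERNET RESEARCH CONTEXT:\n" + internet_context)
    low, high = (2_000, 5_000) if lightning_mode else (10_000, 25_000)
    parts.append(f"QUALITY REQUIREMENTS: target length {low}-{high} characters.")
    return "\n\n".join(parts)
```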
Response Synthesizer V7
Purpose: Combine tool results into final response
Synthesis Strategy:
- Primary: Legal reasoning response (if successful)
- Fallback: Combine internet research + RAG results
- Always: Add sources section with citations
- Format: Slovenian legal structure (IZVRŠNI POVZETEK, KLJUČNE UGOTOVITVE, etc.)
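A minimal sketch of the primary/fallback synthesis logic follows. The `ToolResult` type and `synthesize` function are hypothetical, and the real synthesizer's formatting is more elaborate; the sketch only shows the fallback order and the always-present sources section.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    """Hypothetical result record for one executed tool."""
    tool: str
    success: bool
    content: str = ""
    sources: list[str] = field(default_factory=list)


def synthesize(results: dict[str, ToolResult]) -> str:
    reasoning = results.get("legal_reasoning")
    if reasoning is not None and reasoning.success:
        # Primary path: use the legal reasoning response as the body.
        body = reasoning.content
    else:
        # Fallback path: combine internet research and RAG results.
        fallback = [results[n] for n in ("internet_research", "rag_search") if n in results]
        body = "\n\n".join(r.content for r in fallback if r.success)

    # Always append a sources section, de-duplicated in first-seen order.
    sources: list[str] = []
    for result in results.values():
        for source in result.sources:
            if source not in sources:
                sources.append(source)
    citations = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return f"{body}\n\nVIRI\n{citations}"
```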
Admin Settings Service
Purpose: Load and cache admin configuration from database
Critical Fix (Oct 9, 2025):
- DECRYPTS API KEYS when loading from database (lines 276-287)
- Keys stored encrypted in `admin_settings` table
- Decryption happens ONCE here, not in config builder
Settings Loaded (Dual-Mode):
- `lead_orchestrator`: Frontier mode provider, model, temperature, max_tokens
- `lightning_orchestrator`: Lightning mode provider, model, temperature, max_tokens
- `api_keys`: Anthropic, OpenAI, Perplexity (DECRYPTED)
- `tools_config`: RAG enabled, Perplexity enabled, models
- `frontier_system_message`: Comprehensive system message for Frontier mode
- `lightning_system_message`: Concise system message for Lightning mode
- `quality_settings`: Temperature, max_tokens
Caching:
- 5-minute TTL
- Invalidated on admin settings update
- Fresh load on every orchestration request
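The TTL-plus-invalidation behaviour can be illustrated with a small cache wrapper. This is a sketch under stated assumptions: `SettingsCache`, its `loader` callable, and the injectable `clock` are hypothetical names, and the sketch does not show how the service decides when to request a fresh load.

```python
import time
from typing import Callable, Optional


class SettingsCache:
    """Hypothetical cache: reloads after the TTL expires or after invalidate()."""

    def __init__(
        self,
        loader: Callable[[], dict],
        ttl_seconds: float = 300.0,  # 5-minute TTL, as documented
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value

    def invalidate(self) -> None:
        """Called when an admin updates settings, forcing a reload on next get()."""
        self._value = None
```

Injecting the clock keeps the expiry logic testable without sleeping for five minutes.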
Config Builder V7
Purpose: Build V7 config objects from admin settings with dual-mode support
Critical Fix (Oct 9, 2025):
- REMOVED DOUBLE DECRYPTION (lines 116-123, 137-147)
- Now uses already-decrypted keys from admin settings service
- No longer calls `self._decrypt_key()`
Dual-Mode Update (Oct 12, 2025):
- Builds TWO configs: Frontier + Lightning
- Extracts `lead_orchestrator` for Frontier mode
- Extracts `lightning_orchestrator` for Lightning mode
- Extracts both `frontier_system_message` and `lightning_system_message`
- Passes both configs to orchestration engine
Builds:
- `LeadOrchestratorConfig` (Frontier): Provider, model, API key, temperature, max_tokens
- `LightningOrchestratorConfig` (Lightning): Provider, model, API key, temperature, max_tokens
- `ToolsConfig`: RAG/Perplexity enabled, API keys, models
- `AdminConfig`: Complete configuration object with dual configs
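To make the dual-config build step concrete, here is a simplified sketch. It assumes the settings dictionary has the shape shown in the database storage example and that API keys arrive already decrypted. For brevity it uses one hypothetical `OrchestratorConfig` class for both modes rather than the separate `LeadOrchestratorConfig` and `LightningOrchestratorConfig` classes named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchestratorConfig:
    """Hypothetical shared shape for the Frontier and Lightning configs."""
    provider: str
    model: str
    api_key: str
    temperature: float = 0.7
    max_tokens: int = 4000


def build_configs(settings: dict) -> tuple[OrchestratorConfig, OrchestratorConfig]:
    """Return (frontier, lightning) configs from already-decrypted admin settings."""
    keys = settings["api_keys"]

    def build_one(section: str) -> OrchestratorConfig:
        raw = settings[section]
        return OrchestratorConfig(
            provider=raw["provider"],
            model=raw["model"],
            api_key=keys[raw["provider"]],  # used as-is, no second decryption
            temperature=raw.get("temperature", 0.7),
            max_tokens=raw.get("max_tokens", 4000),
        )

    return build_one("lead_orchestrator"), build_one("lightning_orchestrator")
```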
Save path (encryption):
Admin UI (plaintext key)
↓
Backend API (/api/v1/admin/orchestration/settings)
↓
AdminSettingsService.encrypt() [Fernet]
↓
Database (encrypted string)

Load path (decryption):
Database (encrypted string)
↓
AdminSettingsService._load_settings_from_db()
↓
self._encryption.decrypt() [Fernet] ← HAPPENS HERE (ONCE)
↓
Config Builder (uses decrypted keys directly)
↓
Tools (receive valid API keys)
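The "decrypt exactly once" rule in the flow above can be sketched without any crypto dependency. In the sketch below the cipher is injected as two callables; in the real system these would wrap Fernet, whereas `toy_encrypt` and `toy_decrypt` are insecure stand-ins used purely for illustration. `ApiKeyStore` and `build_tool_config` are hypothetical names. Note one difference: the toy cipher raises on a second decryption, while the original bug reportedly produced empty strings.

```python
from typing import Callable


def toy_encrypt(plaintext: str) -> str:
    """Toy stand-in for Fernet encryption. NOT secure, illustration only."""
    return "enc:" + plaintext[::-1]


def toy_decrypt(ciphertext: str) -> str:
    """Toy stand-in for Fernet decryption; rejects input that is not ciphertext."""
    if not ciphertext.startswith("enc:"):
        raise ValueError("not a ciphertext (already decrypted?)")
    return ciphertext[len("enc:"):][::-1]


class ApiKeyStore:
    """Encrypts on save and decrypts in exactly one place on load."""

    def __init__(self, encrypt: Callable[[str], str], decrypt: Callable[[str], str]) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._rows: dict[str, str] = {}  # stands in for the admin_settings table

    def save(self, provider: str, plaintext_key: str) -> None:
        self._rows[provider] = self._encrypt(plaintext_key)

    def load_decrypted(self) -> dict[str, str]:
        # The only place decryption happens; callers receive plaintext keys.
        return {provider: self._decrypt(c) for provider, c in self._rows.items()}


def build_tool_config(decrypted_keys: dict[str, str]) -> dict[str, str]:
    # Config builder: uses the keys directly and never decrypts again.
    return {f"{provider}_api_key": key for provider, key in decrypted_keys.items()}
```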
Encryption Details:
- Algorithm: Fernet (symmetric encryption)
- Key: Stored in `.env` as `API_KEY_ENCRYPTION_KEY`
- Current Key:
8VrK3B1kyrkrO65fvYvkl6FnKWwWJlTDvCKyChOuYig=
Implementation: usage_tracker.py + conversation_production.py
Question Costs:
- Frontier Mode: 1.0 questions
- Lightning Mode: 0.5 questions
Tracking Flow:
# Determine question cost based on mode
question_cost = 0.5 if request.lightning_mode else 1.0
# Track usage with fractional cost
await usage_tracker.track_usage(
    tenant_id=str(tenant.id),
    user_id=str(user.id),
    action_type="question",
    tokens_used=orchestration_result.metadata.get("total_tokens", 1000),
    cost_estimate=orchestration_result.metadata.get("total_cost_usd", 0.01),
    metadata={
        "lightning_mode": request.lightning_mode,
        "question_cost": question_cost
    },
    db=self.db,
    question_cost=question_cost  # NEW: Fractional support
)

Database Updates:
# Atomic increment with fractional support
await db.execute(
    update(Subscription)
    .where(Subscription.id == subscription.id)
    .values(questions_used=Subscription.questions_used + question_cost)
)

Benefits:
- Users get 2x more Lightning queries for same cost
- Encourages use of fast mode for simple questions
- Reduces infrastructure costs (GPT-4o-mini cheaper than Claude Sonnet 4.5)
- Improves user experience with faster responses
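The quota arithmetic behind fractional deduction can be worked through in a few lines. This sketch assumes an integer plan limit and a decimal `questions_used` value; the function names and the use of `Decimal` are illustrative choices, not a description of the actual schema.

```python
from decimal import Decimal

FRONTIER_COST = Decimal("1.0")
LIGHTNING_COST = Decimal("0.5")


def question_cost(lightning_mode: bool) -> Decimal:
    """Cost of one query in the given mode."""
    return LIGHTNING_COST if lightning_mode else FRONTIER_COST


def remaining_questions(limit: int, used: Decimal) -> Decimal:
    """Quota left on a plan with `limit` questions (hypothetical helper)."""
    return Decimal(limit) - used


def can_ask(limit: int, used: Decimal, lightning_mode: bool) -> bool:
    """True if the remaining quota covers one more query in the given mode."""
    return remaining_questions(limit, used) >= question_cost(lightning_mode)
```

For example, a user who has consumed 9.5 of 10 questions can still send one Lightning query but not a Frontier query.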
Location: Admin UI → Orchestration → Frontier System Message
Storage: admin_settings table, key: frontier_system_message
Current Version: V7.2.0 (3,773 characters)
Purpose: Comprehensive instructions for research-quality responses
Key Sections:
- Role definition (Slovenian building legislation expert)
- Response quality requirements (10,000-25,000 chars, research-quality)
- Tool usage strategy (RAG-first, then internet, then general knowledge)
- Source citation requirements (inline [1], [2], [3])
- Response format (Slovenian legal structure: IZVRŠNI POVZETEK, KLJUČNE UGOTOVITVE, etc.)
- Language requirements (always Slovenian)
Application:
- Applied to Frontier mode (Claude Sonnet 4.5)
- Combined with RAG + Perplexity context
- Enforces comprehensive response quality
Location: Admin UI → Orchestration → Lightning System Message
Storage: admin_settings table, key: lightning_system_message
Current Version: V7.2.0 (2,474 characters)
Purpose: Concise instructions for fast, accurate responses
Key Sections:
- Role definition (Slovenian building legislation assistant)
- Response quality requirements (2,000-5,000 chars, concise)
- Tool usage strategy (RAG-first, then general knowledge)
- Source citation requirements (essential sources only)
- Response format (simplified structure: KRATEK ODGOVOR, KLJUČNE TOČKE, etc.)
- Language requirements (always Slovenian)
Application:
- Applied to Lightning mode (GPT-4o-mini)
- Combined with RAG context only (no Perplexity)
- Optimized for speed and cost-effectiveness
Frontier Mode response requirements:
- Length: 10,000-25,000 characters (10-25 pages)
- Language: Slovenian (always)
- Format: Slovenian legal structure
  - IZVRŠNI POVZETEK (Executive Summary)
  - KLJUČNE UGOTOVITVE (Key Findings)
  - PRAVNA PODLAGA (Legal Basis)
  - TRENUTNE TRŽNE RAZMERE (Current Market Conditions)
  - PRIPOROČILA (Recommendations)
  - VIRI (Sources)
- Sources: Minimum 5-10 sources with inline citations
- Citations: Embedded in text [1], [2], [3] with full list at end
- Quality: Comparable to Claude Sonnet 4.5 research mode with internet access
- Depth: Comprehensive analysis with real-world examples
- Specificity: Exact forms, procedures, calculations
- Actionability: Clear next steps and recommendations
Lightning Mode response requirements:
- Length: 2,000-5,000 characters (2-5 pages)
- Language: Slovenian (always)
- Format: Simplified structure
  - KRATEK ODGOVOR (Short Answer)
  - KLJUČNE TOČKE (Key Points)
  - VIRI (Sources)
- Sources: Minimum 1-3 essential sources
- Citations: Essential citations only
- Quality: Accurate, concise, actionable
- Depth: Focused on core information
- Specificity: Key facts and procedures
- Actionability: Clear immediate next steps
Endpoint: /api/v1/chat/progress/{conversation_id}
Progress States:
- `analyzing` - Query analysis in progress
- `reasoning` - Tool selection complete
- `researching` - Internet research in progress
- `searching` - RAG search in progress
- `generating` - Legal reasoning in progress
- `validating` - Quality validation in progress
- `complete` - Orchestration complete
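The progress states can be modelled as an ordered enum. In this sketch the state names come from the list above, while the `ProgressState` class and the evenly spaced percentage mapping are assumptions; the document does not say how the backend derives a percentage.

```python
from enum import Enum


class ProgressState(str, Enum):
    ANALYZING = "analyzing"
    REASONING = "reasoning"
    RESEARCHING = "researching"  # not reached when internet research is skipped
    SEARCHING = "searching"
    GENERATING = "generating"
    VALIDATING = "validating"
    COMPLETE = "complete"


# Enum members iterate in definition order, which matches the pipeline order.
ORDER = list(ProgressState)


def percent_complete(state: ProgressState) -> int:
    """Hypothetical mapping: spread the states evenly between 0 and 100."""
    return round(100 * ORDER.index(state) / (len(ORDER) - 1))
```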
Known Issue: Progress bar shows generic "100% complete, validating" instead of real-time tool execution updates. Needs frontend investigation.
Purpose: Show how answer was constructed and all sources used
Contents:
- Version (V7.0.0)
- Processing time
- Confidence score
- LLMs used (list of tools executed)
- Sources (count and list)
- Tool execution details:
  - Tool name
  - Success/failure
  - Response length
  - Sources found
  - Execution time
Storage: Included in message metadata
Display: Icon below response (frontend implementation needed)
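One possible shape for the report, serialised into message metadata, is sketched below. The field names are derived from the contents list above, but the class names and the `to_metadata` method are hypothetical; the actual metadata schema is not shown in this document.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ToolExecution:
    """Per-tool execution details (hypothetical field names)."""
    tool_name: str
    success: bool
    response_length: int
    sources_found: int
    execution_time_s: float


@dataclass
class OrchestrationReport:
    version: str
    processing_time_s: float
    confidence: float
    llms_used: list[str]
    sources: list[str]
    tool_executions: list[ToolExecution] = field(default_factory=list)

    def to_metadata(self) -> dict:
        """Flatten the report into a plain dict suitable for message metadata."""
        data = asdict(self)  # nested dataclasses become nested dicts
        data["source_count"] = len(self.sources)
        return data
```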
Problem: API keys decrypted twice, resulting in empty strings → 401 errors
Fix: Decrypt once in admin_settings_service.py, use directly in config_builder_v7.py
Files:
- `backend/app/services/orchestration/admin_settings_service.py` (lines 276-287)
- `backend/app/services/orchestration_v7/config_builder_v7.py` (lines 116-123, 137-147)
Problem: Generic progress instead of actual tool execution updates
Impact: User can't see what orchestration is doing during 5+ minute processing
Next Steps: Investigate SSE implementation in frontend
Problem: Report exists in backend but not displayed in UI
Impact: Users can't verify how answer was constructed
Next Steps: Add icon below response to show report details
- Admin: `produkcija97@gmail.com`
- User: `blocklabstech@gmail.com`
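Analiziraj trg Slovenije, preišči internet, nepremičninske oglase, preglej odprte nepremičninske forume in naredi podrobno analizo trga glede na cene gradnje in prodaje stanovanj za zadnje 3 leta (2023, 2024, 2025). Primerjaj 3 leta in naredi analizo?
(English: Analyze the Slovenian market, search the internet and real-estate listings, review open real-estate forums, and produce a detailed market analysis of construction costs and apartment sale prices for the last 3 years (2023, 2024, 2025). Compare the 3 years and produce an analysis.)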
- ✅ Claude API receives valid decrypted key
- ✅ Legal reasoning tool executes successfully
- ⏳ Response is 10,000-25,000 characters
- ⏳ Response is in Slovenian
- ⏳ Response includes 5+ sources with citations
- ⏳ Orchestration report shows correct tool usage
- ⏳ API usage visible in Anthropic/Perplexity dashboards
- System Overview: `docs/architecture/system-overview.md`
- Database Schema: `docs/architecture/database-schema.md`
- Admin Guide: `docs/admin/admin-guide.md`
- Handoff (Oct 9): `docs/handoff/2025-10-09-api-key-decryption-fix.md`
- Issue Tracking: `docs/issues/ORCHESTRATION_HANGING_FIX_2025-10-09.md`
End of AI Orchestration Architecture Documentation