Intelligent LLM orchestrator for SAP AI Core with automatic model routing, multi-turn conversations, and rule-based configuration.
✅ Multi-Provider Support
- OpenAI-compatible models via proxy (GPT-4o, GPT-4.1, GPT-4o-mini)
- Claude models via native Bedrock Converse API (Claude 4.5 Opus, Claude 4.5 Sonnet, Claude 4.5 Haiku)
- Extensible architecture for additional providers
✅ Intelligent Auto-Routing
- Automatic model selection based on prompt complexity and task type
- Cost optimization with the `--prefer-cheap` flag
- Rule-based routing from JSON configuration
- Routing explanation with confidence scores
✅ Session Management
- Multi-turn conversations with automatic context preservation
- Session persistence to disk
- Export sessions to markdown format
- Session listing and deletion
✅ JSON-Based Configuration
- Define model capabilities and cost tiers
- Create routing rules with priorities
- Keyword-based and complexity-based matching
- Hot-reloadable configuration
For macOS/Linux users with access to the ausardcompany private tap:
# Add the private tap (requires GitHub authentication)
brew tap ausardcompany/tap git@github.com:ausardcompany/homebrew-tap.git
# Install
brew install alexi
# Use the CLI
alexi chat -m "Hello!"
To install from source instead:
git clone git@github.com:ausardcompany/alexi.git
cd alexi
npm install
npm run build
npm install
Create a .env file (see .env.example):
# Proxy configuration (for OpenAI-compatible models)
SAP_PROXY_BASE_URL=http://127.0.0.1:3001/v1
SAP_PROXY_API_KEY=your_secret_key
SAP_PROXY_MODEL=gpt-4o
# Native SAP AI Core (for Claude models)
AICORE_SERVICE_KEY='{"clientid":"...","clientsecret":"...","url":"...","serviceurls":{"AI_API_URL":"..."}}'
AICORE_RESOURCE_GROUP=your-resource-group-id
npm run build
# Simple chat
alexi chat -m "What is 2+2?"
# Auto-routing with cost optimization
alexi chat -m "Write a function to reverse a string" --auto-route --prefer-cheap
# Continue a conversation
alexi chat -m "Now make it recursive" --session <session-id> --auto-route
# Explain routing decision
alexi explain -m "Prove that sqrt(2) is irrational"
The following models are available (configured in routing-config.json):
| Model ID | Type | Cost Tier | Reasoning | Max Tokens | Strengths |
|---|---|---|---|---|---|
| `gpt-4o-mini` | OpenAI | cheap | ❌ | 16,000 | simple-qa, classification, extraction, summarization |
| `gpt-4o` | OpenAI | medium | ❌ | 128,000 | coding, analysis, creative-writing, complex-qa, vision |
| `gpt-4.1` | OpenAI | expensive | ✅ | 128,000 | deep-reasoning, complex-math, research, advanced-coding |
| `anthropic--claude-4.5-haiku` | Claude | cheap | ❌ | 200,000 | simple-qa, classification, extraction, summarization |
| `anthropic--claude-4.5-sonnet` | Claude | medium | ❌ | 200,000 | coding, analysis, long-context, technical-writing |
| `anthropic--claude-4.5-opus` | Claude | expensive | ✅ | 200,000 | deep-reasoning, complex-analysis, long-context, research |
alexi chat -m "your message" [options]
Options:
-m, --message <text> Message to send (required)
--model <id> Override model selection (e.g., gpt-4o, anthropic--claude-4.5-sonnet)
--auto-route Enable automatic model routing
--prefer-cheap Prefer cheaper models when auto-routing
--session <id> Continue existing session
--system <prompt> System prompt for conversation
Examples:
# Use specific model
alexi chat -m "Hello" --model gpt-4o-mini
# Auto-route with cost optimization
alexi chat -m "What is AI?" --auto-route --prefer-cheap
# Continue conversation in session
alexi chat -m "Tell me more" --session abc-123 --auto-route
alexi explain -m "your message"
Shows:
- Prompt classification (type, complexity, reasoning requirements)
- Matched routing rules
- Model candidates with scores
- Selected model and confidence
Example output:
=== Prompt Analysis ===
Type: deep-reasoning
Complexity: complex
Requires Reasoning: true
Estimated Tokens: 19
=== Matched Rules ===
• reasoning-for-math (priority: 80): Use reasoning models for math problems
=== Model Candidates (by score) ===
✓ gpt-4.1 Score: 120 - expensive tier, strong at deep-reasoning, has reasoning
anthropic--claude-4.5-opus Score: 120 - expensive tier, strong at deep-reasoning, has reasoning
...
=== Selected Model ===
Model: gpt-4.1
Reason: Task type: deep-reasoning, Complexity: complex, requires reasoning
Confidence: 100%
Rule Applied: reasoning-for-math
alexi agent -m "your task" [options]
Options:
-m, --message <text> Task description for the agent (required)
--model <id> Model to use for the agent
--max-iterations <n> Maximum number of agent iterations (default: 10)
The agent command runs an autonomous AI that can plan and execute multi-step tasks.
alexi stages [options]
Options:
--list List all available stages
--run <stage> Run a specific stage
--config <file> Path to stages configuration file
alexi notes [options]
Options:
--add <note> Add a note to the current session
--list List all notes
--clear Clear all notes
--session <id> Specify session for notes
alexi dod [options]
Options:
--check Check if current task meets definition of done
--set <criteria> Set definition of done criteria
--list List current DoD criteria
alexi context [options]
Options:
--add <file> Add file content to context
--clear Clear current context
--show Display current context
--limit <tokens> Set context token limit
alexi sessions
alexi session-export -s <session-id> [-o output.md]
alexi session-delete -s <session-id>
alexi models
Start interactive mode:
alexi interactive
# or
alexi -i
Once in interactive mode, the following commands are available:
| Command | Description |
|---|---|
| `/help` | Show available commands |
| `/model <id>` | Switch to a different model |
| `/models` | List available models |
| `/session` | Show current session info |
| `/sessions` | List all sessions |
| `/new` | Start a new session |
| `/load <id>` | Load an existing session |
| `/export [file]` | Export current session to markdown |
| `/clear` | Clear conversation history |
| `/system <prompt>` | Set system prompt |
| `/auto` | Toggle auto-routing |
| `/cheap` | Toggle prefer-cheap mode |
| `/context add <file>` | Add file to context |
| `/context clear` | Clear context |
| `/context show` | Show current context |
| `/notes add <note>` | Add a note |
| `/notes list` | List all notes |
| `/notes clear` | Clear all notes |
| `/quit` or `/exit` | Exit interactive mode |
Create a routing-config.json in the project root (see routing-config.example.json):
{
"models": [
{
"id": "gpt-4o-mini",
"type": "openai",
"costTier": "cheap",
"strengths": ["simple-qa", "classification", "extraction"],
"maxTokens": 16000,
"reasoning": false,
"enabled": true
}
],
"rules": [
{
"name": "force-claude-for-long-context",
"description": "Use Claude for prompts longer than 10000 characters",
"condition": {
"minLength": 10000
},
"modelId": "anthropic--claude-4.5-sonnet",
"priority": 100
},
{
"name": "reasoning-for-math",
"description": "Use reasoning models for math problems",
"condition": {
"keywords": ["prove", "derive", "equation", "theorem"]
},
"requiresReasoning": true,
"priority": 80
}
],
"preferences": {
"defaultCostTier": "medium",
"preferCheapWhenPossible": false,
"fallbackModel": "gpt-4o"
}
}
Rule conditions support the following fields:
- `minLength` / `maxLength`: Character count constraints
- `taskTypes`: Match specific task classifications (e.g., `["simple-qa", "coding"]`)
- `maxComplexity`: Maximum allowed complexity (`"simple"`, `"medium"`, `"complex"`)
- `keywords`: List of keywords to match in the prompt (case-insensitive)
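As a rough illustration of how these condition fields could be evaluated, the sketch below checks a prompt against a single rule condition. The type names, the `matchesCondition` helper, and the choice to combine fields with AND (while any single keyword is enough) are assumptions made for this example, not the project's actual implementation.

```typescript
// Hypothetical types mirroring the "condition" fields listed above.
type Complexity = "simple" | "medium" | "complex";

interface RuleCondition {
  minLength?: number;
  maxLength?: number;
  taskTypes?: string[];
  maxComplexity?: Complexity;
  keywords?: string[];
}

interface PromptInfo {
  text: string;
  taskType: string;
  complexity: Complexity;
}

// Order used to compare complexity levels.
const COMPLEXITY_ORDER: Complexity[] = ["simple", "medium", "complex"];

// Returns true when every field present in the condition is satisfied.
// Assumption: fields combine with AND, and any one keyword is enough.
function matchesCondition(cond: RuleCondition, info: PromptInfo): boolean {
  const length = info.text.length;
  if (cond.minLength !== undefined && length < cond.minLength) return false;
  if (cond.maxLength !== undefined && length > cond.maxLength) return false;
  if (cond.taskTypes && cond.taskTypes.indexOf(info.taskType) === -1) return false;
  if (
    cond.maxComplexity !== undefined &&
    COMPLEXITY_ORDER.indexOf(info.complexity) >
      COMPLEXITY_ORDER.indexOf(cond.maxComplexity)
  ) {
    return false;
  }
  if (cond.keywords && cond.keywords.length > 0) {
    const lower = info.text.toLowerCase();
    const hit = cond.keywords.some((k) => lower.indexOf(k.toLowerCase()) !== -1);
    if (!hit) return false;
  }
  return true;
}

const mathCondition: RuleCondition = {
  keywords: ["prove", "derive", "equation", "theorem"],
};
const samplePrompt: PromptInfo = {
  text: "Prove that sqrt(2) is irrational",
  taskType: "deep-reasoning",
  complexity: "complex",
};

console.log(matchesCondition(mathCondition, samplePrompt)); // true
console.log(matchesCondition({ maxComplexity: "medium" }, samplePrompt)); // false
```

An empty condition matches every prompt in this sketch, which may or may not be how the real router treats it.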
The orchestrator automatically selects the appropriate provider based on model ID:
- GPT models → OpenAI-compatible proxy (`/v1/chat/completions`)
- Claude models → Native Bedrock Converse API (`/converse`)
- Anthropic models → Anthropic Messages API (`/v1/messages`)
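The mapping above could be sketched as a simple prefix check on the model ID. The `resolveProvider` function and the `ProviderRoute` type are hypothetical, and the prefixes (`gpt-`, `anthropic--claude-`) are assumed from the model IDs in the models table. The Anthropic Messages route is left out because this README does not say which model IDs map to it.

```typescript
// Provider kinds and endpoint paths taken from the list above.
type ProviderKind = "openai-proxy" | "bedrock-converse";

interface ProviderRoute {
  kind: ProviderKind;
  path: string;
}

// Assumption: model IDs follow the prefixes seen in the models table;
// the real dispatch logic may differ.
function resolveProvider(modelId: string): ProviderRoute {
  if (modelId.indexOf("gpt-") === 0) {
    return { kind: "openai-proxy", path: "/v1/chat/completions" };
  }
  if (modelId.indexOf("anthropic--claude-") === 0) {
    return { kind: "bedrock-converse", path: "/converse" };
  }
  throw new Error(`No provider known for model "${modelId}"`);
}

console.log(resolveProvider("gpt-4o-mini").path); // /v1/chat/completions
console.log(resolveProvider("anthropic--claude-4.5-sonnet").path); // /converse
```

Throwing on an unknown ID keeps a typo in `--model` from silently falling through to the wrong endpoint.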
- Check for a forced model via the `--model` flag; if one is set, use it
- If `--auto-route` is enabled:
  - Classify the prompt (task type, complexity, reasoning needs)
  - Match against routing rules (highest priority wins)
  - Score models based on capabilities and cost
  - Select the best model with a confidence score
- Otherwise use the default model from the environment
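The decision order above might look roughly like the following. This is a minimal sketch: the types, the `selectModel` and `scoreModel` names, and the scoring weights are all hypothetical and are not intended to reproduce the scores shown by `alexi explain`. Prompt classification, rule matching, and the confidence score are assumed to happen elsewhere.

```typescript
type CostTier = "cheap" | "medium" | "expensive";

interface ModelInfo {
  id: string;
  costTier: CostTier;
  strengths: string[];
  reasoning: boolean;
}

interface MatchedRule {
  name: string;
  priority: number;
  modelId?: string; // rule pins a specific model
  requiresReasoning?: boolean; // rule restricts candidates
}

interface RouteRequest {
  forcedModel?: string; // value of --model
  autoRoute: boolean; // --auto-route
  preferCheap: boolean; // --prefer-cheap
  taskType: string; // result of prompt classification
  matchedRules: MatchedRule[]; // rules whose condition already matched
  defaultModel: string; // default model from the environment
}

// Hypothetical weights: reward a matching strength, and reward
// cheaper tiers only when --prefer-cheap is set.
function scoreModel(model: ModelInfo, req: RouteRequest): number {
  let score = 0;
  if (model.strengths.indexOf(req.taskType) !== -1) score += 50;
  if (req.preferCheap) {
    if (model.costTier === "cheap") score += 30;
    else if (model.costTier === "medium") score += 10;
  }
  return score;
}

function selectModel(models: ModelInfo[], req: RouteRequest): string {
  if (req.forcedModel) return req.forcedModel; // forced model wins
  if (!req.autoRoute) return req.defaultModel; // no routing requested

  // Highest-priority matched rule wins.
  const rules = req.matchedRules.slice().sort((a, b) => b.priority - a.priority);
  const rule: MatchedRule | undefined = rules[0];
  if (rule && rule.modelId) return rule.modelId;

  const candidates =
    rule && rule.requiresReasoning ? models.filter((m) => m.reasoning) : models;
  const ranked = candidates
    .slice()
    .sort((a, b) => scoreModel(b, req) - scoreModel(a, req));
  const best: ModelInfo | undefined = ranked[0];
  return best ? best.id : req.defaultModel;
}

const sampleModels: ModelInfo[] = [
  { id: "gpt-4o-mini", costTier: "cheap", strengths: ["simple-qa"], reasoning: false },
  { id: "gpt-4o", costTier: "medium", strengths: ["coding"], reasoning: false },
  { id: "gpt-4.1", costTier: "expensive", strengths: ["deep-reasoning"], reasoning: true },
];

console.log(
  selectModel(sampleModels, {
    autoRoute: true,
    preferCheap: false,
    taskType: "deep-reasoning",
    matchedRules: [{ name: "reasoning-for-math", priority: 80, requiresReasoning: true }],
    defaultModel: "gpt-4o",
  })
); // gpt-4.1
```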
- Sessions stored in `~/.alexi/sessions/`
- Auto-generated session IDs (UUID)
- Conversation history preserved with token tracking
- Automatic title generation from first message
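To make the session behavior above more concrete, here is a sketch of one possible in-memory shape for a session, with per-turn token tracking and a title derived from the first message. Everything in it is an assumption for illustration: the field names, the 4-characters-per-token estimate, the 40-character title limit, and the `<id>.json` file name are not taken from the project. Writing the serialized session to disk is left out.

```typescript
interface Turn {
  role: "user" | "assistant" | "system";
  content: string;
  tokens: number;
}

interface ChatSession {
  id: string;
  title: string;
  turns: Turn[];
  totalTokens: number;
}

// Rough estimate (about 4 characters per token); an assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Derive a one-line title from the first message, truncated with "...".
function makeTitle(firstMessage: string, maxLength = 40): string {
  const oneLine = firstMessage.replace(/\s+/g, " ").trim();
  return oneLine.length <= maxLength
    ? oneLine
    : oneLine.slice(0, maxLength - 3) + "...";
}

// The caller supplies the ID (for example a freshly generated UUID).
function startSession(id: string, firstMessage: string): ChatSession {
  const tokens = estimateTokens(firstMessage);
  return {
    id: id,
    title: makeTitle(firstMessage),
    turns: [{ role: "user", content: firstMessage, tokens: tokens }],
    totalTokens: tokens,
  };
}

// Returns a new session object with the turn appended and totals updated.
function addTurn(session: ChatSession, role: Turn["role"], content: string): ChatSession {
  const tokens = estimateTokens(content);
  return {
    id: session.id,
    title: session.title,
    turns: session.turns.concat([{ role: role, content: content, tokens: tokens }]),
    totalTokens: session.totalTokens + tokens,
  };
}

// Hypothetical file layout under the sessions directory.
function sessionFilePath(homeDir: string, id: string): string {
  return `${homeDir}/.alexi/sessions/${id}.json`;
}

let demoSession = startSession("abc-123", "What is 2+2?");
demoSession = addTurn(demoSession, "assistant", "4");
console.log(demoSession.title); // What is 2+2?
console.log(demoSession.totalTokens); // 4
console.log(sessionFilePath("/home/me", demoSession.id)); // /home/me/.alexi/sessions/abc-123.json
```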
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run in dev mode with tsx
npm run dev -- chat -m "test"
# Watch mode for development
npm run dev:watch
This bot automatically updates itself by syncing with three upstream AI coding assistant repositories:
| Repository | Description | Source |
|---|---|---|
| kilocode | Kilo AI coding assistant | Kilo-Org/kilocode |
| opencode | OpenCode AI terminal assistant | anomalyco/opencode |
| claude-code | Anthropic's Claude Code CLI | anthropics/claude-code |
The bot runs fully autonomously via GitHub Actions:
┌─────────────────────────────────────────────────────────────────┐
│ GitHub Actions (Daily 06:00 UTC) │
├─────────────────────────────────────────────────────────────────┤
│ 1. Fetch upstream repos (kilocode, opencode, claude-code) │
│ 2. Compare with last synced commits │
│ 3. Generate diff reports │
│ 4. Kilo AI analyzes changes & updates code │
│ 5. Create PR with changes │
│ 6. Auto-merge PR (squash) │
│ 7. Update sync state │
└─────────────────────────────────────────────────────────────────┘
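Step 2 of the workflow (comparing upstream heads with the last synced commits) could be sketched as below. The state shape (repository name mapped to a commit SHA) is an assumption and is not necessarily the actual format of `.github/last-sync-commits.json`. The `findChangedRepos` helper and its `forceSync` parameter are hypothetical, with the latter mirroring the manual `force_sync` option.

```typescript
// Hypothetical shape: repository name -> commit SHA.
type SyncState = { [repo: string]: string | undefined };

interface RepoChange {
  repo: string;
  from: string | null; // null when the repo has never been synced
  to: string;
}

// Lists repositories whose latest commit differs from the last synced one.
// With forceSync set, every repository is reported even if unchanged.
function findChangedRepos(
  lastSynced: SyncState,
  latest: SyncState,
  forceSync = false
): RepoChange[] {
  const changes: RepoChange[] = [];
  for (const repo of Object.keys(latest)) {
    const to = latest[repo];
    if (to === undefined) continue;
    const from = lastSynced[repo] ?? null;
    if (forceSync || from !== to) {
      changes.push({ repo: repo, from: from, to: to });
    }
  }
  return changes;
}

const lastSyncedDemo: SyncState = { kilocode: "a1", opencode: "b2" };
const latestDemo: SyncState = { kilocode: "a1", opencode: "b3", "claude-code": "c1" };

// Reports opencode (b2 -> b3) and claude-code (never synced -> c1).
console.log(findChangedRepos(lastSyncedDemo, latestDemo));
```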
- Automatic: Daily at 06:00 UTC
- Manual: Via GitHub Actions UI with options:
  - `dry_run`: Only analyze, don't apply changes
  - `force_sync`: Sync even if no changes detected
| Secret | Description |
|---|---|
| `AICORE_SERVICE_KEY` | Full SAP AI Core service key JSON |
| `AICORE_RESOURCE_GROUP` | SAP AI Core resource group ID |
| `GH_PAT` | Personal access token for PR creation and merge |
- Workflow file: `.github/workflows/sync-upstream.yml`
- Sync state: `.github/last-sync-commits.json`
- Secrets setup: See `SYNC_SECRETS_SETUP.md`
# Dry run - analyze without applying
./scripts/sync-upstream.sh --dry-run --verbose
# Full sync with auto-apply
./scripts/sync-upstream.sh --yes
Kilo AI analyzes upstream changes and:
- Identifies relevant bug fixes and security updates
- Extracts useful new features
- Adapts code to maintain SAP AI Core compatibility
- Creates detailed change summaries
Test different scenarios:
# Simple query (should use gpt-4o-mini)
alexi explain -m "What is the capital of France?"
# Coding task (should use gpt-4o or anthropic--claude-4.5-sonnet)
alexi explain -m "Write a function to sort an array"
# Complex reasoning (should use gpt-4.1 or anthropic--claude-4.5-opus)
alexi explain -m "Prove the Pythagorean theorem step by step"
# Long context (should use Claude if rule enabled)
alexi explain -m "$(cat very_long_document.txt)"
- Streaming support for long responses
- Interactive CLI mode (REPL)
- Function/tool calling support with streaming
- Content filtering (Azure, Llama Guard)
- Data masking (DPI)
- Document grounding
- Translation support
- Embeddings support
- Cost tracking and budget limits
- Token usage analytics
- Channel integrations (Telegram, Slack, WebChat)
- Caching layer for repeated queries
- A/B testing for routing strategies
- Performance metrics and logging
MIT