
feat(sse): add adaptive volume/complexity detector for routing strategy override#794

Merged
diegosouzapw merged 1 commit into diegosouzapw:main from igormorais123:feat/adaptive-volume-routing
Mar 30, 2026

Conversation

@igormorais123

Summary

  • Add volumeDetector.ts module to open-sse/services/
  • Detect request characteristics: batch size, token estimate, tool count, browser signals
  • Recommend routing strategy overrides based on detected signals
  • Include complexity classification: trivial → low → medium → high → critical

Motivation

Current combo strategies are static — a batch of 500 classification items and a single critical deployment both use the same routing strategy. This module analyzes the request and recommends strategy overrides:

| Condition | Override | Reason |
|---|---|---|
| Batch >= 50 items | round-robin + economy | Distribute load, minimize cost |
| Medium batch + low risk | cost-optimized | Don't waste premium on bulk |
| Critical complexity | priority + premium-first | Needs the best model available |
| Browser/UI interaction | priority + premium-first | Computer use needs capability |
| Very short (<200 tokens) | Keep strategy, flag economy | Overkill to use premium |
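
The rule table can be read as an ordered chain of early returns. The following is a minimal sketch under stated assumptions, not the code from the PR: `sketchOverride`, the `Signals` and `Override` shapes, and the `reason` field are hypothetical, the two priority rows are merged into one branch, and every matching rule except the short-request one sets `shouldOverride: true`.

```typescript
// Hypothetical shapes: field names follow the review excerpts where available,
// the rest are assumptions made for this sketch.
type Complexity = "trivial" | "low" | "medium" | "high" | "critical";

interface Signals {
  batchSize: number;
  estimatedTokens: number;
  hasBrowser: boolean;
  complexity: Complexity;
}

interface Override {
  shouldOverride: boolean;
  strategy: string;
  preferEconomy?: boolean;
  forcePremium?: boolean;
  reason: string;
}

function sketchOverride(signals: Signals, currentStrategy: string): Override {
  // Large batch: spread load across economy models.
  if (signals.batchSize >= 50) {
    return { shouldOverride: true, strategy: "round-robin", preferEconomy: true, reason: "large batch" };
  }
  // Medium batch with low complexity: avoid spending premium capacity on bulk work.
  if (signals.batchSize >= 10 && signals.complexity === "low") {
    return { shouldOverride: true, strategy: "cost-optimized", preferEconomy: true, reason: "medium batch, low risk" };
  }
  // Critical complexity or browser/UI interaction: best available model first.
  if (signals.complexity === "critical" || signals.hasBrowser) {
    return { shouldOverride: true, strategy: "priority", forcePremium: true, reason: "critical or browser task" };
  }
  // Very short request: keep the current strategy and only flag economy.
  if (signals.estimatedTokens < 200) {
    return { shouldOverride: false, strategy: currentStrategy, preferEconomy: true, reason: "short request" };
  }
  return { shouldOverride: false, strategy: currentStrategy, reason: "no rule matched" };
}
```

Because the rules return early, their order decides which one wins when several conditions hold: in this sketch a 500-item batch that also mentions a browser is routed round-robin, not premium-first.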

Integration

Call from combo.ts before selecting a model:

import { detectVolumeSignals, recommendStrategyOverride } from './volumeDetector';

// In handleComboChat:
const signals = detectVolumeSignals(body);
const override = recommendStrategyOverride(signals, combo.strategy);
const effectiveStrategy = override.shouldOverride ? override.strategy : combo.strategy;

Test plan

  • Detects simple single-message requests as trivial
  • Detects tool-heavy requests as critical complexity
  • Detects browser keywords in messages
  • Detects batch from multi-part content
  • Recommends round-robin for large batches
  • Recommends premium-first for browser tasks
  • Flags economy for tiny requests without changing strategy
  • No override for normal medium requests

Relates to #789

🤖 Generated with Claude Code

…gy override

Add volumeDetector module that analyzes request characteristics (batch
size, token count, tool usage, browser signals, complexity keywords)
and recommends routing strategy overrides.

Rules:
- Batch >= 50 items → round-robin with economy models
- Critical complexity (many tools, browser, deploy) → priority premium-first
- Browser/UI interaction → force premium priority
- Short requests (<200 tokens) → flag for economy tier

Closes diegosouzapw#789

// Estimate tokens from serialized message size
const serialized = JSON.stringify(messages);
const estimatedTokens = Math.ceil(serialized.length / 4); // rough: 4 chars ≈ 1 token

WARNING: Token estimation is overly simplistic. Using serialized.length / 4 is a rough approximation that can be significantly off for:

  • Code content (often fewer characters per token than prose)
  • Non-English text
  • Special characters and formatting

This could lead to incorrect routing decisions for requests that fall near complexity thresholds. Consider using a proper tokenizer library, or at least documenting the worst-case assumptions.
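
One low-cost way to act on this without adding a tokenizer dependency is to carry a pessimistic bound next to the 4-characters-per-token estimate and let threshold checks use the bound. This is a sketch only: the worst-case ratio of 2 is an illustrative assumption, not a measured value, and `estimateTokenRange` and `isClearlyShort` are hypothetical names.

```typescript
// Assumed ratios for illustration only; real tokenizers vary by model and content.
const CHARS_PER_TOKEN_TYPICAL = 4; // the heuristic used in the PR
const CHARS_PER_TOKEN_WORST = 2;   // hypothetical pessimistic bound for code or non-English text

interface TokenEstimate {
  typical: number;
  worstCase: number;
}

// Returns both the usual estimate and a documented worst-case estimate.
function estimateTokenRange(text: string): TokenEstimate {
  return {
    typical: Math.ceil(text.length / CHARS_PER_TOKEN_TYPICAL),
    worstCase: Math.ceil(text.length / CHARS_PER_TOKEN_WORST),
  };
}

// A request counts as "short" only if even the pessimistic estimate is under the threshold.
function isClearlyShort(text: string, threshold = 200): boolean {
  return estimateTokenRange(text).worstCase < threshold;
}
```

With this shape, a request whose typical estimate is below 200 tokens but whose worst case is not would keep its normal routing instead of being flagged for the economy tier.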

const hasBrowser = BROWSER_KEYWORDS.some((kw) => lowerSerialized.includes(kw));

// Detect image content
const hasImages =

WARNING: Image detection via string matching can produce false positives:

  • "base64" will match any message containing the word "base64" (e.g., "The system uses base64 encoding")
  • "screenshot" will match text discussing screenshots without actual image data
  • "image_url" could match documentation or variable names

This may cause hasImages to be true when no actual image content is present, potentially routing to premium models unnecessarily.
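
A structural check avoids these false positives by looking at content-part types instead of searching the serialized text. The sketch below assumes messages carry an array of typed content parts; the set of part types treated as images (`image_url`, `image`, `input_image`) is an assumption about the providers in play, and `hasImageParts` is a hypothetical helper, not part of the PR.

```typescript
// Part types treated as images; this list is an assumption for the sketch.
const IMAGE_PART_TYPES = new Set(["image_url", "image", "input_image"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when a content part is an object whose `type` names an image part.
function isImagePart(part: unknown): boolean {
  if (!isRecord(part)) return false;
  const partType = part["type"];
  return typeof partType === "string" && IMAGE_PART_TYPES.has(partType);
}

// True only when some message carries a structured image part.
// Plain-string content never counts, whatever words it contains.
function hasImageParts(messages: unknown[]): boolean {
  return messages.some((message) => {
    if (!isRecord(message)) return false;
    const content = message["content"];
    return Array.isArray(content) && content.some(isImagePart);
  });
}
```

A message such as "The system uses base64 encoding" is plain string content, so it no longer sets the flag.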

* @returns Extracted signals
*/
export function detectVolumeSignals(body: Record<string, unknown>): VolumeSignals {
const messages = (body.messages || body.input || []) as unknown[];

SUGGESTION: Type safety issue - body.messages is cast to unknown[] without validation. If the API passes a string (e.g., from some providers that use a string for single messages), the subsequent array operations on messages will fail. Consider adding runtime type validation.
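
A small normalization step is one way to add that validation. This is a hedged sketch: `normalizeMessages` is a hypothetical helper, and wrapping a bare string as a single user message is an assumption about the desired behavior.

```typescript
// Normalizes the `messages` / `input` field into an array without trusting its shape.
function normalizeMessages(body: Record<string, unknown>): unknown[] {
  // Same fallback order as the excerpt above: `messages` first, then `input`.
  const raw = body["messages"] || body["input"];
  if (Array.isArray(raw)) return raw;
  // A non-empty bare string is treated as one user message (assumption of this sketch).
  if (typeof raw === "string" && raw.length > 0) return [{ role: "user", content: raw }];
  // Anything else (undefined, null, numbers, plain objects) yields no messages.
  return [];
}
```

Downstream array operations such as `.length`, `.some`, or `.map` are then safe regardless of what the client sent.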

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a volume and complexity detector for adaptive routing. It analyzes request characteristics such as batch size, token estimates, and tool usage to recommend strategy overrides. The feedback suggests two changes: tighten the test assertions so that specific complexity levels are verified, and make the override signaling consistent so that flags such as economy or premium preferences are always applied, even when the strategy name doesn't change.

Comment on lines +63 to +66
assert.ok(
signals.complexity === "critical" || signals.complexity === "high",
`expected critical or high, got ${signals.complexity}`
);

medium

The assertion here is too lenient. Given the input "Refactor the authentication module for production", the HIGH_COMPLEXITY_KEYWORDS entries auth (from authentication) and production will be matched. According to the logic in detectVolumeSignals, this should always result in a critical complexity.

The current check for either critical or high could mask future regressions. The test should be more specific to enforce the expected behavior.

      assert.equal(signals.complexity, "critical", `expected critical, got ${signals.complexity}`);

// Rule 2: Medium batch with low complexity → cost-optimized
if (signals.batchSize >= 10 && signals.complexity === "low") {
return {
shouldOverride: currentStrategy !== "cost-optimized",

medium

The logic for shouldOverride here and in Rule 4 (line 192) is inconsistent with Rule 1 and Rule 3. Rules 1 and 3 always set shouldOverride: true, creating a clear signal to the caller to adopt the new configuration. This rule, however, sets shouldOverride to false if the strategy name already matches.

This can lead to bugs. For example, if the current strategy is cost-optimized but doesn't have preferEconomy: true, this rule will return shouldOverride: false, and the caller might miss applying the preferEconomy flag.

For a cleaner and more robust API contract, I recommend setting shouldOverride: true whenever a rule match recommends a specific configuration. This simplifies the consumer's logic: if shouldOverride is true, apply the new strategy and its associated flags.

Suggested change
- shouldOverride: currentStrategy !== "cost-optimized",
+ shouldOverride: true,
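
The failure mode is easiest to see from the caller's side. The sketch below assumes a caller that merges the recommendation only when `shouldOverride` is true; `RoutingConfig` and `applyOverride` are hypothetical names, and the flag names are taken from this review thread.

```typescript
interface OverrideResult {
  shouldOverride: boolean;
  strategy: string;
  preferEconomy?: boolean;
  forcePremium?: boolean;
}

// Hypothetical caller-side routing state.
interface RoutingConfig {
  strategy: string;
  preferEconomy: boolean;
  forcePremium: boolean;
}

// Caller-side merge: the strategy and its flags are applied only when shouldOverride is true.
function applyOverride(current: RoutingConfig, override: OverrideResult): RoutingConfig {
  if (!override.shouldOverride) return current;
  return {
    strategy: override.strategy,
    preferEconomy: override.preferEconomy ?? current.preferEconomy,
    forcePremium: override.forcePremium ?? current.forcePremium,
  };
}

const current: RoutingConfig = { strategy: "cost-optimized", preferEconomy: false, forcePremium: false };

// Strategy name already matches, so the name-comparison form yields shouldOverride: false
// and the preferEconomy flag is dropped.
const dropped = applyOverride(current, { shouldOverride: false, strategy: "cost-optimized", preferEconomy: true });

// With shouldOverride: true the flag reaches the routing config.
const applied = applyOverride(current, { shouldOverride: true, strategy: "cost-optimized", preferEconomy: true });
```

Here `dropped.preferEconomy` stays `false` while `applied.preferEconomy` becomes `true`, which is the difference the suggested change is meant to remove.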

// Rule 4: Browser/UI interaction → force priority with premium
if (signals.hasBrowser) {
return {
shouldOverride: currentStrategy !== "priority",

medium

Similar to the comment on Rule 2 (line 170), setting shouldOverride: true here would make the behavior consistent with Rule 3 (critical complexity), which also forces a priority override. This ensures that important flags like forcePremium: true are always applied by the caller when this rule is triggered, creating a more robust and predictable API.

Suggested change
- shouldOverride: currentStrategy !== "priority",
+ shouldOverride: true,

@kilo-code-bot

kilo-code-bot bot commented Mar 30, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 3 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
|---|---|---|
| open-sse/services/volumeDetector.ts | 98 | Token estimation is overly simplistic and could lead to incorrect routing decisions |
| open-sse/services/volumeDetector.ts | 105 | Image detection via string matching can produce false positives |
| open-sse/services/volumeDetector.ts | 79 | Type safety issue - body.messages cast without validation could fail at runtime |

Files Reviewed (2 files)
  • open-sse/services/volumeDetector.ts - 3 issues
  • open-sse/services/__tests__/volumeDetector.test.ts - 0 issues

Positive observations:

  • Clear JSDoc documentation throughout
  • Well-structured interfaces with proper TypeScript types
  • Comprehensive test coverage with edge cases
  • Rules are well-ordered with early returns
  • Keyword lists are maintainable and documented

Note: The test file could benefit from edge case tests (e.g., empty arrays, null values in body) to ensure robustness, but this is not blocking.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a427d215e3

Comment on lines +101 to +102
const lowerSerialized = serialized.toLowerCase();
const hasBrowser = BROWSER_KEYWORDS.some((kw) => lowerSerialized.includes(kw));

P1: Include tools when deriving browser interaction signals

hasBrowser is computed only from messages text, so requests that express browser work via structured tools (e.g. computer_use) but neutral prompt text will be misclassified as non-browser. In that case a single-tool request lands at medium complexity and recommendStrategyOverride will not force priority/forcePremium, which is a routing regression for browser-automation workloads this detector is meant to catch.
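
One possible shape for that fix is to inspect the structured tool definitions alongside the message text. This sketch assumes tools arrive as an array of objects carrying a `type`, a `name`, or a nested `function.name`; the hint list and the helper names (`BROWSER_TOOL_HINTS`, `toolIdentifiers`, `hasBrowserTool`) are hypothetical.

```typescript
// Hypothetical identifiers that imply browser or computer-use work.
const BROWSER_TOOL_HINTS = ["computer_use", "browser", "playwright", "puppeteer"];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Collects lowercase candidate names from a tool definition: `type`, `name`, and `function.name`.
function toolIdentifiers(tool: unknown): string[] {
  if (!isPlainObject(tool)) return [];
  const fn = tool["function"];
  const candidates: unknown[] = [tool["type"], tool["name"], isPlainObject(fn) ? fn["name"] : undefined];
  return candidates
    .filter((c): c is string => typeof c === "string")
    .map((c) => c.toLowerCase());
}

// True when any tool identifier contains one of the browser hints.
function hasBrowserTool(tools: unknown): boolean {
  if (!Array.isArray(tools)) return false;
  return tools.some((tool) =>
    toolIdentifiers(tool).some((id) => BROWSER_TOOL_HINTS.some((hint) => id.includes(hint)))
  );
}
```

The detector could then OR this result with the keyword-based check, so a neutral prompt paired with a `computer_use` tool would still be flagged as browser work.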

Comment on lines +112 to +116
const hasHighKeywords = HIGH_COMPLEXITY_KEYWORDS.some((kw) => lowerSerialized.includes(kw));
let complexity: VolumeSignals["complexity"];

if (toolCount > 3 || (hasBrowser && toolCount > 1) || hasHighKeywords) {
complexity = "critical";

P2: Match complexity keywords on word boundaries

High-complexity detection uses raw substring checks, and the keyword list contains "auth"; this makes benign prompts containing words like "author" evaluate as high-risk. That path sets complexity to critical, which then forces premium-first routing in Rule 3 and can unnecessarily raise cost/latency for ordinary requests. Use token/word-boundary matching (or less ambiguous keywords) to avoid these false positives.
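
Word-boundary matching can be sketched with a single compiled regular expression. The keyword list below is illustrative only ("auth" and "production" are named in this review, "deploy" in the commit message); `escapeRegExp` and `buildKeywordPattern` are hypothetical helpers.

```typescript
// Escapes regex metacharacters so keywords are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds one case-insensitive pattern that matches any keyword as a whole word.
function buildKeywordPattern(keywords: string[]): RegExp {
  const alternation = keywords.map(escapeRegExp).join("|");
  return new RegExp(`\\b(?:${alternation})\\b`, "i");
}

// Illustrative keyword list, not the actual HIGH_COMPLEXITY_KEYWORDS.
const highComplexityPattern = buildKeywordPattern(["auth", "production", "deploy"]);

const matchesAuthor = highComplexityPattern.test("Who is the author of this book?");
const matchesAuthFlow = highComplexityPattern.test("Fix the auth flow before release");
```

One consequence to keep in mind: with whole-word matching, "auth" no longer matches "authentication", so full word forms would have to be listed explicitly if they should still count.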

@diegosouzapw
Owner

Hello, Professor Igor! Fantastic contribution introducing the volume router! 📈⚙️

Adaptive, content-aware request routing is one of OmniRoute's major differentiators. The way your code analyzes "batchSize", browser-flag detection, and text volume to make decisions such as "skip non-essential tools" or "use a more cost-effective service" is extremely elegant! Congratulations as well on the thorough tests in the included test file.

There are, however, mandatory adjustments and integration points to address before we can consolidate it into the main branch:

The Disconnected Wire (Final Integration Is Missing)

Your code built a state-of-the-art lock, but it was never fitted to the door! The volumeDetector.ts module exists, and in the PR description you even illustrated a snippet with a // In handleComboChat... comment, but if you look at the submitted change, open-sse/executors/combo.ts (or the corresponding chain that dispatches requests in the Proxy) was not modified. Our application still does not know that this fine module exists.

Your mission to complete this PR:

  1. Return to your working environment on your feat/adaptive-volume-routing branch.
  2. Go to the main handling logic (the place in the executor where we choose the final Strategy to follow for the API in the payload body).
  3. Import and call your detectVolumeSignals() there, evaluate the result with your recommendStrategyOverride(), and actually change the model_id or the strategy that will be used.
  4. Add your files, run git commit, and then a new git push. The update will be reflected here automatically.

We are eagerly awaiting this fix, since it will drastically cut cloud costs on batch requests and make the routing feel like magic! Very good work, and stay focused on this next step! 🚀

@diegosouzapw diegosouzapw merged commit 59e48ca into diegosouzapw:main Mar 30, 2026
1 check passed
2 participants