feat(sse): add provider diversity scoring via Shannon entropy #793

Merged
diegosouzapw merged 1 commit into diegosouzapw:main from igormorais123:feat/provider-diversity-scoring
Mar 30, 2026

@igormorais123

Summary

  • Add providerDiversity.ts module to open-sse/services/autoCombo/
  • Track provider usage distribution using a rolling time window
  • Calculate Shannon entropy normalized to [0..1] for diversity measurement
  • Provide per-provider diversity boost for auto-combo integration
  • Include getDiversityReport() for dashboard display

Motivation

A system routing 90% of traffic to one provider has catastrophic single-point-of-failure risk. If that provider goes down, 90% of requests fail simultaneously. This module provides a measurable diversity score that can be integrated into the auto-combo scoring engine as an 8th factor.

How it works

Shannon Entropy measures how evenly distributed requests are across providers:

  • 0.0 = all requests go to one provider (maximum risk)
  • 1.0 = perfectly even distribution (minimum risk)
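
As a rough illustration of the two endpoints above, here is a minimal sketch of a normalized Shannon entropy calculation. The function name `normalizedEntropy` and its `Map` input are assumptions made for this example, not the module's actual API; the empty-input and single-provider defaults follow the values mentioned in the review comments further down.

```typescript
// Normalized Shannon entropy of a provider usage distribution.
// `counts` maps a provider name to its request count (illustrative shape).
function normalizedEntropy(counts: Map<string, number>): number {
  const positive = Array.from(counts.values()).filter((c) => c > 0);
  const totalRequests = positive.reduce((sum, c) => sum + c, 0);

  // No data: treated as 1.0, matching the fallback noted in the reviews.
  if (totalRequests === 0) return 1.0;
  // A single active provider means no diversity at all.
  if (positive.length <= 1) return 0.0;

  let entropy = 0;
  for (const count of positive) {
    const p = count / totalRequests;
    entropy -= p * Math.log2(p);
  }

  // log2(n) is the entropy of a perfectly even split across n providers.
  return entropy / Math.log2(positive.length);
}

const skewed = normalizedEntropy(new Map([["claude", 90], ["openai", 10]]));
const even = normalizedEntropy(new Map([["claude", 50], ["openai", 50]]));
console.log(skewed.toFixed(3), even.toFixed(3)); // 0.469 1.000
```

Dividing by `log2(n)` is what maps the raw entropy (in bits) onto the [0..1] range, so scores stay comparable when the number of active providers changes.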

The module maintains a rolling window (configurable size + TTL) and provides:

  • recordProviderUsage(provider) — Call after each successful request
  • calculateDiversityScore() — Global entropy score
  • getProviderDiversityBoost(provider) — Per-candidate boost for underused providers
  • getDiversityReport() — Structured report for dashboard/debugging
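
The rolling window described above combines two limits: a maximum entry count and a TTL. The sketch below shows one way those limits could interact; `pruneWindow`, `UsageEntry`, and the explicit `now` parameter are hypothetical names chosen for this example and may not match the module's internals.

```typescript
interface UsageEntry {
  provider: string;
  timestamp: number; // milliseconds
}

// Keep entries newer than the TTL cutoff, then cap to the newest `windowSize`.
function pruneWindow(
  entries: UsageEntry[],
  windowSize: number,
  ttlMs: number,
  now: number,
): UsageEntry[] {
  const cutoff = now - ttlMs;
  const fresh = entries.filter((e) => e.timestamp >= cutoff);
  const overflow = Math.max(0, fresh.length - windowSize);
  return fresh.slice(overflow);
}

const usageLog: UsageEntry[] = [
  { provider: "claude", timestamp: 0 },
  { provider: "openai", timestamp: 500 },
  { provider: "google", timestamp: 900 },
  { provider: "claude", timestamp: 1000 },
];

// At t = 1100 with a 600 ms TTL, the entry at t = 0 has expired;
// a window size of 2 then keeps only the two newest survivors.
const kept = pruneWindow(usageLog, 2, 600, 1100);
console.log(kept.map((e) => e.provider).join(",")); // google,claude
```

Passing `now` explicitly keeps the function pure, which makes time-based behavior easy to test without mock timers.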

Integration suggestion

Add as a scoring factor in scoring.ts:

// In ScoringWeights, add:
providerDiversity: 0.05  // Take 0.05 from existing factors

// In calculateFactors, add:
providerDiversity: getProviderDiversityBoost(candidate.provider)
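
One way to "take 0.05 from existing factors" is to scale every existing weight proportionally so the total still sums to 1. The sketch below assumes that approach; the factor names `latency`, `cost`, and `quality` are placeholders, since the seven existing factors in scoring.ts are not listed in this PR.

```typescript
// Hypothetical weight table: factor names are placeholders for illustration.
type Weights = { [factor: string]: number };

// Scale existing weights by (1 - newWeight) so the total still sums to 1.
function addFactor(existing: Weights, factorName: string, newWeight: number): Weights {
  const result: Weights = {};
  for (const key of Object.keys(existing)) {
    result[key] = existing[key] * (1 - newWeight);
  }
  result[factorName] = newWeight;
  return result;
}

// Weighted sum of factor values, each expected in [0..1].
function weightedScore(weights: Weights, factors: Weights): number {
  let score = 0;
  for (const key of Object.keys(weights)) {
    score += weights[key] * (factors[key] ?? 0);
  }
  return score;
}

const baseWeights: Weights = { latency: 0.5, cost: 0.3, quality: 0.2 };
const newWeights = addFactor(baseWeights, "providerDiversity", 0.05);
const weightTotal = Object.keys(newWeights).reduce((sum, k) => sum + newWeights[k], 0);
console.log(newWeights.providerDiversity, weightTotal.toFixed(2));
```

Proportional scaling preserves the relative importance of the existing factors; taking the 0.05 from a single factor instead would be an equally valid design choice.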

Test plan

  • Unit tests for entropy calculation (0, 1, and intermediate values)
  • Unit tests for per-provider boost
  • Unit tests for window size limits
  • Unit tests for diversity report structure

Relates to #788

🤖 Generated with Claude Code

Add a providerDiversity module that tracks provider usage distribution
using a rolling time window and calculates Shannon entropy normalized
to [0..1]. This enables the auto-combo scoring engine to factor in
provider diversity — boosting underrepresented providers to reduce
single-point-of-failure risk.

Key features:
- Rolling window with configurable size and TTL
- Shannon entropy calculation normalized to [0..1]
- Per-provider diversity boost for auto-combo integration
- Diversity report for dashboard display
- Full test coverage

Closes diegosouzapw#788

@kilo-code-bot kilo-code-bot bot left a comment

No issues found in this test file.

@kilo-code-bot kilo-code-bot bot left a comment

Review comments

const providerCount = recent.filter((e) => e.provider === provider).length;

// Inverse usage share: providers used less get higher boost
const usageShare = providerCount / total;

[WARNING]: Non-identifier characters in provider names could cause issues

The code uses provider strings directly in Map lookups and object property access. If a provider name contains special characters like ., /, or other non-identifier characters (e.g., "vertex-ai/gemini-2.0"), this could cause unexpected behavior when used as property keys in providers: Record<string, ...> or in Map operations. Consider validating or normalizing provider names if they might contain such characters.
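
To make the concern above concrete: a `Map` accepts any string as a key, including names with `/` or `.`, while a plain object literal treats the key `__proto__` specially. The sketch below is an assumption-labeled illustration with hypothetical helper names (`countWithMap`, `toSafeRecord`); it is not code from the PR.

```typescript
// Count provider usage with a Map, which treats every string key literally.
function countWithMap(providers: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const p of providers) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  return counts;
}

// Convert to a record without a prototype, so a key such as "__proto__"
// becomes an ordinary own property and cannot touch Object.prototype.
function toSafeRecord(counts: Map<string, number>): Record<string, number> {
  const safe: Record<string, number> = Object.create(null);
  counts.forEach((value, key) => {
    safe[key] = value;
  });
  return safe;
}

const providerCounts = countWithMap([
  "vertex-ai/gemini-2.0",
  "vertex-ai/gemini-2.0",
  "__proto__",
]);
const safeRecord = toSafeRecord(providerCounts);
console.log(providerCounts.get("vertex-ai/gemini-2.0"), Object.keys(safeRecord).length);
```

Characters such as `/` and `.` are harmless in bracket-notation property access; the realistic edge cases are prototype-related keys, which the null-prototype record sidesteps.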

kilo-code-bot bot commented Mar 30, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity | Count
CRITICAL | 0
WARNING | 1
SUGGESTION | 0

Issue Details

WARNING

File | Line | Issue
open-sse/services/autoCombo/providerDiversity.ts | 123 | Non-identifier characters in provider names could cause issues

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File | Line | Issue
open-sse/services/autoCombo/providerDiversity.ts | 36-37 | Module-level mutable state (usageWindow, config) may cause issues in serverless/edge environments where modules can be hot-reloaded. Consider dependency injection or alternative patterns.

Files Reviewed (2 files)

  • open-sse/services/autoCombo/providerDiversity.ts - 1 warning
  • open-sse/services/autoCombo/__tests__/providerDiversity.test.ts - No issues
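
The observation about module-level mutable state could be addressed with a factory that owns its own window and receives its clock as a parameter. The sketch below is one possible shape, not the PR's implementation: `createDiversityTracker`, `DiversityTracker`, and the `1 - usageShare` boost formula are assumptions (the PR only states that less-used providers get a higher boost and that the boost defaults to 0.5 with no data).

```typescript
interface DiversityConfig {
  windowSize: number;
  ttlMs: number;
}

interface DiversityTracker {
  record(provider: string): void;
  boost(provider: string): number;
}

// Each call returns an independent tracker; no state lives at module level.
function createDiversityTracker(
  config: DiversityConfig,
  now: () => number = () => Date.now(),
): DiversityTracker {
  let entries: { provider: string; timestamp: number }[] = [];

  // Drop expired entries, then cap to the newest `windowSize` entries.
  const recent = () => {
    const cutoff = now() - config.ttlMs;
    const fresh = entries.filter((e) => e.timestamp >= cutoff);
    entries = fresh.slice(Math.max(0, fresh.length - config.windowSize));
    return entries;
  };

  return {
    record(provider) {
      entries.push({ provider, timestamp: now() });
      recent();
    },
    boost(provider) {
      const live = recent();
      if (live.length === 0) return 0.5; // neutral default with no data
      const used = live.filter((e) => e.provider === provider).length;
      return 1 - used / live.length; // assumed formula: inverse usage share
    },
  };
}

// Two trackers with a fixed clock: state recorded in one is invisible to the other.
const trackerA = createDiversityTracker({ windowSize: 10, ttlMs: 1000 }, () => 0);
const trackerB = createDiversityTracker({ windowSize: 10, ttlMs: 1000 }, () => 0);
for (let i = 0; i < 3; i++) trackerA.record("claude");
console.log(trackerA.boost("claude"), trackerA.boost("openai"), trackerB.boost("claude"));
```

Injecting the clock also removes the need for mock timers when testing TTL behavior.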

Positive Observations

  • Shannon entropy calculation is mathematically correct
  • Tests are comprehensive and cover edge cases
  • Good use of interface types for configuration
  • TTL and window size pruning logic is sound
  • Return value defaults (1.0 for no data) provide sensible fallbacks

Fix Link

Fix these issues in Kilo Cloud

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a provider diversity tracking module that uses Shannon entropy to measure request distribution across different providers, aiming to mitigate single-point-of-failure risks. The feedback identifies several performance optimizations, specifically regarding redundant TTL filtering and duplicate data processing within the scoring and reporting functions. Suggestions were also provided to improve the test suite by isolating test cases and adding coverage for time-based entry pruning.

Comment on lines +73 to +77
const now = Date.now();
const cutoff = now - config.ttlMs;
const recent = usageWindow.filter((e) => e.timestamp >= cutoff);

if (recent.length === 0) return 1.0;

high

There is redundant logic for filtering expired entries. The recordProviderUsage function is already responsible for pruning the usageWindow based on ttlMs. This function, and others, re-filter the usageWindow which is inefficient and can lead to inconsistent state if the pruning logic diverges.

To improve performance and maintainability, you should remove this filtering logic and operate directly on the usageWindow array, which should be trusted as the source of truth for recent usage. The check for an empty array at line 77 is also redundant with the check at line 71.

Comment on lines +113 to +117
const now = Date.now();
const cutoff = now - config.ttlMs;
const recent = usageWindow.filter((e) => e.timestamp >= cutoff);

if (recent.length === 0) return 0.5;

high

Similar to calculateDiversityScore, this function redundantly filters the usageWindow by TTL. This work is already done in recordProviderUsage. You should remove this filtering logic and use usageWindow directly instead of recent to improve performance and avoid code duplication.

Comment on lines +138 to +140
const now = Date.now();
const cutoff = now - config.ttlMs;
const recent = usageWindow.filter((e) => e.timestamp >= cutoff);

high

This function also contains redundant TTL filtering. The usageWindow should be considered already pruned. Please remove this filtering and use usageWindow directly instead of recent throughout the function for better performance and code consistency.

Comment on lines +155 to +161
return {
  score: calculateDiversityScore(),
  totalRequests: recent.length,
  providers,
  windowSize: config.windowSize,
  ttlMs: config.ttlMs,
};

medium

This function calculates provider counts and then calls calculateDiversityScore(), which recalculates the exact same counts internally. This is inefficient as it iterates over the usage window a second time.

You can calculate the diversity score directly within this function using the counts map that has already been computed.

  const totalRequests = recent.length;
  const score = (() => {
    if (totalRequests === 0) return 1.0;
    const nUnique = counts.size;
    if (nUnique <= 1) return 0.0;

    let entropy = 0;
    for (const count of counts.values()) {
      const p = count / totalRequests;
      entropy -= p * Math.log2(p);
    }

    const maxEntropy = Math.log2(nUnique);
    return maxEntropy > 0 ? entropy / maxEntropy : 0;
  })();

  return {
    score,
    totalRequests,
    providers,
    windowSize: config.windowSize,
    ttlMs: config.ttlMs,
  };

Comment on lines +46 to +68
it("higher entropy with more providers", () => {
  // 2 providers
  resetDiversity();
  for (let i = 0; i < 10; i++) {
    recordProviderUsage("claude");
    recordProviderUsage("openai");
  }
  const score2 = calculateDiversityScore();

  // 4 providers (same total requests)
  resetDiversity();
  for (let i = 0; i < 5; i++) {
    recordProviderUsage("claude");
    recordProviderUsage("openai");
    recordProviderUsage("google");
    recordProviderUsage("together");
  }
  const score4 = calculateDiversityScore();

  // Both should be 1.0 (perfectly distributed within their pool)
  assert.equal(score2, 1.0);
  assert.equal(score4, 1.0);
});

medium

This test case combines two distinct scenarios (2 providers vs. 4 providers) and includes a manual resetDiversity() call. It's better practice for each test case (it block) to cover a single, isolated scenario. The beforeEach hook already handles resetting state between tests.

Consider splitting this into two separate tests. Note that the 2-provider scenario is already covered by another test, so you might only need to keep the 4-provider scenario in a new, dedicated test.

    it("returns 1.0 for perfectly even distribution across 4 providers", () => {
      for (let i = 0; i < 5; i++) {
        recordProviderUsage("claude");
        recordProviderUsage("openai");
        recordProviderUsage("google");
        recordProviderUsage("together");
      }
      const score4 = calculateDiversityScore();

      assert.equal(score4, 1.0);
    });

const report = getDiversityReport();
assert.ok(report.totalRequests <= 10, "should not exceed window size");
});
});

medium

The test suite is missing a case to verify that entries are correctly pruned from the window when they expire based on ttlMs. This is a key part of the window management logic.

I recommend adding a test that uses mock timers to control the passage of time and assert that old entries are removed as expected.

    it("prunes entries older than ttlMs", (context) => {
      const { mock } = context;
      mock.timers.enable();

      configureDiversity({ windowSize: 100, ttlMs: 1000 });

      recordProviderUsage("claude"); // time = 0
      assert.equal(getDiversityReport().totalRequests, 1);

      // Advance time, but not enough to expire the first entry
      mock.timers.tick(500);
      recordProviderUsage("openai"); // time = 500
      assert.equal(getDiversityReport().totalRequests, 2);

      // Advance time enough to expire the first entry
      mock.timers.tick(600); // total elapsed = 1100ms

      // This call should trigger pruning of the 'claude' entry
      recordProviderUsage("google");

      const report = getDiversityReport();
      assert.equal(report.totalRequests, 2, "should have pruned the expired entry");
      assert.equal(report.providers["claude"], undefined, "'claude' entry should be gone");
      assert.ok(report.providers["openai"], "'openai' entry should remain");
    });

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 271cf37b8a

Comment on lines +42 to +43
export function configureDiversity(userConfig: Partial<DiversityConfig>): void {
config = { ...DEFAULT_CONFIG, ...userConfig };

[P2] Prune existing usage window when reconfiguring limits

configureDiversity only updates the config object, so shrinking windowSize after data has already been recorded leaves usageWindow oversized until another call to recordProviderUsage. In that interval, calculateDiversityScore() and getDiversityReport() still read all TTL-valid entries, so the reported/requested rolling window semantics are violated and downstream scoring can be based on stale history. Apply window/TTL pruning immediately inside reconfiguration so new limits take effect at once.
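
A minimal sketch of the fix suggested above, assuming a class wrapper named `DiversityWindow` and placeholder default values (the PR's actual `DEFAULT_CONFIG` values are not shown here). The point is only that `configure` prunes immediately, so a smaller `windowSize` takes effect before the next recorded request.

```typescript
interface DiversityConfig {
  windowSize: number;
  ttlMs: number;
}

interface UsageEntry {
  provider: string;
  timestamp: number;
}

// Hypothetical defaults for illustration only.
const DEFAULT_CONFIG: DiversityConfig = { windowSize: 100, ttlMs: 60000 };

class DiversityWindow {
  private config: DiversityConfig = { ...DEFAULT_CONFIG };
  private entries: UsageEntry[] = [];

  // Apply the new limits and prune right away, not on the next record call.
  configure(userConfig: Partial<DiversityConfig>, now: number = Date.now()): void {
    this.config = { ...DEFAULT_CONFIG, ...userConfig };
    this.prune(now);
  }

  record(provider: string, now: number = Date.now()): void {
    this.entries.push({ provider, timestamp: now });
    this.prune(now);
  }

  size(): number {
    return this.entries.length;
  }

  private prune(now: number): void {
    const cutoff = now - this.config.ttlMs;
    const fresh = this.entries.filter((e) => e.timestamp >= cutoff);
    const overflow = Math.max(0, fresh.length - this.config.windowSize);
    this.entries = fresh.slice(overflow);
  }
}

const win = new DiversityWindow();
for (let i = 0; i < 20; i++) win.record("claude", 1000 + i);
win.configure({ windowSize: 10 }, 1020);
console.log(win.size()); // 10
```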

@diegosouzapw
Owner

Hello, Professor Igor! What can I say about this fantastic feature for measuring risk versus dispersion via Shannon entropy? 📊🔢

Avoiding global system collapse scenarios and balancing by natural distribution with a diversity (entropy) score reduces the chance of our OmniRoute proxy going down when a major provider suffers instabilities worse than the ones it already has!

However, to finish this excellent work, we need to connect it to the Auto-Combo scoring engine:

How to finish the integration

  1. Missing modified file (where is the Auto-Combo scoring?)
    The original module works behind the scores in files such as open-sse/services/autoCombo/scoring.ts. The PR included the wonderful functions, but no file consumes them. If we do not wire them in, the final score will not change.

Step-by-step fix:

  • Edit your code to hook into the engine's main path in open-sse/services/autoCombo/scoring.ts
  • Include the Provider Diversity calculation (the boost provided by your getProviderDiversityBoost(candidate.provider)) as the eighth scoring factor (which will calibrate the ranking during heuristic routing).

  2. Interface endpoints
    The method you introduced, getDiversityReport(), will return fantastic data for the dashboard. You did not attach this method to the final export for the dashboard or to a Next route (src/app/api/...) in the backend.

I will be holding this Pull Request only temporarily. Once you fix these details and push an update (adding the bridge in the autoCombo scoring.ts), the PR will pass the architectural review checks! Thank you for the brilliant engineering! 🔧🎉

@diegosouzapw diegosouzapw merged commit 04a0b07 into diegosouzapw:main Mar 30, 2026
1 check passed