feat(sse): add provider diversity scoring via Shannon entropy#793
Conversation
Add a providerDiversity module that tracks provider usage distribution using a rolling time window and calculates Shannon entropy normalized to [0..1]. This enables the auto-combo scoring engine to factor in provider diversity — boosting underrepresented providers to reduce single-point-of-failure risk. Key features: - Rolling window with configurable size and TTL - Shannon entropy calculation normalized to [0..1] - Per-provider diversity boost for auto-combo integration - Diversity report for dashboard display - Full test coverage Closes diegosouzapw#788
| const providerCount = recent.filter((e) => e.provider === provider).length; | ||
|
|
||
| // Inverse usage share: providers used less get higher boost | ||
| const usageShare = providerCount / total; |
There was a problem hiding this comment.
[WARNING]: Non-identifier characters in provider names could cause issues
The code uses provider strings directly in Map lookups and object property access. If a provider name contains special characters like ., /, or other non-identifier characters (e.g., "vertex-ai/gemini-2.0"), this could cause unexpected behavior when used as property keys in providers: Record<string, ...> or in Map operations. Consider validating or normalizing provider names if they might contain such characters.
Code Review SummaryStatus: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (2 files)
Positive Observations
Fix Link |
There was a problem hiding this comment.
Code Review
This pull request introduces a provider diversity tracking module that uses Shannon entropy to measure request distribution across different providers, aiming to mitigate single-point-of-failure risks. The feedback identifies several performance optimizations, specifically regarding redundant TTL filtering and duplicate data processing within the scoring and reporting functions. Suggestions were also provided to improve the test suite by isolating test cases and adding coverage for time-based entry pruning.
| const now = Date.now(); | ||
| const cutoff = now - config.ttlMs; | ||
| const recent = usageWindow.filter((e) => e.timestamp >= cutoff); | ||
|
|
||
| if (recent.length === 0) return 1.0; |
There was a problem hiding this comment.
There is redundant logic for filtering expired entries. The recordProviderUsage function is already responsible for pruning the usageWindow based on ttlMs. This function, and others, re-filter the usageWindow which is inefficient and can lead to inconsistent state if the pruning logic diverges.
To improve performance and maintainability, you should remove this filtering logic and operate directly on the usageWindow array, which should be trusted as the source of truth for recent usage. The check for an empty array at line 77 is also redundant with the check at line 71.
| const now = Date.now(); | ||
| const cutoff = now - config.ttlMs; | ||
| const recent = usageWindow.filter((e) => e.timestamp >= cutoff); | ||
|
|
||
| if (recent.length === 0) return 0.5; |
There was a problem hiding this comment.
| const now = Date.now(); | ||
| const cutoff = now - config.ttlMs; | ||
| const recent = usageWindow.filter((e) => e.timestamp >= cutoff); |
| return { | ||
| score: calculateDiversityScore(), | ||
| totalRequests: recent.length, | ||
| providers, | ||
| windowSize: config.windowSize, | ||
| ttlMs: config.ttlMs, | ||
| }; |
There was a problem hiding this comment.
This function calculates provider counts and then calls calculateDiversityScore(), which recalculates the exact same counts internally. This is inefficient as it iterates over the usage window a second time.
You can calculate the diversity score directly within this function using the counts map that has already been computed.
const totalRequests = recent.length;
const score = (() => {
if (totalRequests === 0) return 1.0;
const nUnique = counts.size;
if (nUnique <= 1) return 0.0;
let entropy = 0;
for (const count of counts.values()) {
const p = count / totalRequests;
entropy -= p * Math.log2(p);
}
const maxEntropy = Math.log2(nUnique);
return maxEntropy > 0 ? entropy / maxEntropy : 0;
})();
return {
score,
totalRequests,
providers,
windowSize: config.windowSize,
ttlMs: config.ttlMs,
};| it("higher entropy with more providers", () => { | ||
| // 2 providers | ||
| resetDiversity(); | ||
| for (let i = 0; i < 10; i++) { | ||
| recordProviderUsage("claude"); | ||
| recordProviderUsage("openai"); | ||
| } | ||
| const score2 = calculateDiversityScore(); | ||
|
|
||
| // 4 providers (same total requests) | ||
| resetDiversity(); | ||
| for (let i = 0; i < 5; i++) { | ||
| recordProviderUsage("claude"); | ||
| recordProviderUsage("openai"); | ||
| recordProviderUsage("google"); | ||
| recordProviderUsage("together"); | ||
| } | ||
| const score4 = calculateDiversityScore(); | ||
|
|
||
| // Both should be 1.0 (perfectly distributed within their pool) | ||
| assert.equal(score2, 1.0); | ||
| assert.equal(score4, 1.0); | ||
| }); |
There was a problem hiding this comment.
This test case combines two distinct scenarios (2 providers vs. 4 providers) and includes a manual resetDiversity() call. It's better practice for each test case (it block) to cover a single, isolated scenario. The beforeEach hook already handles resetting state between tests.
Consider splitting this into two separate tests. Note that the 2-provider scenario is already covered by another test, so you might only need to keep the 4-provider scenario in a new, dedicated test.
it("returns 1.0 for perfectly even distribution across 4 providers", () => {
for (let i = 0; i < 5; i++) {
recordProviderUsage("claude");
recordProviderUsage("openai");
recordProviderUsage("google");
recordProviderUsage("together");
}
const score4 = calculateDiversityScore();
assert.equal(score4, 1.0);
});| const report = getDiversityReport(); | ||
| assert.ok(report.totalRequests <= 10, "should not exceed window size"); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
The test suite is missing a case to verify that entries are correctly pruned from the window when they expire based on ttlMs. This is a key part of the window management logic.
I recommend adding a test that uses mock timers to control the passage of time and assert that old entries are removed as expected.
it("prunes entries older than ttlMs", (context) => {
const { mock } = context;
mock.timers.enable();
configureDiversity({ windowSize: 100, ttlMs: 1000 });
recordProviderUsage("claude"); // time = 0
assert.equal(getDiversityReport().totalRequests, 1);
// Advance time, but not enough to expire the first entry
mock.timers.tick(500);
recordProviderUsage("openai"); // time = 500
assert.equal(getDiversityReport().totalRequests, 2);
// Advance time enough to expire the first entry
mock.timers.tick(600); // total elapsed = 1100ms
// This call should trigger pruning of the 'claude' entry
recordProviderUsage("google");
const report = getDiversityReport();
assert.equal(report.totalRequests, 2, "should have pruned the expired entry");
assert.equal(report.providers["claude"], undefined, "'claude' entry should be gone");
assert.ok(report.providers["openai"], "'openai' entry should remain");
});There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 271cf37b8a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export function configureDiversity(userConfig: Partial<DiversityConfig>): void { | ||
| config = { ...DEFAULT_CONFIG, ...userConfig }; |
There was a problem hiding this comment.
Prune existing usage window when reconfiguring limits
configureDiversity only updates the config object, so shrinking windowSize after data has already been recorded leaves usageWindow oversized until another call to recordProviderUsage. In that interval, calculateDiversityScore() and getDiversityReport() still read all TTL-valid entries, so the reported/requested rolling window semantics are violated and downstream scoring can be based on stale history. Apply window/TTL pruning immediately inside reconfiguration so new limits take effect at once.
Useful? React with 👍 / 👎.
|
Olá, Professor Igor! O que dizer dessa feature sensacional para mensurar Risco vs Dispersão via Entropia de Shannon? 📊🔢 Evitar cenários de Colapso Global do Sistema e balancear por distribuição natural com score de Contudo, para finalizar este trabalho excelente, nós precisamos conectar isso ao motor de Auto-Combs Scoring: Como Finalizados a Integração
Passo a passo da correção:
Estarei retendo este Pull Request apenas temporariamente. Uma vez que você corrigir e empurrar um novo push desses detalhes (adicionando a ponte no scoring do autoCombo |
Summary
providerDiversity.tsmodule toopen-sse/services/autoCombo/getDiversityReport()for dashboard displayMotivation
A system routing 90% of traffic to one provider has catastrophic single-point-of-failure risk. If that provider goes down, 90% of requests fail simultaneously. This module provides a measurable diversity score that can be integrated into the auto-combo scoring engine as an 8th factor.
How it works
Shannon Entropy measures how evenly distributed requests are across providers:
0.0= all requests go to one provider (maximum risk)1.0= perfectly even distribution (minimum risk)The module maintains a rolling window (configurable size + TTL) and provides:
recordProviderUsage(provider)— Call after each successful requestcalculateDiversityScore()— Global entropy scoregetProviderDiversityBoost(provider)— Per-candidate boost for underused providersgetDiversityReport()— Structured report for dashboard/debuggingIntegration suggestion
Add as a scoring factor in
scoring.ts:Test plan
Relates to #788
🤖 Generated with Claude Code