
feat(domain): add graceful degradation framework with multi-layer fallback#803

Merged
diegosouzapw merged 1 commit into diegosouzapw:main from igormorais123:feat/graceful-degradation-wrapper on Mar 30, 2026

@igormorais123

Summary

  • Add degradation.ts to src/domain/
  • Standardized pattern: withDegradation(feature, primary, fallback, safeDefault)
  • Global registry tracks per-feature degradation status
  • Both async and sync variants
  • Dashboard-ready: summary, report, and per-feature status APIs
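The pattern summarized above fits in a few dozen lines. This is a minimal, self-contained approximation, not the merged degradation.ts: the type and function names follow the PR description, but the bodies, the capability strings, and the three-level DegradationLevel (the wrapper only ever sets full, reduced, and default) are assumptions made for illustration.

```typescript
// Minimal sketch of the primary → fallback → safe-default wrapper.
// Names follow the PR description; bodies and capability strings are assumptions.

export type DegradationLevel = "full" | "reduced" | "default";

export interface DegradationStatus {
  feature: string;
  level: DegradationLevel;
  capability: string;
  reason?: string;
  since: string; // ISO-8601 time the feature entered this level
}

export interface DegradedResult<T> {
  result: T;
  status: DegradationStatus;
}

// Module-level registry: latest status per feature name.
const registry = new Map<string, DegradationStatus>();

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

function record(
  feature: string,
  level: DegradationLevel,
  capability: string,
  reason?: string,
): DegradationStatus {
  const existing = registry.get(feature);
  // Keep the original transition time while the level is unchanged.
  const since =
    existing && existing.level === level ? existing.since : new Date().toISOString();
  const status: DegradationStatus = { feature, level, capability, reason, since };
  registry.set(feature, status);
  return status;
}

export async function withDegradation<T>(
  feature: string,
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  safeDefault: T,
): Promise<DegradedResult<T>> {
  try {
    const result = await primary();
    return { result, status: record(feature, "full", "Full capability") };
  } catch (primaryError) {
    try {
      const result = await fallback();
      return {
        result,
        status: record(feature, "reduced", "Reduced capability", errorMessage(primaryError)),
      };
    } catch (fallbackError) {
      const reason = `${errorMessage(primaryError)} → ${errorMessage(fallbackError)}`;
      return { result: safeDefault, status: record(feature, "default", "Safe default", reason) };
    }
  }
}

// Counts features per level, e.g. for a dashboard header.
export function getDegradationSummary(): Record<DegradationLevel, number> {
  const counts: Record<DegradationLevel, number> = { full: 0, reduced: 0, default: 0 };
  registry.forEach((status) => {
    counts[status.level] += 1;
  });
  return counts;
}
```

Each branch maps one-to-one onto the test plan: primary success yields "full", fallback success yields "reduced", and double failure returns the safe default at "default".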

Motivation

Many OmniRoute features depend on external systems (Redis, databases, SSH, external APIs). When these fail, the feature shouldn't crash — it should degrade gracefully. This module provides a reusable pattern.

Usage

import { withDegradation, getDegradationReport } from '@/domain/degradation';

// Rate limiting: Redis → in-memory → permissive
const { result, status } = await withDegradation(
  'rate-limiting',
  () => redisRateLimit(key, limit),
  () => memoryRateLimit(key, limit),
  { allowed: true, remaining: Infinity },
  {
    fullCapability: 'Distributed rate limiting via Redis',
    reducedCapability: 'Single-instance in-memory rate limiting',
    defaultCapability: 'Permissive mode (all requests allowed)',
  }
);

// Dashboard: check all features
const report = getDegradationReport();
// → [{ feature: 'rate-limiting', level: 'reduced', reason: 'ECONNREFUSED' }]

Test plan

  • Primary succeeds → returns result with level "full"
  • Primary fails, fallback succeeds → returns with level "reduced"
  • Both fail → returns safe default with level "default"
  • Registry tracks status changes
  • Summary counts features per level

Closes #799

🤖 Generated with Claude Code

…lback

Add a standardized degradation pattern for services depending on external
systems. withDegradation() tries primary → fallback → safe default,
tracking status in a global registry for dashboard visibility.

Features:
- Async and sync variants
- Global registry with per-feature status tracking
- Degradation levels: full → reduced → minimal → default
- Summary and report APIs for dashboard integration
- Reason tracking for debugging

Example: Rate limiting degrades from Redis → in-memory → permissive
instead of crashing when Redis is unavailable.

Closes diegosouzapw#799
options?.onDegrade?.(status);
return { result: safeDefault, status };
}
}

WARNING: Error handling inconsistency between async and sync versions.

The async version (withDegradation) extracts error messages properly:

primaryError instanceof Error ? primaryError.message : String(primaryError)

The sync version (withDegradationSync) does not:

reason: `${primaryError} → ${fallbackError}`

If primaryError or fallbackError is a plain object without a toString() override, this will produce unhelpful output like "[object Object] → [object Object]". Apply the same instanceof Error check to extract meaningful error messages.
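One way to apply this in both variants is a single formatter shared by the async and sync wrappers. describeError and formatReason below are hypothetical helpers, not part of the PR; the sketch also falls back gracefully for values that neither JSON.stringify nor String() can convert (a circular object created with Object.create(null), for instance).

```typescript
// Hypothetical shared helpers so withDegradation and withDegradationSync
// format failures identically. Not part of the PR.

export function describeError(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  try {
    // Readable output for plain objects instead of "[object Object]".
    const json = JSON.stringify(err);
    if (json !== undefined) return json;
  } catch {
    // JSON.stringify throws on circular structures and BigInt values.
  }
  try {
    return String(err);
  } catch {
    // String() throws for objects with no prototype and no toString().
    return "Unknown error";
  }
}

export function formatReason(primaryError: unknown, fallbackError: unknown): string {
  return `${describeError(primaryError)} → ${describeError(fallbackError)}`;
}
```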

@kilo-code-bot

kilo-code-bot bot commented Mar 30, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0
Issue Details

WARNING

File: src/domain/degradation.ts (line 130)
Issue: Error handling inconsistency between async and sync versions
Files Reviewed (1 file)
  • src/domain/degradation.ts - 1 issue

Other Observations

  • The minimal level in the DegradationLevel type and in getDegradationSummary() is declared but never set; consider documenting it or removing it if unused.
  • The since timestamp is preserved when level doesn't change (line 201), which is good behavior.
  • Registry is module-level state with no cleanup mechanism other than resetDegradationRegistry() — ensure callers properly reset in testing or long-running processes.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a Graceful Degradation Framework designed to manage external service dependencies by providing a structured fallback mechanism (Primary → Fallback → Safe Default). It includes a global registry for tracking feature health and reporting. Feedback focuses on resolving potential race conditions in the global state updates, removing the unused 'minimal' degradation level for consistency, standardizing error handling between the sync and async implementations, and optimizing the reporting logic by moving static configuration out of the sort callback.

Comment on lines +191 to +198
function updateRegistry(feature: string, status: DegradationStatus): void {
  const existing = registry.get(feature);
  // Only update 'since' if level actually changed
  if (existing && existing.level === status.level) {
    status.since = existing.since;
  }
  registry.set(feature, status);
}

high

The global registry is a mutable state shared across all requests. The updateRegistry function reads from and then writes to the registry (registry.get() then registry.set()), which is not an atomic operation. This creates a race condition in a concurrent environment where multiple requests could be processing for the same feature.

For example, a 'full' status update from one request could be immediately overwritten by a 'reduced' status from another request that began its read operation before the first request wrote its update. This can lead to an inconsistent and incorrect state in the degradation report.

To address this, you should ensure that updates to the registry for a specific feature are serialized, for instance by using a queue or a locking mechanism for each feature key.
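A note on the mechanism: in Node.js, the get()/set() pair inside updateRegistry, as quoted, runs synchronously, so two calls cannot interleave between those two lines. The realistic hazard is ordering: a call that started earlier but settled later (for example, after a slow Redis timeout) overwrites a fresher status. A sketch of one mitigation, assuming a per-call sequence number taken when each wrapped operation starts; beginCall, updateRegistryIfNewer, getStatus, and the seq field are hypothetical.

```typescript
// Sketch: drop out-of-order writes using a per-call sequence number.
// All names here are hypothetical, not part of the PR.

type DegradationLevel = "full" | "reduced" | "default";

interface TrackedStatus {
  feature: string;
  level: DegradationLevel;
  since: string;
  seq: number; // order in which the originating call started
}

const registry = new Map<string, TrackedStatus>();
let nextSeq = 0;

/** Call when a wrapped operation starts; the number orders its eventual result. */
export function beginCall(): number {
  return ++nextSeq;
}

/** Returns false, leaving state untouched, if a later-started call already reported. */
export function updateRegistryIfNewer(status: TrackedStatus): boolean {
  const existing = registry.get(status.feature);
  if (existing && existing.seq > status.seq) {
    return false; // stale result from an older call: ignore it
  }
  if (existing && existing.level === status.level) {
    status.since = existing.since; // level unchanged: keep the original transition time
  }
  registry.set(status.feature, status);
  return true;
}

export function getStatus(feature: string): TrackedStatus | undefined {
  return registry.get(feature);
}
```

Ordering by start time discards a slow call's late result even if it carried newer information about the dependency; ordering by completion time (what a plain registry.set already does) is the other defensible choice.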

*/

/** Degradation levels from best to worst */
export type DegradationLevel = "full" | "reduced" | "minimal" | "default";

medium

The DegradationLevel type includes "minimal", and the documentation mentions a "Full → Reduced → Minimal → Safe Default" hierarchy. However, the withDegradation and withDegradationSync functions only implement a three-level fallback (primary, fallback, safe default), which correspond to "full", "reduced", and "default" levels. The "minimal" level is never used, which can be confusing for users of this new framework.

You should either update the framework to support a fourth "minimal" level, or remove "minimal" from the DegradationLevel type and update the documentation and related functions (getDegradationReport, getDegradationSummary) to reflect the three-level hierarchy. Removing it seems simpler if a fourth level is not immediately planned.

Suggested change
- export type DegradationLevel = "full" | "reduced" | "minimal" | "default";
+ export type DegradationLevel = "full" | "reduced" | "default";
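If the fourth level is kept instead, one option is to replace the fixed primary/fallback parameters with an ordered list of layers so that "minimal" is actually reachable. withDegradationChain, Layer, and ChainResult are hypothetical names; registry updates are omitted to keep the sketch short.

```typescript
// Sketch: an ordered chain of layers so every declared level can be reached.
// Hypothetical API, not the one merged in this PR.

export type DegradationLevel = "full" | "reduced" | "minimal" | "default";

export interface Layer<T> {
  level: Exclude<DegradationLevel, "default">;
  run: () => Promise<T>;
}

export interface ChainResult<T> {
  result: T;
  level: DegradationLevel;
  reasons: string[]; // one entry per failed layer, in order
}

export async function withDegradationChain<T>(
  layers: Layer<T>[],
  safeDefault: T,
): Promise<ChainResult<T>> {
  const reasons: string[] = [];
  for (const layer of layers) {
    try {
      // The await sits inside try, so a rejected layer falls through to the next one.
      return { result: await layer.run(), level: layer.level, reasons };
    } catch (err) {
      reasons.push(err instanceof Error ? err.message : String(err));
    }
  }
  // Every layer failed: return the value that is always safe.
  return { result: safeDefault, level: "default", reasons };
}
```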

level: "default",
feature,
capability: options?.defaultCapability ?? "Safe default",
reason: `${primaryError} → ${fallbackError}`,

medium

The reason for the default case in withDegradationSync is constructed differently than in withDegradation. It interpolates the raw error values into a template string, which yields "Error: ECONNREFUSED" for Error instances (where the async version reports just "ECONNREFUSED") and "[object Object]" for plain objects. To keep error logging consistent and useful, adopt the same pattern as the async version, which checks instanceof Error and uses the message property.

Here's a suggested implementation for the catch block:

    } catch (fallbackError) {
      const reason = [
        primaryError instanceof Error ? primaryError.message : String(primaryError),
        fallbackError instanceof Error ? fallbackError.message : String(fallbackError),
      ].join(" → ");

      const status: DegradationStatus = {
        level: "default",
        feature,
        capability: options?.defaultCapability ?? "Safe default",
        reason,
        since: now,
      };
      updateRegistry(feature, status);
      options?.onDegrade?.(status);
      return { result: safeDefault, status };
    }

Comment on lines +205 to +211
const order: Record<DegradationLevel, number> = {
default: 0,
minimal: 1,
reduced: 2,
full: 3,
};
return (order[a.level] ?? 4) - (order[b.level] ?? 4);

medium

The order object is being created inside the sort callback function. This is inefficient as it will be re-created for every comparison made during the sort operation. For better performance, you should define the order constant outside of the sort callback.

Example:

export function getDegradationReport(): DegradationStatus[] {
  const order: Record<DegradationLevel, number> = {
    default: 0,
    minimal: 1,
    reduced: 2,
    full: 3,
  };
  return Array.from(registry.values()).sort(
    (a, b) => (order[a.level] ?? 4) - (order[b.level] ?? 4)
  );
}


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67592d80aa


onDegrade?: (status: DegradationStatus) => void;
}
): Promise<DegradedResult<T>> {
const now = new Date().toISOString();

P2: Record degradation timestamp at transition time

withDegradation (and its sync variant) captures now before running primary/fallback, then reuses that value for status.since in every outcome. When these calls block on retries/timeouts, the reported since can be significantly earlier than the actual level transition, which skews dashboard durations and any alerting keyed off this field. Capture the timestamp when each final status is built so it reflects the true transition moment.
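A sketch of the suggested change, assuming the status shape quoted elsewhere in this thread: the timestamp is read inside a small builder that runs only after primary or fallback has settled. buildStatus and withFallback are hypothetical names, and the safe-default branch and registry update are omitted for brevity.

```typescript
// Sketch: read the clock when the outcome is known, not before the calls start.

type DegradationLevel = "full" | "reduced" | "default";

interface DegradationStatus {
  feature: string;
  level: DegradationLevel;
  capability: string;
  reason?: string;
  since: string;
}

function buildStatus(
  feature: string,
  level: DegradationLevel,
  capability: string,
  reason?: string,
): DegradationStatus {
  // Called only after primary/fallback have settled, so 'since' marks the transition.
  return { feature, level, capability, reason, since: new Date().toISOString() };
}

export async function withFallback<T>(
  feature: string,
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<{ result: T; status: DegradationStatus }> {
  try {
    const result = await primary();
    return { result, status: buildStatus(feature, "full", "Full capability") };
  } catch (primaryError) {
    const result = await fallback();
    const reason = primaryError instanceof Error ? primaryError.message : String(primaryError);
    // Reflects when the fallback succeeded, after any primary timeout elapsed.
    return { result, status: buildStatus(feature, "reduced", "Reduced capability", reason) };
  }
}
```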


Comment on lines +218 to +219
export function getFeatureStatus(feature: string): DegradationStatus | null {
return registry.get(feature) ?? null;

P2: Return copies instead of mutable registry entries

getFeatureStatus returns the exact DegradationStatus object stored in the module-level registry. Any caller that mutates the returned object will mutate global degradation state in place, bypassing updateRegistry and potentially corrupting hasAnyDegradation()/summary results. Return a cloned snapshot (and similarly clone report entries) to keep registry state encapsulated.
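A sketch of the suggested fix, assuming DegradationStatus stays a flat object of primitive fields, in which case a shallow spread copy is enough (structuredClone() would be needed if nested fields were added). updateRegistry is copied from the quoted diff; the copying accessors are illustrative, not the merged code.

```typescript
// Sketch: accessors hand out snapshots so callers cannot mutate registry state.

type DegradationLevel = "full" | "reduced" | "default";

interface DegradationStatus {
  feature: string;
  level: DegradationLevel;
  capability: string;
  reason?: string;
  since: string;
}

const registry = new Map<string, DegradationStatus>();

export function updateRegistry(feature: string, status: DegradationStatus): void {
  const existing = registry.get(feature);
  // Only update 'since' if level actually changed
  if (existing && existing.level === status.level) {
    status.since = existing.since;
  }
  registry.set(feature, status);
}

export function getFeatureStatus(feature: string): DegradationStatus | null {
  const status = registry.get(feature);
  return status ? { ...status } : null; // shallow copy of a flat record
}

export function getDegradationReport(): DegradationStatus[] {
  return Array.from(registry.values(), (status) => ({ ...status }));
}
```

Returning Readonly<DegradationStatus> would catch accidental writes at compile time but not at runtime, which is why the sketch copies.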


@diegosouzapw
Owner

Hello, Professor Igor! Thank you for the fantastic contribution! 🛡️

The idea of an architectural withDegradation wrapper to keep third-party integrations (Redis, APIs, SSH) stable through graceful degradation is a senior-level addition to the ecosystem! It will prevent cascading failures that could bring down the entire application.

Before we can move forward with merging into main, a few architectural points need your attention. Here is a brief practical guide:

What needs to be adjusted

  1. In-Memory State (Map) vs. SQLite (Persistence)
    Your module uses const registry = new Map<string, DegradationStatus>(); to store the degradation state. The `OmniRoute` architecture relies on frequent restarts (Serverless / Clusters / PM2), so state held in memory will be lost.
    How to fix: any state that feeds dashboard reports must be persisted in SQLite (the layer in src/lib/db). You can create a degradation_status table and read from and write to it directly so that reports survive reboots.

  2. Missing Real Integration
    The base module was created, but it is orphaned in the architecture. Where will it be used in the current project?
    How to fix: please pick at least one existing critical flow in the OmniRoute codebase (for example, inside the provider executor, or the rate limiter / health check) and wrap an existing feature with withDegradation.

  3. Test Files
    The "Test plan" marks this as done, but the PR contains no changes under tests/unit/ and no new .test.ts file was found.
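For item 1, a minimal sketch of a SQLite-backed store. The thread does not show the API of src/lib/db, so this assumes a driver with the synchronous exec()/prepare() interface shared by better-sqlite3 and Node's node:sqlite DatabaseSync; the degradation_status columns and createDegradationStore are hypothetical. The upsert keeps the existing since value while the level is unchanged, mirroring updateRegistry().

```typescript
// Sketch: persist degradation status so dashboard reports survive restarts.
// Assumes an exec()/prepare() driver shape; OmniRoute's src/lib/db may differ.

export interface SqlStatement {
  run(...params: unknown[]): unknown;
  get(...params: unknown[]): unknown;
  all(...params: unknown[]): unknown[];
}

export interface SqlDatabase {
  exec(sql: string): void;
  prepare(sql: string): SqlStatement;
}

export interface DegradationStatus {
  feature: string;
  level: "full" | "reduced" | "default";
  capability: string;
  reason?: string;
  since: string;
}

type Row = Omit<DegradationStatus, "reason"> & { reason: string | null };

const SCHEMA = `
CREATE TABLE IF NOT EXISTS degradation_status (
  feature    TEXT PRIMARY KEY,
  level      TEXT NOT NULL,
  capability TEXT NOT NULL,
  reason     TEXT,
  since      TEXT NOT NULL
)`;

// Upsert: keep the stored 'since' while the level is unchanged.
const UPSERT = `
INSERT INTO degradation_status (feature, level, capability, reason, since)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(feature) DO UPDATE SET
  since = CASE WHEN degradation_status.level = excluded.level
               THEN degradation_status.since ELSE excluded.since END,
  level = excluded.level,
  capability = excluded.capability,
  reason = excluded.reason`;

const fromRow = (row: Row): DegradationStatus => ({ ...row, reason: row.reason ?? undefined });

export function createDegradationStore(db: SqlDatabase) {
  db.exec(SCHEMA);
  const upsert = db.prepare(UPSERT);
  const selectOne = db.prepare("SELECT * FROM degradation_status WHERE feature = ?");
  const selectAll = db.prepare("SELECT * FROM degradation_status");

  return {
    save(status: DegradationStatus): void {
      // Bind NULL rather than undefined for a missing reason.
      upsert.run(status.feature, status.level, status.capability, status.reason ?? null, status.since);
    },
    get(feature: string): DegradationStatus | null {
      const row = selectOne.get(feature) as Row | undefined;
      return row ? fromRow(row) : null;
    },
    all(): DegradationStatus[] {
      return (selectAll.all() as Row[]).map(fromRow);
    },
  };
}
```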

Update your feat/graceful-degradation-wrapper branch with these adjustments and give us a ping. We will be delighted to review it again and accept this great structure! 🚀

diegosouzapw merged commit b6afa6c into diegosouzapw:main on Mar 30, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

[Feature] Graceful Degradation Pattern: Multi-Layer Fallback for External Dependencies
