feat(domain): add graceful degradation framework with multi-layer fallback#803
Conversation
…lback Add a standardized degradation pattern for services depending on external systems. withDegradation() tries primary → fallback → safe default, tracking status in a global registry for dashboard visibility. Features: - Async and sync variants - Global registry with per-feature status tracking - Degradation levels: full → reduced → minimal → default - Summary and report APIs for dashboard integration - Reason tracking for debugging Example: Rate limiting degrades from Redis → in-memory → permissive instead of crashing when Redis is unavailable. Closes diegosouzapw#799
| options?.onDegrade?.(status); | ||
| return { result: safeDefault, status }; | ||
| } | ||
| } |
There was a problem hiding this comment.
WARNING: Error handling inconsistency between async and sync versions.
The async version (withDegradation) extracts error messages properly:
primaryError instanceof Error ? primaryError.message : String(primaryError)The sync version (withDegradationSync) does not:
reason: `${primaryError} → ${fallbackError}`If primaryError or fallbackError is a plain object without a toString() override, this will produce unhelpful output like "[object Object] → [object Object]". Apply the same instanceof Error check to extract meaningful error messages.
Code Review SummaryStatus: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (1 file)
Other Observations
|
There was a problem hiding this comment.
Code Review
This pull request introduces a Graceful Degradation Framework designed to manage external service dependencies by providing a structured fallback mechanism (Primary → Fallback → Safe Default). It includes a global registry for tracking feature health and reporting. Feedback focuses on resolving potential race conditions in the global state updates, removing the unused 'minimal' degradation level for consistency, standardizing error handling between the sync and async implementations, and optimizing the reporting logic by moving static configuration out of the sort callback.
| function updateRegistry(feature: string, status: DegradationStatus): void { | ||
| const existing = registry.get(feature); | ||
| // Only update 'since' if level actually changed | ||
| if (existing && existing.level === status.level) { | ||
| status.since = existing.since; | ||
| } | ||
| registry.set(feature, status); | ||
| } |
There was a problem hiding this comment.
The global registry is a mutable state shared across all requests. The updateRegistry function reads from and then writes to the registry (registry.get() then registry.set()), which is not an atomic operation. This creates a race condition in a concurrent environment where multiple requests could be processing for the same feature.
For example, a 'full' status update from one request could be immediately overwritten by a 'reduced' status from another request that began its read operation before the first request wrote its update. This can lead to an inconsistent and incorrect state in the degradation report.
To address this, you should ensure that updates to the registry for a specific feature are serialized, for instance by using a queue or a locking mechanism for each feature key.
| */ | ||
|
|
||
| /** Degradation levels from best to worst */ | ||
| export type DegradationLevel = "full" | "reduced" | "minimal" | "default"; |
There was a problem hiding this comment.
The DegradationLevel type includes "minimal", and the documentation mentions a "Full → Reduced → Minimal → Safe Default" hierarchy. However, the withDegradation and withDegradationSync functions only implement a three-level fallback (primary, fallback, safe default), which correspond to "full", "reduced", and "default" levels. The "minimal" level is never used, which can be confusing for users of this new framework.
You should either update the framework to support a fourth "minimal" level, or remove "minimal" from the DegradationLevel type and update the documentation and related functions (getDegradationReport, getDegradationSummary) to reflect the three-level hierarchy. Removing it seems simpler if a fourth level is not immediately planned.
| export type DegradationLevel = "full" | "reduced" | "minimal" | "default"; | |
| export type DegradationLevel = "full" | "reduced" | "default"; |
| level: "default", | ||
| feature, | ||
| capability: options?.defaultCapability ?? "Safe default", | ||
| reason: `${primaryError} → ${fallbackError}`, |
There was a problem hiding this comment.
The reason for the default case in withDegradationSync is constructed differently than in withDegradation. It uses string interpolation on error objects, which can lead to unhelpful messages like [object Error]. To ensure consistent and useful error logging, you should adopt the same pattern as the async version, which checks instanceof Error and uses the message property.
Here's a suggested implementation for the catch block:
} catch (fallbackError) {
const reason = [
primaryError instanceof Error ? primaryError.message : String(primaryError),
fallbackError instanceof Error ? fallbackError.message : String(fallbackError),
].join(" → ");
const status: DegradationStatus = {
level: "default",
feature,
capability: options?.defaultCapability ?? "Safe default",
reason,
since: now,
};
updateRegistry(feature, status);
options?.onDegrade?.(status);
return { result: safeDefault, status };
}| const order: Record<DegradationLevel, number> = { | ||
| default: 0, | ||
| minimal: 1, | ||
| reduced: 2, | ||
| full: 3, | ||
| }; | ||
| return (order[a.level] ?? 4) - (order[b.level] ?? 4); |
There was a problem hiding this comment.
The order object is being created inside the sort callback function. This is inefficient as it will be re-created for every comparison made during the sort operation. For better performance, you should define the order constant outside of the sort callback.
Example:
export function getDegradationReport(): DegradationStatus[] {
const order: Record<DegradationLevel, number> = {
default: 0,
minimal: 1,
reduced: 2,
full: 3,
};
return Array.from(registry.values()).sort(
(a, b) => (order[a.level] ?? 4) - (order[b.level] ?? 4)
);
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67592d80aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| onDegrade?: (status: DegradationStatus) => void; | ||
| } | ||
| ): Promise<DegradedResult<T>> { | ||
| const now = new Date().toISOString(); |
There was a problem hiding this comment.
Record degradation timestamp at transition time
withDegradation (and its sync variant) captures now before running primary/fallback, then reuses that value for status.since in every outcome. When these calls block on retries/timeouts, the reported since can be significantly earlier than the actual level transition, which skews dashboard durations and any alerting keyed off this field. Capture the timestamp when each final status is built so it reflects the true transition moment.
Useful? React with 👍 / 👎.
| export function getFeatureStatus(feature: string): DegradationStatus | null { | ||
| return registry.get(feature) ?? null; |
There was a problem hiding this comment.
Return copies instead of mutable registry entries
getFeatureStatus returns the exact DegradationStatus object stored in the module-level registry. Any caller that mutates the returned object will mutate global degradation state in place, bypassing updateRegistry and potentially corrupting hasAnyDegradation()/summary results. Return a cloned snapshot (and similarly clone report entries) to keep registry state encapsulated.
Useful? React with 👍 / 👎.
|
Olá, Professor Igor! Obrigado pela contribuição fantástica! 🛡️ A ideia de um wrapper arquitetural Para que possamos avançar com o merge na O que precisa ser ajustado
Evolua o seu branch |
Summary
degradation.tstosrc/domain/withDegradation(feature, primary, fallback, safeDefault)Motivation
Many OmniRoute features depend on external systems (Redis, databases, SSH, external APIs). When these fail, the feature shouldn't crash — it should degrade gracefully. This module provides a reusable pattern.
Usage
Test plan
Closes #799
🤖 Generated with Claude Code