fix: background polling for dashboard health checks (Windows + Linux) by Lightheartdevs · Pull Request #519 · Light-Heart-Labs/DreamServer

Lightheartdevs · 2026-03-21T03:05:27Z

Summary

Dashboard health checks ran on every API request. On Docker Desktop (Windows/WSL2), the DNS lookup for each non-running service takes ~4s to fail. With 8+ disabled services, each request took 8-16 seconds, making the dashboard unusable.

Fix: Move health checks to a background polling loop. API endpoints return cached results instantly.

BEFORE:  Browser → /api/status → 19 health checks → 8-16s response
AFTER:   Background poll (every 10s) → cache
         Browser → /api/status → read cache → <350ms response

Changes

helpers.py (reverted to near-original, minimal additions):

Restored original shared aiohttp session (removed all caching/semaphore/heuristic workarounds from first attempt)
Increased timeout from 5s → 30s (invisible to users, since checks only run in the background)
Added asyncio.TimeoutError handling in _check_host_service_health (previously raised an unhandled exception)
Added get_cached_services() / set_services_cache() cache interface
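A minimal sketch of the helpers.py additions, assuming a plain module-level variable serves as the cache (workable because asyncio code runs on a single event-loop thread). The names get_cached_services() and set_services_cache() come from the PR, but their signatures are assumptions. check_service_health() is a hypothetical stand-in for _check_host_service_health; its HTTP request is injected as a `probe` coroutine so the sketch runs without aiohttp, and the "down" status for a timeout is also an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Module-level cache: written by the background poll, read by API endpoints.
_services_cache: Optional[list[dict[str, Any]]] = None


def set_services_cache(services: list[dict[str, Any]]) -> None:
    """Store the latest poll results (hypothetical signature)."""
    global _services_cache
    _services_cache = services


def get_cached_services() -> Optional[list[dict[str, Any]]]:
    """Return the last poll results, or None before the first poll finishes."""
    return _services_cache


async def check_service_health(
    name: str,
    probe: Callable[[], Awaitable[int]],
    timeout: float = 30.0,  # raised from 5s; only the background poll waits on it
) -> dict[str, Any]:
    """Hypothetical stand-in for _check_host_service_health.

    `probe` would perform the real HTTP request and return a status code.
    A timeout becomes a result instead of escaping as an unhandled exception.
    """
    try:
        status = await asyncio.wait_for(probe(), timeout=timeout)
    except asyncio.TimeoutError:
        return {"name": name, "status": "down", "error": "timeout"}
    return {"name": name, "status": "healthy" if status == 200 else "degraded"}
```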

main.py:

Added _poll_service_health() background task started on app startup
Added _get_services() async helper (cache-or-fallback)
Updated /services, /status, _build_api_status() to read from cache

routers/features.py:

Updated /api/features to read cached services

Test results

Platform	Healthy	Degraded	Response time
Windows Docker Desktop (RTX 5090)	11/11	0	328ms
Linux native Docker (Strix Halo)	18/18	0	~10ms

Behavior

Scenario	What happens
First 2 seconds after startup	No cache — falls back to live check
Normal operation	Background poll every 10s, API reads cache instantly
Service starts/stops	Detected within 10 seconds (next poll)
Poll fails	Logged, retried next cycle. Last good data retained
Multiple browser tabs	All read same cache — zero extra load

🤖 Generated with Claude Code

Root cause: Docker Desktop's embedded DNS takes ~4 seconds to return NXDOMAIN for non-running containers. With 19 services checked concurrently via asyncio.gather, the slow DNS lookups blocked running services from being checked in time, causing everything to show as "degraded" on the dashboard.

Fix (three-part):

1. Fresh session per poll cycle: eliminates stale connection pool issues. The global aiohttp session accumulated dead connections from non-running services, poisoning subsequent polls. Now each cycle creates a fresh session with force_close=True and use_dns_cache=False, then closes it.
2. Not-deployed cache with TTL: services that fail DNS get cached for 15 seconds. Subsequent polls skip them entirely, so the slow 4-second DNS lookup happens at most once per service per 15-second TTL window.
3. Two-phase polling: Phase 1 returns cached not_deployed results instantly. Phase 2 checks the remaining services with a semaphore (limit=4) to prevent DNS contention. Total timeout raised to 30s so the first poll (which has no cache) can complete even with slow DNS.

Net effect: the first poll takes ~4-5 seconds (DNS for non-deployed services); subsequent polls complete in <50ms. All running services show healthy with 1-5ms response times. No behavior change on native Linux Docker, where DNS failures are instant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
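The not-deployed TTL cache and two-phase polling from this commit message (the approach the Changes list says was later removed from helpers.py) can be sketched as follows. The 15-second TTL and the semaphore limit of 4 come from the message; `NotDeployed`, `poll_services`, the `probe` callable, and the result dicts are hypothetical, and the probe is injected so the sketch runs without aiohttp or Docker.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

NOT_DEPLOYED_TTL_S = 15.0  # from the commit message
MAX_CONCURRENT_CHECKS = 4  # semaphore limit from the commit message


class NotDeployed(Exception):
    """Hypothetical: raised by a probe when the service hostname fails DNS."""


# service name -> monotonic time at which it was last seen as not deployed
_not_deployed: dict[str, float] = {}

Probe = Callable[[str], Awaitable[dict[str, Any]]]


async def poll_services(names: list[str], probe: Probe) -> dict[str, dict[str, Any]]:
    now = time.monotonic()
    results: dict[str, dict[str, Any]] = {}

    # Phase 1: answer recently failed services from the TTL cache, no network.
    pending = []
    for name in names:
        seen = _not_deployed.get(name)
        if seen is not None and now - seen < NOT_DEPLOYED_TTL_S:
            results[name] = {"status": "not_deployed", "cached": True}
        else:
            pending.append(name)

    # Phase 2: probe the rest, at most MAX_CONCURRENT_CHECKS at a time.
    sem = asyncio.Semaphore(MAX_CONCURRENT_CHECKS)

    async def check(name: str) -> None:
        async with sem:
            try:
                results[name] = await probe(name)
                _not_deployed.pop(name, None)  # service is back; forget the failure
            except NotDeployed:
                _not_deployed[name] = time.monotonic()
                results[name] = {"status": "not_deployed", "cached": False}

    await asyncio.gather(*(check(n) for n in pending))
    return results
```

The semaphore bounds how many slow lookups overlap, while the TTL cache keeps a missing service from being re-probed on every cycle.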

Replaces request-triggered health checks with a background polling loop. API endpoints return cached results instantly (<1ms) instead of running live checks on every request (8-16s on Docker Desktop).

Architecture:
- Background task polls get_all_services() every 10 seconds
- Results stored in module-level cache
- All endpoints read from cache, falling back to a live check only on the first request before the poll completes

helpers.py changes (reverted from previous PR, minimal diff):
- Restored original shared aiohttp session pattern
- Increased total timeout from 5s to 30s (no user impact since it only runs in the background poll)
- Added asyncio.TimeoutError handling in _check_host_service_health (bug fix: was raising unhandled NameError)
- Added get_cached_services() / set_services_cache() for the background poll to write and endpoints to read

main.py changes:
- Added _poll_service_health() background task (started on app startup)
- Added _get_services() async helper for cache-or-live fallback
- Updated /services, /status, _build_api_status() to read from cache

routers/features.py:
- Updated /api/features to read cached services instead of a live check

Tested on:
- Windows Docker Desktop (RTX 5090): 11 healthy, 0 degraded, <350ms
- Linux native Docker (Strix Halo): 18/18 healthy (no regression)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Lightheartdevs and others added 2 commits March 20, 2026 23:05

Lightheartdevs changed the title ~~fix: dashboard health checks on Docker Desktop (Windows/WSL2)~~ fix: background polling for dashboard health checks (Windows + Linux) Mar 21, 2026

Lightheartdevs merged commit 7478e2c into main Mar 21, 2026
16 of 21 checks passed

Lightheartdevs merged 2 commits into main from fix/dashboard-health-check-windows