Skip to content

fix: support macOS system health metrics#4616

Open
jkobject wants to merge 2 commits into
nesquena:masterfrom
jkobject:fix/macos-system-health-metrics
Open

fix: support macOS system health metrics#4616
jkobject wants to merge 2 commits into
nesquena:masterfrom
jkobject:fix/macos-system-health-metrics

Conversation

@jkobject

Copy link
Copy Markdown

Summary

  • add a psutil fallback for aggregate CPU/RAM metrics when Linux procfs is unavailable
  • keep the existing procfs path for Linux system health metrics
  • cover the macOS/no-/proc path with a regression test

Test plan

  • ./scripts/test.sh tests/test_issue693_system_health_panel.py ✅ (8 passed)
  • ruff check api/system_health.py tests/test_issue693_system_health_panel.py
  • git diff --check
  • ./scripts/test.sh ran locally: 9318 passed, 103 skipped, 1 xfailed, 2 xpassed; 4 failures observed from local environment/config unrelated to this patch:
    • local bot_name is Clawd instead of default Hermes
    • TLS warning assertion output was preceded/masked by local missing Hermes Agent deps (requests, etc.)
    • hermes_state import path picks up an incomplete local Hermes Agent checkout

Copilot AI review requested due to automatic review settings June 21, 2026 10:34

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds cross-platform support for the system health (CPU/RAM) metrics by keeping the existing Linux /proc collectors and introducing a psutil fallback path for platforms where procfs is unavailable (e.g., macOS). It also updates regression tests to cover the “no /proc” behavior.

Changes:

  • Add a procfs-first, psutil-fallback implementation for aggregate CPU and memory metrics in api/system_health.py.
  • Add a regression test that simulates missing procfs and verifies the psutil fallback behavior.
  • Add psutil to the minimal server dependencies in requirements.txt.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
api/system_health.py Adds psutil fallback for CPU/RAM metrics when procfs access fails/unavailable.
tests/test_issue693_system_health_panel.py Adds a regression test for the macOS/no-procfs fallback path; updates safety assertions accordingly.
requirements.txt Adds psutil to minimal dependencies with explanatory comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/system_health.py
Comment on lines 10 to 13
import shutil
import time
from importlib import import_module
from datetime import datetime, timezone

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8aa5ea60: the docstring now documents the procfs-first + psutil fallback behavior, while preserving the aggregate/no-private-process-data contract.

@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds a psutil fallback for aggregate CPU and RAM metrics when Linux procfs is unavailable (e.g. macOS), keeping the existing procfs path as the preferred route on Linux. Three new regression tests cover the macOS path, ensure procfs parse RuntimeErrors still propagate, and verify no double-sleep occurs on a mid-request procfs failure.

  • _cpu_percent and _memory_usage now catch OSError (not the broader Exception) to trigger psutil, so internal RuntimeError sentinels from malformed procfs output still surface as error entries.
  • The second-read fallback uses psutil.cpu_percent(interval=0.0) (no sleep) instead of re-sleeping, avoiding the double-_CPU_SAMPLE_SECONDS wall-clock delay previously noted in review.
  • psutil>=5.9 is added to requirements.txt as a hard runtime dependency; the lazy import_module("psutil") pattern keeps the procfs hot path free of psutil overhead.

Confidence Score: 5/5

Safe to merge; fallback paths are narrowly scoped to OSError, parse-level RuntimeErrors propagate correctly, and no double-sleep occurs on partial procfs failure.

All three concerns from prior review rounds are addressed: OSError-only catch keeps parse errors visible, second-read fallback uses interval=0.0, and the new test suite covers each branch explicitly.

No files require special attention.

Important Files Changed

Filename Overview
api/system_health.py Adds OSError-gated psutil fallback for _cpu_percent and _memory_usage; RuntimeError parse errors correctly propagate; double-sleep fixed by using interval=0.0 in second-read fallback
tests/test_issue693_system_health_panel.py Adds three new regression tests covering: psutil fallback on missing procfs, RuntimeError propagation from parse errors, and no-double-sleep in partial-failure path
requirements.txt Adds psutil>=5.9 as a hard runtime dependency for cross-platform CPU/RAM metrics

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_cpu_percent called] --> B[_read_proc_stat_cpu start]
    B -- OSError --> C[psutil.cpu_percent interval=_CPU_SAMPLE_SECONDS]
    B -- success --> D[time.sleep _CPU_SAMPLE_SECONDS]
    D --> E[_read_proc_stat_cpu end]
    E -- OSError --> F[psutil.cpu_percent interval=0.0]
    E -- RuntimeError --> G[propagates to caller]
    E -- success --> H[_cpu_delta_percent]
    C --> R[return clamped percent]
    F --> R
    H --> R
    A2[_memory_usage called] --> B2[_read_meminfo_kib]
    B2 -- OSError --> C2[psutil.virtual_memory]
    C2 --> D2[compute total and available]
    B2 -- success --> E2[parse MemTotal and MemAvailable]
    D2 --> F2[return used/total/percent dict]
    E2 --> F2
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[_cpu_percent called] --> B[_read_proc_stat_cpu start]
    B -- OSError --> C[psutil.cpu_percent interval=_CPU_SAMPLE_SECONDS]
    B -- success --> D[time.sleep _CPU_SAMPLE_SECONDS]
    D --> E[_read_proc_stat_cpu end]
    E -- OSError --> F[psutil.cpu_percent interval=0.0]
    E -- RuntimeError --> G[propagates to caller]
    E -- success --> H[_cpu_delta_percent]
    C --> R[return clamped percent]
    F --> R
    H --> R
    A2[_memory_usage called] --> B2[_read_meminfo_kib]
    B2 -- OSError --> C2[psutil.virtual_memory]
    C2 --> D2[compute total and available]
    B2 -- success --> E2[parse MemTotal and MemAvailable]
    D2 --> F2[return used/total/percent dict]
    E2 --> F2
Loading

Reviews (2): Last reviewed commit: "fix: narrow system health psutil fallbac..." | Re-trigger Greptile

Comment thread api/system_health.py Outdated
Comment thread api/system_health.py Outdated
Comment thread api/system_health.py
@nesquena-hermes nesquena-hermes added the size:M Medium PR (≤10 files, ≤250 LOC) label Jun 21, 2026
@jkobject

Copy link
Copy Markdown
Author

Updated in 8aa5ea60 for the new review feedback:

  • Updated the module docstring: procfs-first, psutil fallback, still aggregate/no private process data.
  • Narrowed CPU/RAM fallbacks from except Exception to except OSError, so procfs parse/invariant RuntimeErrors still surface as system-health errors instead of silently falling back.
  • Split the CPU sampling path so a second /proc/stat read failure after the sleep uses psutil.cpu_percent(interval=0.0) and does not sleep a second time.
  • Added regression tests for procfs parse errors remaining visible and for the no-double-sleep second-read fallback.

Validation:

  • ./scripts/test.sh tests/test_issue693_system_health_panel.py -q10 passed
  • ruff check api/system_health.py tests/test_issue693_system_health_panel.py → passed
  • git diff --check → passed

@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Pulled the branch (8aa5ea60) and read api/system_health.py, requirements.txt, and the new tests against origin/master. The fallback logic is correct and the OSError-only narrowing is the right call. One design question on the dependency declaration that I think deserves a maintainer decision before merge.

The fallback logic is sound

The OSError gate cleanly separates "this platform has no procfs" from "procfs is present but malformed." _read_proc_stat_cpu / _read_meminfo_kib raise RuntimeError on parse/invariant failures (system_health.py:46,49,53,117) and let FileNotFoundError (an OSError) escape when the file is absent. So catching OSError only (:76,82,108) means a missing /proc triggers psutil while a corrupt /proc/stat still surfaces as a real error — exactly right. The second-read fallback using interval=0.0 (:82) avoids the double-sleep correctly since the first _CPU_SAMPLE_SECONDS sleep already elapsed.

The dependency declaration is the one thing to weigh

The PR adds psutil>=5.9 as a hard runtime dependency:

pyyaml>=6.0
cryptography>=42.0
psutil>=5.9

But psutil is only ever reached on the non-Linux fallback path, and the module already degrades gracefully when it's absent. import_module("psutil") raises ModuleNotFoundError (an ImportError, not an OSError), so it propagates past the fallback to build_system_health_payload's collector loop (:175):

except Exception as exc:
    errors.append(_safe_error(name, exc))

which emits a safe {"metric": "cpu", "code": "ModuleNotFoundError"} and the panel reports partial/unavailable. So a macOS user without psutil sees a clean degraded panel rather than a crash — psutil functionally behaves like an optional enhancement, not a hard requirement.

That matters because the repo has a documented stance against exactly this. The comment block right above your change in requirements.txt spells out the optional-dependency pattern for Edge TTS:

OPTIONAL: ... It is intentionally NOT a hard dependency — the /api/tts endpoint returns 503 with an install hint when it's absent.

and api/agent_health.py:7 states the heartbeat was written to work "without shelling out or adding psutil as a hard dependency." Adding psutil to the base requirements.txt makes every install — including the Linux/VPS majority that never touches the fallback — pull a compiled wheel they'll never exercise.

Recommendation

Two coherent options, maintainer's call:

  1. Keep it optional (consistent with the repo's pattern). Drop psutil from requirements.txt, keep the lazy import_module exactly as-is, and lean on the existing graceful degradation. Optionally enrich _safe_error so a ModuleNotFoundError on the fallback path maps to a friendlier "install psutil for macOS metrics" hint, mirroring the TTS 503-with-hint UX.

  2. Keep it hard but document the divergence. If the maintainer wants macOS metrics to work out-of-the-box, the hard dep is defensible — but then the agent_health.py:7 rationale and the "minimal Python dependencies" header are now stale and should be reconciled in the same PR so the next contributor doesn't read contradictory guidance.

I'd lean option 1 since the code already proves the optional path works, but it's a project-philosophy choice rather than a correctness one. Everything else (tests at test_issue693_system_health_panel.py covering the missing-procfs, RuntimeError-propagation, and no-double-sleep branches; 8 passed10 passed) looks merge-ready.

@nesquena-hermes nesquena-hermes left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @jkobject — the macOS procfs fallback is a real gap, and the implementation approach (wrapping the OSError in _cpu_percent / _memory_usage) is the right pattern. The code path is correct: on Linux procfs always works, so psutil is never reached.

One blocker before this can ship: psutil>=5.9 is currently in requirements.txt as a hard dependency, which means every WebUI install (including Linux servers where procfs is always available and psutil is never called) pulls in a binary wheel (~1 MB, C extensions) unconditionally.

The pattern in this repo for optional platform-specific deps is the edge-tts model: it's listed in requirements.txt as a comment (# pip install edge-tts if you want server-side TTS) but not added to the install list. The code then does a try/except ImportError in the actual call site.

Fix needed:

  1. Remove psutil>=5.9 from requirements.txt (or move it to the optional comment section matching the edge-tts pattern)
  2. In _cpu_percent and _memory_usage, change import_module("psutil") to a try/except ImportError that raises RuntimeError("psutil_unavailable") when psutil isn't installed — the panel already handles RuntimeError gracefully (shows "unavailable" for that metric)
  3. Update the test to cover the "psutil not installed" case (monkeypatch sys.modules to simulate missing psutil) alongside the existing fallback test

With those changes, on a Linux server where psutil isn't installed: no behavioural change (procfs is always hit first). On macOS with psutil installed: the fallback works as designed. On macOS without psutil: the metric shows "unavailable" — acceptable.

Happy to re-review once updated. The underlying approach is sound; just the dependency declaration needs to match the opt-in pattern.

@nesquena-hermes nesquena-hermes left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @jkobject — the macOS procfs fallback is a real gap, and the implementation approach (wrapping the OSError in _cpu_percent / _memory_usage) is the right pattern. The code path is correct: on Linux procfs always works, so psutil is never reached.

One blocker before this can ship: psutil>=5.9 is currently in requirements.txt as a hard dependency, which means every WebUI install (including Linux servers where procfs is always available and psutil is never called) pulls in a binary wheel (~1 MB, C extensions) unconditionally.

The pattern in this repo for optional platform-specific deps is the edge-tts model: it's listed in requirements.txt as a comment (# pip install edge-tts if you want server-side TTS) but not added to the install list. The code then does a try/except ImportError in the actual call site.

Fix needed:

  1. Remove psutil>=5.9 from requirements.txt (or move it to the optional comment section matching the edge-tts pattern)
  2. In _cpu_percent and _memory_usage, change import_module("psutil") to a try/except ImportError that raises RuntimeError("psutil_unavailable") when psutil isn't installed — the panel already handles RuntimeError gracefully (shows "unavailable" for that metric)
  3. Update the test to cover the "psutil not installed" case (monkeypatch sys.modules to simulate missing psutil) alongside the existing fallback test

With those changes, on a Linux server where psutil isn't installed: no behavioural change (procfs is always hit first). On macOS with psutil installed: the fallback works as designed. On macOS without psutil: the metric shows "unavailable" — acceptable.

Happy to re-review once updated. The underlying approach is sound; just the dependency declaration needs to match the opt-in pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M Medium PR (≤10 files, ≤250 LOC)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants