
[Bug]: hermes status reports Gateway as 'stopped' when running as PID 1 in Docker/Kubernetes #4776

@denysgaievskyi

Description


Bug Description

When hermes gateway runs as the container entrypoint (PID 1), which is the standard deployment model for Docker and Kubernetes, hermes status reports the gateway as stopped even though it is fully functional and handling requests.

This is because status.py checks systemctl --user is-active hermes-gateway, which returns "inactive" since the process was not started via systemd. The gateway IS running (confirmed via ps aux), but the status check doesn't detect it.

Related to #3724 (which covers the crash when systemctl is missing entirely). This issue is the complementary case: systemctl exists on the host but correctly reports "inactive" because the gateway bypasses systemd.
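For reference, a systemd-only check like the one described above can be sketched as follows. This is a hypothetical reconstruction, not the actual code in hermes_cli/status.py. The function name systemd_unit_active and its tri-state return value are assumptions. The sketch separates the #3724 case (systemctl missing, returns None) from this issue's case (systemctl present but the unit is inactive, returns False).

```python
import shutil
import subprocess
from typing import Optional


def systemd_unit_active(unit: str = "hermes-gateway") -> Optional[bool]:
    """Hypothetical helper: ask the systemd user manager whether `unit` is active.

    Returns True if systemctl reports "active", False for any other answer
    (e.g. "inactive" when the gateway runs as PID 1 outside systemd), and
    None if systemctl is missing or could not be run (the #3724 case).
    """
    if shutil.which("systemctl") is None:
        return None
    try:
        result = subprocess.run(
            ["systemctl", "--user", "is-active", unit],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # `is-active` prints the unit state on stdout. Anything other than
    # "active", including empty output when the user manager is unreachable,
    # is treated as not active.
    return result.stdout.strip() == "active"
```

Under this sketch, both a False and a None result are the points where a process-based fallback would have to take over.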

Steps to Reproduce

  1. Build a Docker image with Hermes installed
  2. Set CMD ["gateway"] or ENTRYPOINT ["hermes", "gateway"]
  3. Run the container (or deploy to Kubernetes)
  4. Inside the container, hermes status reports:
    ◆ Gateway Service
      Status:       ✗ stopped
      Manager:      systemd (user)

  5. However, ps aux | grep hermes shows the gateway alive as PID 1:
    hermes  1  0.6  2.2 692240 180996 ?  Ssl  13:08  0:02 /usr/bin/python3 /usr/local/bin/hermes gateway


Expected Behavior

hermes status should detect that the gateway process is running, regardless of whether it was started via systemd. A fallback check (e.g., checking if PID 1 is hermes gateway, or checking if the gateway port is bound) would correctly report "running" in container environments.

Actual Behavior

hermes status reports "stopped" because the systemd unit is not active, even though the gateway process is alive as PID 1.

Affected Component

CLI (hermes_cli/status.py, lines ~282-299)

Messaging Platform (if gateway-related)

All (Slack, Telegram, etc.) — the gateway serves all platforms

Operating System

Linux (Docker / Kubernetes — tested on EKS with Debian 13.4 base image)

Python Version

3.13.5

Hermes Version

v0.6.0 (commit 49d7210)

Proposed Fix (optional)

Add a process-based fallback when systemd reports inactive:

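import subprocess
import sys

if sys.platform.startswith('linux'):
    # ... existing systemctl check ...
    if not is_active:
        # Fallback: check if hermes gateway is running as a plain process
        # (e.g. as PID 1 when it is the container entrypoint).
        try:
            result = subprocess.run(
                ["pgrep", "-f", "hermes gateway"],
                capture_output=True, text=True, timeout=5,
            )
            # pgrep exits with status 0 when at least one process matches.
            is_active = result.returncode == 0
            if is_active:
                print("  Status:       ✓ running (direct process, not systemd)")
                print("  Manager:      none (PID 1 / container entrypoint)")
        except (OSError, subprocess.TimeoutExpired):
            # pgrep is not installed or did not answer in time: keep the systemd result.
            pass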
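The pgrep approach has two caveats. First, pgrep comes from procps, which minimal base images may not include. Second, -f matches the pattern anywhere in the full command line, so a process such as grep 'hermes gateway' notes.txt would also count as a match. A pgrep-free alternative is to scan /proc directly and match on argv. The sketch below is minimal and assumes Linux with /proc mounted. The name find_gateway_pids is hypothetical, and the matching rule (an executable named hermes immediately followed by the argument gateway) is an assumption about how the gateway is launched.

```python
import os


def find_gateway_pids(proc_root: str = "/proc") -> list[int]:
    """Return the PIDs whose argv contains `hermes` (any path) followed by `gateway`."""
    pids = []
    try:
        entries = list(os.scandir(proc_root))
    except OSError:
        return pids  # no /proc (e.g. macOS), so nothing to report
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/cpuinfo
        try:
            with open(os.path.join(entry.path, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited mid-scan or is not readable
        # cmdline is the NUL-separated argv; drop the trailing empty field.
        argv = [p.decode(errors="replace") for p in raw.split(b"\0") if p]
        if any(
            os.path.basename(a) == "hermes" and b == "gateway"
            for a, b in zip(argv, argv[1:])
        ):
            pids.append(int(entry.name))
    return sorted(pids)
```

In the fallback branch, is_active = bool(find_gateway_pids()) could replace the pgrep call. Checking 1 in find_gateway_pids() would additionally separate the container-entrypoint case from a gateway started some other way, so the "PID 1 / container entrypoint" label is only printed when it is actually true.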

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
