Skip to content

Hypervisor/SagaOrchestrator: async-without-await methods create a mixed sync/async surface (footgun) #3178

Description

@liamcrumm

Summary

Hypervisor (and SagaOrchestrator) expose a mixed sync/async surface. Several methods are declared async def but never await anything, while sibling methods that operate on the same objects are plain def. This is a footgun: an un-awaited coroutine is silently truthy and the failure surfaces later, far from the call site.

Important correction to the original framing

The original report described this as an "annotation bug" (e.g. async def create_session(...) -> ManagedSession claiming a non-coroutine return). That part is not a bug: annotating an async def with the awaited type is correct Python, and type checkers (mypy/pyright) already understand the call returns a coroutine and would flag .session_id on an un-awaited result. Annotating these as Coroutine[...] / Awaitable[...] would actively mislead type checkers and should not be done.

The genuine issues are:

  1. Async-without-await. In agent-governance-python/agent-hypervisor/src/hypervisor/core.py, create_session, join_session, activate_session, terminate_session, verify_behavior, and monitor_sessions are async def but contain no await (verified: zero internal await points). They could be synchronous.
  2. Mixed surface. get_session (and active_sessions, sessions, session_count, _get_session, _commit_audit, _cleanup_session) are sync, so the same object has some async and some sync methods with no signal of which is which.
  3. SagaOrchestrator. execute_step / compensate are async and their executor / compensator arguments MUST be coroutine functions; passing a plain sync function silently no-ops the awaited call (await sync_fn() raises, but a sync function that returns a non-awaitable is the trap when wrapped). This is the same class of "silently does nothing" failure.

Why a fix is non-trivial (needs a maintainer decision)

The codebase actually leans sync: get_session and all the helpers are already def, and they are called from sync contexts (CLI cmd_inspect / cmd_kill in cli/session_commands.py, and _get_managed in api/server.py). So the consistent surface is sync, not async.

But either direction is a broad, breaking change:

Direction Effect
Make the 6 methods sync (correct/uniform) Touches ~216 await call sites across 6 files (core.py, api/server.py, tests/unit/test_cli.py, tests/integration/test_scenarios.py, tests/integration/test_hypervisor_e2e.py, tests/test_agent_manager.py). Breaks the awaited FastAPI handlers in api/server.py.
Make get_session + helpers async (uniform async) Smaller src change but cascades async into currently-sync CLI/API helpers and keeps the no-await footgun.

This is a public, cross-cutting API-shape change, which per the repo's contribution rules warrants a maintainer decision before implementation.

Evidence

  • inspect.iscoroutinefunction(Hypervisor.create_session) is True; calling it without await returns a coroutine, and coro.session_id is None-ish / attribute access fails later, not at the call site.
  • grep confirms none of the 6 methods contain await.
  • ~216 awaited call sites depend on the current async shape.

Proposed next step

Maintainers to decide sync-vs-async for the Hypervisor session surface (recommendation: sync, since no method awaits and the helpers/callers are already sync). Once decided, implement in one focused PR that updates all call sites and tests, and add a SagaOrchestrator guard or docstring making the coroutine-executor contract explicit.

Filed from a code-review pass; deliberately not auto-fixed because it is a breaking public API-shape change.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions