Where the fix lives: the actual line that needs changing is in the private Metta-AI/metta repo at app_backend/src/metta/app_backend/routes/coworld_routes.py:737 (and the symmetric replay-proxy hop at :826). Filing here because Metta-AI/metta has issues disabled, and this surfaces through the public coworld hosted-game CLI.
Summary
Every WebSocket connection to a coworld hosted-game play session is forcibly closed at ~40 seconds, regardless of game state, slot count, or client behavior. The cutoff matches the Python websockets library's default ping watchdog (ping_interval=20 + ping_timeout=20) on the proxy's upstream connection to the in-cluster game pod. This makes hosted-play unusable for any Among Them session, since lobby + RoleReveal alone is ~10 s and any meaningful play sits well above 40 s.
This is hosted-play-only. Tournament/league episodes use direct in-cluster service WS (packages/coworld/src/coworld/runner/kubernetes_runner.py:421) and don't traverse the FastAPI play-session proxy, so they are unaffected.
Reproduction
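Fresh coworld hosted-game create cow_a7418f9b-…-91bb9655bc76 --variant default, three configurations. Both input-echoing runs close at the same wall-clock time; the no-echo run closes earlier with a different close code (discussed under Root cause):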
| Setup | Frames received | Closed at | Close code |
|---|---|---|---|
| 1 slot, raw client, no ping, no input echo | 650 | +27.1 s | 1012 (Service Restart) |
| 1 slot, raw client, no ping, input echo every frame | 971 | +40.3 s | 1006 (abnormal, no close frame) |
| 8 slots claimed (anonymous), 8 raw clients, all echoing input | ~974 each | +40.3 s (within 0.1 s of each other) | 1006 |
974 frames at 24 Hz ≈ 40.6 s — matches ping_interval=20 + ping_timeout=20. Session status stays ready/running and frames flow continuously at 24 fps for the whole window — not an idle timeout, not a natural game-end.
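The frame-count arithmetic above can be checked with a few lines of Python. This is only a sanity-check sketch: the 24 Hz rate and the frame counts come from the table, while the helper name `frames_to_seconds` is made up for illustration.

```python
# Convert observed frame counts into elapsed seconds at the reported frame rate.
FRAME_RATE_HZ = 24          # frame rate reported in this issue
DEFAULT_WATCHDOG_S = 20 + 20  # ping_interval + ping_timeout (library defaults)


def frames_to_seconds(frames: int, rate_hz: float = FRAME_RATE_HZ) -> float:
    """Elapsed time implied by a frame count at a fixed frame rate."""
    return frames / rate_hz


for frames in (650, 971, 974):
    elapsed = frames_to_seconds(frames)
    print(f"{frames} frames -> {elapsed:.1f} s (watchdog window: {DEFAULT_WATCHDOG_S} s)")
```

The two echoing runs land within about half a second of the 40 s window, while the 650-frame run stops well short of it.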
Full reproducer (~30 LOC, stdlib + websockets):
import asyncio, json, subprocess, time, urllib.parse, urllib.request
import websockets

CLI = "/path/to/coworld"
SERVER = "https://api.observatory.softmax-research.net"
COWORLD_ID = "cow_a7418f9b-4f4e-4f93-bfa4-91bb9655bc76"  # among_them

session = json.loads(subprocess.check_output([
    CLI, "hosted-game", "create", COWORLD_ID, "--variant", "default", "--json",
]))
session_id = session["session_id"]

# Anonymous join (no auth) bypasses the same-user-returns-same-slot shortcut
req = urllib.request.Request(
    f"{SERVER}/v2/coworlds/play/session/{session_id}/join", method="POST", data=b"")
req.add_header("content-type", "application/json")
with urllib.request.urlopen(req) as r:
    join = json.load(r)
ws_url = urllib.parse.parse_qs(urllib.parse.urlsplit(join["player_url"]).query)["address"][0]

async def watch():
    start = time.monotonic()
    frames = 0
    async with websockets.connect(ws_url, ping_interval=None, max_size=None) as ws:
        try:
            async for msg in ws:
                frames += 1
                if isinstance(msg, (bytes, bytearray)) and len(msg) == 8192:
                    await ws.send(bytes([0, 0]))  # NOOP input
        except websockets.exceptions.ConnectionClosed as exc:
            print(f"closed at +{time.monotonic() - start:.1f}s frames={frames} code={exc.code}")

asyncio.run(watch())
Consistent output across runs:
closed at +40.3s frames=971 code=1006
Root cause
The connection chain has two stitched WS sessions:
[client] <— Session A (wss/443) —> [FastAPI proxy] <— Session B (TCP→WS) —> [Among Them (mummy)]
                                                      via coworld_play_proxy.py
                                                      (raw TCP pipe; not a WS proxy)
Metta-AI/metta:app_backend/src/metta/app_backend/routes/coworld_routes.py:737:
async with websockets.connect(target_url, additional_headers=headers, ssl=ssl_context) as upstream:
    await websocket.accept()
    upstream_task = asyncio.create_task(_upstream_to_websocket(websocket, upstream))
    downstream_task = asyncio.create_task(_websocket_to_upstream(websocket, upstream))
    done, pending = await asyncio.wait({upstream_task, downstream_task}, return_when=asyncio.FIRST_COMPLETED)
    ...
No ping_interval / ping_timeout are passed → websockets.connect uses the library defaults (20 + 20). Session B sends its first ping at t=20 s and gives up if no pong by t=40 s. When Session B closes, asyncio.wait(..., FIRST_COMPLETED) fires, cancels the downstream task, and Session A closes — which is what clients observe.
Either mummy isn't ponging, or pongs aren't surviving the in-cluster TCP pipe (less likely, since coworld_play_proxy.py is byte-transparent). Either way, the proxy's own watchdog is what ends the session at exactly its default window. The 1006 that clients report means the connection dropped without a close frame, which is consistent with the proxy tearing down Session A once Session B's ping times out. The 27 s / 1012 outlier in the no-echo run is a separate, graceful close from upstream (probably a no-client-input watchdog inside the game container).
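To make the teardown path concrete, here is a minimal stdlib-only sketch of the relay pattern described above. It is not the code in coworld_routes.py: the two coroutines are hypothetical stand-ins for the real relay functions, the watchdog is simulated with a short sleep instead of a real 40 s ping timeout, and the cancel loop is an assumption about what the elided `...` does.

```python
import asyncio


async def upstream_to_client(watchdog_s: float) -> str:
    # Stand-in for the upstream relay: it ends when the simulated
    # ping watchdog on Session B fires.
    await asyncio.sleep(watchdog_s)
    return "upstream closed"


async def client_to_upstream() -> str:
    # Stand-in for the downstream relay: a healthy client keeps it alive forever.
    await asyncio.Event().wait()
    return "unreachable"


async def proxy_session(watchdog_s: float) -> tuple[str, bool]:
    upstream_task = asyncio.create_task(upstream_to_client(watchdog_s))
    downstream_task = asyncio.create_task(client_to_upstream())
    # Whichever relay finishes first ends the whole session.
    done, pending = await asyncio.wait(
        {upstream_task, downstream_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result(), downstream_task.cancelled()


if __name__ == "__main__":
    # Scaled-down watchdog (50 ms instead of 40 s) so the sketch runs instantly.
    print(asyncio.run(proxy_session(0.05)))
```

The point of the sketch is that the downstream relay is cancelled even though the client side did nothing wrong, which matches what clients observe on Session A.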
Suggested fix
One-line change at coworld_routes.py:737 (and the symmetric replay-proxy site at :826). Pick one:
# Option A: disable upstream-side pinging entirely.
# Simple. Relies on TCP/mummy to surface dropped peers.
async with websockets.connect(
    target_url, additional_headers=headers, ssl=ssl_context,
    ping_interval=None,
) as upstream:

# Option B: pragmatic longer window (recommended).
# Raises the watchdog to 180 s; long enough for a real Among Them game.
async with websockets.connect(
    target_url, additional_headers=headers, ssl=ssl_context,
    ping_interval=120, ping_timeout=60,
) as upstream:
Option B is the safer middle ground.
Independently, worth checking whether mummy is in fact auto-pong-ing in this deployment — if it isn't, that's a real bug in bitworld/among_them/server.nim and the metta-side change is defense-in-depth.
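One way to run that check is to send a single ping by hand and see whether a pong comes back. The sketch below assumes a connection object whose `ping()` coroutine returns an awaitable that resolves when the pong arrives, as the asyncio client in `websockets` does; the helper name `peer_answers_ping` and the 5 s timeout are illustrative choices, not anything from the metta or bitworld code.

```python
import asyncio


async def peer_answers_ping(ws, timeout_s: float = 5.0) -> bool:
    """Send one ping and report whether a pong arrives within timeout_s.

    `ws` is assumed to expose `await ws.ping()` returning an awaitable
    that completes when the matching pong is received.
    """
    pong_waiter = await ws.ping()
    try:
        await asyncio.wait_for(pong_waiter, timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False
```

A hedged usage sketch: open a connection to the game server with `websockets.connect(url, ping_interval=None)` so the library's own keepalive stays out of the way, then `print(await peer_answers_ping(ws))`. The `url` here has to be whatever address reaches mummy in the deployment under test. A `False` result would point at the server side rather than the proxy.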
Impact
coworld hosted-game is currently effectively broken for Among Them — default variant has roleRevealTicks=120 (5 s) and startWaitTicks=120 (5 s) before Playing phase even begins, so by the time a game reaches its first interesting tick the 40 s proxy watchdog has already fired. Anyone using hosted-play for bot-vs-bot or human-vs-bot Among Them sees their connection silently drop ~40 s in.