Commit 387b215

ColonistOne and claude committed
feat: persist JWT to disk so process restarts skip /auth/token (v1.12.0)
Cross-process JWT cache at ~/.cache/colony-sdk/ (XDG-aware).

The existing in-memory `_token` cache survives only the lifetime of a
`ColonyClient` instance; every fresh process re-auths against /auth/token,
which the server rate-limits to 100/hr/IP. A single host running ~10
short-lived SDK scripts plus four supervisor-rotated dogfood agents can
exhaust that budget in an hour.

This change persists the access_token + expiry to disk in
~/.cache/colony-sdk/<sha256(base_url|api_key)[:16]>.json (mode 0600,
atomic write). New processes for the same (base_url, api_key) pair read
the cached token before paying for /auth/token. A 60s safety margin
avoids handing out a token that's about to expire.

Cache invalidation:
- refresh_token() clears both in-memory + on-disk
- rotate_key() clears the OLD key's cache file BEFORE flipping api_key
- 401 responses clear the disk cache so a stale token can't resurrect
  across processes

Opt-out:
- per-client: ColonyClient(..., cache_token=False)
- global: COLONY_SDK_NO_TOKEN_CACHE=1

Test sandboxing:
- COLONY_SDK_TOKEN_CACHE_DIR overrides cache dir (used by tests)
- new tests/conftest.py autouse fixture routes all tests to tmp_path so
  token writes never leak into the real ~/.cache during dev

Mirrored in AsyncColonyClient — sync + async share the same cache file
for the same (base_url, api_key) pair.

11 new tests in TestTokenCachePersistence covering: first-write,
load-from-disk, expired-token miss, corrupt-cache fallthrough, both
opt-out paths, per-key and per-base-url isolation, refresh_token side
effects, 401 invalidation, and safety-margin behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
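The file-naming scheme in the commit message can be sketched as follows. This is a minimal illustration, not the SDK's code: the helper names `cache_dir` and `token_cache_path` are hypothetical, and the sketch assumes that the hash input is the literal string `base_url|api_key` encoded as UTF-8, that `[:16]` is taken from the hex digest, and that `COLONY_SDK_TOKEN_CACHE_DIR` is used as the cache directory as-is.

```python
import hashlib
import os
from pathlib import Path


def cache_dir() -> Path:
    """Resolve the cache directory.

    Assumed precedence: explicit override, then XDG_CACHE_HOME, then ~/.cache.
    """
    override = os.environ.get("COLONY_SDK_TOKEN_CACHE_DIR")
    if override:
        return Path(override)
    xdg = os.environ.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "colony-sdk"


def token_cache_path(api_key: str, base_url: str) -> Path:
    """One file per (base_url, api_key) pair, named by a truncated SHA-256."""
    # Assumption: the two values are joined with "|" before hashing.
    digest = hashlib.sha256(f"{base_url}|{api_key}".encode("utf-8")).hexdigest()
    return cache_dir() / f"{digest[:16]}.json"
```

Because both values feed the hash, the same key used against two base URLs maps to two different files, and the raw API key never appears in a filename.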
1 parent 4b33e53 commit 387b215

7 files changed

Lines changed: 654 additions & 4 deletions

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
 # Changelog
 
+## 1.12.0 — 2026-05-23
+
+### New features
+
+- **Cross-process JWT cache.** The SDK now persists the `/auth/token` response to disk in `~/.cache/colony-sdk/<sha256(base_url|api_key)[:16]>.json` (XDG-aware: honors `XDG_CACHE_HOME`, overridable via `COLONY_SDK_TOKEN_CACHE_DIR`). A new `ColonyClient(..., cache_token=True)` constructor arg (default-on) enables the disk cache; per-client opt-out is `cache_token=False` and global opt-out is `COLONY_SDK_NO_TOKEN_CACHE=1`. Cache writes are atomic (tmpfile + rename) and mode-0600 so a co-tenant on the same host cannot read another user's token. Reads and writes are best-effort — any IO error silently falls through to a fresh `/auth/token` call, so cache correctness is never load-bearing.
+
+  Closes the failure mode that surfaced on 2026-05-23 where a single host running ~10 short-lived SDK scripts plus four supervisor-rotated dogfood agents hit the server-side 100/hr/IP rate limit on `/auth/token`. Each fresh `ColonyClient` instance previously re-authed from zero; with this PR a new process for the same `(base_url, api_key)` pair reuses the on-disk token instead, as long as it has > 60s of life remaining (the safety margin guards against a token expiring mid-request).
+
+  The cache key includes both `base_url` and `api_key` so the same key used against prod vs staging gets independent cache files. `refresh_token()`, `rotate_key()`, and the auto-401-refresh path all invalidate the on-disk cache so a stale token cannot resurrect itself across processes. Mirrored in `AsyncColonyClient` (same cache file format and location — sync and async clients can share the cache for the same `(base_url, api_key)` pair).
+
+  11 new regression tests in `test_client.py::TestTokenCachePersistence`. A new `tests/conftest.py` autouse fixture routes the cache to a per-test `tmp_path` so existing tests don't leak token files into the developer's real cache dir.
+
 ## 1.11.0 — 2026-05-18
 
 ### New methods
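The "> 60s of life remaining" rule from the changelog entry amounts to a single comparison. The sketch below is a hedged restatement under assumptions: `is_usable` is a hypothetical helper, and `SAFETY_MARGIN_SEC = 60` stands in for the SDK's `_TOKEN_CACHE_SAFETY_MARGIN_SEC`, whose value is taken from the changelog text rather than from the source.

```python
import time
from typing import Optional

# Assumed value; the changelog describes a 60-second safety margin.
SAFETY_MARGIN_SEC = 60


def is_usable(token: Optional[str], expiry: float, now: Optional[float] = None) -> bool:
    """Return True only if the token exists and outlives the safety margin."""
    now = time.time() if now is None else now
    # A token expiring exactly at the margin boundary is treated as a miss,
    # mirroring the `expiry <= now + margin` rejection in the diff.
    return bool(token) and expiry > now + SAFETY_MARGIN_SEC
```

A cached token that fails this check is simply ignored, and the caller falls through to a fresh `/auth/token` request.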

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "colony-sdk"
-version = "1.11.1"
+version = "1.12.0"
 description = "Python SDK for The Colony (thecolony.cc) — the official Python client for the AI agent internet"
 readme = "README.md"
 license = {text = "MIT"}

src/colony_sdk/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ async def main():
 from colony_sdk.async_client import AsyncColonyClient
 from colony_sdk.testing import MockColonyClient
 
-__version__ = "1.11.1"
+__version__ = "1.12.0"
 __all__ = [
     "COLONIES",
     "AsyncColonyClient",

src/colony_sdk/async_client.py

Lines changed: 107 additions & 1 deletion
@@ -32,6 +32,7 @@ async def main():
 import asyncio
 import json
 from collections.abc import AsyncIterator
+from pathlib import Path
 from types import TracebackType
 from typing import Any
 
@@ -87,12 +88,19 @@ def __init__(
         client: httpx.AsyncClient | None = None,
         retry: RetryConfig | None = None,
         typed: bool = False,
+        cache_token: bool = True,
     ):
         self.api_key = api_key
         self.base_url = base_url.rstrip("/")
         self.timeout = timeout
         self.retry = retry if retry is not None else RetryConfig()
         self.typed = typed
+        # `cache_token=True` (default) persists the JWT to disk in
+        # `~/.cache/colony-sdk/` (XDG-aware), shared with the sync
+        # `ColonyClient` — same (base_url, api_key) pair, same file.
+        # Disable per-client by passing False, or globally with
+        # `COLONY_SDK_NO_TOKEN_CACHE=1`.
+        self.cache_token = cache_token
         self._token: str | None = None
         self._token_expiry: float = 0
         self._client = client
@@ -191,11 +199,98 @@ def _get_client(self) -> httpx.AsyncClient:
 
     # ── Auth ──────────────────────────────────────────────────────────
 
+    def _token_cache_enabled(self) -> bool:
+        """True if the on-disk JWT cache is active for this client. Mirrors sync."""
+        from colony_sdk.client import _token_cache_disabled_via_env
+
+        if not self.cache_token:
+            return False
+        return not _token_cache_disabled_via_env()
+
+    def _cached_token_path(self) -> Path:
+        from colony_sdk.client import _token_cache_path
+
+        return _token_cache_path(self.api_key, self.base_url)
+
+    def _load_cached_token(self) -> bool:
+        """Hydrate `self._token` from the on-disk cache if a valid one exists.
+
+        Identical contract to the sync version — see
+        :meth:`ColonyClient._load_cached_token`. Shared cache file so a
+        token written by the sync client is readable by the async client
+        and vice versa.
+        """
+        import time
+
+        from colony_sdk.client import _TOKEN_CACHE_SAFETY_MARGIN_SEC
+
+        if not self._token_cache_enabled():
+            return False
+        try:
+            path = self._cached_token_path()
+            if not path.exists():
+                return False
+            with path.open("r", encoding="utf-8") as f:
+                data = json.load(f)
+            token = data.get("token")
+            expiry = float(data.get("expiry", 0))
+        except (OSError, ValueError, TypeError, json.JSONDecodeError):
+            return False
+        if not token or expiry <= time.time() + _TOKEN_CACHE_SAFETY_MARGIN_SEC:
+            return False
+        self._token = token
+        self._token_expiry = expiry
+        return True
+
+    def _save_cached_token(self) -> None:
+        """Best-effort write of the current JWT + expiry to disk."""
+        import contextlib
+        import os
+
+        from colony_sdk.client import _TOKEN_CACHE_SCHEMA_VERSION
+
+        if not self._token_cache_enabled() or not self._token:
+            return
+        try:
+            path = self._cached_token_path()
+            path.parent.mkdir(parents=True, exist_ok=True)
+            tmp = path.with_suffix(path.suffix + ".tmp")
+            fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+            try:
+                with os.fdopen(fd, "w", encoding="utf-8") as f:
+                    json.dump(
+                        {
+                            "v": _TOKEN_CACHE_SCHEMA_VERSION,
+                            "token": self._token,
+                            "expiry": self._token_expiry,
+                        },
+                        f,
+                    )
+            except Exception:
+                with contextlib.suppress(OSError):
+                    os.unlink(str(tmp))
+                raise
+            os.replace(str(tmp), str(path))
+        except OSError:
+            pass
+
+    def _clear_cached_token(self) -> None:
+        """Remove the on-disk cache entry. Silent on failure."""
+        import contextlib
+
+        if not self._token_cache_enabled():
+            return
+        with contextlib.suppress(OSError):
+            self._cached_token_path().unlink(missing_ok=True)
+
     async def _ensure_token(self) -> None:
         import time
 
         if self._token and time.time() < self._token_expiry:
             return
+        # See ColonyClient._ensure_token for the cache-first rationale.
+        if self._load_cached_token():
+            return
         data = await self._raw_request(
             "POST",
             "/auth/token",
@@ -205,11 +300,17 @@ async def _ensure_token(self) -> None:
         self._token = data["access_token"]
         # Refresh 1 hour before expiry (tokens last 24h)
         self._token_expiry = time.time() + 23 * 3600
+        self._save_cached_token()
 
     def refresh_token(self) -> None:
-        """Force a token refresh on the next request."""
+        """Force a token refresh on the next request.
+
+        Clears both the in-memory token and the on-disk cache entry
+        (if enabled), matching :meth:`ColonyClient.refresh_token`.
+        """
         self._token = None
         self._token_expiry = 0
+        self._clear_cached_token()
 
     async def rotate_key(self) -> dict:
         """Rotate your API key. Returns the new key and invalidates the old one.
@@ -219,6 +320,9 @@ async def rotate_key(self) -> dict:
         """
         data = await self._raw_request("POST", "/auth/rotate-key")
         if "api_key" in data:
+            # Clear the old key's on-disk cache entry BEFORE flipping
+            # `self.api_key` — same ordering rule as ColonyClient.rotate_key.
+            self._clear_cached_token()
             self.api_key = data["api_key"]
             self._token = None
             self._token_expiry = 0
@@ -300,6 +404,8 @@ async def _raw_request(
 
         # Auto-refresh on 401 once (separate from the configurable retry loop).
         if resp.status_code == 401 and not _token_refreshed and auth:
+            # Invalidate the disk cache too — the cached token is stale.
+            self._clear_cached_token()
             self._token = None
             self._token_expiry = 0
             return await self._raw_request(method, path, body, auth, _retry=_retry, _token_refreshed=True)
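The write path in `_save_cached_token` (create a temp file with mode 0600, then rename over the target) can be tried in isolation with the standalone sketch below. It is a simplified illustration rather than the SDK's code: `save_token` and `load_token` are hypothetical names, the schema-version field is omitted, and the temp-file cleanup on a failed write is left out for brevity.

```python
import json
import os
from pathlib import Path


def save_token(path: Path, token: str, expiry: float) -> None:
    """Write the token file atomically with owner-only permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    # The mode is applied when the file is created, so the token is never
    # briefly readable by other users.
    fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"token": token, "expiry": expiry}, f)
    # os.replace swaps the file in one step; readers see old or new content,
    # never a half-written file.
    os.replace(str(tmp), str(path))


def load_token(path: Path):
    """Return (token, expiry), or None if the file is missing or unreadable."""
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
        return data["token"], float(data["expiry"])
    except (OSError, ValueError, KeyError, TypeError):
        return None
```

Treating every read failure as a cache miss is what keeps the cache optional: a corrupt or missing file costs one extra `/auth/token` call and nothing more.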

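The "clear BEFORE flipping `api_key`" rule in `rotate_key` follows from the cache path being derived from the current key. The toy model below illustrates the ordering; `ToyClient` and its methods are hypothetical stand-ins, not SDK classes, and the hashing scheme is the same assumption as in the commit message (`sha256` of `base_url|api_key`, first 16 hex characters).

```python
import hashlib
from pathlib import Path


class ToyClient:
    """Hypothetical stand-in that models only the cache bookkeeping."""

    def __init__(self, api_key: str, base_url: str, cache_dir: Path) -> None:
        self.api_key = api_key
        self.base_url = base_url
        self.cache_dir = cache_dir

    def cache_path(self) -> Path:
        # The path depends on whatever key is set at call time.
        raw = f"{self.base_url}|{self.api_key}".encode("utf-8")
        return self.cache_dir / f"{hashlib.sha256(raw).hexdigest()[:16]}.json"

    def clear_cache(self) -> None:
        self.cache_path().unlink(missing_ok=True)

    def rotate(self, new_key: str, clear_first: bool = True) -> None:
        if clear_first:
            self.clear_cache()  # path still points at the OLD key's file
        self.api_key = new_key
        if not clear_first:
            self.clear_cache()  # too late: path now points at the NEW key's file
```

With `clear_first=False` the old key's file is orphaned on disk, which is the situation the ordering comment in the diff is guarding against.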