fix: strengthen async test assertion and fix README example

amavashev · amavashev · commit 6729c9497af0 · 2026-04-08T07:42:53.000-04:00
- Fix weak async cost_fn test: commit an actual cost (1500) that differs
  from the estimate (1000), so the test fails if cost_fn is ignored
- Fix README streaming example: define the missing max_tokens,
  openai_client, and stream so the example is runnable
- Update AUDIT.md: correct test count (57 → 64), add version, heartbeat
  formula, spec validation, and commit error handling (incl. IDEMPOTENCY_MISMATCH)
- Update streaming_usage.py description in README.md and examples/README.md
diff --git a/AUDIT.md b/AUDIT.md
@@ -188,7 +188,8 @@ Automated contract tests validate sample request/response payloads against the O
 ## Streaming Convenience Module (added 2026-04-08)
 
 **Module:** `runcycles/streaming.py`
-**Test file:** `tests/test_streaming.py` (57 tests, all passing)
+**Test file:** `tests/test_streaming.py` (64 tests, all passing)
+**Version:** 0.3.0
 
 Added `StreamReservation` and `AsyncStreamReservation` context managers that automate the reserve → commit/release lifecycle for streaming use cases. This is a DX convenience layer — no protocol changes.
 
@@ -197,8 +198,10 @@ Added `StreamReservation` and `AsyncStreamReservation` context managers that aut
 - **`StreamUsage`** — mutable accumulator for token counts and cost during streaming
 - **Client convenience methods:** `CyclesClient.stream_reservation()` and `AsyncCyclesClient.stream_reservation()` — thin factories that build Subject from config defaults
 - **Cost resolution:** explicit `usage.actual_cost` > `cost_fn(usage)` > estimate fallback
-- **Heartbeat:** automatic TTL extension, same interval formula as decorator lifecycle
+- **Heartbeat:** automatic TTL extension, same interval formula as decorator lifecycle (`max(ttl_ms / 2, 1000)` ms)
 - **Commit retry:** uses existing `CommitRetryEngine`/`AsyncCommitRetryEngine`
-- **Context propagation:** sets/clears `CyclesContext` via `ContextVar`, accessible via `get_cycles_context()`
+- **Context propagation:** sets/clears `CyclesContext` via `ContextVar`, accessible via `get_cycles_context()`; respects user-set `ctx.metrics` during streaming
+- **Spec validation:** `validate_ttl_ms()` (1000–86400000), `validate_grace_period_ms()` (0–60000), `validate_subject()` (at least one standard field) — matches lifecycle.py
+- **Error handling:** `RESERVATION_FINALIZED`, `RESERVATION_EXPIRED`, and `IDEMPOTENCY_MISMATCH` do not trigger release; other 4xx client errors do trigger release — matches lifecycle.py behavior exactly
 
-Protocol conformance: No new endpoints or protocol changes. All reservation, commit, release, and extend calls use the same client methods and body formats as the decorator path. Verified by 57 unit tests covering success, deny, error, retry, heartbeat, cost resolution, and context propagation.
+Protocol conformance: No new endpoints or protocol changes. All reservation, commit, release, and extend calls use the same client methods and body formats as the decorator path. Verified by 64 unit tests covering success, deny, error, retry, heartbeat, cost resolution, context propagation, spec validation, and all commit error-code branches.
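For reference, the cost-resolution order listed in the AUDIT notes (explicit `usage.actual_cost`, then `cost_fn(usage)`, then the estimate) can be sketched as follows. `Usage`, `resolve_actual_cost`, and `price` are hypothetical stand-ins, not the `runcycles` API. The sketch also assumes that an unset `actual_cost` is `None`, so an explicit cost of 0 still takes precedence.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Usage:
    # Hypothetical stand-in for StreamUsage: token counts plus an optional explicit cost.
    tokens_input: int = 0
    tokens_output: int = 0
    actual_cost: Optional[int] = None


def resolve_actual_cost(
    usage: Usage,
    estimate: int,
    cost_fn: Optional[Callable[[Usage], int]] = None,
) -> int:
    """Pick the amount to commit: explicit cost > cost_fn(usage) > estimate."""
    if usage.actual_cost is not None:
        return usage.actual_cost
    if cost_fn is not None:
        return cost_fn(usage)
    return estimate


def price(u: Usage) -> int:
    # Same per-token rates as the README example (USD_MICROCENTS).
    return u.tokens_input * 250 + u.tokens_output * 1000


u = Usage(tokens_input=12, tokens_output=40)
print(resolve_actual_cost(u, estimate=1_024_000, cost_fn=price))  # 43000
u.actual_cost = 50_000
print(resolve_actual_cost(u, estimate=1_024_000, cost_fn=price))  # 50000
print(resolve_actual_cost(Usage(), estimate=1_024_000))  # 1024000
```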
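The heartbeat interval given in the AUDIT notes, `max(ttl_ms / 2, 1000)` ms, is sketched below together with a minimal thread-based extension loop. `heartbeat_interval_ms`, `run_heartbeat`, and the `extend` callback are hypothetical. The real module may schedule extensions differently, for example with asyncio in the async client.

```python
import threading
from typing import Callable


def heartbeat_interval_ms(ttl_ms: int) -> float:
    # Extend at half the TTL, but never more often than once per second.
    return max(ttl_ms / 2, 1000)


def run_heartbeat(ttl_ms: int, extend: Callable[[], None], stop: threading.Event) -> None:
    """Call extend() every interval until stop is set (intended to run in a daemon thread)."""
    interval_s = heartbeat_interval_ms(ttl_ms) / 1000
    # Event.wait returns True as soon as stop is set, which ends the loop.
    while not stop.wait(interval_s):
        extend()


print(heartbeat_interval_ms(60_000))  # 30000.0
print(heartbeat_interval_ms(1_500))  # 1000
```

Using `Event.wait` as the sleep lets the owner stop the heartbeat immediately on exit, instead of waiting out a full interval.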
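The spec-validation bounds can be expressed as a simple range check. `check_range`, `check_ttl_ms`, and `check_grace_period_ms` are hypothetical helpers, and they do not reproduce the signatures of `validate_ttl_ms()` or `validate_grace_period_ms()`. The sketch assumes inclusive bounds. It omits the subject check because the set of standard fields is not listed here.

```python
def check_range(name: str, value: int, low: int, high: int) -> int:
    """Raise ValueError unless low <= value <= high (bounds assumed inclusive)."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


# Bounds taken from the AUDIT notes.
TTL_MS_BOUNDS = (1_000, 86_400_000)  # 1 second to 24 hours
GRACE_PERIOD_MS_BOUNDS = (0, 60_000)  # up to 1 minute


def check_ttl_ms(ttl_ms: int) -> int:
    return check_range("ttl_ms", ttl_ms, *TTL_MS_BOUNDS)


def check_grace_period_ms(grace_ms: int) -> int:
    return check_range("grace_period_ms", grace_ms, *GRACE_PERIOD_MS_BOUNDS)


print(check_ttl_ms(60_000))  # 60000
```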
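The commit error-handling rule reduces to a small decision function. This sketch is hypothetical. It covers only 4xx responses, which is all the AUDIT notes describe, and treats any other status as out of scope. The status codes and the `INVALID_REQUEST` code in the usage lines are made up for illustration.

```python
# Error codes that, per the AUDIT notes, do not trigger a release after a failed commit.
NO_RELEASE_CODES = frozenset(
    {"RESERVATION_FINALIZED", "RESERVATION_EXPIRED", "IDEMPOTENCY_MISMATCH"}
)


def should_release_after_commit_error(status: int, error_code: str) -> bool:
    """Decide whether a 4xx commit failure should be followed by a release."""
    if not 400 <= status < 500:
        # Assumption for this sketch: non-4xx failures are handled elsewhere
        # (e.g. by commit retry), so this rule never asks for a release.
        return False
    # Any other 4xx client error releases the reservation.
    return error_code not in NO_RELEASE_CODES


print(should_release_after_commit_error(409, "RESERVATION_FINALIZED"))  # False
print(should_release_after_commit_error(400, "INVALID_REQUEST"))  # True
```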
diff --git a/README.md b/README.md
@@ -133,25 +133,38 @@ result = await call_llm("Hello")
 For streaming LLM responses, use the `stream_reservation()` context manager. It reserves budget on enter, auto-commits on successful exit, and auto-releases on exception:
 
 ```python
+from openai import OpenAI
 from runcycles import CyclesClient, CyclesConfig, Action, Amount, Unit
 
 config = CyclesConfig(base_url="http://localhost:7878", api_key="your-api-key", tenant="acme")
-client = CyclesClient(config)
+cycles_client = CyclesClient(config)
+openai_client = OpenAI()
+max_tokens = 1024
 
-with client.stream_reservation(
+with cycles_client.stream_reservation(
     action=Action(kind="llm.completion", name="gpt-4o"),
     estimate=Amount(unit=Unit.USD_MICROCENTS, amount=max_tokens * 1000),
     cost_fn=lambda u: u.tokens_input * 250 + u.tokens_output * 1000,
 ) as reservation:
-    # Caps available immediately
+    # Caps available immediately after entering the context
     if reservation.caps and reservation.caps.max_tokens:
         max_tokens = min(max_tokens, reservation.caps.max_tokens)
 
-    for chunk in openai_stream:
+    stream = openai_client.chat.completions.create(
+        model="gpt-4o",
+        messages=[{"role": "user", "content": "Hello"}],
+        max_tokens=max_tokens,
+        stream=True,
+        stream_options={"include_usage": True},
+    )
+
+    for chunk in stream:
+        if chunk.choices and chunk.choices[0].delta.content:
+            print(chunk.choices[0].delta.content, end="", flush=True)
         if chunk.usage:
             reservation.usage.tokens_input = chunk.usage.prompt_tokens
             reservation.usage.tokens_output = chunk.usage.completion_tokens
-# Committed automatically with actual cost from cost_fn
+# Committed automatically with actual cost computed by cost_fn
 ```
 
 Also available as `async with client.stream_reservation(...)` for async clients. See [streaming_usage.py](examples/streaming_usage.py) for a complete example.
@@ -402,7 +415,7 @@ The [`examples/`](examples/) directory contains runnable integration examples:
 | [async_usage.py](examples/async_usage.py) | Async client and async decorator |
 | [openai_integration.py](examples/openai_integration.py) | Guard OpenAI chat completions with budget checks |
 | [anthropic_integration.py](examples/anthropic_integration.py) | Guard Anthropic messages with per-tool budget tracking |
-| [streaming_usage.py](examples/streaming_usage.py) | Budget-managed streaming with token accumulation |
+| [streaming_usage.py](examples/streaming_usage.py) | `stream_reservation()` context manager with auto-commit |
 | [fastapi_integration.py](examples/fastapi_integration.py) | FastAPI middleware, dependency injection, per-tenant budgets |
 | [langchain_integration.py](examples/langchain_integration.py) | LangChain callback handler for budget-aware agents |
 
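The reserve-on-enter, commit-on-success, release-on-exception behavior described for `stream_reservation()` can be modeled with `contextlib.contextmanager`. `FakeBudgetClient`, `Tokens`, and `stream_budget` are hypothetical in-memory stand-ins that only record calls. They omit heartbeats, commit retries, and context propagation, and they are not how `runcycles` implements the context manager.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeBudgetClient:
    # Hypothetical in-memory client that records the calls it receives.
    calls: List[str] = field(default_factory=list)

    def reserve(self, estimate: int) -> str:
        self.calls.append(f"reserve {estimate}")
        return "res-1"

    def commit(self, reservation_id: str, actual: int) -> None:
        self.calls.append(f"commit {reservation_id} {actual}")

    def release(self, reservation_id: str) -> None:
        self.calls.append(f"release {reservation_id}")


@dataclass
class Tokens:
    tokens_input: int = 0
    tokens_output: int = 0


@contextmanager
def stream_budget(
    client: FakeBudgetClient, estimate: int, cost_fn: Callable[[Tokens], int]
) -> Iterator[Tokens]:
    reservation_id = client.reserve(estimate)  # reserve on enter
    usage = Tokens()
    try:
        yield usage  # caller streams and fills in token counts
    except BaseException:
        client.release(reservation_id)  # release on any exception, then re-raise
        raise
    client.commit(reservation_id, cost_fn(usage))  # commit on clean exit


client = FakeBudgetClient()
with stream_budget(
    client,
    estimate=1024 * 1000,
    cost_fn=lambda u: u.tokens_input * 250 + u.tokens_output * 1000,
) as usage:
    usage.tokens_input, usage.tokens_output = 12, 40
print(client.calls)  # ['reserve 1024000', 'commit res-1 43000']
```

Catching `BaseException` rather than `Exception` also releases the reservation on cancellation-style exits such as `KeyboardInterrupt`, and the exception is always re-raised to the caller.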
diff --git a/examples/README.md b/examples/README.md
@@ -28,7 +28,7 @@ pip install runcycles
 | [async_usage.py](async_usage.py) | Async client and async decorator | — |
 | [openai_integration.py](openai_integration.py) | Guard OpenAI chat completions with budget checks | `openai` |
 | [anthropic_integration.py](anthropic_integration.py) | Guard Anthropic messages with per-tool budget tracking | `anthropic` |
-| [streaming_usage.py](streaming_usage.py) | Budget-managed streaming with token accumulation | `openai` |
+| [streaming_usage.py](streaming_usage.py) | `stream_reservation()` context manager with auto-commit | `openai` |
 | [fastapi_integration.py](fastapi_integration.py) | FastAPI middleware, dependency injection, per-tenant budgets | `fastapi`, `uvicorn` |
 | [langchain_integration.py](langchain_integration.py) | LangChain callback handler for budget-aware agents | `langchain`, `langchain-openai` |
 
diff --git a/tests/test_streaming.py b/tests/test_streaming.py
@@ -754,10 +754,11 @@ async def test_cost_fn_used(self) -> None:
         )
 
         async with asr as reservation:
-            reservation.usage.tokens_input = 200
+            reservation.usage.tokens_input = 300
 
         commit_body = mock.commit_reservation.call_args[0][1]
-        assert commit_body["actual"]["amount"] == 1000
+        # 300 * 5 = 1500, distinct from estimate (1000)
+        assert commit_body["actual"]["amount"] == 1500
 
     @pytest.mark.asyncio
     async def test_context_set_and_cleared(self) -> None:
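The test fix rests on a simple point. If the value computed by `cost_fn` equals the estimate, an implementation that ignores `cost_fn` and commits the estimate still passes. The sketch below assumes the test's `cost_fn` charges 5 per input token, as the `300 * 5 = 1500` comment implies. `correct_commit`, `buggy_commit`, and `assertion_catches_bug` are hypothetical names.

```python
from typing import Callable

ESTIMATE = 1000  # estimate amount used by the test, per the commit message


def cost_fn(tokens_input: int) -> int:
    # Assumed rate: 5 per input token ("300 * 5 = 1500").
    return tokens_input * 5


def correct_commit(tokens_input: int, fn: Callable[[int], int]) -> int:
    # Commits the amount computed by cost_fn.
    return fn(tokens_input)


def buggy_commit(tokens_input: int, fn: Callable[[int], int]) -> int:
    # Regression being guarded against: cost_fn ignored, estimate committed.
    return ESTIMATE


def assertion_catches_bug(tokens_input: int) -> bool:
    """True if `amount == cost_fn(...)` passes for the correct path and fails for the buggy one."""
    expected = cost_fn(tokens_input)
    return (
        correct_commit(tokens_input, cost_fn) == expected
        and buggy_commit(tokens_input, cost_fn) != expected
    )


print(assertion_catches_bug(200))  # False: 200 * 5 == 1000 == estimate
print(assertion_catches_bug(300))  # True: 300 * 5 == 1500 != estimate
```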