73 changes: 73 additions & 0 deletions CHANGELOG.md

All notable changes to this project will be documented in this file.

## [0.15.4] - 2026-05-01

Pluggable streaming HTTP backends + hackney 4 pull-mode bug fix.

### Fixed

- **Hackney 4 streaming was silently in push mode, not pull mode.**
`lib/nous/providers/http.ex:463-470` (in 0.15.0–0.15.3) passed
`[:async, :once, ...]` as separate atoms to `:hackney.request/5`.
Erlang's `proplists` resolves bare atom `:async` as `{:async, true}`,
which puts hackney into push mode; the bare `:once` atom is silently
ignored. The architectural intent of M-12 (strict pull-based
backpressure so a slow consumer cannot grow its mailbox) was
forfeited — `:hackney.stream_next/1` is a no-op in push mode, so the
receive loop appeared to work in many cases (chunks arrive in the
same shape) but the pacing came from the producer, not the consumer.
The fix is the tuple form `[{:async, :once}, ...]` per
`deps/hackney/NEWS.md:269-272`. Empirical confirmation: with the
broken form a benign Bypass server delivers 97 messages to the
caller's mailbox in 2 s without any `stream_next/1` call; with the
tuple form the mailbox holds only 2 messages (status + headers) and
body chunks gate on `stream_next/1`. Reported as part of the same
bug that caused observable timeouts against cold/slow SSE backends.
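
  The `proplists` semantics at the heart of the bug can be checked directly
  (a sketch; `:proplists` is the same Erlang module hackney uses to read its
  options — the option lists below are illustrative, not the library's code):

  ```elixir
  # Bare-atom form (the 0.15.0–0.15.3 bug): :proplists expands the lone
  # :async atom to {:async, true}, i.e. push mode; the bare :once atom is
  # never consulted as the value of :async.
  broken = [:async, :once]

  # Tuple form (the fix): the value of :async is :once, i.e. pull mode.
  fixed = [{:async, :once}]

  IO.inspect(:proplists.get_value(:async, broken))  # => true
  IO.inspect(:proplists.get_value(:async, fixed))   # => :once
  ```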

### Added

- **`Nous.HTTP.StreamBackend` behaviour** — pluggable streaming HTTP
layer mirroring the non-streaming `Nous.HTTP.Backend` introduced in
  0.15.1. Two implementations ship:
- `Nous.HTTP.StreamBackend.Req` — the new default. Drives
`Req.post/1` with the `:into` callback. Simpler stack
(Req/Finch/Mint), marginally faster TTFB than hackney in
benchmarks against LMStudio (~130 ms vs ~133 ms mean).
- `Nous.HTTP.StreamBackend.Hackney` — opt-in. Strict pull-based
backpressure via `:hackney`'s `[{:async, :once}]` mode (the bug
above is fixed here). Pick this when downstream consumers can
block per chunk (LiveView fan-out under load,
persistence-on-every-chunk, slow IO).
- **`:stream_backend` per-call opt** on `Nous.Providers.HTTP.stream/4`.
- **`NOUS_HTTP_STREAM_BACKEND` env var** (`req` | `hackney` |
`My.Custom.Backend`). Resolution mirrors `NOUS_HTTP_BACKEND`:
per-call → env → app config → default.
- **`config :nous, :http_stream_backend, MyBackend`** application
config knob.

### Changed

- `Nous.Providers.HTTP.stream/4` now dispatches to the configured
`Nous.HTTP.StreamBackend` instead of inlining hackney plumbing. The
public API surface (return shape, event types, error tuples) is
unchanged. Provider stream normalizers (`Nous.StreamNormalizer.*`)
consume normalized events and need no changes.
- The non-streaming pluggable `Nous.HTTP.Backend` resolver is
refactored to share its `String.to_existing_atom/1` safety logic with
the streaming resolver — same C-2 protection on both paths.
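
The shared safety logic can be sketched as follows (module and function names
here are illustrative, not the library's internals):

```elixir
defmodule BackendResolverSketch do
  # Illustrative only: mirrors the documented resolution of backend names.
  # String.to_existing_atom/1 raises ArgumentError for unknown module names
  # instead of minting new atoms, so hostile env-var input cannot grow the
  # atom table (the C-2 protection mentioned above).
  def resolve(nil, default), do: default
  def resolve("req", _default), do: Nous.HTTP.StreamBackend.Req
  def resolve("hackney", _default), do: Nous.HTTP.StreamBackend.Hackney

  def resolve(name, _default) when is_binary(name) do
    String.to_existing_atom("Elixir." <> name)
  end
end
```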

### Documentation

- `Nous.Providers.HTTP` moduledoc rewritten around the dual
pluggable-backend model and the streaming backpressure trade-off.
- `Nous.HTTP.StreamBackend` and the two implementation modules carry
  full moduledocs explaining when to pick each.

### Migration

No code changes required for callers — the default behavior is
restored to "streaming works against any healthy SSE backend." Apps
that depend on strict pull-based backpressure should set:

config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

or pass `stream_backend: Nous.HTTP.StreamBackend.Hackney` per call.

## [0.15.3] - 2026-05-01

Streaming + tool execution. The `Nous.Agent.run/3` loop now has a
62 changes: 47 additions & 15 deletions README.md
| LlamaCpp | `llamacpp:local` + `:llamacpp_model` | ✅ |
| **Custom** | `custom:model` + `:base_url` | ✅ |

HTTP providers use a pluggable backend — `Req` (default, on top of Finch) or
`hackney 4` — selected per-call, via `NOUS_HTTP_BACKEND`, or via app config.
Streaming always uses `hackney`'s `:async, :once` pull-based mode for
backpressure (a slow consumer can't OOM under a fast LLM). LlamaCpp runs
in-process via NIFs. See [HTTP Backend](#http-backend) below for details.
HTTP providers use a pluggable backend on both the non-streaming and
streaming paths — `Req` (default, on top of Finch) or `hackney 4` —
selected per-call, via `NOUS_HTTP_BACKEND` / `NOUS_HTTP_STREAM_BACKEND`,
or via app config. The Hackney streaming backend uses `[{:async, :once}]`
pull-based mode for strict backpressure (a slow consumer can't grow its
mailbox under a fast LLM). LlamaCpp runs in-process via NIFs.
See [HTTP Backend](#http-backend) below for details.

> **Tip**: The named local providers (`lmstudio:`, `vllm:`, `sglang:`,
> `ollama:`) are the recommended way to talk to local OpenAI-compatible

### HTTP Backend

Non-streaming HTTP requests go through a pluggable backend. Default is
`Nous.HTTP.Backend.Req` (Req on top of Finch); `Nous.HTTP.Backend.Hackney`
is shipped as an alternative. Streaming always uses hackney's `:async, :once`
pull-based mode for backpressure — that choice is structural, not
configurable.
Both the non-streaming and streaming HTTP paths go through pluggable
backends. Defaults are `Nous.HTTP.Backend.Req` and
`Nous.HTTP.StreamBackend.Req` (both on Req + Finch).
`Nous.HTTP.Backend.Hackney` and `Nous.HTTP.StreamBackend.Hackney` are
shipped as alternatives.

#### Non-streaming (`Nous.HTTP.Backend`)

Pick per-call, per-environment, or per-app:

```elixir
# Per-call
HTTP.post(url, body, headers, backend: Nous.HTTP.Backend.Hackney)

# Env
# NOUS_HTTP_BACKEND=hackney

# App config
config :nous, :http_backend, Nous.HTTP.Backend.Hackney
```

Tune the shared hackney `:default` pool from app config (used by both the
Hackney backend and the streaming pipeline):
#### Streaming (`Nous.HTTP.StreamBackend`)

Same resolution chain, separate config knob:

```elixir
# Per-call
HTTP.stream(url, body, headers,
stream_backend: Nous.HTTP.StreamBackend.Hackney)

# Env
# NOUS_HTTP_STREAM_BACKEND=hackney

# App config
config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney
```

When to pick which streaming backend:

| Backend | Pick it when |
|---------|--------------|
| `Nous.HTTP.StreamBackend.Req` *(default)* | One HTTP stack across streaming + non-streaming. Right default for almost every app. Backpressure is bounded by parsing speed, not strict pull pacing — fine for typical LLM workloads where token rate is the bottleneck. |
| `Nous.HTTP.StreamBackend.Hackney` | Strict pull-based backpressure via `[{:async, :once}]`. Pick this when downstream consumers can block per chunk (LiveView fan-out under load, persistence-on-every-chunk, slow IO). |

Both emit identical normalized event streams (parsed JSON maps,
`{:stream_done, _}`, `{:stream_error, _}`); switching backends needs no
other code changes.
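
A consumer sketch against that event contract (the reducer and its name are
illustrative; the event shapes are the ones documented above):

```elixir
# Collect body chunks until the stream signals done or error.
collect = fn events ->
  Enum.reduce_while(events, [], fn
    {:stream_done, _reason}, acc -> {:halt, {:ok, Enum.reverse(acc)}}
    {:stream_error, reason}, _acc -> {:halt, {:error, reason}}
    %{} = chunk, acc -> {:cont, [chunk | acc]}
  end)
end

# Works identically whichever backend produced the events:
collect.([%{"delta" => "Hi"}, %{"delta" => "!"}, {:stream_done, :normal}])
# => {:ok, [%{"delta" => "Hi"}, %{"delta" => "!"}]}
```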

#### Hackney pool

Tune the shared hackney `:default` pool from app config (used by both
the Hackney non-streaming and Hackney streaming backends):

```elixir
config :nous, :hackney_pool,
  timeout: 150_000,
  max_connections: 50
```

See [the HTTP backend benchmark report](https://github.com/nyo16/nous/blob/master/docs/benchmarks/http_backend.md)
for localhost + real-endpoint benchmark numbers and guidance on when
to switch backends. Headline: stick with the Req default unless you
specifically need HTTP/3 (Alt-Svc auto-upgrade) or want to consolidate
on one HTTP family.
to switch backends. Headline: stick with the Req defaults unless you
have a specific reason (strict backpressure, HTTP/3 upgrade, single-HTTP-stack consolidation).

### Timeouts

68 changes: 68 additions & 0 deletions lib/nous/http/stream_backend.ex
defmodule Nous.HTTP.StreamBackend do
@moduledoc """
Behaviour for SSE / chunked streaming HTTP backends.

Implemented by `Nous.HTTP.StreamBackend.Req` (default) and
`Nous.HTTP.StreamBackend.Hackney`. Selection mirrors the non-streaming
`Nous.HTTP.Backend` resolution chain (per-call → env var → app config →
default). See `Nous.Providers.HTTP.stream/4` for the resolution order.

Pick a backend three ways, highest precedence first:

# 1. Per-call opt
Nous.Providers.HTTP.stream(url, body, headers,
stream_backend: Nous.HTTP.StreamBackend.Hackney)

# 2. Environment variable (req | hackney | "MyApp.MyStreamBackend")
export NOUS_HTTP_STREAM_BACKEND=hackney

# 3. Application config
config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

Default: `Nous.HTTP.StreamBackend.Req`.

## When to pick which

- `Nous.HTTP.StreamBackend.Req` — one HTTP stack across streaming and
non-streaming, simpler dependency story. Right default for most apps.
Backpressure is bounded by parsing speed, not by `stream_next/1`
pacing — a fast LLM + slow consumer can grow the consumer's mailbox.
Acceptable for typical LLM workloads where token rate is the
bottleneck.
- `Nous.HTTP.StreamBackend.Hackney` — strict pull-based backpressure
via `:hackney`'s `{:async, :once}` mode. The consumer paces the
producer chunk-by-chunk. Pick this when downstream consumers can
block per chunk (LiveView assigns + diff + push under load,
persistence-on-every-chunk, slow IO).

Both backends emit the same normalized event stream (parsed JSON maps,
`{:stream_done, reason}`, `{:stream_error, reason}`). Switching between
them does not require changes elsewhere.

## Custom backends

Implement `c:stream/4` and return `{:ok, Enumerable.t()}` where the
enumerable emits parsed JSON maps, `{:stream_done, reason}` tuples, or
`{:stream_error, reason}` tuples. The stream MUST halt after the first
`{:stream_error, _}` and after `{:stream_done, _}`.
"""

@doc """
Issue a streaming POST request and return a lazy `Enumerable.t()` of
parsed events.

## Options
* `:timeout` — receive timeout in milliseconds (default: `60_000`)
* `:connect_timeout` — TCP connect timeout in milliseconds (default: `30_000`)
* `:stream_parser` — module implementing `parse_buffer/1` for non-SSE
formats (e.g. JSON-array streams). Defaults to SSE.

Backends MAY accept additional options; unknown options should be ignored.
"""
@callback stream(
url :: String.t(),
body :: map(),
headers :: [{String.t(), String.t()}],
opts :: keyword()
) :: {:ok, Enumerable.t()} | {:error, term()}
end
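
A minimal custom backend satisfying the contract above might look like this
(a sketch: the module name is hypothetical, and canned events stand in for
real transport):

```elixir
defmodule MyApp.CannedStreamBackend do
  @behaviour Nous.HTTP.StreamBackend

  # Illustrative only: emits a fixed event sequence in the required shape
  # instead of opening a real connection. A real backend would issue the
  # POST here and lazily yield parsed events.
  @impl true
  def stream(url, _body, _headers, _opts) do
    events = [
      %{"echo" => url},
      {:stream_done, :normal}
    ]

    {:ok, events}
  end
end
```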