
fix: pluggable streaming HTTP backend + hackney 4 pull-mode bug (0.15.4)#51

Merged
nyo16 merged 2 commits into master from fix/hackney-streaming-pluggable-backend on May 1, 2026
Conversation

@nyo16 (Owner) commented May 1, 2026

Summary

  • Streaming on hackney 4 was silently in push mode, not pull mode — the M-12 backpressure architecture from 0.15.0 never materialized. Cold/slow SSE backends additionally surfaced as :timeout stream errors.
  • Adds Nous.HTTP.StreamBackend mirroring the non-streaming pluggable backend from 0.15.1, with Req as the default (symmetry across streaming + non-streaming) and Hackney as opt-in (with the bug fixed) for callers who need
    strict pull-based pacing.

Root cause

lib/nous/providers/http.ex (0.15.0–0.15.3) passed [:async, :once, ...] as separate atoms to :hackney.request/5. Erlang's proplists resolves bare :async as {:async, true} (push mode); the bare :once is ignored.
:hackney.stream_next/1 is a no-op in push mode, so the receive loop appeared to work in many cases — chunks arrived in the same {:hackney_response, conn, _} shape — but the pacing came from the producer.

The hackney 4 docs (deps/hackney/NEWS.md:269-272) document the pull form as [{async, once}] — a tuple.
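The degradation falls directly out of Erlang's proplists semantics and can be checked in a few lines of iex:

```elixir
# A bare atom in a proplist is shorthand for {atom, true}, so the bare
# :async resolves to push mode and the bare :once is never consulted
# for the :async key.
true = :proplists.get_value(:async, [:async, :once])

# The tuple form documented by hackney resolves to pull mode.
:once = :proplists.get_value(:async, [{:async, :once}])
```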

Changes

  • New Nous.HTTP.StreamBackend behaviour — same per-call → env → app config → default resolution chain as Nous.HTTP.Backend.
  • Nous.HTTP.StreamBackend.Req (default) — Req.post/1 with :into callback.
  • Nous.HTTP.StreamBackend.Hackney (opt-in) — strict pull-based via [{:async, :once}] tuple.
  • New env var NOUS_HTTP_STREAM_BACKEND (req | hackney | MyApp.MyBackend).
  • New app config config :nous, :http_stream_backend, ....
  • Provider stream normalizers (Nous.StreamNormalizer.*) untouched — they consume normalized events.
  • SSE parser helpers promoted to public functions (marked @doc false) for backend reuse.
  • README "HTTP backends" section split into non-streaming + streaming subsections with a "when to pick which" table.
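For illustration, the resolution chain described above can be sketched roughly as follows (the module and function names in this sketch are assumptions for demonstration, not lifted from the diff):

```elixir
defmodule MyApp.StreamBackendResolution do
  # Hypothetical resolver mirroring the documented chain:
  # per-call opt → NOUS_HTTP_STREAM_BACKEND env → app config → Req default.
  def resolve(opts \\ []) do
    opts[:stream_backend] ||
      from_env(System.get_env("NOUS_HTTP_STREAM_BACKEND")) ||
      Application.get_env(:nous, :http_stream_backend) ||
      Nous.HTTP.StreamBackend.Req
  end

  defp from_env(nil), do: nil
  defp from_env("req"), do: Nous.HTTP.StreamBackend.Req
  defp from_env("hackney"), do: Nous.HTTP.StreamBackend.Hackney
  # Any other value is treated as a custom module name, e.g. "MyApp.MyBackend".
  defp from_env(mod), do: Module.concat([mod])
end
```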

Empirical evidence (localhost LMStudio, qwen3.5-0.8b-mlx@4bit)

| Probe | Mailbox after 2 s, no stream_next/1 |
| --- | --- |
| [:async, :once, ...] (current/broken) | 97 messages (push mode — no backpressure) |
| [{:async, :once}, ...] (fix) | 2 messages (status + headers; body gates on stream_next/1) |
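The probe amounts to starting a request and deliberately never pulling; a sketch along these lines (assumes a reachable SSE endpoint bound to `url`, `headers`, and `body`; not runnable as-is):

```elixir
# Start a streaming request in pull mode, then never call :hackney.stream_next/1.
{:ok, _ref} = :hackney.request(:post, url, headers, body, [{:async, :once}])

# After 2 s, a pull-mode connection should have parked after status + headers,
# while push mode keeps flooding the mailbox with body chunks.
Process.sleep(2_000)
{:message_queue_len, n} = Process.info(self(), :message_queue_len)
IO.puts("mailbox after 2s: #{n} messages")
```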

End-to-end through Nous.AgentRunner.run(_, _, stream: true):

  • Default Req: 7 deltas, 244 ms, "Hello! How can I help?"
  • Hackney (env=hackney): 7 deltas, 293 ms, same text

TTFB benchmark (8 iters + warmup):

  • Req: 130.4 ms mean (124.4 ms min)
  • Hackney pull (fix): 133.4 ms mean
  • Hackney push (current/broken): 134.1 ms mean

All within ~5% — Req is marginally faster on TTFB, not slower.

Test plan

  • mix compile --warnings-as-errors — clean
  • mix test — 1640 tests, 0 failures
  • 19 new tests under test/nous/http/stream_backend* (resolution chain + bypass-driven Req + bypass-driven Hackney)
  • LMStudio smoke: both backends end-to-end
  • Hackney mailbox stays bounded under load (≤ 2 msgs)
  • Stream.take/2 early-exit cleanup on both backends
  • NOUS_HTTP_STREAM_BACKEND env var dispatch verified for both req and hackney

Migration

No code changes required for callers — default behaviour is restored to "streaming works against any healthy SSE backend." Apps that depend on strict pull-based backpressure should set:

config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

or pass stream_backend: Nous.HTTP.StreamBackend.Hackney per call.
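As a concrete per-call example (argument names other than the stated options are placeholders):

```elixir
Nous.AgentRunner.run(agent, messages,
  stream: true,
  stream_backend: Nous.HTTP.StreamBackend.Hackney
)
```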

The 0.15.0 streaming pipeline passed [:async, :once, ...] as separate
atoms to :hackney.request/5. Erlang's proplists resolves bare :async as
{:async, true}, putting hackney into push mode; the bare :once is
ignored. This silently forfeited the M-12 backpressure architecture —
:hackney.stream_next/1 is a no-op in push mode, so the receive loop
appeared to work in many cases but the pacing came from the producer.
Cold/slow SSE backends additionally surfaced as :timeout stream errors.

Mirroring the 0.15.1 non-streaming pluggable backend pattern:

- New Nous.HTTP.StreamBackend behaviour with the same per-call → env →
  app config → default resolution chain (NOUS_HTTP_STREAM_BACKEND env,
  :http_stream_backend app config, :stream_backend per-call opt).
- Nous.HTTP.StreamBackend.Req (default) — Req.post/1 with :into callback.
- Nous.HTTP.StreamBackend.Hackney (opt-in) — strict pull-based mode via
  the [{:async, :once}] tuple form per deps/hackney/NEWS.md:269-272.
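To make the pull contract concrete, a minimal consumer-driven receive loop looks roughly like this (a sketch of the general hackney pull-mode pattern, not the backend's actual code):

```elixir
defmodule StreamDemo do
  # Every chunk is explicitly requested with :hackney.stream_next/1, so the
  # mailbox stays bounded — this is the backpressure the bare-atom form lost.
  def pull_loop(ref, acc \\ []) do
    receive do
      {:hackney_response, ^ref, {:status, _code, _reason}} ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, acc)

      {:hackney_response, ^ref, {:headers, _headers}} ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, acc)

      {:hackney_response, ^ref, :done} ->
        {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}

      {:hackney_response, ^ref, chunk} when is_binary(chunk) ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, [chunk | acc])
    after
      30_000 -> {:error, :timeout}
    end
  end
end
```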

Verified end-to-end against running LMStudio: both backends produce
deltas through Nous.AgentRunner.run(_, _, stream: true). Hackney's
mailbox stays bounded at 2 messages after 2s idle wait — pull mode is
finally working as documented. Req TTFB is marginally faster (130ms vs
133ms mean) in localhost benchmarks; total stream time within 5%.

Provider stream normalizers consume normalized events and need no
changes. SSE parser helpers (parse_stream_buffer/2, flush_stream_buffer/2,
max_buffer_size/0, ensure_streaming_headers/1) promoted to public
@doc false for backend reuse.

Copilot AI left a comment


Pull request overview

Adds a pluggable streaming HTTP backend layer to Nous.Providers.HTTP.stream/4, restoring correct pull-based hackney streaming behavior and making streaming backend selection configurable (mirroring the existing non-streaming backend model).

Changes:

  • Introduces Nous.HTTP.StreamBackend behaviour with Req as default and Hackney as opt-in.
  • Refactors Nous.Providers.HTTP.stream/4 to dispatch to the configured stream backend and promotes shared SSE helper functions for backend reuse.
  • Adds/updates docs, changelog, and tests covering backend resolution and both streaming implementations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/nous/http/stream_backend_resolution_test.exs | Tests stream-backend precedence (per-call/env/config/default) and env-var parsing/fallback. |
| test/nous/http/stream_backend/req_test.exs | Exercises the Req streaming backend (SSE parsing, errors, early halt). |
| test/nous/http/stream_backend/hackney_test.exs | Exercises the Hackney streaming backend and includes a regression test intended to guard pull-mode behavior. |
| mix.exs | Bumps version to 0.15.4. |
| lib/nous/providers/http.ex | Switches stream/4 to stream-backend dispatch; adds stream-backend resolver; exposes SSE helpers for reuse. |
| lib/nous/http/stream_backend/req.ex | Implements streaming via Req :into callback + Task forwarding to the consumer. |
| lib/nous/http/stream_backend/hackney.ex | Implements streaming via hackney pull mode using [{:async, :once}]. |
| lib/nous/http/stream_backend.ex | Defines the Nous.HTTP.StreamBackend behaviour and selection guidance. |
| README.md | Documents separate non-streaming vs streaming backend configuration and trade-offs. |
| CHANGELOG.md | Documents 0.15.4 additions/fixes and migration guidance. |


Comment on lines +109 to +113:

    defp next_chunk(state) do
      receive do
        {ref, :done} when ref == state.ref ->
          {events, _} = HTTP.flush_stream_buffer(state.buffer, state.stream_parser)

Comment thread on lib/nous/providers/http.ex (outdated), lines +334 to +336:

    case parse_sse_buffer(buffer <> "\n\n") do
      {:error, :buffer_overflow} -> {[{:stream_error, %{reason: :buffer_overflow}}], ""}
      result -> result

Comment on lines +77 to +83:

    # Regression net for the [{:async, :once}] tuple fix. If a future hackney
    # bump silently changes the option shape, this test should fail loudly:
    # in push mode the receive loop would still get messages but the
    # backpressure property is lost. Here we verify the *messages* shape by
    # asserting the stream completes against a Bypass server that delivers
    # all data in one chunk — works in both push and pull.
    test "request actually goes through hackney pull mode", %{bypass: bypass, url: url} do

Three Copilot review nits, all valid:

- Req backend: when the spawned Task crashes (Req exception, callback
  raise) the parent never received a completion message and waited the
  full receive timeout before emitting a misleading :timeout error.
  Match `{:DOWN, task_ref, :process, _, reason}` and surface a
  `:task_died` stream_error immediately. Track `task.ref` (the monitor
  ref Task.async sets up) on state.

- flush_stream_buffer/2: appending the synthetic "\n\n" delimiter for
  end-of-stream parsing could push a buffer at exactly @max_buffer_size
  two bytes over and trip a false-positive :buffer_overflow even though
  received data was within limits. Check input size first and bypass
  the public size-checked parser via the same-module private
  do_parse_sse_buffer/1.

- Hackney pull-mode regression test was too weak — it asserted
  user-agent and event delivery, both of which pass in push mode too
  (push and pull deliver the same `{:hackney_response, conn, _}` shape;
  the difference is who drives chunk delivery). Replace with a unit
  test on the actual regression surface: extract `request_opts/3` and
  assert `{:async, :once}` is present as a tuple and bare `:async` /
  `:once` atoms are not. This is the only sound guardrail for the
  proplist-shape bug — a future hackney bump that changes the option
  form will fail this test loudly.
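The replacement guardrail described in the last point could look roughly like this (`request_opts/3` is the name given above; its arguments and the surrounding test scaffolding are assumptions):

```elixir
test "hackney backend uses the tuple pull-mode option" do
  opts = Nous.HTTP.StreamBackend.Hackney.request_opts(url, headers, recv_timeout)

  # The tuple form is what selects pull mode...
  assert {:async, :once} in opts

  # ...and the bare atoms that caused the regression must not reappear.
  refute :async in opts
  refute :once in opts
end
```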
@nyo16 nyo16 merged commit ead6b2e into master May 1, 2026
6 checks passed
@nyo16 nyo16 deleted the fix/hackney-streaming-pluggable-backend branch May 1, 2026 22:33