
fix: pluggable streaming HTTP backend + hackney 4 pull-mode bug (0.15.4)#51

Merged
nyo16 merged 2 commits into master from fix/hackney-streaming-pluggable-backend on May 1, 2026
Conversation

@nyo16 (Owner) commented May 1, 2026

Summary

  • Streaming on hackney 4 was silently in push mode, not pull mode — the M-12 backpressure architecture from 0.15.0 never materialized. Cold/slow SSE backends additionally surfaced as :timeout stream errors.
  • Adds Nous.HTTP.StreamBackend mirroring the non-streaming pluggable backend from 0.15.1, with Req as the default (symmetry across streaming + non-streaming) and Hackney as opt-in (with the bug fixed) for callers who need
    strict pull-based pacing.

Root cause

lib/nous/providers/http.ex (0.15.0–0.15.3) passed [:async, :once, ...] as separate atoms to :hackney.request/5. Erlang's proplists resolves bare :async as {:async, true} (push mode); the bare :once is ignored.
:hackney.stream_next/1 is a no-op in push mode, so the receive loop appeared to work in many cases — chunks arrived in the same {:hackney_response, conn, _} shape — but the pacing came from the producer.

The hackney 4 docs (deps/hackney/NEWS.md:269-272) document the pull form as [{async, once}] — a tuple.
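The degradation falls directly out of Erlang's proplists semantics and can be checked in a few lines of iex:

```elixir
# A bare atom in a proplist is shorthand for {atom, true}, so the bare
# :async resolves to push mode and the bare :once is never consulted
# for the :async key.
true = :proplists.get_value(:async, [:async, :once])

# The tuple form documented by hackney resolves to pull mode.
:once = :proplists.get_value(:async, [{:async, :once}])
```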

Changes

  • New Nous.HTTP.StreamBackend behaviour — same per-call → env → app config → default resolution chain as Nous.HTTP.Backend.
  • Nous.HTTP.StreamBackend.Req (default) — Req.post/1 with :into callback.
  • Nous.HTTP.StreamBackend.Hackney (opt-in) — strict pull-based via [{:async, :once}] tuple.
  • New env var NOUS_HTTP_STREAM_BACKEND (req | hackney | MyApp.MyBackend).
  • New app config config :nous, :http_stream_backend, ....
  • Provider stream normalizers (Nous.StreamNormalizer.*) untouched — they consume normalized events.
  • SSE parser helpers promoted to public functions (marked @doc false) for backend reuse.
  • README "HTTP backends" section split into non-streaming + streaming subsections with a "when to pick which" table.
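For illustration, the resolution chain described above can be sketched roughly as follows (the module and function names in this sketch are assumptions for demonstration, not lifted from the diff):

```elixir
defmodule MyApp.StreamBackendResolution do
  # Hypothetical resolver mirroring the documented chain:
  # per-call opt → NOUS_HTTP_STREAM_BACKEND env → app config → Req default.
  def resolve(opts \\ []) do
    opts[:stream_backend] ||
      from_env(System.get_env("NOUS_HTTP_STREAM_BACKEND")) ||
      Application.get_env(:nous, :http_stream_backend) ||
      Nous.HTTP.StreamBackend.Req
  end

  defp from_env(nil), do: nil
  defp from_env("req"), do: Nous.HTTP.StreamBackend.Req
  defp from_env("hackney"), do: Nous.HTTP.StreamBackend.Hackney
  # Any other value is treated as a custom module name, e.g. "MyApp.MyBackend".
  defp from_env(mod), do: Module.concat([mod])
end
```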

Empirical evidence (localhost LMStudio, qwen3.5-0.8b-mlx@4bit)

| Probe | Mailbox after 2 s, no stream_next/1 |
| --- | --- |
| [:async, :once, ...] (current/broken) | 97 messages (push mode — no backpressure) |
| [{:async, :once}, ...] (fix) | 2 messages (status + headers; body gates on stream_next/1) |
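The probe amounts to starting a request and deliberately never pulling; a sketch along these lines (assumes a reachable SSE endpoint bound to `url`, `headers`, and `body`; not runnable as-is):

```elixir
# Start a streaming request in pull mode, then never call :hackney.stream_next/1.
{:ok, _ref} = :hackney.request(:post, url, headers, body, [{:async, :once}])

# After 2 s, a pull-mode connection should have parked after status + headers,
# while push mode keeps flooding the mailbox with body chunks.
Process.sleep(2_000)
{:message_queue_len, n} = Process.info(self(), :message_queue_len)
IO.puts("mailbox after 2s: #{n} messages")
```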

End-to-end through Nous.AgentRunner.run(_, _, stream: true):

  • Default Req: 7 deltas, 244 ms, "Hello! How can I help?"
  • Hackney (env=hackney): 7 deltas, 293 ms, same text

TTFB benchmark (8 iters + warmup):

  • Req: 130.4 ms mean (124.4 ms min)
  • Hackney pull (fix): 133.4 ms mean
  • Hackney push (current/broken): 134.1 ms mean

All within ~5% — Req is marginally faster on TTFB, not slower.

Test plan

  • mix compile --warnings-as-errors — clean
  • mix test — 1640 tests, 0 failures
  • 19 new tests under test/nous/http/stream_backend* (resolution chain + bypass-driven Req + bypass-driven Hackney)
  • LMStudio smoke: both backends end-to-end
  • Hackney mailbox stays bounded under load (≤ 2 msgs)
  • Stream.take/2 early-exit cleanup on both backends
  • NOUS_HTTP_STREAM_BACKEND env var dispatch verified for both req and hackney

Migration

No code changes required for callers — default behaviour is restored to "streaming works against any healthy SSE backend." Apps that depend on strict pull-based backpressure should set:

config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

or pass stream_backend: Nous.HTTP.StreamBackend.Hackney per call.
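As a concrete per-call example (argument names other than the stated options are placeholders):

```elixir
Nous.AgentRunner.run(agent, messages,
  stream: true,
  stream_backend: Nous.HTTP.StreamBackend.Hackney
)
```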

The 0.15.0 streaming pipeline passed [:async, :once, ...] as separate
atoms to :hackney.request/5. Erlang's proplists resolves bare :async as
{:async, true}, putting hackney into push mode; the bare :once is
ignored. This silently forfeited the M-12 backpressure architecture —
:hackney.stream_next/1 is a no-op in push mode, so the receive loop
appeared to work in many cases but the pacing came from the producer.
Cold/slow SSE backends additionally surfaced as :timeout stream errors.

Mirroring the 0.15.1 non-streaming pluggable backend pattern:

- New Nous.HTTP.StreamBackend behaviour with the same per-call → env →
  app config → default resolution chain (NOUS_HTTP_STREAM_BACKEND env,
  :http_stream_backend app config, :stream_backend per-call opt).
- Nous.HTTP.StreamBackend.Req (default) — Req.post/1 with :into callback.
- Nous.HTTP.StreamBackend.Hackney (opt-in) — strict pull-based mode via
  the [{:async, :once}] tuple form per deps/hackney/NEWS.md:269-272.
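To make the pull contract concrete, a minimal consumer-driven receive loop looks roughly like this (a sketch of the general hackney pull-mode pattern, not the backend's actual code):

```elixir
defmodule StreamDemo do
  # Every chunk is explicitly requested with :hackney.stream_next/1, so the
  # mailbox stays bounded — this is the backpressure the bare-atom form lost.
  def pull_loop(ref, acc \\ []) do
    receive do
      {:hackney_response, ^ref, {:status, _code, _reason}} ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, acc)

      {:hackney_response, ^ref, {:headers, _headers}} ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, acc)

      {:hackney_response, ^ref, :done} ->
        {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}

      {:hackney_response, ^ref, chunk} when is_binary(chunk) ->
        :ok = :hackney.stream_next(ref)
        pull_loop(ref, [chunk | acc])
    after
      30_000 -> {:error, :timeout}
    end
  end
end
```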

Verified end-to-end against running LMStudio: both backends produce
deltas through Nous.AgentRunner.run(_, _, stream: true). Hackney's
mailbox stays bounded at 2 messages after 2s idle wait — pull mode is
finally working as documented. Req TTFB is marginally faster (130ms vs
133ms mean) in localhost benchmarks; total stream time within 5%.

Provider stream normalizers consume normalized events and need no
changes. SSE parser helpers (parse_stream_buffer/2, flush_stream_buffer/2,
max_buffer_size/0, ensure_streaming_headers/1) promoted to public
@doc false for backend reuse.

Copilot AI left a comment


Pull request overview

Adds a pluggable streaming HTTP backend layer to Nous.Providers.HTTP.stream/4, restoring correct pull-based hackney streaming behavior and making streaming backend selection configurable (mirroring the existing non-streaming backend model).

Changes:

  • Introduces Nous.HTTP.StreamBackend behaviour with Req as default and Hackney as opt-in.
  • Refactors Nous.Providers.HTTP.stream/4 to dispatch to the configured stream backend and promotes shared SSE helper functions for backend reuse.
  • Adds/updates docs, changelog, and tests covering backend resolution and both streaming implementations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/nous/http/stream_backend_resolution_test.exs | Tests stream-backend precedence (per-call/env/config/default) and env-var parsing/fallback. |
| test/nous/http/stream_backend/req_test.exs | Exercises the Req streaming backend (SSE parsing, errors, early halt). |
| test/nous/http/stream_backend/hackney_test.exs | Exercises the Hackney streaming backend and includes a regression test intended to guard pull-mode behavior. |
| mix.exs | Bumps version to 0.15.4. |
| lib/nous/providers/http.ex | Switches stream/4 to stream-backend dispatch; adds stream-backend resolver; exposes SSE helpers for reuse. |
| lib/nous/http/stream_backend/req.ex | Implements streaming via Req :into callback + Task forwarding to the consumer. |
| lib/nous/http/stream_backend/hackney.ex | Implements streaming via hackney pull mode using [{:async, :once}]. |
| lib/nous/http/stream_backend.ex | Defines the Nous.HTTP.StreamBackend behaviour and selection guidance. |
| README.md | Documents separate non-streaming vs streaming backend configuration and trade-offs. |
| CHANGELOG.md | Documents 0.15.4 additions/fixes and migration guidance. |


Comment on lines +109 to +113:

    defp next_chunk(state) do
      receive do
        {ref, :done} when ref == state.ref ->
          {events, _} = HTTP.flush_stream_buffer(state.buffer, state.stream_parser)

Comment thread on lib/nous/providers/http.ex (outdated), lines +334 to +336:

    case parse_sse_buffer(buffer <> "\n\n") do
      {:error, :buffer_overflow} -> {[{:stream_error, %{reason: :buffer_overflow}}], ""}
      result -> result

Comment on lines +77 to +83:

    # Regression net for the [{:async, :once}] tuple fix. If a future hackney
    # bump silently changes the option shape, this test should fail loudly:
    # in push mode the receive loop would still get messages but the
    # backpressure property is lost. Here we verify the *messages* shape by
    # asserting the stream completes against a Bypass server that delivers
    # all data in one chunk — works in both push and pull.
    test "request actually goes through hackney pull mode", %{bypass: bypass, url: url} do

Three Copilot review nits, all valid:

- Req backend: when the spawned Task crashes (Req exception, callback
  raise) the parent never received a completion message and waited the
  full receive timeout before emitting a misleading :timeout error.
  Match `{:DOWN, task_ref, :process, _, reason}` and surface a
  `:task_died` stream_error immediately. Track `task.ref` (the monitor
  ref Task.async sets up) on state.

- flush_stream_buffer/2: appending the synthetic "\n\n" delimiter for
  end-of-stream parsing could push a buffer at exactly @max_buffer_size
  two bytes over and trip a false-positive :buffer_overflow even though
  received data was within limits. Check input size first and bypass
  the public size-checked parser via the same-module private
  do_parse_sse_buffer/1.

- Hackney pull-mode regression test was too weak — it asserted
  user-agent and event delivery, both of which pass in push mode too
  (push and pull deliver the same `{:hackney_response, conn, _}` shape;
  the difference is who drives chunk delivery). Replace with a unit
  test on the actual regression surface: extract `request_opts/3` and
  assert `{:async, :once}` is present as a tuple and bare `:async` /
  `:once` atoms are not. This is the only sound guardrail for the
  proplist-shape bug — a future hackney bump that changes the option
  form will fail this test loudly.
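The replacement guardrail described in the last point could look roughly like this (`request_opts/3` is the name given above; its arguments and the surrounding test scaffolding are assumptions):

```elixir
test "hackney backend uses the tuple pull-mode option" do
  opts = Nous.HTTP.StreamBackend.Hackney.request_opts(url, headers, recv_timeout)

  # The tuple form is what selects pull mode...
  assert {:async, :once} in opts

  # ...and the bare atoms that caused the regression must not reappear.
  refute :async in opts
  refute :once in opts
end
```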
@nyo16 nyo16 merged commit ead6b2e into master May 1, 2026
6 checks passed
@nyo16 nyo16 deleted the fix/hackney-streaming-pluggable-backend branch May 1, 2026 22:33