73 changes: 73 additions & 0 deletions CHANGELOG.md

All notable changes to this project will be documented in this file.

## [0.15.4] - 2026-05-01

Pluggable streaming HTTP backends + hackney 4 pull-mode bug fix.

### Fixed

- **Hackney 4 streaming was silently in push mode, not pull mode.**
`lib/nous/providers/http.ex:463-470` (in 0.15.0–0.15.3) passed
`[:async, :once, ...]` as separate atoms to `:hackney.request/5`.
Erlang's `proplists` resolves bare atom `:async` as `{:async, true}`,
which puts hackney into push mode; the bare `:once` atom is silently
ignored. The architectural intent of M-12 (strict pull-based
backpressure so a slow consumer cannot grow its mailbox) was
forfeited — `:hackney.stream_next/1` is a no-op in push mode, so the
receive loop appeared to work in many cases (chunks arrive in the
same shape) but the pacing came from the producer, not the consumer.
The fix is the tuple form `[{:async, :once}, ...]` per
`deps/hackney/NEWS.md:269-272`. Empirical confirmation: with the
broken form a benign Bypass server delivers 97 messages to the
caller's mailbox in 2 s without any `stream_next/1` call; with the
tuple form the mailbox holds only 2 messages (status + headers) and
body chunks gate on `stream_next/1`. Reported as part of the same
bug that caused observable timeouts against cold/slow SSE backends.
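
  The `proplists` semantics at the heart of the bug can be checked directly
  (a sketch; `:proplists` is the same Erlang module hackney uses to read its
  options — the option lists below are illustrative, not the library's code):

  ```elixir
  # Bare-atom form (the 0.15.0–0.15.3 bug): :proplists expands the lone
  # :async atom to {:async, true}, i.e. push mode; the bare :once atom is
  # never consulted as the value of :async.
  broken = [:async, :once]

  # Tuple form (the fix): the value of :async is :once, i.e. pull mode.
  fixed = [{:async, :once}]

  IO.inspect(:proplists.get_value(:async, broken))  # => true
  IO.inspect(:proplists.get_value(:async, fixed))   # => :once
  ```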

### Added

- **`Nous.HTTP.StreamBackend` behaviour** — pluggable streaming HTTP
layer mirroring the non-streaming `Nous.HTTP.Backend` introduced in
  0.15.1. Two implementations ship:
- `Nous.HTTP.StreamBackend.Req` — the new default. Drives
`Req.post/1` with the `:into` callback. Simpler stack
(Req/Finch/Mint), marginally faster TTFB than hackney in
benchmarks against LMStudio (~130 ms vs ~133 ms mean).
- `Nous.HTTP.StreamBackend.Hackney` — opt-in. Strict pull-based
backpressure via `:hackney`'s `[{:async, :once}]` mode (the bug
above is fixed here). Pick this when downstream consumers can
block per chunk (LiveView fan-out under load,
persistence-on-every-chunk, slow IO).
- **`:stream_backend` per-call opt** on `Nous.Providers.HTTP.stream/4`.
- **`NOUS_HTTP_STREAM_BACKEND` env var** (`req` | `hackney` |
`My.Custom.Backend`). Resolution mirrors `NOUS_HTTP_BACKEND`:
per-call → env → app config → default.
- **`config :nous, :http_stream_backend, MyBackend`** application
config knob.

### Changed

- `Nous.Providers.HTTP.stream/4` now dispatches to the configured
`Nous.HTTP.StreamBackend` instead of inlining hackney plumbing. The
public API surface (return shape, event types, error tuples) is
unchanged. Provider stream normalizers (`Nous.StreamNormalizer.*`)
consume normalized events and need no changes.
- The non-streaming pluggable `Nous.HTTP.Backend` resolver is
refactored to share its `String.to_existing_atom/1` safety logic with
the streaming resolver — same C-2 protection on both paths.
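
The shared safety logic can be sketched as follows (module and function names
here are illustrative, not the library's internals):

```elixir
defmodule BackendResolverSketch do
  # Illustrative only: mirrors the documented resolution of backend names.
  # String.to_existing_atom/1 raises ArgumentError for unknown module names
  # instead of minting new atoms, so hostile env-var input cannot grow the
  # atom table (the C-2 protection mentioned above).
  def resolve(nil, default), do: default
  def resolve("req", _default), do: Nous.HTTP.StreamBackend.Req
  def resolve("hackney", _default), do: Nous.HTTP.StreamBackend.Hackney

  def resolve(name, _default) when is_binary(name) do
    String.to_existing_atom("Elixir." <> name)
  end
end
```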

### Documentation

- `Nous.Providers.HTTP` moduledoc rewritten around the dual
pluggable-backend model and the streaming backpressure trade-off.
- `Nous.HTTP.StreamBackend` and the two implementation modules carry
  full moduledocs explaining when to pick each.

### Migration

No code changes required for callers — the default behavior is
restored to "streaming works against any healthy SSE backend." Apps
that depend on strict pull-based backpressure should set:

config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

or pass `stream_backend: Nous.HTTP.StreamBackend.Hackney` per call.

## [0.15.3] - 2026-05-01

Streaming + tool execution. The `Nous.Agent.run/3` loop now has a
62 changes: 47 additions & 15 deletions README.md
| LlamaCpp | `llamacpp:local` + `:llamacpp_model` | ✅ |
| **Custom** | `custom:model` + `:base_url` | ✅ |

HTTP providers use a pluggable backend — `Req` (default, on top of Finch) or
`hackney 4` — selected per-call, via `NOUS_HTTP_BACKEND`, or via app config.
Streaming always uses `hackney`'s `:async, :once` pull-based mode for
backpressure (a slow consumer can't OOM under a fast LLM). LlamaCpp runs
in-process via NIFs. See [HTTP Backend](#http-backend) below for details.
HTTP providers use a pluggable backend on both the non-streaming and
streaming paths — `Req` (default, on top of Finch) or `hackney 4` —
selected per-call, via `NOUS_HTTP_BACKEND` / `NOUS_HTTP_STREAM_BACKEND`,
or via app config. The Hackney streaming backend uses `[{:async, :once}]`
pull-based mode for strict backpressure (a slow consumer can't grow its
mailbox under a fast LLM). LlamaCpp runs in-process via NIFs.
See [HTTP Backend](#http-backend) below for details.

> **Tip**: The named local providers (`lmstudio:`, `vllm:`, `sglang:`,
> `ollama:`) are the recommended way to talk to local OpenAI-compatible

### HTTP Backend

Non-streaming HTTP requests go through a pluggable backend. Default is
`Nous.HTTP.Backend.Req` (Req on top of Finch); `Nous.HTTP.Backend.Hackney`
is shipped as an alternative. Streaming always uses hackney's `:async, :once`
pull-based mode for backpressure — that choice is structural, not
configurable.
Both the non-streaming and streaming HTTP paths go through pluggable
backends. Defaults are `Nous.HTTP.Backend.Req` and
`Nous.HTTP.StreamBackend.Req` (both on Req + Finch).
`Nous.HTTP.Backend.Hackney` and `Nous.HTTP.StreamBackend.Hackney` are
shipped as alternatives.

#### Non-streaming (`Nous.HTTP.Backend`)

Pick per-call, per-environment, or per-app:

```elixir
# Per-call
HTTP.post(url, body, headers, backend: Nous.HTTP.Backend.Hackney)

# Env
# NOUS_HTTP_BACKEND=hackney

# App config
config :nous, :http_backend, Nous.HTTP.Backend.Hackney
```

Tune the shared hackney `:default` pool from app config (used by both the
Hackney backend and the streaming pipeline):
#### Streaming (`Nous.HTTP.StreamBackend`)

Same resolution chain, separate config knob:

```elixir
# Per-call
HTTP.stream(url, body, headers,
stream_backend: Nous.HTTP.StreamBackend.Hackney)

# Env
# NOUS_HTTP_STREAM_BACKEND=hackney

# App config
config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney
```

When to pick which streaming backend:

| Backend | Pick it when |
|---------|--------------|
| `Nous.HTTP.StreamBackend.Req` *(default)* | One HTTP stack across streaming + non-streaming. Right default for almost every app. Backpressure is bounded by parsing speed, not strict pull pacing — fine for typical LLM workloads where token rate is the bottleneck. |
| `Nous.HTTP.StreamBackend.Hackney` | Strict pull-based backpressure via `[{:async, :once}]`. Pick this when downstream consumers can block per chunk (LiveView fan-out under load, persistence-on-every-chunk, slow IO). |

Both emit identical normalized event streams (parsed JSON maps,
`{:stream_done, _}`, `{:stream_error, _}`); switching backends needs no
other code changes.
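
A consumer sketch against that event contract (the reducer and its name are
illustrative; the event shapes are the ones documented above):

```elixir
# Collect body chunks until the stream signals done or error.
collect = fn events ->
  Enum.reduce_while(events, [], fn
    {:stream_done, _reason}, acc -> {:halt, {:ok, Enum.reverse(acc)}}
    {:stream_error, reason}, _acc -> {:halt, {:error, reason}}
    %{} = chunk, acc -> {:cont, [chunk | acc]}
  end)
end

# Works identically whichever backend produced the events:
collect.([%{"delta" => "Hi"}, %{"delta" => "!"}, {:stream_done, :normal}])
# => {:ok, [%{"delta" => "Hi"}, %{"delta" => "!"}]}
```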

#### Hackney pool

Tune the shared hackney `:default` pool from app config (used by both
the Hackney non-streaming and Hackney streaming backends):

```elixir
config :nous, :hackney_pool,
  timeout: 150_000,
  max_connections: 50
```

See [the HTTP backend benchmark report](https://github.com/nyo16/nous/blob/master/docs/benchmarks/http_backend.md)
for localhost + real-endpoint benchmark numbers and guidance on when
to switch backends. Headline: stick with the Req default unless you
specifically need HTTP/3 (Alt-Svc auto-upgrade) or want to consolidate
on one HTTP family.
to switch backends. Headline: stick with the Req defaults unless you
have a specific reason (strict backpressure, HTTP/3 upgrade, single-HTTP-stack consolidation).

### Timeouts

68 changes: 68 additions & 0 deletions lib/nous/http/stream_backend.ex
defmodule Nous.HTTP.StreamBackend do
@moduledoc """
Behaviour for SSE / chunked streaming HTTP backends.

Implemented by `Nous.HTTP.StreamBackend.Req` (default) and
`Nous.HTTP.StreamBackend.Hackney`. Selection mirrors the non-streaming
`Nous.HTTP.Backend` resolution chain (per-call → env var → app config →
default). See `Nous.Providers.HTTP.stream/4` for the resolution order.

Pick a backend three ways, highest precedence first:

# 1. Per-call opt
Nous.Providers.HTTP.stream(url, body, headers,
stream_backend: Nous.HTTP.StreamBackend.Hackney)

# 2. Environment variable (req | hackney | "MyApp.MyStreamBackend")
export NOUS_HTTP_STREAM_BACKEND=hackney

# 3. Application config
config :nous, :http_stream_backend, Nous.HTTP.StreamBackend.Hackney

Default: `Nous.HTTP.StreamBackend.Req`.

## When to pick which

- `Nous.HTTP.StreamBackend.Req` — one HTTP stack across streaming and
non-streaming, simpler dependency story. Right default for most apps.
Backpressure is bounded by parsing speed, not by `stream_next/1`
pacing — a fast LLM + slow consumer can grow the consumer's mailbox.
Acceptable for typical LLM workloads where token rate is the
bottleneck.
- `Nous.HTTP.StreamBackend.Hackney` — strict pull-based backpressure
via `:hackney`'s `{:async, :once}` mode. The consumer paces the
producer chunk-by-chunk. Pick this when downstream consumers can
block per chunk (LiveView assigns + diff + push under load,
persistence-on-every-chunk, slow IO).

Both backends emit the same normalized event stream (parsed JSON maps,
`{:stream_done, reason}`, `{:stream_error, reason}`). Switching between
them does not require changes elsewhere.

## Custom backends

Implement `c:stream/4` and return `{:ok, Enumerable.t()}` where the
enumerable emits parsed JSON maps, `{:stream_done, reason}` tuples, or
`{:stream_error, reason}` tuples. The stream MUST halt after the first
`{:stream_error, _}` and after `{:stream_done, _}`.
"""

@doc """
Issue a streaming POST request and return a lazy `Enumerable.t()` of
parsed events.

## Options
* `:timeout` — receive timeout in milliseconds (default: `60_000`)
* `:connect_timeout` — TCP connect timeout in milliseconds (default: `30_000`)
* `:stream_parser` — module implementing `parse_buffer/1` for non-SSE
formats (e.g. JSON-array streams). Defaults to SSE.

Backends MAY accept additional options; unknown options should be ignored.
"""
@callback stream(
url :: String.t(),
body :: map(),
headers :: [{String.t(), String.t()}],
opts :: keyword()
) :: {:ok, Enumerable.t()} | {:error, term()}
end
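
A minimal custom backend satisfying the contract above might look like this
(a sketch: the module name is hypothetical, and canned events stand in for
real transport):

```elixir
defmodule MyApp.CannedStreamBackend do
  @behaviour Nous.HTTP.StreamBackend

  # Illustrative only: emits a fixed event sequence in the required shape
  # instead of opening a real connection. A real backend would issue the
  # POST here and lazily yield parsed events.
  @impl true
  def stream(url, _body, _headers, _opts) do
    events = [
      %{"echo" => url},
      {:stream_done, :normal}
    ]

    {:ok, events}
  end
end
```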