diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..92adf92
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,301 @@
+# AGENTS.md
+
+Quick-reference for AI coding agents (Claude, Cursor, Copilot, Codex, etc.)
+working with the **Nous** Elixir AI agent framework. This file is for agents
+that want to *use* the library, not for agents maintaining the library
+itself (see `CONTRIBUTING.md` and `docs/` for that). Conforms to
+<https://agents.md>.
+
+## What Nous is
+
+Multi-provider LLM framework for Elixir/OTP. Provides:
+
+- **One-shot LLM calls** (`Nous.generate_text/2,3`, `Nous.stream_text/2,3`)
+- **Stateful agents** with tool-calling, memory, plugins (`Nous.new/2`, `Nous.run/2,3`)
+- **Pluggable providers** — OpenAI, Anthropic, Gemini, Vertex AI, Groq, Mistral,
+  OpenRouter, Together, Ollama, LM Studio, vLLM, SGLang, LlamaCpp, and a
+  generic `custom:` adapter for any OpenAI-compatible endpoint
+- **Tool system** — file ops, bash, web fetch + search, plus easy custom tools
+- **Pluggable HTTP backend** (Req default, hackney alternative)
+- **Streaming with backpressure** (hackney `:async, :once` pull mode)
+
+## Minimal API surface (start here)
+
+```elixir
+# Drop-in: one-shot text generation
+{:ok, text} = Nous.generate_text("openai:gpt-4o", "Explain GenServer in one sentence.")
+
+# Streaming
+{:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", "Write a haiku")
+Enum.each(stream, &IO.write/1)
+
+# Stateful agent with tools
+agent =
+  Nous.new("openai:gpt-4o",
+    tools: [Nous.Tools.FileRead, Nous.Tools.FileGrep],
+    system_prompt: "You are a code reviewer."
+  )
+
+{:ok, result} = Nous.run(agent, "Find all TODOs in lib/")
+# result.text, result.messages, result.usage
+
+# Streaming agent run
+{:ok, stream} = Nous.run_stream(agent, "Summarize this repo")
+```
+
+That's 90% of what most apps need. Everything else is configuration.
+
+## Provider quick-pick (model strings)
+
+Format is `"<provider>:<model>"`. Pick one:
+
+| If you want… | Use |
+|---|---|
+| Best general-purpose, high quality | `openai:gpt-4o` or `anthropic:claude-sonnet-4-5-20250929` |
+| Cheap and fast | `groq:llama-3.1-70b-versatile` or `gemini:gemini-2.0-flash` |
+| Local / no API key | `lmstudio:<model>` (default port 1234) |
+| Local high-throughput inference | `vllm:<model>` (default port 8000) |
+| Local with structured generation | `sglang:<model>` (default port 30000) |
+| Anything else with an OpenAI-compatible API | `custom:<model>` + `:base_url` opt |
+
+Auth picks up API keys from the conventional env vars: `OPENAI_API_KEY`,
+`ANTHROPIC_API_KEY`, `GROQ_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`,
+etc. Local providers don't need a key. Override per call with the `api_key:` opt.
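+
+For example, to pin a key explicitly instead of relying on the env var (a
+minimal sketch; `MY_GROQ_KEY` is a hypothetical variable name):
+
+```elixir
+# Explicit per-call key; skips the GROQ_API_KEY convention.
+{:ok, text} =
+  Nous.generate_text("groq:llama-3.1-70b-versatile", "Hello!",
+    api_key: System.fetch_env!("MY_GROQ_KEY")
+  )
+```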
+
+## Key opts you'll actually use
+
+```elixir
+Nous.new("openai:gpt-4o",
+  # LLM behavior
+  system_prompt: "...",
+  temperature: 0.7,
+  max_tokens: 2_000,
+  receive_timeout: 60_000,  # ms; use 120_000 for local models
+
+  # Tools (modules implementing Nous.Tool.Behaviour)
+  tools: [Nous.Tools.Bash, MyApp.MyTool],
+
+  # Memory backend (optional)
+  memory: %{store: Nous.Memory.Store.ETS, opts: []},
+
+  # Plugins (optional, composable)
+  plugins: [Nous.Plugins.SubAgent, Nous.Plugins.HumanInTheLoop],
+
+  # Resilience
+  fallback: ["anthropic:claude-sonnet-4-5", "groq:llama-3.1-70b-versatile"],
+
+  # Vendor-specific body params (vLLM/SGLang/LM Studio/llama.cpp)
+  extra_body: %{top_k: 50, repetition_penalty: 1.1}
+)
+```
+
+## Built-in tools
+
+In `Nous.Tools.*`. The five most useful:
+
+- **`Nous.Tools.Bash`** — execute shell commands (requires an approval handler in production)
+- **`Nous.Tools.FileRead`** / **`FileWrite`** / **`FileEdit`** — workspace-sandboxed file ops
+- **`Nous.Tools.FileGlob`** / **`FileGrep`** — find files / search content
+- **`Nous.Tools.WebFetch`** — fetch + extract text from a URL (SSRF-protected)
+- **`Nous.Tools.TavilySearch`** / **`BraveSearch`** — web search
+
+File tools enforce a workspace root. Default is `cwd`. Override per agent:
+
+```elixir
+Nous.new("openai:gpt-4o",
+  tools: [Nous.Tools.FileRead],
+  deps: %{workspace_root: "/path/to/project"}
+)
+```
+
+## Building a custom tool
+
+```elixir
+defmodule MyApp.WeatherTool do
+  use Nous.Tool
+
+  @impl Nous.Tool.Behaviour
+  def name, do: "get_weather"
+
+  @impl Nous.Tool.Behaviour
+  def description, do: "Get current weather for a city"
+
+  @impl Nous.Tool.Behaviour
+  def parameters do
+    %{
+      "type" => "object",
+      "properties" => %{
+        "city" => %{"type" => "string", "description" => "City name"}
+      },
+      "required" => ["city"]
+    }
+  end
+
+  @impl Nous.Tool.Behaviour
+  def execute(%{"city" => city}, _ctx) do
+    {:ok, "Weather in #{city}: 72°F, sunny"}
+  end
+end
+```
+
+Pass it in the `tools:` list. The `_ctx` arg gives access to `deps`, the
+workspace root, and the approval handler. Use `Nous.Tool.Validator` for
+input validation — it runs automatically when `validate_args: true`
+(the default).
+
+## HTTP backend (don't change unless you need to)
+
+The default backend is `Nous.HTTP.Backend.Req` — Req on top of Finch. It's
+faster under parallel batching than the alternative. Override only if:
+
+- You need HTTP/3 → `NOUS_HTTP_BACKEND=hackney`
+- You want one HTTP family across streaming + non-streaming → same env var
+
+Pool config (shared hackney pool, used by both streaming and the Hackney
+backend):
+
+```elixir
+config :nous, :hackney_pool,
+  max_connections: 200,
+  timeout: 1_500  # idle keepalive ms (hackney 4 caps at 2_000)
+```
+
+Streaming **always** uses hackney's pull-based `:async, :once` mode for
+backpressure (a slow consumer can't OOM the node under a fast LLM). This is
+structural, not configurable. See `docs/benchmarks/http_backend.md`.
+
+## Critical rules (security & correctness)
+
+These are project-wide and non-negotiable. If you write code that breaks
+these, it will be rejected.
+
+1. **Never `String.to_atom/1` on untrusted input.** Use
+   `String.to_existing_atom/1` with rescue, or pattern-match on a
+   whitelist of literal strings (see the sketch after this list). Atoms
+   are never garbage-collected, so a prompt-injected input can exhaust
+   the atom table and crash the BEAM.
+2. **Tools requiring approval are rejected without an `:approval_handler`.**
+   `Bash`, `FileWrite`, `FileEdit` need one wired in `RunContext` or they
+   refuse to run. Don't disable this.
+3. **File tools enforce a workspace root.** Don't bypass `PathGuard`. Pass
+   paths within the workspace; the guard rejects `..` traversal, absolute
+   paths outside, and symlink escapes.
+4. **HTTP from agents goes through `UrlGuard`.** Don't make raw `Req.get/1`
+   calls from a tool to a user-controlled URL — use `Nous.Tools.WebFetch` or
+   call `UrlGuard.validate/2` first. It blocks RFC 1918, loopback,
+   link-local, and cloud-metadata IPs.
+5. **`PromptTemplate` rejects `<% ... %>` blocks** — only `<%= @var %>`
+   substitution is allowed. Don't try to enable EEx evaluation on
+   LLM-touched templates; it's an RCE vector.
+6. **Sub-agent deps don't auto-forward.** If you spawn a sub-agent via
+   `Nous.Plugins.SubAgent`, declare which deps it sees by setting
+   `:sub_agent_shared_deps` to a list like `[:key1, :key2]`. The default
+   `[]` is correct for security.
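+
+A minimal sketch of the literal-string whitelist pattern from rule 1;
+`parse_role/1` is a hypothetical helper, not a Nous API:
+
+```elixir
+# Matching on string literals creates the atoms at compile time, so
+# untrusted input can never grow the atom table.
+defp parse_role("user"), do: {:ok, :user}
+defp parse_role("assistant"), do: {:ok, :assistant}
+defp parse_role("system"), do: {:ok, :system}
+defp parse_role(_untrusted), do: {:error, :unknown_role}
+```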
+
+## Common workflows
+
+### Streaming to LiveView
+
+```elixir
+# In your LiveView mount or handle_event, consume the stream in a Task so
+# chunks arrive as messages while the LiveView process stays free to render:
+lv = self()
+
+Task.start(fn ->
+  {:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", prompt)
+
+  stream
+  |> Stream.each(fn chunk -> send(lv, {:llm_chunk, chunk}) end)
+  |> Stream.run()
+end)
+```
+
+The hackney backpressure means the stream paces itself to match LiveView's
+diff/push throughput — no mailbox accumulation.
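+
+On the receiving side, append each chunk to an assign. A minimal sketch,
+assuming a `:response` string assign in your LiveView:
+
+```elixir
+def handle_info({:llm_chunk, chunk}, socket) do
+  {:noreply, update(socket, :response, fn acc -> acc <> chunk end)}
+end
+```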
+
+### Tool-using agent loop
+
+```elixir
+agent =
+  Nous.new("openai:gpt-4o",
+    tools: [Nous.Tools.FileGrep, Nous.Tools.FileRead, Nous.Tools.Bash],
+    max_iterations: 10
+  )
+
+{:ok, result} = Nous.run(agent, "Find the bug in lib/foo.ex and explain it")
+
+# result.messages contains the full transcript including tool calls
+# result.usage gives token counts per provider
+```
+
+### Provider failover
+
+```elixir
+agent =
+  Nous.new("openai:gpt-4o",
+    fallback: [
+      "anthropic:claude-sonnet-4-5-20250929",
+      "groq:llama-3.1-70b-versatile"
+    ]
+  )
+```
+
+Falls through on transport errors, 5xx, and rate-limit (429) responses.
+
+### Local dev with LM Studio
+
+```elixir
+# 1. Start LM Studio, load a model, start the server (default port 1234).
+# 2. In Elixir:
+{:ok, text} = Nous.generate_text("lmstudio:<model>", "Hello!")
+
+# Or override the URL:
+agent = Nous.new("lmstudio:my-model", base_url: "http://gpu-host:1234/v1")
+```
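+
+The same pattern works for `vllm:` and `sglang:`. If the server needs
+vendor-specific request params, pass them through `extra_body:`. A sketch
+that disables Qwen3 thinking on vLLM; the model name and the
+`chat_template_kwargs`/`enable_thinking` keys are vLLM/Qwen3 conventions,
+not Nous APIs:
+
+```elixir
+# extra_body keys are forwarded verbatim as top-level request JSON.
+agent =
+  Nous.new("vllm:Qwen/Qwen3-8B",
+    extra_body: %{
+      chat_template_kwargs: %{enable_thinking: false},
+      top_k: 20
+    }
+  )
+```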
+
+## Testing your code that uses Nous
+
+```elixir
+# Use the test helpers in Nous.Tool.Testing for tool unit tests.
+# For end-to-end agent tests, the recommended pattern is to use Bypass to
+# stub the LLM HTTP endpoint:
+
+setup do
+  bypass = Bypass.open()
+  base = "http://localhost:#{bypass.port}/v1"
+  {:ok, bypass: bypass, base: base}
+end
+
+test "agent calls the model", %{bypass: bypass, base: base} do
+  Bypass.expect_once(bypass, "POST", "/v1/chat/completions", fn conn ->
+    conn
+    |> Plug.Conn.put_resp_header("content-type", "application/json")
+    |> Plug.Conn.resp(200, ~s({"choices":[{"message":{"content":"hi!"}}]}))
+  end)
+
+  agent = Nous.new("custom:test-model", base_url: base, api_key: "test")
+  assert {:ok, %{text: "hi!"}} = Nous.run(agent, "hello")
+end
+```
+
+Don't mock `Req`/`hackney` directly — Bypass is the supported test seam.
+
+## What NOT to use
+
+The public API is `Nous.*` and `Nous.Tools.*`. These are NOT public:
+
+- `Nous.HTTP.Backend.*` — internal; use `HTTP.post/4`'s `:backend` opt instead
+- `Nous.Providers.HTTP` — internal helper for provider authors
+- `Nous.AgentRunner`, `Nous.AgentServer` — internal supervision; use `Nous.run/3`
+- `Nous.Application`, `Nous.Persistence.ETS.TableOwner` — internal supervision tree
+- Anything under `Nous.Workflow.Engine.*` — internal; the public API is `Nous.Workflow`
+- Anything marked `@moduledoc false` — hidden on purpose; will change without notice
+
+Stick to the documented modules and your code will survive minor version bumps.
+
+## Where to look for more
+
+- **Hex docs:** <https://hexdocs.pm/nous>
+- **Getting started:** `docs/getting-started.md`
+- **Production guides:** `docs/guides/` (skills, hooks, LiveView integration,
+  best practices, tool development, troubleshooting, evaluation, structured
+  output, workflows, memory, context, knowledge base)
+- **Examples:** `examples/`
+- **CHANGELOG:** behavioral changes per release; **read the "Behavioral /
+  breaking changes" sections before upgrading**.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index fbd6b38..961e430 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,34 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.15.2] - 2026-04-27
+
+Documentation-only release. No code changes.
+
+### Added
+
+- **`AGENTS.md`** — quick-reference for AI coding agents (Claude, Cursor,
+  Copilot, Codex, etc.) consuming the library. Covers the minimal API,
+  provider quick-pick, key opts, custom tools, HTTP backend, security
+  rules, common workflows, and what's public vs internal. Conforms to
+  <https://agents.md>.
+
+### Changed
+
+- README "Supported Providers" table now lists `vllm:` and `sglang:`
+  as first-class named providers (previously only `lmstudio:` was
+  mentioned; vLLM and SGLang were buried in the `custom:` section).
+- README "Local Servers" section now recommends the dedicated
+  `lmstudio:` / `vllm:` / `sglang:` / `ollama:` prefixes over `custom:`
+  — they default to the right port, validate `*_BASE_URL` env vars
+  through `UrlGuard`, and pick up the OpenAI stream normalizer for free.
+- New "HTTP Backend" section in README covering the pluggable
+  `Nous.HTTP.Backend` behaviour, env-var selection, and shared hackney
+  pool config.
+- Cleaned up `mix docs` warnings — reworded CHANGELOG references to
+  hidden modules so ExDoc no longer tries to auto-link them.
+
 ## [0.15.1] - 2026-04-26
 
 Follow-up to 0.15.0. No behavioral changes for existing users — the
@@ -26,9 +54,9 @@ SGLang) up to date with the post-0.15.0 hackney streaming rewrite.
   in `docs/benchmarks/http_backend.md`.
 - **Hackney `:default` pool is now configurable from app config:**
   `config :nous, :hackney_pool, max_connections: 200, timeout: 1_500`.
-  Applied at `Nous.Application` boot. Used by both the Hackney HTTP
-  backend and the streaming pipeline. (Hackney 4 caps the idle
-  keepalive timeout at 2_000 ms — values above that silently cap.)
+  Applied at app boot. Used by both the Hackney HTTP backend and the
+  streaming pipeline. (Hackney 4 caps the idle keepalive timeout at
+  2_000 ms — values above that silently cap.)
 - **Per-call `:connect_timeout` and `:pool` opts** added to both HTTP
   backends and `Nous.Providers.HTTP.stream/4`. Default 30_000ms /
   `:default` pool. Lets a single app run different timeouts per
@@ -57,7 +85,7 @@ Minor version bump (not patch) because of the 9 behavioral changes called out be
 
 Read these before upgrading.
 
-- **Sub-agent deps no longer auto-forward to children.** `Nous.Plugins.SubAgent.compute_sub_deps/1` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
+- **Sub-agent deps no longer auto-forward to children.** The `compute_sub_deps/1` helper in `Nous.Plugins.SubAgent` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
 - **Tools with `requires_approval: true` are now rejected when no `:approval_handler` is wired** (was silently approved). If you use `Nous.Tools.Bash`, `FileWrite`, or `FileEdit`, configure an `approval_handler` on `RunContext` or those tools will refuse to run.
 - **File tools (`FileRead/Write/Edit/Glob/Grep`) now enforce a workspace root.** Defaults to `cwd`; override per-agent via `deps: %{workspace_root: "/path"}`. Paths that escape the root (absolute paths outside, `..` traversal, symlink-escape) are rejected with a clear error to the LLM.
 - **`PromptTemplate.from_template/2` rejects template bodies containing `<% ... %>` blocks** other than the simple `<%= @ident %>` substitution form. Previously bodies were passed through `EEx.eval_string/2`, which executes arbitrary Elixir — an RCE vector for any caller piping LLM output into a template. Conditionals must now be expressed by composing multiple smaller templates.
@@ -112,7 +140,7 @@ Read these before upgrading.
 - **AgentServer `load_context` runs in a `Task.Supervisor.start_child` task** with `GenServer.reply/2` — slow persistence backends no longer block concurrent `get_context` / `cancel_execution` calls.
 - **AgentDynamicSupervisor + Application supervisor restart limits** tuned to `max_restarts: 100, max_seconds: 10` (was the default 3-in-5) so one bad user's crash loop doesn't take down every other tenant.
 - **`Nous.Teams.RateLimiter` is now race-safe under concurrent acquires (M-9 final).** `acquire/3` now returns `{:ok, reservation_ref} | {:error, _}` and atomically reserves the estimated tokens + 1 request slot. `record_usage/3` accepts `:reservation` to reconcile actual vs estimated; missing reconciliations are auto-refunded after `:reservation_ttl_ms` (default 5 min) with a `Logger.warning/1`. `release/2` cancels a reservation when the call errored before completing. Legacy `record_usage/3` without `:reservation` still works for callers that don't go through `acquire`. Added `:open_reservations` to `get_status/1`.
-- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). `Nous.Application` conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
+- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). The application supervisor conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
 
 ### Fixed (UX / minor)
 
@@ -138,7 +166,7 @@ Read these before upgrading.
 
 ### Added
 
-- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and `Agent.run/3` `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.
+- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and agent `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.
 
 Example — disable Qwen3 thinking and tune sampling on a vLLM endpoint:
Each defaults to the standard port for +# its server, and the *_BASE_URL env var is validated for SSRF safety. +agent = Nous.new("lmstudio:qwen3") # localhost:1234 +agent = Nous.new("ollama:llama2") # localhost:11434 +agent = Nous.new("vllm:meta-llama/Llama-3-8B-Instruct") # localhost:8000 +agent = Nous.new("sglang:meta-llama/Llama-3-8B-Instruct") # localhost:30000 -# SGLang (default: localhost:30000) -agent = Nous.new("custom:my-model", base_url: "http://localhost:30000/v1") +# Per-provider overrides via env (or :base_url opt): +# export LMSTUDIO_BASE_URL="http://10.0.0.5:1234/v1" +# export VLLM_BASE_URL="http://gpu-host:8000/v1" +# export SGLANG_BASE_URL="http://gpu-host:30000/v1" -# Or use environment variables -# export CUSTOM_BASE_URL="http://localhost:1234/v1" -agent = Nous.new("custom:qwen3") # base_url read from env +# Fall back to custom: only for non-OpenAI-compatible local servers, +# or servers without a named provider. +agent = Nous.new("custom:my-model", base_url: "http://localhost:9999/v1") ``` > **Note**: The legacy `openai_compatible:` prefix still works for backward compatibility @@ -253,6 +264,43 @@ agent = Nous.new("openai:gpt-4", ) ``` +### HTTP Backend + +Non-streaming HTTP requests go through a pluggable backend. Default is +`Nous.HTTP.Backend.Req` (Req on top of Finch); `Nous.HTTP.Backend.Hackney` +is shipped as an alternative. Streaming always uses hackney's `:async, :once` +pull-based mode for backpressure — that choice is structural, not +configurable. + +Pick per-call, per-environment, or per-app: + +```elixir +# Per-call +HTTP.post(url, body, headers, backend: Nous.HTTP.Backend.Hackney) + +# Env (highest precedence after per-call): +# NOUS_HTTP_BACKEND=hackney # also accepts "req" or a fully-qualified +# # custom module name like "MyApp.MyBackend" + +# App config +config :nous, :http_backend, Nous.HTTP.Backend.Hackney +``` + +Tune the shared hackney `:default` pool from app config (used by both the +Hackney backend and the streaming pipeline): + +```elixir +config :nous, :hackney_pool, + max_connections: 200, + timeout: 1_500 # idle keepalive ms (hackney 4 caps at 2_000) +``` + +See [the HTTP backend benchmark report](https://github.com/nyo16/nous/blob/master/docs/benchmarks/http_backend.md) +for localhost + real-endpoint benchmark numbers and guidance on when +to switch backends. Headline: stick with the Req default unless you +specifically need HTTP/3 (Alt-Svc auto-upgrade) or want to consolidate +on one HTTP family. + ### Timeouts Each provider has sensible default timeouts (60s for cloud APIs, 120s for local models). Override per-model with `receive_timeout`: diff --git a/lib/nous/http/backend/hackney.ex b/lib/nous/http/backend/hackney.ex index 3e5372d..18fb211 100644 --- a/lib/nous/http/backend/hackney.ex +++ b/lib/nous/http/backend/hackney.ex @@ -4,7 +4,7 @@ defmodule Nous.HTTP.Backend.Hackney do Uses `:hackney.request/5` synchronously — hackney 4 returns the full response body inline as `{:ok, status, headers, body}` (the legacy - `:hackney.body/1` follow-up call from hackney 1.x was removed). + `hackney.body/1` follow-up call from hackney 1.x was removed in v4). Hackney 4 is already in the dependency tree from 0.15.0 (used for streaming) — this backend lets users consolidate non-streaming HTTP onto the same library without keeping Finch/Mint in the hot path. @@ -47,7 +47,7 @@ defmodule Nous.HTTP.Backend.Hackney do end # Hackney 4 returns the body inline: `{:ok, status, headers, body}`. 
diff --git a/lib/nous/http/backend/hackney.ex b/lib/nous/http/backend/hackney.ex
index 3e5372d..18fb211 100644
--- a/lib/nous/http/backend/hackney.ex
+++ b/lib/nous/http/backend/hackney.ex
@@ -4,7 +4,7 @@ defmodule Nous.HTTP.Backend.Hackney do
 
   Uses `:hackney.request/5` synchronously — hackney 4 returns the full
   response body inline as `{:ok, status, headers, body}` (the legacy
-  `:hackney.body/1` follow-up call from hackney 1.x was removed).
+  `hackney.body/1` follow-up call from hackney 1.x was removed in v4).
   Hackney 4 is already in the dependency tree from 0.15.0 (used for
   streaming) — this backend lets users consolidate non-streaming HTTP
   onto the same library without keeping Finch/Mint in the hot path.
@@ -47,7 +47,7 @@ defmodule Nous.HTTP.Backend.Hackney do
   end
 
   # Hackney 4 returns the body inline: `{:ok, status, headers, body}`. The
-  # legacy `:hackney.body/1` follow-up call from hackney 1.x is gone — the
+  # legacy hackney.body/1 follow-up call from hackney 1.x is gone — the
   # `with_body` option is now the default and ignored.
   defp do_request(url, headers, body, timeout, connect_timeout, pool) do
     hackney_opts = [
diff --git a/lib/nous/persistence/ets.ex b/lib/nous/persistence/ets.ex
index 193689c..8b9c4d7 100644
--- a/lib/nous/persistence/ets.ex
+++ b/lib/nous/persistence/ets.ex
@@ -3,10 +3,9 @@ defmodule Nous.Persistence.ETS do
   ETS-based persistence backend.
 
   Stores serialized context data in a named ETS table. The table is owned
-  by a dedicated GenServer (`Nous.Persistence.ETS.TableOwner`) started
-  under the Nous application supervisor, so the table outlives transient
-  callers - previously the table died with whichever process happened to
-  call save/load first.
+  by a dedicated GenServer started under the Nous application supervisor,
+  so the table outlives transient callers - previously the table died
+  with whichever process happened to call save/load first.
 
   Data does not survive node restarts. Useful for development, testing,
   and short-lived sessions.
diff --git a/mix.exs b/mix.exs
index d2ddd82..b7853f3 100644
--- a/mix.exs
+++ b/mix.exs
@@ -1,7 +1,7 @@
 defmodule Nous.MixProject do
   use Mix.Project
 
-  @version "0.15.1"
+  @version "0.15.2"
   @source_url "https://github.com/nyo16/nous"
 
   def project do