diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..92adf92
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,301 @@
+# AGENTS.md
+
+Quick-reference for AI coding agents (Claude, Cursor, Copilot, Codex, etc.)
+working with the **Nous** Elixir AI agent framework. This file is for agents
+that want to *use* the library, not for agents maintaining the library
+itself (see `CONTRIBUTING.md` and `docs/` for that). Conforms to
+<https://agents.md>.
+
+## What Nous is
+
+Multi-provider LLM framework for Elixir/OTP. Provides:
+
+- **One-shot LLM calls** (`Nous.generate_text/2,3`, `Nous.stream_text/2,3`)
+- **Stateful agents** with tool-calling, memory, plugins (`Nous.new/2`, `Nous.run/2,3`)
+- **Pluggable providers** — OpenAI, Anthropic, Gemini, Vertex AI, Groq, Mistral,
+  OpenRouter, Together, Ollama, LM Studio, vLLM, SGLang, LlamaCpp, and a
+  generic `custom:` adapter for any OpenAI-compatible endpoint
+- **Tool system** — file ops, bash, web fetch + search, plus easy custom tools
+- **Pluggable HTTP backend** (Req default, hackney alternative)
+- **Streaming with backpressure** (hackney `:async, :once` pull mode)
+
+## Minimal API surface (start here)
+
+```elixir
+# Drop-in: one-shot text generation
+{:ok, text} = Nous.generate_text("openai:gpt-4o", "Explain GenServer in one sentence.")
+
+# Streaming
+{:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", "Write a haiku")
+Enum.each(stream, &IO.write/1)
+
+# Stateful agent with tools
+agent =
+  Nous.new("openai:gpt-4o",
+    tools: [Nous.Tools.FileRead, Nous.Tools.FileGrep],
+    system_prompt: "You are a code reviewer."
+  )
+
+{:ok, result} = Nous.run(agent, "Find all TODOs in lib/")
+# result.text, result.messages, result.usage
+
+# Streaming agent run
+{:ok, stream} = Nous.run_stream(agent, "Summarize this repo")
+```
+
+That's 90% of what most apps need. Everything else is configuration.
+
+## Provider quick-pick (model strings)
+
+Format is `"<provider>:<model>"`. Pick one:
+
+| If you want… | Use |
+|---|---|
+| Best general-purpose, high quality | `openai:gpt-4o` or `anthropic:claude-sonnet-4-5-20250929` |
+| Cheap and fast | `groq:llama-3.1-70b-versatile` or `gemini:gemini-2.0-flash` |
+| Local / no API key | `lmstudio:<model>` (default port 1234) |
+| Local high-throughput inference | `vllm:<model>` (default port 8000) |
+| Local with structured generation | `sglang:<model>` (default port 30000) |
+| Anything else with an OpenAI-compatible API | `custom:<model>` + `:base_url` opt |
+
+Auth picks up API keys from the conventional env vars: `OPENAI_API_KEY`,
+`ANTHROPIC_API_KEY`, `GROQ_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`,
+etc. Local providers don't need a key. Override per call with the `api_key:` opt.
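+
+For example, to pin a key explicitly instead of relying on the env var (a
+minimal sketch; `MY_GROQ_KEY` is a hypothetical variable name):
+
+```elixir
+# Explicit per-call key; skips the GROQ_API_KEY convention.
+{:ok, text} =
+  Nous.generate_text("groq:llama-3.1-70b-versatile", "Hello!",
+    api_key: System.fetch_env!("MY_GROQ_KEY")
+  )
+```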
+
+## Key opts you'll actually use
+
+```elixir
+Nous.new("openai:gpt-4o",
+  # LLM behavior
+  system_prompt: "...",
+  temperature: 0.7,
+  max_tokens: 2_000,
+  receive_timeout: 60_000,  # ms; use 120_000 for local models
+
+  # Tools (modules implementing Nous.Tool.Behaviour)
+  tools: [Nous.Tools.Bash, MyApp.MyTool],
+
+  # Memory backend (optional)
+  memory: %{store: Nous.Memory.Store.ETS, opts: []},
+
+  # Plugins (optional, composable)
+  plugins: [Nous.Plugins.SubAgent, Nous.Plugins.HumanInTheLoop],
+
+  # Resilience
+  fallback: ["anthropic:claude-sonnet-4-5", "groq:llama-3.1-70b-versatile"],
+
+  # Vendor-specific body params (vLLM/SGLang/LM Studio/llama.cpp)
+  extra_body: %{top_k: 50, repetition_penalty: 1.1}
+)
+```
+
+## Built-in tools
+
+In `Nous.Tools.*`. The five most useful:
+
+- **`Nous.Tools.Bash`** — execute shell commands (requires an approval handler in production)
+- **`Nous.Tools.FileRead`** / **`FileWrite`** / **`FileEdit`** — workspace-sandboxed file ops
+- **`Nous.Tools.FileGlob`** / **`FileGrep`** — find files / search content
+- **`Nous.Tools.WebFetch`** — fetch + extract text from a URL (SSRF-protected)
+- **`Nous.Tools.TavilySearch`** / **`BraveSearch`** — web search
+
+File tools enforce a workspace root. Default is `cwd`. Override per agent:
+
+```elixir
+Nous.new("openai:gpt-4o",
+  tools: [Nous.Tools.FileRead],
+  deps: %{workspace_root: "/path/to/project"}
+)
+```
+
+## Building a custom tool
+
+```elixir
+defmodule MyApp.WeatherTool do
+  use Nous.Tool
+
+  @impl Nous.Tool.Behaviour
+  def name, do: "get_weather"
+
+  @impl Nous.Tool.Behaviour
+  def description, do: "Get current weather for a city"
+
+  @impl Nous.Tool.Behaviour
+  def parameters do
+    %{
+      "type" => "object",
+      "properties" => %{
+        "city" => %{"type" => "string", "description" => "City name"}
+      },
+      "required" => ["city"]
+    }
+  end
+
+  @impl Nous.Tool.Behaviour
+  def execute(%{"city" => city}, _ctx) do
+    {:ok, "Weather in #{city}: 72°F, sunny"}
+  end
+end
+```
+
+Pass it in the `tools:` list. The `_ctx` arg gives access to `deps`, the
+workspace root, and the approval handler. Use `Nous.Tool.Validator` for
+input validation — it runs automatically when `validate_args: true`
+(the default).
+
+## HTTP backend (don't change unless you need to)
+
+The default backend is `Nous.HTTP.Backend.Req` — Req on top of Finch. It's
+faster under parallel batching than the alternative. Override only if:
+
+- You need HTTP/3 → `NOUS_HTTP_BACKEND=hackney`
+- You want one HTTP family across streaming + non-streaming → same env var
+
+Pool config (shared hackney pool, used by both streaming and the Hackney
+backend):
+
+```elixir
+config :nous, :hackney_pool,
+  max_connections: 200,
+  timeout: 1_500  # idle keepalive ms (hackney 4 caps at 2_000)
+```
+
+Streaming **always** uses hackney's pull-based `:async, :once` mode for
+backpressure (a slow consumer can't OOM the node under a fast LLM). This is
+structural, not configurable. See `docs/benchmarks/http_backend.md`.
+
+## Critical rules (security & correctness)
+
+These are project-wide and non-negotiable. If you write code that breaks
+these, it will be rejected.
+
+1. **Never `String.to_atom/1` on untrusted input.** Use
+   `String.to_existing_atom/1` with rescue, or pattern-match on a
+   whitelist of literal strings (see the sketch after this list). Atoms
+   are never garbage-collected, so a prompt-injected input can exhaust
+   the atom table and crash the BEAM.
+2. **Tools requiring approval are rejected without an `:approval_handler`.**
+   `Bash`, `FileWrite`, `FileEdit` need one wired in `RunContext` or they
+   refuse to run. Don't disable this.
+3. **File tools enforce a workspace root.** Don't bypass `PathGuard`. Pass
+   paths within the workspace; the guard rejects `..` traversal, absolute
+   paths outside, and symlink escapes.
+4. **HTTP from agents goes through `UrlGuard`.** Don't make raw `Req.get/1`
+   calls from a tool to a user-controlled URL — use `Nous.Tools.WebFetch` or
+   call `UrlGuard.validate/2` first. It blocks RFC 1918, loopback,
+   link-local, and cloud-metadata IPs.
+5. **`PromptTemplate` rejects `<% ... %>` blocks** — only `<%= @var %>`
+   substitution is allowed. Don't try to enable EEx evaluation on
+   LLM-touched templates; it's an RCE vector.
+6. **Sub-agent deps don't auto-forward.** If you spawn a sub-agent via
+   `Nous.Plugins.SubAgent`, declare which deps it sees by setting
+   `:sub_agent_shared_deps` to a list like `[:key1, :key2]`. The default
+   `[]` is correct for security.
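+
+A minimal sketch of the literal-string whitelist pattern from rule 1;
+`parse_role/1` is a hypothetical helper, not a Nous API:
+
+```elixir
+# Matching on string literals creates the atoms at compile time, so
+# untrusted input can never grow the atom table.
+defp parse_role("user"), do: {:ok, :user}
+defp parse_role("assistant"), do: {:ok, :assistant}
+defp parse_role("system"), do: {:ok, :system}
+defp parse_role(_untrusted), do: {:error, :unknown_role}
+```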
+
+## Common workflows
+
+### Streaming to LiveView
+
+```elixir
+# In your LiveView mount or handle_event, consume the stream in a Task so
+# chunks arrive as messages while the LiveView process stays free to render:
+lv = self()
+
+Task.start(fn ->
+  {:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", prompt)
+
+  stream
+  |> Stream.each(fn chunk -> send(lv, {:llm_chunk, chunk}) end)
+  |> Stream.run()
+end)
+```
+
+The hackney backpressure means the stream paces itself to match LiveView's
+diff/push throughput — no mailbox accumulation.
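+
+On the receiving side, append each chunk to an assign. A minimal sketch,
+assuming a `:response` string assign in your LiveView:
+
+```elixir
+def handle_info({:llm_chunk, chunk}, socket) do
+  {:noreply, update(socket, :response, fn acc -> acc <> chunk end)}
+end
+```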
+
+### Tool-using agent loop
+
+```elixir
+agent =
+  Nous.new("openai:gpt-4o",
+    tools: [Nous.Tools.FileGrep, Nous.Tools.FileRead, Nous.Tools.Bash],
+    max_iterations: 10
+  )
+
+{:ok, result} = Nous.run(agent, "Find the bug in lib/foo.ex and explain it")
+
+# result.messages contains the full transcript including tool calls
+# result.usage gives token counts per provider
+```
+
+### Provider failover
+
+```elixir
+agent =
+  Nous.new("openai:gpt-4o",
+    fallback: [
+      "anthropic:claude-sonnet-4-5-20250929",
+      "groq:llama-3.1-70b-versatile"
+    ]
+  )
+```
+
+Falls through on transport errors, 5xx, and rate-limit (429) responses.
+
+### Local dev with LM Studio
+
+```elixir
+# 1. Start LM Studio, load a model, start the server (default port 1234).
+# 2. In Elixir:
+{:ok, text} = Nous.generate_text("lmstudio:<model>", "Hello!")
+
+# Or override the URL:
+agent = Nous.new("lmstudio:my-model", base_url: "http://gpu-host:1234/v1")
+```
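+
+The same pattern works for `vllm:` and `sglang:`. If the server needs
+vendor-specific request params, pass them through `extra_body:`. A sketch
+that disables Qwen3 thinking on vLLM; the model name and the
+`chat_template_kwargs`/`enable_thinking` keys are vLLM/Qwen3 conventions,
+not Nous APIs:
+
+```elixir
+# extra_body keys are forwarded verbatim as top-level request JSON.
+agent =
+  Nous.new("vllm:Qwen/Qwen3-8B",
+    extra_body: %{
+      chat_template_kwargs: %{enable_thinking: false},
+      top_k: 20
+    }
+  )
+```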
+
+## Testing your code that uses Nous
+
+```elixir
+# Use the test helpers in Nous.Tool.Testing for tool unit tests.
+# For end-to-end agent tests, the recommended pattern is to use Bypass to
+# stub the LLM HTTP endpoint:
+
+setup do
+  bypass = Bypass.open()
+  base = "http://localhost:#{bypass.port}/v1"
+  {:ok, bypass: bypass, base: base}
+end
+
+test "agent calls the model", %{bypass: bypass, base: base} do
+  Bypass.expect_once(bypass, "POST", "/v1/chat/completions", fn conn ->
+    conn
+    |> Plug.Conn.put_resp_header("content-type", "application/json")
+    |> Plug.Conn.resp(200, ~s({"choices":[{"message":{"content":"hi!"}}]}))
+  end)
+
+  agent = Nous.new("custom:test-model", base_url: base, api_key: "test")
+  assert {:ok, %{text: "hi!"}} = Nous.run(agent, "hello")
+end
+```
+
+Don't mock `Req`/`hackney` directly — Bypass is the supported test seam.
+
+## What NOT to use
+
+The public API is `Nous.*` and `Nous.Tools.*`. These are NOT public:
+
+- `Nous.HTTP.Backend.*` — internal; use `HTTP.post/4`'s `:backend` opt instead
+- `Nous.Providers.HTTP` — internal helper for provider authors
+- `Nous.AgentRunner`, `Nous.AgentServer` — internal supervision; use `Nous.run/3`
+- `Nous.Application`, `Nous.Persistence.ETS.TableOwner` — internal supervision tree
+- Anything under `Nous.Workflow.Engine.*` — internal; the public API is `Nous.Workflow`
+- Anything marked `@moduledoc false` — hidden on purpose; will change without notice
+
+Stick to the documented modules and your code will survive minor version bumps.
+
+## Where to look for more
+
+- **Hex docs:** <https://hexdocs.pm/nous>
+- **Getting started:** `docs/getting-started.md`
+- **Production guides:** `docs/guides/` (skills, hooks, LiveView integration,
+  best practices, tool development, troubleshooting, evaluation, structured
+  output, workflows, memory, context, knowledge base)
+- **Examples:** `examples/`
+- **CHANGELOG:** behavioral changes per release; **read the "Behavioral /
+  breaking changes" sections before upgrading**.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index fbd6b38..961e430 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,34 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.15.2] - 2026-04-27
+
+Documentation-only release. No code changes.
+
+### Added
+
+- **`AGENTS.md`** — quick-reference for AI coding agents (Claude, Cursor,
+  Copilot, Codex, etc.) consuming the library. Covers the minimal API,
+  provider quick-pick, key opts, custom tools, HTTP backend, security
+  rules, common workflows, and what's public vs internal. Conforms to
+  <https://agents.md>.
+
+### Changed
+
+- README "Supported Providers" table now lists `vllm:` and `sglang:`
+  as first-class named providers (previously only `lmstudio:` was
+  mentioned; vLLM and SGLang were buried in the `custom:` section).
+- README "Local Servers" section now recommends the dedicated
+  `lmstudio:` / `vllm:` / `sglang:` / `ollama:` prefixes over `custom:`
+  — they default to the right port, validate `*_BASE_URL` env vars
+  through `UrlGuard`, and pick up the OpenAI stream normalizer for free.
+- New "HTTP Backend" section in README covering the pluggable
+  `Nous.HTTP.Backend` behaviour, env-var selection, and shared hackney
+  pool config.
+- Cleaned up `mix docs` warnings — reworded CHANGELOG references to
+  hidden modules so ExDoc no longer tries to auto-link them.
+
 ## [0.15.1] - 2026-04-26
 
 Follow-up to 0.15.0. No behavioral changes for existing users — the
@@ -26,9 +54,9 @@ SGLang) up to date with the post-0.15.0 hackney streaming rewrite.
   in `docs/benchmarks/http_backend.md`.
 - **Hackney `:default` pool is now configurable from app config:**
   `config :nous, :hackney_pool, max_connections: 200, timeout: 1_500`.
-  Applied at `Nous.Application` boot. Used by both the Hackney HTTP
-  backend and the streaming pipeline. (Hackney 4 caps the idle
-  keepalive timeout at 2_000 ms — values above that silently cap.)
+  Applied at app boot. Used by both the Hackney HTTP backend and the
+  streaming pipeline. (Hackney 4 caps the idle keepalive timeout at
+  2_000 ms — values above that silently cap.)
 - **Per-call `:connect_timeout` and `:pool` opts** added to both HTTP
   backends and `Nous.Providers.HTTP.stream/4`. Default 30_000ms /
   `:default` pool. Lets a single app run different timeouts per
@@ -57,7 +85,7 @@ Minor version bump (not patch) because of the 9 behavioral changes called out be
 
 Read these before upgrading.
 
-- **Sub-agent deps no longer auto-forward to children.** `Nous.Plugins.SubAgent.compute_sub_deps/1` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
+- **Sub-agent deps no longer auto-forward to children.** The `compute_sub_deps/1` helper in `Nous.Plugins.SubAgent` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
 - **Tools with `requires_approval: true` are now rejected when no `:approval_handler` is wired** (was silently approved). If you use `Nous.Tools.Bash`, `FileWrite`, or `FileEdit`, configure an `approval_handler` on `RunContext` or those tools will refuse to run.
 - **File tools (`FileRead/Write/Edit/Glob/Grep`) now enforce a workspace root.** Defaults to `cwd`; override per-agent via `deps: %{workspace_root: "/path"}`. Paths that escape the root (absolute paths outside, `..` traversal, symlink-escape) are rejected with a clear error to the LLM.
 - **`PromptTemplate.from_template/2` rejects template bodies containing `<% ... %>` blocks** other than the simple `<%= @ident %>` substitution form. Previously bodies were passed through `EEx.eval_string/2`, which executes arbitrary Elixir — an RCE vector for any caller piping LLM output into a template. Conditionals must now be expressed by composing multiple smaller templates.
@@ -112,7 +140,7 @@ Read these before upgrading.
 - **AgentServer `load_context` runs in a `Task.Supervisor.start_child` task** with `GenServer.reply/2` — slow persistence backends no longer block concurrent `get_context` / `cancel_execution` calls.
 - **AgentDynamicSupervisor + Application supervisor restart limits** tuned to `max_restarts: 100, max_seconds: 10` (was the default 3-in-5) so one bad user's crash loop doesn't take down every other tenant.
 - **`Nous.Teams.RateLimiter` is now race-safe under concurrent acquires (M-9 final).** `acquire/3` now returns `{:ok, reservation_ref} | {:error, _}` and atomically reserves the estimated tokens + 1 request slot. `record_usage/3` accepts `:reservation` to reconcile actual vs estimated; missing reconciliations are auto-refunded after `:reservation_ttl_ms` (default 5 min) with a `Logger.warning/1`. `release/2` cancels a reservation when the call errored before completing. Legacy `record_usage/3` without `:reservation` still works for callers that don't go through `acquire`. Added `:open_reservations` to `get_status/1`.
-- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). `Nous.Application` conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
+- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). The application supervisor conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
 
 ### Fixed (UX / minor)
 
@@ -138,7 +166,7 @@ Read these before upgrading.
 
 ### Added
 
-- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and `Agent.run/3` `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.
+- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and agent `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.
 
 Example — disable Qwen3 thinking and tune sampling on a vLLM endpoint:
Each defaults to the standard port for +# its server, and the *_BASE_URL env var is validated for SSRF safety. +agent = Nous.new("lmstudio:qwen3") # localhost:1234 +agent = Nous.new("ollama:llama2") # localhost:11434 +agent = Nous.new("vllm:meta-llama/Llama-3-8B-Instruct") # localhost:8000 +agent = Nous.new("sglang:meta-llama/Llama-3-8B-Instruct") # localhost:30000 -# SGLang (default: localhost:30000) -agent = Nous.new("custom:my-model", base_url: "http://localhost:30000/v1") +# Per-provider overrides via env (or :base_url opt): +# export LMSTUDIO_BASE_URL="http://10.0.0.5:1234/v1" +# export VLLM_BASE_URL="http://gpu-host:8000/v1" +# export SGLANG_BASE_URL="http://gpu-host:30000/v1" -# Or use environment variables -# export CUSTOM_BASE_URL="http://localhost:1234/v1" -agent = Nous.new("custom:qwen3") # base_url read from env +# Fall back to custom: only for non-OpenAI-compatible local servers, +# or servers without a named provider. +agent = Nous.new("custom:my-model", base_url: "http://localhost:9999/v1") ``` > **Note**: The legacy `openai_compatible:` prefix still works for backward compatibility @@ -253,6 +264,43 @@ agent = Nous.new("openai:gpt-4", ) ``` +### HTTP Backend + +Non-streaming HTTP requests go through a pluggable backend. Default is +`Nous.HTTP.Backend.Req` (Req on top of Finch); `Nous.HTTP.Backend.Hackney` +is shipped as an alternative. Streaming always uses hackney's `:async, :once` +pull-based mode for backpressure — that choice is structural, not +configurable. + +Pick per-call, per-environment, or per-app: + +```elixir +# Per-call +HTTP.post(url, body, headers, backend: Nous.HTTP.Backend.Hackney) + +# Env (highest precedence after per-call): +# NOUS_HTTP_BACKEND=hackney # also accepts "req" or a fully-qualified +# # custom module name like "MyApp.MyBackend" + +# App config +config :nous, :http_backend, Nous.HTTP.Backend.Hackney +``` + +Tune the shared hackney `:default` pool from app config (used by both the +Hackney backend and the streaming pipeline): + +```elixir +config :nous, :hackney_pool, + max_connections: 200, + timeout: 1_500 # idle keepalive ms (hackney 4 caps at 2_000) +``` + +See [the HTTP backend benchmark report](https://github.com/nyo16/nous/blob/master/docs/benchmarks/http_backend.md) +for localhost + real-endpoint benchmark numbers and guidance on when +to switch backends. Headline: stick with the Req default unless you +specifically need HTTP/3 (Alt-Svc auto-upgrade) or want to consolidate +on one HTTP family. + ### Timeouts Each provider has sensible default timeouts (60s for cloud APIs, 120s for local models). Override per-model with `receive_timeout`: diff --git a/lib/nous/http/backend/hackney.ex b/lib/nous/http/backend/hackney.ex index 3e5372d..18fb211 100644 --- a/lib/nous/http/backend/hackney.ex +++ b/lib/nous/http/backend/hackney.ex @@ -4,7 +4,7 @@ defmodule Nous.HTTP.Backend.Hackney do Uses `:hackney.request/5` synchronously — hackney 4 returns the full response body inline as `{:ok, status, headers, body}` (the legacy - `:hackney.body/1` follow-up call from hackney 1.x was removed). + `hackney.body/1` follow-up call from hackney 1.x was removed in v4). Hackney 4 is already in the dependency tree from 0.15.0 (used for streaming) — this backend lets users consolidate non-streaming HTTP onto the same library without keeping Finch/Mint in the hot path. @@ -47,7 +47,7 @@ defmodule Nous.HTTP.Backend.Hackney do end # Hackney 4 returns the body inline: `{:ok, status, headers, body}`. 
diff --git a/lib/nous/http/backend/hackney.ex b/lib/nous/http/backend/hackney.ex
index 3e5372d..18fb211 100644
--- a/lib/nous/http/backend/hackney.ex
+++ b/lib/nous/http/backend/hackney.ex
@@ -4,7 +4,7 @@ defmodule Nous.HTTP.Backend.Hackney do
 
   Uses `:hackney.request/5` synchronously — hackney 4 returns the full
   response body inline as `{:ok, status, headers, body}` (the legacy
-  `:hackney.body/1` follow-up call from hackney 1.x was removed).
+  `hackney.body/1` follow-up call from hackney 1.x was removed in v4).
   Hackney 4 is already in the dependency tree from 0.15.0 (used for
   streaming) — this backend lets users consolidate non-streaming HTTP
   onto the same library without keeping Finch/Mint in the hot path.
@@ -47,7 +47,7 @@ defmodule Nous.HTTP.Backend.Hackney do
   end
 
   # Hackney 4 returns the body inline: `{:ok, status, headers, body}`. The
-  # legacy `:hackney.body/1` follow-up call from hackney 1.x is gone — the
+  # legacy hackney.body/1 follow-up call from hackney 1.x is gone — the
   # `with_body` option is now the default and ignored.
   defp do_request(url, headers, body, timeout, connect_timeout, pool) do
     hackney_opts = [
diff --git a/lib/nous/persistence/ets.ex b/lib/nous/persistence/ets.ex
index 193689c..8b9c4d7 100644
--- a/lib/nous/persistence/ets.ex
+++ b/lib/nous/persistence/ets.ex
@@ -3,10 +3,9 @@ defmodule Nous.Persistence.ETS do
   ETS-based persistence backend.
 
   Stores serialized context data in a named ETS table. The table is owned
-  by a dedicated GenServer (`Nous.Persistence.ETS.TableOwner`) started
-  under the Nous application supervisor, so the table outlives transient
-  callers - previously the table died with whichever process happened to
-  call save/load first.
+  by a dedicated GenServer started under the Nous application supervisor,
+  so the table outlives transient callers - previously the table died
+  with whichever process happened to call save/load first.
 
   Data does not survive node restarts. Useful for development, testing,
   and short-lived sessions.
diff --git a/mix.exs b/mix.exs
index d2ddd82..b7853f3 100644
--- a/mix.exs
+++ b/mix.exs
@@ -1,7 +1,7 @@
 defmodule Nous.MixProject do
   use Mix.Project
 
-  @version "0.15.1"
+  @version "0.15.2"
   @source_url "https://github.com/nyo16/nous"
 
   def project do