301 changes: 301 additions & 0 deletions AGENTS.md
@@ -0,0 +1,301 @@
# AGENTS.md

Quick-reference for AI coding agents (Claude, Cursor, Copilot, Codex, etc.)
working with the **Nous** Elixir AI agent framework. This file is for agents
that want to *use* the library, not for agents maintaining the library
itself (see `CONTRIBUTING.md` and `docs/` for that). Conforms to
<https://agents.md>.

## What Nous is

Multi-provider LLM framework for Elixir/OTP. Provides:

- **One-shot LLM calls** (`Nous.generate_text/2,3`, `Nous.stream_text/2,3`)
- **Stateful agents** with tool-calling, memory, plugins (`Nous.new/2`, `Nous.run/2,3`)
- **Pluggable providers** — OpenAI, Anthropic, Gemini, Vertex AI, Groq, Mistral,
OpenRouter, Together, Ollama, LM Studio, vLLM, SGLang, LlamaCpp, and a
generic `custom:` adapter for any OpenAI-compatible endpoint
- **Tool system** — file ops, bash, web fetch + search, plus easy custom tools
- **Pluggable HTTP backend** (Req default, hackney alternative)
- **Streaming with backpressure** (hackney `:async, :once` pull mode)

## Minimal API surface (start here)

```elixir
# Drop-in: one-shot text generation
{:ok, text} = Nous.generate_text("openai:gpt-4o", "Explain GenServer in one sentence.")

# Streaming
{:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", "Write a haiku")
Enum.each(stream, &IO.write/1)

# Stateful agent with tools
agent =
  Nous.new("openai:gpt-4o",
    tools: [Nous.Tools.FileRead, Nous.Tools.FileGrep],
    system_prompt: "You are a code reviewer."
  )

{:ok, result} = Nous.run(agent, "Find all TODOs in lib/")
# result.text, result.messages, result.usage

# Streaming agent run
{:ok, stream} = Nous.run_stream(agent, "Summarize this repo")
```

That's 90% of what most apps need. Everything else is configuration.

## Provider quick-pick (model strings)

Format is `"<provider>:<model_id>"`. Pick one:

| If you want… | Use |
|---|---|
| Best general-purpose, high quality | `openai:gpt-4o` or `anthropic:claude-sonnet-4-5-20250929` |
| Cheap and fast | `groq:llama-3.1-70b-versatile` or `gemini:gemini-2.0-flash` |
| Local / no API key | `lmstudio:<your-loaded-model>` (default port 1234) |
| Local high-throughput inference | `vllm:<huggingface-id>` (default port 8000) |
| Local with structured generation | `sglang:<model>` (default port 30000) |
| Anything else with an OpenAI-compatible API | `custom:<model>` + `:base_url` opt |

Auth picks up the env var by convention: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`,
`GROQ_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, etc. Local providers
don't need a key. Override per-call with `api_key:` opt.
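
A hedged sketch of both escape hatches — the tenant env-var name and the
internal URL are illustrative, and this assumes the 3-arity calls accept
the same opts as `Nous.new/2`:

```elixir
# Per-call key override — useful in multi-tenant apps. The env-var name
# is illustrative, not a Nous convention.
{:ok, text} =
  Nous.generate_text("openai:gpt-4o", "Hello!",
    api_key: System.fetch_env!("TENANT_A_OPENAI_KEY")
  )

# Any OpenAI-compatible endpoint via the generic adapter:
{:ok, text} =
  Nous.generate_text("custom:my-model", "Hello!",
    base_url: "http://inference.internal:8000/v1",
    api_key: "unused-for-local"
  )
```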

## Key opts you'll actually use

```elixir
Nous.new("openai:gpt-4o",
# LLM behavior
system_prompt: "...",
temperature: 0.7,
max_tokens: 2_000,
receive_timeout: 60_000, # ms; 120_000 for local models

# Tools (modules implementing Nous.Tool.Behaviour)
tools: [Nous.Tools.Bash, MyApp.MyTool],

# Memory backend (optional)
memory: %{store: Nous.Memory.Store.ETS, opts: []},

# Plugins (optional, composable)
plugins: [Nous.Plugins.SubAgent, Nous.Plugins.HumanInTheLoop],

# Resilience
fallback: ["anthropic:claude-sonnet-4-5", "groq:llama-3.1-70b-versatile"],

# Vendor-specific body params (vLLM/SGLang/LM Studio/llama.cpp)
extra_body: %{top_k: 50, repetition_penalty: 1.1}
)
```

## Built-in tools

In `Nous.Tools.*`. The five most useful:

- **`Nous.Tools.Bash`** — execute shell commands (requires approval handler in production)
- **`Nous.Tools.FileRead`** / **`FileWrite`** / **`FileEdit`** — workspace-sandboxed file ops
- **`Nous.Tools.FileGlob`** / **`FileGrep`** — find files / search content
- **`Nous.Tools.WebFetch`** — fetch + extract text from a URL (SSRF-protected)
- **`Nous.Tools.TavilySearch`** / **`BraveSearch`** — web search

File tools enforce a workspace root. Default is `cwd`. Override per-agent:

```elixir
Nous.new("openai:gpt-4o",
tools: [Nous.Tools.FileRead],
deps: %{workspace_root: "/path/to/project"}
)
```

## Building a custom tool

```elixir
defmodule MyApp.WeatherTool do
  use Nous.Tool

  @impl Nous.Tool.Behaviour
  def name, do: "get_weather"

  @impl Nous.Tool.Behaviour
  def description, do: "Get current weather for a city"

  @impl Nous.Tool.Behaviour
  def parameters do
    %{
      "type" => "object",
      "properties" => %{
        "city" => %{"type" => "string", "description" => "City name"}
      },
      "required" => ["city"]
    }
  end

  @impl Nous.Tool.Behaviour
  def execute(%{"city" => city}, _ctx) do
    {:ok, "Weather in #{city}: 72°F, sunny"}
  end
end
```

Pass it in the `tools:` list. The `_ctx` arg gives access to `deps`,
the workspace root, and the approval handler. Use `Nous.Tool.Validator`
for input validation — it runs automatically when `validate_args: true`
(the default).
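
If your tool needs a dependency, a sketch of an `execute/2` clause that
reads one from the context — the `ctx.deps` access and the
`:weather_api_key` dep are assumptions extrapolated from the `deps:`
wiring shown earlier, not a documented contract:

```elixir
@impl Nous.Tool.Behaviour
def execute(%{"city" => city}, ctx) do
  # :weather_api_key is a hypothetical dep, wired via
  # Nous.new(..., deps: %{weather_api_key: "..."}).
  api_key = ctx.deps[:weather_api_key]
  {:ok, fetch_weather(city, api_key)} # fetch_weather/2 is your own helper
end
```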

## HTTP backend (don't change unless you need to)

Default backend is `Nous.HTTP.Backend.Req` — Req on top of Finch. It's
faster under parallel batching than the alternative. Override only if:

- You need HTTP/3 → `NOUS_HTTP_BACKEND=hackney`
- You want a single HTTP client family across streaming and non-streaming
  calls → set the same env var

Pool config (hackney pool, used by streaming + Hackney backend):

```elixir
config :nous, :hackney_pool,
  max_connections: 200,
  timeout: 1_500 # idle keepalive ms (hackney 4 caps at 2_000)
```

Streaming **always** uses hackney's pull-based `:async, :once` mode for
backpressure (slow consumer can't OOM under fast LLM). This is structural,
not configurable. See `docs/benchmarks/http_backend.md`.

## Critical rules (security & correctness)

These are project-wide and non-negotiable. If you write code that breaks
these, it will be rejected.

1. **Never call `String.to_atom/1` on untrusted input.** Use
`String.to_existing_atom/1` with a rescue, or pattern-match on a
whitelist of literal strings (see the sketch after this list). The atom
table is finite, and prompt-injected input can exhaust it and crash the BEAM.
2. **Tools requiring approval are rejected without an `:approval_handler`.**
`Bash`, `FileWrite`, `FileEdit` need one wired in `RunContext` or they
refuse to run. Don't disable this.
3. **File tools enforce a workspace root.** Don't bypass `PathGuard`. Pass
paths within the workspace; the guard rejects `..` traversal, absolute
paths outside, and symlink escapes.
4. **HTTP from agents goes through `UrlGuard`.** Don't make raw `Req.get/1`
calls from a tool to a user-controlled URL — use `Nous.Tools.WebFetch` or
call `UrlGuard.validate/2` first. Blocks RFC1918, loopback, link-local,
cloud-metadata IPs.
5. **`PromptTemplate` rejects `<% ... %>` blocks** — only `<%= @var %>`
substitution is allowed. Don't try to enable EEx evaluation on
LLM-touched templates; it's an RCE vector.
6. **Sub-agent deps don't auto-forward.** If you spawn a sub-agent via
`Nous.Plugins.SubAgent`, declare which deps it sees with
`:sub_agent_shared_deps, [:key1, :key2]`. The default `[]` is correct
for security.
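
For rule 1, a minimal sketch of both safe patterns — plain Elixir inside
whatever module parses provider input, not a Nous API:

```elixir
# Pattern-match a whitelist of literal strings — safest for small enums.
defp parse_role("user"), do: {:ok, :user}
defp parse_role("assistant"), do: {:ok, :assistant}
defp parse_role(_other), do: {:error, :unknown_role}

# Or convert only to atoms that already exist, rescuing the failure case.
defp to_known_atom(string) when is_binary(string) do
  {:ok, String.to_existing_atom(string)}
rescue
  ArgumentError -> {:error, :unknown_atom}
end
```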

## Common workflows

### Streaming to LiveView

```elixir
# In your LiveView mount or handle_event:
{:ok, stream} = Nous.stream_text("anthropic:claude-sonnet-4-5", prompt)

stream
|> Stream.each(fn chunk ->
send(self(), {:llm_chunk, chunk})
end)
|> Stream.run()
```

Hackney's pull-based backpressure means the task only pulls the next chunk
when it's ready to forward the previous one — a fast LLM can't flood memory,
and the LiveView receives one small message per chunk.
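
The receiving side is ordinary LiveView — a sketch, where the `:response`
assign is an assumption:

```elixir
def handle_info({:llm_chunk, chunk}, socket) do
  # Append each chunk to the assumed :response assign; LiveView diffs the change.
  {:noreply, update(socket, :response, &(&1 <> chunk))}
end
```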

### Tool-using agent loop

```elixir
agent =
  Nous.new("openai:gpt-4o",
    tools: [Nous.Tools.FileGrep, Nous.Tools.FileRead, Nous.Tools.Bash],
    max_iterations: 10
  )

{:ok, result} = Nous.run(agent, "Find the bug in lib/foo.ex and explain it")

# result.messages contains the full transcript including tool calls
# result.usage gives token counts per provider
```

### Provider failover

```elixir
agent =
  Nous.new("openai:gpt-4o",
    fallback: [
      "anthropic:claude-sonnet-4-5-20250929",
      "groq:llama-3.1-70b-versatile"
    ]
  )
```

Falls through on transport errors, 5xx, and rate-limit (429) responses.

### Local dev with LM Studio

```elixir
# 1. Start LM Studio, load a model, start the server (default port 1234).
# 2. In Elixir:
{:ok, text} = Nous.generate_text("lmstudio:<exact-model-name-shown-in-lmstudio>",
"Hello!")

# Or override the URL:
agent = Nous.new("lmstudio:my-model", base_url: "http://gpu-host:1234/v1")
```

## Testing your code that uses Nous

```elixir
# Use the test helpers in Nous.Tool.Testing for tool unit tests.
# For end-to-end agent tests, the recommended pattern is to use Bypass to
# stub the LLM HTTP endpoint:

setup do
  bypass = Bypass.open()
  base = "http://localhost:#{bypass.port}/v1"
  {:ok, bypass: bypass, base: base}
end

test "agent calls the model", %{bypass: bypass, base: base} do
  Bypass.expect_once(bypass, "POST", "/v1/chat/completions", fn conn ->
    conn
    |> Plug.Conn.put_resp_header("content-type", "application/json")
    |> Plug.Conn.resp(200, ~s({"choices":[{"message":{"content":"hi!"}}]}))
  end)

  agent = Nous.new("custom:test-model", base_url: base, api_key: "test")
  assert {:ok, %{text: "hi!"}} = Nous.run(agent, "hello")
end
```

Don't mock `Req`/`hackney` directly — Bypass is the supported test seam.
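
For tool unit tests, a pure `execute/2` often needs no helper at all — a
sketch using the `WeatherTool` from earlier (the empty-map ctx is an
assumption; pass a real context if your tool reads deps):

```elixir
test "weather tool returns a report for the city" do
  assert {:ok, report} = MyApp.WeatherTool.execute(%{"city" => "Lisbon"}, %{})
  assert report =~ "Lisbon"
end
```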

## What NOT to use

The public API is `Nous.*` and `Nous.Tools.*`. These are NOT public:

- `Nous.HTTP.Backend.*` — internal; use `HTTP.post/4`'s `:backend` opt instead
- `Nous.Providers.HTTP` — internal helper for provider authors
- `Nous.AgentRunner`, `Nous.AgentServer` — internal supervision; use `Nous.run/3`
- `Nous.Application`, `Nous.Persistence.ETS.TableOwner` — internal supervision tree
- Anything under `Nous.Workflow.Engine.*` — internal; the public API is `Nous.Workflow`
- Anything marked `@moduledoc false` — hidden on purpose; will change without notice

Stick to the documented modules and your code will survive minor version bumps.

## Where to look for more

- **Hex docs:** <https://hexdocs.pm/nous>
- **Getting started:** `docs/getting-started.md`
- **Production guides:** `docs/guides/` (skills, hooks, LiveView integration,
best practices, tool development, troubleshooting, evaluation, structured
output, workflows, memory, context, knowledge base)
- **Examples:** `examples/`
- **CHANGELOG:** behavioral changes per release; **read the "Behavioral /
breaking changes" sections before upgrading**.
40 changes: 34 additions & 6 deletions CHANGELOG.md
@@ -2,6 +2,34 @@

All notable changes to this project will be documented in this file.

## [0.15.2] - 2026-04-27

Documentation-only release. No code changes.

### Added

- **`AGENTS.md`** — quick-reference for AI coding agents (Claude, Cursor,
Copilot, Codex, etc.) consuming the library. Covers the minimal API,
provider quick-pick, key opts, custom tools, HTTP backend, security
rules, common workflows, and what's public vs internal. Conforms to
<https://agents.md>.

### Changed

- README "Supported Providers" table now lists `vllm:` and `sglang:`
as first-class named providers (previously only `lmstudio:` was
mentioned; vLLM and SGLang were buried in the `custom:` section).
- README "Local Servers" section now recommends the dedicated
`lmstudio:` / `vllm:` / `sglang:` / `ollama:` prefixes over `custom:`
— they default to the right port, validate `*_BASE_URL` env vars
through `UrlGuard`, and pick up the OpenAI stream normalizer for free.
- New "HTTP Backend" section in README covering the pluggable
`Nous.HTTP.Backend` behaviour, env-var selection, and shared hackney
pool config.
- Cleaned up `mix docs` warnings — replaced backticks around hidden
module references in CHANGELOG so ExDoc no longer tries to auto-link
them.

## [0.15.1] - 2026-04-26

Follow-up to 0.15.0. No behavioral changes for existing users — the
@@ -26,9 +54,9 @@ SGLang) up to date with the post-0.15.0 hackney streaming rewrite.
in `docs/benchmarks/http_backend.md`.
- **Hackney `:default` pool is now configurable from app config:**
`config :nous, :hackney_pool, max_connections: 200, timeout: 1_500`.
Applied at `Nous.Application` boot. Used by both the Hackney HTTP
backend and the streaming pipeline. (Hackney 4 caps the idle
keepalive timeout at 2_000 ms — values above that silently cap.)
Applied at app boot. Used by both the Hackney HTTP backend and the
streaming pipeline. (Hackney 4 caps the idle keepalive timeout at
2_000 ms — values above that silently cap.)
- **Per-call `:connect_timeout` and `:pool` opts** added to both HTTP
backends and `Nous.Providers.HTTP.stream/4`. Default 30_000ms /
`:default` pool. Lets a single app run different timeouts per
@@ -57,7 +85,7 @@ Minor version bump (not patch) because of the 9 behavioral changes called out be

Read these before upgrading.

- **Sub-agent deps no longer auto-forward to children.** `Nous.Plugins.SubAgent.compute_sub_deps/1` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
- **Sub-agent deps no longer auto-forward to children.** The `compute_sub_deps/1` helper in `Nous.Plugins.SubAgent` now defaults to `[]`. The previous default forwarded every parent dep (minus a 6-key denylist) — secrets, repo handles, signed URLs all leaked into LLM-controlled sub-agent contexts. To restore the old behaviour, set `:sub_agent_shared_deps, :all` explicitly. Recommended: list specific keys with `:sub_agent_shared_deps, [:key1, :key2]`.
- **Tools with `requires_approval: true` are now rejected when no `:approval_handler` is wired** (was silently approved). If you use `Nous.Tools.Bash`, `FileWrite`, or `FileEdit`, configure an `approval_handler` on `RunContext` or those tools will refuse to run.
- **File tools (`FileRead/Write/Edit/Glob/Grep`) now enforce a workspace root.** Defaults to `cwd`; override per-agent via `deps: %{workspace_root: "/path"}`. Paths that escape the root (absolute paths outside, `..` traversal, symlink-escape) are rejected with a clear error to the LLM.
- **`PromptTemplate.from_template/2` rejects template bodies containing `<% ... %>` blocks** other than the simple `<%= @ident %>` substitution form. Previously bodies were passed through `EEx.eval_string/2`, which executes arbitrary Elixir — an RCE vector for any caller piping LLM output into a template. Conditionals must now be expressed by composing multiple smaller templates.
@@ -112,7 +140,7 @@ Read these before upgrading.
- **AgentServer `load_context` runs in a `Task.Supervisor.start_child` task** with `GenServer.reply/2` — slow persistence backends no longer block concurrent `get_context` / `cancel_execution` calls.
- **AgentDynamicSupervisor + Application supervisor restart limits** tuned to `max_restarts: 100, max_seconds: 10` (was the default 3-in-5) so one bad user's crash loop doesn't take down every other tenant.
- **`Nous.Teams.RateLimiter` is now race-safe under concurrent acquires (M-9 final).** `acquire/3` now returns `{:ok, reservation_ref} | {:error, _}` and atomically reserves the estimated tokens + 1 request slot. `record_usage/3` accepts `:reservation` to reconcile actual vs estimated; missing reconciliations are auto-refunded after `:reservation_ttl_ms` (default 5 min) with a `Logger.warning/1`. `release/2` cancels a reservation when the call errored before completing. Legacy `record_usage/3` without `:reservation` still works for callers that don't go through `acquire`. Added `:open_reservations` to `get_status/1`.
- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). `Nous.Application` conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.
- **`Nous.Memory.Embedding.Bumblebee` uses a Registry + DynamicSupervisor (M-7 final).** Each model_name is owned by exactly one `ServingHolder` GenServer registered by name. Replaces the `:persistent_term` cache (which forced a node-wide GC pause per new model). The application supervisor conditionally adds the Registry + ServingSupervisor children when Bumblebee is loaded.

### Fixed (UX / minor)

@@ -138,7 +166,7 @@ Read these before upgrading.

### Added

- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and `Agent.run/3` `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.
- **`:extra_body` setting for arbitrary request body params** — pass vendor-specific top-level JSON keys (e.g. `top_k`, `chat_template_kwargs`, `repetition_penalty`, `min_p`, `best_of`, `ignore_eos`) to OpenAI-compatible providers (`vllm:`, `sglang:`, `custom:`, `lmstudio:`, `ollama:`). Mirrors the OpenAI Python SDK's `extra_body=` argument. Works in `default_settings`, `Nous.LLM` calls, and agent `model_settings`. Atom keys are stringified at request build time; nested values pass through verbatim. `extra_body` wins on collision with whitelisted keys (escape-hatch semantics). Also forwarded by Gemini and Vertex AI overrides.

Example — disable Qwen3 thinking and tune sampling on a vLLM endpoint:
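
A hedged sketch of such a call, using only keys named above — the model id
and values are illustrative:

```elixir
agent =
  Nous.new("vllm:Qwen/Qwen3-32B",
    extra_body: %{
      # chat_template_kwargs passes through verbatim; enable_thinking: false
      # is the usual Qwen3 switch — confirm against your vLLM version.
      chat_template_kwargs: %{enable_thinking: false},
      top_k: 20,
      min_p: 0.05
    }
  )
```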
