
feat(agent): add native Ollama backend with tool-use loop#1095

Open
sanjay3290 wants to merge 1 commit into multica-ai:main from sanjay3290:feat/ollama-agent-backend

Conversation

@sanjay3290 (Contributor)

What

Adds a first-class ollama agent provider alongside claude/codex/opencode/openclaw/hermes/gemini. Models with tools capability (gemma3+, qwen2.5+, llama3.1+) drive the Multica CLI via a single shell tool; chat-only models still work for conversational tasks.

Closes #769.

Why

Issue #769 was closed with the env-var workaround (#846): pointing ANTHROPIC_BASE_URL at Ollama. That works for users who have installed Claude Code, but Ollama-only users still need a supported CLI on their machine before anything works. This PR ships the real native path.

End-to-end verified

  • LOC-3 (in-workdir): gemma4:latest ran a 6-tool-call sequence on a Multica issue — read context, wrote a file, updated status to in_review, posted a summary comment. 29s.
  • LOC-5 (out-of-sandbox): same model wrote $HOME/Desktop/multica-ollama-test.txt via absolute-path printf, verified with ls -l, posted output as comment. 4 calls / 20s.

Configuration

MULTICA_OLLAMA_HOST              default http://localhost:11434
MULTICA_OLLAMA_MODEL             required; empty = provider skipped
MULTICA_OLLAMA_SHELL_TIMEOUT     default 5m; per-command cap
MULTICA_OLLAMA_MAX_OUTPUT_BYTES  default 16384; tool-result cap

The daemon probes /api/tags for reachability, /api/version for the version, and /api/show for tool capability (printing a WARNING to stderr if the configured model doesn't declare "tools"). All probes are best-effort and do not block daemon startup.

Architecture

  • stream:false when tools are active (tool_calls arrive on the final chunk anyway; single-shot is simpler and equally responsive at task granularity).
  • Single shell tool, executed via bash -c (NOT -lc; login shells re-source /etc/profile and wipe the PATH injection the daemon makes for the multica CLI). Matches Claude's bypassPermissions and Gemini's --yolo.
  • tool_call_id threaded on tool-role messages so strict models (qwen2.5, llama3.1) can correlate results to calls.
  • MaxTurns cap (default 25) prevents runaway tool loops.
  • Context status (DeadlineExceeded/Canceled) takes precedence over side-effect errors when determining Result.Status, matching the pattern claude.go and gemini.go use after PR #920 (fix(daemon): correct Gemini backend status on timeout and cancellation).

v1 non-goals (deliberate, documented)

  • Streaming during tool turns (chat-only non-tool turns still stream the final text as one event)
  • Per-task model selection (single configured model wins)
  • Session resume (Ollama is stateless; ResumeSessionID ignored with debug log)
  • Dynamic /api/tags model discovery
  • Granular tools (file_read/file_write/etc. — single shell covers the same ground via standard Unix utilities)

These match the limitations Gemini #755 shipped with.

Test plan

  • go test ./pkg/agent/... ./internal/daemon/... — 22 new tests, all pass in 10s
  • go vet ./... clean
  • gofmt -w applied
  • pnpm --filter @multica/views --filter @multica/core typecheck clean
  • End-to-end smoke test: gemma4:latest on local Docker stack, 2 live tasks completed (in-workdir file write + out-of-sandbox Desktop write)
  • Daemon registers ollama runtime, reports online, appears in /api/runtimes
  • Regression tests for: context cancel mid-loop, timeout mid-loop, malformed response (both empty-everywhere and OOM-case {role:assistant, content:"", tool_calls:[]}), MaxTurns cap, shell exit code forwarding, unknown tool, tool_call_id wire propagation, env-var fallback on malformed values, /api/show capability probe (true/false/404/unreachable)
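The malformed-response guard from the list above can be illustrated with a tiny table-driven check; `malformedReply` is a hypothetical stand-in for logic that lives inside the real loop in ollama.go:

```go
package main

import "fmt"

type toolCall struct{ Name string }
type chatMessage struct {
	Role      string
	Content   string
	ToolCalls []toolCall
}

// malformedReply reports whether an assistant message carries neither text
// nor tool calls — the two shapes the tests guard against: a bare
// empty-everywhere reply and the OOM case
// {role:"assistant", content:"", tool_calls:[]}.
func malformedReply(m chatMessage) bool {
	return m.Content == "" && len(m.ToolCalls) == 0
}

func main() {
	cases := []chatMessage{
		{Role: "assistant"},                                       // empty-everywhere
		{Role: "assistant", Content: "", ToolCalls: []toolCall{}}, // OOM case
		{Role: "assistant", Content: "hi"},                        // well-formed
	}
	for _, c := range cases {
		fmt.Println(malformedReply(c)) // true, true, false
	}
}
```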

Files changed

  • server/pkg/agent/ollama.go — new (609 LOC)
  • server/pkg/agent/ollama_test.go — new (712 LOC)
  • server/pkg/agent/agent.go — register case "ollama"; DetectVersion now takes provider name to dispatch HTTP probe for HTTP-based backends
  • server/internal/daemon/config.go — HTTP probe of MULTICA_OLLAMA_HOST; capability check with WARNING; references agent.DefaultOllamaHost
  • server/internal/daemon/execenv/runtime_config.go — ollama added to the AGENTS.md case
  • server/internal/daemon/daemon.go — one-line change to pass provider name into DetectVersion
  • packages/views/runtimes/components/provider-logo.tsx — OllamaLogo SVG + switch case

@vercel

vercel bot commented Apr 15, 2026

The latest updates on your projects: 2 skipped deployments (multica-web and multica-web-production, both ignored, Apr 15, 2026 1:50pm UTC).



Development

Successfully merging this pull request may close these issues.

Support running local models via Ollama, LM Studio, etc.
