fix: fallback to Ollama native API for thinking-mode models#27

Open
wbryanta wants to merge 1 commit into nikmcfly:main from wbryanta:fix/ollama-thinking-mode-fallback

Conversation


@wbryanta wbryanta commented Apr 3, 2026

Fixes #26

Problem

Models with thinking mode (e.g., Gemma 4) return empty content via Ollama's OpenAI-compatible /v1/chat/completions endpoint: the thinking tokens consume the max_tokens budget while the visible content is stripped, so simulation start fails with a 500 error.

Fix

When LLMClient.chat() receives empty content from the OpenAI-compatible endpoint and the backend is Ollama, it retries via the native /api/chat endpoint, which handles thinking tokens correctly.

  • Only triggers on empty responses — no change in behavior for working models
  • No new dependencies (requests already in requirements)
  • Tested with gemma4:26b on Ollama 0.20.0
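A minimal sketch of how the fallback described above could look. `LLMClient` here is a stand-in: the method names (`_openai_chat`, `_ollama_native_chat`), constructor parameters, and request payloads are assumptions for illustration, not the project's actual code.

```python
import requests


class LLMClient:
    def __init__(self, base_url: str, model: str, backend: str = "ollama"):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.backend = backend

    def chat(self, messages: list[dict], max_tokens: int = 1024) -> str:
        content = self._openai_chat(messages, max_tokens)
        # Thinking-mode models can spend the whole max_tokens budget on
        # hidden reasoning and return an empty visible message. Only in
        # that case, and only for Ollama, retry via the native endpoint,
        # so working models see no behavior change.
        if not content and self.backend == "ollama":
            content = self._ollama_native_chat(messages, max_tokens)
        return content

    def _openai_chat(self, messages: list[dict], max_tokens: int) -> str:
        resp = requests.post(
            f"{self.base_url}/v1/chat/completions",
            json={"model": self.model, "messages": messages,
                  "max_tokens": max_tokens},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"] or ""

    def _ollama_native_chat(self, messages: list[dict], max_tokens: int) -> str:
        # Ollama's native /api/chat separates thinking tokens from the
        # visible message, so the content field is populated.
        resp = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model, "messages": messages,
                  "stream": False,
                  "options": {"num_predict": max_tokens}},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"] or ""
```

Because the fallback keys only on an empty string, a non-empty response from the OpenAI-compatible endpoint is returned unchanged, which keeps the change backwards-compatible.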

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Thinking-mode models (Gemma 4) return empty responses via Ollama OpenAI-compatible endpoint
