Add `reasoning_style` override for inline-tag thinking blocks on OpenAI chat-completions (MiniMax M3, Qwen, GLM) #3222

## Executive Summary

* Parsing of reasoning content from MiniMax M3 is broken in CodeWhale when the MiniMax Token Plan is used as an "OpenAI-compatible" provider.
* There is an existing issue, [v0.8.60: Add first-party MiniMax provider route](https://github.com/Hmbown/CodeWhale/issues/1310), that adds the MiniMax Token Plan as a supported provider. That is a step in the right direction, but it does not address the more fundamental problem: *developers using CodeWhale with OpenAI-compatible providers need to be able to override how the TUI parses reasoning content*. This is especially true for developers using CodeWhale with private models that would never be added to the well-known Providers enum.
* This patch introduces an "OpenAI Reasoning Style Override": a configuration option that developers can use to change how CodeWhale parses reasoning content with any OpenAI-compatible provider.

## Problem

When MiniMax M3 is reached through CodeWhale's generic OpenAI-compatible provider, reasoning is streamed as inline `<think>…</think>` blocks inside `delta.content` instead of a separate `delta.reasoning_content` field. CodeWhale's parser has no inline-tag handling, so the literal tags appear in the visible chat. The same problem affects Qwen thinking models on raw vLLM/Ollama and GLM models on aggregators that don't have a GLM-specific parser. The format depends on the **serving gateway**, not the model identity.
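To make the gap concrete, here is a minimal Python sketch of what inline-tag handling on a live stream involves. It is not CodeWhale's code; `InlineTagSplitter`, `split_stream`, and the `"reasoning"`/`"content"` labels are hypothetical names chosen for illustration. The main subtlety it shows is that a tag can be split across two `delta.content` chunks, so a trailing partial tag has to be held back until the next chunk arrives.

```python
from dataclasses import dataclass

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def _partial_suffix_len(text: str, tag: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for size in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:size]):
            return size
    return 0


@dataclass
class InlineTagSplitter:
    """Hypothetical splitter for streamed `delta.content` text."""
    in_think: bool = False
    pending: str = ""  # held-back text that may be the start of a tag

    def feed(self, chunk: str) -> list[tuple[str, str]]:
        """Return (kind, text) pairs; kind is "reasoning" or "content"."""
        buf = self.pending + chunk
        self.pending = ""
        out: list[tuple[str, str]] = []
        while buf:
            tag = CLOSE_TAG if self.in_think else OPEN_TAG
            kind = "reasoning" if self.in_think else "content"
            idx = buf.find(tag)
            if idx >= 0:
                if idx:
                    out.append((kind, buf[:idx]))
                buf = buf[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No complete tag: hold back a trailing partial tag, emit the rest.
            keep = _partial_suffix_len(buf, tag)
            if len(buf) > keep:
                out.append((kind, buf[:len(buf) - keep]))
            self.pending = buf[len(buf) - keep:]
            break
        return out

    def flush(self) -> list[tuple[str, str]]:
        """Emit any held-back text once the stream ends."""
        kind = "reasoning" if self.in_think else "content"
        rest, self.pending = self.pending, ""
        return [(kind, rest)] if rest else []


def split_stream(chunks: list[str]) -> tuple[str, str]:
    """Run all chunks through a splitter and return (reasoning, content)."""
    splitter = InlineTagSplitter()
    parts: list[tuple[str, str]] = []
    for chunk in chunks:
        parts.extend(splitter.feed(chunk))
    parts.extend(splitter.flush())
    reasoning = "".join(text for kind, text in parts if kind == "reasoning")
    content = "".join(text for kind, text in parts if kind == "content")
    return reasoning, content
```

For example, `split_stream(["<thi", "nk>plan", " it</th", "ink>Hel", "lo"])` yields the reasoning `"plan it"` and the visible content `"Hello"`, even though both tags straddle chunk boundaries.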

## Proposed solution

A new `[overrides]` config bucket lets the user declare which reasoning format a given protocol emits. The change to `~/.codewhale/config.toml` looks like this:

```toml
[overrides.openai.protocol]
reasoning_style = "https://codewhale.dev/configuration/reasoning_style#inline_tags"
```

**Built-in fragments** (all under `https://codewhale.dev/configuration/reasoning_style`):

- `#separate_field` (default) — reasoning arrives in `delta.reasoning_content` or `delta.reasoning`. Current behavior.
- `#inline_tags` — reasoning arrives as `<think>…</think>` blocks inside `delta.content`. Used by M3, Qwen thinking family, GLM, and similar.
- `#none` — no reasoning support. Treats the content stream as plain text.

The fragment is the strategy selector; the base URL is the namespace. New strategies ship as new fragments, not new keys or new code paths. Plugins can register custom strategies by pointing the URL elsewhere.
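As a rough illustration of the namespace-plus-fragment idea, the following Python sketch resolves a `reasoning_style` URI to a strategy name. The function `resolve_style` and the `PLUGINS` registry are assumptions made for this example, not part of CodeWhale; only the base URL and the three fragment names come from the proposal. Unknown URIs raise an error rather than falling back.

```python
from urllib.parse import urldefrag

BASE = "https://codewhale.dev/configuration/reasoning_style"

# Built-in strategies, keyed by fragment under BASE (names from the proposal).
BUILTIN = {"separate_field", "inline_tags", "none"}

# Hypothetical plugin registry: full URI -> strategy name.
PLUGINS: dict[str, str] = {}


def resolve_style(uri: str) -> str:
    """Map a reasoning_style URI to a strategy name, or fail loudly."""
    namespace, fragment = urldefrag(uri)
    if namespace == BASE and fragment in BUILTIN:
        return fragment
    if uri in PLUGINS:
        return PLUGINS[uri]
    known = ", ".join(f"{BASE}#{name}" for name in sorted(BUILTIN))
    raise ValueError(f"unknown reasoning_style {uri!r}; built-ins are: {known}")
```

Under these assumptions, `resolve_style(BASE + "#inline_tags")` returns `"inline_tags"`, while a typo such as `#inline_tag` raises `ValueError` at load time.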

**Resolution order (highest priority wins):** CLI flag → environment variable → `[overrides]` block → per-provider config → per-model config → hardcoded heuristic.
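The precedence chain can be sketched as a first-match lookup over ordered layers. This is a simplified Python illustration; the layer labels and the `resolve` function are hypothetical, and how each layer's value is actually collected is left out.

```python
from typing import Optional

# Highest priority first, matching the order stated above.
# The labels are illustrative, not real CodeWhale identifiers.
LAYERS = (
    "cli_flag",
    "env_var",
    "overrides_block",
    "provider_config",
    "model_config",
    "heuristic",
)


def resolve(settings: dict[str, Optional[str]]) -> tuple[str, str]:
    """Return (winning_layer, value) from the first layer that set a value."""
    for layer in LAYERS:
        value = settings.get(layer)
        if value is not None:
            return layer, value
    raise LookupError("no layer supplied a reasoning_style")
```

With `{"overrides_block": "inline_tags", "model_config": "separate_field"}`, the `[overrides]` value wins because it sits higher in the chain than the per-model setting.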

A typo in the URI or an unknown key is a startup error with a clear message. No silent fallbacks.

## Use case

The format depends on the gateway, not the model. A user running Qwen via raw Ollama needs `#inline_tags`; the same model via Alibaba Cloud DashScope needs `#separate_field`. With this change, the user picks the right strategy per gateway with a single config line — no code change, no model-name matching, no per-model registry update.

## Alternatives considered

- **Per-model hardcoded matching (current approach).** Doesn't scale to gateway-dependent formats; requires a code change per new model.
- **Strip-only post-processing.** An existing `strip_thinking_tags` already strips the tags, but only for the saved-session history list, not the live stream — and it silently eats the reasoning rather than routing it to a thinking cell.
- **New `ApiProvider` variant per model family.** #1310 (closed) added the first-party MiniMax route for v0.8.60, but that's orthogonal — it covers the provider route, not the inline-tag parser gap.
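The difference between stripping and routing can be shown on a complete (non-streamed) message. This Python sketch is only an analogy: `strip_only` stands in for the strip-only behavior described above and is not the actual `strip_thinking_tags` implementation, and `route` is a hypothetical name for the behavior this proposal asks for.

```python
import re

# Non-greedy match so multiple think blocks are handled separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def strip_only(text: str) -> str:
    """Drop think blocks entirely; the reasoning is lost."""
    return THINK_BLOCK.sub("", text)


def route(text: str) -> tuple[str, str]:
    """Return (reasoning, visible) so reasoning can go to a thinking cell."""
    reasoning = "".join(THINK_BLOCK.findall(text))
    visible = THINK_BLOCK.sub("", text)
    return reasoning, visible
```

Both functions produce the same visible text, but only `route` keeps the reasoning available for display.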

## Impact

- Removes the blocker preventing M3 (and Qwen/GLM) from being usable through CodeWhale.
- Single config edit covers the whole class of inline-tag models. Future models that adopt inline tags need a config line, not a PR.
- Convergent with #3016, #3024, #3023, and #1978 — all of which touch the same gap of telling CodeWhale the reasoning format when a model is served through a non-native provider.

## Additional context

### Reproduction

```bash
export OPENAI_BASE_URL="https://api.minimax.io/v1"
export OPENAI_API_KEY="<token-plan-key>"
export DEEPSEEK_PROVIDER=openai
codewhale --provider openai --model MiniMax-M3
# send: "hello"
```

The visible response begins with a literal `<think>…</think>` block. Expected: a reasoning cell plus the actual reply.

### Related CodeWhale issues

- [#1310](https://github.com/Hmbown/CodeWhale/issues/1310) — v0.8.60: Add first-party MiniMax provider route (closed)
- [#3016](https://github.com/Hmbown/CodeWhale/issues/3016) — Reasoning-content integrity audit (open)
- [#3023](https://github.com/Hmbown/CodeWhale/issues/3023) — Provider capability matrix for thinking flags (closed)
- [#3024](https://github.com/Hmbown/CodeWhale/issues/3024) — Wire reasoning-effort params for OpenAI-compatible providers (open)
- [#1978](https://github.com/Hmbown/CodeWhale/issues/1978) — Custom base_url reasoning/cache support (open)
- [#861](https://github.com/Hmbown/CodeWhale/issues/861) — Thinking-block collapse root causes (referenced by #3016)

### External references

- [QwenLM/Qwen3 discussion #1657](https://github.com/QwenLM/Qwen3/discussions/1657) — Qwen team's thread on inconsistent `<think>` tag behavior across model sizes
- [Alibaba Cloud Model Studio — deep thinking](https://www.alibabacloud.com/help/en/model-studio/deep-thinking) — Qwen DashScope uses `delta.reasoning_content` as a separate field
- [Z.AI API docs](https://docs.z.ai/api-reference/llm/chat-completion) — GLM emits both inline tags and `reasoning_content` depending on request config
- [anomalyco/opencode#31999](https://github.com/anomalyco/opencode/issues/31999) — Upstream opencode's parallel M3 bug (PR #32152 in flight)
