perf(reasoning): dedupe to_lowercase + marker early-outs in Qwen apply_reasoning_mode (#402) #549

Description

@kiannidev

Problem

Every Qwen chat/voice utterance runs through apply_reasoning_mode before the LLM sees the messages. On main, explicit_reasoning_mode, is_simple_request, and looks_like_deep_reasoning_request each call to_lowercase() independently, which means up to three full-string allocations for the same user text.

Proposed fix

Behavior-preserving optimization in crates/genie-core/src/reasoning.rs:

  • Single shared to_lowercase() in the Qwen path.
  • Conservative needs_simple_request_scan / needs_deep_reasoning_scan early-outs before expensive marker loops.
  • Corpus regression tests + optional release bench.
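The first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in crates/genie-core/src/reasoning.rs: the marker lists, the variants of the stand-in ReasoningDecision enum, the classify function, and the length-based early-out are all assumptions made for the example. Only the helper names needs_simple_request_scan, needs_deep_reasoning_scan, is_simple_request, and looks_like_deep_reasoning_request come from this issue.

```rust
// Hypothetical marker lists; the real ones are not shown in this issue.
const SIMPLE_MARKERS: &[&str] = &["what time", "turn on", "turn off"];
const DEEP_MARKERS: &[&str] = &["step by step", "prove", "explain why"];

// Stand-in for the real ReasoningDecision type; the variants are assumed.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ReasoningDecision {
    Simple,
    Deep,
    Unchanged,
}

/// Length of the shortest marker in a list (0 for an empty list).
fn shortest_marker_len(markers: &[&str]) -> usize {
    markers.iter().map(|m| m.len()).min().unwrap_or(0)
}

/// Conservative early-out (assumed heuristic): a marker cannot be a
/// substring of text that is shorter than the shortest marker, so the
/// scan is skipped only when it could not have matched anyway.
pub fn needs_simple_request_scan(lowered: &str) -> bool {
    lowered.len() >= shortest_marker_len(SIMPLE_MARKERS)
}

pub fn needs_deep_reasoning_scan(lowered: &str) -> bool {
    lowered.len() >= shortest_marker_len(DEEP_MARKERS)
}

/// Both checks take already-lowercased text instead of lowercasing again.
pub fn is_simple_request(lowered: &str) -> bool {
    needs_simple_request_scan(lowered)
        && SIMPLE_MARKERS.iter().any(|m| lowered.contains(*m))
}

pub fn looks_like_deep_reasoning_request(lowered: &str) -> bool {
    needs_deep_reasoning_scan(lowered)
        && DEEP_MARKERS.iter().any(|m| lowered.contains(*m))
}

/// Hypothetical entry point: one to_lowercase() shared by every check.
/// The deep-before-simple ordering is an assumption of this sketch.
pub fn classify(user_text: &str) -> ReasoningDecision {
    let lowered = user_text.to_lowercase(); // the only allocation
    if looks_like_deep_reasoning_request(&lowered) {
        ReasoningDecision::Deep
    } else if is_simple_request(&lowered) {
        ReasoningDecision::Simple
    } else {
        ReasoningDecision::Unchanged
    }
}
```

Because each early-out returns false only when no marker could possibly match, skipping the scan cannot change the result, which is what keeps an optimization of this kind behavior-preserving.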

Scope

Pre-LLM performance bucket under #402. Does not touch tools/quick.rs (avoids collision with open quick-router PRs). Complements #501 (broader pre-LLM bundle) and merged voice-intent work.

Acceptance

  • ReasoningDecision and the adjusted message content are byte-identical to main across a regression corpus.
  • cargo test -p genie-core + cargo clippy + cargo fmt clean.
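One way to check the byte-identical criterion is to run a reference path and the optimized path over the same corpus and collect every input where they disagree. The sketch below is a self-contained illustration under stated assumptions: the marker list, the baseline and optimized stand-ins, the "/think" suffix used as the "adjusted message content", and the corpus entries are all hypothetical and do not reflect the real apply_reasoning_mode.

```rust
// Hypothetical markers and corpus; the real regression corpus is not shown here.
const SKETCH_DEEP_MARKERS: &[&str] = &["step by step", "prove"];
const SKETCH_CORPUS: &[&str] = &[
    "",
    "hi",
    "PROVE it",
    "Walk me through this Step By Step",
    "İstanbul weather today",
    "what's 2 + 2?",
];

/// Reference stand-in: lowercases inside the check and always scans.
fn baseline(text: &str) -> (bool, String) {
    let lowered = text.to_lowercase();
    let deep = SKETCH_DEEP_MARKERS.iter().any(|m| lowered.contains(*m));
    (deep, adjust(text, deep))
}

/// Optimized stand-in: skips the scan when the lowercased text is
/// shorter than the shortest marker, so no marker could match.
fn optimized(text: &str) -> (bool, String) {
    let lowered = text.to_lowercase();
    let min_len = SKETCH_DEEP_MARKERS.iter().map(|m| m.len()).min().unwrap_or(0);
    let deep = lowered.len() >= min_len
        && SKETCH_DEEP_MARKERS.iter().any(|m| lowered.contains(*m));
    (deep, adjust(text, deep))
}

/// Hypothetical message adjustment: append a suffix for deep requests.
fn adjust(text: &str, deep: bool) -> String {
    if deep {
        format!("{text} /think")
    } else {
        text.to_string()
    }
}

/// Returns every corpus entry whose decision or adjusted content differs
/// between the two paths. String equality in Rust compares bytes, so an
/// empty result means the outputs are byte-identical on this corpus.
fn mismatches<'a>(corpus: &[&'a str]) -> Vec<&'a str> {
    corpus
        .iter()
        .copied()
        .filter(|t| baseline(t) != optimized(t))
        .collect()
}
```

In a real test this would be wrapped in a #[test] function that asserts mismatches(...) is empty, with the baseline taken from the behavior on main rather than a stand-in.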

Metadata

Labels: performance (performance improvement, but not verified on Jetson)
