feat: out-of-box free model providers + dynamic model discovery#2

Open
KylinMountain wants to merge 13 commits into main from claude/review-fallback-logic-iRPJj

Conversation


KylinMountain (Owner) commented Mar 26, 2026

Summary

  • Pre-configure 8 LLM platforms (Zhipu, SiliconFlow, OpenRouter, ModelScope, Meituan LongCat, Volcengine, Aliyun, DeepSeek): users only need to cp .env.example .env and fill in an API key to get started out of the box
  • New model auto-discovery service: periodically calls each platform's /v1/models endpoint to fetch the latest free-model lists and register routes automatically
  • 5 new pre-configured model routes with automatic multi-platform fallback

Supported Platforms

| Platform | Free quota | Environment variable |
| --- | --- | --- |
| Zhipu GLM | 6 free models (glm-4-flash etc.) | ZHIPU_API_KEY |
| SiliconFlow | DeepSeek-V3/R1, Qwen2.5 etc. free | SILICONFLOW_API_KEY |
| ModelScope | Qwen3-235B, DeepSeek-R1 etc., 2000 calls/day | MODELSCOPE_API_KEY |
| OpenRouter | ~30 `:free` models | OPENROUTER_API_KEY |
| Meituan LongCat | Fully free during public beta; Lite 50M tokens/day | LONGCAT_API_KEY |
| Volcengine/Doubao | 500K tokens per model for new users | VOLCENGINE_API_KEY |
| Aliyun Bailian | Free quota for qwen-turbo | ALIYUN_API_KEY |
| DeepSeek | deepseek-chat | DEEPSEEK_API_KEY |

Model Discovery

A new discovery package periodically fetches free-model lists from each platform's API:

discovery:
  enabled: true
  interval: 24h
  providers: [openrouter, siliconflow, modelscope]
  • OpenRouter: selects free models via pricing.prompt == "0"
  • SiliconFlow: excludes the Pro/ prefix (paid accelerated variants)
  • ModelScope: all models are free (2000 calls/day)
  • Generic: a generic fetcher for any OpenAI-compatible platform

Discovered models are registered automatically as provider/model routes and show up in the /v1/models list. Statically configured routes are never overwritten. The Router uses an RWMutex for concurrency safety.
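The per-provider filters can be sketched as simple predicates over the /v1/models entries. This is a minimal sketch with illustrative field names, not the project's actual types:

```go
package main

import (
	"fmt"
	"strings"
)

// Model is a simplified view of one entry from a provider's /v1/models
// response; the field names are illustrative.
type Model struct {
	ID            string
	PromptPricing string // OpenRouter-style pricing.prompt, kept as a string
}

// isFreeOpenRouter keeps models whose prompt price is "0".
func isFreeOpenRouter(m Model) bool {
	return m.PromptPricing == "0"
}

// isFreeSiliconFlow excludes the paid accelerated "Pro/" variants.
func isFreeSiliconFlow(m Model) bool {
	return !strings.HasPrefix(m.ID, "Pro/")
}

func main() {
	models := []Model{
		{ID: "meta-llama/llama-3.3-70b:free", PromptPricing: "0"},
		{ID: "Pro/deepseek-ai/DeepSeek-V3"},
		{ID: "deepseek-ai/DeepSeek-V3"},
	}
	for _, m := range models {
		fmt.Println(m.ID, isFreeOpenRouter(m), isFreeSiliconFlow(m))
	}
}
```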

Pre-configured Routes

| Route | Description | Fallback chain |
| --- | --- | --- |
| chat-free | Free chat | Zhipu → SiliconFlow → LongCat → ModelScope → OpenRouter |
| reasoning-free | Free reasoning / chain-of-thought | SiliconFlow → ModelScope → LongCat → Zhipu |
| chat-free-large | Large free models | ModelScope → SiliconFlow → OpenRouter |
| chat-lite | Lightweight, fast | LongCat → Zhipu → SiliconFlow |
| chat | Paid, high quality | DeepSeek → Aliyun → Zhipu |

Test plan

  • Verify that each platform's base_url is reachable
  • Test chat-free fallback with API keys from at least 2 platforms
  • Verify that docker-compose up passes environment variables through correctly
  • Test pass-through access in provider/model format
  • Verify that discovery fetches the OpenRouter free-model list after startup
  • Verify that discovery does not overwrite statically configured routes
  • Verify that /v1/models returns the discovered models

https://claude.ai/code/session_01Hjjpqc8fw2Rxz7mGN4WdU6

claude added 4 commits March 26, 2026 00:18
Pre-configure popular LLM platforms (Zhipu GLM, SiliconFlow, OpenRouter,
Aliyun DashScope, DeepSeek) with free model routes so users can start
by just setting API keys in .env. Includes multi-platform fallback routes
like chat-free and reasoning-free.

- Add ModelScope with free Qwen3-235B, DeepSeek-R1 etc (2000 calls/day)
- Add Meituan LongCat with free Flash-Chat/Thinking/Lite models
- Update Zhipu with full free model list (glm-4.7-flash, glm-z1-flash etc)
- Update OpenRouter with latest free models (gemini-2.5-flash, llama-3.3-70b)
- Add new routes: chat-free-large (large free models), chat-lite (fast/light)
- Fix OpenRouter RPM to match actual free tier limit (20 RPM)
- Fix LongCat base_url to correct path (api.longcat.chat/openai/v1)

Add Volcengine/Doubao (ark.cn-beijing.volces.com) as the 8th provider.
New users get 500K tokens per model for free. API key format: ark-xxxxx.

Add a discovery package that periodically fetches free models from
provider /v1/models APIs and dynamically updates the router:

- OpenRouter: filters by pricing.prompt == "0"
- SiliconFlow: filters out Pro/ prefix (paid accelerated)
- ModelScope: all models are free (2000 calls/day)
- Generic: fallback fetcher for any OpenAI-compatible provider

Router now uses RWMutex for thread-safe dynamic updates. Static routes
from config are never overwritten by discovered models.

Configurable via:
  discovery:
    enabled: true
    interval: 24h
    providers: [openrouter, siliconflow, modelscope]
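The "static routes are never overwritten" rule plus RWMutex updates can be sketched like this; a minimal illustration of the concurrency pattern, not the project's actual Router API:

```go
package main

import (
	"fmt"
	"sync"
)

// Router holds model routes; discovered routes never overwrite static ones.
type Router struct {
	mu     sync.RWMutex
	static map[string]bool   // route names that came from config
	routes map[string]string // route name -> upstream target
}

func NewRouter() *Router {
	return &Router{static: map[string]bool{}, routes: map[string]string{}}
}

func (r *Router) SetStatic(name, target string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.static[name] = true
	r.routes[name] = target
}

// AddDiscovered registers a discovered model unless a static route exists.
func (r *Router) AddDiscovered(name, target string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.static[name] {
		return // static config always wins
	}
	r.routes[name] = target
}

func (r *Router) Lookup(name string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	t, ok := r.routes[name]
	return t, ok
}

func main() {
	r := NewRouter()
	r.SetStatic("chat-free", "zhipu/glm-4-flash")
	r.AddDiscovered("chat-free", "openrouter/llama-3.3-70b:free") // ignored: static wins
	t, _ := r.Lookup("chat-free")
	fmt.Println(t) // zhipu/glm-4-flash
}
```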

KylinMountain changed the title from "feat: add out-of-box config with 8 free model providers" to "feat: out-of-box free model providers + dynamic model discovery" on Mar 28, 2026
claude added 9 commits March 28, 2026 23:05
OpenRouter pricing fields can be "0", "free", or empty string for
free models. Previously only checked for "0".

Add daily request limit support to MultiLimiter via a new RPD field.
Uses a token bucket with 86400s refill window. This enables accurate
rate limiting for providers like ModelScope (2000 calls/day).

Config example:
  rate_limit:
    rpm: 60
    rpd: 2000
    tpm: 100000

Discovery now builds an aggregated 'free' route from all discovered
free models. Each provider contributes its best model (heuristic:
prefer large/popular models like Qwen3, DeepSeek-V3, Llama-3.3-70b).

Users can now simply call model:"free" to get automatic cross-platform
fallback across all configured free providers. The alias name is
configurable via discovery.free_alias (default "free").
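The "best model per provider" heuristic can be sketched as a ranked preference list; the family names below are illustrative, not the project's actual ranking:

```go
package main

import (
	"fmt"
	"strings"
)

// bestModel returns the first model matching a ranked list of preferred
// families (large/popular first), falling back to the first model.
func bestModel(models []string) string {
	preferred := []string{"qwen3", "deepseek-v3", "llama-3.3-70b"}
	for _, p := range preferred {
		for _, m := range models {
			if strings.Contains(strings.ToLower(m), p) {
				return m
			}
		}
	}
	if len(models) > 0 {
		return models[0]
	}
	return ""
}

func main() {
	fmt.Println(bestModel([]string{"glm-4-flash", "DeepSeek-V3", "qwen2.5-7b"}))
	// → DeepSeek-V3 (first preferred family found)
}
```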

New 'auto' mode routes requests to different model tiers based on
complexity analysis:

- lite: simple greetings, short Q&A → fast small models
- standard: normal conversation, summaries → standard models
- large: code generation, long writing, function calling → large models
- reasoning: math, logic, debugging, step-by-step → reasoning models

Classification heuristics:
- Tool/function calls → large
- Reasoning keywords (prove, debug, step by step) → reasoning
- Code/complex keywords (implement, algorithm, analyze) → large
- Content length > 4000 chars or > 10 turns → large
- Content length > 500 chars or > 4 turns → standard
- Otherwise → lite

Usage: model:"auto" in the request, maps to configured route names.
Each tier falls back to adjacent tiers if not configured.
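The classification heuristics above can be sketched as ordered rules; keyword lists and thresholds mirror the description but are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// classify picks a tier from the request: tools and keywords first,
// then length/turn thresholds, defaulting to lite.
func classify(content string, turns int, hasTools bool) string {
	if hasTools {
		return "large" // tool/function calls need capable models
	}
	lower := strings.ToLower(content)
	for _, kw := range []string{"prove", "debug", "step by step"} {
		if strings.Contains(lower, kw) {
			return "reasoning"
		}
	}
	for _, kw := range []string{"implement", "algorithm", "analyze"} {
		if strings.Contains(lower, kw) {
			return "large"
		}
	}
	if len(content) > 4000 || turns > 10 {
		return "large"
	}
	if len(content) > 500 || turns > 4 {
		return "standard"
	}
	return "lite"
}

func main() {
	fmt.Println(classify("hi there", 1, false))                // lite
	fmt.Println(classify("please debug this panic", 1, false)) // reasoning
	fmt.Println(classify("summarize this", 1, true))           // large (tool call)
}
```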

Fix bug where tryTarget error was shadowed by outer 'err' variable,
causing 'failed: <nil>' in logs and losing the actual error info.
Affected all three handlers (chat, embedding, rerank) and stream mode.

Also fix stale model ID: qwen3-coder-480b → qwen3-coder (verified
via OpenRouter API discovery).
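The shadowing bug has this shape (hypothetical handler code; tryTarget stands in for the real call):

```go
package main

import (
	"errors"
	"fmt"
)

var errUpstream = errors.New("upstream 429")

func tryTarget(ok bool) error {
	if !ok {
		return errUpstream
	}
	return nil
}

// buggy: `err :=` inside the if-statement shadows the outer err, so after
// the loop the outer err is still nil and the log prints "failed: <nil>".
func buggy() error {
	var err error
	for _, ok := range []bool{false, false} {
		if err := tryTarget(ok); err != nil {
			continue // inner err is lost here
		}
		return nil
	}
	return err // still nil!
}

// fixed: assign to the outer err so the last failure survives the loop.
func fixed() error {
	var err error
	for _, ok := range []bool{false, false} {
		if err = tryTarget(ok); err != nil {
			continue
		}
		return nil
	}
	return err
}

func main() {
	fmt.Printf("buggy: failed: %v\n", buggy())
	fmt.Printf("fixed: failed: %v\n", fixed())
}
```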

Auto route now uses a two-stage classification:
1. Rules (zero latency) - handles obvious cases: tools→large,
   keywords→reasoning/large, short messages→lite, long content→large
2. LLM classifier (optional) - for ambiguous cases, calls a small
   model (e.g. glm-4-flash) with a classification prompt, 5s timeout,
   max_tokens=10, temperature=0

Config:
  auto_route:
    classifier: "zhipu/glm-4-flash"  # provider/model format

If classifier is not configured, falls back to rule-based standard
tier for ambiguous cases. The LLM prompt asks the model to reply
with just one word: lite/standard/large/reasoning.
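The two-stage flow can be sketched like this; askLLM is a hypothetical stand-in for the real classifier call, and the rule thresholds are illustrative:

```go
package main

import "fmt"

// classifyTwoStage runs cheap rules first, then consults an optional
// LLM classifier for ambiguous cases. With no classifier configured
// (askLLM == nil) or on error, it falls back to "standard".
func classifyTwoStage(content string, askLLM func(string) (string, error)) string {
	// Stage 1: rules handle the obvious cases with zero latency.
	if len(content) < 20 {
		return "lite"
	}
	if len(content) > 4000 {
		return "large"
	}
	// Stage 2: ambiguous case, ask the small model if configured.
	if askLLM != nil {
		if tier, err := askLLM(content); err == nil {
			switch tier {
			case "lite", "standard", "large", "reasoning":
				return tier // only accept the four expected words
			}
		}
	}
	return "standard"
}

func main() {
	fmt.Println(classifyTwoStage("hi", nil)) // lite, by rules alone
	llm := func(string) (string, error) { return "reasoning", nil }
	fmt.Println(classifyTwoStage("why does this recursion overflow the stack?", llm))
}
```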

Discovery now parses model size and capability from model IDs:
- Size: regex extracts "70b", "405b", MoE active params "120b-a12b"→12B
- Capability: keywords like "thinking", "r1", "qwq" → reasoning

Auto-generates tier routes (auto:lite, auto:standard, auto:large,
auto:reasoning) with parameter-count-based weights. AutoRouter reads
these directly - no manual tier→route mapping needed.

Classification: <10B→lite, 10-72B→standard, >72B→large,
reasoning keywords→reasoning, unknown size→standard.

The "free" alias now picks the best model from large/standard/lite
tiers. Manual tier overrides still supported as fallback.
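The size/capability parsing can be sketched with two regexes; the patterns and keyword list are illustrative, not the project's actual ones:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	// MoE active-parameter suffix, e.g. "120b-a12b" → 12
	moeRe = regexp.MustCompile(`-a(\d+(?:\.\d+)?)b\b`)
	// Plain size, e.g. "70b" → 70
	sizeRe = regexp.MustCompile(`(\d+(?:\.\d+)?)b\b`)
)

// paramsB extracts a parameter count in billions from a model ID,
// preferring the MoE active-parameter suffix; 0 means unknown.
func paramsB(id string) float64 {
	id = strings.ToLower(id)
	if m := moeRe.FindStringSubmatch(id); m != nil {
		v, _ := strconv.ParseFloat(m[1], 64)
		return v
	}
	if m := sizeRe.FindStringSubmatch(id); m != nil {
		v, _ := strconv.ParseFloat(m[1], 64)
		return v
	}
	return 0
}

// tier maps an ID to a tier: reasoning keywords first, then size bands.
func tier(id string) string {
	lower := strings.ToLower(id)
	for _, kw := range []string{"thinking", "r1", "qwq"} {
		if strings.Contains(lower, kw) {
			return "reasoning"
		}
	}
	switch p := paramsB(id); {
	case p == 0:
		return "standard" // unknown size
	case p < 10:
		return "lite"
	case p <= 72:
		return "standard"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(paramsB("llama-3.3-70b-instruct")) // 70
	fmt.Println(paramsB("gpt-oss-120b-a12b"))      // 12
	fmt.Println(tier("deepseek-r1"))               // reasoning
}
```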

- Require 3 consecutive rate-limit failures before marking backend
  unhealthy (was: 1 failure = instant unhealthy)
- Skip already-tried target in fallback retry loop to avoid
  double-counting failures and wasting retry attempts
- Reset fail count on successful request

This fixes the issue where one model's 429 would take down all
models sharing the same provider API key.
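The consecutive-failure rule is just a counter with a threshold and a reset on success; a minimal sketch, not the project's health-check type:

```go
package main

import "fmt"

// health marks a backend unhealthy only after `threshold` consecutive
// rate-limit failures; any success resets the streak.
type health struct {
	fails     int
	threshold int
}

func (h *health) onRateLimited() { h.fails++ }
func (h *health) onSuccess()     { h.fails = 0 }
func (h *health) healthy() bool  { return h.fails < h.threshold }

func main() {
	h := &health{threshold: 3}
	h.onRateLimited()
	h.onRateLimited()
	fmt.Println(h.healthy()) // true: two 429s in a row are tolerated
	h.onRateLimited()
	fmt.Println(h.healthy()) // false: three in a row
	h.onSuccess()
	fmt.Println(h.healthy()) // true: success resets the streak
}
```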

Major changes to rate limit handling:
- 429 now marks the specific model as hot (30s cooldown), NOT the
  entire backend/API key. Other models on the same provider work fine.
- handleWithRetry skips hot models when selecting targets
- Auto route degrades from large→standard→lite when all models in
  a tier are hot, ensuring requests always find an available model

New cooldown package tracks per-model (provider+model) overload state.
This properly handles providers like OpenRouter where different models
have independent rate limits sharing one API key.

Flow: model:auto → classify(large) → auto:large all hot →
      degrade → auto:standard has available models → route there
