feat: out-of-box free model providers + dynamic model discovery #2
Open
KylinMountain wants to merge 13 commits into main
Conversation
Pre-configure popular LLM platforms (Zhipu GLM, SiliconFlow, OpenRouter, Aliyun DashScope, DeepSeek) with free model routes so users can start by just setting API keys in .env. Includes multi-platform fallback routes like chat-free and reasoning-free. https://claude.ai/code/session_01Hjjpqc8fw2Rxz7mGN4WdU6
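A minimal `.env` sketch of the "just set API keys" setup; the key names match the environment variables listed in the PR summary, and the values are placeholders:

```
# Set only the platforms you have keys for; unset providers are skipped.
ZHIPU_API_KEY=...
SILICONFLOW_API_KEY=...
OPENROUTER_API_KEY=...
DEEPSEEK_API_KEY=...
VOLCENGINE_API_KEY=ark-xxxxx   # 火山引擎/豆包 key format per this PR
```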
- Add ModelScope (魔搭) with free Qwen3-235B, DeepSeek-R1, etc. (2000 calls/day)
- Add Meituan LongCat with free Flash-Chat/Thinking/Lite models
- Update Zhipu with the full free model list (glm-4.7-flash, glm-z1-flash, etc.)
- Update OpenRouter with the latest free models (gemini-2.5-flash, llama-3.3-70b)
- Add new routes: chat-free-large (large free models), chat-lite (fast/light)
- Fix OpenRouter RPM to match the actual free-tier limit (20 RPM)
- Fix LongCat base_url to the correct path (api.longcat.chat/openai/v1)
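A hypothetical sketch of what a fallback route like those above could look like; the schema (`routes`, `targets`) is illustrative only, not the project's actual config format, and the model IDs are ones mentioned elsewhere in this PR:

```yaml
# Illustrative schema, not the real config file.
routes:
  chat-free:
    targets:
      - zhipu/glm-4.7-flash
      - openrouter/llama-3.3-70b:free
  chat-lite:
    targets:
      - zhipu/glm-4.7-flash
```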
Add 火山引擎/豆包 (Volcengine Ark / Doubao, ark.cn-beijing.volces.com) as the 8th provider. New users get 500K free tokens per model. API key format: ark-xxxxx.
Add a discovery package that periodically fetches free models from
provider /v1/models APIs and dynamically updates the router:
- OpenRouter: filters by pricing.prompt == "0"
- SiliconFlow: filters out Pro/ prefix (paid accelerated)
- ModelScope: all models are free (2000 calls/day)
- Generic: fallback fetcher for any OpenAI-compatible provider
Router now uses RWMutex for thread-safe dynamic updates. Static routes
from config are never overwritten by discovered models.
Configurable via:
  discovery:
    enabled: true
    interval: 24h
    providers: [openrouter, siliconflow, modelscope]
OpenRouter pricing fields can be "0", "free", or an empty string for free models; previously only "0" was checked.
Add daily request limit support to MultiLimiter via a new RPD field.
Uses a token bucket with 86400s refill window. This enables accurate
rate limiting for providers like ModelScope (2000 calls/day).
Config example:
  rate_limit:
    rpm: 60
    rpd: 2000
    tpm: 100000
Discovery now builds an aggregated 'free' route from all discovered free models. Each provider contributes its best model (heuristic: prefer large/popular models like Qwen3, DeepSeek-V3, Llama-3.3-70b). Users can now simply call model:"free" to get automatic cross-platform fallback across all configured free providers. The alias name is configurable via discovery.free_alias (default "free"). https://claude.ai/code/session_01Hjjpqc8fw2Rxz7mGN4WdU6
New 'auto' mode routes requests to different model tiers based on complexity analysis:
- lite: simple greetings, short Q&A → fast small models
- standard: normal conversation, summaries → standard models
- large: code generation, long writing, function calling → large models
- reasoning: math, logic, debugging, step-by-step → reasoning models

Classification heuristics:
- Tool/function calls → large
- Reasoning keywords (prove, debug, step by step) → reasoning
- Code/complex keywords (implement, algorithm, analyze) → large
- Content length > 4000 chars or > 10 turns → large
- Content length > 500 chars or > 4 turns → standard
- Otherwise → lite

Usage: model:"auto" in the request, mapped to configured route names. Each tier falls back to adjacent tiers if not configured.
Fix a bug where the tryTarget error was shadowed by the outer 'err' variable, causing 'failed: <nil>' in logs and losing the actual error. Affected all three handlers (chat, embedding, rerank) and stream mode. Also fix a stale model ID: qwen3-coder-480b → qwen3-coder (verified via OpenRouter API discovery).
Auto route now uses a two-stage classification:
1. Rules (zero latency) - handles obvious cases: tools→large,
keywords→reasoning/large, short messages→lite, long content→large
2. LLM classifier (optional) - for ambiguous cases, calls a small
model (e.g. glm-4-flash) with a classification prompt, 5s timeout,
max_tokens=10, temperature=0
Config:
  auto_route:
    classifier: "zhipu/glm-4-flash"  # provider/model format

If no classifier is configured, ambiguous cases fall back to the rule-based standard tier. The LLM prompt asks the model to reply with a single word: lite/standard/large/reasoning.
Discovery now parses model size and capability from model IDs:
- Size: regex extracts "70b", "405b"; MoE active params "120b-a12b" → 12B
- Capability: keywords like "thinking", "r1", "qwq" → reasoning

Auto-generates tier routes (auto:lite, auto:standard, auto:large, auto:reasoning) with parameter-count-based weights. AutoRouter reads these directly; no manual tier→route mapping needed.

Classification: <10B → lite, 10-72B → standard, >72B → large, reasoning keywords → reasoning, unknown size → standard.

The "free" alias now picks the best model from the large/standard/lite tiers. Manual tier overrides are still supported as a fallback.
- Require 3 consecutive rate-limit failures before marking a backend unhealthy (was: 1 failure = instant unhealthy)
- Skip the already-tried target in the fallback retry loop to avoid double-counting failures and wasting retry attempts
- Reset the fail count on each successful request

This fixes the issue where one model's 429 would take down all models sharing the same provider API key.
Major changes to rate limit handling:
- 429 now marks the specific model as hot (30s cooldown), NOT the
entire backend/API key. Other models on the same provider work fine.
- handleWithRetry skips hot models when selecting targets
- Auto route degrades from large→standard→lite when all models in
a tier are hot, ensuring requests always find an available model
New cooldown package tracks per-model (provider+model) overload state.
This properly handles providers like OpenRouter where different models
have independent rate limits sharing one API key.
Flow: model:auto → classify(large) → auto:large all hot →
degrade → auto:standard has available models → route there
Summary

- Out of the box: cp .env.example .env, fill in the API keys, and it works
- The /v1/models endpoint of each platform is polled to fetch the latest free model list and register routes automatically

Supported platforms (API keys): ZHIPU_API_KEY, SILICONFLOW_API_KEY, MODELSCOPE_API_KEY, OPENROUTER_API_KEY (:free models), LONGCAT_API_KEY, VOLCENGINE_API_KEY, ALIYUN_API_KEY, DEEPSEEK_API_KEY

Model Discovery

New discovery package that periodically fetches the free model list from each platform's API:
- OpenRouter: filters free models by pricing.prompt == "0"
- SiliconFlow: excludes the Pro/ prefix (paid accelerated versions)
- Discovered models are auto-registered as provider/model routes and appear in the /v1/models list; statically configured routes are never overwritten; the router uses an RWMutex for concurrency safety

Pre-configured routes: chat-free, reasoning-free, chat-free-large, chat-lite, chat

Test plan

- chat-free route fallback works
- docker-compose up passes environment variables correctly
- Models are reachable via the provider/model format
- The /v1/models endpoint returns discovered models