Model Context Protocol (MCP) server for Inference Labs. Adds vendor-neutral LLM routing, model comparison, and live LLM pricing to Claude Desktop, Cursor, Windsurf, Continue, and any other MCP-compatible client.
After installing, Claude (or whatever MCP client you use) gets four new tools:
| Tool | Auth required | What it does |
|---|---|---|
| get_pricing | none | Returns current per-token pricing for every major 2026 LLM (GPT-5, Claude, Gemini, Bedrock, Llama). Data source: /api/prices.json, CC BY 4.0. |
| recommend_model | none | Given a task and a priority (cost / quality / balanced / long-context), returns the top three models with monthly cost estimates. |
| route_request | INFERENCE_LABS_API_KEY | Routes one prompt through Inference Labs and returns the response + which model was chosen + cost. |
| compare_models | INFERENCE_LABS_API_KEY | Runs the same prompt across N models and returns every response side-by-side. |

The first two work without an account, so you can try the server before signing up at inference-labs.com for the routing tools.
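
The monthly cost estimates that recommend_model returns come down to simple arithmetic on token prices. The sketch below shows one plausible way to compute and rank them. It is not the server's actual implementation: the function names, the model names, and the prices are hypothetical, and it assumes prices are quoted in USD per million tokens, which the schema of /api/prices.json may or may not match.

```python
def monthly_cost_usd(requests_per_month, input_tokens, output_tokens,
                     input_price_per_mtok, output_price_per_mtok):
    """Estimate monthly spend from per-request token counts.

    Prices are assumed to be USD per million tokens (an assumption,
    not something this README specifies).
    """
    input_cost = requests_per_month * input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = requests_per_month * output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


def rank_by_cost(prices, requests_per_month, input_tokens, output_tokens):
    """Return (model, monthly_cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: monthly_cost_usd(requests_per_month, input_tokens, output_tokens,
                               in_price, out_price)
        for name, (in_price, out_price) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical models and prices, for illustration only.
EXAMPLE_PRICES = {
    "model-a": (0.50, 1.50),    # (input, output) USD per million tokens
    "model-b": (3.00, 15.00),
}

if __name__ == "__main__":
    # 10M requests a month, 500 input and 100 output tokens each.
    for name, cost in rank_by_cost(EXAMPLE_PRICES, 10_000_000, 500, 100):
        print(f"{name}: ${cost:,.2f}/month")
```

A cost-first ranking like this ignores quality and context length, which is presumably why the tool also accepts the quality, balanced, and long-context priorities.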

    # Recommended: uv (no install, runs on demand)
    uvx inference-labs-mcp

    # Or pip
    pip install inference-labs-mcp
    inference-labs-mcp

Add this to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

    {
      "mcpServers": {
        "inference-labs": {
          "command": "uvx",
          "args": ["inference-labs-mcp"],
          "env": {
            "INFERENCE_LABS_API_KEY": "il_live_..."
          }
        }
      }
    }

Restart Claude Desktop. Type "ask Inference Labs for current LLM pricing" and Claude will call the MCP server.
The INFERENCE_LABS_API_KEY line is optional — without it, get_pricing and recommend_model still work; route_request and compare_models will return a friendly "set INFERENCE_LABS_API_KEY" message.
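
The split between keyless and keyed tools can be pictured as a small check that runs before a tool executes. This is a minimal sketch of that idea, not the server's real code: the function name check_access and the exact wording of the message are assumptions. Only the tool names and the environment variable come from this README.

```python
import os

# Tool names as listed in the table above.
PUBLIC_TOOLS = {"get_pricing", "recommend_model"}
KEYED_TOOLS = {"route_request", "compare_models"}


def check_access(tool_name, env=None):
    """Return None if the tool may run, or a hint message if a key is missing.

    Hypothetical helper: the real server may structure this differently.
    """
    env = os.environ if env is None else env
    if tool_name in PUBLIC_TOOLS:
        return None
    if tool_name in KEYED_TOOLS:
        if env.get("INFERENCE_LABS_API_KEY"):
            return None
        return f"Set INFERENCE_LABS_API_KEY to use {tool_name}."
    raise ValueError(f"unknown tool: {tool_name}")
```

Returning a message instead of raising keeps the failure visible to the model in the chat, so it can tell the user what to configure.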
Cursor, Windsurf, and Continue use the same config shape: each reads mcpServers from its own settings file. See the MCP docs for client-specific paths.
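
Because the config shape is shared, adding the server can be scripted instead of edited by hand. The sketch below merges the inference-labs entry into an existing config without disturbing other servers. The helper names are hypothetical, and the path you pass in depends on your client, so check its documentation first.

```python
import json
from pathlib import Path


def add_inference_labs(config, api_key=None):
    """Return a copy of an MCP client config with the inference-labs entry added.

    The env block is included only when an API key is given, since the
    key is optional for get_pricing and recommend_model.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    entry = {"command": "uvx", "args": ["inference-labs-mcp"]}
    if api_key:
        entry["env"] = {"INFERENCE_LABS_API_KEY": api_key}
    updated.setdefault("mcpServers", {})["inference-labs"] = entry
    return updated


def update_config_file(path, api_key=None):
    """Read a config file (or start from an empty one), merge, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_inference_labs(config, api_key)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Back up the settings file before running something like this against a real client, and restart the client afterwards so it picks up the change.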
Try these in any MCP-enabled client after installing:
- "Use inference-labs to show me the cheapest 1M-context LLM."
- "Recommend the best model for summarizing 10M support tickets a month, optimizing for cost."
- "Use the inference-labs router with cost-first strategy to classify this email: ..."
- "Run this prompt through GPT-5 and Claude Sonnet 4.5 and tell me which response is better."
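
Behind prompts like these, the client sends the server a JSON-RPC 2.0 request with the MCP method tools/call. The sketch below builds such a message as a single newline-terminated line, the framing that the MCP stdio transport uses. The helper name is hypothetical, and the arguments are left empty because this README does not document each tool's argument schema.

```python
import json


def tools_call_message(request_id, tool_name, arguments=None):
    """Build one newline-delimited JSON-RPC request for an MCP tool call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    print(tools_call_message(1, "get_pricing"), end="")
```

A real session starts with the MCP initialize handshake before any tools/call request, which is why an MCP client is the practical way to talk to the server.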

    git clone https://github.com/bosslesss/inference-labs-mcp
    cd inference-labs-mcp
    pip install -e .
    INFERENCE_LABS_API_KEY=il_live_... python -m inference_labs_mcp
    # Speaks JSON-RPC over stdio. Use mcp-cli or your MCP client to interact.

Apache-2.0. See LICENSE.

- Platform: https://inference-labs.com
- Dashboard: https://app.inference-labs.com
- Python SDK (HTTP): https://github.com/bosslesss/inference-labs-python
- MCP spec: https://modelcontextprotocol.io