"Allow any MCP-capable LLM agent to communicate with or delegate tasks to any other LLM available through the OpenRouter.ai API."
A Model Context Protocol (MCP) server wrapper designed to facilitate seamless interaction with various Large Language Models (LLMs) through a standardized interface. This project enables developers to integrate LLM capabilities into their applications by providing a robust and flexible STDIO-based server that handles LLM calls, tool execution, and result processing.
- Implements the Model Context Protocol (MCP) specification for standardized LLM interactions.
- Provides an STDIO-based server for handling LLM requests and responses via standard input/output.
- Supports advanced features like tool calls and results through the MCP protocol.
- Configurable to use various LLM providers (e.g., OpenRouter, local models) via API base URL and model parameters.
- Designed for extensibility, allowing easy integration of new LLM backends.
- Integrates with `llm-accounting` for robust logging, rate limiting, and audit functionality, enabling monitoring of remote LLM usage, inference costs, and inspection of queries/responses for debugging or legal purposes.
This project relies on the following key dependencies:
- `pydantic`: Data validation and settings management using Python type hints.
- `pydantic-settings`: Pydantic's settings management for environment variables and configuration.
- `python-dotenv`: Reads key-value pairs from a `.env` file and sets them as environment variables.
- `requests`: An elegant and simple HTTP library for Python.
- `tiktoken`: A fast BPE tokeniser for use with OpenAI's models.
- `llm-accounting`: For robust logging, rate limiting, and audit functionality.
(Note: `fastapi` and `uvicorn` have been removed because the primary server is STDIO-based. If they are used for other utilities within the project, they should be re-added with clarification.)
For development and testing, the project uses:

- `pytest`: A mature, full-featured Python testing framework.
- `black`: An uncompromising Python code formatter.
- `isort`: A Python utility/library to sort imports alphabetically and automatically separate them into sections and by type.
- `mypy`: An optional static type checker for Python.
- `pytest-mock`: A pytest plugin that provides a `mocker` fixture for easier mocking.
The `llm-wrapper-mcp-server` package is available on PyPI and can be installed via pip:

```bash
pip install llm-wrapper-mcp-server
```
Alternatively, for local development or to install from source:
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install the package (a quick verification sketch follows):

  ```bash
  pip install -e .
  ```
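After either installation path, a quick smoke test can confirm that the CLI entry point responds to `--help`. This is only a sketch and assumes the package was installed into the currently active Python environment:

```python
import subprocess
import sys

# Run the server module with --help; a zero exit code and usage text
# indicate the package is installed and the CLI entry point works.
result = subprocess.run(
    [sys.executable, "-m", "llm_wrapper_mcp_server", "--help"],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)
result.check_returncode()
```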
Create a `.env` file in the project root with the following variable:

```
OPENROUTER_API_KEY=your_openrouter_api_key_here
```
The server is configured to use OpenRouter by default. The API key is loaded from the `OPENROUTER_API_KEY` environment variable. The specific LLM model and API base URL are primarily configured via command-line arguments when running the server (see below).
Default settings if not overridden by CLI arguments (a usage sketch follows this list):

- API base URL for the LLMClient: `https://openrouter.ai/api/v1` (can be overridden by the `LLM_API_BASE_URL` environment variable or the `--llm-api-base-url` CLI argument)
- Default model for the LLMClient: `perplexity/llama-3.1-sonar-small-128k-online` (can be overridden by the `--model` CLI argument)
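As a sketch of how these defaults can be overridden, the snippet below launches the server with an explicit model and API base URL using the documented CLI arguments, and passes the API key through the environment. The model name is only a placeholder:

```python
import os
import subprocess
import sys

# Pass the API key via the environment (normally it is loaded from .env).
env = dict(os.environ, OPENROUTER_API_KEY="your_openrouter_api_key_here")

# Override the default model and API base URL with the documented CLI flags.
server = subprocess.Popen(
    [
        sys.executable, "-m", "llm_wrapper_mcp_server",
        "--model", "your-org/your-model-name",  # placeholder model id
        "--llm-api-base-url", "https://openrouter.ai/api/v1",
    ],
    stdin=subprocess.PIPE,   # MCP communication happens over stdin/stdout
    stdout=subprocess.PIPE,
    env=env,
)
```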
Textual Overview:
- Agent Software communicates with the LLM Wrapper MCP Server via the MCP Protocol (stdin/stdout).
- The LLM Wrapper MCP Server interacts with LLM providers (e.g., OpenRouter.ai) for LLM API calls and responses.
- The server also integrates with an LLM Accounting System for logging and auditing.
- Main components:
- MCP Communication Handler
- LLM Client
- Tool Executor
- LLM Accounting Integration
This project includes a reference implementation of a fully functional MCP server named "Ask Online Question".
It can be directly integrated into your agentic workflows, providing cloud-based, LLM-powered online search capabilities via the MCP protocol.
This server demonstrates how to build a specialized MCP server on top of the `llm-wrapper-mcp-server` foundation. For detailed information on its features, usage, and how to integrate it with your agent, please refer to its dedicated README: src/ask_online_question_mcp_server/README.md.
To run the server, execute the following command:

```bash
python -m llm_wrapper_mcp_server [OPTIONS]
```

For example:

```bash
python -m llm_wrapper_mcp_server --model your-org/your-model-name --log-level DEBUG
```

Run `python -m llm_wrapper_mcp_server --help` to see all available command-line options for configuring the server.
This server operates as a Model Context Protocol (MCP) STDIO server, communicating via standard input and output. It does not open a network port for MCP communication.
The server communicates using JSON-RPC messages over `stdin` and `stdout`. It supports the following MCP methods (a minimal request example follows this list):

- `initialize`: Handshake to establish the protocol version and server capabilities.
- `tools/list`: Lists available tools. The main server provides an `llm_call` tool.
- `tools/call`: Executes a specified tool.
- `resources/list`: Lists available resources (currently none).
- `resources/templates/list`: Lists available resource templates (currently none).
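As a minimal illustration of the request shape, the snippet below builds a `tools/list` message as a Python dictionary and serializes it to a single JSON line, matching the newline-delimited framing used by the client example later in this README. The exact response fields are best confirmed by running the server:

```python
import json

# JSON-RPC 2.0 request asking the server which tools it exposes.
tools_list_request = {
    "jsonrpc": "2.0",
    "id": "42",
    "method": "tools/list",
    "params": {},
}

# One JSON object per line, terminated by a newline, written to the server's stdin.
wire_message = json.dumps(tools_list_request) + "\n"
print(wire_message, end="")
```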
The `llm_call` tool takes `prompt` (string, required) and optionally `model` (string) as arguments, allowing per-call model overrides if the specified model is permitted.
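For example, a `tools/call` request with a per-call model override could look like the following; the model identifier is a placeholder and must be one the server permits:

```python
import json

# 'model' is optional; when present it overrides the server's default model
# for this single call, provided the model is allowed by the server.
llm_call_request = {
    "jsonrpc": "2.0",
    "id": "3",
    "method": "tools/call",
    "params": {
        "name": "llm_call",
        "arguments": {
            "prompt": "Summarize the MCP protocol in one sentence.",
            "model": "your-org/your-model-name",  # placeholder, optional
        },
    },
}
print(json.dumps(llm_call_request, indent=2))
```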
You can interact with the STDIO MCP server from any language that supports standard input/output communication. Here's a Python example using the `subprocess` module:
```python
import subprocess
import json

def send_request(process, request):
    """Send a JSON-RPC request to the server's stdin as one JSON line."""
    request_str = json.dumps(request) + "\n"
    process.stdin.write(request_str.encode("utf-8"))
    process.stdin.flush()

def read_response(process):
    """Read a JSON-RPC response line from the server's stdout."""
    line = process.stdout.readline().decode("utf-8").strip()
    if line:
        return json.loads(line)
    return None

if __name__ == "__main__":
    # Start the MCP server as a subprocess.
    # Ensure the virtual environment is activated or the package is installed globally.
    server_process = subprocess.Popen(
        ["python", "-m", "llm_wrapper_mcp_server"],  # Add any CLI args here if needed
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # Capture stderr for debugging
        text=False,  # Use bytes for stdin/stdout
    )

    print("Waiting for server to initialize...")
    # The server sends an initial capabilities message on startup (id: None).
    initial_response = read_response(server_process)
    print(f"Server Initial Response: {json.dumps(initial_response, indent=2)}")

    # 1. Send an 'initialize' request.
    initialize_request = {
        "jsonrpc": "2.0",
        "id": "1",
        "method": "initialize",
        "params": {},
    }
    print("\nSending initialize request...")
    send_request(server_process, initialize_request)
    initialize_response = read_response(server_process)
    print(f"Initialize Response: {json.dumps(initialize_response, indent=2)}")

    # 2. Send a 'tools/call' request to use the 'llm_call' tool.
    llm_call_request = {
        "jsonrpc": "2.0",
        "id": "2",
        "method": "tools/call",
        "params": {
            "name": "llm_call",
            "arguments": {
                "prompt": "What is the capital of France?"
                # Optionally add: "model": "another-model/if-allowed"
            },
        },
    }
    print("\nSending llm_call request...")
    send_request(server_process, llm_call_request)
    llm_call_response = read_response(server_process)
    print(f"LLM Call Response: {json.dumps(llm_call_response, indent=2)}")

    # You can also read stderr for any server logs/errors.
    # Note: stderr may block if there is no output; consider non-blocking reads
    # or threads for real applications.
    # stderr_output = server_process.stderr.read().decode("utf-8")
    # if stderr_output:
    #     print("\nServer Stderr Output:\n", stderr_output)

    # Terminate the server process.
    server_process.terminate()
    server_process.wait(timeout=5)  # Wait for the process to terminate
    print("\nServer process terminated.")
```
For a detailed overview of the project's directory and file structure, refer to docs/STRUCTURE.md. This document is useful for understanding the codebase during development.
This project uses `pytest` for testing.
To run all unit tests:

```bash
pytest
```
Integration tests are disabled by default to avoid making external API calls during normal test runs. To include and run integration tests, use the `integration` marker:

```bash
pytest -m integration
```
Install development dependencies:

```bash
pip install -e ".[dev]"
```
MIT License