Every method, every parameter, every possibility documented with quantum precision
- Core API
- Model Management
- Routing System
- Streaming API
- Engine Control
- Events & Hooks
- Configuration API
- Advanced Features
## Core API

### `new LLMRouter(options)`

The main orchestrator of the neural symphony.

```js
import LLMRouter from 'llm-runner-router';

const router = new LLMRouter(options);
```

**Parameters:**
| Parameter | Type | Default | Description |
|---|---|---|---|
| `options` | `Object` | `{}` | Configuration object |
| `options.strategy` | `String` | `'balanced'` | Routing strategy |
| `options.maxModels` | `Number` | `100` | Maximum models in registry |
| `options.cacheTTL` | `Number` | `3600000` | Cache time-to-live (ms) |
| `options.autoInit` | `Boolean` | `true` | Auto-initialize on creation |
| `options.logLevel` | `String` | `'info'` | Logging verbosity |
**Example:**

```js
const router = new LLMRouter({
  strategy: 'quality-first',
  maxModels: 50,
  cacheTTL: 7200000,
  logLevel: 'debug'
});
```

### `router.initialize()`

Initialize the router system.

```js
await router.initialize();
```

**Returns:** `Promise<void>`
**Throws:** `Error` if initialization fails
Example:
try {
await router.initialize();
console.log('Router ready!');
} catch (error) {
console.error('Initialization failed:', error);
}Load a model into the system.
const model = await router.load(spec);Parameters:
| Parameter | Type | Description |
|---|---|---|
| `spec` | `String\|Object` | Model specification |
| `spec.source` | `String` | Model source (path/URL) |
| `spec.format` | `String` | Optional format override |
| `spec.immediate` | `Boolean` | Load immediately |
| `spec.config` | `Object` | Model-specific config |
**Returns:** `Promise<Model>` - the loaded model instance
**Examples:**

```js
// Simple string load
const model1 = await router.load('models/llama-7b.gguf');

// Object specification
const model2 = await router.load({
  source: 'huggingface:meta-llama/Llama-2-7b',
  format: 'auto-detect',
  immediate: false,
  config: {
    quantization: 'q4_k_m',
    context: 4096
  }
});
```

### `router.quick(prompt, options)`

Quick inference with automatic model selection.

```js
const response = await router.quick(prompt, options);
```

**Parameters:**
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `String` | Required | Input text |
| `options` | `Object` | `{}` | Generation options |
| `options.maxTokens` | `Number` | `500` | Maximum tokens |
| `options.temperature` | `Number` | `0.7` | Sampling temperature |
| `options.topP` | `Number` | `0.9` | Nucleus sampling |
| `options.topK` | `Number` | `40` | Top-K sampling |
| `options.cache` | `Boolean` | `true` | Use cache |
**Returns:** `Promise<Response>`
**Response structure:**

```js
{
  text: String,      // Generated text
  tokens: Number,    // Token count
  model: String,     // Model used
  latency: Number,   // Generation time (ms)
  cached: Boolean    // From cache?
}
```
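For example, the fields of a `quick()` response can be consumed directly (a minimal sketch; the prompt and logging are illustrative):

```js
const response = await router.quick('Summarize the routing strategies in one sentence');

console.log(response.text);
console.log(`${response.tokens} tokens via ${response.model} in ${response.latency}ms`);
if (response.cached) {
  console.log('Served from cache');
}
```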
### `router.advanced(config)`

Advanced inference with full control.

```js
const response = await router.advanced(config);
```

**Parameters:**
| Parameter | Type | Description |
|---|---|---|
| `config` | `Object` | Complete configuration |
| `config.prompt` | `String` | Input prompt |
| `config.model` | `String` | Specific model ID |
| `config.temperature` | `Number` | Temperature (0-2) |
| `config.maxTokens` | `Number` | Max generation length |
| `config.stream` | `Boolean` | Enable streaming |
| `config.fallbacks` | `Array<String>` | Fallback models |
| `config.timeout` | `Number` | Request timeout (ms) |
| `config.retries` | `Number` | Retry attempts |
**Example:**

```js
const response = await router.advanced({
  prompt: "Explain recursion",
  model: "gpt-4",
  temperature: 0.5,
  maxTokens: 1000,
  fallbacks: ['claude', 'llama-70b'],
  timeout: 30000,
  retries: 3
});
```

## Model Management

### `router.registry.register(model)`

Register a model in the system.

```js
await router.registry.register(model);
```

### `router.registry.get(modelId)`

Get a model by ID.
```js
const model = await router.registry.get(modelId);
```

### `router.registry.search(criteria)`

Search for models.
```js
const models = router.registry.search({
  name: 'llama',
  format: 'gguf',
  capabilities: ['streaming'],
  maxSize: 10000000000
});
```

### `router.registry.unregister(modelId)`

Remove a model from the registry.

```js
await router.registry.unregister(modelId);
```
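Taken together, the registry calls support a simple lifecycle (a sketch; the flow is illustrative and assumes a loaded model exposes an `id` property):

```js
// Illustrative lifecycle: load, register, look up, search, remove
const model = await router.load('models/llama-7b.gguf');
await router.registry.register(model);

// Assumes a loaded model exposes an `id` property
const same = await router.registry.get(model.id);

// Find every registered model that supports streaming
const streamers = router.registry.search({ capabilities: ['streaming'] });

await router.registry.unregister(model.id);
```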
## Routing System

### `router.setStrategy(name)`

Change the routing strategy at runtime.

```js
router.setStrategy('cost-optimized');
```

**Available strategies:**
- `'quality-first'` - Prioritize output quality
- `'cost-optimized'` - Minimize inference costs
- `'speed-priority'` - Fastest response time
- `'balanced'` - Balance all factors
- `'round-robin'` - Equal distribution
- `'least-loaded'` - Load balancing
- `'random'` - Chaos mode
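For example, you might pick a strategy per workload and observe changes through the `'strategy-changed'` event (a sketch; the workload mapping is illustrative and the event payload shape is an assumption):

```js
// Illustrative mapping from workload type to a built-in strategy
const strategyForWorkload = {
  draft: 'speed-priority',
  production: 'quality-first',
  bulk: 'cost-optimized'
};

// The payload shape of 'strategy-changed' is an assumption here
router.on('strategy-changed', (change) => console.log('Strategy changed:', change));

router.setStrategy(strategyForWorkload.bulk);
```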
### `router.addCustomStrategy(name, fn)`

Add custom routing logic.
```js
router.addCustomStrategy('my-strategy', (models, context) => {
  // Custom selection logic
  return models.find(m => m.name === 'preferred-model');
});
```

## Streaming API

### `router.stream(prompt, options)`

Stream tokens in real-time.

```js
const stream = router.stream(prompt, options);
```

**Returns:** `AsyncGenerator<String>`
**Example:**

```js
const stream = router.stream("Write a story");
for await (const token of stream) {
  process.stdout.write(token);
}
```

### `router.streamWithEvents(prompt, options)`

Stream with progress events.
```js
const stream = router.streamWithEvents(prompt, options);

stream.on('token', (token) => console.log(token));
stream.on('progress', (info) => console.log(info));
stream.on('complete', (result) => console.log('Done!'));
stream.on('error', (error) => console.error(error));

await stream.start();
```

## Engine Control

### `router.getEngine()`

Get the current engine.
```js
const engine = router.getEngine();
console.log(engine.name); // 'webgpu'
```

### `router.setEngine(name)`

Manually set the engine.
```js
await router.setEngine('wasm');
```

### `router.getAvailableEngines()`

List available engines.
```js
const engines = router.getAvailableEngines();
// ['webgpu', 'wasm', 'node']
```
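One common pattern is to prefer the fastest engine that is actually present (a sketch built only on `getAvailableEngines()` and `setEngine()`; the preference order is illustrative):

```js
// Pick the first available engine from a preference list (illustrative order)
const preferred = ['webgpu', 'node', 'wasm'];
const available = router.getAvailableEngines();

const choice = preferred.find((name) => available.includes(name));
if (choice) {
  await router.setEngine(choice);
}
```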
## Events & Hooks

The router emits various events for monitoring and debugging.

```js
router.on('model-loaded', (model) => {
  console.log(`Model loaded: ${model.name}`);
});

router.on('inference-start', (context) => {
  console.log(`Starting inference: ${context.model}`);
});

router.on('inference-complete', (result) => {
  console.log(`Completed in ${result.latency}ms`);
});

router.on('error', (error) => {
  console.error('Router error:', error);
});
```

**Available events:**
- `'initialized'` - Router ready
- `'model-loaded'` - Model loaded
- `'model-unloaded'` - Model removed
- `'inference-start'` - Generation beginning
- `'inference-complete'` - Generation done
- `'token'` - Stream token emitted
- `'cache-hit'` - Cache used
- `'strategy-changed'` - Routing changed
- `'error'` - Error occurred
### `router.use(middleware)`

Add middleware to the processing pipeline.
```js
router.use(async (context, next) => {
  console.log('Before inference:', context.prompt);
  const result = await next();
  console.log('After inference:', result.text);
  return result;
});
```

## Configuration API

### `router.getConfig()`

Get the current configuration.

```js
const config = router.getConfig();
```

### `router.updateConfig(changes)`

Update configuration at runtime.
```js
router.updateConfig({
  cacheTTL: 7200000,
  maxTokens: 2048
});
```

### `router.exportConfig()`

Export configuration for persistence.
```js
import { promises as fs } from 'fs';

const config = router.exportConfig();
await fs.writeFile('config.json', JSON.stringify(config));
```
## Advanced Features

### `router.ensemble(models, prompt, options)`

Combine multiple models.

```js
const result = await router.ensemble([
  { model: 'gpt-4', weight: 0.5 },
  { model: 'claude', weight: 0.3 },
  { model: 'llama', weight: 0.2 }
], prompt, options);
```

### `router.batch(prompts, options)`

Process multiple prompts efficiently.
```js
const results = await router.batch([
  "Question 1",
  "Question 2",
  "Question 3"
], options);
```
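The batch results can then be consumed in order (a sketch; it assumes `batch()` resolves to an array of response objects in the same order as the prompts):

```js
const prompts = ["Question 1", "Question 2", "Question 3"];
const results = await router.batch(prompts, { maxTokens: 200 });

// Assumes results arrive in the same order as the prompts
results.forEach((result, i) => {
  console.log(`${prompts[i]} -> ${result.text}`);
});
```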
### `router.compare(models, prompt, options)`

Compare models side-by-side.

```js
const comparison = await router.compare(
  ['model1', 'model2', 'model3'],
  prompt,
  options
);

console.log(comparison.rankings);
console.log(comparison.scores);
```

### Error Types

```js
import {
  ModelNotFoundError,
  EngineError,
  TimeoutError,
  ValidationError
} from 'llm-runner-router/errors';
```
```js
try {
  await router.load('invalid-model');
} catch (error) {
  if (error instanceof ModelNotFoundError) {
    console.error('Model not found:', error.modelId);
  }
}
```

### Retry Configuration

```js
const router = new LLMRouter({
  retryConfig: {
    maxAttempts: 3,
    backoff: 'exponential',
    initialDelay: 1000,
    maxDelay: 10000
  }
});
```

### `router.getMetrics()`

Get system metrics.
```js
const metrics = router.getMetrics();
console.log(metrics);
// {
//   totalInferences: 1234,
//   averageLatency: 245,
//   cacheHitRate: 0.67,
//   modelsLoaded: 5,
//   memoryUsage: 2147483648
// }
```

### `router.resetMetrics()`

Reset performance counters.
```js
router.resetMetrics();
```
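To track throughput over time, one option is to snapshot and reset on an interval (a sketch using only the documented `getMetrics()` and `resetMetrics()` calls; the interval is illustrative):

```js
// Snapshot and reset once a minute for per-interval counters
setInterval(() => {
  const m = router.getMetrics();
  console.log(`inferences=${m.totalInferences} avgLatency=${m.averageLatency}ms hitRate=${m.cacheHitRate}`);
  router.resetMetrics();
}, 60_000);
```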
### `router.cleanup()`

Properly clean up resources.

```js
await router.cleanup();
```

Always call `cleanup()` when shutting down (see the sketch after this list) to:
- Unload models from memory
- Close engine connections
- Persist cache if configured
- Release resources
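A minimal shutdown hook might look like this (a sketch assuming a Node.js process; the signal choice is illustrative):

```js
// Unload models and release resources before the process exits
process.on('SIGINT', async () => {
  await router.cleanup();
  process.exit(0);
});
```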
API designed for humans, built for scale, ready for the future 🚀
Built with 💙 by Echo AI Systems