feat: add --model CLI parameter for model selection#19

Closed
chindris-mihai-alexandru wants to merge 2 commits into Cerebras:main from chindris-mihai-alexandru:feature/model-selection-cli

Conversation


@chindris-mihai-alexandru chindris-mihai-alexandru commented Jan 3, 2026

Summary

Adds the ability to select which Cerebras model to use via a --model CLI parameter, addressing the feature request in #6.

Changes

New Features

  • --model <model-name> / -m <model-name> - Select which Cerebras model to use
  • --help / -h - Display usage information and list of available models
  • Model validation with a fallback to the default if an invalid model is specified
  • Clear indication of which model is being used on server startup (regardless of selection method)
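The validation-with-fallback behaviour above can be sketched as follows. AVAILABLE_MODELS, DEFAULT_MODEL, and resolveModel are illustrative stand-ins for the real constants and logic in the PR, not its exact code:

```javascript
// Sketch only: model IDs are taken from the PR's model table; the
// shape of AVAILABLE_MODELS and the resolveModel helper are assumptions.
const AVAILABLE_MODELS = {
  'llama3.1-8b': 'Llama 3.1 8B',
  'llama-3.3-70b': 'Llama 3.3 70B',
  'gpt-oss-120b': 'OpenAI GPT OSS',
  'qwen-3-32b': 'Qwen 3 32B',
  'qwen-3-235b-a22b-instruct-2507': 'Qwen 3 235B Instruct',
  'zai-glm-4.6': 'Z.ai GLM 4.6',
};
const DEFAULT_MODEL = 'zai-glm-4.6';

function resolveModel(requested) {
  // Keep a known model; warn and fall back to the default otherwise.
  if (requested && AVAILABLE_MODELS[requested]) return requested;
  if (requested) {
    console.warn(`Unknown model "${requested}", falling back to ${DEFAULT_MODEL}`);
  }
  return DEFAULT_MODEL;
}
```

An unrecognized name thus never aborts startup; the server warns and continues with the default, matching the behaviour described above.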

Implementation Details

  • Added AVAILABLE_MODELS constant in src/config/constants.js with all 6 supported Cerebras models
  • Added setModel() function to enable runtime model configuration
  • Used Node.js built-in parseArgs (node:util) for CLI argument parsing
  • Maintained backward compatibility - existing behavior unchanged if no --model flag provided
  • Updated Node.js requirement to >=18.3.0 for parseArgs support

Documentation

  • Updated README.md with:
    • Model selection section
    • List of all available models with parameters and speed
    • Usage examples
    • Reference to --help flag

Available Models

Model ID                        Name                  Parameters   Speed
llama3.1-8b                     Llama 3.1 8B          8 billion    ~2200 tokens/s
llama-3.3-70b                   Llama 3.3 70B         70 billion   ~2100 tokens/s
gpt-oss-120b                    OpenAI GPT OSS        120 billion  ~3000 tokens/s
qwen-3-32b                      Qwen 3 32B            32 billion   ~2600 tokens/s
qwen-3-235b-a22b-instruct-2507  Qwen 3 235B Instruct  235 billion  ~1400 tokens/s (Preview)
zai-glm-4.6                     Z.ai GLM 4.6          357 billion  ~1000 tokens/s (Preview, Default)

Usage Examples

# Use Llama 3.3 70B
cerebras-mcp --model llama-3.3-70b

# Use Qwen 3 32B (shorthand)
cerebras-mcp -m qwen-3-32b

# Display help
cerebras-mcp --help

Testing

  • Verified --help displays correct information
  • Tested model selection with valid model names
  • Confirmed invalid model names fall back to default with warning
  • Verified backward compatibility (no flags = default behavior)
  • Server starts successfully with selected model
  • Model display shows on startup regardless of selection method

Copilot AI Review

All 5 automated review comments have been addressed in commit b6a21fc:

  • Added startup log showing active model
  • Hardcoded default model in help text for clarity
  • Fixed parameter formatting consistency in README
  • Updated Node.js requirement to >=18.3.0
  • Corrected GitHub URL to match package.json

Resolves

Closes #6

Notes

  • The CEREBRAS_MODEL environment variable still works and takes precedence if set
  • Default model remains zai-glm-4.6 to maintain existing behavior
  • No breaking changes - fully backward compatible
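The precedence described in these notes can be sketched as follows; effectiveModel is a hypothetical helper, not the PR's code:

```javascript
// Sketch of the documented precedence: the CEREBRAS_MODEL env var wins
// over the --model flag, and zai-glm-4.6 remains the default.
function effectiveModel(env, cliModel) {
  return env.CEREBRAS_MODEL || cliModel || 'zai-glm-4.6';
}
```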

Implements feature requested in issue Cerebras#6

Changes:
- Add --model/-m CLI parameter to specify Cerebras model at runtime
- Add --help/-h flag to display usage information and available models
- Add AVAILABLE_MODELS constant with all supported Cerebras models
- Add setModel() function to enable runtime model configuration
- Update README with model selection documentation

Available models:
- llama3.1-8b (8B, ~2200 tokens/s)
- llama-3.3-70b (70B, ~2100 tokens/s)
- gpt-oss-120b (120B, ~3000 tokens/s)
- qwen-3-32b (32B, ~2600 tokens/s)
- qwen-3-235b-a22b-instruct-2507 (235B, ~1400 tokens/s, Preview)
- zai-glm-4.6 (357B, ~1000 tokens/s, Preview, Default)

Usage:
  cerebras-mcp --model llama-3.3-70b
  cerebras-mcp -m qwen-3-32b
  cerebras-mcp --help

Resolves Cerebras#6

Copilot AI left a comment


Pull request overview

This PR adds CLI model selection functionality to the Cerebras Code MCP Server, allowing users to specify which Cerebras model to use via a --model flag. The implementation uses Node.js's built-in parseArgs for argument handling and includes comprehensive help documentation.

Key Changes:

  • Added --model/-m CLI parameter for runtime model selection with validation and fallback to default
  • Implemented --help/-h flag displaying usage information and available models
  • Created AVAILABLE_MODELS constant with metadata for all 6 supported Cerebras models

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File                     Description
src/index.js             Implements CLI argument parsing with parseArgs, model validation logic, and comprehensive help text
src/config/constants.js  Adds AVAILABLE_MODELS constant with model metadata and setModel() function for runtime configuration
README.md                Documents model selection feature, lists available models with specifications, and provides usage examples
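The src/config/constants.js change can be pictured with a minimal sketch; the setModel name follows the PR description, but the body is an assumption:

```javascript
// Illustrative sketch only; the real implementation may differ.
const config = { cerebrasModel: 'zai-glm-4.6' }; // existing default model

// Lets the CLI layer override the model at runtime after argument parsing,
// instead of the value being fixed at module load time.
function setModel(modelId) {
  config.cerebrasModel = modelId;
}

setModel('llama-3.3-70b');
```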


Comment thread src/index.js
Comment on lines 95 to 96
console.error('Cerebras Code MCP Server starting...');
console.error(`📝 Debug logs will be written to: ${LOG_FILE}`);

Copilot AI Jan 3, 2026


The server startup should display which model is currently being used, regardless of whether it was set via CLI flag or environment variable. Currently, the model is only shown when explicitly set via the --model flag. Add a log statement after line 95 to always show the active model, similar to how API key availability is logged.

Comment thread src/index.js Outdated
Options:
-c, --config Run interactive configuration wizard
-r, --remove Remove MCP server configuration
-m, --model <model-name> Specify which Cerebras model to use (default: ${config.cerebrasModel})

Copilot AI Jan 3, 2026


The help message displays the default model by accessing config.cerebrasModel at template string evaluation time (line 54). However, if the CEREBRAS_MODEL environment variable is set, this will show the environment variable's value as the default, which could be confusing. Consider hardcoding the default as "zai-glm-4.6" in the help text, and separately mention that it can be overridden by the CEREBRAS_MODEL environment variable.

Suggested change
-m, --model <model-name> Specify which Cerebras model to use (default: ${config.cerebrasModel})
-m, --model <model-name> Specify which Cerebras model to use (default: zai-glm-4.6)
This default can be overridden via the CEREBRAS_MODEL environment variable.

Comment thread README.md Outdated
- `llama-3.3-70b` - Llama 3.3 70B (70 billion parameters, ~2100 tokens/s)
- `gpt-oss-120b` - OpenAI GPT OSS (120 billion parameters, ~3000 tokens/s)
- `qwen-3-32b` - Qwen 3 32B (32 billion parameters, ~2600 tokens/s)
- `qwen-3-235b-a22b-instruct-2507` - Qwen 3 235B Instruct (235 billion, ~1400 tokens/s, Preview)

Copilot AI Jan 3, 2026


Inconsistent formatting in the model descriptions: line 51 says "235 billion" while other entries, such as line 52's "357 billion parameters", include the word "parameters". For consistency, either all entries should include "parameters" or all should omit it.

Suggested change
- `qwen-3-235b-a22b-instruct-2507` - Qwen 3 235B Instruct (235 billion, ~1400 tokens/s, Preview)
- `qwen-3-235b-a22b-instruct-2507` - Qwen 3 235B Instruct (235 billion parameters, ~1400 tokens/s, Preview)

Comment thread src/index.js
*/

import { config, debugLog, LOG_FILE } from './config/constants.js';
import { parseArgs } from 'node:util';

Copilot AI Jan 3, 2026


The parseArgs function from node:util was stabilized in Node.js 18.3.0, but package.json specifies "node": ">=18.0.0". Users running Node.js 18.0.0-18.2.x will encounter errors because parseArgs is experimental in those versions and requires a flag to use. Consider either updating the minimum Node.js version to ">=18.3.0" in package.json, or using the util.parseArgs with the experimental flag check for older versions.

Comment thread src/index.js Outdated
cerebras-mcp -m qwen-3-32b # Use Qwen 3 32B (shorthand)
cerebras-mcp --config # Configure API keys

For more information, visit: https://github.com/Cerebras/cerebras-code-mcp

Copilot AI Jan 3, 2026


The GitHub repository URL in the help text does not match the actual repository URL specified in package.json. The help text references "https://github.com/Cerebras/cerebras-code-mcp" but package.json indicates the repository is at "https://github.com/kevint-cerebras/cerebras-mcp". Update the URL to match the correct repository location.

Suggested change
For more information, visit: https://github.com/Cerebras/cerebras-code-mcp
For more information, visit: https://github.com/kevint-cerebras/cerebras-mcp

- Add startup log showing active model regardless of selection method
- Hardcode default model (zai-glm-4.6) in help text for clarity
- Fix parameter formatting: '235 billion' → '235 billion parameters'
- Update Node.js requirement to >=18.3.0 for parseArgs support
- Fix GitHub URL to match package.json repository

chindris-mihai-alexandru commented Jan 3, 2026

Copilot AI Review Feedback Addressed

I've addressed all 5 comments in commit b6a21fc:

Changes Made:

  1. Model Display Enhancement - Added startup log showing the active model regardless of how it was set (CLI flag, env var, or default)

    • Now displays: Using model: Z.ai GLM 4.6 (zai-glm-4.6)
  2. Help Text Clarity - Hardcoded zai-glm-4.6 as the default model in help text instead of using the dynamic variable

    • Makes the help output clearer and doesn't depend on runtime state
  3. Parameter Formatting Consistency - Fixed "235 billion" → "235 billion parameters" in README.md

    • Now consistent with other model descriptions
  4. Node.js Version Requirement - Updated package.json engines to require Node.js >=18.3.0

    • This is when parseArgs was stabilized (no longer experimental)
  5. GitHub URL Correction - Fixed help text URL to match package.json repository

    • Changed from github.com/Cerebras/cerebras-code-mcp to github.com/kevint-cerebras/cerebras-mcp

All changes maintain backward compatibility and improve code quality.


chindris-mihai-alexandru commented Jan 3, 2026

Model List Verification

I've verified that the model list in this PR matches the current official Cerebras Inference API documentation (January 2026).

Models Included (6 total)

Production Models:

  • llama3.1-8b - Llama 3.1 8B (8 billion parameters, ~2200 tokens/s)
  • llama-3.3-70b - Llama 3.3 70B (70 billion parameters, ~2100 tokens/s)
  • gpt-oss-120b - OpenAI GPT OSS (120 billion parameters, ~3000 tokens/s)
  • qwen-3-32b - Qwen 3 32B (32 billion parameters, ~2600 tokens/s)

Preview Models:

  • qwen-3-235b-a22b-instruct-2507 - Qwen 3 235B Instruct (235 billion parameters, ~1400 tokens/s)
  • zai-glm-4.6 - Z.ai GLM 4.6 (357 billion parameters, ~1000 tokens/s, Default)

Source

Official documentation: https://inference-docs.cerebras.ai/models

Note on qwen-3-coder-480b

The qwen-3-coder-480b model mentioned in issue #6 was officially deprecated by Cerebras on November 5, 2025 (source: Deprecations). Cerebras recommends migrating to zai-glm-4.6 as a replacement. This PR uses only currently supported models to ensure long-term reliability.

@chindris-mihai-alexandru

Closing this stale PR to keep my open queue focused. If maintainers still want this change, I can reopen and refresh it.


Development

Successfully merging this pull request may close these issues.

Feature Request: Model selection

2 participants