A terminal-based LLM chat app that runs locally and interacts with your vLLM server
TermiLLM relies on vLLM, a high-throughput and memory-efficient inference engine for LLMs. Before using TermiLLM:
- Install vLLM:

```bash
pip install vllm
```

- Start a vLLM server with your preferred model:

```bash
python -m vllm.entrypoints.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000
```
Future versions of TermiLLM will include integrated vLLM support, eliminating the need for a separate server.
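To confirm the server is reachable before launching TermiLLM, you can send a quick test request. This is a minimal sketch that assumes the demo server's `/generate` route started by the command above; if you run vLLM's OpenAI-compatible server instead, the paths (e.g. `/v1/completions`) and payload differ.

```python
# Smoke test for the vLLM server (assumes the demo api_server's /generate route).
import requests

BASE_URL = "http://localhost:8000"  # same host/port as the server started above

payload = {
    "prompt": "Hello, world!",
    "max_tokens": 16,
    "temperature": 0.7,
}

resp = requests.post(f"{BASE_URL}/generate", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # the demo server returns a JSON object with a "text" field
```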
- Interactive Chat Interface: Connect to your local vLLM backend with streaming responses
- User Experience:
- Colorful output using Rich for a pleasant terminal experience
- Keyboard navigation to review chat history
- Stream responses from your local LLM in real time
- Command System (see the dispatcher sketch after this feature list):
- `/help` - Display available commands
- `/clear` - Clear the current conversation
- `/exit` - Exit the application
- `/model` - Change the model on the fly
- `/temp` - Adjust the temperature setting
- `/max_tokens` - Change the maximum token output
- Configuration Management:
- Persistent settings via JSON configuration file
- Dynamic model switching without restarting
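The slash commands map naturally onto a small dispatch table. The snippet below is a hypothetical sketch of how such a command system could be wired, not TermiLLM's actual implementation; the `state` keys and handler bodies are made up for illustration.

```python
# Hypothetical slash-command dispatcher (illustration only, not TermiLLM's code).
def handle_command(line: str, state: dict) -> bool:
    """Handle a slash command; return True if the line was a command."""
    if not line.startswith("/"):
        return False
    cmd, _, arg = line.partition(" ")
    handlers = {
        "/help": lambda: print("Commands: /help /clear /exit /model /temp /max_tokens"),
        "/clear": lambda: state["history"].clear(),
        "/exit": lambda: state.update(running=False),
        "/model": lambda: state.update(model=arg),
        "/temp": lambda: state.update(temperature=float(arg)),
        "/max_tokens": lambda: state.update(max_tokens=int(arg)),
    }
    handler = handlers.get(cmd)
    if handler is None:
        print(f"Unknown command: {cmd}")
        return True
    handler()
    return True

# Example: lower the sampling temperature mid-session.
state = {"history": [], "running": True, "model": "meta-llama/Llama-3.2-3B-Instruct",
         "temperature": 0.7, "max_tokens": 2048}
handle_command("/temp 0.2", state)
```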
To run TermiLLM, source the environment script and launch the app:

```bash
source ./venv.sh
python3 TermiLLM.py
```

You can also specify a different model or server:

```bash
python3 TermiLLM.py --model meta-llama/Llama-3.2-3B-Instruct --base-url http://localhost:8000
```

TermiLLM creates a configuration file named `termillm_config.json` in the application directory that stores your settings. You can edit this file directly to customize your preferences:

```json
{
"model": "meta-llama/Llama-3.2-3B-Instruct",
"base_url": "http://localhost:8000",
"temperature": 0.7,
"max_tokens": 2048
}
```

Settings can also be changed from within the application using the `/model`, `/temp`, and `/max_tokens` commands.
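Because the configuration file is plain JSON, you can also update it from a script. The snippet below is a minimal sketch assuming the key names shown above and that the file sits next to the script.

```python
# Minimal sketch: read termillm_config.json, tweak a setting, and write it back.
import json
from pathlib import Path

config_path = Path(__file__).parent / "termillm_config.json"

config = json.loads(config_path.read_text())
config["temperature"] = 0.2  # equivalent to running /temp 0.2 inside the app
config_path.write_text(json.dumps(config, indent=4))
```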
Interested? Feel free to contribute or open a PR for features you want or bugs you find! The plan is as follows:
- Basic Chat Feature
- Connect to vLLM backend
- Send/receive messages to/from the backend
- Support Streamed Output
- Support keyboard navigation
- Slash Commands
- /help
- /clear
- /exit
- Configurable model
- Support different models through vLLM
- Change the model using `/model`
- Save previous model selection
- Check the model (backend connection) before starting
- Move settings to JSON
- Colorful Output: Use Rich to make the UX more pleasant
- Provide more messages during generation
- Documentation
- C++ version
- Convert the current TermiLLM.py into an engine
- Preferably build a Python interface for users and connect it to the engine
- Add pytest
- CI/CD
- Highlight code in the output (may use a buffer; see the Rich rendering sketch after this list)
- Local file support
- Read files such as .cpp, .py, .txt, and .md
- Write to files
- Generate files
- Linux command
- API Config
- Integrate vLLM as part of the project
- Docker
- A LangChain Mode
- Move to a bubbletea-style UI
- Integrate local inference into it
- Integrate the model into it
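For the code-highlighting item, one possible approach (a sketch, not a committed design) is to buffer the streamed text and re-render it with Rich's Markdown renderer, which syntax-highlights fenced code blocks; `fake_stream()` below is a stand-in for the real streamed response from the backend.

```python
# Sketch: accumulate streamed chunks in a buffer and re-render with Rich so that
# fenced code blocks in the reply get syntax highlighting as they arrive.
import time

from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown


def fake_stream():
    """Stand-in for the backend's streaming response."""
    fence = "```"  # avoids embedding a literal fence in this README snippet
    reply = f"Here is an example:\n\n{fence}python\nprint('hello from TermiLLM')\n{fence}\n"
    for token in reply.split(" "):
        yield token + " "
        time.sleep(0.05)


console = Console()
buffer = ""
with Live(Markdown(buffer), console=console, refresh_per_second=8) as live:
    for chunk in fake_stream():
        buffer += chunk
        live.update(Markdown(buffer))
```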