### Name and Version

```shell
$ ./llama-server --version
version: 6561 (8ba548d)
built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
```
Five days ago, I posted this issue in the discussion section (#16103). After further reflection, I believe it's more accurately categorized as a bug rather than an enhancement request.
I've identified two misleading behaviors in llama.cpp:
- When loading a model with parameters specified via the command line (using `llama-server`), these parameters are not reflected in the user interface (UI).
- When switching to a different model, the UI continues to apply the previous model's parameters instead of using the command-line settings for the new model.
This behavior leads to a poor user experience, as incorrect parameters are applied, resulting in suboptimal model performance.
### Proposed Solution
To address these issues, we could fetch data from the `/props` endpoint and apply it as the default configuration in the UI.
Below is a basic example demonstrating the intended functionality:
```js
// Fetch the server-side defaults from /props and store them as the web UI config.
fetch('/props')
  .then((response) => {
    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`);
    }
    return response.json();
  })
  .then((props) => {
    console.log(props);
    localStorage.setItem('config', JSON.stringify({
      "apiKey": "",
      "systemMessage": "You are a helpful assistant.",
      "showTokensPerSecond": true,
      "showThoughtInProgress": true,
      "excludeThoughtOnReq": true,
      "pasteLongTextToFileLen": 15500,
      "samplers": "edkypmxt",
      // Sampling values below are taken from the server; the rest keep the usual UI defaults.
      "temperature": props['default_generation_settings']['params']['temperature'],
      "dynatemp_range": 0,
      "dynatemp_exponent": 1,
      "top_k": props['default_generation_settings']['params']['top_k'],
      "top_p": props['default_generation_settings']['params']['top_p'],
      "min_p": props['default_generation_settings']['params']['min_p'],
      "xtc_probability": 0,
      "xtc_threshold": 0.1,
      "typical_p": 1,
      "repeat_last_n": 64,
      "repeat_penalty": 1,
      "presence_penalty": 1.5,
      "frequency_penalty": 0,
      "dry_multiplier": 0,
      "dry_base": 1.75,
      "dry_allowed_length": 2,
      "dry_penalty_last_n": -1,
      "max_tokens": -1,
      "custom": "",
      "pyIntepreterEnabled": false
    }));
  })
  .catch((error) => {
    console.error('Failed to fetch /props:', error);
  });
```
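For the second issue (switching models), the same logic could be re-run whenever the server or model changes, merging the fetched values into whatever the user already saved instead of overwriting the whole config. A minimal sketch of that idea is below; `applyServerDefaults` is a hypothetical helper, not existing webui code, and only `/props` and its `default_generation_settings.params` fields come from the server:

```js
// Sketch: merge the server-reported sampling defaults into the saved UI config.
// `applyServerDefaults` is a hypothetical helper, not part of the current webui.
async function applyServerDefaults() {
  const response = await fetch('/props');
  if (!response.ok) {
    throw new Error(`HTTP error! Status: ${response.status}`);
  }
  const props = await response.json();
  const serverParams = props.default_generation_settings.params;

  // Keep the user's saved settings, but take the sampling parameters
  // (the ones controlled by the CLI flags) from the server.
  const saved = JSON.parse(localStorage.getItem('config') || '{}');
  const merged = {
    ...saved,
    temperature: serverParams.temperature,
    top_k: serverParams.top_k,
    top_p: serverParams.top_p,
    min_p: serverParams.min_p,
  };
  localStorage.setItem('config', JSON.stringify(merged));
}

// Could be called on page load and again after a model switch.
applyServerDefaults().catch((error) => {
  console.error('Failed to fetch /props:', error);
});
```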
### Operating systems
Linux
### Which llama.cpp modules do you know to be affected?
llama-server
### Command line
```shell
./llama-server --port 8090 --model ~/Ai/Models/google/unsloth/gemma-3-27b-it-qat-UD-Q5_K_XL.gguf --ctx-size 32768 --n-gpu-layers 64 --prio 3 --temp 1.0 --min-p 0.01 --top-p 0.95 --top-k 64 --presence-penalty 1.0 --jinja --reasoning-format none --flash-attn on --threads 6
```

### Problem description & steps to reproduce
Start the server with these parameters and compare them with the settings shown in the web UI:
`--temp 1.0 --min-p 0.01 --top-p 0.95 --top-k 64`
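A quick way to see the mismatch (assuming the server from the command above is running and serving the web UI): open the UI in the browser, run the following in the developer console, and compare the printed values with what the settings dialog shows:

```js
// Print the sampling defaults the server derived from the CLI flags;
// compare these with the values shown in the web UI settings dialog.
fetch('/props')
  .then((r) => r.json())
  .then((p) => console.table(p.default_generation_settings.params));
```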
### First Bad Commit
Since the beginning?