The problem happened with the code below. It turns out the model file doesn't include the "general.quantization_version" metadata. When llama.cpp reads a file without that key, it assumes version 2 (grep for the line gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);), so this model works with llama.cpp but fails with rustformers/llm.
let model = llm::load(
path,
llm::TokenizerSource::Embedded,
parameters,
llm::load_progress_callback_stdout,
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));
thread '<unnamed>' panicked at llm/inference/src/llms/local/llama2.rs:45:35:
Failed to load model: quantization version was missing, despite model containing quantized tensors
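To confirm the key really is absent, here is a minimal sketch that just lists the metadata keys in the file. It assumes the GGUF v2/v3 little-endian layout and is a standalone throwaway tool, not part of either project:

```rust
// Sketch: print the metadata keys of a GGUF (v2/v3) file so you can check
// whether "general.quantization_version" is present. Little-endian only.
use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom};

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(u32::from_le_bytes(b))
}

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut b = [0u8; 8];
    r.read_exact(&mut b)?;
    Ok(u64::from_le_bytes(b))
}

fn read_string(r: &mut impl Read) -> io::Result<String> {
    let len = read_u64(r)? as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}

// Skip a metadata value of the given GGUF type id without decoding it.
fn skip_value<R: Read + Seek>(r: &mut R, ty: u32) -> io::Result<()> {
    match ty {
        0 | 1 | 7 => { r.seek(SeekFrom::Current(1))?; }    // u8, i8, bool
        2 | 3 => { r.seek(SeekFrom::Current(2))?; }        // u16, i16
        4 | 5 | 6 => { r.seek(SeekFrom::Current(4))?; }    // u32, i32, f32
        10 | 11 | 12 => { r.seek(SeekFrom::Current(8))?; } // u64, i64, f64
        8 => {                                             // string
            let len = read_u64(r)?;
            r.seek(SeekFrom::Current(len as i64))?;
        }
        9 => {                                             // array
            let elem_ty = read_u32(r)?;
            let count = read_u64(r)?;
            for _ in 0..count {
                skip_value(r, elem_ty)?;
            }
        }
        other => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("unknown GGUF value type {other}"),
            ))
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: gguf-keys <model.gguf>");
    let mut r = BufReader::new(File::open(path)?);

    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    assert_eq!(&magic, b"GGUF", "not a GGUF file");
    let version = read_u32(&mut r)?;
    let _tensor_count = read_u64(&mut r)?;
    let kv_count = read_u64(&mut r)?;
    println!("GGUF version {version}, {kv_count} metadata keys:");

    for _ in 0..kv_count {
        let key = read_string(&mut r)?;
        let ty = read_u32(&mut r)?;
        println!("  {key}");
        skip_value(&mut r, ty)?;
    }
    Ok(())
}
```

Running it against the model in question shows no "general.quantization_version" entry among the keys.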
My solution was to just get rid of the whole block that performs this check.
Unsure how you want to handle this since it does remove a check.
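For what it's worth, instead of dropping the check entirely, one option might be to fall back to version 2 when the key is absent, matching what llama.cpp assumes when it reads such a file. A rough sketch only; the `metadata` type and names below are placeholders, not the crate's actual API:

```rust
use std::collections::HashMap;

// llama.cpp's GGML_QNT_VERSION is 2; treat a missing key the same way.
const DEFAULT_QNT_VERSION: u32 = 2;

// Sketch: `metadata` stands in for however the loader exposes GGUF
// key/value pairs; only the fallback behaviour is the point here.
fn quantization_version(metadata: &HashMap<String, u32>) -> u32 {
    metadata
        .get("general.quantization_version")
        .copied()
        // Missing key: assume version 2 instead of failing with
        // "quantization version was missing".
        .unwrap_or(DEFAULT_QNT_VERSION)
}
```

That would keep the check for files that do declare an incompatible version while still loading files like this one.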