Baseline model testing #14
GGML format: https://huggingface.co/aisuko/gpt-2-1.5B-ggml/tree/main. This is a demo model that aims to show CPU-accelerated GPT-2-xl inference with ggml. Update: it loads the model into memory as fp32, which is not what we want, so we are giving up on using Hugging Face Transformers and its ecosystem for inference.

Llama.cpp: the original llama.cpp (gguf) already supports the GPT and Phi series; here are GPT-2 117M and 1.5B. For the quantization methods in llama.cpp, see the model repos. For the reason why we use an Instruct model, see Model Training Techniques and Hugging Face transformers with gguf. A minimal inference sketch follows after the list below.

Related issues:
- RLHF notebooks
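As a concrete illustration of the llama.cpp route, here is a minimal sketch of CPU inference over a GGUF model via the llama-cpp-python bindings. The GGUF file name is a hypothetical local quantized GPT-2 export; any GGUF file from a supported architecture loads the same way.

```python
# Minimal CPU inference sketch using llama-cpp-python (Python bindings
# for llama.cpp). The GGUF file name below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt2-1.5b-q4_k_m.gguf",  # hypothetical quantized GPT-2 export
    n_ctx=1024,    # context window size
    n_threads=8,   # number of CPU threads to use
)

out = llm("GPT-2 is a language model that", max_tokens=32)
print(out["choices"][0]["text"])
```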
We want to run the model on consumer-grade hardware, which means we will support different acceleration methods on CPU and GPU:
- For CPU, ggml is the best choice.
- For GPU, we support distributed inference using Hugging Face libraries (see the sketch after this list).
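A minimal sketch of what GPU distribution via the Hugging Face libraries could look like, assuming transformers plus accelerate: `device_map="auto"` lets accelerate shard the weights across whatever GPUs (and CPU RAM) are available. Here `gpt2-xl` is a stand-in for whichever baseline model we end up choosing.

```python
# Sketch: sharded GPU loading with transformers + accelerate.
# "gpt2-xl" is a stand-in for the chosen baseline model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl",
    device_map="auto",          # accelerate places layers on available devices
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
)

inputs = tokenizer("GPT-2 is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```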
We also want the model to be as small as possible, so we should choose a baseline model. And since Hugging Face transformers supports gguf (the file format of ggml), we have a number of choices below:
The reason we choose a baseline model larger than 1B is to make sure the model's outputs are reasonable and useful. I will upload some notebooks or model test results to support this later.
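Here is a minimal sketch of the transformers gguf support mentioned above. Note that, consistent with the earlier comment in this thread, transformers dequantizes the GGUF weights to full precision on load. The repo id and file name are hypothetical placeholders.

```python
# Sketch: loading a GGUF checkpoint through transformers.
# The repo id and gguf file name are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "some-user/some-model-gguf"  # hypothetical Hub repo
gguf_file = "model-q4_k_m.gguf"        # hypothetical quantized file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
# Weights are dequantized to full precision (fp32) when loaded this way.
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```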
@Micost