llama.cpp on AWS EC2 under $2 #296
forgeda started this conversation in Show and tell
Prerequisites
Start an instance of the t2.xlarge type (4 cores, 16 GiB memory) with the Ubuntu 22.04 AMI; a scripted launch is sketched after the outline below
SSH with PuTTY 0.76 or above
Microsoft Remote Desktop Connection
1. Python Configuration
2. CMake Configuration
3. 7B Model Quantization and Inference with llama.cpp (a build-and-quantize sketch follows this outline)
4. Performance
5. Highlight
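
The prerequisites call for a t2.xlarge instance running Ubuntu 22.04. As an illustration only (the post itself presumably launches the instance from the AWS console), here is a minimal boto3 sketch that requests such an instance; the region, AMI ID, and key pair name are placeholders, not values from the post.

```python
# Minimal sketch: launch a t2.xlarge Ubuntu 22.04 instance with boto3.
# The region, AMI ID, and key pair below are placeholders, not values from the post.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: an Ubuntu 22.04 AMI ID for your region
    InstanceType="t2.xlarge",         # 4 vCPUs, 16 GiB memory, as in the prerequisites
    KeyName="my-key-pair",            # placeholder: an existing EC2 key pair for PuTTY/SSH
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```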
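
Steps 2 and 3 cover building llama.cpp with CMake and then quantizing and running the 7B model. Below is a minimal sketch of those steps driven from Python with subprocess, assuming the llama.cpp layout from around the time of this post: the Python conversion step (step 1) has already produced models/7B/ggml-model-f16.bin, and the quantize and main binaries land in build/bin. Newer llama.cpp releases rename these binaries and model files, so treat the exact names as assumptions.

```python
# Sketch of the CMake build, 4-bit quantization, and a test inference for the 7B model.
# Binary and file names follow the older llama.cpp layout and may differ in newer releases.
import subprocess

def run(cmd, cwd=None):
    """Run a command and stop on the first failure."""
    subprocess.run(cmd, cwd=cwd, check=True)

# Clone and build llama.cpp with CMake in Release mode, using the 4 cores of the t2.xlarge.
run(["git", "clone", "https://github.com/ggerganov/llama.cpp"])
run(["cmake", "-B", "build", "-DCMAKE_BUILD_TYPE=Release"], cwd="llama.cpp")
run(["cmake", "--build", "build", "--config", "Release", "-j", "4"], cwd="llama.cpp")

# Quantize the converted 7B model to 4-bit (q4_0) so it fits comfortably in 16 GiB of RAM.
run(["./build/bin/quantize",
     "models/7B/ggml-model-f16.bin",
     "models/7B/ggml-model-q4_0.bin",
     "q4_0"], cwd="llama.cpp")

# Run a short inference to check the build.
run(["./build/bin/main",
     "-m", "models/7B/ggml-model-q4_0.bin",
     "-p", "Building a website can be done in 10 simple steps:",
     "-n", "128"], cwd="llama.cpp")
```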