This repository contains AI model benchmarks for the EtGlowExecutionProvider on Esperanto hardware.
Executions on the CPU execution provider (EP) are also provided for comparison.
Model (results for MMLUDataset) | Size | Batch | Sequence Len | Window | Implementation | ET Time To First Token (TTFT, s) | ET Prompt parsing TPS (tok/s) | ET Prompt parsing TPS/W (tok/s/W) |
---|---|---|---|---|---|---|---|---|
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 1 | Default | 20.806 | 3.36 | 0.15 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 1 | IOBindings | 14.282 | 4.9 | 0.2 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 1 | Default | 40.342 | 1.74 | 0.08 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 1 | IOBindings | 25.332 | 2.76 | 0.12 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 32 | Default | 0.901 | 77.7 | 3.19 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 32 | IOBindings | 0.683 | 102.46 | 4.17 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 32 | Default | 1.671 | 41.9 | 1.98 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 32 | IOBindings | 1.171 | 59.77 | 2.59 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 96 | Default | 0.352 | 199.04 | 7.13 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 96 | IOBindings | 0.287 | 334.07 | 12.88 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 96 | Default | 0.622 | 112.46 | 5.0 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 96 | IOBindings | 0.526 | 182.39 | 8.36 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 128 | Default | 0.352 | 199.01 | 6.27 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 128 | IOBindings | 0.318 | 403.04 | 15.57 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 128 | Default | 0.666 | 105.13 | 4.38 |
Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 128 | IOBindings | 0.578 | 221.59 | 9.55 |
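The throughput columns in the table above follow directly from raw timing and power measurements. A minimal sketch of the arithmetic (the token count, timing, and power values below are hypothetical, not taken from the table):

```python
# Derive the reported metrics from raw measurements (hypothetical values).
prompt_tokens = 2048   # number of tokens in the prompt
parse_time_s = 20.0    # wall-clock time to parse the whole prompt
ttft_s = 20.0          # time until the first generated token appears
avg_power_w = 25.0     # average power draw during the run (hypothetical)

tps = prompt_tokens / parse_time_s   # prompt parsing throughput (tok/s)
tps_per_watt = tps / avg_power_w     # efficiency (tok/s/W)

print(f"TTFT: {ttft_s:.3f} s, TPS: {tps:.2f}, TPS/W: {tps_per_watt:.3f}")
```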
The onnxruntime-benchmarking repository requires a system with the Esperanto SDK pre-installed. The SDK is a set of utilities and tools that allow the user to transparently use the Esperanto SW stack, including Esperanto's fork of onnxruntime.
The first step is to get the onnxruntime-benchmarking sources:

```shell
git clone [email protected]:esperantotech/software/onnxruntime-benchmarking.git
cd onnxruntime-benchmarking/
```
Next, start a Docker container that provides the Esperanto software stack. Assuming you have a sw-platform installation in your `$HOME`:

```shell
./sw-platform/dock.py --image=convoke/ubuntu-22.04-et-sw-develop-stack prompt
```
To execute this benchmark successfully, first install all of its dependencies with these two commands:

```shell
python3 -m pip install -r requirements.txt --extra-index-url https://sc-artifactory1.esperanto.ai/artifactory/api/pypi/pypi-virtual/simple
python3 -m pip install opencv-python-headless~=4.10.0.84
```
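After installation, a quick sanity check that the core Python dependencies resolve can save a confusing failure later. The module names below are assumptions based on the install commands above (`cv2` is the import name provided by opencv-python-headless):

```python
import importlib.util

# Modules the benchmark is expected to import (assumed names).
required = ["numpy", "cv2"]
missing = [m for m in required if importlib.util.find_spec(m) is None]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules found.")
```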
This benchmark utilizes HuggingFace to download models and datasets automatically.
Go to your HuggingFace account settings -> Access Tokens and create a token with the following permissions:
- (Personal permissions) Read access to contents of all repos under your personal namespace
- (Personal permissions) Read access to contents of all public gated repos you can access
- (Org permissions) Read access to contents of all repos in selected organizations
Then set `HF_TOKEN` to the newly created token:

```shell
export HF_TOKEN=<your_hugging_face_token>
```
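HuggingFace libraries pick the token up from the `HF_TOKEN` environment variable, so a missing token typically only surfaces as a download error mid-run. A small guard like the following (a sketch, not part of bench.py) fails fast instead:

```python
import os

def require_hf_token() -> str:
    """Return the HuggingFace token from the environment, or raise a clear error."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running the benchmark."
        )
    return token
```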
Note: you will need access to Esperanto EtSoC-1 accelerators to execute the benchmark successfully.
- Visit HuggingFace Datasets - ImageNet-1k and follow the instructions to request access to the ImageNet-1k dataset.
- Make sure you can view and access the dataset details without any permission errors.
Running this benchmark is as easy as executing `python3 bench.py <params>`.
Several examples:

1. Listing the available models and configurations:

```shell
$ python3 bench.py -lm
2025-01-30 09:41:58.541933 | INFO | Available models
Models         configs
-----------    -----------------------------------------
mobilenetv2    ['small', 'large']
resnet         ['xx', '50']
bert           ['base', 'large', 'albert', 'distilbert']
llama3         ['', 'kvc']
```
2. Running Resnet50:

```shell
$ python3 bench.py -m resnet -c 50
```

3. Running Resnet50, limiting the number of launches to 10:

```shell
$ python3 bench.py -m resnet -c 50 -l 10
```

4. Running Resnet50, limiting the benchmark execution time to 7 seconds:

```shell
$ python3 bench.py -m resnet -c 50 -tc 7
```
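To compare several configurations in one go, the invocations above can be scripted. A minimal sketch (the model/config pairs are illustrative, taken from the `-lm` listing; echoing the commands first allows reviewing the sweep as a dry run before executing it):

```shell
#!/bin/sh
# Sweep a few model/config pairs, limiting each run to 10 launches.
for cfg in "resnet 50" "bert base" "mobilenetv2 large"; do
    set -- $cfg                      # $1 = model, $2 = config
    echo "python3 bench.py -m $1 -c $2 -l 10"
done
```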