esperantotech/onnxruntime-benchmarking

ONNXRuntime Benchmarking

This repository contains AI model benchmarks for hardware driven by the EtGlowExecutionProvider. Runs on the CPU execution provider (EP) are also provided for comparison.

Performance Results

| Model | Size | Batch | ET Provider (Inf/s) | ET Provider (Inf/s/W) |
|---|---|---|---|---|
| Esperanto/resnet50-onnx | fp32 | 1 | 269.806 | 9.958698 |
| Esperanto/resnet50-onnx | fp32 | 2 | 345.299 | 12.543673 |
| Esperanto/resnet50-onnx | fp32 | 4 | 466.124 | 16.233861 |
| Esperanto/resnet50-onnx | fp32 | 8 | 427.453 | 15.111834 |
| Esperanto/vgg-16-bn-fp32-onnx | fp32 | 1 | 106.370 | 3.929013 |
| Esperanto/vgg-16-bn-fp32-onnx | fp32 | 2 | 148.595 | 5.259819 |
| Esperanto/vgg-16-bn-fp32-onnx | fp32 | 4 | 168.185 | 5.978496 |
| Esperanto/vgg-16-bn-fp32-onnx | fp32 | 8 | 167.038 | 6.096589 |
| Esperanto/vgg-16-bn-fp32-onnx | fp32 | 16 | 142.018 | 5.347647 |
| Esperanto/vgg19-7-onnx | fp32 | 1 | 93.033 | 3.379863 |
| Esperanto/vgg19-7-onnx | fp32 | 2 | 125.453 | 4.437935 |
| Esperanto/vgg19-7-onnx | fp32 | 4 | 137.954 | 4.873554 |
| Esperanto/vgg19-7-onnx | fp32 | 8 | 113.090 | 4.374226 |
| Esperanto/mobilenet-v2-fp32-onnx | fp32 | 1 | 722.274 | 33.004601 |
| Esperanto/mobilenet-v2-fp32-onnx | fp32 | 2 | 1155.423 | 49.460166 |
| Esperanto/mobilenet-v2-fp32-onnx | fp32 | 4 | 1628.399 | 64.735260 |
| Esperanto/mobilenet-v2-fp32-onnx | fp32 | 8 | 1478.598 | 56.441255 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 1 | 319.873 | 17.816432 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 2 | 598.362 | 32.487561 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 4 | 1040.481 | 53.614453 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 8 | 1748.935 | 82.216533 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 16 | 2405.113 | 109.051403 |
| Esperanto/mobilenet-v3-small-onnx | fp32 | 32 | 2120.889 | 89.259395 |
| onnx-community/mobilenet_v2_1.4_224 | fp32 | 1 | 139.007 | 6.622033 |
| onnx-community/mobilenet_v2_1.4_224 | fp32 | 2 | 190.644 | 8.958979 |
| onnx-community/mobilenet_v2_1.4_224 | fp32 | 4 | 218.698 | 9.774437 |
| onnx-community/mobilenet_v2_1.4_224 | fp32 | 8 | 232.392 | 10.256368 |
| onnx-community/mobilenet_v2_1.4_224 | fp32 | 16 | 210.324 | 9.370816 |

| Model (results for MMLUDataset) | Size | Batch | Sequence Len | Window | Implementation | ET Time To First Token (TTFT) | ET Prompt parsing TPS (tok/s) | ET Prompt parsing TPS/W (tok/s/W) |
|---|---|---|---|---|---|---|---|---|
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 1 | Default | 20.806 | 3.36 | 0.15 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 1 | IOBindings | 14.282 | 4.9 | 0.2 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 1 | Default | 40.342 | 1.74 | 0.08 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 1 | IOBindings | 25.332 | 2.76 | 0.12 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 32 | Default | 0.901 | 77.7 | 3.19 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 32 | IOBindings | 0.683 | 102.46 | 4.17 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 32 | Default | 1.671 | 41.9 | 1.98 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 32 | IOBindings | 1.171 | 59.77 | 2.59 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 96 | Default | 0.352 | 199.04 | 7.13 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 96 | IOBindings | 0.287 | 334.07 | 12.88 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 96 | Default | 0.622 | 112.46 | 5.0 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 96 | IOBindings | 0.526 | 182.39 | 8.36 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 128 | Default | 0.352 | 199.01 | 6.27 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 2048 | 128 | IOBindings | 0.318 | 403.04 | 15.57 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 128 | Default | 0.666 | 105.13 | 4.38 |
| Esperanto/llama3.1-8b-Instruct-kvc-fp16-onnx | fp16 | 1 | 4096 | 128 | IOBindings | 0.578 | 221.59 | 9.55 |
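
The results above can be cross-checked with a little arithmetic. The sketch below is not part of the repository: it copies a few rows from the tables and derives the batch size with peak throughput, the average power implied by the two ET Provider columns, and the TTFT ratio between the Default and IOBindings implementations. It assumes that Inf/s/W is simply Inf/s divided by average power draw; how power was actually measured is not stated here. All function and variable names are illustrative.

```python
# A subset of rows copied from the results tables, for illustration only.
# (model, batch, inf_per_s, inf_per_s_per_w)
VISION_ROWS = [
    ("Esperanto/resnet50-onnx", 1, 269.806, 9.958698),
    ("Esperanto/resnet50-onnx", 2, 345.299, 12.543673),
    ("Esperanto/resnet50-onnx", 4, 466.124, 16.233861),
    ("Esperanto/resnet50-onnx", 8, 427.453, 15.111834),
]

# (sequence_len, window, implementation, ttft)
LLAMA_ROWS = [
    (2048, 1, "Default", 20.806),
    (2048, 1, "IOBindings", 14.282),
    (2048, 128, "Default", 0.352),
    (2048, 128, "IOBindings", 0.318),
]


def best_batch(rows):
    """Return the row with the highest throughput (Inf/s)."""
    return max(rows, key=lambda row: row[2])


def implied_power_w(inf_per_s, inf_per_s_per_w):
    """Average power in watts, ASSUMING Inf/s/W == Inf/s / watts."""
    return inf_per_s / inf_per_s_per_w


def ttft_speedup(rows, sequence_len, window):
    """Ratio of Default TTFT to IOBindings TTFT for one configuration."""
    ttft = {impl: t for s, w, impl, t in rows
            if s == sequence_len and w == window}
    return ttft["Default"] / ttft["IOBindings"]


if __name__ == "__main__":
    model, batch, inf_s, inf_s_w = best_batch(VISION_ROWS)
    print(f"{model}: peak throughput at batch {batch}")
    print(f"implied power: {implied_power_w(inf_s, inf_s_w):.1f} W")
    print(f"TTFT speedup (2048, window 1): {ttft_speedup(LLAMA_ROWS, 2048, 1):.2f}x")
```

For the ResNet50 rows, throughput peaks at batch 4 and drops at batch 8, a pattern that repeats for most models in the first table.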

Prepare development environment

The onnxruntime-benchmarking repository requires a system with the Esperanto SDK pre-installed. The SDK is a set of utilities and tools that let the user work transparently with the Esperanto SW stack, including the Esperanto fork of onnxruntime.

Getting sources

The first step is to get the onnxruntime-benchmarking sources.

git clone [email protected]:esperantotech/software/onnxruntime-benchmarking.git
cd onnxruntime-benchmarking/

Start dockerized environment

Next, start a Docker container with the development stack. Assuming you have a sw-platform installation in your $HOME:

./sw-platform/dock.py --image=convoke/ubuntu-22.04-et-sw-develop-stack prompt

Installing dependencies

Before running the benchmark, install its dependencies with these two commands:

python3 -m pip install -r requirements.txt --extra-index-url https://sc-artifactory1.esperanto.ai/artifactory/api/pypi/pypi-virtual/simple
python3 -m pip install opencv-python-headless~=4.10.0.84
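
After installation, it can be useful to confirm that the installed onnxruntime build actually exposes the Esperanto provider. The sketch below is not part of the repository. It assumes the provider is registered under the name EtGlowExecutionProvider (the name used in the introduction) and falls back to the standard CPUExecutionProvider otherwise. The helper name `pick_provider` is hypothetical.

```python
def pick_provider(available, preferred="EtGlowExecutionProvider",
                  fallback="CPUExecutionProvider"):
    """Return `preferred` if it is available, otherwise `fallback`.

    Raises RuntimeError when neither provider is present.
    """
    if preferred in available:
        return preferred
    if fallback in available:
        return fallback
    raise RuntimeError(
        f"neither {preferred} nor {fallback} is available: {list(available)}"
    )


if __name__ == "__main__":
    try:
        # Assumption: onnxruntime is one of the packages pulled in above.
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        # get_available_providers() lists the providers built into this package.
        print("Using provider:", pick_provider(ort.get_available_providers()))
```

If this prints CPUExecutionProvider inside the container, the Esperanto build of onnxruntime is probably not the one being imported.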

Setting environment

This benchmark utilizes HuggingFace to download models and datasets automatically.

Go to your HuggingFace account settings -> Access Tokens and create a token with the following permissions:

  • (Personal permissions) Read access to contents of all repos under your personal namespace
  • (Personal permissions) Read access to contents of all public gated repos you can access
  • (Org permissions) Read access to contents of all repos in selected organizations

Then set HF_TOKEN to the newly created token:

export HF_TOKEN=<your_hugging_face_token>
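
A missing token tends to surface late, as a download error in the middle of a run. The following minimal sketch, which is not part of the repository, checks the variable up front. The function name `get_hf_token` is hypothetical, and the check only verifies that the variable is non-empty, not that the token is valid or has the permissions listed above.

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN value, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running bench.py")
    return token


if __name__ == "__main__":
    try:
        get_hf_token()
        # Deliberately avoid printing the token itself.
        print("HF_TOKEN is set")
    except RuntimeError as err:
        print(err)
```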

Note: you need access to Esperanto EtSoC-1 accelerators to run the benchmark successfully.

Dataset Access:

  • Visit the ImageNet-1k page on HuggingFace Datasets and follow the instructions to request access to the ImageNet-1k dataset.
  • Make sure you can view and access the dataset details without any permission errors.

How to run

Running the benchmark is as simple as running python3 bench.py <params>.

Several examples:

  1. Listing models and the configurations available:

$ python3 bench.py -lm
2025-01-30 09:41:58.541933 | INFO     | Available models
Models       configs
-----------  -----------------------------------------
mobilenetv2  ['small', 'large']
resnet       ['xx', '50']
bert         ['base', 'large', 'albert', 'distilbert']
llama3       ['', 'kvc']

  2. Running Resnet50:

$ python3 bench.py -m resnet -c 50

  3. Running Resnet50, limiting the number of launches to 10:

$ python3 bench.py -m resnet -c 50 -l 10

  4. Running Resnet50, limiting the benchmark execution time to 7 seconds:

$ python3 bench.py -m resnet -c 50 -tc 7
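
When sweeping several models or limits, the command lines above can be assembled programmatically. This is a minimal sketch and not part of the repository: it uses only the flags shown in the examples (-m, -c, -l, -tc), and the helper names `build_bench_command` and `run_bench` are hypothetical. The examples only show -l and -tc used separately, so whether bench.py accepts both at once is an assumption to verify.

```python
import subprocess


def build_bench_command(model, config, launches=None, time_limit_s=None):
    """Build the argument list for one bench.py invocation."""
    cmd = ["python3", "bench.py", "-m", model, "-c", config]
    if launches is not None:
        cmd += ["-l", str(launches)]        # limit the number of launches
    if time_limit_s is not None:
        cmd += ["-tc", str(time_limit_s)]   # limit execution time in seconds
    return cmd


def run_bench(model, config, **limits):
    """Run bench.py from the current directory; raises if it exits non-zero."""
    return subprocess.run(build_bench_command(model, config, **limits), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_bench(...) to actually execute it.
    print(" ".join(build_bench_command("resnet", "50", launches=10)))
    # prints: python3 bench.py -m resnet -c 50 -l 10
```

Passing the arguments as a list avoids shell quoting issues, which matters for the llama3 entry whose first configuration name is an empty string.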
