NVIDIA Jetson Orin Nano #660
BradHutchings
started this conversation in General
The 100 GB/s memory bandwidth is nice, and there is probably CUDA acceleration as well; I don't know whether llama.cpp supports the GPU on it yet, though.
I have a Jetson Nano and have run the latest builds of llamafile on it. For Llama 1B you get approx. 50 tok/s for generation and 1500 tok/s for prompt processing.
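To put those figures in perspective, here is a back-of-envelope latency estimate using the 50 tok/s generation and 1500 tok/s prompt-processing rates quoted above (the request sizes are illustrative):

```python
# Back-of-envelope latency estimate from the figures quoted above.
PROMPT_TOK_S = 1500  # prompt-processing speed (tok/s, from the reply)
GEN_TOK_S = 50       # generation speed (tok/s, from the reply)

def estimated_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds to ingest a prompt and generate a reply at the quoted rates."""
    return prompt_tokens / PROMPT_TOK_S + output_tokens / GEN_TOK_S

# e.g. a 512-token prompt with a 128-token answer:
print(round(estimated_latency(512, 128), 2))  # prints 2.9
```

So at these rates, interactive chat with a 1B model would feel responsive: prompt ingestion is nearly free, and generation dominates the total time.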
Should a llamafile binary work on this new toy?
https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/
Six ARM cores, Ubuntu Linux, 8 GB RAM, a ton of CUDA cores.
They show benchmarks for a few 8B and 9B models running faster than whatever their baseline is.