Feature matrix

	CPU (AVX2)	CPU (ARM NEON)	Metal	cuBLAS	rocBLAS	SYCL	CLBlast	Vulkan	Kompute
K-quants	✅	✅	✅	✅	✅	✅	✅ 🐢⁵	✅ 🐢⁵	🚫
I-quants	✅ 🐢⁴	✅ 🐢⁴	✅ 🐢⁴	✅	✅	Partial¹	🚫	🚫	🚫
Multi-GPU	N/A	N/A	N/A	✅	❓	🚫	❓	✅	❓
K cache quants	✅	❓	❓	✅	Partial³ 🐢	❓	✅	🚫	🚫
MoE architecture	✅	❓	✅	✅	✅	❓	Partial²	🚫	🚫

✅: feature works
🚫: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Only -ctk q8_0, inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than hipBLAS/cuBLAS on similar cards

Users Guide

Useful information for users that doesn't fit into the README.

Technical Details

Information useful to maintainers and developers that does not fit into code comments.

GitHub Actions Main Branch Status

Click a badge to jump to its workflow. This overview of all the actions helps us notice sooner when main branch automation is broken, and where.
