Feature matrix
Romain D edited this page Mar 5, 2024 · 12 revisions

| | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
|---|---|---|---|---|---|---|---|---|---|
| K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 | 
| I-quants | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 | 
| Multi-GPU | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ | 
| K cache quants | ✅ | ❓ | ❓ | ✅ | Partial³ 🐢 | ❓ | ✅ | 🚫 | 🚫 | 
| MoE architecture | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 | 
- ✅: feature works
- 🚫: feature does not work
- ❓: unknown, please contribute if you can test it yourself
- 🐢: feature is slow
- ¹: IQ3_S and IQ1_S, see #5886
- ²: Only with -ngl 0
- ³: Only -ctk q8_0, inference is 50% slower
- ⁴: Slower than K-quants of comparable size
- ⁵: Slower than hipBLAS/cuBLAS on similar cards
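
To query the matrix programmatically, the table can be transcribed into a small lookup structure. The following is only a minimal sketch: the names `BACKENDS`, `MATRIX`, and `backends_with` are hypothetical and not part of the project, and the transcription assumes that the first cell of each row is the feature name and the remaining nine cells follow the backend order of the header.

```python
# Hypothetical transcription of the feature matrix above (status as of the
# page revision shown); none of these names exist in the project itself.
BACKENDS = [
    "CPU (AVX2)", "CPU (ARM NEON)", "Metal", "cuBLAS", "rocBLAS",
    "SYCL", "CLBlast", "Vulkan", "Kompute",
]

MATRIX = {
    "K-quants":         ["✅", "✅", "✅", "✅", "✅", "✅", "✅ 🐢⁵", "✅ 🐢⁵", "🚫"],
    "I-quants":         ["✅ 🐢⁴", "✅ 🐢⁴", "✅ 🐢⁴", "✅", "✅", "Partial¹", "🚫", "🚫", "🚫"],
    "Multi-GPU":        ["N/A", "N/A", "N/A", "✅", "❓", "🚫", "❓", "✅", "❓"],
    "K cache quants":   ["✅", "❓", "❓", "✅", "Partial³ 🐢", "❓", "✅", "🚫", "🚫"],
    "MoE architecture": ["✅", "❓", "✅", "✅", "✅", "❓", "Partial²", "🚫", "🚫"],
}


def backends_with(feature, include_slow=True):
    """Return the backends whose cell for `feature` is marked ✅.

    Cells marked Partial, ❓, 🚫 or N/A are excluded. When `include_slow`
    is False, cells that also carry the 🐢 marker are excluded as well.
    """
    result = []
    for backend, cell in zip(BACKENDS, MATRIX[feature]):
        if "✅" not in cell:
            continue
        if not include_slow and "🐢" in cell:
            continue
        result.append(backend)
    return result


print(backends_with("Multi-GPU"))                    # ['cuBLAS', 'Vulkan']
print(backends_with("I-quants", include_slow=False)) # ['cuBLAS', 'rocBLAS']
```

Keeping the raw cell strings, rather than reducing them to booleans, preserves the footnote markers so that a caller can still tell "works" apart from "works but slow".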