Commit 8811f7a

Added blog post "INT4 Decoding GQA CUDA Optimizations for LLM Inference" (#1648)
Signed-off-by: Chris Abraham <[email protected]>
1 parent c2cd932 commit 8811f7a

23 files changed: 3485 additions, 0 deletions

_posts/2024-06-06-int4-decoding.md

Lines changed: 3485 additions & 0 deletions

assets/images/int4-decoding/eq.jpg

53.4 KB

assets/images/int4-decoding/fg1.png

98.9 KB

assets/images/int4-decoding/fg10.jpg

18.2 KB

assets/images/int4-decoding/fg11.jpg

54.6 KB

assets/images/int4-decoding/fg12.png

411 KB

assets/images/int4-decoding/fg13.jpg

296 KB

assets/images/int4-decoding/fg14.jpg

207 KB

assets/images/int4-decoding/fg15.jpg

347 KB

assets/images/int4-decoding/fg16.jpg

460 KB
