feat: support turboquant_plus #512

Open
heroims wants to merge 1 commit into jundot:main from heroims:main

Conversation

heroims commented Apr 1, 2026

No description provided.

@SirDominik

I gave this a try. Unfortunately, as with the previously disabled KV cache quantization, I'm not seeing any reduction in peak memory usage. On top of that, token generation speed dropped by about 5–8% in my testing, though I should note that my tests were fairly short.

2 participants