[Feature]: Inference and embedding support CPU/GPU offload #41

Aisuko · 2024-08-08T23:56:47Z

Contact Details (optional)

No response

What feature are you requesting?

We already support GPU inference and embedding in the Kirin project, so we should also support GPU in this project. Furthermore, please keep in mind what I mentioned in the last meeting: we want combined CPU/GPU offload, where one model is split across both devices, not separate CPU-only and GPU-only modes.

https://medium.com/@aisuko/quantization-tech-of-llms-gguf-0342a08f082c
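To make the request concrete, here is a minimal sketch of what "CPU/GPU offload" could mean in practice: put as many model layers on the GPU as fit in free VRAM and keep the rest on the CPU. Everything here is an assumption for illustration; `OffloadPlan`, `plan_offload`, and the per-layer size estimate are hypothetical names, not part of this project or Kirin. In llama.cpp-style runtimes the resulting number would typically be passed as the GPU layer count setting (for example `n_gpu_layers`), but this issue does not say which runtime we use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadPlan:
    """Hypothetical result type: how many layers go to each device."""
    gpu_layers: int  # layers placed on the GPU
    cpu_layers: int  # layers left on the CPU


def plan_offload(total_layers: int, bytes_per_layer: int, free_vram_bytes: int) -> OffloadPlan:
    """Hypothetical helper: fit as many layers as possible into free VRAM.

    Assumes every layer needs roughly the same amount of memory, which is
    only an approximation for real models.
    """
    if total_layers < 0 or bytes_per_layer <= 0 or free_vram_bytes < 0:
        raise ValueError("sizes must be non-negative and bytes_per_layer positive")

    # Number of whole layers that fit into the free VRAM budget.
    fits_on_gpu = free_vram_bytes // bytes_per_layer

    # Never offload more layers than the model has.
    gpu_layers = min(total_layers, fits_on_gpu)
    return OffloadPlan(gpu_layers=gpu_layers, cpu_layers=total_layers - gpu_layers)


if __name__ == "__main__":
    # Example: 32 layers of about 200 MiB each, 4 GiB of free VRAM.
    plan = plan_offload(32, 200 * 1024**2, 4 * 1024**3)
    print(plan)
```

With no free VRAM the plan falls back to CPU only, and with enough VRAM it becomes GPU only, so a single offload setting covers both extremes instead of two separate modes.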

Aisuko assigned Aisuko, Micost and cbh778899 Aug 8, 2024

Aisuko added the enhancement (New feature or request) label Aug 8, 2024

Aisuko added this to the v0.1.2 milestone Aug 8, 2024

Aisuko removed this from the v0.1.2 milestone Aug 19, 2024
