
model : do not repack if a GPU device is present #12498


Merged: 1 commit into master on Mar 21, 2025

Conversation

ggerganov
Member

fix #12481
fix #12490
ref #12481 (comment)

Disable repacking (i.e. extra buffer types) if a GPU device is going to be available.
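The gist of the change can be sketched as follows. This is an illustrative model only, not the actual ggml/llama.cpp API: the enum and function names here are invented for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: the real logic lives in llama.cpp's buffer-type
// selection code; these identifiers are illustrative.
enum class dev_type { ACCEL, GPU, CPU };

// Build the priority list of buffer types for CPU-resident weights.
// With this PR, the CPU "extra" (repacked) buffer types are skipped
// whenever a GPU device is present.
std::vector<std::string> make_cpu_buft_list(const std::vector<dev_type> & devices) {
    bool has_gpu = false;
    for (dev_type d : devices) {
        if (d == dev_type::GPU) {
            has_gpu = true;
        }
    }

    std::vector<std::string> bufts = {"ACCEL"};
    if (!has_gpu) {
        bufts.push_back("CPU_extra"); // repacked formats: CPU-only runs
    }
    bufts.push_back("GPU_host");
    bufts.push_back("CPU");
    return bufts;
}
```

With a GPU present, `CPU_extra` never enters the list, so weights that stay on the CPU remain in their original layout and can still be offloaded dynamically.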

ggml-ci
@ggerganov ggerganov requested a review from slaren March 21, 2025 13:39
@ggerganov ggerganov merged commit af04481 into master Mar 21, 2025
56 of 58 checks passed
@ggerganov ggerganov deleted the gg/repack-skip-if-gpu branch March 21, 2025 14:14
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request Mar 23, 2025
@Djip007
Contributor

Djip007 commented Mar 24, 2025

Why disable it rather than simply change the order:
// CPU: ACCEL -> CPU extra -> GPU host -> CPU
to
// CPU: ACCEL -> GPU host -> CPU extra -> CPU
?
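The reordering proposed here could look like the sketch below. Again a hypothetical illustration rather than the real llama.cpp code; the buffer-type names are placeholders.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch of the alternative: keep the CPU extra (repacked)
// buffer types, but move GPU host memory ahead of them in the priority
// list, so it is preferred whenever a GPU is available.
std::vector<std::string> reorder_buft_list(std::vector<std::string> bufts) {
    auto extra = std::find(bufts.begin(), bufts.end(), "CPU_extra");
    auto host  = std::find(bufts.begin(), bufts.end(), "GPU_host");
    if (extra != bufts.end() && host != bufts.end() && extra < host) {
        std::iter_swap(extra, host); // GPU_host now outranks CPU_extra
    }
    return bufts;
}
```

The difference from disabling: tensors that no GPU host buffer supports can still fall back to the repacked CPU formats instead of the plain CPU layout.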

@ggerganov
Member Author

I think keeping the non-offloaded layers unrepacked would allow the ops to be moved dynamically to the GPU, which generally ends up more efficient than using the optimized repacked CPU implementations.

@Djip007
Contributor

Djip007 commented Mar 26, 2025

Yes, I understood that you prefer to move the ops to the GPU dynamically.
But wouldn't changing the order in which the buffer types are added (their priority) have the same effect in that case, without completely disabling them?

@ggerganov
Member Author

Hm, I guess you might be right. Do you want to give this a try, or should I go ahead and create a PR?

@Djip007
Contributor

Djip007 commented Mar 28, 2025

OK, I'll give it a try!

=> #12632 🤞

Djip007 pushed a commit to Djip007/llama.cpp that referenced this pull request Mar 28, 2025
This allows using the GPU host buffer over CPU repack when possible.
It has the same effect of resolving the issue (ggml-org#12498) without
completely disabling the CPU extra buffer types.
slaren pushed a commit that referenced this pull request Mar 29, 2025
… CPU (#12632)

This allows using the GPU host buffer over CPU repack when possible.
It has the same effect of resolving the issue (#12498) without
completely disabling the CPU extra buffer types.

Co-authored-by: philou <philou@framework>