ggml : add ggml_build_forward_select #18550
base: gg/llama-reserve
Conversation
Just want to make sure I understand how this is used - it would still be two separate graphs, they'd just be able to reuse allocations (i.e. ggml-alloc would decide they match)? I think `ggml_can_fuse` and `ggml_can_fuse_subgraph` would need to be updated to make sure all nodes are computed, and so would any backend-specific fusion logic.
Yes - for example, the graph when the input is token ids (…)

Not yet sure that it's really necessary to do so - at least I can't think of a failure case so far. Note that the …
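A minimal sketch of the kind of guard the fusion helpers could grow - the helper name and its use of the new flag are assumptions about how the flag would be consumed, not the actual `ggml_can_fuse` code:

```c
#include "ggml-impl.h" // internal header, where struct ggml_cgraph and the fusion helpers live

// Hypothetical helper: refuse to fuse a span of graph nodes unless every
// node in the span is marked for computing. A node with the flag cleared
// is skipped at execution time, so fusing across it would be incorrect.
static bool fuse_span_is_computed(const struct ggml_cgraph * cgraph, int start, int len) {
    for (int i = 0; i < len; ++i) {
        if ((cgraph->nodes[start + i]->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
            return false;
        }
    }
    return true;
}
```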
max-krasnyansky
left a comment
Looks good to me.
We would need to check how this behaves with CUDA graphs, since inherently the computation is changing.
cc: @AlekseiNikiforovIBM @Andreas-Krebbel Give us a week or so to check on this :)
Force-pushed from e7b6c35 to da5d289
Force-pushed from 89d19e0 to c92df39
Force-pushed from da5d289 to 9922d3a
Force-pushed from 9922d3a to 9f8a79c
For CUDA graphs I think adding a check for flags in …
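A hedged sketch of what such a flags check could look like - the function and the `captured_flags` bookkeeping are illustrative, and the real CUDA-graph plumbing in ggml-cuda is more involved:

```c
#include "ggml-impl.h"

// Hypothetical staleness check: a captured CUDA graph recorded a kernel
// sequence for a specific subset of computed nodes. If any node's compute
// flag differs from what was captured, the recorded graph no longer matches
// the work to be done and must be re-captured.
static bool compute_flags_changed(const struct ggml_cgraph * cgraph,
                                  const int32_t * captured_flags) { // saved at capture time
    for (int i = 0; i < cgraph->n_nodes; ++i) {
        if (cgraph->nodes[i]->flags != captured_flags[i]) {
            return true; // same topology, but the computed subset changed
        }
    }
    return false;
}
```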
LGTM
taronaeo
left a comment
Ack for IBM zDNN backend :)
Force-pushed from c92df39 to cf2b3ca
```c
if ((cgraph->nodes[i]->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
    continue;
}
```
If the last node or nodes are not flagged, the loop would end without the final command submission. This would need some way to ensure a final submit if submitted_nodes > 0.
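A sketch of the pattern being asked for - `record_node()`, `submit_pending()`, and `SUBMIT_BATCH_SIZE` are placeholders, not the actual ggml-vulkan code:

```c
#include "ggml-impl.h"

#define SUBMIT_BATCH_SIZE 100 // placeholder batching threshold

// Placeholders standing in for the backend's real encode/submit functions.
static void record_node(struct ggml_tensor * node);
static void submit_pending(void);

// Skip non-computed nodes, but always flush any batched work after the
// loop so trailing skipped nodes cannot leave commands unsubmitted.
static void encode_graph(struct ggml_cgraph * cgraph) {
    int submitted_nodes = 0;
    for (int i = 0; i < cgraph->n_nodes; ++i) {
        if ((cgraph->nodes[i]->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
            continue; // skipped - must not prevent the final submit below
        }
        record_node(cgraph->nodes[i]);
        if (++submitted_nodes >= SUBMIT_BATCH_SIZE) {
            submit_pending();
            submitted_nodes = 0;
        }
    }
    if (submitted_nodes > 0) {
        submit_pending(); // final submit even when the last nodes were skipped
    }
}
```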
reeselevine
left a comment
WebGPU update looks good to me - we always do a final submission if `commands > 0`, so there shouldn't be a problem like the one noted for the Vulkan backend above.
target #18547
alt #18549
- Add new `GGML_TENSOR_FLAG_COMPUTE` flag indicating that a tensor in the graph must be computed
- Add new `ggml_build_forward_select()` call: all provided tensors are built forward into the graph, but only `tensors[idx]` and its ancestry are marked for computing via the new flag value

This new logic allows us to construct graphs that compute different things but at the same time have the same topology. This is needed to avoid unwanted graph reallocations (#17617).
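A hedged usage sketch - the exact `ggml_build_forward_select()` signature is assumed from the description above, and the builder helpers are hypothetical:

```c
#include "ggml.h"

// Hypothetical builders for the two input variants.
static struct ggml_tensor * build_output_from_token_ids (struct ggml_context * ctx);
static struct ggml_tensor * build_output_from_embeddings(struct ggml_context * ctx);

static void build_graph(struct ggml_context * ctx, struct ggml_cgraph * gf, bool have_token_ids) {
    // Both branches go into the graph, so the topology (and therefore the
    // ggml-alloc allocation plan) is identical on every call.
    struct ggml_tensor * outs[2] = {
        build_output_from_token_ids (ctx),
        build_output_from_embeddings(ctx),
    };

    // Assumed signature: only outs[idx] and its ancestry get
    // GGML_TENSOR_FLAG_COMPUTE set; the other branch stays in the
    // graph but is skipped at execution time.
    const int idx = have_token_ids ? 0 : 1;
    ggml_build_forward_select(gf, outs, 2, idx);
}
```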
TODOs:
- `-DGGML_SCHED_NO_REALLOC=ON` for server CI