-
Notifications
You must be signed in to change notification settings - Fork 13.1k
ggml : extend ggml_can_fuse to work with non-sequential nodes #16123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, with the sorting I have in the vulkan backend, I try to separate empty nodes from ninety nodes, which makes it less important to be able to skip.
ggml/src/ggml-impl.h
Outdated
for (int i = 0; i < num_ops; ++i) { | ||
struct ggml_tensor * node = cgraph->nodes[node_idx + i]; | ||
if (node_idxs[i] + num_ops > cgraph->n_nodes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add num_ops here? Also, should be >=, i think.
ggml/src/ggml-impl.h
Outdated
@@ -615,6 +627,11 @@ inline bool ggml_can_fuse(const struct ggml_cgraph * cgraph, int node_idx, std:: | |||
return ggml_can_fuse(cgraph, node_idx, ops.begin(), (int)ops.size()); | |||
} | |||
|
|||
inline bool ggml_can_fuse(const struct ggml_cgraph * cgraph, std::initializer_list<int> node_idx, std::initializer_list<enum ggml_op> ops) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally clear on how you foresee this function being used. I assume the caller needs to look through nodes and find the ops its interested in (skipping empty nodes), so I dont think it'll be able to use an initializer list for the node idxs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to create a list of indices that excludes the empty nodes in idxs
. And then use this like so:
// non-empty node indices
std::vector<int> idxs;
bool can_fuse(int i0, int i1, std::initializer_list<enum ggml_op> ops) const {
assert(use_fusion);
assert(i0 >= 0 && i0 < n_nodes());
assert(i1 >= 0 && i1 < n_nodes());
assert(ops.size() == 2);
return ggml_can_fuse(gf, { idxs[i0], idxs[i1] }, ops);
}
Btw, I really need an API to tell if 2 ops are fusable - don't think there is any benefit to support more than a pair of ops. The assumption is that if we can fuse N
ops, then for sure we can fuse N-1
ops. So at step N
we only need to check if ops N
and N+1
are fusable - no need to iterate again over all previous ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have to rebuild the initializer_list from the vector, why not just pass an int*
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I can do that. Removed the extra overload.
Yes, I noticed that works, but I'm not really confident if this won't break any assumption in the future. Specifically, this case:
If I just move all the adds together and put the views after them, everything seems to work. But I think I also saw that depending on the order of the views, the allocr can do different things. For example, at some point I was thinking, let's make the |
In some cases, we might want to fuse ops that are not sequential in the graph. For example, when there are intermediate view ops in-between the fusable ops.
Sample usage: #16102