
Conversation

@ggerganov
Member

@ggerganov ggerganov commented Jan 2, 2026

target #18547
alt #18549

  • Add GGML_TENSOR_FLAG_COMPUTE flag indicating that a tensor in the graph must be computed
  • Add new ggml_build_forward_select() call:
    GGML_API struct ggml_tensor * ggml_build_forward_select(
            struct ggml_cgraph  * cgraph,
            struct ggml_tensor ** tensors,
            int                   n_tensors,
            int                   idx);

All provided tensors are built forward into the graph. Only tensors[idx] and its ancestors are marked for computation via the new flag.

This new logic allows us to construct graphs that compute different things while sharing the same topology. This is needed to avoid unwanted graph reallocations (#17617).
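For illustration, a minimal usage sketch (the branch-builder helpers and the use_tokens variable below are hypothetical, not part of this PR):

    // build both alternative outputs into the same graph so that its
    // topology is identical regardless of which input type is used
    struct ggml_tensor * out_tok  = build_branch_tokens(ctx); // hypothetical: token-id input path
    struct ggml_tensor * out_embd = build_branch_embd(ctx);   // hypothetical: embedding input path

    struct ggml_tensor * outs[2] = { out_tok, out_embd };

    // only outs[idx] and its ancestors receive GGML_TENSOR_FLAG_COMPUTE;
    // the other branch stays in the graph but is skipped at compute time
    const int idx = use_tokens ? 0 : 1;

    struct ggml_tensor * res = ggml_build_forward_select(gf, outs, 2, idx);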

TODOs:

@github-actions github-actions bot added the model, Nvidia GPU, Vulkan, ggml, SYCL, Apple Metal, Ascend NPU, OpenCL, and IBM zDNN labels on Jan 2, 2026
@jeffbolznv
Collaborator

Just want to make sure I understand how this is used - it would still be two separate graphs, they'd just be able to reuse allocations (i.e. ggml-alloc would decide they match)?

I think ggml_can_fuse and ggml_can_fuse_subgroup would need to be updated to make sure all nodes are computed. And any backend-specific fusion logic.

@ggerganov
Member Author

Just want to make sure I understand how this is used - it would still be two separate graphs, they'd just be able to reuse allocations (i.e. ggml-alloc would decide they match)?

Yes. For example, the graph when the input is token ids (batch.token != null) and the graph when the input is embedding vectors (batch.embd != null) are still different, but with this extra logic the scheduler will not need to reallocate them because the set of nodes stays the same. Only a different subset of the nodes is marked for computation.

I think ggml_can_fuse and ggml_can_fuse_subgroup would need to be updated to make sure all nodes are computed. And any backend-specific fusion logic.

I'm not yet sure that is really necessary - so far I can't think of a failure case. Note that the GGML_TENSOR_FLAG_COMPUTE flag is set only through ggml_build_forward_select().
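If such a guard did turn out to be necessary, one possible shape for it (a sketch, not code from this PR) would be an early-out over the candidate node sequence in the fusion check:

    // hypothetical guard in a fusion check: refuse to fuse a sequence of
    // nodes if any of them is not marked for computation
    for (int i = 0; i < num_ops; ++i) {
        const struct ggml_tensor * node = cgraph->nodes[node_idx + i];
        if ((node->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
            return false;
        }
    }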

Collaborator

@max-krasnyansky max-krasnyansky left a comment

Looks good to me.

@am17an
Collaborator

am17an commented Jan 3, 2026

We would need to check how this behaves with CUDA graphs, since the computation is inherently changing.

@taronaeo
Collaborator

taronaeo commented Jan 3, 2026

cc: @AlekseiNikiforovIBM @Andreas-Krebbel

Give us a week or so to check on this :)

@ggerganov ggerganov force-pushed the gg/graph-avoid-branches-3 branch from e7b6c35 to da5d289 on January 3, 2026 17:49
@ggerganov ggerganov force-pushed the gg/graph-avoid-branches-3 branch from da5d289 to 9922d3a on January 4, 2026 14:46
@ggerganov ggerganov force-pushed the gg/graph-avoid-branches-3 branch from 9922d3a to 9f8a79c on January 4, 2026 14:56
@am17an
Collaborator

am17an commented Jan 5, 2026

For CUDA graphs, I think adding a check for the flags in ggml_graph_node_has_matching_properties should be enough. This would trigger an update to the graph.
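As a sketch of that idea (assuming a flags field is added to the cached per-node properties; this is not code from the PR):

    // compare the stored flags against the current node's flags; a change
    // in GGML_TENSOR_FLAG_COMPUTE would make the properties mismatch and
    // trigger a CUDA graph update
    if (node->flags != graph_node_properties->flags) { // assumed new field
        return false;
    }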

@AlekseiNikiforovIBM
Contributor

cc: @AlekseiNikiforovIBM @Andreas-Krebbel

Give us a week or so to check on this :)

LGTM

Collaborator

@taronaeo taronaeo left a comment

Ack for IBM zDNN backend :)

    }

    if ((cgraph->nodes[i]->flags & GGML_TENSOR_FLAG_COMPUTE) == 0) {
        continue;
    }
Collaborator

If the last node or nodes are not flagged, the loop would end without the final command submission. This would need some way to ensure a final submit if submitted_nodes > 0.
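In other words, something along these lines after the node loop (a sketch; the submit call is an illustrative stand-in, not the backend's exact API):

    // hypothetical trailing flush: if the loop ended on skipped nodes,
    // make sure any recorded-but-unsubmitted work still gets submitted
    if (submitted_nodes > 0) {
        submit_pending_commands(ctx); // illustrative stand-in for the Vulkan backend submit
    }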

Collaborator

@reeselevine reeselevine left a comment

The WebGPU update looks good to me. We always do a final submission if commands > 0, so there shouldn't be a problem like the one noted for the Vulkan backend above.
