sync : ggml #3526

ggerganov · 2025-11-17T14:32:02Z

No description provided.

* vulkan : implement upscale with bicubic interpolation
* cuda : implement upscale with bicubic interpolation
* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests
* adapt OpenCL backend to not support the OP in that case so tests don't fail
* print scale mode & flags in test-backend-ops
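
For readers unfamiliar with bicubic upscaling, here is a minimal 1D sketch of the cubic convolution kernel that bicubic interpolation is usually built on. It is not taken from the Vulkan or CUDA shaders in this sync: the function names and the default sharpness parameter `a = -0.75` are assumptions for illustration, and the backends may use different constants or edge handling.

```cpp
#include <cassert>
#include <cmath>

// Keys cubic convolution kernel. `a` is the free sharpness parameter
// (an assumption here; -0.75 and -0.5 are the values commonly seen).
float cubic_weight(float t, float a) {
    t = std::fabs(t);
    if (t <= 1.0f) {
        return ((a + 2.0f) * t - (a + 3.0f)) * t * t + 1.0f;
    }
    if (t < 2.0f) {
        return ((a * t - 5.0f * a) * t + 8.0f * a) * t - 4.0f * a;
    }
    return 0.0f;
}

// Interpolate between p1 and p2 at fractional offset t in [0, 1),
// using the neighbours p0 and p3 on either side.
float cubic_interp(float p0, float p1, float p2, float p3, float t, float a = -0.75f) {
    assert(t >= 0.0f && t < 1.0f);
    return p0 * cubic_weight(t + 1.0f, a) + p1 * cubic_weight(t, a) +
           p2 * cubic_weight(1.0f - t, a) + p3 * cubic_weight(2.0f - t, a);
}
```

A 2D bicubic sample would apply `cubic_interp` along each of four neighbouring rows and then once more across the four row results.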

…t_q6_K_… (#15277)
* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K
* Surround SVE function with compiler directive
* fix compile switch
* fix coding style
* ggml : fix indent
---------
Co-authored-by: Georgi Gerganov <[email protected]>

…6805)
* extract rotate_pairs logic from ggml_compute_forward_rope_f32
* templateify ggml_compute_forward_rope_f32 and _f16
* abort when rope type not supported, remove GLM from test-rope
* add imrope branch to switch
* add rope tests for perf
* Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <[email protected]>
---------
Co-authored-by: Georgi Gerganov <[email protected]>

* hexagon: explicitly check for ops with zero nrows
  llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows. Somehow other backends seem to handle this without obvious explicit checks. In the hexagon case we need to check explicitly and skip them.
* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL
  Co-authored-by: chraac <[email protected]>
* hexagon: use fastdiv in ADD_ID
* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs
---------
Co-authored-by: chraac <[email protected]>
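
The fastdiv mentioned here replaces division by a loop-invariant divisor with a multiply and a shift. Below is a minimal sketch of that general technique, assuming the classic Granlund-Montgomery formulation. The names `fastdiv_vals`, `init_fastdiv`, `fastdiv`, and `fastmod` are illustrative and not copied from the hexagon or OpenCL backends, whose layout and integer widths may differ.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed divisor d (hypothetical layout).
struct fastdiv_vals {
    uint32_t mp; // magic multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Compute the constants once per divisor, typically on the host.
fastdiv_vals init_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint32_t{1} << L) < d) {
        L++;
    }
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return {static_cast<uint32_t>(mp), L};
}

// n / d using only a multiply-high, an add, and a shift.
uint32_t fastdiv(uint32_t n, fastdiv_vals v) {
    const uint64_t hi = (static_cast<uint64_t>(n) * v.mp) >> 32;
    // The add is done in 64 bits so it cannot overflow for large n.
    return static_cast<uint32_t>((hi + n) >> v.L);
}

// n % d derived from the quotient.
uint32_t fastmod(uint32_t n, uint32_t d, fastdiv_vals v) {
    return n - fastdiv(n, v) * d;
}
```

The constants are worth precomputing only when the same divisor (for example a tensor dimension) is reused for many index calculations inside a kernel.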

* fix ci crash
* Update ggml-sycl.cpp
* Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <[email protected]>
---------
Co-authored-by: Zhang Jianyu <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries

* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace
* cann: update cross_entropy_loss op support
* remove trailing whitespaces
* rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request.
* undo the l2_norm operator deletion

…heck (llama/17219)
* vulkan: remove shell call from vulkan-shaders-gen tool
* use string vector for command execution
* Fix condition
* use string, remove const_cast
* Fix dependency file quotation on Windows
---------
Co-authored-by: Jeff Bolz <[email protected]>

* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <[email protected]>
* Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <[email protected]>
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <[email protected]>
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml-cuda/unary.cu Co-authored-by: Aman Gupta <[email protected]>
* Remove extra script
* Update ggml/src/ggml.c Co-authored-by: Diego Devesa <[email protected]>
* Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <[email protected]>
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from #14596, regenerate
* Don't make commits at night...
---------
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
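
As a reading aid for the op names in this commit, the sketch below gives scalar reference versions of the mathematical operations those names conventionally denote: softplus, a running sum, and a lower-triangular solve by forward substitution (EXPM1 corresponds to the standard `std::expm1`). These are assumptions about the usual definitions, not the ggml implementations; the `_ref` function names are hypothetical, and ggml's tensor layouts, axes, and triangular variants may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// softplus(x) = log(1 + e^x), rearranged so exp never overflows.
float softplus_ref(float x) {
    return x > 0.0f ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}

// Inclusive prefix sum over one row.
std::vector<float> cumsum_ref(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i];
        out[i] = acc;
    }
    return out;
}

// Solve L * y = b for a lower-triangular, row-major n x n matrix L
// using forward substitution. Assumes a non-zero diagonal.
std::vector<float> solve_lower_tri_ref(const std::vector<float>& L,
                                       const std::vector<float>& b) {
    const std::size_t n = b.size();
    assert(L.size() == n * n);
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        float s = b[i];
        for (std::size_t j = 0; j < i; ++j) {
            s -= L[i * n + j] * y[j];
        }
        y[i] = s / L[i * n + i];
    }
    return y;
}
```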

* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
* Address performance regression in Qwen and llama.cpp due to chunking

…(llama/17158)
* vulkan: change graph_compute to be async and enable get_tensor_async
  This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor.
  Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them.
  The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue.
  The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize.
* fix thread safety errors
* teardown context cleanly
* Handle async read to non-pinned dst

…ide operator support (llama/17213)
* SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access
* SYCL: update documentation and sycl.csv to reflect new unary op support
* update ops.md after syncing SYCL.csv changes
* Fix SYCL.csv merge conflict
* Update ops.md after fixing SYCL.csv conflicts
* Fix SYCL.csv tail after merge conflict and regenerate ops.md
* Fix line endings and final newline in SYCL.csv
* Remove TOPK_MOE entries from SYCL.csv as requested
* Update ops.md after removing TOPK_MOE from SYCL.csv
* Regenerated SYCL.csv and synced ops.md with upstream
* Update ops.md using create_ops_docs.py

* vulkan: add LOG operation support for F32 and F16
  Part of #14909.
* vulkan: Fix LOG operation types
* docs: Update operation support documentation for Vulkan LOG operation
* vulkan: fix log_f16 shader
* docs: restore missing LOG test cases and regenerate ops.md

* CANN: Use smart pointers to manage ACL objects
  Previously, ACL objects were managed via manual destruction, which led to multiple memory-leak issues during runtime. This patch replaces manual memory management with smart pointers so that ACL objects are properly released and ownership is clearly defined.
  Note that the ownership of an ACL object belongs to the function that creates it. Other internal functions should operate on these ACL objects using raw pointers to avoid unintended ownership transfers.
  Additionally, since aclTensorList automatically frees its contained aclTensor objects, any aclTensor added to a tensor list must release ownership to avoid double free operations.
  This PR also removes the asynchronous task submission mechanism. Due to changes in recent CANN versions, tiling time has significantly decreased. Even with a dual-thread submission model, the dispatch overhead still falls on the critical path, making async submission less beneficial. Moreover, aclGraph support provides a much better path to reducing operator dispatch latency.
* CANN: resolve review comments
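
The ownership rules described in this commit message can be sketched with `std::unique_ptr` and custom deleters. Everything below uses hypothetical stand-in types (`fake_tensor`, `fake_tensor_list`, and their create/destroy functions); it does not call the real ACL API and is not the code from the CANN backend. It only illustrates the three rules: the creating function owns the handle, helpers borrow raw pointers, and a handle added to a list that frees its contents must be released first.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical C-style handles standing in for ACL objects.
struct fake_tensor { int id; };
struct fake_tensor_list { std::vector<fake_tensor*> items; };

int g_live_tensors = 0; // leak counter for the demo only

fake_tensor* fake_create_tensor(int id) { ++g_live_tensors; return new fake_tensor{id}; }
void fake_destroy_tensor(fake_tensor* t) { --g_live_tensors; delete t; }

fake_tensor_list* fake_create_list() { return new fake_tensor_list{}; }
// Like the list described above, destroying the list frees its tensors.
void fake_destroy_list(fake_tensor_list* l) {
    for (fake_tensor* t : l->items) fake_destroy_tensor(t);
    delete l;
}

struct tensor_deleter { void operator()(fake_tensor* t) const { fake_destroy_tensor(t); } };
struct list_deleter { void operator()(fake_tensor_list* l) const { fake_destroy_list(l); } };
using tensor_ptr = std::unique_ptr<fake_tensor, tensor_deleter>;
using list_ptr = std::unique_ptr<fake_tensor_list, list_deleter>;

// Internal helper: borrows a raw pointer, never takes ownership.
int tensor_id(const fake_tensor* t) { return t->id; }

int demo() {
    tensor_ptr a(fake_create_tensor(1)); // owned by this function
    tensor_ptr b(fake_create_tensor(2));
    list_ptr list(fake_create_list());

    int sum = tensor_id(a.get());
    // The list will free b, so give up ownership to avoid a double free.
    list->items.push_back(b.release());
    sum += tensor_id(list->items[0]);
    return sum; // a, then the list (and b inside it), are freed here
}
```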

0cc4m and others added 30 commits November 17, 2025 16:26

vulkan: fix memory allocations (llama/17122)

6e2d45a

metal : enable tensor API for A19 (llama/17087)

4cd5695

vulkan: fix validation issue introduced by #16868 (llama/17145)

a64712e

vulkan: check glslc executable string (llama/17144)

d1a83fb

ggml-cpu : inspect -march and -mcpu to find the CPU (llama/16333)

e4c1e3c

Signed-off-by: Adrien Gallouët <[email protected]>

metal : cap threadgroups size of set_rows (llama/17146)

4413a56

cpu: skip NOPs to avoid barriers (llama/17133)

becc46e

* cpu: skip NOPs to avoid barriers
* cpu: use ggml_op_is_empty

opencl: add fastdiv and use it in set_rows, ported from cuda (llama/1…

485e423

…7090)
* opencl: add fastdiv for mm q8_0
* opencl: use uint4 for fastdiv vals
* opencl: use fastdiv for set_rows
* opencl: do not use fastdiv for q8_0 mm

cmake : add version to all shared object files (llama/17091)

bee7518

When compiling llama.cpp in Yocto, it fails QA checks because the generated so files aren't versioned. This applies a version to all generated so files, allowing the package to build without errors.

kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993)

2fe28b6

ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conver…

f52e7c7

…sion (llama/17161) Signed-off-by: Wang Yang <[email protected]>

disable rms norm mul rope for chips with no fp16 rte (llama/17134)

c3a1298

CANN: Add L2_NORM op support (llama/16856)

2f2c6c3

* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace

ggml : use std::sort in ggml_argsort CPU implementation (llama/17211)

a541b0e

* ggml : use std::sort in ggml_argsort CPU implementation
* cont : add missing header
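
A minimal sketch of what an argsort built on `std::sort` can look like: sort an index array with a comparator that looks up the values. The function name `argsort_row` and its signature are hypothetical and not the ggml CPU code, which works on tensor rows rather than `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Return the indices that would sort `row` in the requested order.
std::vector<int32_t> argsort_row(const std::vector<float>& row, bool ascending) {
    std::vector<int32_t> idx(row.size());
    std::iota(idx.begin(), idx.end(), 0); // 0, 1, 2, ...
    std::sort(idx.begin(), idx.end(), [&](int32_t a, int32_t b) {
        return ascending ? row[a] < row[b] : row[a] > row[b];
    });
    return idx;
}
```

Note that `std::sort` is not stable, so the relative order of equal values is unspecified, and the comparator must be a strict weak ordering, which NaN inputs would violate.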

CUDA: static assert to prevent misuse of memcpy_1 (llama/17198)

214d1af

CUDA: fuse rope + set_rows (llama/16884)

be4d130

* CUDA: add fused rope
* move k forward_expand up
* create helper function instead of re-using params
* make assert statement more in line with comment
* rope_norm: coalesced writes to global mem

ggml-cpu : use template for argsort (llama/17222)

9808706

Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" …

b6d0ebe

…(llama/17233) This reverts commit 1c398dc9eca9c366ce98deb0e6f3538e444ebc8a.

metal: accelerated conv2d (llama/17175)

5150c23

* metal: accelerated conv2d
* cont : cleanup
---------
Co-authored-by: bghira <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

ggml-cpu : add RISC-V vector intrinsic support for silu and cvar oper…

273dd3f

…ations (llama/17227) Signed-off-by: Wang Yang <[email protected]>

sched : fix reserve ignoring user tensor assignments (llama/17232)

312480c

Alcpz and others added 21 commits November 17, 2025 16:26

metal : make the FA extra sizes consistent (llama/17143)

ae08083

metal : support argsort for ne00 > 1024 (llama/17247)

a6f1d80

* metal : refactor argsort
* cont : sort chunks
* cont : merge sorted buckets
* cont : cleanup

vulkan: skip all-negative-inf blocks in FA (llama/17186)

9d3fa94

vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/…

a175f85

…17244)
* vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths
* set allow_misalign

vulkan: implement ABS and NEG (llama/17245)

89f82bf

* docs: update Vulkan ops
* vulkan: add NEG op
* vulkan: add ABS op
---------
Signed-off-by: Giuseppe Scrivano <[email protected]>

vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AM…

5ae4173

…D driver bug (llama/17285)

vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287)

5d9fba0

These both show up in gpt-oss. Also, cleanup the mul_mat_vec fusion code a bit.

opencl: add kernel to handle mat mul in attention to improve encoding…

14dac59

… speed (llama/17181)
* Add mul_mm_f16_f32_kq_kqv kernel
* Add ggml_cl_mul_mat_kq_kqv_adreno func
* fix whitespace
* remove unused variable
* remove redundant
* refactor and clean up
* remove trailing whitespace

opencl: fix rms_norm_mul (llama/17250)

9c2bde0

* opencl: use subgroup reduce for reduction in rms_norm_mul
* opencl: add comment about workgroup size

metal : remove obsolete asserts (llama/17295)

844275a

vulkan: fix MMQ quantize_y condition (llama/17301)

75cfe4a

metal : add cumsum (llama/17305)

25182a7

metal : faster argsort (llama/17315)

8208359

* metal : faster argsort
* cont : keep data in registers

metal : support I32 -> I32 copy (llama/17317)

714c1ba

sync : ggml

36b80f6

sync : llama.cpp

3e980fd

danbev approved these changes Nov 17, 2025

ggerganov merged commit b12abef into master Nov 17, 2025
64 of 66 checks passed

ggerganov deleted the sync-ggml-25-11-17 branch November 17, 2025 19:05

28 participants
