ggml : fix quantized cpy op #12310

ggerganov · 2025-03-10T13:48:43Z

This should fix CPY(Q8_0, Q8_0)

aviallon · 2025-03-10T16:56:23Z

I no longer have garbled output with quantized cache. I only see repetitions when reaching the context size, depending on the batch size and the number of slots.
Tested-by: Antoine Viallon <[email protected]>

tests/test-backend-ops.cpp

jukofyork · 2025-03-10T18:25:55Z

Is there any chance we could add the copy operations for BF16? Even just BF16 <--> F32 would be enough to test it for the KV-cache types.

ggerganov · 2025-03-11T08:40:01Z

@jukofyork bc25236 should cover BF16 <-> F32 copies.
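For readers unfamiliar with the type, here is a minimal sketch of what a BF16 <-> F32 element conversion involves. BF16 is the upper 16 bits of an IEEE-754 binary32 value, so widening is a shift and narrowing is a rounded truncation. The helper names `bf16_to_f32` and `f32_to_bf16` are hypothetical and are not taken from this PR or from ggml; the rounding mode (nearest, ties to even) is an assumption chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not ggml's API. */

/* Widen BF16 to F32: the 16 stored bits become the high half of the float. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t) h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow F32 to BF16 with round-to-nearest, ties-to-even (assumed mode). */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: set a mantissa bit so truncation cannot turn it into infinity */
        return (uint16_t) ((bits >> 16) | 0x0040u);
    }
    /* add half of the dropped range, plus one more when the kept LSB is odd */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t) (bits >> 16);
}
```

Values that are exactly representable in BF16 (for example 1.0f or -2.0f) survive the round trip unchanged; other values lose the low 16 mantissa bits.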

tests/test-backend-ops.cpp

slaren · 2025-03-11T13:58:56Z

This change does not look right to me. If i00 and i10 represent blocks now, then the logic for determining when to move to the next row in if (++i10 == ne0) { i10 = 0; .. does not seem correct, since i10 is a block index, and ne0 is the number of elements. Renaming the variables so that it is clear whether they are element or block indices should make the code easier to understand.

llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c

Lines 4166 to 4187 in 938c779

    
    for (int64_t i01 = ir0; i01 < ir1; i01++) {
        for (int64_t i00 = 0; i00 < nb; i00++) {
            const char * src0_ptr = ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);
                  char * dst_ptr  = ((char *)  dst->data + i10*nb0  + i11*nb1  + i12*nb2  + i13*nb3);
            memcpy(dst_ptr, src0_ptr, type_size);
            if (++i10 == ne0) {
                i10 = 0;
                if (++i11 == ne1) {
                    i11 = 0;
                    if (++i12 == ne2) {
                        i12 = 0;
                        if (++i13 == ne3) {
                            i13 = 0;
                        }
                    }
                }
            }
        }
    }
    i10 += nb * (ne01 - ir1);

ggerganov · 2025-03-11T15:24:42Z

Good catch. This code wasn't exercised by the tests; it is used when dst is non-contiguous. I added an option to permute the dst tensor in test_cpy.

I used nk00 to indicate the number of blocks of src0 along dim 0 (i.e. along the row). Correspondingly, the counter is k00.
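The naming convention above can be sketched on a reduced 2D case. This is an illustration under stated assumptions, not the code merged in this PR: `toy_block`, `copy_blocks_2d`, `nk0` and the dst counter `k10` are hypothetical names, the source is assumed contiguous, and the block size of 32 elements mirrors Q8_0. The point is that the dst counter advances in blocks, so it has to wrap at the number of blocks per row rather than at ne0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK 32 /* elements per block (assumed, as in Q8_0) */

/* Toy stand-in for a quantized block: a 2-byte scale plus QK int8 values. */
typedef struct {
    uint16_t d;
    int8_t   qs[QK];
} toy_block;

/* Copy a contiguous source of ne0 x ne1 elements, block by block, into a
 * destination with byte strides nb0 (per block) and nb1 (per row). */
static void copy_blocks_2d(const toy_block * src, char * dst,
                           int64_t ne0, int64_t ne1,
                           size_t nb0, size_t nb1) {
    assert(ne0 % QK == 0);
    const int64_t nk0 = ne0 / QK; /* number of blocks along dim 0 */

    int64_t k10 = 0; /* dst block index within the row */
    int64_t i11 = 0; /* dst row index */

    for (int64_t i01 = 0; i01 < ne1; i01++) {
        for (int64_t k00 = 0; k00 < nk0; k00++) {
            const toy_block * s = src + i01*nk0 + k00;
            char * d = dst + k10*nb0 + i11*nb1;
            memcpy(d, s, sizeof(toy_block));
            /* wrap on the block count; comparing against ne0 here would
             * keep writing past the end of the dst row */
            if (++k10 == nk0) {
                k10 = 0;
                i11++;
            }
        }
    }
}
```

With ne0 = 64 there are two blocks per row. If the wrap compared k10 against ne0, the third block would be written at block offset 2 of dst row 0 instead of at the start of row 1, which only shows up when the dst rows are not packed back to back.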

ggml-ci

* ggml : fix quantized cpy op
  ggml-ci
* tests : add cpy tests for all types
  ggml-ci
* tests : add BF16 copy tests
  ggml-ci
* tests : fix loop for same-type copy
  ggml-ci
* tests : add option to permute the dst tensor
  ggml-ci

aviallon · 2025-03-24T10:44:42Z

Fixed #12253

ggerganov mentioned this pull request Mar 10, 2025

Eval bug: garbage output right after kv-cache defragmentation for CPU backend #12253

Closed

github-actions bot added the testing and ggml labels Mar 10, 2025

slaren reviewed Mar 10, 2025


slaren reviewed Mar 11, 2025



ggerganov force-pushed the gg/cpu-fix-cpy-q branch from 5da8ae3 to 3384f36 Compare March 11, 2025 15:22

ggerganov mentioned this pull request Mar 13, 2025

Eval bug: Segfault in ggml_compute_forward_dup_bytes #12354

Closed

cpg314 mentioned this pull request Mar 13, 2025

Eval bug: Segfault at the end of the cache (cache defragmentation?) #12259

Closed

ggerganov added 5 commits March 14, 2025 11:00

ggml : fix quantized cpy op

fb73be1

ggml-ci

tests : add cpy tests for all types

b5f5b07

ggml-ci

tests : add BF16 copy tests

ad62ba1

ggml-ci

tests : fix loop for same-type copy

b3be71a

ggml-ci

tests : add option to permute the dst tensor

d266584

ggml-ci

ggerganov force-pushed the gg/cpu-fix-cpy-q branch from 3384f36 to d266584 Compare March 14, 2025 09:00

slaren approved these changes Mar 21, 2025

ggerganov merged commit ba932df into master Mar 22, 2025
56 checks passed

ggerganov deleted the gg/cpu-fix-cpy-q branch March 22, 2025 14:23

masamaru-san mentioned this pull request Mar 22, 2025

Misc. bug: test-backend-ops grad crash by GGML_ASSERT error #12520

Closed
