Skip to content

ggml : fix quantized cpy op #12310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 22, 2025
Merged

ggml : fix quantized cpy op #12310

merged 5 commits into from
Mar 22, 2025

Conversation

ggerganov
Copy link
Member

ref #12253

This should fix CPY(Q8_0, Q8_0)

@github-actions github-actions bot added testing Everything test related ggml changes relating to the ggml tensor library for machine learning labels Mar 10, 2025
@aviallon
Copy link
Contributor

I no longer have garbled output with quantized cache. Only repetitions when reaching context-size, depending on the batch-size and the number of slots.
Tested-by: Antoine Viallon <[email protected]>

@jukofyork
Copy link
Collaborator

Is there any chance we could add the copy operations for BF16? Even just BF16 <--> F32 would be enough to test it for the KV-cache types.

@ggerganov
Copy link
Member Author

@jukofyork bc25236 should cover BF16 <-> F32 copies.

@slaren
Copy link
Member

slaren commented Mar 11, 2025

This change does not look right to me. If i00 and i10 represent blocks now, then the logic for determining when to move to the next row in if (++i10 == ne0) { i10 = 0; .. does not seem correct, since i10 is a block index, and ne0 is the number of elements. Renaming the variables so that it is clear if they are element of block indices should make the code easier to understand.

for (int64_t i01 = ir0; i01 < ir1; i01++) {
for (int64_t i00 = 0; i00 < nb; i00++) {
const char * src0_ptr = ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);
char * dst_ptr = ((char *) dst->data + i10*nb0 + i11*nb1 + i12*nb2 + i13*nb3);
memcpy(dst_ptr, src0_ptr, type_size);
if (++i10 == ne0) {
i10 = 0;
if (++i11 == ne1) {
i11 = 0;
if (++i12 == ne2) {
i12 = 0;
if (++i13 == ne3) {
i13 = 0;
}
}
}
}
}
}
i10 += nb * (ne01 - ir1);

@ggerganov
Copy link
Member Author

Good catch. This code wasn't exercised by the tests it is used when dst is non-contiguous. I added an option to permute the dst tensor for the test_cpy.

I used the nk00 to indicate number of blocks of src0 along dim 0 (i.e. along the row). Respectively, the counter is k00.

@ggerganov ggerganov merged commit ba932df into master Mar 22, 2025
56 checks passed
@ggerganov ggerganov deleted the gg/cpu-fix-cpy-q branch March 22, 2025 14:23
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request Mar 23, 2025
* ggml : fix quantized cpy op

ggml-ci

* tests : add cpy tests for all types

ggml-ci

* tests : add BF16 copy tests

ggml-ci

* tests : fix loop for same-type copy

ggml-ci

* tests : add option to permute the dst tensor

ggml-ci
@aviallon
Copy link
Contributor

Fixed #12253

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants