UPSTREAM PR #18538: CUDA: cache intermediate tensors #784
+237 −43
Mirrored from ggml-org/llama.cpp#18538
Cache quantized activations and the mul_mat_id_helper output for the gate tensor. Unfortunately, I was expecting larger PP gains, but the improvement falls between 1–3% for most models. It may not be worth merging for performance alone, but it could be useful as a general sketch of an intermediate-tensor cache.
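
The core idea is that the same activation tensor is consumed by more than one matrix multiplication (e.g. the up and gate projections in a gated FFN), so quantizing it once and reusing the result avoids redundant work. Below is a minimal, self-contained C++ sketch of that caching pattern under stated assumptions; the names (`TensorCache`, `CacheKey`, `get_quantized`) and the toy int8 quantization are hypothetical illustrations of the idea, not llama.cpp's actual CUDA implementation.

```cpp
// Hypothetical sketch of an intermediate-tensor cache, not llama.cpp API.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

// Identify a cacheable intermediate by its source buffer plus a version
// counter; when the source is rewritten, the version changes and a fresh
// entry is computed instead of reusing stale data.
struct CacheKey {
    const void *src;      // source tensor data pointer
    uint64_t    version;  // bumped whenever the source is overwritten
    bool operator<(const CacheKey &o) const {
        return src != o.src ? src < o.src : version < o.version;
    }
};

struct TensorCache {
    std::map<CacheKey, std::vector<int8_t>> entries;

    // Return the cached quantized copy of `src`, computing it on a miss.
    const std::vector<int8_t> &get_quantized(const float *src, size_t n,
                                             uint64_t version) {
        CacheKey key{src, version};
        auto it = entries.find(key);
        if (it != entries.end()) {
            return it->second; // hit: reuse for the second mat-mul
        }
        // Miss: quantize once and store (toy round-to-int8 quantization).
        std::vector<int8_t> q(n);
        for (size_t i = 0; i < n; ++i) {
            float v = src[i] * 127.0f;
            q[i] = (int8_t)(v > 127.f ? 127.f : (v < -128.f ? -128.f : v));
        }
        return entries.emplace(key, std::move(q)).first->second;
    }
};

int main() {
    float acts[4] = {0.1f, -0.5f, 0.9f, 0.0f};
    TensorCache cache;
    // First consumer (e.g. the up projection) pays for quantization;
    // the second (the gate projection) reuses the cached buffer.
    const auto &q1 = cache.get_quantized(acts, 4, /*version=*/1);
    const auto &q2 = cache.get_quantized(acts, 4, /*version=*/1);
    std::printf("same buffer reused: %s\n", &q1 == &q2 ? "yes" : "no");
    return 0;
}
```

Keying on the data pointer plus a version counter is one plausible invalidation scheme; the savings are bounded by how expensive quantization is relative to the mat-muls themselves, which is consistent with the modest 1–3% PP gains reported above.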