@loci-dev loci-dev commented Jan 2, 2026

Mirrored from ggml-org/llama.cpp#18538

Cache quantized activations and the mul_mat_id_helper output for the gate tensor. I was expecting larger PP (prompt processing) gains, but they fall between 1% and 3% for most models. It may not be worth merging for performance reasons alone, but it could be useful as a general idea for a cache.
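
To make the caching idea concrete, here is a minimal sketch in Python, not the CUDA code from this PR. It assumes two matmuls (for example the up and gate projections) consume the same activations, so the quantized form only needs to be computed once. `QuantCache`, `quantize_q8`, and the integer `tensor_id` key are hypothetical names for illustration; the toy 8-bit quantizer is not ggml's actual format.

```python
from typing import Dict, List, Tuple

Quantized = Tuple[float, List[int]]  # (scale, 8-bit codes)


def quantize_q8(values: List[float]) -> Quantized:
    """Toy symmetric 8-bit quantization (illustrative, not ggml's q8_1)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    return scale, [round(v / scale) for v in values]


class QuantCache:
    """Memoizes quantized activations, keyed by a caller-supplied tensor id."""

    def __init__(self) -> None:
        self._store: Dict[int, Quantized] = {}
        self.hits = 0
        self.misses = 0

    def get(self, tensor_id: int, values: List[float]) -> Quantized:
        # Reuse the earlier result if this tensor was already quantized.
        if tensor_id in self._store:
            self.hits += 1
            return self._store[tensor_id]
        self.misses += 1
        quantized = quantize_q8(values)
        self._store[tensor_id] = quantized
        return quantized

    def clear(self) -> None:
        # Must be called whenever the underlying activations change.
        self._store.clear()


cache = QuantCache()
activations = [0.5, -1.0, 0.25]
q_up = cache.get(1, activations)    # first use: quantizes and stores
q_gate = cache.get(1, activations)  # second use: served from the cache
```

In this sketch the second lookup skips quantization entirely. A cache like this is only valid while the source activations are unchanged, so it would need to be cleared or re-keyed between layers or graph evaluations.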

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Perfect! I was able to retrieve the summary report for your project. Here's what the analysis shows:

Summary Report for llama.cpp PR #784

Project Information:

Key Findings:

No significant performance changes detected - The analysis found that no modified functions showed performance changes greater than 2% in either:

  • Response Time
  • Throughput Time

What this means:

  • ✅ No performance regressions introduced
  • ✅ Changes are performance-neutral
  • ✅ Safe to proceed from a performance perspective

The modifications in this PR appear to focus on functionality, bug fixes, or refactoring without negatively impacting the performance characteristics of the codebase.

@loci-dev loci-dev force-pushed the upstream-PR18538-branch_am17an-cuda-cache branch from c15e1c9 to 23d04b3 Compare January 2, 2026 08:42
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

I've generated a summary report for your project. The analysis shows that Pull Request #784 in the llama.cpp repository (owned by auroralabs-loci) has minimal to no performance impact.

Key highlights:

  • ✅ No significant response time changes (all within 2% threshold)
  • ✅ No significant throughput changes (all within 2% threshold)
  • ✅ Performance-neutral changes - safe to proceed

The comparison between the base version (cfc7f9d0-e773-11f0-81f2-dbb430499cb5) and target version (7d80cd71-e7b7-11f0-81f2-dbb430499cb5) indicates that the code changes maintain performance stability without introducing any regressions.

@loci-dev loci-dev force-pushed the main branch 24 times, most recently from 87eb8b6 to 118039a Compare January 6, 2026 04:23
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 8e2d6b7 to 6e24171 Compare January 10, 2026 11:08

3 participants