Conversation

@loci-dev loci-dev commented Jan 2, 2026

Mirrored from ggml-org/llama.cpp#18535

ggml-org/llama.cpp#18530 / ggml-org/llama.cpp#17026

Makes the `min_batch_size` that triggers op offload configurable via an environment variable. If the variable is not set, it defaults to the prior hardcoded value of 32, keeping current behavior intact.

This is helpful when running large MoE models with a significant portion of their weights stored in host buffers on the CPU: op offload then becomes a bottleneck for batches that are small but still larger than the default threshold of 32. The optimal value, or "break-even point", depends on the characteristics of the hardware and model and is best determined empirically (ref: ggml-org/llama.cpp#17026 (comment)).
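For illustration, here is a minimal sketch of what the lookup could look like, in the C style ggml uses. The variable name `GGML_OP_OFFLOAD_MIN_BATCH` and the helper name are hypothetical (the actual identifiers are in the PR diff); the fallback of 32 matches the prior hardcoded value:

```c
#include <stdlib.h>

// Resolve the op-offload batch threshold once from the environment,
// falling back to the previous hardcoded default of 32 when the
// variable is unset or invalid. GGML_OP_OFFLOAD_MIN_BATCH is an
// assumed name, not necessarily the one used in the PR.
static int offload_min_batch_size(void) {
    static int cached = 0;
    if (cached == 0) {
        const char * env = getenv("GGML_OP_OFFLOAD_MIN_BATCH");
        int val = (env != NULL) ? atoi(env) : 0;
        cached = (val > 0) ? val : 32;  // keep prior default behavior
    }
    return cached;
}

// A backend's offload check would then compare against this value
// instead of a literal 32, e.g.:
//   return op_batch_size >= offload_min_batch_size();
```

With something shaped like this, launching with e.g. `GGML_OP_OFFLOAD_MIN_BATCH=128` would keep batches smaller than 128 on the CPU, where the host-resident expert weights already live, avoiding the host-to-device weight copies that dominate at small batch sizes. Caching the parsed value also avoids calling `getenv` on every op.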

* makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Perfect! I've generated the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #781

Key Highlights:

  • No significant performance changes detected: all modified functions show performance changes within the ±2% threshold
  • Both response time and throughput remain stable
  • Low risk for performance regressions

The analysis compared:

  • Base version: cfc7f9d0-e773-11f0-81f2-dbb430499cb5
  • Target version: 23f02221-e7a7-11f0-81f2-dbb430499cb5

This is a positive result, indicating that the changes in Pull Request #781 maintain performance stability and that the PR can proceed without performance concerns.

Would you like more detailed analysis on any specific aspects of the report?

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 0fbfb33 to c5b5703 on January 6, 2026 16:11
@loci-dev loci-dev force-pushed the main branch 29 times, most recently from 8e2d6b7 to 6e24171 on January 10, 2026 11:08