UPSTREAM PR #18535: ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH #781

loci-dev · 2026-01-02T06:45:12Z

ggml-org/llama.cpp#18530 / ggml-org/llama.cpp#17026

Makes the min_batch_size threshold for triggering op offload configurable via the GGML_OP_OFFLOAD_MIN_BATCH env var. If the variable is not set, it defaults to the previously hardcoded value of 32, so current behavior is unchanged.

This is helpful when running large MoE models with a significant share of their weights stored in host buffers on the CPU. In that setup, op offloading becomes a bottleneck for batches that are small but still exceed the default threshold of 32. The optimal value, or "break-even point", depends on the characteristics of the hardware and the model, and is best determined empirically (ref: ggml-org/llama.cpp#17026 (comment)).


* makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32
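The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names env_int_or, op_offload_min_batch, and should_offload are hypothetical, and the sketch assumes an op is offloaded when its batch size is at least the threshold. Only the variable name GGML_OP_OFFLOAD_MIN_BATCH and the default of 32 come from the description.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the PR): read a non-negative integer from an
// environment variable, returning `fallback` if it is unset, empty, or invalid.
static int env_int_or(const char * name, int fallback) {
    const char * raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') {
        return fallback;
    }
    errno = 0;
    char * end = nullptr;
    const long v = std::strtol(raw, &end, 10);
    if (errno != 0 || end == raw || *end != '\0' || v < 0 || v > INT_MAX) {
        return fallback;
    }
    return (int) v;
}

// Threshold for op offload; 32 is the previously hardcoded default.
static int op_offload_min_batch() {
    return env_int_or("GGML_OP_OFFLOAD_MIN_BATCH", 32);
}

// Assumption: an op qualifies for offload once its batch size reaches the threshold.
static bool should_offload(int batch_size) {
    return batch_size >= op_offload_min_batch();
}
```

Under these assumptions, leaving the variable unset keeps the old cutoff of 32, while exporting for example GGML_OP_OFFLOAD_MIN_BATCH=512 before launching would keep batches smaller than 512 from being offloaded. Falling back to the default on malformed input is a design choice of this sketch; the actual change may handle such input differently.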

loci-agentic-ai · 2026-01-02T07:30:59Z

Explore the complete analysis inside the Version Insights

Perfect! I've generated the summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #781

Key Highlights:

✅ No significant performance changes detected - All modified functions show performance changes within ±2% threshold
✅ Both response time and throughput remain stable
✅ Low risk for performance regressions

The analysis compared:

Base version: cfc7f9d0-e773-11f0-81f2-dbb430499cb5
Target version: 23f02221-e7a7-11f0-81f2-dbb430499cb5

This is a positive result indicating that the changes in Pull Request #781 maintain performance stability and can proceed without performance concerns.

Would you like more detailed analysis on any specific aspects of the report?

ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH

3c1bcf2

* makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32

loci-dev temporarily deployed to PROD__AL_DEMO January 2, 2026 06:45 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 27 times, most recently from 0fbfb33 to c5b5703 Compare January 6, 2026 16:11

loci-dev force-pushed the main branch 29 times, most recently from 8e2d6b7 to 6e24171 Compare January 10, 2026 11:08
