Add challenge 104: Min-P Sampling (Medium) #277

Open
claude[bot] wants to merge 1 commit into main from add-challenge-104-min-p-sampling
claude[bot] commented Jun 5, 2026

Summary

  • Adds challenge 104: Min-P Sampling, a real-world LLM logit-filtering primitive used in vLLM, Hugging Face TGI, and llama.cpp.
  • For a batch of logits [B, V], the kernel computes softmax probabilities, masks tokens whose probability is below min_p * max_prob in that row, and renormalizes the survivors.
  • Unlike top-k or top-p, no sort is required: the threshold scales with each row's maximum probability, so it adapts to how peaked that row's distribution is. The challenge therefore reduces to a clean two-pass per-row reduction (max → masked sum → renormalize).
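
The per-row computation described above can be sketched in plain Python. This is an illustrative sketch only, not the `reference_impl` from `challenge.py`; the function names `min_p_row_naive` and `min_p_row_two_pass` are hypothetical, and it assumes `0 <= min_p <= 1` and that a token exactly at the threshold is kept. The second function relies on the observation that, after subtracting the row max, the largest unnormalized weight is exactly 1, so the test `p >= min_p * max_prob` is equivalent to `exp(x - max) >= min_p` and the full softmax denominator is never needed.

```python
import math


def min_p_row_naive(logits, min_p):
    """Softmax, then mask below min_p * max_prob, then renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    s = sum(kept)
    return [p / s for p in kept]


def min_p_row_two_pass(logits, min_p):
    """Same result using only two reductions: row max, then masked sum."""
    m = max(logits)                             # reduction 1: row max
    exps = [math.exp(x - m) for x in logits]    # the max element maps to 1.0
    kept = [e if e >= min_p else 0.0 for e in exps]
    s = sum(kept)                               # reduction 2: masked sum
    return [e / s for e in kept]                # renormalize the survivors


def min_p_batch(batch, min_p):
    """Apply the two-pass version independently to each row of a [B, V] batch."""
    return [min_p_row_two_pass(row, min_p) for row in batch]


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0, 4.0]
    print(min_p_row_naive(row, 0.1))
    print(min_p_row_two_pass(row, 0.1))
```

Because the row maximum always survives the mask, the masked sum is never zero for `min_p <= 1`, so the final division is safe in both variants.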

What's included

  • challenge.py with reference_impl, get_solve_signature, and example/functional/performance test generators (10 functional tests covering edge cases, min_p=0/0.99/in-between, tied maxima, peaked vs uniform distributions, power-of-2 and non-power-of-2 vocabs, realistic 16×32000).
  • challenge.html with description, requirements, worked 2×4 example matching generate_example_test(), and constraints including the performance test sizing bullet.
  • Six starter files (CUDA, PyTorch, Triton, JAX, CuTe, Mojo) following the medium-difficulty parameter-comment convention.

Performance test

B = 64, V = 128,000, min_p = 0.05 (~62.5 MB total across the 2 buffers, i.e. ~31 MB each, assuming float32; well under the 5× / 16 GB T4 budget).
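
The buffer sizing can be checked with a quick calculation. The sketch below assumes 4-byte float32 elements (the dtype is not stated in this description) and two `[B, V]` buffers, one for the input logits and one for the output probabilities. Under that assumption each buffer is 31.25 MiB and the pair totals 62.5 MiB.

```python
B, V = 64, 128_000
BYTES_PER_ELEM = 4  # assumption: float32; not stated in the PR description
NUM_BUFFERS = 2     # input logits + output probabilities

elements = B * V
bytes_per_buffer = elements * BYTES_PER_ELEM
mib_per_buffer = bytes_per_buffer / 2**20
total_mib = NUM_BUFFERS * mib_per_buffer

print(f"{elements:,} elements per buffer")
print(f"{mib_per_buffer:.2f} MiB per buffer, {total_mib:.2f} MiB total")
```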

Test plan

  • pre-commit run --all-files passes
  • Reference implementation verified locally against the example and all 10 functional tests (every row sums to 1)
  • CUDA reference solution validated via scripts/run_challenge.py ... --action submit on T4 — all tests passed

🤖 Generated with Claude Code

Batched min-p sampling, a logit-filtering primitive used in modern LLM
serving stacks (vLLM, TGI, llama.cpp). Per row, the kernel computes
softmax probabilities, masks tokens below min_p * max_prob, and
renormalizes. Unlike top-k or top-p, no sort is required — the cutoff
adapts to how peaked each row is, which makes it a clean exercise in
batched per-row reductions.