Pull requests: vllm-project/tpu-inference

Fix kv cache shape for head_dim=64
#976 opened Oct 30, 2025 by yaochengji

[Spec Decoding] Reduce TPU <-> CPU data transfer
#961 opened Oct 28, 2025 by Lumosis

Update README.md
#956 opened Oct 27, 2025 by bvrockwell

Install runai-model-streamer module in Dockerfile
#955 opened Oct 27, 2025 by amacaskill

[GPT-OSS] enable attention-sink
#943 opened Oct 27, 2025 by bzgoogle (Draft)

[multi-host] add quick start guide
#928 opened Oct 23, 2025 by Lumosis

[Requirements] Bump JAX/JAXLib to 0.8.0
#927 opened Oct 23, 2025 by jrplatin

[Requirements] Bump TPU Info to 0.6.0
#917 opened Oct 22, 2025 by jrplatin

PP for single host
#914 opened Oct 21, 2025 by Chenyaaang (Draft)

[WIP] Add Qwen3-Omni model
#896 opened Oct 19, 2025 by eitanporat

add jax support for Qwen2VL
#893 opened Oct 18, 2025 by shungcp

[Doc] Docker guide extended
#890 opened Oct 17, 2025 by hosseinsarshar

Data Parallelism support
#865 opened Oct 14, 2025 by wenxindongwork

lora spmd
#802 opened Oct 8, 2025 by vanbasten23 (Draft)