Pull requests: vllm-project/tpu-inference
- [Model] Add vision encoder and input embeddings merger warmup for Qwen2.5 VL model (#972), opened Oct 29, 2025 by KWang1998
- [FIX] Add dummy get_input_embeddings to fix vLLM model type check (#971), opened Oct 29, 2025 by kuafou (see the first sketch after this list)
- [TPU host offload][FIX] Keep KV cache using NamedSharding on CPU (#970), opened Oct 29, 2025 by juncgu-google (see the second sketch after this list)
- [Llama4/JAX] Llama4 FP8 Quantized Weight Loading and Sharding (#962), opened Oct 28, 2025 by sierraisland
- dtype in ModelConfig is implicitly cast to torch.dtype, so tpu_jax needs to check for torch dtypes as well (#945), opened Oct 27, 2025 by lc5211 (see the third sketch after this list)
- [Kernel] Refactor ragged_paged_attention to proxy for default and hd64 (#918, draft), opened Oct 22, 2025 by yaochengji
- [CI] remove lora_bias_stacked as it is deprecated in vllm (#835), opened Oct 11, 2025 by bzgoogle
- feat(ci): Record vLLM and tpu-inference commit hashes and refactor commit logic (#795), opened Oct 7, 2025 by dennisYehCienet
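For #971, here is a minimal sketch of what a dummy get_input_embeddings stub could look like, assuming vLLM's model type check only probes for the method's existence; the wrapper class name and the raise-on-call behavior are illustrative assumptions, not the PR's actual code.

```python
import torch


class JaxBackedModel:  # hypothetical wrapper; not tpu-inference's real class
    """Model whose token embedding happens inside the JAX forward pass."""

    def get_input_embeddings(self, input_ids: torch.Tensor, *args, **kwargs):
        # Assumption: vLLM only checks that this method exists on the model;
        # the JAX graph embeds tokens itself, so the stub is never called.
        raise NotImplementedError(
            "Embeddings are computed inside the JAX model; this stub only "
            "satisfies vLLM's interface check."
        )
```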
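For #970, here is a minimal sketch of keeping a host-offloaded KV cache block under a NamedSharding rather than letting it collapse to a single-device sharding; the one-device CPU mesh, the axis name, and the block shape are assumptions, not the PR's actual layout.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-device host mesh (assumption: a single CPU device is enough here).
cpu_mesh = Mesh(np.array(jax.devices("cpu")[:1]), ("kv",))
cpu_sharding = NamedSharding(cpu_mesh, P())  # replicated spec on the host mesh

# Illustrative KV block shape: (blocks, block_size, heads, head_dim).
kv_block = jnp.zeros((16, 128, 8, 64), dtype=jnp.bfloat16)
kv_on_host = jax.device_put(kv_block, cpu_sharding)

# The offloaded copy still carries a NamedSharding, so moving it back onto
# a device mesh later does not require rebuilding sharding metadata.
assert isinstance(kv_on_host.sharding, NamedSharding)
```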
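For #945, here is a minimal sketch of a dtype normalizer that accepts the torch.dtype that vLLM's ModelConfig produces as well as plain dtype strings; the helper name and the mapping tables are illustrative, not tpu_jax's actual code.

```python
import jax.numpy as jnp
import torch

# Assumption: only these dtypes matter for the example; extend as needed.
_TORCH_TO_JAX = {
    torch.float32: jnp.float32,
    torch.float16: jnp.float16,
    torch.bfloat16: jnp.bfloat16,
}
_STR_TO_JAX = {
    "float32": jnp.float32,
    "float16": jnp.float16,
    "bfloat16": jnp.bfloat16,
}


def to_jax_dtype(dtype):
    """Map a ModelConfig dtype (str or torch.dtype) to a JAX dtype."""
    if isinstance(dtype, torch.dtype):  # ModelConfig already cast it to torch
        return _TORCH_TO_JAX[dtype]
    return _STR_TO_JAX[dtype]


assert to_jax_dtype(torch.bfloat16) is jnp.bfloat16
assert to_jax_dtype("bfloat16") is jnp.bfloat16
```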