Pull requests: vllm-project/tpu-inference

Fix kv cache shape for head_dim=64
#976 opened Oct 30, 2025 by yaochengji

[Spec Decoding] Reduce TPU <-> CPU data transfer
#961 opened Oct 28, 2025 by Lumosis

Update README.md
#956 opened Oct 27, 2025 by bvrockwell

Install runai-model-streamer module in Dockerfile
#955 opened Oct 27, 2025 by amacaskill

[GPT-OSS] enable attention-sink
#943 opened Oct 27, 2025 by bzgoogle (Draft)

[multi-host] add quick start guide
#928 opened Oct 23, 2025 by Lumosis

[Requirements] Bump JAX/JAXLib to 0.8.0
#927 opened Oct 23, 2025 by jrplatin

[Requirements] Bump TPU Info to 0.6.0
#917 opened Oct 22, 2025 by jrplatin

PP for single host
#914 opened Oct 21, 2025 by Chenyaaang (Draft)

[WIP] Add Qwen3-Omni model
#896 opened Oct 19, 2025 by eitanporat

add jax support for Qwen2VL
#893 opened Oct 18, 2025 by shungcp

[Doc] Docker guide extended
#890 opened Oct 17, 2025 by hosseinsarshar

Data Parallelism support
#865 opened Oct 14, 2025 by wenxindongwork

lora spmd
#802 opened Oct 8, 2025 by vanbasten23 (Draft)