1 change: 1 addition & 0 deletions _posts/2025-01-14-struct-decode-intro.md

```diff
@@ -2,6 +2,7 @@
 layout: post
 title: "Structured Decoding in vLLM: a gentle introduction"
 author: "Guest Post by BentoML and Red Hat"
+image: /assets/figures/struct-decode-intro/vllm-xgrammar-decode-time-per-output-token.png
 ---
 
 **TL/DR**:
```

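Net effect of this hunk: the post's front matter now carries a single `image:` key. A minimal sketch of the resulting header, reconstructed from the lines above; the idea that the blog theme uses `image` for both the thumbnail and the social-share card is an assumption on our part, not something this diff states:

```yaml
---
layout: post
title: "Structured Decoding in vLLM: a gentle introduction"
author: "Guest Post by BentoML and Red Hat"
# Assumed: the theme falls back to this key for thumbnail and share
# previews when thumbnail-img / share-img are not set.
image: /assets/figures/struct-decode-intro/vllm-xgrammar-decode-time-per-output-token.png
---
```
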
4 changes: 1 addition & 3 deletions _posts/2025-01-21-stack-release.md

```diff
@@ -1,8 +1,6 @@
 ---
 layout: post
-title: "High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack”"
-thumbnail-img: /assets/figures/stack/stack-thumbnail.png
-share-img: /assets/figures/stack/stack-thumbnail.png
+title: "High Performance and Easy Deployment of vLLM in K8S with vLLM production-stack"
 author: LMCache Team
 image: /assets/figures/stack/stack-thumbnail.png
 ---
```

2 changes: 0 additions & 2 deletions _posts/2025-02-24-ptpc-fp8-rocm.md

```diff
@@ -3,8 +3,6 @@ layout: post
 title: "PTPC-FP8: Boosting vLLM Performance on AMD ROCm"
 author: "AMD and Embedded LLM"
 image: /assets/figures/ptpc/PTPC-tumbnail.png
-thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
-share-img: /assets/figures/ptpc/PTPC-tumbnail.png
 math: true
 ---
 
```

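This commit, like several below, applies the inverse pattern: where a post declared the same figure three times, the redundant keys are dropped. A sketch of the before/after, written as two YAML documents using the PTPC front matter above; the claim that `thumbnail-img` and `share-img` default to `image` when omitted is an assumption about the theme, not shown in the diff:

```yaml
# Before: one figure declared under three keys.
---
image: /assets/figures/ptpc/PTPC-tumbnail.png
thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
share-img: /assets/figures/ptpc/PTPC-tumbnail.png

# After: a single key; thumbnail-img and share-img are assumed to fall
# back to image when omitted.
---
image: /assets/figures/ptpc/PTPC-tumbnail.png
```
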
2 changes: 0 additions & 2 deletions _posts/2025-04-05-llama4.md

```diff
@@ -3,8 +3,6 @@ layout: post
 title: "Llama 4 in vLLM"
 author: "The vLLM Team"
 image: /assets/figures/llama4/perf.png
-thumbnail-img: /assets/figures/llama4/perf.png
-share-img: /assets/figures/llama4/perf.png
 ---
 
 We're excited to announce that vLLM now supports the [Llama 4 herd of models](https://ai.meta.com/blog/llama-4-multimodal-intelligence/): **Scout** (17B-16E) and **Maverick** (17B-128E). You can run these powerful long-context, natively multi-modal (up to 8-10 images with good results), mixture-of-experts models in vLLM today by updating to version v0.8.3 or later:
```

2 changes: 0 additions & 2 deletions _posts/2025-04-11-transformers-backend.md

```diff
@@ -3,8 +3,6 @@ layout: post
 title: "Transformers backend integration in vLLM"
 author: "The Hugging Face Team"
 image: /assets/figures/transformers-backend/transformers-backend.png
-thumbnail-img: /assets/figures/transformers-backend/transformers-backend.png
-share-img: /assets/figures/transformers-backend/transformers-backend.png
 ---
 
 The [Hugging Face Transformers library](https://huggingface.co/docs/transformers/main/en/index)
```

6 changes: 2 additions & 4 deletions _posts/2025-04-23-openrlhf-vllm.md

```diff
@@ -1,10 +1,8 @@
 ---
 layout: post
 title: "Accelerating RLHF with vLLM, Best Practice from OpenRLHF"
-author: "The OpenRLHF Team"
-image: /assets/figures/openrlhf-vllm/ray.png
-thumbnail-img: /assets/figures/openrlhf-vllm/ray.png
-share-img: /assets/figures/openrlhf-vllm/ray.png
+author: "The OpenRLHF Team"
+image: /assets/figures/openrlhf-vllm/ray.png
 ---
 
 As demand grows for training reasoning-capable large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) has emerged as a cornerstone technique. However, conventional RLHF pipelines—especially those using Proximal Policy Optimization (PPO)—are often hindered by substantial computational overhead. This challenge is particularly pronounced with models that excel at complex reasoning tasks (such as OpenAI-o1 and DeepSeek-R1), where generating long chain-of-thought (CoT) outputs can account for up to 90% of total training time. These models must produce detailed, step-by-step reasoning that can span thousands of tokens, making inference significantly more time-consuming than the training phase itself. As a pioneering inference framework, vLLM provides a user-friendly interface for generating RLHF samples and updating model weights.
```

5 changes: 3 additions & 2 deletions _posts/2025-06-30-minimax-m1.md

```diff
@@ -2,8 +2,9 @@
 layout: post
 title: "MiniMax-M1 Hybrid Architecture Meets vLLM: Long Context, Fast Inference"
 author: "MiniMax"
-benchmark-img: /assets/figures/minimax-m1/benchmark.png
-moe-img: /assets/figures/minimax-m1/moe.png
+image: /assets/figures/minimax-m1/benchmark.png
+benchmark-img: /assets/figures/minimax-m1/benchmark.png
+moe-img: /assets/figures/minimax-m1/moe.png
 lightning_attention-img: /assets/figures/minimax-m1/lightning_attention.png
 ---
 
```

2 changes: 0 additions & 2 deletions _posts/2025-09-11-qwen3-next.md

```diff
@@ -3,8 +3,6 @@ layout: post
 title: "vLLM Now Supports Qwen3-Next: Hybrid Architecture with Extreme Efficiency"
 author: "The vLLM Team"
 image: /assets/figures/qwen3-next/qwen.png
-thumbnail-img: /assets/figures/qwen3-next/qwen.png
-share-img: /assets/figures/qwen3-next/qwen.png
 ---
 
 We’re excited to announce that **vLLM now supports Qwen3-Next**, the latest generation of foundation models from the Qwen team. Qwen3-Next introduces a **hybrid architecture with extreme efficiency for long context support**, and vLLM offers full support of its functionalities.
```

3 changes: 2 additions & 1 deletion _posts/2025-09-16-vllm-meetup.md

```diff
@@ -1,7 +1,8 @@
 ---
 layout: post
 title: "The First vLLM Meetup in Korea"
-author: "vLLM Team"
+author: "vLLM Team"
+image: /assets/figures/vllm-meetup/image-3.png
 ---
 
 <p align="center">
```

4 changes: 1 addition & 3 deletions _posts/2025-09-29-deepseek-v3-2.md

```diff
@@ -1,10 +1,8 @@
 ---
 layout: post
 title: "DeepSeek-V3.2-Exp in vLLM: Fine-Grained Sparse Attention in Action"
-author: "vLLM Team"
+author: "vLLM Team"
 image: /assets/figures/deepseek-v3-2/dsa-explained.png
-thumbnail-img: /assets/figures/deepseek-v3-2/dsa-explained.png
-share-img: /assets/figures/deepseek-v3-2/dsa-explained.png
 ---
 
 ### Introduction
```

9 changes: 5 additions & 4 deletions _posts/2025-10-09-blackwell-inferencemax.md

```diff
@@ -1,7 +1,8 @@
 ---
-layout: post
-title: "SemiAnalysis InferenceMAX: vLLM and NVIDIA Accelerate Blackwell Inference"
-author: "vLLM Team"
----
+layout: post
+title: "SemiAnalysis InferenceMAX: vLLM and NVIDIA Accelerate Blackwell Inference"
+author: "vLLM Team"
+image: /assets/figures/blackwell-inferencemax/gpt-oss-120b-1k-1k.png
+---
 
 ### Introduction
```