diff --git a/QEfficient/transformers/models/modeling_auto.py b/QEfficient/transformers/models/modeling_auto.py index 236f6c9f5..8b2f3edd6 100644 --- a/QEfficient/transformers/models/modeling_auto.py +++ b/QEfficient/transformers/models/modeling_auto.py @@ -3550,10 +3550,10 @@ class QEFFAutoModelForCTC(QEFFTransformersBase): including Wav2Vec2 and other encoder-only speech models optimized for alignment-free transcription. Although it is possible to initialize the class directly, we highly recommend using the ``from_pretrained`` method for initialization. - ``Mandatory`` Args: - :model (nn.Module): PyTorch model - + Example + ------- .. code-block:: python + import torchaudio from QEfficient import QEFFAutoModelForCTC from transformers import AutoProcessor diff --git a/README.md b/README.md index cb6f32382..257fd6344 100644 --- a/README.md +++ b/README.md @@ -6,18 +6,26 @@ --- *Latest news* :fire:
- +- [12/2025] Enabled [disaggregated serving](examples/disagg_serving) for GPT-OSS model +- [12/2025] Added support for wav2vec2 Audio Model [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) +- [12/2025] Added support for diffuser video generation model [WAN 2.2 Model Card](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers) +- [12/2025] Added support for diffuser image generation model [FLUX.1 Model Card](https://huggingface.co/black-forest-labs/FLUX.1-schnell) +- [12/2025] Added support for [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) +- [12/2025] Added support for [OpenGVLab/InternVL3_5-1B](https://huggingface.co/OpenGVLab/InternVL3_5-1B) +- [12/2025] Added support for Olmo Model [allenai/OLMo-2-0425-1B](https://huggingface.co/allenai/OLMo-2-0425-1B) +- [10/2025] Added support for Qwen3 MOE Model [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) - [10/2025] Added support for Qwen2.5VL Multi-Model [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct) - [10/2025] Added support for Mistral3 Multi-Model [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) - [10/2025] Added support for Molmo Multi-Model [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) -- [06/2025] Added support for Llama4 Multi-Model [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) -- [06/2025] Added support for Gemma3 Multi-Modal-Model [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) -- [06/2025] Added support of model `hpcai-tech/grok-1` [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) -- [06/2025] Added support for sentence embedding which improves efficiency, Flexible/Custom Pooling configuration and compilation with multiple sequence lengths, [Embedding model](https://github.com/quic/efficient-transformers/pull/424). +
More +- [06/2025] Added support for Llama4 Multi-Model [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) +- [06/2025] Added support for Gemma3 Multi-Modal-Model [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) +- [06/2025] Added support of model `hpcai-tech/grok-1` [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) +- [06/2025] Added support for sentence embedding which improves efficiency, Flexible/Custom Pooling configuration and compilation with multiple sequence lengths, [Embedding model](https://github.com/quic/efficient-transformers/pull/424) - [04/2025] Support for [SpD, multiprojection heads](https://quic.github.io/efficient-transformers/source/quick_start.html#draft-based-speculative-decoding). Implemented post-attention hidden size projections to speculate tokens ahead of the base model - [04/2025] [QNN Compilation support](https://github.com/quic/efficient-transformers/pull/374) for AutoModel classes. QNN compilation capabilities for multi-models, embedding models and causal models. - [04/2025] Added support for separate prefill and decode compilation for encoder (vision) and language models. This feature will be utilized for [disaggregated serving](https://github.com/quic/efficient-transformers/pull/365). diff --git a/docs/index.rst b/docs/index.rst index e83337db2..5e0c8f634 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -38,6 +38,7 @@ Welcome to Efficient-Transformers Documentation! :maxdepth: 4 source/qeff_autoclasses + source/diffuser_classes source/cli_api .. toctree:: diff --git a/docs/source/diffuser_classes.md b/docs/source/diffuser_classes.md new file mode 100644 index 000000000..7154f8c0d --- /dev/null +++ b/docs/source/diffuser_classes.md @@ -0,0 +1,84 @@ +# Diffuser Classes + + +## Pipeline API + +(QEffTextEncoder)= +### `QEffTextEncoder` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.pipeline_module.QEffTextEncoder + :members: + :no-show-inheritance: +``` + +--- + +(QEffUNet)= +### `QEffUNet` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.pipeline_module.QEffUNet + :members: + :no-show-inheritance: +``` + +--- + +(QEffVAE)= +### `QEffVAE` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.pipeline_module.QEffVAE + :members: + :no-show-inheritance: +``` + +--- + +(QEffFluxTransformerModel)= +### `QEffFluxTransformerModel` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.pipeline_module.QEffFluxTransformerModel + :members: + :no-show-inheritance: +``` + +---- + +(QEffWanUnifiedTransformer)= +### `QEffWanUnifiedTransformer` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.pipeline_module.QEffWanUnifiedTransformer + :members: + :no-show-inheritance: +``` + +---- + + +## Model Classes + +(QEffWanPipeline)= +### `QEffWanPipeline` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.wan.pipeline_wan.QEffWanPipeline + :members: + :no-show-inheritance: +``` + +---- + +(QEffFluxPipeline)= +### `QEffFluxPipeline` + +```{eval-rst} +.. autoclass:: QEfficient.diffusers.pipelines.flux.pipeline_flux.QEffFluxPipeline + :members: + :no-show-inheritance: +``` + +---- diff --git a/docs/source/introduction.md b/docs/source/introduction.md index 9fdc814d8..3fbbb1813 100644 --- a/docs/source/introduction.md +++ b/docs/source/introduction.md @@ -23,14 +23,26 @@ For other models, there is comprehensive documentation to inspire upon the chang ***Latest news*** :
- [coming soon] Support for more popular [models](models_coming_soon)
-- [06/2025] Added support for Llama4 Multi-Model [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) -- [06/2025] Added support for Gemma3 Multi-Modal-Model [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) -- [06/2025] Added support of model `hpcai-tech/grok-1` [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) -- [06/2025] Added support for sentence embedding which improves efficiency, Flexible/Custom Pooling configuration and compilation with multiple sequence lengths, [Embedding model](https://github.com/quic/efficient-transformers/pull/424). +- [12/2025] Enabled [disaggregated serving](https://github.com/quic/efficient-transformers/tree/main/examples/disagg_serving) for GPT-OSS model +- [12/2025] Added support for wav2vec2 Audio Model [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) +- [12/2025] Added support for diffuser video generation model [WAN 2.2 Model Card](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers) +- [12/2025] Added support for diffuser image generation model [FLUX.1 Model Card](https://huggingface.co/black-forest-labs/FLUX.1-schnell) +- [12/2025] Added support for [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) +- [12/2025] Added support for [OpenGVLab/InternVL3_5-1B](https://huggingface.co/OpenGVLab/InternVL3_5-1B) +- [12/2025] Added support for Olmo Model [allenai/OLMo-2-0425-1B](https://huggingface.co/allenai/OLMo-2-0425-1B) +- [10/2025] Added support for Qwen3 MOE Model [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) +- [10/2025] Added support for Qwen2.5VL Multi-Model [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct) +- [10/2025] Added support for Mistral3 Multi-Model [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) +- [10/2025] Added support for Molmo Multi-Model [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) +
More +- [06/2025] Added support for Llama4 Multi-Model [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) +- [06/2025] Added support for Gemma3 Multi-Modal-Model [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) +- [06/2025] Added support of model `hpcai-tech/grok-1` [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) +- [06/2025] Added support for sentence embedding which improves efficiency, Flexible/Custom Pooling configuration and compilation with multiple sequence lengths, [Embedding model](https://github.com/quic/efficient-transformers/pull/424) - [04/2025] Support for [SpD, multiprojection heads](https://quic.github.io/efficient-transformers/source/quick_start.html#draft-based-speculative-decoding). Implemented post-attention hidden size projections to speculate tokens ahead of the base model - [04/2025] [QNN Compilation support](https://github.com/quic/efficient-transformers/pull/374) for AutoModel classes. QNN compilation capabilities for multi-models, embedding models and causal models. - [04/2025] Added support for separate prefill and decode compilation for encoder (vision) and language models. This feature will be utilized for [disaggregated serving](https://github.com/quic/efficient-transformers/pull/365). diff --git a/docs/source/qeff_autoclasses.md b/docs/source/qeff_autoclasses.md index 1b1d8657d..7ec21b97b 100644 --- a/docs/source/qeff_autoclasses.md +++ b/docs/source/qeff_autoclasses.md @@ -115,3 +115,23 @@ .. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForSpeechSeq2Seq.compile .. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForSpeechSeq2Seq.generate ``` + +(QEFFAutoModelForCTC)= +## `QEFFAutoModelForCTC` + + +```{eval-rst} +.. autoclass:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForCTC + :noindex: + :no-members: + :no-show-inheritance: +``` + +### High-Level API + +```{eval-rst} +.. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForCTC.from_pretrained +.. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForCTC.export +.. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForCTC.compile +.. automethod:: QEfficient.transformers.models.modeling_auto.QEFFAutoModelForCTC.generate +``` \ No newline at end of file diff --git a/docs/source/release_docs.md b/docs/source/release_docs.md index 97389e571..c71d13d30 100644 --- a/docs/source/release_docs.md +++ b/docs/source/release_docs.md @@ -1,11 +1,120 @@ +# Efficient Transformer Library - 1.21.0 Release Notes + +Welcome to the official release of **Efficient Transformer Library v1.21.0**! This release introduces advanced attention mechanisms, expanded model support, optimized serving capabilities, and significant improvements to fine-tuning and deployment workflows. + +> ✅ All features and models listed below are available on the [`release/v1.21.0`](https://github.com/quic/efficient-transformers/tree/release/v1.21.0) branch and [`mainline`](https://github.com/quic/efficient-transformers/tree/main). 
+
+---
+
+## Newly Supported Models
+
+- **Flux (Diffusers - Image Generation)**
+  - Diffusion-based image generation model
+  - [Flux.1 Schnell Example Script](https://github.com/quic/efficient-transformers/blob/main/examples/diffusers/flux/flux_1_schnell.py)
+
+- **WAN (Diffusers - Video Generation)**
+  - Diffusion-based video generation model (Wan 2.2)
+  - [Wan_lightning Example Script](https://github.com/quic/efficient-transformers/blob/main/examples/diffusers/wan/wan_lightning.py)
+
+- **Qwen2.5-VL (Vision Language)**
+  - Executable via [`QEFFAutoModelForImageTextToText`](#QEFFAutoModelForImageTextToText)
+  - Multi-image prompt support
+  - Continuous batching enabled
+  - [Qwen2.5-VL Usage Guide](https://github.com/quic/efficient-transformers/tree/main/examples/image_text_to_text/models/qwen_vl)
+
+- **Mistral 3.1 (24B)**
+  - Executable via [`QEFFAutoModelForImageTextToText`](#QEFFAutoModelForImageTextToText)
+  - [Mistral-3.1 Example Script](https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/mistral_vision/mistral3_example.py)
+
+- **GPT-OSS (Decoder-Only)**
+  - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
+  - Separate prefill and decode compilation supported
+  - Disaggregated serving ready
+  - [GPT-OSS Example Scripts](https://github.com/quic/efficient-transformers/blob/main/examples/disagg_serving/gpt_oss_disagg_mode.py)
+
+- **Olmo2**
+  - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
+  - Full CausalLM support with optimizations
+  - Refer to [Text generation Example Scripts](https://github.com/quic/efficient-transformers/tree/main/examples/text_generation) for usage details; a minimal usage sketch also follows this list.
+
+- **Molmo**
+  - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
+  - Multi-modal capabilities
+  - [Molmo Example Script](https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/molmo/molmo_example.py)
+
+- **InternVL 3.5 Series**
+  - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
+  - Full Vision-Language support
+  - Multi-image handling with continuous batching
+  - Refer to [InternVL 3.5 Example Scripts](https://github.com/quic/efficient-transformers/tree/main/examples/image_text_to_text/models/internvl) for usage details.
+
+- **Qwen3-MOE (Mixture of Experts)**
+  - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
+  - Efficient expert routing
+  - [Qwen3-MOE Example Scripts](https://github.com/quic/efficient-transformers/blob/main/examples/text_generation/moe_inference.py)
+
+- **Wav2Vec2 (Audio)**
+  - Executable via [`QEFFAutoModelForCTC`](#QEFFAutoModelForCTC)
+  - Speech recognition and audio feature extraction
+  - [Wav2Vec2 Example Scripts](https://github.com/quic/efficient-transformers/blob/main/examples/audio/wav2vec2_inference.py)
+
+- **Multilingual-e5-Large (Embedding Model)**
+  - Executable via [`QEffAutoModel`](#QEffAutoModel)
+  - Multilingual text embedding capabilities
+  - Refer to the [usage details](https://github.com/quic/efficient-transformers/tree/main/examples/embeddings) for more information.
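For the models above that are marked as executable via the CausalLM auto class, a minimal usage sketch following the library's documented CausalLM flow is shown below. The model name and `num_cores` value are illustrative; adjust compile arguments for your target device.

```python
from transformers import AutoTokenizer

from QEfficient import QEFFAutoModelForCausalLM

model_name = "allenai/OLMo-2-0425-1B"  # any supported CausalLM checkpoint from the list above

# Export to ONNX and compile for Cloud AI 100 (num_cores is illustrative)
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)
model.compile(num_cores=16)

# Run text generation on the compiled QPC
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["What is disaggregated serving?"], tokenizer=tokenizer)
```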
+
+---
+
+## Key Features & Enhancements
+
+- **Framework Upgrades**: Transformers `4.55`, PyTorch `2.7.0+cpu`, Torchvision `0.22.0+cpu`
+- **Python Support**: Requires Python `3.10`
+- **ONNX Opset**: Updated to version `17` for broader operator support
+- **Advanced Attention**: Attention blocking for Flux and BlockedKV attention for CausalLM models
+- **Diffusers Integration**: Full support for diffusers-based image generation and video generation models
+- **Compute-Context-Length (CCL) support**: Optimizes throughput when handling very large context lengths
+- **Prefill/Decode Separation**: Separate prefill and decode compilation for GPT-OSS, enabling disaggregated serving
+- **Continuous Batching (VLMs)**: Extended to Vision Language Models with multi-image handling
+- **ONNX Sub-Functions**: Enables more efficient model compilation and execution on hardware
+- **Memory Profiling**: Built-in utilities for optimization analysis
+- **Extended On-Device Sampling**: On-device sampling extended to dual-QPC VLMs, plus guided decoding for on-device sampling
+- **ONNX transform, memory & time optimizations**: Faster ONNX transforms with a reduced memory footprint
+- **Removed platform SDK dependency**: Supports QPC generation on systems without the Platform SDK
+- **Example Scripts Revamp**: New example scripts for audio, embeddings, and image-text-to-text tasks
+- **Onboarding Guide**: Simplified setup and deployment process for new users
+
+---
+
+## Embedding Model Upgrades
+
+- **Multi-Sequence Length Support**: Auto-selects the optimal graph at runtime
+- **Enhanced Pooling**: Flexible pooling strategies for various embedding tasks
+
+---
+
+## Fine-Tuning Support
+
+- **Checkpoint Management**: Resume from epochs with proper state restoration
+- **Enhanced Loss Tracking**: Corrected data type handling for accurate loss computation
+- **Custom Dataset Support**: Improved handling with better tokenization
+- **Device-Aware Scaling**: Optimized GradScaler for multi-device training
+- **Comprehensive Testing**: Unit tests for fine-tuning workflows
+
+---
+
 # Efficient Transformer Library - 1.20.0 Release Notes
-Welcome to the official release of **Efficient Transformer Library v1.20.0**! This release brings a host of new model integrations, performance enhancements, and fine-tuning capabilities to accelerate your AI development.
+Welcome to the official release of **Efficient Transformer Library v1.20.0**! This release introduces advanced attention mechanisms, expanded model support, optimized serving capabilities, and significant improvements to fine-tuning and deployment workflows.
-> ✅ All features and models listed below are available on the [`release/1.20.0`](https://github.com/quic/efficient-transformers/tree/release/v1.20.0) branch and [`mainline`](https://github.com/quic/efficient-transformers/tree/main).
+> ✅ All features and models listed below are available on the [`release/v1.20.0`](https://github.com/quic/efficient-transformers/tree/release/v1.20.0) branch and [`mainline`](https://github.com/quic/efficient-transformers/tree/main).
 ---
+
 ## Newly Supported Models

 - **Llama-4-Scout-17B-16E-Instruct**
diff --git a/docs/source/supported_features.rst b/docs/source/supported_features.rst
index 8260342f2..24551e904 100644
--- a/docs/source/supported_features.rst
+++ b/docs/source/supported_features.rst
@@ -6,6 +6,14 @@ Supported Features
    * - Feature
      - Impact
+   * - `Diffusion Models `_
+     - Full support for diffusers-based image and video generation models such as FLUX.1 and Wan 2.2, enabling efficient image and video synthesis tasks.
+   * - `Disaggregated Serving for GPT-OSS `_
+     - Enabled for GPT-OSS models, allowing for flexible deployment of large language models across different hardware configurations.
+   * - `ONNX Sub-Functions `_
+     - Enables more efficient model compilation and execution on hardware.
+   * - `BlockedKV attention in CausalLM `_
+     - Implements a blocked K/V cache layout so attention reads and processes the cache block-by-block, improving long-context decode performance.
    * - `Compute Context Length (CCL) `_
      - Optimizes inference by using different context lengths during prefill and decode phases, reducing memory footprint and computation for shorter sequences while maintaining support for longer contexts. Supports both text-only and vision-language models. Refer `sample script `_ for more **details**.
    * - Sentence embedding, Flexible Pooling configuration and compilation with multiple sequence lengths
@@ -58,5 +66,3 @@ Supported Features
      - A script for computing the perplexity of a model, allowing for the evaluation of model performance and comparison across different models and datasets. Refer `sample script `_ for more **details**.
    * - KV Heads Replication Script
      - A sample script for replicating key-value (KV) heads for the Llama-3-8B-Instruct model, running inference with the original model, replicating KV heads, validating changes, and exporting the modified model to ONNX format. Refer `sample script `_ for more **details**.
-   * - Block Attention (in progress)
-     - Reduces inference latency and computational cost by dividing context into blocks and reusing key-value states, particularly useful in RAG.
diff --git a/docs/source/validate.md b/docs/source/validate.md
index b5ab87629..2c948e175 100644
--- a/docs/source/validate.md
+++ b/docs/source/validate.md
@@ -8,17 +8,20 @@
 | Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
 |-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
-| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
+| **MolmoForCausalLM** | Molmo① | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) | ✕ |
+| **Olmo2ForCausalLM** | OLMo-2 | [allenai/OLMo-2-0425-1B](https://huggingface.co/allenai/OLMo-2-0425-1B) | ✕ |
+| **FalconForCausalLM** | Falcon② | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
 | **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✕ |
 | **GemmaForCausalLM** | CodeGemma | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b)
[google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | ✔️ | -| | Gemma*** | [google/gemma-2b](https://huggingface.co/google/gemma-2b)
[google/gemma-7b](https://huggingface.co/google/gemma-7b)
[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)
[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ | +| | Gemma③ | [google/gemma-2b](https://huggingface.co/google/gemma-2b)
[google/gemma-7b](https://huggingface.co/google/gemma-7b)
[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)
[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ | +| **GptOssForCausalLM** | GPT-OSS | [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) | ✔️ | | **GPTBigCodeForCausalLM** | Starcoder1.5 | [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) | ✔️ | | | Starcoder2 | [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | ✔️ | | **GPTJForCausalLM** | GPT-J | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | ✔️ | | **GPT2LMHeadModel** | GPT-2 | [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) | ✔️ | | **GraniteForCausalLM** | Granite 3.1 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)
[ibm-granite/granite-guardian-3.1-8b](https://huggingface.co/ibm-granite/granite-guardian-3.1-8b) | ✔️ | | | Granite 20B | [ibm-granite/granite-20b-code-base-8k](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)
[ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | ✔️ | -| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | ✔️ | | | +| **InternVLChatModel** | Intern-VL① | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B)
[OpenGVLab/InternVL3_5-1B](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | ✔️ | | | | **LlamaForCausalLM** | CodeLlama | [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
[codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)
[codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | ✔️ | | | DeepSeek-R1-Distill-Llama | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | ✔️ | | | InceptionAI-Adapted | [inceptionai/jais-adapted-7b](https://huggingface.co/inceptionai/jais-adapted-7b)
[inceptionai/jais-adapted-13b-chat](https://huggingface.co/inceptionai/jais-adapted-13b-chat)
[inceptionai/jais-adapted-70b](https://huggingface.co/inceptionai/jais-adapted-70b) | ✔️ | @@ -31,13 +34,15 @@ | **MistralForCausalLM** | Mistral | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | ✔️ | | **MixtralForCausalLM** | Codestral
Mixtral | [mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)
[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ✔️ | | **MPTForCausalLM** | MPT | [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | ✔️ | -| **Phi3ForCausalLM** | Phi-3**, Phi-3.5** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ | +| **Phi3ForCausalLM** | Phi-3②, Phi-3.5② | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ | | **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ | | | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ | | **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ | -| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✕ | -- ** set "trust-remote-code" flag to True for e2e inference with vLLM -- *** pass "disable-sliding-window" flag for e2e inference of Gemma-2 family of models with vLLM +| **Grok1ModelForCausalLM** | grok-1② | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✕ | + + +--- + ## Embedding Models ### Text Embedding Task @@ -47,12 +52,14 @@ |--------------|--------------|---------------------------------|--------------| | **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
[BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)
[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | ✔️ | | **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) | ✕ | -| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | ✕ | -| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | ✕ | -| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) | ✔️ | +| **MistralModel** | Mistral | [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | ✕ | +| **NomicBertModel** | NomicBERT② | [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | ✕ | +| **Qwen2ForCausalLM** | Qwen2 | [NovaSearch/stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) | ✔️ | | **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)
[ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) | ✔️ |
| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | ✕ |
-| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)<br>
[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) | ✔️ | +| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)
[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual)
[intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | ✔️ | + +--- ## Multimodal Language Models @@ -65,8 +72,10 @@ | **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ | | **LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) | ✕ | ✔️ | ✕ | ✔️ | | **Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ | -| **Gemma3ForConditionalGeneration** | Gemma3*** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | ✔️ | ✕ | -- *** pass "disable-sliding-window" flag for e2e inference with vLLM +| **Gemma3ForConditionalGeneration** | Gemma3③ | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | | | +| **Qwen2_5_VLForConditionalGeneration** | Qwen2.5-VL | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | ✔️ | ✔️ | | | +| **Mistral3ForConditionalGeneration** | Mistral3| [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503)| ✔️ | ✔️ | | | + **Dual QPC:** @@ -84,26 +93,56 @@ In the single QPC(Qualcomm Program Container) setup, the entire model—includin -**Note:** +```{NOTE} The choice between Single and Dual QPC is determined during model instantiation using the `kv_offload` setting. If the `kv_offload` is set to `True` it runs in dual QPC and if its set to `False` model runs in single QPC mode. +``` ---- ### Audio Models (Automatic Speech Recognition) - Transcription Task + **QEff Auto Class:** `QEFFAutoModelForSpeechSeq2Seq` | Architecture | Model Family | Representative Models | vLLM Support | |--------------|--------------|----------------------------------------------------------------------------------------|--------------| | **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
[openai/whisper-base](https://huggingface.co/openai/whisper-base)
[openai/whisper-small](https://huggingface.co/openai/whisper-small)
[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
[openai/whisper-large](https://huggingface.co/openai/whisper-large)
[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | ✔️ | +| **Wav2Vec2** | Wav2Vec2 | [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base)
[facebook/wav2vec2-large](https://huggingface.co/facebook/wav2vec2-large) | |
+
+---
+
+## Diffusion Models
+
+### Image Generation Models
+**QEff Auto Class:** `QEffFluxPipeline`
+
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|----------------------------------------------------------------------------------------|--------------|
+| **FluxPipeline** | FLUX.1 | [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) | |
+
+### Video Generation Models
+**QEff Auto Class:** `QEffWanPipeline`
+
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|----------------------------------------------------------------------------------------|--------------|
+| **WanPipeline** | Wan2.2 | [Wan-AI/Wan2.2-T2V-A14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers) | |
+
+---
+
+```{NOTE}
+① Intern-VL and Molmo models are Vision-Language Models but use `QEFFAutoModelForCausalLM` for inference to stay compatible with HuggingFace Transformers.
+
+② Set `trust_remote_code=True` for end-to-end inference with vLLM.
+
+③ Pass `disable_sliding_window` for the Gemma model families marked above when using vLLM.
+```
+---
 (models_coming_soon)=
 # Models Coming Soon

 | Architecture | Model Family | Representative Models |
 |-------------------------|--------------|--------------------------------------------|
-| **Qwen3MoeForCausalLM** |Qwen3| [Qwen/Qwen3-MoE-15B-A2B]() |
-| **Mistral3ForConditionalGeneration**|Mistral 3.1| [mistralai/Mistral-Small-3.1-24B-Base-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503) |
-| **BaichuanForCausalLM** | Baichuan2 | [baichuan-inc/Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) |
-| **CohereForCausalLM** | Command-R | [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) |
-| **DbrxForCausalLM** | DBRX | [databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base) |
\ No newline at end of file
+| **NemotronHForCausalLM** | NVIDIA Nemotron v3 | [NVIDIA Nemotron v3](https://huggingface.co/collections/nvidia/nvidia-nemotron-v3) |
+| **Sam3Model** | facebook/sam3 | [facebook/sam3](https://huggingface.co/facebook/sam3) |
+| **StableDiffusionModel** | HiDream-ai | [HiDream-ai/HiDream-I1-Full](https://huggingface.co/HiDream-ai/HiDream-I1-Full) |
+| **MistralLarge3Model** | Mistral Large 3 | [mistralai/mistral-large-3](https://huggingface.co/collections/mistralai/mistral-large-3) |
\ No newline at end of file
diff --git a/examples/README.md b/examples/README.md
index 3913b25ce..ed2779fdf 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -72,6 +72,14 @@ Optimization techniques.
 [See all performance examples →](performance/)
+### Disaggregated Serving
+Distributed inference across multiple devices.
+
+| Example | Description | Script |
+|---------|-------------|--------|
+| Basic Disaggregated Serving | Multi-device serving | [disagg_serving/gpt_oss_disagg_mode.py](disagg_serving/gpt_oss_disagg_mode.py) |
+| Chunking Disaggregated Serving | Multi-device serving with prefill chunking | [disagg_serving/gpt_oss_disagg_mode_with_chunking.py](disagg_serving/gpt_oss_disagg_mode_with_chunking.py) |
+
 ## Installation
 For installation instructions, see the [Quick Installation guide](../README.md#quick-installation) in the main README.
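Relating to the `kv_offload` note in the Multimodal Language Models section of validate.md above, the sketch below illustrates how the Single vs. Dual QPC mode is selected at instantiation. It is a minimal sketch assuming the documented `kv_offload` flag of `from_pretrained`; the model name and compile arguments are illustrative and additional compile options (context length, image size, device count) may be required for a given model.

```python
from QEfficient import QEFFAutoModelForImageTextToText

# kv_offload=True  -> dual QPC: vision encoder and language model compiled as separate QPCs
# kv_offload=False -> single QPC: the entire model runs in one QPC
model = QEFFAutoModelForImageTextToText.from_pretrained(
    "ibm-granite/granite-vision-3.2-2b",  # illustrative; any supported VLM from the table above
    kv_offload=True,
)
model.compile(num_cores=16)  # illustrative compile settings
```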
diff --git a/examples/text_generation/README.md b/examples/text_generation/README.md index 6b80442c2..2d8754768 100644 --- a/examples/text_generation/README.md +++ b/examples/text_generation/README.md @@ -24,6 +24,7 @@ Popular model families include: - GPT-2, GPT-J - Falcon, MPT, Phi-3 - Granite, StarCoder +- OLMo 2 ---
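The QEFFAutoModelForCTC docstring updated at the top of this diff begins a torchaudio-based example; the sketch below fills out that flow for the newly supported Wav2Vec2 models. The `generate` call signature and the audio file path are assumptions for illustration, not the library's confirmed API — see the `examples/audio/wav2vec2_inference.py` script referenced in the release notes for the exact call.

```python
import torchaudio
from transformers import AutoProcessor

from QEfficient import QEFFAutoModelForCTC

model_id = "facebook/wav2vec2-base-960h"
processor = AutoProcessor.from_pretrained(model_id)

# Export and compile the encoder for Cloud AI 100 (compile arguments are illustrative)
model = QEFFAutoModelForCTC.from_pretrained(model_id)
model.compile(num_cores=16)

# Load a 16 kHz mono clip; Wav2Vec2 expects raw waveform input ("sample.wav" is a placeholder)
waveform, sample_rate = torchaudio.load("sample.wav")

# Assumed generate interface for alignment-free (CTC) transcription — verify against
# the QEFFAutoModelForCTC example in examples/audio/wav2vec2_inference.py
transcription = model.generate(processor=processor, inputs=waveform.squeeze(0).numpy())
print(transcription)
```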