QwenVL-Mod for ComfyUI

🚀 API Support & Troubleshooting

Having issues with ComfyUI API? We've got you covered!

📋 Quick API Help

  • 📖 API Guide - Complete API documentation
  • 🛠️ Debug Script - Automated API diagnostics
  • 🎯 Example Workflows - Ready-to-use API templates

🔧 Common API Issues

| Issue | Solution |
| --- | --- |
| "node not found" | Check QwenVL-Mod installation |
| "model not found" | Verify model files in /models/ |
| "invalid input" | Check parameter formats |
| "queue full" | Wait for current jobs |

🧪 Quick Debug

# Run API diagnostics
python debug_api.py --url http://localhost:18188

# Check available nodes
curl http://localhost:18188/object_info

# Test simple workflow
curl -X POST http://localhost:18188/prompt -d @test_workflow.json
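
The node check above can also be scripted. The following is a minimal sketch, not the repository's debug_api.py: it assumes the server answers on the same base URL used in the commands above and that the QwenVL node class names contain "qwen" (as AILab_QwenVL_GGUF does). The helper names are illustrative.

```python
import json
from urllib.request import urlopen


def find_nodes(object_info: dict, needle: str = "qwen") -> list:
    """Return node class names whose key contains `needle` (case-insensitive)."""
    return sorted(name for name in object_info if needle.lower() in name.lower())


def fetch_object_info(base_url: str = "http://localhost:18188") -> dict:
    """GET /object_info from a running ComfyUI server and decode the JSON body."""
    with urlopen(f"{base_url}/object_info", timeout=10) as response:
        return json.load(response)


# Usage (needs a running server):
#   nodes = find_nodes(fetch_object_info())
#   print(nodes or "No QwenVL nodes registered; check the installation.")
```

An empty result corresponds to the "node not found" case in the table above.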

📞 Need More Help?

When asking for help, please include:

  1. Error message (exact text)
  2. Workflow JSON (sanitized)
  3. Debug output (from debug_api.py)
  4. ComfyUI logs (API section)

The ComfyUI-QwenVL custom node integrates the Qwen-VL series of large vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. It brings multimodal capabilities to your ComfyUI workflows: text generation, image understanding, and video analysis.

📰 News & Updates

  • 2026/03/06: v2.2.4 🔧 Critical OOM Fix + Quantization Removal. [Update]

🚨 BitsAndBytes Disabled: Removed problematic quantization causing OOM on RTX 5090.
FP16 Only: All HF nodes now use stable FP16 (~6GB VRAM).
🎯 Cleaner Interface: Removed quantization dropdown - use GGUF nodes for quantized models.
🔧 Both Nodes Fixed: Applied fixes to Standard and Advanced nodes with consistent parameters.
💡 User Guidance: HF nodes for quality, GGUF nodes for quantization - clear separation.

  • 2026/02/27: v2.2.3 🔧 CUDA 13 Compatibility Fix + Redundancy Removal. [Update]

🔧 Removed unload_after_run: Eliminated redundant checkbox from all QwenVL nodes to prevent CUDA 13 conflicts.
🐛 Fixed Parameter Errors: Resolved "missing 1 required positional argument: unload_after_run" errors in all nodes.
🎯 Simplified Interface: Cleaner node interface without redundant parameters.
🧠 VRAM Cleanup Node: Maintained for manual cleanup when needed.
🏆 Community Credits: Thanks to user feedback that identified redundancy and parameter issues.

  • 2026/02/27: v2.2.3 🚀 Critical T2V/I2V Fixes + ComfyUI Optimizations. [Update]

🚀 Batch Processing: Fixed critical T2V → GGUF issue with batch images from video generation.
🔄 Same Model Reuse: Resolved conflict when using same model between T2V and I2V nodes.
⚙️ Flash Attention 2: Added Flash Attention 2 support for performance boost on compatible hardware.
⚙️ ComfyUI Args: Optimized startup arguments with validated experimental features.
🔧 keep_model_loaded: Added missing parameter to PromptEnhancer for consistent memory management.
🐳 Final Docker Build: Optimized build with all fixes and maximum performance.

  • 2026/02/18: v2.2.1 🔧 Critical GGUF VRAM Fix + Docker Optimized. [Update]

🔧 GGUF VRAM Fix: Resolved critical VRAM leak issue causing crashes after 2 executions.
🧹 Aggressive Cleanup: Implemented complete VRAM cleanup for all GGUF nodes (AILab_QwenVL_GGUF and PromptEnhancer).
🚀 Stable Performance: GGUF nodes now work reliably without VRAM accumulation.
🐳 Docker Enhanced: Updated Dockerfiles with RunPod-tested methods for Jupyter and FileBrowser.
🔄 ComfyUI Latest: Always latest stable version without manual updates.
📡 Complete SSH: Server + client SSH for full networking functionality.
🎯 Jupyter Terminal: Adopted RunPod method for working terminal.
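
For readers curious what a manual VRAM cleanup typically involves, here is a generic sketch. It is not the code used by the GGUF nodes or the VRAM Cleanup node; `release_vram` is a hypothetical helper that only uses standard Python and PyTorch calls.

```python
import gc


def release_vram() -> bool:
    """Run garbage collection, then ask PyTorch to return cached CUDA memory.

    Returns True if a CUDA cache flush was performed, False otherwise.
    """
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    try:
        import torch
    except ImportError:
        return False  # no PyTorch installed: nothing further to release
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    torch.cuda.ipc_collect()  # release memory held by stale CUDA IPC handles
    return True
```

Memory owned by llama.cpp is not tracked by PyTorch, so a GGUF model object has to be dropped (for example with `del`) before a call like this can have any effect.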

  • 2026/02/15: v2.2.0 🎬 WAN 2.2 Story Generation System. [Update]

🎬 Story Generation: Complete 4-segment video story generation with WAN 2.2
🔄 Auto-Split Node: Intelligent prompt splitting for continuous 20-second videos
📝 Show Text Node: Built-in text display node without external dependencies
🎯 Enhanced Prompts: Optimized WAN 2.2 NSFW Story prompts with better formatting
Performance: Optimized context settings for 8B models (65,536 tokens)
🐳 Docker Ready: Complete Story system integrated in Docker containers
🎨 Workflows: Ready-to-use WAN 2.2 Story and T2V workflows included

  • 2026/02/14: v2.1.0 User-Friendly Keep Last Prompt Feature. [Update]

  • 2026/02/12: v2.0.9 Bypass Mode parameter for prompt persistence. [Update]

🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration.
🔄 Smart Cache: When bypass mode is enabled, nodes retrieve the most recent cached prompt for the current model.
🎯 Perfect Workflow: Generate prompts once, then enable bypass mode to preserve them while changing inputs.
Zero Resource Usage: Bypass mode consumes no computational resources - instant response.
📋 Universal Feature: Available across all nodes (HF, GGUF, PromptEnhancer, Advanced variants).
🎮 Simple Control: Just toggle the bypass_mode checkbox to enable/disable prompt persistence.

  • 2026/02/06: v2.0.8 Bug fixes and stability improvements. [Update]

Note

🐛 Bug Fixes: Fixed JSON syntax error in system prompts and undefined variable issues.
🌐 Multilingual Support: Complete multilingual support added to all WAN 2.2 presets.
🎨 Style Detection: Enhanced visual style detection for anime, 3D, pixel art, and more.
🔧 Stability: Reverted problematic fixed seed improvements to maintain stable operation.
📝 Documentation: Updated README and changelog with today's improvements.

  • 2026/02/04: v2.0.7 Smart prompt caching system with Fixed Seed Mode. [Update]

Note

🧠 Smart Caching: Automatic prompt caching prevents regeneration of identical prompts.
🔒 Fixed Seed Mode: Set any fixed seed value to maintain consistent prompts regardless of media variations.
Performance Boost: Instant response for cached prompts with zero model loading time.
🔧 Code Maintenance: Removed deprecated parameters across all download functions for future compatibility.
📈 GGUF Enhancement: Increased default context size from 8192 to 32768 for better model utilization.
🔄 Universal Caching: Fixed Seed Mode and smart caching now available across all nodes (HF, GGUF, PromptEnhancer).
🎯 Simplified Logic: Streamlined caching system that always includes seed for predictable behavior across all seed values.

🔥 Version 2.1.0 - Enhanced NSFW Support & Lighting Accuracy

  • 🔥 Stronger NSFW Instructions: Enhanced system prompts with explicit directives for uncensored content generation
  • 🔦 Fixed Lighting Invention: Corrected AI tendency to add lights not present in source images
  • 🎯 Improved Accuracy: Better content fidelity for both NSFW descriptions and lighting analysis
  • 🔄 Updated GGUF Models: Replaced noctrex models with mradermacher v2/v3 variants for better performance
  • 📦 Enhanced Model Selection: Added Josiefied-Qwen3-VL-4B-Instruct-abliterated-beta-v1 (HF only).
  • 2026/02/03: v2.0.6 Professional cinematography enhancement for all WAN 2.2 presets. [Update]

Note

🎬 Professional Specs: All WAN 2.2 presets now include comprehensive cinematography specifications.
📹 Technical Details: Light sources, shot types, lens specs, camera movements, color tone requirements.
🎯 Consistent Branding: Updated preset names with WAN family branding for better organization.

  • 2026/03/13: v2.2.4 🎬 Critical I2V Timeline Fixes & NSFW Presets Optimization. [Update]

🎬 I2V Timeline (20s) Critical Fixes:

  • Style Coherence: Fixed AI changing anime→realism mid-sequence
  • Character Stability: Fixed characters disappearing/appearing incorrectly
  • Natural Lighting: Fixed AI adding artificial lights not in image
  • Timeline Structure: Fixed continuous numbering (6,7,8...) instead of 0-5 restart
  • Format Consistency: Fixed missing parentheses and unwanted labels
  • 🔧 All 8 NSFW Presets: Complete specifications + emoji display restored
  • 📋 Token Settings Guide: Comprehensive workflow note for optimal parameters
  • 2026/02/01: v2.0.5 Extended Storyboard preset added for WAN 2.2 format continuity. [Update]

Note

🎬 Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with timeline format.
🔄 Continuity Focus: Each paragraph repeats previous content for smooth transitions.
🎯 WAN 2.2 Compatible: Same timeline structure and NSFW support as I2V preset.

  • 2026/02/01: v2.0.4 Stability update - removed SageAttention for better compatibility and model output reliability. [Update]

Note

🔧 Flash Attention 2: Still available for 2-3x speedup on compatible hardware.
🛡️ Enhanced Stability: Clean attention pipeline with SDPA as reliable fallback.

  • 2026/02/01: v2.0.3 SageAttention compatibility fix for proper patching across transformer versions. [Update]

Note

🔧 Critical Fix: Resolved AttributeError preventing Flash Attention 2 from working with certain transformer versions.
Performance Restored: 2-5x speedup now works correctly with 8-bit quantization on compatible hardware.

  • 2026/02/01: v2.0.2 Enhanced model accessibility, improved custom prompt logic, and expanded NSFW content generation. [Update]

Note

🚀 Free Abliterated Models: Added token-free uncensored models as defaults for better accessibility.
🔧 Custom Prompt Fix: Now combines with preset templates instead of replacing them across all nodes.
📝 Enhanced NSFW: Comprehensive descriptions for adult content generation with detailed act specifications.
🎬 WAN 2.2 Priority: Moved video generation preset to top position for faster workflow access.

  • 2026/01/30: v2.0.1-enhanced Added Flash Attention 2 support and WAN 2.2 integration. [Update]

Note

🚀 Flash Attention 2: 2-5x performance boost with 8-bit quantized attention for RTX 30+ GPUs.
🎬 WAN 2.2 Integration: New specialized prompts for cinematic video generation - convert images/videos to 5-second timeline descriptions (I2V) or text to video (T2V) with professional scene direction.

  • 2025/12/22: v2.0.0 Added GGUF supported nodes and Prompt Enhancer nodes. [Update]

Important

Install llama-cpp-python before running the GGUF nodes; see docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md for instructions.

  • 2025/11/10: v1.1.0 Runtime overhaul with attention-mode selector, flash-attn auto detection, smarter caching, and quantization/torch.compile controls in both nodes. [Update]
  • 2025/10/31: v1.0.4 Custom Models Supported [Update]
  • 2025/10/22: v1.0.3 Models list updated [Update]
  • 2025/10/17: v1.0.0 Initial Release
    • Support for Qwen3-VL and Qwen2.5-VL series models.
    • Automatic model downloading from Hugging Face.
    • On-the-fly quantization (4-bit, 8-bit, FP16).
    • Preset and Custom Prompt system for flexible and easy use.
    • Includes both a standard and an advanced node for users of all levels.
    • Hardware-aware safeguards for FP8 model compatibility.
    • Image and Video (frame sequence) input support.
    • "Keep Model Loaded" option for improved performance on sequential runs.
    • Seed parameter for reproducible generation.

✨ Features

  • Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
  • Prompt Enhancers: Dedicated text-only prompt enhancers for both HF and GGUF backends.
  • Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control. Custom prompts now combine with preset templates for enhanced flexibility.
  • Smart Prompt Caching: Automatic caching system prevents regeneration of identical prompts, dramatically improving performance for repeated inputs. Cache persists across ComfyUI restarts.
  • 🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration. Generate once, then enable bypass mode to preserve prompts while changing inputs. Zero resource usage in bypass mode.
  • Fixed Seed Mode: Set seed = 1 to ignore image/video changes and maintain consistent prompts regardless of media variations. Perfect for stable workflow outputs.
  • WAN 2.2 Integration: Specialized prompts for WAN 2.2 I2V (image-to-video) and T2V (text-to-video) generation with professional cinematography specifications and cinematic timeline structure. I2V preset prioritized for faster workflow access.
  • Professional Cinematography: All WAN 2.2 presets include comprehensive technical specifications - light sources, shot types, lens specifications, camera movements, and color tone requirements for professional video generation.
  • Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with WAN 2.2 format compatibility, continuity focus, and professional cinematography details.
  • WAN Family Branding: Consistent naming across all WAN 2.2 presets for better organization and workflow clarity.
  • Free Abliterated Models: Default models include token-free uncensored options (Qwen3-4B-abliterated-TIES, Qwen3-8B-abliterated-TIES) for immediate accessibility.
  • Multi-Model Support: Easily switch between various official Qwen-VL models with smart 4B-first ordering for VRAM efficiency.
  • Automatic Model Download: Models are downloaded automatically on first use.
  • Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options, with 8-bit as the default. Note that v2.2.4 removed on-the-fly quantization from the HF nodes (FP16 only); use the GGUF nodes for quantized models.
  • Optimized Attention: Clean attention pipeline with Flash Attention 2 support and stable SDPA fallback. No complex patching that could interfere with model output.
  • Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
  • Reproducible Generation: Use the seed parameter to get consistent outputs, with Fixed Seed Mode for ultimate stability.
  • Memory Management: "Keep Model Loaded" option to retain the model in VRAM for faster processing.
  • Image & Video Support: Accepts both single images and video frame sequences as input.
  • Robust Error Handling: Provides clear error messages for hardware or memory issues.
  • Clean Console Output: Minimal and informative console logs during operation.
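
The Smart Prompt Caching and Bypass Mode bullets above can be illustrated with a small sketch. This is not the repository's implementation: `cache_key`, `PromptCache`, and the choice of key fields are assumptions for illustration, except that the changelog states the key always includes the seed. Persistence across restarts is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict


def cache_key(model_name: str, prompt: str, seed: int, media_digest: str = "") -> str:
    """Build a stable key from the inputs that should determine the output."""
    payload = json.dumps(
        {"model": model_name, "prompt": prompt, "seed": seed, "media": media_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class PromptCache:
    entries: Dict[str, str] = field(default_factory=dict)  # key -> generated text
    latest: Dict[str, str] = field(default_factory=dict)   # model -> most recent text

    def get_or_generate(self, model_name: str, prompt: str, seed: int,
                        generate: Callable[[], str], media_digest: str = "",
                        bypass_mode: bool = False) -> str:
        # Bypass mode: reuse the most recent text for this model, if any.
        if bypass_mode and model_name in self.latest:
            return self.latest[model_name]
        key = cache_key(model_name, prompt, seed, media_digest)
        if key not in self.entries:
            self.entries[key] = generate()  # only run the model on a cache miss
        self.latest[model_name] = self.entries[key]
        return self.entries[key]
```

In this sketch, bypass mode with an empty cache falls through to normal generation; whether the real nodes behave the same way is not stated here.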

🚀 Installation

  1. Clone this repository to your ComfyUI/custom_nodes directory:

    cd ComfyUI/custom_nodes  
    git clone https://github.com/huchukato/ComfyUI-QwenVL-Mod.git
    
  2. Install the required dependencies:

    cd ComfyUI/custom_nodes/ComfyUI-QwenVL-Mod  
    pip install -r requirements.txt
    
  3. Restart ComfyUI.

Optional: Flash Attention 2 Installation

For a 2-3x performance boost on compatible GPUs:

# Install Flash Attention 2 (recommended)
pip install flash-attn --no-build-isolation

# Or compile from source
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install

Requirements for Flash Attention 2:

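  • NVIDIA GPU with compute capability >= 8.0 (Ampere or newer, e.g. RTX 30/40/50 series). RTX 20 series cards are Turing (capability 7.5) and are not supported by Flash Attention 2.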
  • CUDA >= 12.0
  • PyTorch >= 2.3.0

See Flash Attention 2 section for details.

🧭 Node Overview

Transformers (HF) Nodes

  • QwenVL: Quick vision-language inference (image/video + preset/custom prompts).
  • QwenVL (Advanced): Full control over sampling, device, and performance settings.
  • QwenVL Prompt Enhancer: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).

GGUF (llama.cpp) Nodes

  • QwenVL (GGUF): GGUF vision-language inference.
  • QwenVL (GGUF Advanced): Extended GGUF controls (context, GPU layers, etc.).
  • QwenVL Prompt Enhancer (GGUF): GGUF text-only prompt enhancement.

🧩 GGUF Nodes (llama.cpp backend)

This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).

  • Nodes: QwenVL (GGUF), QwenVL (GGUF Advanced), QwenVL Prompt Enhancer (GGUF)
  • Model folder (default): ComfyUI/models/llm/GGUF/ (configurable via gguf_models.json)
  • Vision requirement: install a vision-capable llama-cpp-python wheel that provides Qwen3VLChatHandler / Qwen25VLChatHandler
    See docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md
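
To check whether the installed wheel is vision-capable before loading a model, something like the following may help. It is a sketch under one assumption: that the handler classes are exposed from `llama_cpp.llama_chat_format`, where llama-cpp-python keeps its other multimodal chat handlers. The helper names are illustrative.

```python
import importlib

REQUIRED_HANDLERS = ("Qwen3VLChatHandler", "Qwen25VLChatHandler")


def missing_handlers(module, names=REQUIRED_HANDLERS) -> list:
    """Return the handler class names that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def check_llama_cpp_vision() -> list:
    """Return missing handler names, or all of them if llama_cpp cannot be loaded."""
    try:
        # Assumed location of the chat handlers; adjust if your wheel differs.
        chat_format = importlib.import_module("llama_cpp.llama_chat_format")
    except (ImportError, OSError, RuntimeError):
        return list(REQUIRED_HANDLERS)
    return missing_handlers(chat_format)
```

An empty list from `check_llama_cpp_vision()` suggests both handlers are present; any names returned point to a wheel without the corresponding vision support.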

🗂️ Config Files

  • HF models: hf_models.json
    • hf_vl_models: vision-language models (used by QwenVL nodes).
    • hf_text_models: text-only models (used by Prompt Enhancer).
  • GGUF models: gguf_models.json
  • System prompts: AILab_System_Prompts.json (includes both VL prompts and prompt-enhancer styles).
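
A quick way to see which models a config file declares is sketched below. Only the file name and the two section keys come from this README; the inner layout of each section (mapping of model name to settings, or a plain list) is an assumption, so the helper accepts both.

```python
import json
from pathlib import Path


def list_models(config_path: Path, section: str) -> list:
    """Return the model names listed under `section` in a JSON config file."""
    data = json.loads(config_path.read_text(encoding="utf-8"))
    entries = data.get(section, {})
    # Accept either a mapping (name -> settings) or a plain list of names.
    if isinstance(entries, dict):
        return sorted(entries)
    return sorted(str(item) for item in entries)


# Usage:
#   list_models(Path("hf_models.json"), "hf_vl_models")
#   list_models(Path("hf_models.json"), "hf_text_models")
```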

📥 Download Models

The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.

HF Vision Models (Qwen-VL)

  • Qwen3-VL-2B-Instruct
  • Qwen3-VL-2B-Thinking
  • Qwen3-VL-2B-Instruct-FP8
  • Qwen3-VL-2B-Thinking-FP8
  • Qwen3-VL-4B-Instruct
  • Qwen3-VL-4B-Thinking
  • Qwen3-VL-4B-Instruct-FP8
  • Qwen3-VL-4B-Thinking-FP8
  • Qwen3-VL-8B-Instruct
  • Qwen3-VL-8B-Thinking
  • Qwen3-VL-8B-Instruct-FP8
  • Qwen3-VL-8B-Thinking-FP8
  • Qwen3-VL-32B-Instruct
  • Qwen3-VL-32B-Thinking
  • Qwen3-VL-32B-Instruct-FP8
  • Qwen3-VL-32B-Thinking-FP8
  • Qwen2.5-VL-3B-Instruct
  • Qwen2.5-VL-7B-Instruct

HF Text Models (Qwen3)

  • Qwen3-0.6B
  • Qwen3-4B-Instruct-2507
  • qwen3-4b-Z-Image-Engineer

GGUF Models (Manual Download)

| Group | Model | Repo | Alt Repo | Model Files | MMProj |
| --- | --- | --- | --- | --- | --- |
| Qwen text (GGUF) | Qwen3-4B-GGUF | Qwen/Qwen3-4B-GGUF | | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf | |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | Qwen/Qwen3-VL-4B-Instruct-GGUF | | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | Qwen/Qwen3-VL-8B-Instruct-GGUF | | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | Qwen/Qwen3-VL-4B-Thinking-GGUF | | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | Qwen/Qwen3-VL-8B-Thinking-GGUF | | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |
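
The manual download can be scripted with `huggingface_hub`. The sketch below assumes the files go directly into the default GGUF folder named earlier (ComfyUI/models/llm/GGUF/); whether the nodes expect a per-model subfolder is not specified here, so check gguf_models.json first. `download_plan` and `download` are illustrative helper names.

```python
from pathlib import Path
from typing import List, Optional, Tuple

GGUF_DIR = Path("ComfyUI/models/llm/GGUF")  # default folder named in this README


def download_plan(repo_id: str, model_file: str,
                  mmproj_file: Optional[str] = None) -> List[Tuple[str, str]]:
    """List the (repo_id, filename) pairs needed for one GGUF model."""
    files = [model_file] + ([mmproj_file] if mmproj_file else [])
    return [(repo_id, name) for name in files]


def download(plan: List[Tuple[str, str]], target_dir: Path = GGUF_DIR) -> List[str]:
    """Fetch each planned file with huggingface_hub and return the local paths."""
    from huggingface_hub import hf_hub_download  # lazy import; needs network access

    target_dir.mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=repo, filename=name, local_dir=target_dir)
        for repo, name in plan
    ]


# Usage: a vision model needs both the weights and its mmproj file.
#   plan = download_plan("Qwen/Qwen3-VL-4B-Instruct-GGUF",
#                        "Qwen3VL-4B-Instruct-Q4_K_M.gguf",
#                        "mmproj-Qwen3VL-4B-Instruct-F16.gguf")
#   download(plan)
```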

📖 Usage

Basic Usage

  1. Add the "QwenVL" node from the 🧪AILab/QwenVL category.
  2. Select the model_name you wish to use.
  3. Connect an image or video (image sequence) source to the node.
  4. Write your prompt using the preset or custom field.
  5. Run the workflow.
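
The same steps can be driven through the ComfyUI API. The sketch below builds an API-format graph for the POST /prompt endpoint. The class name `AILab_QwenVL` and the `image` input name are assumptions (confirm both via /object_info), and the graph is deliberately incomplete: a real workflow also needs the remaining required inputs, such as preset_prompt, and an output node.

```python
import json


def build_prompt_payload(image_name: str, custom_prompt: str = "") -> dict:
    """Build a ComfyUI API-format graph: LoadImage feeding a QwenVL node."""
    graph = {
        "1": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "2": {
            "class_type": "AILab_QwenVL",  # assumed class name; verify via /object_info
            "inputs": {
                "image": ["1", 0],  # link to output 0 of node "1" (assumed input name)
                "model_name": "Qwen3-VL-4B-Instruct",
                "custom_prompt": custom_prompt,
                "max_tokens": 1024,
                "keep_model_loaded": True,
                "seed": 1,
            },
        },
    }
    return {"prompt": graph}


# Usage: write the payload to a file and POST it to /prompt.
#   with open("test_workflow.json", "w", encoding="utf-8") as fh:
#       json.dump(build_prompt_payload("example.png", "Describe the scene."), fh)
```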

Advanced Usage

For more control, use the "QwenVL (Advanced)" node. This gives you access to detailed generation parameters like temperature, top_p, beam search, and device selection.
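
How the sampling controls interact can be sketched as a mapping onto Hugging Face `generate()` keyword arguments. This is an illustration of the documented behavior (beam search with num_beams > 1 turns off temperature/top_p sampling), not the node's actual code; `generation_kwargs` is a hypothetical helper and the defaults are the ones listed in the Parameters section.

```python
def generation_kwargs(max_tokens: int = 1024, temperature: float = 0.6,
                      top_p: float = 0.9, num_beams: int = 1,
                      repetition_penalty: float = 1.2) -> dict:
    """Translate node-style parameters into keyword arguments for generate()."""
    kwargs = {
        "max_new_tokens": max_tokens,
        "num_beams": num_beams,
        "repetition_penalty": repetition_penalty,
    }
    if num_beams > 1:
        # Beam search: deterministic decoding, sampling knobs are left out.
        kwargs["do_sample"] = False
    else:
        # Single beam: sample with temperature and nucleus (top_p) filtering.
        kwargs.update(do_sample=True, temperature=temperature, top_p=top_p)
    return kwargs
```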

⚙️ Parameters

| Parameter | Description | Default | Range | Node(s) |
| --- | --- | --- | --- | --- |
| model_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | - | Standard & Advanced |
| quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). Removed from the HF nodes in v2.2.4, which run FP16 only. | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |
| preset_prompt | A selection of pre-defined prompts for common tasks. | "Describe this..." | Any text | Standard & Advanced |
| custom_prompt | Optional extra instructions. Since v2.0.2 these are combined with the preset template rather than replacing it. | | Any text | Standard & Advanced |
| max_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |
| keep_model_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |
| seed | A seed for reproducible results. | 1 | 1 - 2^64-1 | Standard & Advanced |
| temperature | Controls randomness. Higher values = more creative. (Used when num_beams is 1.) | 0.6 | 0.1-1.0 | Advanced Only |
| top_p | Nucleus sampling threshold. (Used when num_beams is 1.) | 0.9 | 0.0-1.0 | Advanced Only |
| num_beams | Number of beams for beam search. Values > 1 disable temperature/top_p sampling. | 1 | 1-10 | Advanced Only |
| repetition_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |
| frame_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |
| device | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |
| attention_mode | Attention backend for performance optimization. | auto | auto, flash_attention_2, sdpa | Standard & Advanced |
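
As an illustration of what frame_count controls, the sketch below picks evenly spaced frame indices from a clip. Uniform spacing is an assumption about the sampling strategy, and `sample_frame_indices` is a hypothetical helper, not a function from this repository.

```python
def sample_frame_indices(total_frames: int, frame_count: int = 16) -> list:
    """Pick up to `frame_count` evenly spaced frame indices from a clip."""
    if total_frames <= 0 or frame_count <= 0:
        return []
    if frame_count >= total_frames:
        return list(range(total_frames))  # short clip: keep every frame
    if frame_count == 1:
        return [0]
    # Spread the picks so the first and last frames are always included.
    step = (total_frames - 1) / (frame_count - 1)
    return [round(i * step) for i in range(frame_count)]
```

With the default of 16, a 100-frame clip yields 16 indices running from 0 to 99, while a 5-frame clip is passed through unchanged.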

💡 Quantization Options

| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |
| --- | --- | --- | --- | --- | --- |
| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |
| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |
| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower* | Good | Low VRAM GPUs (<8GB) |

* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.

⚡ Attention Mode Options

| Mode | Description | Speed | Memory | Requirements |
| --- | --- | --- | --- | --- |
| auto | Automatically selects Flash Attention 2 if available, falls back to SDPA | Fast | Medium | flash-attn package |
| flash_attention_2 | Uses Flash Attention v2 for optimal performance | Fastest | Low | flash-attn + CUDA GPU |
| sdpa | PyTorch native Scaled Dot Product Attention | Medium | Medium | PyTorch 2.0+ |

Flash Attention 2 Requirements:

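  • NVIDIA GPU with compute capability >= 8.0 (Ampere or newer, e.g. RTX 30/40/50 series). RTX 20 series cards are Turing (capability 7.5) and are not supported by Flash Attention 2.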
  • CUDA >= 12.0
  • PyTorch >= 2.3.0
  • flash-attn package installed
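
The auto/fallback behavior can be sketched as a small decision function. This is an illustration, not the node's code: the helper names are hypothetical, and the threshold assumes Ampere (compute capability 8.0) as the minimum for Flash Attention 2; adjust `MIN_CAPABILITY` if your setup needs a stricter bound.

```python
import importlib.util

MIN_CAPABILITY = (8, 0)  # assumed minimum (Ampere); adjust if needed


def pick_attention(requested: str, has_flash_attn: bool, capability) -> str:
    """Resolve 'auto', 'flash_attention_2', or 'sdpa' to a usable backend.

    `capability` is a (major, minor) tuple such as (8, 6), or None without CUDA.
    """
    if requested == "sdpa":
        return "sdpa"
    if requested not in ("auto", "flash_attention_2"):
        raise ValueError(f"unknown attention mode: {requested!r}")
    flash_ok = (
        has_flash_attn
        and capability is not None
        and tuple(capability) >= MIN_CAPABILITY
    )
    return "flash_attention_2" if flash_ok else "sdpa"  # SDPA is the fallback


def probe_environment():
    """Return (flash-attn installed?, GPU capability or None) for this machine."""
    has_flash = importlib.util.find_spec("flash_attn") is not None
    try:
        import torch
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    except ImportError:
        cap = None
    return has_flash, cap


# Usage: pick_attention("auto", *probe_environment())
```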

🤔 Setting Tips

| Setting | Recommendation |
| --- | --- |
| Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the -FP8 version for better performance. |
| Memory Mode | Keep keep_model_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |
| Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |
| Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep_model_loaded enabled) will be much faster. |
| Attention Mode | Use "flash_attention_2" for a 2-3x speedup if you have a compatible GPU. Otherwise use "auto" for automatic selection. |

🧠 About Model

This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.

⚡ Flash Attention 2 Performance Boost

This integration includes support for Flash Attention 2, a cutting-edge attention implementation that provides significant performance improvements:

🚀 Performance Gains

| Model | Flash Attention 2 | Speedup |
| --- | --- | --- |
| Qwen2.5-VL-3B | 100% | 200-300% |
| Qwen3-VL-4B | 100% | 150-250% |

🎯 How to Use

  1. Install Flash Attention 2 (see Installation)
  2. Select "flash_attention_2" in the attention_mode parameter
  3. Run your workflow - the system automatically applies the optimization

🔧 Technical Details

  • Implementation: Uses optimized attention kernels for better memory efficiency
  • Compatibility: Works with all quantization modes (4-bit, 8-bit, FP16)
  • Integration: Seamlessly integrates with existing workflows
  • Fallback: Automatically falls back to SDPA if Flash Attention 2 is not available

📋 Requirements Checklist

  • flash-attn package installed
  • Sufficient VRAM for your chosen model
  • Compatible GPU (RTX 30 series or newer)

🐛 Troubleshooting

Flash Attention 2 not working?

# Check installation
python -c "import flash_attn; print('Flash Attention 2 available')"

# Check GPU capability
python -c "import torch; print(f'GPU capability: {torch.cuda.get_device_capability()}')"

Common Problems:

  • "Flash Attention 2 not available": Install the package and check GPU compatibility
  • "CUDA not available": Ensure you have installed a CUDA-enabled PyTorch build that matches your driver
  • "GPU capability insufficient": Flash Attention 2 requires an Ampere-class GPU (RTX 30 series) or newer

🎬 WAN 2.2 Integration

This enhanced version includes specialized prompts for WAN 2.2 video generation, supporting both I2V (image-to-video) and T2V (text-to-video) workflows.

🎯 Available WAN 2.2 Prompts

| Prompt Type | Use Case | Input | Output | Location |
| --- | --- | --- | --- | --- |
| 🍿 Wan 2.2 I2V | Image-to-Video | Image + Text | 5-second cinematic timeline | QwenVL nodes |
| 🍿 Wan 2.2 T2V | Text-to-Video | Text only | 5-second cinematic timeline | Prompt Enhancer nodes |

⚡ Features

  • Cinematic Timeline Structure: 5-second videos with second-by-second descriptions
  • Multilingual Support: Italian/English input → English optimized output
  • Professional Scene Description: Film-style direction including lighting, camera, composition
  • NSFW Handling: Appropriate content filtering and description
  • WAN 2.2 Optimization: Specifically formatted for best video generation results

📝 Output Format Example

(At 0 seconds: A young woman stands facing a rack of clothes...)
(At 1 second: The blouse falls to the floor around her feet...)
(At 2 seconds: She reaches out with her right hand...)
(At 3 seconds: She turns her body slightly towards the mirror...)
(At 4 seconds: Lifting the hanger, she holds the dark fabric...)
(At 5 seconds: A subtle, thoughtful expression crosses her face...)
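
If you post-process this output, the timeline can be parsed with a regular expression. The sketch below assumes one parenthesized entry per line in exactly the format shown above; `parse_timeline` is an illustrative helper, not part of the node.

```python
import re

# One entry per line: "(At N second(s): description)"
TIMELINE_ENTRY = re.compile(r"\(At (\d+) seconds?: (.*?)\)\s*$", re.MULTILINE)


def parse_timeline(text: str) -> dict:
    """Map each second in the timeline to its scene description."""
    return {int(sec): desc.strip() for sec, desc in TIMELINE_ENTRY.findall(text)}


def is_continuous(timeline: dict) -> bool:
    """Check that the seconds run 0, 1, 2, ... with no gaps."""
    return sorted(timeline) == list(range(len(timeline)))
```

`is_continuous` is a cheap sanity check for a generated timeline before handing it to a video workflow.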

🔧 Usage

  1. For I2V: Use "🍿 Wan 2.2 I2V" preset in QwenVL nodes with image input
  2. For T2V: Use "🍿 Wan 2.2 T2V" style in Prompt Enhancer nodes with text only
  3. For Storyboard: Use "🍿 Wan Extended Storyboard" for seamless scene continuity
  4. For General Video: Use "🎥 Wan Cinematic Video" for professional single-scene descriptions

🎨 Best Practices

  • Provide clear, descriptive input for better scene interpretation
  • Use specific camera and lighting directions when possible
  • Include mood and atmosphere details for cinematic results
  • Leverage professional cinematography specs for optimal video quality
  • The system automatically handles timeline optimization for WAN 2.2 presets

🗺️ Roadmap

✅ Completed (v2.0.7)

  • ✅ Support for Qwen3-VL and Qwen2.5-VL models.
  • ✅ GGUF backend support for faster inference.
  • ✅ Prompt Enhancer nodes for text-only workflows.
  • ✅ Flash Attention 2 integration for 2-3x performance boost.
  • ✅ WAN 2.2 I2V and T2V video generation prompts.
  • ✅ Extended Storyboard preset for scene continuity.
  • ✅ Professional cinematography specifications for all WAN 2.2 presets.
  • ✅ WAN family branding and consistent naming.
  • ✅ Extended Storyboard preset for seamless continuity generation.
  • ✅ Free abliterated models without token requirements.
  • ✅ Enhanced custom prompt logic across all nodes.
  • ✅ Comprehensive NSFW content generation support.
  • ✅ Optimized model ordering and quantization defaults.
  • ✅ Clean attention pipeline with SDPA stability.
  • ✅ Removed complexity for better model output reliability.
  • ✅ Smart prompt caching system for performance optimization.
  • ✅ Fixed Seed Mode for stable outputs regardless of media variations.
  • ✅ Persistent cache across ComfyUI restarts.
  • ✅ Code maintenance updates for future compatibility.

🙏 Credits

  • Qwen Team: Alibaba Cloud - For developing and open-sourcing the powerful Qwen-VL models.
  • ComfyUI: comfyanonymous - For the incredible and extensible ComfyUI platform.
  • llama-cpp-python: JamePeng/llama-cpp-python - GGUF backend with vision support used by GGUF nodes.
  • GenorTG: GenorTG/ComfyUI-Genor-QwenVL-Mod - For innovative memory management improvements including unload_after_run parameter and prompt cache optimizations that prevent OOM errors in multi-node workflows.
  • ComfyUI Integration: 1038lab - Developer of this custom node.

📜 License

This repository's code is released under the GPL-3.0 License.
