What happened
Hello,
I'm experiencing an issue with Flux models running extremely slowly in Invoke. I'm using Invoke 5.6.0, installed via the Launcher. For comparison, ComfyUI takes approximately 3.2–3.3 seconds per iteration at a resolution of 832×1216, while the same resolution in Invoke takes a staggering ~20 seconds per iteration, and I eventually hit an OOM error. This is despite ComfyUI consuming significantly less VRAM, RAM, and swap. Even Stable Diffusion WebUI Forge runs at roughly the same speed as ComfyUI, although it uses noticeably more RAM, which sometimes leads to OOM errors when I use multiple LoRAs.
I'm using the following models in ComfyUI:
flux1-dev-Q8_0.gguf
majicFlus 麦橘超然
t5xxl_fp16.safetensors
clip_l.safetensors
ae.safetensors
I attempted to import these text encoders into Invoke, but was unsuccessful, so I loaded the equivalents provided by Invoke instead.
I've also tried adding enable_partial_loading: true to the invokeai.yaml file and experimenting with various combinations of max_cache_ram_gb, max_cache_vram_gb, and device_working_mem_gb, but nothing makes a fundamental difference. (Note: "Nvidia sysmem fallback" is also disabled in the driver settings.)
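For reference, this is roughly what the relevant part of my invokeai.yaml looked like during these experiments. The exact values varied from run to run; the numbers below are just one example combination I tried, not settings I'm recommending:

```yaml
# invokeai.yaml (example values from one test run; I tried several combinations)
enable_partial_loading: true
max_cache_ram_gb: 28       # also tried lower values
max_cache_vram_gb: 5       # also tried other values and leaving it unset
device_working_mem_gb: 3   # example value; I varied this as well
```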
I really enjoy using Invoke—I periodically download it to see how it evolves—but I always end up returning to ComfyUI because even the SDXL models in Invoke run slightly slower, and I encounter OOM errors at resolutions where ComfyUI doesn’t even require tiling. I’d love to use Invoke because of its pleasant UI and fantastic inpainting capabilities, but unfortunately, optimization for non-high-end configurations still leaves much to be desired. If needed, I can provide additional information to help diagnose the issue.
Thank you for your attention.
What you expected to happen
I acknowledge that my configuration (8GB of VRAM and 32GB of RAM) isn't ideal for the demanding Flux models. Nevertheless, both ComfyUI and Forge handle these models at speeds that are acceptable to me. I would like to confirm that the issue isn't on my end, and I hope to see better optimization in Invoke. This is especially important because img2img at higher SDXL resolutions and upscaling consistently lead to OOM errors in Invoke, whereas ComfyUI performs swiftly at even higher resolutions with lower resource usage. Additionally, with tiled VAE decoding, ComfyUI can handle resolutions that are extraordinarily high by Invoke's standards on a system with just 8GB of VRAM.
How to reproduce the problem
1. Use an Nvidia GPU with 8GB of VRAM and 32GB of RAM.
2. Do a standard install of Invoke 5.6.0 via the Launcher.
3. Disable "Nvidia sysmem fallback" in the Nvidia driver settings.
4. Add enable_partial_loading: true to the invokeai.yaml file.
5. Generate an 832x1216 or 1024x1024 image to the canvas using flux1-dev-Q8_0.gguf or majicFlus 麦橘超然 (any other Flux model of the same size can be used), together with t5xxl_fp16, clip_l, and the VAE.
Additional context
No response
Discord username
No response
Yaruze66 changed the title from "Flux Extremely Slow in Invoke Compared to ComfyUI and Forge" to "[bug]: Flux Extremely Slow in Invoke Compared to ComfyUI and Forge" on Feb 6, 2025.
I conducted additional tests using SDXL models in Invoke (versions 5.6.0 and 5.6.1rc1), as Flux models failed to generate any output. Below are my findings compared to ComfyUI:
At 832x1216, Invoke matches ComfyUI's generation speed (~1.90 it/s) and VRAM/RAM usage only when enable_partial_loading is left disabled. Adding LoRAs under these conditions has minimal impact on speed.
Enabling enable_partial_loading: true reduces speed to 1.3-1.4 it/s.
Adding max_cache_ram_gb: 28 on top of that improves speed to ~1.60 it/s.
Adding max_cache_vram_gb: 5 as well further increases speed to ~1.8 it/s, but introduces OOM errors during img2img upscaling (even at x1.5). In contrast, ComfyUI handles x2.5 upscales (with dedicated upscale models, not img2img) without issues. This combination is written out below.
With enable_partial_loading: true active, adding multiple LoRAs drops generation speed to 1 it/s. Experimenting with max_cache_vram_gb and device_working_mem_gb yielded no viable solution, only OOM errors or unacceptably slow speeds.
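For clarity, the combination referred to above, i.e. the one that reaches ~1.8 it/s for SDXL at 832x1216 but OOMs during img2img upscaling, was:

```yaml
# invokeai.yaml — fastest combination I found for SDXL at 832x1216 (~1.8 it/s),
# but it OOMs during img2img upscaling, even at x1.5
enable_partial_loading: true
max_cache_ram_gb: 28
max_cache_vram_gb: 5
```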
I absolutely love the incredible inpainting, outpainting, and all those nifty image editing features that Invoke offers. That said, for me, all this potential is a bit held back by Invoke’s pretty high demands. I’m not exactly a tech expert, and I don’t really understand all the “magic” the Comfy folks work behind the scenes—but for users with limited VRAM, ComfyUI is a real lifesaver. It even lets you crank out high-resolution images with Flux models (FP8/Q8_0 + t5xxl_fp16 + a few LoRAs)!
The only downside is that ComfyUI can’t quite match Invoke’s sleek, user-friendly interface or its mind-blowing inpainting capabilities. I really hope the InvokeAI team takes note and manages to optimize things to at least ComfyUI’s level—if not even better!
Is there an existing issue for this problem?
Operating system
Windows 11 23H2 22631.4751
GPU vendor
Nvidia (CUDA)
GPU model
RTX 2070 Super, driver version: 565.90
GPU VRAM
8GB
Version number
5.6.0
Browser
Google Chrome 132.0.6834.160, Invoke Client
Python dependencies
No response