Is there an existing issue for this problem?
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
3080
GPU VRAM
10GB
Version number
5.6.0
Browser
Invoke Client
Python dependencies
No response
What happened
Installed models and did some Flux-dev generations without issues.
Loaded an SDXL checkpoint with sdxl_vae at 32-bit precision.
Generated an image to canvas and noticed that after all the steps complete, nothing happens for a while. This is at the time of the VAE decode, before the image is finished. Task Manager shows VRAM filled up and spilling over into system RAM by 1.5-3 GB.
The behaviour is consistent, even with a 16-bit VAE.
Afterwards, VRAM drops back to 5-6 GB for the next generation, which again hits very high levels and spills over into RAM at the time of VAE decode.
Tested resolutions: 1152 x 896 & 1024 x 1024, same behaviour
This used to work well in previous versions, around the time support for drawing pads was introduced (5.1?).
This is a standard installation, except that invoke.yaml has been edited to add the parameter below:
enable_partial_loading: true
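For context, this is roughly what the memory-related part of my invoke.yaml looks like. Only enable_partial_loading is something I actually set; the commented keys are the other low-VRAM options I understand the 5.x config exposes, so treat their names and values as assumptions to be checked against the docs:

```yaml
# invoke.yaml - memory-related settings (sketch)
enable_partial_loading: true   # the only parameter I added

# Assumed/illustrative only - names and defaults should be verified against the Invoke docs:
# device_working_mem_gb: 3     # working memory kept free on the GPU for ops like VAE decode
# max_cache_vram_gb: 6         # cap on how much VRAM the model cache may hold
# max_cache_ram_gb: 16         # cap on how much system RAM the model cache may hold
```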
Not sure if it's specific to my system, but I never had this issue with Forge or ComfyUI.
Any idea what may be causing this weird memory-management behaviour and how one might fix it?
I have sysmem fallback enabled and I don't want to change it. What's weird is that there are no issues with Flux, even with the full-size T5, which consumes far more VRAM.
What you expected to happen
I expect to generate a picture with an SDXL model without needing to use 11.5-13 GB VRAM at the time of VAE decode.
How to reproduce the problem
Use a 10 GB VRAM card
Standard Invoke installation
Invoke.yaml - Add: enable_partial_loading: true
Generate a 1152 x 896 or 1024 x 1024 image to canvas using SDXL + sdxl_vae.safetensors (or auto)
At the time of VAE decode, excessive VRAM is used (see the standalone decode sketch below for a way to measure this step in isolation).
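To isolate the decode step from the rest of Invoke's pipeline, something like the following can measure the peak VRAM that just the SDXL VAE decode needs at 1024 x 1024. This is a minimal sketch using diffusers' AutoencoderKL and a random latent, so it approximates the decode Invoke performs rather than reproducing its actual code path:

```python
# Sketch: measure peak VRAM of an SDXL VAE decode at 1024x1024 (assumes diffusers and a CUDA GPU).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float32).to("cuda")

# A 1024x1024 SDXL image corresponds to a 1 x 4 x 128 x 128 latent (8x downscale factor).
latent = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float32)

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    image = vae.decode(latent / vae.config.scaling_factor).sample
print(f"Peak VRAM, fp32 decode: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")

# For comparison, tiled decoding keeps the peak much lower at the cost of some speed.
vae.enable_tiling()
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    image = vae.decode(latent / vae.config.scaling_factor).sample
print(f"Peak VRAM, tiled fp32 decode: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

If the standalone decode fits comfortably in 10 GB while Invoke's does not, that would suggest the extra usage comes from what else Invoke keeps resident around the decode rather than from the VAE itself.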
Additional context
No response
Discord username
No response
I am also seeing an issue on the FLUX side. I can generate a couple of images at 2-3 s/it ... and after a while the whole PC starts lagging and generation takes a very long time (20-30 seconds/it), unless I close python/Invoke and start it up again.
I can also confirm this issue. SDXL generations consume more VRAM and overall memory than in ComfyUI, and Invoke is even slightly slower. Moreover, attempts to upscale or run img2img at higher resolutions almost always result in an OOM error (while in ComfyUI such resolutions don't even require tiling).

With enable_partial_loading: true added to invokeai.yaml, resource usage becomes roughly the same as ComfyUI, but the generation speed drops even further. SDXL is still acceptable, but Flux is a whole different story for me: I'm getting ~20 seconds per iteration and eventually hit an OOM error, whereas in Comfy everything generates at a much more acceptable rate of about 3.2-3.3 seconds per iteration. And that's considering that Comfy uses significantly fewer resources even with Flux models, and it even lets me run a 2x Hires Fix (yes, it's slow and demanding, but even then it's faster than basic 1024x1024 generation in Invoke).

For the record, I don't use any extra optimization nodes (if such things even exist) or special configs in Comfy; it's pretty much a standard workflow.
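For what it's worth, when comparing UIs I find it helps to log GPU memory over time rather than eyeballing Task Manager. A small sketch like the one below (assuming the nvidia-ml-py / pynvml package is installed) prints GPU memory use once per second, which makes it easier to line up the spike with the VAE decode step in either Invoke or Comfy:

```python
# Sketch: poll GPU memory once per second while a generation runs (assumes nvidia-ml-py / pynvml).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"{time.strftime('%H:%M:%S')}  used={mem.used / 1024**3:.2f} GB / {mem.total / 1024**3:.2f} GB")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```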