Has anyone had any success running WAN2.2 TI2V 5B on a GPU with 16 GB VRAM or less at resolutions lower than 832x480? #868
Unanswered
MrSnichovitch asked this question in Q&A
So... I managed to figure a couple of things out through a ridiculously long series of tests and web searches:
Does anyone else have any info/insights they can share for additional reference?
I've been struggling with this model, trying to get it to produce usable results while still staying within the 16 GB limit of my RX 7600 XT (ROCm 6.4.3, Linux). The VAE step is the problem: an 832x480 resolution requires a 20895.80 MB VAE compute buffer for a 2 second video. If I run with `--vae-on-cpu`, it works, but takes 27 minutes 15 seconds for a 2 second clip, which isn't practical. sd.cpp appears to ignore `--vae-tiling` with WAN models, so that's of no help.
WAN2.1 has the same problem with the VAE buffer size, but it can be worked around by lowering the resolution to 416x240. It's fully capable but much slower than 2.2 (an 8 second clip at this resolution takes ~23 minutes), which is why I'd like to get 2.2 working.
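For a rough sense of scale, here is a minimal sketch that extrapolates the one measured figure above (20895.80 MB at 832x480) to smaller resolutions. It assumes the VAE compute buffer grows linearly with pixel count for a fixed clip length, which is an assumption and not something measured or confirmed for sd.cpp. The function name and the 16 GB constant are illustrative only, and the estimate ignores the VRAM already taken by model weights.

```python
# Measured in the post: VAE compute buffer at 832x480 for a 2 second clip.
BASE_W, BASE_H = 832, 480
BASE_BUFFER_MB = 20895.80
VRAM_MB = 16 * 1024  # nominal 16 GB card; weights and other buffers are ignored


def estimated_vae_buffer_mb(width: int, height: int) -> float:
    """Scale the measured buffer by pixel count (ASSUMPTION: linear scaling)."""
    scale = (width * height) / (BASE_W * BASE_H)
    return BASE_BUFFER_MB * scale


if __name__ == "__main__":
    for w, h in [(832, 480), (624, 360), (416, 240)]:
        est = estimated_vae_buffer_mb(w, h)
        print(f"{w}x{h}: ~{est:.0f} MB, under 16 GB: {est < VRAM_MB}")
```

If the linear assumption is roughly right, both 624x360 and 416x240 would fit under 16 GB as far as the VAE buffer alone is concerned; actual numbers should be read from the sd.cpp log.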
Here's the problem: when attempting the same half resolution (416x240) on 2.2, it produces garbage results, while 2.1's output is clear. I can do IMG2VID with an `-i` reference image at the same resolution, but the video is still a garbled mess, even though the subject of the reference image is somewhat recognizable. I've tried every combination of options I can think of, from varying `--steps` and `--cfg` values, to `--flow-shift` settings, to different combinations of `--sampling-method` and `--scheduler`, to `--diffusion-conv-direct` and/or `--vae-conv-direct`, all to no avail. WAN2.2 simply doesn't "like" this resolution.
If I raise the resolution to 3/4 size (624x360) with just a text prompt, I get valid video output, but the video itself is "cropped", as if WAN were still trying to generate a video at 832x480 and cutting off the missing pixels, which renders it useless. If I try an IMG2VID at this resolution, I get a crash/segfault regardless of the `-i` image size:
This is especially weird because that crash doesn't happen at 416x240.
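One thing that may be worth ruling out is whether these resolutions line up with the model's spatial stride. The sketch below assumes a combined stride (VAE downsampling times patch size) of 16 pixels for WAN2.1 and 32 pixels for WAN2.2 TI2V 5B, based on the published model descriptions; how sd.cpp actually handles non-aligned sizes is not confirmed here, and the helper names are hypothetical.

```python
# ASSUMED combined spatial strides (VAE downsampling x patch size).
# These come from the published model descriptions, not from sd.cpp itself.
STRIDES = {"WAN2.1": 16, "WAN2.2 TI2V 5B": 32}


def is_aligned(width: int, height: int, stride: int) -> bool:
    """True when both dimensions are whole multiples of the stride."""
    return width % stride == 0 and height % stride == 0


def round_down(width: int, height: int, stride: int) -> tuple[int, int]:
    """Largest aligned size that does not exceed the requested one."""
    return (width // stride) * stride, (height // stride) * stride


if __name__ == "__main__":
    for model, stride in STRIDES.items():
        for w, h in [(832, 480), (624, 360), (416, 240)]:
            ok = is_aligned(w, h, stride)
            alt = round_down(w, h, stride)
            print(f"{model} {w}x{h}: aligned={ok}, nearest lower={alt[0]}x{alt[1]}")
```

Under these assumptions, 416x240 is aligned for a stride of 16 but not for 32, and 624x360 is aligned for neither, so trying 416x224 or 608x352 on 2.2 might be a cheap experiment. Whether alignment actually explains the garbled or cropped output is untested.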
So, is anyone else experiencing these types of problems with lower resolutions/lower VRAM cards, or do you know of some way to lower the VAE compute buffer size to < 16 GB?
For additional reference, I'm running the Wan2.2-TI2V-5B-Q8_0.gguf model, and have tried both the fp16 and the Q8_0 versions of the umt5_xxl text encoder.