Has anyone had any success running WAN2.2 TI2V 5B on a GPU with 16 GB VRAM or less at resolutions lower than 832x480? #868

MrSnichovitch · 2025-09-30T22:34:26Z

I've been struggling with this model, trying to get it to produce usable results while staying within the 16 GB of VRAM on my RX 7600 XT (ROCm 6.4.3, Linux). The VAE stage is the problem: at 832x480, a 2 second video requires a 20895.80 MB VAE compute buffer. If I run with --vae-on-cpu, it works, but takes 27 min 15 seconds for a 2 second clip, which isn't practical. sd.cpp appears to ignore --vae-tiling with WAN models, so that's no help.

WAN2.1 has the same problem with the VAE buffer size, but it can be worked around by lowering the resolution to 416x240. It's fully capable but much slower than 2.2 (an 8 second clip at this resolution takes ~23 minutes), which is why I'd like to get 2.2 working.

Here's the problem. When I try the same half resolution (416x240) on 2.2, it produces garbage, while 2.1's output is clear. I can do IMG2VID with an -i reference image at the same res, but the video is still a garbled mess, even though the reference image subject is somewhat recognizable. I've tried every combination of options I can think of, from varying --steps and --cfg values, to --flow-shift settings, to different combinations of --sampling-method and --scheduler, to --diffusion-conv-direct and/or --vae-conv-direct, all to no avail. WAN2.2 simply doesn't "like" this resolution.

If I up the resolution to 3/4 size (624x360) with just a text prompt, I get valid video output, but the video itself is "cropped", as if WAN is still trying to generate a video at 832x480 and cutting off the missing pixels, which renders it useless. If I try an IMG2VID at this resolution, I get a crash (a failed GGML_ASSERT, per the log below) regardless of the -i image size:

[INFO ] stable-diffusion.cpp:2618 - IMG2VID
[INFO ] ggml_extend.hpp:1648 - wan_vae offload params (1344.24 MB, 196 tensors) to runtime backend (ROCm0), taking 0.41s
/home/b/Reckless/AlernativeBuilds/2025-09-24/stable-diffusion.cpp/ggml/src/ggml.c:3436: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2*ne3) failed
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x683376) [0x56382a921376]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x6837b3) [0x56382a9217b3]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x683950) [0x56382a921950]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x68a2e7) [0x56382a9282e7]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13d81c) [0x56382a3db81c]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13e628) [0x56382a3dc628]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13f5f1) [0x56382a3dd5f1]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x13fc85) [0x56382a3ddc85]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x1016a8) [0x56382a39f6a8]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x101ddd) [0x56382a39fddd]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x106926) [0x56382a3a4926]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0xe650f) [0x56382a38450f]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x46248) [0x56382a2e4248]
/usr/lib/libc.so.6(+0x27675) [0x7f3cec827675]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7f3cec827729]
/home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd(+0x4abc5) [0x56382a2e8bc5]
zsh: IOT instruction (core dumped)  /home/b/Reckless/AlernativeBuilds/2025-09-24/plonkus3/bin/sd -M vid_gen

This is especially weird because that crash doesn't happen at 416x240.

So, is anyone else experiencing these types of problems with lower resolutions/lower VRAM cards, or do you know of some way to lower the VAE compute buffer size to < 16 GB?

For additional reference, I'm running the Wan2.2-TI2V-5B-Q8_0.gguf model, and have tried both the fp16 and the Q8_0 versions of the umt5_xxl text encoder.
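
As a rough way to guess which resolutions might fit, here is a minimal sketch that scales the one measured figure above (20895.80 MB at 832x480) by pixel count. The linear-scaling rule is an assumption, not something taken from the sd.cpp source, and `estimate_vae_buffer_mb` is a hypothetical helper name. For comparison, the 640x360 figure reported later in this thread (12235.26 MB) is within a couple of percent of what this rule gives (about 12055 MB), although that run used a different frame count, so the agreement may be partly coincidental.

```python
# Reference measurement taken from the sd.cpp log figure quoted in this thread.
REF_WIDTH, REF_HEIGHT = 832, 480
REF_BUFFER_MB = 20895.80


def estimate_vae_buffer_mb(width, height):
    """Estimate the wan_vae compute buffer size in MB.

    ASSUMPTION: the buffer scales linearly with the number of pixels per
    frame. This is a guess for illustration, not sd.cpp's actual formula.
    """
    return REF_BUFFER_MB * (width * height) / (REF_WIDTH * REF_HEIGHT)


if __name__ == "__main__":
    for w, h in [(832, 480), (624, 360), (416, 240)]:
        print(f"{w}x{h}: ~{estimate_vae_buffer_mb(w, h):.0f} MB")
```

Actual VRAM use will be higher than the compute buffer alone, so treat any estimate close to the card's limit as a likely failure.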

MrSnichovitch · 2025-10-02T23:07:46Z

So... I managed to figure a couple of things out through a ridiculously long series of tests and web searches:

Using --diffusion-fa with ROCm is absolutely necessary to get viable, non-scrambled output. --clip-on-cpu is needed to avoid black output images, but I've been using that with Chroma as kind of a default, so I forgot to check it earlier.
WAN2.2 really only "likes" 16:9 ratio images, especially those whose width and height are both evenly divisible by 8. Why 832x480, which is a weird 26:15 ratio, is stated as a recommended resolution in various online docs is beyond me. Perhaps it was mistaken for a "480p" resolution? What most folks call 480p refers back to the old 640x480 4:3 aspect ratio or the oddball DVD 3:2 res of 720x480. The closest 16:9 res would be 832x468, although 468 isn't evenly divisible by 8. The nearest clean values above and below this are 896x504 and 768x432.
640x360 is the largest 16:9 resolution I can use where the VAE process can still be run in VRAM. Using this also appears to solve the cropping problem I noted above, but that's really prompt dependent... gotta specify things like "wide shot, centered on subject", etc. For this res, the sd.cpp log output reports wan_vae compute buffer size: 12235.26 MB(VRAM), but amdgpu_top reports well over 14,000 MiB in use while generating a 24 FPS/121 Frame/5 Second video.
512x288 is the largest 16:9 resolution where I can use IMG2VID. Above that, I still get the crash described above.
I have not tested IMG2VID on any res higher than 640x360. (Really no point, since my primary goal is to see what I can run with VAE in VRAM).
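
The resolution arithmetic above can be checked with a short sketch. It reduces a width:height pair to lowest terms and lists the 16:9 sizes where both dimensions are divisible by 8. The function names are made up for illustration, and the divisible-by-8 rule is the observation from this thread, not a documented WAN2.2 requirement.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1920x1080 -> (16, 9)."""
    g = gcd(width, height)
    return width // g, height // g


def clean_16_9_sizes(max_width, multiple=8):
    """List 16:9 sizes up to max_width whose width and height are both
    divisible by `multiple`."""
    sizes = []
    for k in range(1, max_width // 16 + 1):
        w, h = 16 * k, 9 * k
        if w % multiple == 0 and h % multiple == 0:
            sizes.append((w, h))
    return sizes


if __name__ == "__main__":
    print(aspect_ratio(832, 480))  # (26, 15)
    print(clean_16_9_sizes(896))
```

Because 9 and 8 share no common factor, the height is only divisible by 8 when the width is a multiple of 128, which is why the clean sizes jump from 640x360 to 768x432 to 896x504 with nothing in between.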

Does anyone else have any info/insights they can share for additional reference?
