
Conversation

jeffbolznv
Collaborator

Add variants of the im2col shaders that use buffer_device_address/buffer_reference and do their address calculations in 64 bits. This is needed for the large convolutions used in stable-diffusion.cpp.

I've been working on getting leejet/stable-diffusion.cpp#778 to work in Vulkan. The main missing piece is support for 2D and 3D convolutions whose intermediate im2col buffers are larger than 4GB. This change fixes the im2col part; I'll make a separate change for the matmul part.

Memory allocations larger than maxMemoryAllocationSize are not technically forbidden, and at least NVIDIA's Windows driver will allocate more than 4GB.
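To make the 4GB overflow concrete, here is a back-of-envelope calculation with hypothetical convolution dimensions (the shapes used in the PR's actual tests may differ). im2col materializes one row of IC*KH*KW elements per output pixel, so the intermediate buffer quickly grows past what 32-bit address arithmetic can index:

```python
# Hypothetical sizes loosely in the range of a high-resolution
# stable-diffusion convolution; not the PR's exact test shapes.
N, IC, KH, KW = 1, 1280, 3, 3
OH = OW = 1024

# im2col materializes IC*KH*KW elements per output pixel.
elements = N * OH * OW * IC * KH * KW
bytes_f16 = elements * 2  # stored as f16

print(bytes_f16)              # 24159191040 bytes, ~22.5 GiB
print(bytes_f16 > (1 << 32))  # True: byte offsets no longer fit in 32 bits
```

This is why the new shader variants need 64-bit address calculations rather than the usual 32-bit descriptor offsets.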

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner September 20, 2025 19:30
@github-actions github-actions bot added the testing (Everything test related), Vulkan (Issues specific to the Vulkan backend), and ggml (changes relating to the ggml tensor library for machine learning) labels Sep 20, 2025
@0cc4m
Collaborator

0cc4m commented Sep 21, 2025

Memory allocations larger than maxMemoryAllocationSize are not technically forbidden, and at least NVIDIA's Windows driver will allocate more than 4GB.

This is true for allocations, but not for buffers. If I disable the allocation size check in ggml_vk_create_buffer your new test kinda runs on all my devices, but I'm not sure if it runs correctly. Validation layers complain about the buffer size and the descriptor range, of course.
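For context, the two limits being distinguished here are separate Vulkan properties: maxMemoryAllocationSize (VkPhysicalDeviceMaintenance3Properties) is a soft cap on a single VkDeviceMemory allocation, and larger allocations may still succeed, while maxBufferSize (VkPhysicalDeviceMaintenance4Properties) bounds a single VkBuffer. A minimal sketch of the kind of size check involved, using hypothetical 4 GiB limits and a simplified stand-in (not the actual ggml_vk_create_buffer logic):

```python
GIB = 1024**3

# Hypothetical device limits; many drivers report ~4 GiB for both.
MAX_MEMORY_ALLOCATION_SIZE = 4 * GIB  # maintenance3: soft limit, larger may succeed
MAX_BUFFER_SIZE = 4 * GIB             # maintenance4: limit on a single VkBuffer

def classify_request(size: int) -> str:
    # Simplified stand-in for a backend buffer-size guard.
    if size > MAX_BUFFER_SIZE:
        return "buffer-too-large"     # exceeds the VkBuffer limit
    if size > MAX_MEMORY_ALLOCATION_SIZE:
        return "allocation-may-fail"  # beyond the guaranteed allocation size
    return "ok"

print(classify_request(3 * GIB))   # "ok"
print(classify_request(24 * GIB))  # "buffer-too-large"
```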

I tried running your new im2col and im2col_3d tests:

- On AMD (RADV) it does the allocation, but fails the test runs.
- On Intel (ANV) it gets the correct result for im2col; im2col_3d fails because 16GB of VRAM wasn't enough.
- On Nvidia (proprietary Linux driver) it runs correctly.

On all three it takes a very long time to finish the tests. The tests also used huge amounts of RAM (>80GB); I'm not sure if that's the CPU backend or something else.

@jeffbolznv
Collaborator Author

I've pushed a fix for the descriptor range validation failure. I'm not aware of one related to the buffer size.

The large memory usage and slowness is expected. The test framework ends up with multiple copies of the huge tensor, converted to f32. I don't intend to enable these tests by default.
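A rough estimate of where a figure like >80GB can come from, using a hypothetical element count for one of the large im2col tensors (the tests' real shapes may differ): once the tensor is held as f32 by both the backend-result path and the CPU reference, the working set passes 80GB with just two full copies.

```python
# Hypothetical element count for a single large im2col result tensor.
elements = 12_079_595_520

f32_bytes = elements * 4  # one full f32 copy of the tensor
copies = 2                # backend result + CPU reference, both as f32 (at minimum)

total_gib = copies * f32_bytes / 1024**3
print(round(total_gib))   # 90 GiB for two copies alone
```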

I'm surprised the AMD driver is failing. It may have been related to the validation failure, though that would be odd since that descriptor isn't actually used.

@0cc4m
Collaborator

0cc4m commented Sep 21, 2025

Mesa driver development seems to work by building stuff, optimizing it and fixing issues when they come up. If things don't come up, they often don't work, so this is probably another case of "nobody tried to do this yet". We'll have to open an issue about it, most likely.
