-
Found |
-
I've really only had any luck debugging by either running the same code on CPU, or by piping another buffer that I then write values into. The former can sometimes be easy, but it depends on the types used. The latter is always a pain, but it could be made easier with some utilities - though I think any utilities would be dependent on a runtime, like |
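Both approaches can be combined if the shader logic lives in a plain function. The following is only a minimal sketch under that assumption: `kernel_body`, `run_on_cpu`, and the arithmetic inside are hypothetical stand-ins, not code from this thread. The extra `debug` slice plays the role of the additional buffer that intermediate values are written into.

```rust
/// Hypothetical kernel body, written as a plain function so that a shader
/// entry point and a CPU test could both call it.
/// `debug` is the extra buffer: one slot per thread for an intermediate value.
fn kernel_body(thread_index: usize, input: &[u32], output: &mut [u32], debug: &mut [u32]) {
    let value = input[thread_index];
    let doubled = value.wrapping_mul(2);
    // Record the intermediate value so it can be inspected after the run.
    debug[thread_index] = doubled;
    output[thread_index] = doubled.wrapping_add(1);
}

/// CPU stand-in for a dispatch: runs the kernel body once per "thread".
fn run_on_cpu(input: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut output = vec![0; input.len()];
    let mut debug = vec![0; input.len()];
    for thread_index in 0..input.len() {
        kernel_body(thread_index, input, &mut output, &mut debug);
    }
    (output, debug)
}
```

On the CPU the same function can be stepped through in a debugger. On the GPU only the contents of `debug` are visible after the dispatch.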
-
Okay, I managed to create a logger that writes into a buffer, using an atomic counter to bump the cursor. One of the annoyances I faced in the process was that
So I'm currently forced to deal with message sizes known at compile time. Okay, if slices do not work, I can do pointers, but nope:
Creating slice from the pointer? Nope:
I must admit, this is quite painful.
-
In the end I ended up with this abomination that works around current rust-gpu limitations:

    // Assumed import path for `Scope` and `Semantics`.
    use spirv_std::memory::{Scope, Semantics};

    /// Converts a single decimal digit (0..=9) to its ASCII byte.
    #[inline(always)]
    fn to_ascii(n: u8) -> u8 {
        match n {
            0 => b'0',
            1 => b'1',
            2 => b'2',
            3 => b'3',
            4 => b'4',
            5 => b'5',
            6 => b'6',
            7 => b'7',
            8 => b'8',
            9 => b'9',
            _ => {
                unreachable!();
            }
        }
    }

    /// Appends one record to `debug`: 7 ASCII digits of the thread index,
    /// a space, then the `N` bytes of `message`.
    #[inline(always)]
    fn log_message<const N: usize>(
        thread_index: u32,
        message: [u8; N],
        debug_counter: &mut u32,
        debug: &mut [u8],
    ) {
        let message_length = N as u32;
        // Reserve space for the record by bumping the shared cursor atomically.
        let write_at = unsafe {
            spirv_std::arch::atomic_i_add::<
                u32,
                { Scope::Workgroup as u32 },
                { Semantics::ACQUIRE_RELEASE.bits() },
            >(debug_counter, message_length + 8)
        };
        // Split the thread index into 7 decimal digits (assumes it is below 10_000_000).
        let thread_index = {
            let a = thread_index / 1_000_000;
            let b = (thread_index % 1_000_000) / 100_000;
            let c = (thread_index % 100_000) / 10_000;
            let d = (thread_index % 10_000) / 1000;
            let e = (thread_index % 1000) / 100;
            let f = (thread_index % 100) / 10;
            let g = thread_index % 10;
            [
                to_ascii(a as u8),
                to_ascii(b as u8),
                to_ascii(c as u8),
                to_ascii(d as u8),
                to_ascii(e as u8),
                to_ascii(f as u8),
                to_ascii(g as u8),
            ]
        };
        // Write the digits one by one, followed by a separating space.
        debug[(write_at + 0) as usize] = thread_index[0];
        debug[(write_at + 1) as usize] = thread_index[1];
        debug[(write_at + 2) as usize] = thread_index[2];
        debug[(write_at + 3) as usize] = thread_index[3];
        debug[(write_at + 4) as usize] = thread_index[4];
        debug[(write_at + 5) as usize] = thread_index[5];
        debug[(write_at + 6) as usize] = thread_index[6];
        debug[(write_at + 7) as usize] = b' ';
        // Copy the message bytes after the 8-byte prefix.
        for index in 0..message_length {
            debug[(write_at + 8 + index) as usize] = message[index as usize];
        }
    }

It can log a fixed size message with
At the end I can read both on the host and write them into a text file for further inspection. That allowed me to find the first issue: an out-of-bounds access. The second issue was extremely confusing: because of execution divergence, what I expected to be a workgroup-sized batch was not progressing in phases as I expected, so I needed a control barrier at the end of the loop to ensure everything moves predictably. It is annoying that different GPUs behave differently, but I am happy I was able to find the cause and fix it. Hopefully this helps someone in the future.
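For anyone reading such a buffer back, here is a rough host-side sketch of the decoding step. It assumes that every record has the layout written by `log_message` (7 ASCII digits, a space, then the message bytes), that all messages share one length, and that the final counter value is available. `LogRecord` and `decode_log` are hypothetical names, not part of rust-gpu or wgpu.

```rust
/// One decoded record: the thread index and the message text.
#[derive(Debug, PartialEq)]
struct LogRecord {
    thread_index: u32,
    message: String,
}

/// Splits the raw debug buffer into records.
/// Assumption: every record is 7 ASCII digits, a space, then `message_len`
/// bytes, and `used` is the final value of the atomic counter.
fn decode_log(debug: &[u8], used: u32, message_len: usize) -> Vec<LogRecord> {
    let record_len = 8 + message_len;
    // Clamp to the buffer size in case the counter ran past the end.
    let used = (used as usize).min(debug.len());
    debug[..used]
        .chunks_exact(record_len)
        .filter_map(|record| {
            // Records with a malformed index prefix are skipped.
            let digits = std::str::from_utf8(&record[..7]).ok()?;
            let thread_index = digits.parse().ok()?;
            let message = String::from_utf8_lossy(&record[8..]).into_owned();
            Some(LogRecord { thread_index, message })
        })
        .collect()
}
```

If messages of different lengths are mixed, this fixed-stride split no longer works and each record would need a length prefix or a terminator.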
-
I just had my first WTF with shaders.
Something I just wrote works fine and correctly on an AMD GPU (tested on 7600 XT), but not using llvmpipe (which I use in CI and which is typically available on Linux even if you don't have a GPU).
No errors anywhere, no indication of an error except an incorrect (in my case empty) result.
I added a panic at the beginning of the shader: still no errors and no indication that the code panicked.
This is a very confusing and frustrating situation to be in. The only way I see is to write some breadcrumbs into buffers and see where things lead. But this is a really bad experience compared to what we have with applications running on CPU, where we have println!(), interactive debuggers with the ability to step through function calls and inspect intermediate variables, etc.

I am a newbie in GPU development and don't know if this is something to be improved on the rust-gpu side, the wgpu side, in llvmpipe, all of them, or something else entirely.
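To make the breadcrumb idea concrete, here is a minimal CPU-side sketch. It assumes one `u32` slot per thread in an extra buffer; the stage constants, `shader_body`, and `stuck_threads` are hypothetical names made up for illustration. Each thread overwrites its slot as it passes a checkpoint, and the host then lists the threads that never reached the last one.

```rust
/// Hypothetical stage markers; the numeric values are arbitrary.
const STAGE_ENTERED: u32 = 1;
const STAGE_LOADED: u32 = 2;
const STAGE_DONE: u32 = 3;

/// Stand-in for a shader body that records how far each thread got.
fn shader_body(thread_index: usize, input: &[u32], breadcrumbs: &mut [u32]) {
    breadcrumbs[thread_index] = STAGE_ENTERED;
    let Some(&value) = input.get(thread_index) else {
        return; // Out of range: the breadcrumb stays at STAGE_ENTERED.
    };
    breadcrumbs[thread_index] = STAGE_LOADED;
    let _result = value.wrapping_mul(3);
    breadcrumbs[thread_index] = STAGE_DONE;
}

/// Host side: lists (thread index, last stage) for threads that did not finish.
fn stuck_threads(breadcrumbs: &[u32]) -> Vec<(usize, u32)> {
    breadcrumbs
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, stage)| stage != STAGE_DONE)
        .collect()
}
```

A slot that is still zero after the dispatch means the thread never wrote anything, which by itself narrows down where to look.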