
@ErichDonGubler
Member

@ErichDonGubler ErichDonGubler commented Oct 23, 2025

Connections

Description

For cases where a buffer is mapped_at_creation, our current implementation of Buffer::create initializes the buffer's internal state with BufferMapState::Init when the descriptor requests MAP_READ. Init holds a staging buffer under the hood, whose contents are later copied to a host-backed buffer. MAP_WRITE works a little differently, starting with a host-backed buffer from the beginning.

Init does a buffer copy between the staging buffer and the host-backed buffer in the device's queue when the buffer is unmapped. However, Buffer::map_async (correctly) assumes that a host-backed buffer need not wait for anything in the queue. This results in a bug where map_async doesn't actually wait long enough for the device queue to complete its operations before resolving. Oops!

Up to the point where a buffer is unmapped after being mapped at creation, MAP_READ, MAP_WRITE, and even non-MAP_* buffers' capabilities are the same. That is, we should be able to get mutable slices for mapped ranges, no matter what. So, make MAP_READ just initialize its internal state in the same way as with MAP_WRITE.
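To make the failure mode concrete, here is a minimal toy model in Rust. It is a sketch under stated assumptions, not wgpu-core's code: `ToyQueue`, `ToyBuffer`, and `MapState` are hypothetical stand-ins, and the device queue is modeled as a list of copies that only land on `submit`.

```rust
/// Hypothetical stand-in for the map states discussed above.
#[derive(Debug)]
enum MapState {
    /// Mapped through a separate staging allocation; needs a queued copy on unmap.
    Init { staging: Vec<u8> },
    /// Mapped directly on the host-backed allocation.
    Active,
    /// Not mapped.
    Idle,
}

/// Toy device queue: copies are recorded on unmap and only land on `submit`.
#[derive(Default)]
struct ToyQueue {
    pending: Vec<(usize, Vec<u8>)>, // (buffer id, bytes to copy)
}

struct ToyBuffer {
    id: usize,
    host: Vec<u8>,
    state: MapState,
}

impl ToyBuffer {
    /// Creates a buffer that is mapped at creation, with or without staging.
    fn new_mapped(id: usize, size: usize, use_staging: bool) -> Self {
        let state = if use_staging {
            MapState::Init { staging: vec![0; size] }
        } else {
            MapState::Active
        };
        ToyBuffer { id, host: vec![0; size], state }
    }

    /// Returns the mutable slice the caller writes into while mapped.
    fn mapped_mut(&mut self) -> &mut [u8] {
        match &mut self.state {
            MapState::Init { staging } => staging.as_mut_slice(),
            MapState::Active => self.host.as_mut_slice(),
            MapState::Idle => panic!("buffer is not mapped"),
        }
    }

    /// Unmaps the buffer; the staging path defers its copy to the queue.
    fn unmap(&mut self, queue: &mut ToyQueue) {
        if let MapState::Init { staging } = std::mem::replace(&mut self.state, MapState::Idle) {
            queue.pending.push((self.id, staging));
        }
    }

    /// Models a read mapping that resolves without waiting on the queue.
    fn read_without_waiting(&self) -> &[u8] {
        &self.host
    }
}

impl ToyQueue {
    /// Executes all pending copies into their destination buffers.
    fn submit(&mut self, buffers: &mut [ToyBuffer]) {
        for (id, bytes) in self.pending.drain(..) {
            if let Some(buffer) = buffers.iter_mut().find(|b| b.id == id) {
                buffer.host.copy_from_slice(&bytes);
            }
        }
    }
}
```

In this model, a buffer created with `use_staging = true` still reads back zeros after `unmap` until `submit` runs, while a buffer created with `use_staging = false` sees its data immediately. That mirrors the motivation for starting both map usages in the same state.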

Testing

Added webgpu:api,operation,buffers,map:mapAsync,read:*, which is expected to fail in the first commit, and is resolved with the second commit.

Squash or Rebase?

Rebase.

@ErichDonGubler ErichDonGubler changed the title from "WIP: fix(core): use BufferMapState::Active for any BufferUsages::MAP_* flags" to "fix(core): use BufferMapState::Active for any BufferUsages::MAP_* flags" Oct 23, 2025
@ErichDonGubler
Member Author

ErichDonGubler commented Oct 24, 2025

Hmmm…I'm not immediately sure how to fix the CI errors, which all seem to be on GL backends (incl. WebGL). I'm not familiar with GL backends, so I'm sure I'm doing something wrong with bookkeeping there, but I don't know what that might be.

@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch 2 times, most recently from 60bd075 to 10d652f Compare October 24, 2025 22:04
@ErichDonGubler ErichDonGubler marked this pull request as ready for review October 24, 2025 22:04
@ErichDonGubler
Member Author

Going to mark this as ready for review. I'm not sure how to resolve CI yet, but at least the approach can be validated, and maybe a reviewer will know more than I do about how to resolve this. 😖

@teoxoy teoxoy self-assigned this Oct 27, 2025
@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch from 10d652f to 146f37e Compare October 27, 2025 17:46

@ErichDonGubler
Member Author

ErichDonGubler commented Oct 27, 2025

Hmm, some WebGL tests failed with ac38c82 with recursive snatch lock errors, but I'm not sure that they're related to this PR.

Dump of the error log:
failures:

---- shadow::test_webgl output ----
    info output:
        Testing using adapter: AdapterInfo {
            name: "ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero) (0x0000C0DE)), SwiftShader driver)",
            vendor: 0,
            device: 0,
            device_type: Cpu,
            device_pci_bus_id: "",
            driver: "",
            driver_info: "WebGL 2.0 (OpenGL ES 3.0 Chromium)",
            backend: Gl,
            transient_saves_memory: false,
        }
        TEST: shadow
        Only vertex shader is present. Creating an empty fragment shader
    
    error output:
        panicked at wgpu-hal/src/gles/device.rs:1264:44:
        called `Option::unwrap()` on a `None` value
        
        Stack:
        
        Error
            at http://127.0.0.1:41349/wasm-bindgen-test:1774:21
            at logError (http://127.0.0.1:41349/wasm-bindgen-test:15:18)
            at imports.wbg.__wbg_new_f346c2f0d1ef8376 (http://127.0.0.1:41349/wasm-bindgen-test:1773:66)
            at wgpu_examples-a17d3159d541b1d4.wasm.__wbg_new_f346c2f0d1ef8376 externref shim (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[55624]:0x1073f48)
            at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::Error::new::hf74adf7e0df59c0e (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[33682]:0xf2bc30)
            at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::h2b93081e962d06e3 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[11193]:0xb7d91f)
            at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::{{closure}}::{{closure}}::hfdb32d2f4b85e1d4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[31010]:0xeec89e)
            at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::rust_panic_with_hook::hc276d0501ad5b954 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[17093]:0xcf4680)
            at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::begin_panic_handler::{{closure}}::h23ff416a921468b4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[22541]:0xdeb832)
            at wgpu_examples-a17d3159d541b1d4.wasm.std::sys::backtrace::__rust_end_short_backtrace::h16ab72765b32282d (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[56647]:0x1077648)
        
        
        at logError (http://127.0.0.1:41349/wasm-bindgen-test:15:18)
        at imports.wbg.__wbg_new_f346c2f0d1ef8376 (http://127.0.0.1:41349/wasm-bindgen-test:1773:66)
        at wgpu_examples-a17d3159d541b1d4.wasm.__wbg_new_f346c2f0d1ef8376 externref shim (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[55624]:0x1073f48)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::Error::new::hf74adf7e0df59c0e (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[33682]:0xf2bc30)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::h2b93081e962d06e3 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[11193]:0xb7d91f)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::{{closure}}::{{closure}}::hfdb32d2f4b85e1d4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[31010]:0xeec89e)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::rust_panic_with_hook::hc276d0501ad5b954 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[17093]:0xcf4680)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::begin_panic_handler::{{closure}}::h23ff416a921468b4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[22541]:0xdeb832)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::sys::backtrace::__rust_end_short_backtrace::h16ab72765b32282d (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[56647]:0x1077648)
    
    
    panicked at wgpu-core/src/device/resource.rs:1495:59:
    thread '<unnamed>' attempted to acquire a snatch lock recursively.
    - Currently trying to acquire a read lock at wgpu-core/src/device/resource.rs:1495:59
    disabled backtrace
    - Previously acquired a read lock at wgpu-core/src/device/resource.rs:2958:49
    disabled backtrace
    
    Stack:
    
    Error
        at http://127.0.0.1:41349/wasm-bindgen-test:1774:21
        at logError (http://127.0.0.1:41349/wasm-bindgen-test:15:18)
        at imports.wbg.__wbg_new_f346c2f0d1ef8376 (http://127.0.0.1:41349/wasm-bindgen-test:1773:66)
        at wgpu_examples-a17d3159d541b1d4.wasm.__wbg_new_f346c2f0d1ef8376 externref shim (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[55624]:0x1073f48)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::Error::new::hf74adf7e0df59c0e (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[33682]:0xf2bc30)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::h2b93081e962d06e3 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[11193]:0xb7d91f)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::{{closure}}::{{closure}}::hfdb32d2f4b85e1d4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[31010]:0xeec89e)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::rust_panic_with_hook::hc276d0501ad5b954 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[17093]:0xcf4680)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::begin_panic_handler::{{closure}}::h23ff416a921468b4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[22541]:0xdeb803)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::sys::backtrace::__rust_end_short_backtrace::h16ab72765b32282d (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[56647]:0x1077648)
    
    
    panicked at wgpu-hal/src/gles/device.rs:1264:44:
    called `Option::unwrap()` on a `None` value
    
    Stack:
    
    Error
        at http://127.0.0.1:41349/wasm-bindgen-test:1774:21
        at logError (http://127.0.0.1:41349/wasm-bindgen-test:15:18)
        at imports.wbg.__wbg_new_f346c2f0d1ef8376 (http://127.0.0.1:41349/wasm-bindgen-test:1773:66)
        at wgpu_examples-a17d3159d541b1d4.wasm.__wbg_new_f346c2f0d1ef8376 externref shim (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[55624]:0x1073f48)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::Error::new::hf74adf7e0df59c0e (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[33682]:0xf2bc30)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::panic_handling::h2b93081e962d06e3 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[11193]:0xb7d91f)
        at wgpu_examples-a17d3159d541b1d4.wasm.wasm_bindgen_test::__rt::Context::new::{{closure}}::{{closure}}::hfdb32d2f4b85e1d4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[31010]:0xeec89e)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::rust_panic_with_hook::hc276d0501ad5b954 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[17093]:0xcf4680)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::panicking::begin_panic_handler::{{closure}}::h23ff416a921468b4 (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[22541]:0xdeb832)
        at wgpu_examples-a17d3159d541b1d4.wasm.std::sys::backtrace::__rust_end_short_backtrace::h16ab72765b32282d (http://127.0.0.1:41349/wasm-bindgen-test_bg.wasm:wasm-function[56647]:0x1077648)

[2025-10-27T19:18:32Z DEBUG ureq::response] Body entirely buffered (length: 70)
[2025-10-27T19:18:32Z DEBUG ureq::pool] adding stream to pool: http|127.0.0.1|43809 -> Stream(TcpStream { addr: 127.0.0.1:52486, peer: 127.0.0.1:43809, fd: 4 })
[2025-10-27T19:18:32Z DEBUG ureq::unit] response 200 to DELETE http://127.0.0.1:43809/session/2a28b0a47c449b44d9c0ba6084ab26e6/window
[2025-10-27T19:18:32Z DEBUG wasm_bindgen_test_runner::headless] got: {"sessionId":"2a28b0a47c449b44d9c0ba6084ab26e6","status":0,"value":[]}
[2025-10-27T19:18:32Z DEBUG ureq::stream] dropping stream: Stream(TcpStream { addr: 127.0.0.1:52486, peer: 127.0.0.1:43809, fd: 4 })
Error: some tests failed

I guess we'll see if they reproduce with 34a3316, which should be the same repository state.

EDIT: It does. Ugh. 😩

@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch from 34a3316 to 684859e Compare October 28, 2025 15:28
@teoxoy
Member

teoxoy commented Oct 30, 2025

I think the WASM/WebGL failure is due to the GL backend assuming it won't see a buffer with MAP_READ & MAP_WRITE usages at the same time.

    if emulate_map && desc.usage.intersects(wgt::BufferUses::MAP_WRITE) {
        return Ok(super::Buffer {
            raw: None,
            target,
            size: desc.size,
            map_flags: 0,
            data: Some(Arc::new(MaybeMutex::new(vec![0; desc.size as usize]))),
            offset_of_current_mapping: Arc::new(MaybeMutex::new(0)),
        });
    }

This block shouldn't early return if the buffer has MAP_READ usage. We still need to create the raw GL buffer in that case.
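As a rough illustration of that suggestion, the early-return condition could be narrowed as in the sketch below. This is a self-contained toy, not the gles backend's code: `Uses` is a hypothetical, simplified stand-in for the real usage flags, and `can_skip_raw_buffer` is an assumed helper name.

```rust
/// Hypothetical, simplified usage flags (not the real `wgt::BufferUses` type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Uses(u32);

impl Uses {
    const MAP_READ: Uses = Uses(1 << 0);
    const MAP_WRITE: Uses = Uses(1 << 1);

    /// Combines two sets of flags.
    const fn or(self, other: Uses) -> Uses {
        Uses(self.0 | other.0)
    }

    /// True if any flag in `other` is also set in `self`.
    const fn intersects(self, other: Uses) -> bool {
        (self.0 & other.0) != 0
    }
}

/// Assumed helper: the host-only early return is taken only for emulated,
/// write-mapped buffers that are NOT also read-mapped. Any buffer with
/// MAP_READ falls through and still gets a raw GL buffer.
fn can_skip_raw_buffer(emulate_map: bool, usage: Uses) -> bool {
    emulate_map
        && usage.intersects(Uses::MAP_WRITE)
        && !usage.intersects(Uses::MAP_READ)
}
```

With this predicate, `Uses::MAP_WRITE` alone still takes the host-only path under emulation, while `Uses::MAP_READ.or(Uses::MAP_WRITE)` does not.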

@jimblandy
Member

@cwfitzgerald @magcius This PR adds MAP_WRITE usage to any hal buffer that is created with mapped_at_creation at the wgpu_core level, rather than using a temporary buffer for the initialization. This means that we're adding MAP_WRITE for the lifetime of the buffer, in order to initialize it more efficiently. Is that going to affect performance?

@ErichDonGubler
Member Author

> @cwfitzgerald @magcius This PR adds MAP_WRITE usage to any hal buffer that is created with mapped_at_creation at the wgpu_core level, rather than using a temporary buffer for the initialization. This means that we're adding MAP_WRITE for the lifetime of the buffer, in order to initialize it more efficiently. Is that going to affect performance?

I feel like we should be able to recover the performance here by changing HAL usages after unmapping, but I don't know enough to determine if that's allowed. Is that possible, @teoxoy?

@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch 2 times, most recently from ca86fa8 to 1ee2372 Compare October 30, 2025 15:11
@cwfitzgerald
Member

As long as it's only adding map usages to cpu-resident buffers, which it appears to be, this is totally fine. Buffer usages only matter in so much as they determine which heap to put the buffer on.

@jimblandy
Member

jimblandy commented Oct 30, 2025

> I feel like we should be able to recover the performance here by changing HAL usages after unmapping

Generally you can't change usages once the buffer is created, because they determine what sort of memory the buffer is in. You need to create a new buffer.

I guess this is one effect of using the temporary buffer: it decoupled the requirements for initialization from the requirements for use.

@jimblandy
Member

> As long as it's only adding map usages to cpu-resident buffers, which it appears to be, this is totally fine. Buffer usages only matter in so much as they determine which heap to put the buffer on.

Isn't MAP_WRITE used to determine whether a buffer's contents need to be copied to the GPU when it is unmapped?

@jimblandy
Member

I don't quite understand what this is talking about, but it seems like it's using MAP_WRITE to guide our behavior over the course of the buffer's lifetime.

    // If this is a read mapping, ideally we would use a `clear_buffer` command
    // before reading the data from GPU (i.e. `invalidate_range`). However, this
    // would require us to kick off and wait for a command buffer or piggy back
    // on an existing one (the later is likely the only worthwhile option). As
    // reading uninitialized memory isn't a particular important path to
    // support, we instead just initialize the memory here and make sure it is
    // GPU visible, so this happens at max only once for every buffer region.
    //
    // If this is a write mapping zeroing out the memory here is the only
    // reasonable way as all data is pushed to GPU anyways.

@jimblandy
Member

wgpu_hal::vulkan uses MAP_WRITE to select gpu_alloc::UsageFlags::UPLOAD, whose docs suggest that it might affect GPU/CPU caching properties. I have no idea how much this matters.

https://github.com/zakarumych/gpu-alloc/blob/649ca44f9d403f65b450185ca23627cd7da817f9/gpu-alloc/src/usage.rs#L21-L31

        /// Hints allocator that memory will be used for data downloading.
        /// Allocator will strongly prefer host-cached memory.
        /// Implies `HOST_ACCESS` flag.
        const DOWNLOAD = 0x04;

        /// Hints allocator that memory will be used for data uploading.
        /// If `DOWNLOAD` flag is not set then allocator will assume that
        /// host will access memory in write-only manner and may
        /// pick not host-cached.
        /// Implies `HOST_ACCESS` flag.
        const UPLOAD = 0x08;
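To show why adding a map usage can change the allocator's choice, here is a small sketch. It is not the wgpu_hal::vulkan code: only the two bit values are taken from the excerpt above, while `alloc_hints` and `may_pick_uncached` are hypothetical helpers that encode what the quoted doc comments say.

```rust
// Bit values copied from the gpu-alloc excerpt quoted above.
const DOWNLOAD: u8 = 0x04;
const UPLOAD: u8 = 0x08;

/// Hypothetical helper: derive allocator hints from a buffer's map usages.
fn alloc_hints(map_read: bool, map_write: bool) -> u8 {
    let mut hints = 0;
    if map_read {
        hints |= DOWNLOAD;
    }
    if map_write {
        hints |= UPLOAD;
    }
    hints
}

/// Per the quoted docs, the allocator may pick memory that is not
/// host-cached when `UPLOAD` is set and `DOWNLOAD` is not.
fn may_pick_uncached(hints: u8) -> bool {
    (hints & UPLOAD) != 0 && (hints & DOWNLOAD) == 0
}
```

Under these assumptions, adding write mapping to a read-mapped buffer keeps `DOWNLOAD` set, so the host-cached preference is unchanged. Adding it to a buffer with no map usage at all is the case where the hints change.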

@jimblandy
Member

jimblandy commented Oct 30, 2025

wgpu_hal::metal uses MAP_WRITE to decide whether to set MTLResourceOptions::CPUCacheModeWriteCombined:

options.set(MTLResourceOptions::CPUCacheModeWriteCombined, map_write);

I will defer to people with more experience than I have. But it doesn't seem like MAP_WRITE is something we can just set on hal buffers for convenience.

@cwfitzgerald
Member

It sounds like we need to disconnect the buffer usages from the memory heap decision. Hal should expose some way for wgpu-core to tell it about what heap properties it should have, and then core decides based on the user's usages.
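One possible shape for that separation is sketched below. Everything here is hypothetical: `HeapHint`, `UserUsages`, `HalBufferDesc`, and both functions are invented for illustration and do not exist in wgpu-hal or wgpu-core. The point is only that the heap hint is derived from the user's usages, while host write access for initialization is tracked separately.

```rust
/// Hypothetical heap preference that core would pass down to hal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HeapHint {
    DeviceLocal,
    HostUpload,
    HostReadback,
}

/// Hypothetical summary of the map usages the user asked for.
#[derive(Debug, Clone, Copy)]
struct UserUsages {
    map_read: bool,
    map_write: bool,
}

/// Hypothetical hal-level descriptor with the heap decision made explicit.
#[derive(Debug)]
struct HalBufferDesc {
    size: u64,
    host_writable: bool,
    heap: HeapHint,
}

/// Core picks the heap from the user's usages only.
fn heap_hint_for(user: UserUsages) -> HeapHint {
    if user.map_read {
        HeapHint::HostReadback
    } else if user.map_write {
        HeapHint::HostUpload
    } else {
        HeapHint::DeviceLocal
    }
}

/// Host write access may be added for initialization of a read-mapped
/// buffer without changing the heap hint.
fn hal_desc_for(size: u64, user: UserUsages, mapped_at_creation: bool) -> HalBufferDesc {
    HalBufferDesc {
        size,
        host_writable: user.map_write || (mapped_at_creation && user.map_read),
        heap: heap_hint_for(user),
    }
}
```

In this sketch, a `MAP_READ` buffer that is mapped at creation becomes host-writable for initialization but keeps the `HostReadback` hint, so the extra access does not leak into memory placement.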

@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch 2 times, most recently from 08562e9 to 50a7936 Compare October 31, 2025 15:11
@ErichDonGubler
Member Author

After doing some pairing with @teoxoy and @cwfitzgerald yesterday, we determined that WASM-on-WebGL wasn't working because the emulated mapping workaround in gles was (incorrectly) not flushing the contents of its host-backed Vec<u8> to its corresponding GL runtime buffer. This hadn't been exposed before. I'm going to work on an abstraction that will make these cases clearer, as a dependency of this PR.
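A toy model of the missing flush, for readers unfamiliar with the emulated path. This is a sketch, not the gles backend: `EmulatedBuffer` is a hypothetical type, and the `raw` field is a plain `Vec<u8>` standing in for the GL buffer's storage.

```rust
/// Hypothetical emulated-mapping buffer: writes go to a host-side shadow
/// copy, and `raw` stands in for the GL runtime buffer's storage.
struct EmulatedBuffer {
    shadow: Vec<u8>,
    raw: Vec<u8>,
}

impl EmulatedBuffer {
    fn new(size: usize) -> Self {
        EmulatedBuffer { shadow: vec![0; size], raw: vec![0; size] }
    }

    /// The slice handed out while the buffer is mapped.
    fn mapped_mut(&mut self) -> &mut [u8] {
        &mut self.shadow
    }

    /// Unmaps the buffer. With `flush == false` this models the behavior
    /// described above: the shadow copy never reaches the raw buffer.
    fn unmap(&mut self, flush: bool) {
        if flush {
            self.raw.copy_from_slice(&self.shadow);
        }
    }

    /// What later GPU-side operations on the raw buffer would observe.
    fn gpu_contents(&self) -> &[u8] {
        &self.raw
    }
}
```

Writing through `mapped_mut` and then calling `unmap(false)` leaves `gpu_contents()` at its initial zeros, while `unmap(true)` makes the written bytes visible.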

@ErichDonGubler ErichDonGubler marked this pull request as draft October 31, 2025 15:34
@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch from 50a7936 to 5bdbfea Compare October 31, 2025 19:47
… flags

For cases where a buffer is `mapped_at_creation`, our current
implementation of `Buffer::create` initializes the buffer's internal
state with `BufferMapState::Init` when the descriptor requests
`MAP_READ`. `Init` holds a staging buffer under the hood, whose contents
are later copied to a host-backed buffer. `MAP_WRITE` works a little
differently, starting with a host-backed buffer from the beginning.

`Init` does a buffer copy between the staging buffer and the host-backed
buffer in the device's queue when the buffer is `unmap`ped. However,
`Buffer::map_async` (correctly) assumes that a host-backed buffer need
not wait for anything in the queue. This results in a bug where
`map_async` doesn't actually wait long enough for the device queue to
complete its operations before resolving. Oops!

Up to the point where a buffer is unmapped after being mapped at
creation, `MAP_READ` and `MAP_WRITE` buffers' capabilities are the same.
That is, we should be able to get mutable slices for mapped ranges, no
matter what. So, make `MAP_READ` just initialize its internal state in
the same way as with `MAP_WRITE`.
@ErichDonGubler ErichDonGubler force-pushed the erichdongubler-push-dazed-adept-horse branch from 5bdbfea to bf25b89 Compare October 31, 2025 19:52


Development

Successfully merging this pull request may close these issues.

Read-mapped buffer content incorrect unless a submission happens between write and read

4 participants