Add create_*_pipeline_async()
#3794
Comments
How would we go about implementing this on vk/dx12/metal backends? A dedicated OS thread for each pipeline?
The best solution would probably require wgpu to use a thread pool, since spawning OS threads for individual jobs like that might have some decent overhead. If so, I'd definitely prefer it if wgpu had a way to disable its own thread pool. My understanding is that this is an API that is mostly useful for browser usage, since WebGPU in a browser doesn't yet have any way to do multithreading. In a native context I should just be able to call the blocking `create_render_pipeline` from a thread of my own.
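A minimal sketch of that userspace approach, assuming only `std` and wgpu's existing blocking creation call; the helper name is illustrative, not part of any API:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: run a blocking creation job on a fresh OS thread and
// hand the result back through a channel. The caller builds the descriptor
// inside `job`, so no non-'static borrows have to cross the thread boundary.
fn spawn_blocking_creation<T, F>(job: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The blocking wgpu call, e.g. `device.create_render_pipeline(&desc)`,
        // runs here, off the caller's thread.
        let _ = tx.send(job());
    });
    rx
}
```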
How is this function implemented in the browser? Do they use thread pools for this?
I wrote the original issue simply as “here is a discrepancy with the spec” without further thought, but here are some further thoughts: My use of … This together with @PJB3005's point suggests that the solution here and for similar situations might be for … Certainly this could be done by a layer on top of …
These would be very useful for us; we bulk-create a lot of pipelines, and it's a significant cost on some machines to do that concurrently - especially with the WebGL backend. I've been tempted to try to write a hypothetical `DeviceExt::create_render_pipeline_bulk()` method (see the sketch below), but async would solve it much better.
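A sketch of what that hypothetical bulk method could look like on native, assuming scoped `std` threads and one thread per pipeline; the trait and method names are illustrative, not part of wgpu:

```rust
use std::thread;

// Hypothetical extension trait, not part of wgpu: create many pipelines in
// parallel and return them in the same order as the descriptors.
trait DeviceBulkExt {
    fn create_render_pipeline_bulk(
        &self,
        descs: &[wgpu::RenderPipelineDescriptor],
    ) -> Vec<wgpu::RenderPipeline>;
}

impl DeviceBulkExt for wgpu::Device {
    fn create_render_pipeline_bulk(
        &self,
        descs: &[wgpu::RenderPipelineDescriptor],
    ) -> Vec<wgpu::RenderPipeline> {
        // Scoped threads can borrow `self` and `descs` without 'static bounds.
        thread::scope(|s| {
            let handles: Vec<_> = descs
                .iter()
                .map(|desc| s.spawn(move || self.create_render_pipeline(desc)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```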
We definitely don't want to utilize or make threads on our own. In line with the other async functions, a simple native implementation of this API would be calling the standard blocking `create_*_pipeline` functions. This is a hard decision to make, as it's very hard to paper over the differences between native and web.
So we've been musing about this same problem in the webgpu.h standardization meetings and we have come up with a possible solution that we're asking for feedback on. The rough C solution is here, but I will translate this to the rust api:

```rust
type Task = Box<dyn FnOnce() + Send>;
type TaskCallback = Box<dyn Fn(Task) + Send + Sync>;

// Maybe actually the device
struct InstanceDescriptor {
    ...
    // Callback which will be called when the implementation wants to do work on another thread.
    // If this is not provided, the implementation will not do any work on any threads.
    //
    // The callback will be called with the task that the runtime wants to do on a thread.
    // This task should be spawned onto a threadpool, immediately invoked inside the callback,
    // or otherwise made to execute.
    //
    // It should be assumed that the work spawned on this callback will be of substantial
    // duration (1ms+) and pure compute.
    task_executor: Option<TaskCallback>,
    ...
}

impl Device {
    // On webgpu, will call createRenderPipeline.
    //
    // On native:
    // - If allow_async is false, will create the render pipeline inside the call.
    // - If allow_async is true, the implementation is allowed (but not required) to spawn a
    //   job on the task callback to do the work of compilation, if such a callback exists.
    //   This leads to less predictable performance but increased overall performance, as
    //   compilation is parallelized.
    fn create_render_pipeline(&self, desc: RenderPipelineDescriptor, allow_async: bool) -> RenderPipeline;

    // On webgpu, will call createRenderPipelineAsync.
    //
    // On native:
    // - Spawn a job on the instance's `task_executor`, if it exists, to generate the pipeline.
    // - Otherwise, create the render pipeline inside the call.
    async fn create_render_pipeline_async(&self, desc: RenderPipelineDescriptor) -> RenderPipeline;
}
```

This api should allow people to use arbitrary integrations:

```rust
let desc = InstanceDescriptor {
    ...
    task_executor: Some(Box::new(|task| { tokio::task::spawn_blocking(task); })),
};
```

```rust
let desc = InstanceDescriptor {
    ...
    task_executor: Some(Box::new(|task| rayon::spawn(task))),
};
```

```rust
let my_fancy_threadpool_spawner: Arc<T> = ...;
let desc = InstanceDescriptor {
    ...
    task_executor: Some(Box::new(move |task| my_fancy_threadpool_spawner.spawn(task))),
};
```

Looking forward to people's thoughts on this. This kind of design will also open the door to other possible optimizations like this.
Why do we have to worry about this at all? Why can't the user just spawn a thread and do this themselves?
In other words - we have this thread-safe API, so the whole point should be to make the user deal with that stuff. @kpreid, can't you just write your own `create_*_pipeline_async`?
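A sketch of such a userspace `create_*_pipeline_async`, assuming the `futures` crate's oneshot channel; everything here is illustrative rather than an actual wgpu API:

```rust
use futures::channel::oneshot;
use std::thread;

// Hypothetical userspace analogue of create_*_pipeline_async, not part of
// wgpu: run any blocking creation job on its own thread and await the result.
async fn create_async<T, F>(job: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = oneshot::channel();
    thread::spawn(move || {
        // e.g. `device.create_render_pipeline(&desc)` runs here.
        let _ = tx.send(job());
    });
    rx.await.expect("worker thread exited without sending a result")
}
```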
There are a couple of considerations here that are pushing towards having internal handling: …

The question of "why do we care at all" is still a good one.
It pretty much boils down to the WASM implementations. For Jim's userspace solution to work on WASM: …

Additionally, on WASM, the extra thread needed to initiate that pipeline creation is useless - the actual parallelization is happening in the GPU process, so an extra JS thread is wasted overhead. And JS threads are very expensive compared to native threads. So it works fine if you have that thread already, but it's detrimental if you didn't actually need it. Hence I think it's best for the default experience on native to match the experience in JS (or other remoting implementations) closely where possible.
Okay - I understand what I wasn't getting before.
Right - a separate thread in the content process invites the use of a separate thread in the GPU process, but doesn't require it, so it's useless.
I like the general idea. How would a user of the API know when a task is done?
The user would need to have their own signalling as part of the function provided to the hook.
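For example - a sketch, not part of the proposal - the application's executor callback could wrap each task it receives to track completion, here with a simple outstanding-task counter and `rayon`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

type Task = Box<dyn FnOnce() + Send>;

// Hypothetical signalling layer: count outstanding tasks so the application
// can tell when every job handed to the hook has finished.
let outstanding = Arc::new(AtomicUsize::new(0));
let counter = Arc::clone(&outstanding);
let task_executor: Box<dyn Fn(Task) + Send + Sync> = Box::new(move |task| {
    counter.fetch_add(1, Ordering::SeqCst);
    let done = Arc::clone(&counter);
    rayon::spawn(move || {
        task(); // run the runtime's compilation job
        done.fetch_sub(1, Ordering::SeqCst); // signal completion
    });
});
// Elsewhere, `outstanding.load(Ordering::SeqCst) == 0` means all tasks are done.
```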
Just to add a reason for why this is needed on native: Metal is weird and will block on creating pipelines unless you pass in a callback at pipeline creation time. (bevy's async pipeline compilation ran into this with wgpu's existing blocking `create_render_pipeline()`.)
The original issue text:

The WebGPU specification includes `GPUDevice.createComputePipelineAsync()` and `GPUDevice.createRenderPipelineAsync()`, and their use is recommended to avoid blocking upon pipeline creation. There are currently no corresponding functions in `wgpu`; presumably there should be.
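A hedged sketch of what the corresponding signatures might look like in `wgpu`, following its existing naming conventions and the same pseudocode style as the proposal above; these functions do not exist today:

```rust
// Hypothetical future wgpu API surface mirroring the WebGPU spec methods.
impl Device {
    async fn create_compute_pipeline_async(&self, desc: &ComputePipelineDescriptor<'_>) -> ComputePipeline;
    async fn create_render_pipeline_async(&self, desc: &RenderPipelineDescriptor<'_>) -> RenderPipeline;
}
```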