Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slang-RHI/WebGPU: Binding missing from bind group layout #5222

Closed
aleino-nv opened this issue Oct 3, 2024 · 31 comments · Fixed by #5500
Closed

slang-RHI/WebGPU: Binding missing from bind group layout #5222

aleino-nv opened this issue Oct 3, 2024 · 31 comments · Fixed by #5500
Assignees
Labels
goal:forward looking Feature needed at a later date, not connected to a specific use case. kind:enhancement a desirable new feature, option, or behavior siggraphasia-2024

Comments

@aleino-nv
Copy link
Collaborator

aleino-nv commented Oct 3, 2024

This issue was split from #5172.

Affected tests under tests/compute (currently disabled):

  • Category 1: Binding doesn't exist in...
    • column-major
    • constant-buffer-memory-packing
    • func-cbuffer-param
    • parameter-block
    • non-square-column-major
  • Category 2: Number of entries X did not match the expected number of entries Y...
    • kernel-context-threading
    • matrix-layout (.hlsl)
    • non-square-row-major
    • row-major

Tasks:

  • Investigate and fix the issue(s)
  • Try to enable the tests listed above
@aleino-nv aleino-nv added kind:enhancement a desirable new feature, option, or behavior goal:forward looking Feature needed at a later date, not connected to a specific use case. labels Oct 3, 2024
@aleino-nv aleino-nv self-assigned this Oct 3, 2024
@bmillsNV bmillsNV added this to the Q4 2024 (Fall) milestone Oct 3, 2024
@aleino-nv
Copy link
Collaborator Author

Category 1 details

Output from slang-test, for column-major:

...
Warning: Old OnSubmittedWorkDone APIs are deprecated. If using C please pass a CallbackInfo struct that has two userdatas. Otherwise, if using C++, please use templated helpers.
WGPU error: Binding doesn't exist in [BindGroupLayout (unlabeled)].
 - While validating that the entry-point's declaration for @group(0) @binding(0) matches [BindGroupLayout (unlabeled)]
 - While validating the entry-point's compatibility for group 0 with [BindGroupLayout (unlabeled)]
 - While validating compute stage ([ShaderModule (unlabeled)], entryPoint: computeMain).
 - While calling [Device].CreateComputePipeline([ComputePipelineDescriptor]).

Warning: Old MapAsync APIs are deprecated. If using C please pass a CallbackInfo struct that has two userdatas. Otherwise, if using C++, please use templated helpers.
WGPU error: Number of entries (2) did not match the expected number of entries (1) for [BindGroupLayout (unlabeled)].
Expected layout: [{ binding: 1, visibility: ShaderStage::(Fragment|Compute), buffer: {type: BufferBindingType::Storage, minBindingSize: 0, hasDynamicOffset: 0} }]
 - While validating [BindGroupDescriptor] against [BindGroupLayout (unlabeled)]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

WGPU error: Recording in [CommandEncoder (unlabeled)] which is locked while [ComputePassEncoder (unlabeled)] is open.
 - While encoding [CommandEncoder (unlabeled)].CopyBufferToBuffer([Buffer (unlabeled)], 0, [Buffer (unlabeled)], 0, 64).
 - While finishing [CommandEncoder (unlabeled)].

WGPU error: [Invalid CommandBuffer] is invalid.
 - While calling [Queue].Submit([[Invalid CommandBuffer]])
...

Generated WGSL for slangc -o output.wgsl -target wgsl -stage compute -entry computeMain %SLANG_SRC_DIR%\tests\compute\column-major.slang :

struct _MatrixStorage_float4x4_ColMajorstd140_0
{
    @align(16) data_0 : array<vec4<f32>, i32(4)>,
};

@binding(0) @group(0) var<uniform> matrixBuffer_0 : _MatrixStorage_float4x4_ColMajorstd140_0;
@binding(1) @group(0) var<storage, read_write> output_0 : array<u32>;

fn unpackStorage_0( _S1 : _MatrixStorage_float4x4_ColMajorstd140_0) -> mat4x4<f32>
{
    return mat4x4<f32>(_S1.data_0[i32(0)][i32(0)], _S1.data_0[i32(1)][i32(0)], _S1.data_0[i32(2)][i32(0)], _S1.data_0[i32(3)][i32(0)], _S1.data_0[i32(0)][i32(1)], _S1.data_0[i32(1)][i32(1)], _S1.data_0[i32(2)][i32(1)], _S1.data_0[i32(3)][i32(1)], _S1.data_0[i32(0)][i32(2)], _S1.data_0[i32(1)][i32(2)], _S1.data_0[i32(2)][i32(2)], _S1.data_0[i32(3)][i32(2)], _S1.data_0[i32(0)][i32(3)], _S1.data_0[i32(1)][i32(3)], _S1.data_0[i32(2)][i32(3)], _S1.data_0[i32(3)][i32(3)]);
}

fn floatCheck_0( data_1 : f32,  valueToCheckFor_0 : f32) -> bool
{
    var _S2 : bool;
    if(data_1 < valueToCheckFor_0 + 0.00100000004749745f)
    {
        _S2 = data_1 > valueToCheckFor_0 - 0.00100000004749745f;
    }
    else
    {
        _S2 = false;
    }
    return _S2;
}

@compute
@workgroup_size(1, 1, 1)
fn computeMain(@builtin(global_invocation_id) tid_0 : vec3<u32>)
{
    var _S3 : mat4x4<f32> = unpackStorage_0(matrixBuffer_0);
    var r_0 : vec4<f32> = (((_S3) * (vec4<f32>(1.0f, 2.0f, 3.0f, 1.0f))));
    var M2_0 : mat4x4<f32> = (((_S3) * (_S3)));
    var M3_0 : mat4x4<f32> = mat4x4<f32>(1.0f, 0.0f, 0.0f, 10.0f, 0.0f, 1.0f, 0.0f, 20.0f, 0.0f, 0.0f, 1.0f, 30.0f, 0.0f, 0.0f, 0.0f, 1.0f);
    var _S4 : bool;
    if(floatCheck_0(r_0.x, 11.0f))
    {
        _S4 = floatCheck_0(r_0.y, 22.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(r_0.z, 33.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(r_0.w, 1.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(_S3[i32(3)][i32(0)], 10.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(M2_0[i32(3)][i32(0)], 20.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(f32(M2_0[i32(3)][i32(0)]), 20.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(M2_0[i32(3)][i32(0)], 20.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(M2_0[i32(2)][i32(2)], 1.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(M2_0[i32(3)][i32(1)], 40.0f);
    }
    else
    {
        _S4 = false;
    }
    if(_S4)
    {
        _S4 = floatCheck_0(M3_0[i32(0)][i32(3)], 10.0f);
    }
    else
    {
        _S4 = false;
    }
    output_0[i32(0)] = u32(_S4);
    return;
}

@aleino-nv
Copy link
Collaborator Author

Category 2 details

Output from slang-test, for kernel-context-threading:

...
WGPU error: Binding doesn't exist in [BindGroupLayout (unlabeled)].
 - While validating that the entry-point's declaration for @group(0) @binding(0) matches [BindGroupLayout (unlabeled)]
 - While validating the entry-point's compatibility for group 0 with [BindGroupLayout (unlabeled)]
 - While validating compute stage ([ShaderModule (unlabeled)], entryPoint: computeMain).
 - While calling [Device].CreateComputePipeline([ComputePipelineDescriptor]).
...

WGSL for kernel-context-threading:

struct _MatrixStorage_float4x4_ColMajorstd140_0
{
    @align(16) data_0 : array<vec4<f32>, i32(4)>,
};

@binding(0) @group(0) var<uniform> matrixBuffer_0 : _MatrixStorage_float4x4_ColMajorstd140_0;
@binding(1) @group(0) var<storage, read_write> rowOrderMatrixOutput_0 : array<f32>;

fn unpackStorage_0( _S1 : _MatrixStorage_float4x4_ColMajorstd140_0) -> mat4x4<f32>
{
    return mat4x4<f32>(_S1.data_0[i32(0)][i32(0)], _S1.data_0[i32(1)][i32(0)], _S1.data_0[i32(2)][i32(0)], _S1.data_0[i32(3)][i32(0)], _S1.data_0[i32(0)][i32(1)], _S1.data_0[i32(1)][i32(1)], _S1.data_0[i32(2)][i32(1)], _S1.data_0[i32(3)][i32(1)], _S1.data_0[i32(0)][i32(2)], _S1.data_0[i32(1)][i32(2)], _S1.data_0[i32(2)][i32(2)], _S1.data_0[i32(3)][i32(2)], _S1.data_0[i32(0)][i32(3)], _S1.data_0[i32(1)][i32(3)], _S1.data_0[i32(2)][i32(3)], _S1.data_0[i32(3)][i32(3)]);
}

fn writeRow2_0( v_0 : vec4<f32>,  rowIndex_0 : i32)
{
    var baseIndex_0 : i32 = rowIndex_0 * i32(4);
    rowOrderMatrixOutput_0[baseIndex_0] = v_0.x;
    rowOrderMatrixOutput_0[baseIndex_0 + i32(1)] = v_0.y;
    rowOrderMatrixOutput_0[baseIndex_0 + i32(2)] = v_0.z;
    rowOrderMatrixOutput_0[baseIndex_0 + i32(3)] = v_0.w;
    return;
}

fn writeRow_0( v_1 : vec4<f32>,  rowIndex_1 : i32)
{
    writeRow2_0(v_1, rowIndex_1);
    return;
}

@compute
@workgroup_size(1, 1, 1)
fn computeMain(@builtin(global_invocation_id) tid_0 : vec3<u32>)
{
    var _S2 : mat4x4<f32> = unpackStorage_0(matrixBuffer_0);
    writeRow_0(_S2[i32(0)], i32(0));
    writeRow_0(_S2[i32(1)], i32(1));
    writeRow_0(_S2[i32(2)], i32(2));
    writeRow_0(_S2[i32(3)], i32(3));
    return;
}

@aleino-nv
Copy link
Collaborator Author

I noticed that an assert is triggering in Slang-RHI (for parameter-block, maybe others as well):

https://github.com/shader-slang/slang-rhi/blob/84f8ce13f92e9538b2e15f03d6bdac5ad9af215a/src/wgpu/wgpu-shader-object.cpp#L663

@aleino-nv
Copy link
Collaborator Author

aleino-nv commented Oct 4, 2024

I noticed that an assert is triggering in Slang-RHI (for parameter-block, maybe others as well):

https://github.com/shader-slang/slang-rhi/blob/84f8ce13f92e9538b2e15f03d6bdac5ad9af215a/src/wgpu/wgpu-shader-object.cpp#L663

@skallweitNV Is this something you cant take a look at?

You can just run e.g. tests/compute/parameter-block.slang if you rebase on #5174.
To get WGSL output:

set TEST=parameter-block.slang

%BINDIR%\slangc.exe ^
-o %TEST%.wgsl ^
-target wgsl ^
-stage compute ^
-entry computeMain ^
%SLANG_WORKSPACE_DIRECTORY_PATH%\tests\compute\%TEST%

To run slang-test:

cd %SLANG_WORKSPACE_DIRECTORY_PATH% && ^
%BINDIR%\slang-test ^
-bindir %BINDIR% ^
-category full ^
-api wgpu ^
tests/compute/%TEST%

@aleino-nv aleino-nv changed the title slang-test/WebGPU: Binding missing from bind group layout slang-RHI/WebGPU: Binding missing from bind group layout Oct 4, 2024
@bmillsNV
Copy link
Collaborator

bmillsNV commented Oct 8, 2024

@aleino-nv can you help to triage? Possible for you to fix?

@aleino-nv
Copy link
Collaborator Author

aleino-nv commented Oct 9, 2024

@aleino-nv can you help to triage? Possible for you to fix?

I suspect the issue is in RHI, but I can take a look and just ask @skallweitNV questions since he's in a similar timezone.

@aleino-nv
Copy link
Collaborator Author

I checked in with @skallweitNV and he said it seems as though Slang is giving reflection information that doesn't match the WGSL, and so this leads RHI to provide layout information that doesn't match what's in the shader, leading to the errors.

If so then the issue should be investigated from the Slang side, instead.
I'll keep the bug assigned to myself and investigate this soon. (Probably early next week.)

@aleino-nv
Copy link
Collaborator Author

I checked in with @skallweitNV and he said it seems as though Slang is giving reflection information that doesn't match the WGSL, and so this leads RHI to provide layout information that doesn't match what's in the shader, leading to the errors.

@skallweitNV
This does not seem to be the case as far as I can tell. In the case of tests/compute/column-major.slang I do see two bindings coming from the reflection data, but only one seems to make it to the descriptor set build info in the shader object layout builder.
I'm seeing if I can spot the reason for this in RHI, but the code is pretty hard for me to understand. (I'm not very familiar with the slang reflection stuff, and there seems to be a lot of unique terminology.)

@aleino-nv
Copy link
Collaborator Author

Taking a step back, after debugging some more I'm now confused about the error message (column-major test case)

WGPU error: Binding doesn't exist in [BindGroupLayout (unlabeled)].
 - While validating that the entry-point's declaration for @group(0) @binding(0) matches [BindGroupLayout (unlabeled)]
 - While validating the entry-point's compatibility for group 0 with [BindGroupLayout (unlabeled)]
 - While validating compute stage ([ShaderModule (unlabeled)], entryPoint: computeMain).
 - While calling [Device].CreateComputePipeline([ComputePipelineDescriptor]).

I'm confused because, while it's true that the group layout is missing one of the bindings, @group(0) @binding(0) is actually the one being added to the group layout.
I wonder if this message could actually be generated if the attributes of the binding themselves just don't match. I'm creating an example to answer this question.

@aleino-nv
Copy link
Collaborator Author

I'm creating an example to answer this question.

Actually, google has one hit that answers this, and the answer is yes!
https://stackoverflow.com/questions/78141792/how-to-add-mix-texture-image-to-shadowmapping-demo

So next step is to check the attributes of this binding.

@aleino-nv
Copy link
Collaborator Author

The binding in question should be

struct _MatrixStorage_float4x4_ColMajorstd140_0
{
    @align(16) data_0 : array<vec4<f32>, i32(4)>,
};

@binding(0) @group(0) var<uniform> matrixBuffer_0 : _MatrixStorage_float4x4_ColMajorstd140_0;

The minBindingSize of the buffer binding is which is 64. That seems right. The type of buffer is 'uniform'.
It all seems to check out. It was visible to fragment+vertex+compute stages, but even after I fixed that to be just compute I still get the same error.

I'll pop this WGSL code into my native sample app to see if I can reproduce this error there, and then experiment on how to fix it.

@aleino-nv
Copy link
Collaborator Author

(Back to working on this now, after a detour.)

I can reproduce a somewhat similar error message with the following code

        std::vector<WGPUBindGroupLayoutEntry> entries(size_t{2});
        {
            WGPUBindGroupLayoutEntry& entry {entries[0]};
            entry = {};
            entry.binding = 0U;
            entry.visibility = WGPUShaderStage_Compute;
            entry.buffer.type = WGPUBufferBindingType_Uniform;
            entry.buffer.hasDynamicOffset = false;
            entry.buffer.minBindingSize = size_t{64};
        }
#if 0
        {
            WGPUBindGroupLayoutEntry& entry {entries[1]};
            entry = {};
            entry.binding = 1U;
            entry.visibility = WGPUShaderStage_Compute;
            entry.buffer.type = WGPUBufferBindingType_Storage;
            entry.buffer.hasDynamicOffset = false;
            entry.buffer.minBindingSize = size_t{0};
        }
#endif

If I enable the second block, then I can create the pipeline.
The error I get is slightly different in that it's complaining about binding 1 missing instead of binding 0:

error: wgpu: type: 2: message: Binding doesn't exist in [BindGroupLayout (unlabeled)].
 - While validating that the entry-point's declaration for @group(0) @binding(1) matches [BindGroupLayout (unlabeled)]
 - While validating the entry-point's compatibility for group 0 with [BindGroupLayout (unlabeled)]
 - While validating compute stage ([ShaderModule (unlabeled)], entryPoint: ).
 - While calling [Device].CreateComputePipeline([ComputePipelineDescriptor]).

(This could be due to different versions of Dawn being used by slang-test and my test app setup.)
In any case, I'm back to trying to understand why Slang-RHI doesn't add binding 1 to the list of bind group layout entries.
The binding attributes for binding 0 should be fine.

@aleino-nv
Copy link
Collaborator Author

(This is still for tests/compute/column-major test case.)

In renderer_test::AssignValsFromLayoutContext::assignAggregate I see two 'fields' corresponding to the two bindings in group 0.

For some reason the first field (for binding 0) is a shader object and ends up adding a bind group layout element as follows:

render-test-tool.dll!rhi::wgpu::ShaderObjectLayoutImpl::Builder::_addDescriptorRangesAsConstantBuffer(slang::TypeLayoutReflection * elementTypeLayout, const rhi::wgpu::BindingOffset & containerOffset, const rhi::wgpu::BindingOffset & elementOffset) Line 318    C++
render-test-tool.dll!rhi::wgpu::ShaderObjectLayoutImpl::createForElementType(rhi::wgpu::DeviceImpl * device, slang::ISession * session, slang::TypeLayoutReflection * elementType, rhi::wgpu::ShaderObjectLayoutImpl * * outLayout) Line 594    C++
render-test-tool.dll!rhi::wgpu::DeviceImpl::createShaderObjectLayout(slang::ISession * session, slang::TypeLayoutReflection * typeLayout, rhi::ShaderObjectLayout * * outLayout) Line 419    C++
render-test-tool.dll!rhi::Device::getShaderObjectLayout(slang::ISession * session, slang::TypeLayoutReflection * typeLayout, rhi::ShaderObjectLayout * * outLayout) Line 797    C++
render-test-tool.dll!rhi::Device::getShaderObjectLayout(slang::ISession * session, slang::TypeReflection * type, rhi::ShaderObjectContainerType container, rhi::ShaderObjectLayout * * outLayout) Line 778    C++
render-test-tool.dll!rhi::Device::createShaderObject2(slang::ISession * slangSession, slang::TypeReflection * type, rhi::ShaderObjectContainerType container, rhi::IShaderObject * * outObject) Line 638    C++
render-test-tool.dll!rhi::debug::DebugDevice::createShaderObject2(slang::ISession * session, slang::TypeReflection * type, rhi::ShaderObjectContainerType containerType, rhi::IShaderObject * * outShaderObject) Line 308    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assignObject(const rhi::ShaderCursor & dstCursor, renderer_test::ShaderInputLayout::ObjectVal * srcVal) Line 381    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assign(const rhi::ShaderCursor & dstCursor, const Slang::RefPtr<renderer_test::ShaderInputLayout::Val> & srcVal) Line 445    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assignAggregate(const rhi::ShaderCursor & dstCursor, renderer_test::ShaderInputLayout::AggVal * srcVal) Line 336    C++

The second field (for binding 1) is a buffer object and doesn't hit any code paths that create WGPUBindGroupLayoutEntry structs at all:

render-test-tool.dll!rhi::wgpu::DeviceImpl::createBuffer(const rhi::BufferDesc & desc, const void * initData, rhi::IBuffer * * outBuffer) Line 142    C++
render-test-tool.dll!rhi::debug::DebugDevice::createBuffer(const rhi::BufferDesc & desc, const void * initData, rhi::IBuffer * * outBuffer) Line 158    C++
render-test-tool.dll!rhi::IDevice::createBuffer(const rhi::BufferDesc & desc, const void * initData) Line 2324    C++
render-test-tool.dll!renderer_test::ShaderRendererUtil::createBuffer(const renderer_test::InputBufferDesc & inputDesc, unsigned __int64 bufferSize, const void * initData, rhi::IDevice * device, Slang::ComPtr<rhi::IBuffer> & bufferOut) Line 169    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assignBuffer(const rhi::ShaderCursor & dstCursor, renderer_test::ShaderInputLayout::BufferVal * srcVal) Line 217    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assign(const rhi::ShaderCursor & dstCursor, const Slang::RefPtr<renderer_test::ShaderInputLayout::Val> & srcVal) Line 433    C++
render-test-tool.dll!renderer_test::AssignValsFromLayoutContext::assignAggregate(const rhi::ShaderCursor & dstCursor, renderer_test::ShaderInputLayout::AggVal * srcVal) Line 336    C++

I don't currently see how these objects could possibly end up in the same bind group (as the generated WGSL code requires) if one is a buffer and the other is a shader object.
@csyonghe Is this unexpected? Should the reflection data have these two objects in different bind groups?

@csyonghe
Copy link
Collaborator

csyonghe commented Oct 28, 2024

Shader object and bind group are two orthogonal concepts. A new shader object does not need to be in its own binding group.

For example, if the global scope defines:

Texture2D t0;
ConstantBuffer t1;

Then t1 is a sub object because it is referenced through ConstantBuffer, but that doesn't mean t1 gets its own binding group. To introduce a new binding group for the elements, users need to use the ParametetBlock type instead.

@aleino-nv
Copy link
Collaborator Author

aleino-nv commented Oct 29, 2024

Shader object and bind group are two orthogonal concepts. A new shader object does not need to be in its own binding group.

For example, if the global scope defines:

Texture2D t0; ConstantBuffer t1;

Then t1 is a sub object because it is referenced through ConstantBuffer, but that doesn't mean t1 gets its own binding group. To introduce a new binding group for the elements, users need to use the ParametetBlock type instead.

Let me clarify: I don't see how with the current render test framework/RHI implementation, a shader object and a buffer could end up in the same bind group, as would be required by the WGSL shader, as it is currently generated.

@aleino-nv
Copy link
Collaborator Author

Unless I'm mistaken, the buffer is not accounted for at all in the pipeline layout. It's not even in its own bind group, so something is pretty wrong.

I will do some debugging and compare with what happens on VK, when I get time to work on this.

aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
NOTE: This is just a test. The WGPU fixes are not yet included at this point.

This helps to address issue shader-slang#5222.
@aleino-nv
Copy link
Collaborator Author

aleino-nv commented Nov 1, 2024

The basic issue is that "staging buffers" can't also be "uniform buffers" in WebGPU, but it's not clear to me why Slang-RHI tries to bind a "staging buffer": it should instead just create the staging buffer, map it, copy data to it, and then issue a command that copies from the staging buffer to the uniform buffer in order to do the update, and then bind the uniform buffer.

(Context: debugging tests/compute/column-major Slang-RHI buffer usage issue)

I found that WebGPU follows the same buffer structure as Vulkan:

  • One 4 byte buffer is created for binding 1 (RWStructuredBuffer<uint> output)
  • A staging buffer is created and bound as uniform buffer for binding 0 (ConstantBuffer<float4x4> matrixBuffer)

The issue again is that you can't really bind staging buffers (implying map-writeable) as uniform buffers in WebGPU, while you can do that in Vulkan.

aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
This helps to address issue shader-slang#5222.

Also disable Slang-RHI from fetching DXC and Agility SDK, since that seems to break our
tests. (See issue shader-slang#5474.)
aleino-nv added a commit to aleino-nv/slang-rhi that referenced this issue Nov 1, 2024
@aleino-nv
Copy link
Collaborator Author

A change like shader-slang/slang-rhi#92 avoids the issue, at least for tests/compute/column-major.

aleino-nv added a commit that referenced this issue Nov 1, 2024
This helps to address issue #5222.

Also disable Slang-RHI from fetching DXC and Agility SDK, since that seems to break our
tests. (See issue #5474.)
@aleino-nv
Copy link
Collaborator Author

A change like shader-slang/slang-rhi#92 avoids the issue, at least for tests/compute/column-major.

Confirmed: with this change, only the following errors are left for column-major:

WGPU error: Recording in [CommandEncoder (unlabeled)] which is locked while [ComputePassEncoder (unlabeled)] is open.
 - While encoding [CommandEncoder (unlabeled)].CopyBufferToBuffer([Buffer (unlabeled)], 0, [Buffer (unlabeled)], 0, 64).
 - While finishing [CommandEncoder (unlabeled)].

WGPU error: [Invalid CommandBuffer] is invalid.
 - While calling [Queue].Submit([[Invalid CommandBuffer]])

@aleino-nv
Copy link
Collaborator Author

With shader-slang/slang-rhi#91 as well, all errors go away.
However there is still an orthogonal crash during resource release. Simon said it may be because device is torn down first, and then used again to tear down shader object layouts. Looking at that now.

aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
This fixes a teardown crash, and a buffer usage mismatch issue during bind group creation.
This helps to address issue shader-slang#5222.
@aleino-nv
Copy link
Collaborator Author

Crash issue is now also fixed in RHI. If I roll RHI in Slang with #5475 then the only issue left for column-major is that the buffer contents is not as expected.

The first thing I notice there is that the matrixBuffer seems to be all zeros, so I'm investigating that now.

@aleino-nv
Copy link
Collaborator Author

Turns out recent RHI change didn't pass in init data when creating the buffer. I have a change that fixes that, and with this change I can finally get column-major to pass!

aleino-nv added a commit to aleino-nv/slang-rhi that referenced this issue Nov 1, 2024
aleino-nv added a commit to aleino-nv/slang-rhi that referenced this issue Nov 1, 2024
aleino-nv added a commit to aleino-nv/slang-rhi that referenced this issue Nov 1, 2024
skallweitNV pushed a commit to shader-slang/slang-rhi that referenced this issue Nov 1, 2024
* Ensure that SimpleShaderObjectData returns nullptr for empty buffer

This is a C++ technicality, and is not known to have caused any issue.

https://en.cppreference.com/w/cpp/container/vector/data says:
"If size() is ​0​, data() may or may not return a null pointer."

Thus, we check if the underlying std::vector is empty and then explicitly return nullptr
in this case.

* WGPU: Supply initial data when creating buffers

This helps to address shader-slang/slang#5222
aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
This fixes a teardown crash, and a buffer usage mismatch issue during bind group creation.
This helps to address issue shader-slang#5222.
aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
This fixes a teardown crash, and a buffer usage mismatch issue during bind group creation.
These Slang-RHI fixes allow several WGPU tests to be enabled:

- tests/compute/column-major.slang
- tests/compute/constant-buffer-memory-packing.slang
- tests/compute/matrix-layout.hlsl
- tests/compute/non-square-column-major.slang
- tests/compute/row-major.slang
- tests/hlsl/packoffset.slang

This helps to address issue shader-slang#5222.
@aleino-nv
Copy link
Collaborator Author

Managed to enable a whole bunch of tests when taking the above mentioned Slang-RHI update into use in Slang: #5475

Still there are some layout mismatch failures for other tests, so leaving this bug open and looking at those next.

aleino-nv added a commit to aleino-nv/slang that referenced this issue Nov 1, 2024
This fixes a teardown crash, and a buffer usage mismatch issue during bind group creation.
These Slang-RHI fixes allow several WGPU tests to be enabled:

- tests/compute/column-major.slang
- tests/compute/constant-buffer-memory-packing.slang
- tests/compute/matrix-layout.hlsl
- tests/compute/non-square-column-major.slang
- tests/compute/row-major.slang
- tests/hlsl/packoffset.slang

This helps to address issue shader-slang#5222.
aleino-nv added a commit that referenced this issue Nov 1, 2024
This fixes a teardown crash, and a buffer usage mismatch issue during bind group creation.
These Slang-RHI fixes allow several WGPU tests to be enabled:

- tests/compute/column-major.slang
- tests/compute/constant-buffer-memory-packing.slang
- tests/compute/matrix-layout.hlsl
- tests/compute/non-square-column-major.slang
- tests/compute/row-major.slang
- tests/hlsl/packoffset.slang

This helps to address issue #5222.
@aleino-nv
Copy link
Collaborator Author

...for example tests/compute/paramter-block still seems to have a layout issue similar to the one I fixed for testst/compute/column-major. Looking at that next.

@aleino-nv
Copy link
Collaborator Author

Here is the parameter-block.slang shader:

struct P
{
	RWStructuredBuffer<int> buffer;
};

ParameterBlock<P> block0;

ParameterBlock<P> block1;

[numthreads(4, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
	uint tid = dispatchThreadID.x;
	int inVal = block1.buffer[tid];
	int outVal = inVal;
	block0.buffer[tid] = outVal;
}

It currently compiles to this WGSL:

@binding(1) @group(0) var<storage, read_write> block1_buffer_0 : array<i32>;

@binding(0) @group(0) var<storage, read_write> block0_buffer_0 : array<i32>;

@compute
@workgroup_size(4, 1, 1)
fn computeMain(@builtin(global_invocation_id) dispatchThreadID_0 : vec3<u32>)
{
    var tid_0 : u32 = dispatchThreadID_0.x;
    block0_buffer_0[tid_0] = block1_buffer_0[tid_0];
    return;
}

Then Slang-RHI creates a bind group layout with just a single binding, and so pipeline creation fails with:

WGPU error: Binding doesn't exist in [BindGroupLayout (unlabeled)].
 - While validating that the entry-point's declaration for @group(0) @binding(1) matches [BindGroupLayout (unlabeled)]
 - While validating the entry-point's compatibility for group 0 with [BindGroupLayout (unlabeled)]
 - While validating compute stage ([ShaderModule (unlabeled)], entryPoint: computeMain).
 - While calling [Device].CreateComputePipeline([ComputePipelineDescriptor]).

@aleino-nv
Copy link
Collaborator Author

(Continuing to debug parameter-block.slang)
The corresponding SPIR-V is not analogous to the WGSL output. In this case we get two sets with a single binding.

OpDecorate %block1_buffer Binding 0
OpDecorate %block1_buffer DescriptorSet 1
OpDecorate %block0_buffer Binding 0
OpDecorate %block0_buffer DescriptorSet 0

I'm currently thinking that the WGSL emitter is not emitting the binding attributes correctly, and comparing with how the SPIR-V emitter does it.

@csyonghe
Copy link
Collaborator

csyonghe commented Nov 4, 2024

There may be missing logic in the parameter block layout rule for wgsl. Worth checking for any discrepancies between spirv and wgsl in slang-type-layout.h/cpp and slang-parameter-binding.h/cpp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
goal:forward looking Feature needed at a later date, not connected to a specific use case. kind:enhancement a desirable new feature, option, or behavior siggraphasia-2024
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants