
Modern GPU Driven Rendering (How to draw fast)

Yaochuang edited this page Jan 6, 2019 · 1 revision

This section summarizes the following documents and collects open questions about them.

The documents cover geometry submission only; lighting and volumetric effects are not involved. The main techniques are:

  • Draw call grouping: batch the parts that share a material so that consecutive draw calls need no state change in between. Also known as consolidation.
  • glMultiDrawElements
    • Reported as roughly 6x faster when no state changes occur between draws.
    • Requires the same topology, a large shared VB/IB, and the same state (render pipeline state, including the shader program).
  • ARB_vertex_attrib_binding separates the vertex format from the buffer stream; NV_vertex_buffer_unified_memory uses buffer addresses.
  • Uniform-to-UBO transition: changes to existing shaders are minimal; group parameters by update frequency.
    • Move from individual uniform variables to uniform buffers. D3D11 seems to work this way already.
    • glNamedBufferData
    • Good speedup over multiple glUniform calls.
    • Efficiency still depends on the size of the uniform data.
  • glBufferSubData
    • Updates dynamic parameters stored in a GL_UNIFORM_BUFFER.
    • Requires attention to buffer size, offset, and alignment.
    • How do Metal and D3D update dynamic parameters efficiently?
  • glBindBufferRange
    • Upload static data once and bind a subrange of the buffer to the target binding point.
    • Avoids expensive CPU -> GPU copies for static data.
    • How do Metal and D3D handle this, for example with lots of material data on the GPU?
      • How can such a large buffer be updated sparsely and efficiently? This is a very common problem. Does it have a common solution?
  • Indexed access: use a TBO/SSBO for large or randomly accessed data.
    • glVertexAttribI2i assigns indices as a vertex attribute.
    • For example, with matrices stored in an SSBO, how should they be indexed correctly in the VS under multi-draw?
  • glMultiDrawElementsIndirect
    • Stores an array of draw calls on the GPU and issues them all with a single API call. Each draw call is described by the following structure:
    • typedef struct { uint count; uint instanceCount; uint firstIndex; uint baseVertex; uint baseInstance; } DrawElementsIndirectCommand;
    • A per-draw index can be encoded in the baseInstance field of this struct and then read in the shader through gl_BaseInstanceARB (ARB_shader_draw_parameters).
  • NV_BINDLESS_MULTIDRAW_INDIRECT
    • More costly than regular MultiDrawIndirect; worthwhile only when each draw call has more than about 500 triangles.
    • glMultiDrawElementsIndirectBindlessNV takes a flexible user-defined structure containing the IB, VB, and index associated with each draw call. It is possible to draw an entire scene with a single API call.
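
The draw call grouping idea from the first bullet can be illustrated with a small CPU-side sketch. It assumes a hypothetical DrawItem record in which a single materialId stands in for the whole render state; DrawItem, Batch, and groupByMaterial are illustrative names and do not come from the source documents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side draw record. A single materialId stands in for the
// complete render state (shader program, textures, blend state, ...).
struct DrawItem {
    std::uint32_t materialId;
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
};

// A run of consecutive draws in the sorted list that share one material.
struct Batch {
    std::uint32_t materialId;
    std::size_t firstDraw;  // index of the first draw in the sorted list
    std::size_t drawCount;  // number of draws in this run
};

// Sorts the draws by material (stable, so submission order inside a
// material is preserved) and returns one Batch per material run.
std::vector<Batch> groupByMaterial(std::vector<DrawItem>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });

    std::vector<Batch> batches;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (batches.empty() ||
            batches.back().materialId != draws[i].materialId) {
            batches.push_back({draws[i].materialId, i, 1});
        } else {
            ++batches.back().drawCount;
        }
    }
    return batches;
}
```

Each Batch then needs one state setup followed by its draws. If the draws in a batch also share the same VB/IB and topology, they are candidates for a single glMultiDrawElements call.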
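
The "buffer size, offset, alignment" note under glBufferSubData and the subrange binding under glBindBufferRange both depend on placing each parameter block at an offset the driver accepts. The sketch below shows only that offset arithmetic. It assumes the alignment has been queried from GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT and is nonzero; Range, alignUp, and packRanges are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte range of one parameter block inside a shared uniform buffer.
struct Range {
    std::size_t offset;
    std::size_t size;
};

// Rounds value up to the next multiple of alignment (alignment > 0).
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Packs blocks of the given sizes back to back, starting each block at an
// offset that is a multiple of alignment. In a real renderer the alignment
// would come from glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...).
std::vector<Range> packRanges(const std::vector<std::size_t>& sizes,
                              std::size_t alignment) {
    std::vector<Range> ranges;
    std::size_t cursor = 0;
    for (std::size_t size : sizes) {
        cursor = alignUp(cursor, alignment);
        ranges.push_back({cursor, size});
        cursor += size;
    }
    return ranges;
}

// With a live GL context, range i could then be bound roughly like this:
//   glBindBufferRange(GL_UNIFORM_BUFFER, bindingPoint, buffer,
//                     ranges[i].offset, ranges[i].size);
```

Static data is uploaded once into the packed buffer, and only the bound range changes per draw, which is the point of the "avoid expensive CPU -> GPU copies" bullet.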
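
One possible answer to the SSBO indexing question, following the baseInstance note under glMultiDrawElementsIndirect, is to write each draw's own index into baseInstance when the command array is built. The sketch below fills the array on the CPU and mirrors the struct layout quoted above; MeshRange and buildCommands are hypothetical names, and one instance per draw is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the DrawElementsIndirectCommand layout quoted in the list above.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::uint32_t baseVertex;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

// Hypothetical description of where one mesh lives in the shared VB/IB.
struct MeshRange {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    std::uint32_t baseVertex;
};

// Builds one command per mesh. baseInstance carries the draw's index, so a
// vertex shader can read gl_BaseInstanceARB (ARB_shader_draw_parameters)
// and use it to look up that draw's matrix in an SSBO.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshRange>& meshes) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(meshes.size());
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        const MeshRange& m = meshes[i];
        commands.push_back({m.indexCount,
                            1u,  // assumption: one instance per draw
                            m.firstIndex,
                            m.baseVertex,
                            static_cast<std::uint32_t>(i)});
    }
    return commands;
}

// With a live GL context, the array would be uploaded to a buffer bound to
// GL_DRAW_INDIRECT_BUFFER and submitted roughly like this:
//   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr,
//                               static_cast<GLsizei>(commands.size()), 0);
```

Because instanceCount is 1 here, baseInstance alone identifies the draw. With real instancing, the per-draw index and the instance offset would have to share the field or be stored separately.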

In summary:

  • Fewer state changes on the GPU.
  • Less data transferred from the CPU.

Latency Hiding


Bindless Technology

This section is based on the following document.

GPU Driven Rendering
