University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
-
Alex Fu
Tested on: Windows 10, i7-10750H @ 2.60GHz, 16GB, GTX 3060 6GB
Nagi is a simple path tracer built in CUDA. As shown in the picture, it's capable of rendering photorealistic images of diffuse, mirror, metal, and glass materials with their own textures.
nagi2.mp4
Finished
- Mesh Loading
- Texture Mapping
- Denoiser
- Bounding Volume Hierarchy
- Refrection
- Skybox
- Preview Window
Working On
- Direct Light Sampling
- Importance Sampling On Skybox
- Multiple Importance Sampling
- Multithread GUI
Nagi.exe <scene file> [<output path>]
For example:
Nagi.exe ./res/cornell_box/cornell_box.json ./results
The scene is defined by a JSON file. It is very self-explanatory and contains rendering configurations. It currently supports 5 material types: Lambert, Specular, Glass, Microfacet, and Light Source. It's also possible to add depth of field by setting the camera's f-number and focus distance (or a look-at point).
Artisit: NewSee2l035
Artisit: James Ray Cock, Greg Zaal, Dimitrios Savva
Artisit: JpArtSky
To accelerate the intersection test, at first, I divide each model's triangles and store them in oct-tree structures. Then all objects will be passed into the path tracer in an array sorted by their volume. The path tracer will perform a depth-first search in the intersection test.
Then I realized using a bottom-to-top bounding volume hiearchy is more optimal. And the stack for DFS consume too much memories so I adopted the BVH described in PBRT and the stack-free traversal.
The triangle intersected with the shortest distance will be recorded. If the distance to an object is larger than the last recorded distance to the triangle, all its triangles will be surpassed.
After dividing the scene into an oct-tree structure, a huge improvement of speed can be seen:
It is interesting that sorting the rays according to their material slightly lower the performance. The overhead of sorting might counteract the improvement brought by memory coalescing.
However, when a mesh is rectangular and contain large triangles, like this mesh from the Modern Hall scene, the oct-tree fails to divide it effectively.
The time cost for the Staircase scene increases to 8 seconds per spp. Therefore, I replaced the oct-tree with BVH and adopted the stack-free traversal. This significantly improved the performance. The timecost for the Modern Hall scene droped to 10ms!
original | real-time denoiser | Open Image Denoise |
---|---|---|
For the real-time preview, I implemented SVGF as the denoiser. For the final result, I implemented Intel's Open Image Denoise.