Refactored computing charts step for parallel processing with 5 worker threadpool (same I/O) #34
Open
samcreviston wants to merge 1 commit into
…ker thread pool, 90% faster step.
Issue: I use Modly with the Trellis2 GGUF model to generate 3D models from images. That pipeline relies on CuMesh for chart computation, and I found this one step was taking 30 minutes on average. After this refactor it takes about 3 minutes. The change processes the already existing cluster chunk inputs in parallel and writes into the already existing chart step outputs. I enjoyed working on this and am curious to hear your thoughts!
Refactor solution: This is a performance-focused refactor of the UV unwrapping pipeline. Chart computation now runs in parallel across the already existing cluster chunks using a bounded worker pool (max 5 threads), while packing remains a single global stage to prevent UV overlap regressions. Results are collected in a thread-safe way into the same pre-existing outputs of the chart computation step. The update also handles empty or non-generating clusters that produce no mesh entry, which previously could cause an atlas index crash. The chart-computation behavior and contract are preserved: the compute-charts step has the same inputs and outputs as before, and the final assembled mesh/UV outputs remain in the same format. In short, this change targets throughput and stability only, without altering external pipeline semantics.
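For reviewers who want the shape of the change at a glance, here is a minimal Python sketch of the pattern described above. It is not the code in this PR and does not use the CuMesh API: `compute_chart`, `compute_charts_parallel`, and `pack_charts` are hypothetical stand-ins, and the chart data is faked. It also assumes results are gathered with `ThreadPoolExecutor.map`, which returns them in input order on the calling thread, as one simple way to get thread-safe collection without explicit locks.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5  # worker bound taken from the PR description


def compute_chart(cluster):
    """Hypothetical stand-in for the per-cluster chart computation.

    Returns None for an empty cluster, mimicking a cluster that
    produces no mesh entry.
    """
    if not cluster:
        return None
    return {"faces": list(cluster), "width": float(len(cluster))}


def compute_charts_parallel(clusters, max_workers=MAX_WORKERS):
    """Compute charts for all clusters on a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so each result still
        # lines up with its cluster index regardless of finish order.
        results = list(pool.map(compute_chart, clusters))
    # Skip clusters that produced nothing, but remember the original
    # index so later stages never look up a missing entry.
    return [(i, chart) for i, chart in enumerate(results) if chart is not None]


def pack_charts(indexed_charts):
    """Hypothetical single global packing stage.

    Places charts side by side along U so that no two overlap.
    """
    offset = 0.0
    placed = []
    for index, chart in indexed_charts:
        placed.append({"cluster": index, "u_offset": offset, "width": chart["width"]})
        offset += chart["width"]
    return placed


if __name__ == "__main__":
    clusters = [[0, 1, 2], [], [3, 4]]  # the middle cluster is empty
    charts = compute_charts_parallel(clusters)
    for entry in pack_charts(charts):
        print(entry)
```

Packing stays outside the pool on purpose: only the independent per-cluster work is parallelized, and the one step that needs a global view of all charts still runs once, after every worker has finished.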