⚡️ Speed up method Kandinsky3ConditionalGroupNorm.forward by 7%
#11667
📄 7% (0.07x) speedup for Kandinsky3ConditionalGroupNorm.forward in src/diffusers/models/unets/unet_kandinsky3.py
⏱️ Runtime: 2.16 milliseconds → 2.02 milliseconds (best of 332 runs)
📝 Explanation and details
Here are the most important optimizations for this method, based on the line profiling results.
- The profile points to the expression self.norm(x) * (scale + 1.0) + shift and the self.context_mlp(context) call.
- Applying unsqueeze to the context tensor in a loop is inefficient.
- Use .view or .reshape to match the desired broadcastable shape all at once, rather than unsqueezing in a loop.
The improved code removes the loop, performs shape expansion more efficiently, and should provide speedups for larger batch sizes or channel/image sizes.
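The difference between the two shape-expansion approaches can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper names expand_with_loop and expand_with_view are hypothetical, and the tensor sizes are arbitrary.

```python
import torch


def expand_with_loop(context: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Append one trailing singleton dim per spatial dim of x, one call at a time."""
    for _ in range(x.dim() - 2):
        context = context.unsqueeze(-1)
    return context


def expand_with_view(context: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Compute the broadcastable shape once and apply it with a single view."""
    trailing_ones = [1] * (x.dim() - 2)
    return context.view(*context.shape, *trailing_ones)


# Arbitrary example sizes: batch of 2, 8 channels, 4x4 spatial grid.
x = torch.randn(2, 8, 4, 4)
context = torch.randn(2, 16)  # 2 * channels, to be split into scale and shift

looped = expand_with_loop(context, x)
viewed = expand_with_view(context, x)

# Both results broadcast against x in the same way.
scale, shift = viewed.chunk(2, dim=1)
out = x * (scale + 1.0) + shift
```

Both helpers return a tensor over the same storage with the same shape. The single view simply avoids one Python-level call per spatial dimension.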
Summary of optimizations:
- The repeated unsqueeze calls are replaced with a single view, which is cheaper for matching the broadcasting shape.
- x.dim() is used to compute the required broadcast shape once, with no per-dimension Python looping.
This rewrite keeps the function signature and logic unchanged, but should yield performance improvements, especially for large spatial tensors.
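To show how the single-view expansion could sit inside a forward method with an unchanged (x, context) signature, here is a hedged sketch. ConditionalGroupNormSketch is a hypothetical stand-in, not the diffusers class. Its layer layout (a GroupNorm without affine parameters plus a SiLU/Linear context MLP producing scale and shift) is an assumption made for illustration.

```python
import torch
from torch import nn


class ConditionalGroupNormSketch(nn.Module):
    """Hypothetical stand-in for a context-conditioned group norm layer."""

    def __init__(self, groups: int, channels: int, context_dim: int):
        super().__init__()
        # Assumed layout: plain group norm, with scale/shift predicted from context.
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.context_mlp = nn.Sequential(nn.SiLU(), nn.Linear(context_dim, 2 * channels))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        context = self.context_mlp(context)
        # One view adds all trailing singleton dims, instead of a loop of unsqueeze calls.
        context = context.view(*context.shape, *([1] * (x.dim() - 2)))
        scale, shift = context.chunk(2, dim=1)
        return self.norm(x) * (scale + 1.0) + shift


torch.manual_seed(0)
layer = ConditionalGroupNormSketch(groups=2, channels=8, context_dim=6)

# The same forward handles inputs with two or three spatial dimensions.
image_out = layer(torch.randn(3, 8, 5, 5), torch.randn(3, 6))
video_out = layer(torch.randn(3, 8, 4, 5, 5), torch.randn(3, 6))
```

Because the number of trailing dimensions is derived from x.dim(), the same method works for any number of spatial dimensions without changing the call site.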
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, run git checkout codeflash/optimize-Kandinsky3ConditionalGroupNorm.forward-mb5lqa87 and push.