⚡️ Speed up function interpolate by 38%
#119
Open
📄 38% (0.38x) speedup for `interpolate` in `src/transformers/models/clap/modeling_clap.py`

⏱️ Runtime: 1.40 milliseconds → 1.01 milliseconds (best of 118 runs)

📝 Explanation and details
The optimization replaces PyTorch's `repeat()` method with a more efficient `unsqueeze()` + `expand()` combination for tensor interpolation.

Key changes:
- Changed `hidden_states[:, :, None, :].repeat(1, 1, ratio, 1)` to `hidden_states.unsqueeze(2).expand(batch_size, time_length, ratio, classes_num)`
- `repeat()` allocates new memory and copies data, while `expand()` creates a memory-efficient view without copying data
- `expand()` avoids the expensive memory allocation and data copying operations

Why this leads to speedup:
The `repeat()` operation physically duplicates tensor data in memory (87.7% of the original runtime), while `expand()` creates a broadcasted view that shares the underlying memory (58.2% of the optimized runtime). This eliminates unnecessary memory allocation and copying, which matters most for larger tensors, where the memory overhead becomes significant.

Performance characteristics:
The optimization is particularly effective for time-domain interpolation that compensates for CNN downsampling, where the function likely processes feature maps of varying sizes during model inference.
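As a rough illustration of the change, the sketch below puts the two variants side by side. Only the `repeat()` and `expand()` expressions come from the description above. The function names `interpolate_repeat` and `interpolate_expand`, the shape unpacking, and the final `reshape` step are assumptions about the surrounding function, which is not shown here.

```python
import torch


def interpolate_repeat(hidden_states: torch.Tensor, ratio: int) -> torch.Tensor:
    # Hypothetical "before" variant: repeat() allocates a new tensor and copies the data.
    batch_size, time_length, classes_num = hidden_states.shape
    upsampled = hidden_states[:, :, None, :].repeat(1, 1, ratio, 1)
    # Assumed final step: merge the time and ratio dimensions.
    return upsampled.reshape(batch_size, time_length * ratio, classes_num)


def interpolate_expand(hidden_states: torch.Tensor, ratio: int) -> torch.Tensor:
    # Hypothetical "after" variant: expand() returns a broadcasted view (stride 0 on the new axis).
    batch_size, time_length, classes_num = hidden_states.shape
    upsampled = hidden_states.unsqueeze(2).expand(batch_size, time_length, ratio, classes_num)
    # Assumed final step: merge the time and ratio dimensions.
    return upsampled.reshape(batch_size, time_length * ratio, classes_num)


# Small example input: batch of 2, 3 time steps, 4 classes.
x = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)
out_repeat = interpolate_repeat(x, ratio=2)
out_expand = interpolate_expand(x, ratio=2)

# The expanded view itself shares storage with the input.
view = x.unsqueeze(2).expand(2, 3, 2, 4)
```

Under these assumptions both variants return the same values. Note that in this sketch the expanded view cannot be reshaped without materializing the data, because the broadcast axis has stride 0, so the copy is deferred to `reshape` rather than removed altogether.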
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-interpolate-mhmrvepf` and push.