Speed up RIFE by 45x (36.4 seconds -> 0.8 seconds)#102
Open
JohnAlcatraz wants to merge 1 commit into Fannovel16:main from
Conversation
Working great on my RTX 5090.

It worked well on a 4070 Ti Super 16GB, thanks.

I'll be showing this vibe-coding example in my lectures, great job!

@Fannovel16 what about merging this?
marduk191 added a commit to marduk191/ComfyUI-Frame-Interpolation that referenced this pull request on Feb 19, 2026
…ode opts

Incorporates the approach from PR Fannovel16#102 (JohnAlcatraz), which replaces the generic_frame_loop with an inline task-list loop. This enables true GPU-level batching: multiple (pair_index, timestep) tasks are stacked into a single batched tensor and processed in one IFNet forward pass, since IFNet already supports batched tensor timesteps.

Combined with our existing optimisations:

- dtype widget (float32/float16/bfloat16): model and inputs are cast to the chosen precision; output is always returned as float32 for downstream compatibility
- torch_compile widget: optional torch.compile() wrapping for a 10-30% speedup
- batch_size widget: now controls true task-level batching; each task is one intermediate frame, and multiple tasks share a single forward pass
- torch.inference_mode(): wraps the entire inference loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
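The task-level batching described in the commit message can be sketched as follows. This is a minimal illustration, not the PR's actual code: `build_task_batches` and `run_batched` are hypothetical names, and the sketch assumes a model callable that, like IFNet per the commit, accepts batched image tensors together with a per-sample timestep tensor.

```python
import torch

def build_task_batches(frames, multiplier, batch_size):
    """Enumerate (pair_index, timestep) tasks and chunk them into batches.

    Each task is one intermediate frame between frames[i] and frames[i+1]
    at a fractional timestep (hypothetical helper, for illustration).
    """
    tasks = []
    for pair_index in range(len(frames) - 1):
        for k in range(1, multiplier):
            tasks.append((pair_index, k / multiplier))
    # Chunks of tasks that will share a single forward pass.
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]

def run_batched(model, frames, multiplier, batch_size):
    """Run all interpolation tasks with one model call per batch.

    Assumes `model(img0, img1, timestep)` accepts batched tensors and a
    per-sample timestep tensor (an assumption about the IFNet interface).
    """
    outputs = {}
    with torch.inference_mode():  # no autograd bookkeeping during inference
        for batch in build_task_batches(frames, multiplier, batch_size):
            img0 = torch.stack([frames[p] for p, _ in batch])
            img1 = torch.stack([frames[p + 1] for p, _ in batch])
            t = torch.tensor([ts for _, ts in batch]).view(-1, 1, 1, 1)
            mids = model(img0, img1, t)  # one forward pass for the whole chunk
            for (p, ts), mid in zip(batch, mids):
                outputs[(p, ts)] = mid
    return outputs
```

The point of the restructuring is that the per-task Python loop collapses into a few tensor stacks plus one GPU kernel launch per chunk, which is where the bulk of the claimed speedup in the loop itself comes from.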
marduk191 added a commit to marduk191/ComfyUI-Frame-Interpolation that referenced this pull request on Feb 19, 2026
Add a module-level _model_cache dict keyed by (ckpt_name, dtype, torch_compile). On repeated runs with the same settings, the model weight load, dtype cast, device transfer, and torch.compile() step are all skipped; only the inference loop runs. The cache is invalidated automatically when any of the three key parameters change.

This is the primary source of the speedup claimed by PR Fannovel16#102: the original code reloaded the model from disk on every execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
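The caching scheme the commit describes can be sketched like this. It is a simplified illustration, not the node's actual code: `get_model` and the `loader` callable are hypothetical, and the key tuple mirrors the (ckpt_name, dtype, torch_compile) key named in the commit message.

```python
import torch

# Module-level cache so repeated runs with identical settings skip the
# load -> cast -> compile pipeline entirely. Keys are illustrative.
_model_cache = {}

def get_model(ckpt_name, dtype, use_compile, loader):
    """Return a ready-to-run model, rebuilding it only on a cache miss.

    `loader` is a hypothetical callable that constructs the model from
    disk. Because the key covers every parameter that changes the
    prepared model, changing any of them is an automatic invalidation:
    the new key simply misses and a fresh model is built.
    """
    key = (ckpt_name, dtype, use_compile)
    if key not in _model_cache:
        model = loader(ckpt_name).to(dtype=dtype).eval()
        if use_compile:
            # optional torch.compile() wrapping, per the commit message
            model = torch.compile(model)
        _model_cache[key] = model
    return _model_cache[key]
```

One design note: the stale entries for old keys are kept rather than evicted, which trades a little memory for instant switching back to a previously used configuration; a real node might cap the cache size instead.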
Contributor

Rolled this into another PR that is merged. Thanks a lot!
Processing without this PR: 36.4 seconds
Processing with this PR: 0.8 seconds
Tested on an RTX 5090 with an 832x480 video, 81 frames.
To be clear, this is ChatGPT Agent's work; I'm not a Python programmer. But it works very well :) There is no difference in quality, and it's way faster.