Speed up RIFE by 45x (36.4 seconds -> 0.8 seconds)#102

Open
JohnAlcatraz wants to merge 1 commit into Fannovel16:main from JohnAlcatraz:main
Conversation

@JohnAlcatraz
Processing without this PR: 36.4 seconds

Processing with this PR: 0.8 seconds

Tested on RTX 5090 with a 832x480 video, 81 frames.

To be clear, this is ChatGPT Agent's work; I'm not a Python programmer. But it works very well :) No difference in quality, and way faster.

@dhm99
dhm99 commented Nov 9, 2025

Working great on my RTX 5090.
Thanks!

@obraia
obraia commented Dec 18, 2025

It worked well on a 4070 Ti Super 16GB, thanks.

@deniaud
deniaud commented Jan 12, 2026

I'll be showing this vibe-coding example in my lectures, great job!

@deniaud
deniaud commented Jan 12, 2026

@Fannovel16 what about merging this?

marduk191 added a commit to marduk191/ComfyUI-Frame-Interpolation that referenced this pull request Feb 19, 2026
…ode opts

Incorporates the approach from PR Fannovel16#102 (JohnAlcatraz), which replaces the
generic_frame_loop with an inline task-list loop. This enables true GPU-level
batching: multiple (pair_index, timestep) tasks are stacked into a single
batched tensor and processed in one IFNet forward pass, since IFNet already
supports batched tensor timesteps.

Combined with our existing optimisations:
- dtype widget (float32/float16/bfloat16): model and inputs cast to chosen
  precision; output always returned as float32 for downstream compatibility
- torch_compile widget: optional torch.compile() wrapping for 10-30% speedup
- batch_size widget: now controls true task-level batching — each task is one
  intermediate frame, multiple tasks share a single forward pass
- torch.inference_mode(): wraps the entire inference loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
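The task-level batching described in this commit message can be sketched roughly as below. This is a minimal illustration, not the actual code from the PR: the function name `interpolate_batched` and its signature are hypothetical, and the `model(img0, img1, ts)` call stands in for an IFNet-style forward pass that accepts a batched timestep tensor.

```python
# Hypothetical sketch of task-level batching for RIFE-style interpolation.
# Assumes `model` behaves like IFNet: it accepts two batched frame tensors
# plus a batched timestep tensor, and returns the interpolated frames.
import torch

def interpolate_batched(model, frames, multiplier=2, batch_size=8, device="cuda"):
    # Build the full task list: one (pair_index, timestep) entry per
    # intermediate frame to generate between each consecutive pair.
    tasks = [
        (i, t / multiplier)
        for i in range(len(frames) - 1)
        for t in range(1, multiplier)
    ]
    out = []
    with torch.inference_mode():  # no autograd bookkeeping during inference
        for start in range(0, len(tasks), batch_size):
            chunk = tasks[start:start + batch_size]
            # Stack the frame pairs and timesteps of this chunk into
            # single batched tensors (shape: [len(chunk), C, H, W]).
            img0 = torch.stack([frames[i] for i, _ in chunk]).to(device)
            img1 = torch.stack([frames[i + 1] for i, _ in chunk]).to(device)
            ts = torch.tensor([t for _, t in chunk], device=device)
            # One forward pass produces every intermediate frame in the chunk.
            out.append(model(img0, img1, ts).float().cpu())
    return torch.cat(out)
```

Compared with a per-frame loop, the GPU sees far fewer, larger kernel launches, which is where most of the wall-clock win from batching comes from.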
marduk191 added a commit to marduk191/ComfyUI-Frame-Interpolation that referenced this pull request Feb 19, 2026
Add a module-level _model_cache dict keyed by (ckpt_name, dtype, torch_compile).
On repeated runs with the same settings the model weights, dtype cast, device
transfer, and torch.compile() step are all skipped — only the inference loop
runs. Cache is invalidated automatically when any of the three key parameters
change.

This is the primary source of the speedup claimed by PR Fannovel16#102: the original
code reloaded the model from disk on every execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
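The caching scheme this commit describes can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code: `get_model` and the `loader` callback are hypothetical names, and the cache key mirrors the three parameters named in the commit message.

```python
# Hypothetical sketch of a module-level model cache keyed on the settings
# that affect the loaded weights: checkpoint name, dtype, and torch.compile.
import torch

_model_cache = {}

def get_model(ckpt_name, dtype, torch_compile, loader):
    key = (ckpt_name, str(dtype), torch_compile)
    if key not in _model_cache:
        model = loader(ckpt_name)           # load weights from disk (slow)
        model = model.to(dtype=dtype)       # one-time precision cast
        if torch_compile:
            model = torch.compile(model)    # one-time compilation cost
        _model_cache[key] = model
    # Cache hit: load, cast, and compile are all skipped; only inference runs.
    # Changing any key parameter creates a new entry, so stale settings
    # never leak into a run.
    return _model_cache[key]
```

On repeated runs with identical settings, only the dictionary lookup happens before inference, which matches the commit's claim that avoiding the per-execution reload is the main source of the speedup.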
@marduk191
Contributor

Rolled this into another PR that is merged. Thanks a lot!

5 participants