Skip to content

Conversation

Dan-Flores
Copy link
Contributor

@Dan-Flores Dan-Flores commented Sep 10, 2025

This PR adds the code that will become the custom_frame_mappings tutorial.
Currently, 2 benchmarks are run using a short video (~3 minutes) and a long video (~13 minutes) :

  • Initializing VideoDecoders
  • Decoding 10 frames from 10 videos

Additionally, some fixes / optimizations were added to custom_frame_mappings logic in SingleStreamDecoder.cpp to improve the performance.

The page takes 1 minute 22 seconds to execute.

Benchmark results:

Compare performance of initializing VideoDecoder with custom_frame_mappings vs seek_modes

Running benchmarks on short_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 7.68ms +- 20.83
Creating a VideoDecoder object with seek_mode='exact':
med = 17.26ms +- 14.54

Running benchmarks on long_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 23.70ms +- 3.39
Creating a VideoDecoder object with seek_mode='exact':
med = 60.47ms +- 7.30

Decode frames with custom_frame_mappings vs exact seek_mode

Running benchmarks on short_video.mp4
Decoding frames with custom_frame_mappings:
med = 34.21ms +- 11.42
Decoding frames with seek_mode='exact':
med = 39.69ms +- 3.37

Running benchmarks on long_video.mp4
Decoding frames with custom_frame_mappings:
med = 50.53ms +- 8.38
Decoding frames with seek_mode='exact':
med = 83.95ms +- 17.38

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 10, 2025
"Missing frame mappings when custom_frame_mappings seek mode is set.");
readCustomFrameMappingsUpdateMetadataAndIndex(
streamIndex, customFrameMappings.value());
activeStreamIndex_, customFrameMappings.value());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolves a bug that raised Runtime Error: bad_optional_access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's surprising, streamIndex isn't an optional. Do you remember what was causing the issue? Should we add a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the issue was not with streamIndex variable itself, but accessing an unset value in StreamInfo[streamIndex] on videos with multiple streams. I can spend some time digging into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Python constructor streamIndex is optional. It defaults to -1 in custom_ops.cpp's _add_video_stream.

addStream passes this value to av_find_best_stream as either a real stream index if provided, or as -1, indicating FFmpeg should choose.

So at this point in addVideoStream, activeStreamIndex_ is the stream index we should be accessing, since we have already determined and stored the best stream index.

Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dan-Flores , the benchmark results make sense, I left a few comments below.

In terms of narration, the structure of the tutorial makes sense as well, it closely follows the seek_mode one which is a good thing. I think the main conclusion would apply: passing custom frame mappings speeds up the VideoDecoder instanciation (not the frame decoding itself!). We should try to make that clear throughout the tutorial.
It may also be interesting to draw bridges between this tutorial and the seek_mode one. In the seek_mode tutorial we contrast the use of approximate with exact, basically saying approximate is faster while being slightly less accurate. We should make the point that passing frame mappings gives you the best of both worlds: fast instanciation and accurate seeking.

"Missing frame mappings when custom_frame_mappings seek mode is set.");
readCustomFrameMappingsUpdateMetadataAndIndex(
streamIndex, customFrameMappings.value());
activeStreamIndex_, customFrameMappings.value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's surprising, streamIndex isn't an optional. Do you remember what was causing the issue? Should we add a test?

print(f"Running benchmarks on {Path(video_path).name}")
print("Creating a VideoDecoder object with custom_frame_mappings JSON str from file:")
with open(json_path, "r") as f:
bench(VideoDecoder, source=video_path, stream_index=stream_index, custom_frame_mappings=(f.read()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify the benchmark a bit and only benchmark passing f instead of f.read()

Ultimately the main difference between these 2 will be the speed of the file system, I don't feel like it's super relevant in this specific tutorial which should be a fairly straightforward entry point

@Dan-Flores Dan-Flores force-pushed the frame_mappings_tutorial branch from 1e24ed6 to dbd6610 Compare September 12, 2025 20:48
@Dan-Flores Dan-Flores marked this pull request as ready for review September 12, 2025 20:57
@Dan-Flores Dan-Flores changed the title [wip] custom_frame_mappings tutorial and optimizations Add custom_frame_mappings tutorial and optimizations Sep 12, 2025
@Dan-Flores Dan-Flores marked this pull request as draft September 18, 2025 18:12
@Dan-Flores Dan-Flores force-pushed the frame_mappings_tutorial branch from b97f705 to b067f48 Compare September 25, 2025 21:06
@Dan-Flores Dan-Flores marked this pull request as ready for review September 25, 2025 21:08
Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the great tutorial @Dan-Flores !

I made some small suggestions below, but nothing is blocking so I'll approve.

1. Frame accuracy is critical, so :doc:`approximate mode <approximate_mode>` cannot be used
2. Videos can be preprocessed once and then decoded many times
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great intro 👍 ! It's immediately clear to the reader whether the tutorial is relevant to them

Comment on lines 149 to 150
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows
# usually involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows
# usually involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance,
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows
# involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance,

sample_data = json.loads(ffprobe_result.stdout)
print("Data structure of custom frame mappings:")
for frame in sample_data["frames"][:3]:
print(f"{frame}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the above: consider printing the whole ffprobe_cmd so that the reader can see exactly what is being executed, and so they can copy/paste it.

If it's worth it, we could also refactor the ffprobe_cmd creation, subprocess.run and f.write() into a function that we call twice? No strong opinion.


QQ what's the side_data_list on the output?

Data structure of custom frame mappings:
{'key_frame': 1, 'pts': 0, 'duration': 1001, 'side_data_list': [{'side_data_type': 'H.26[45] User Data Unregistered SEI message'}]}
{'key_frame': 0, 'pts': 1001, 'duration': 1001}
{'key_frame': 0, 'pts': 2002, 'duration': 1001}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on printing the ffprobe_cmd and refactoring into a reusable function, thanks for the suggestions!

The side_data_list is a field returned when accessing -show_entries, even if it was not a specified key. I believe its additional metadata related to the codec?

Since it is not important (and I am unable to remove it within the ffprobe command), I'll update the print to only show the fields that we use.

# Performance: Frame decoding with custom frame mappings
# ------------------------------------------------------
#
# Although using custom_frame_mappings only impacts the initialization speed of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Although using custom_frame_mappings only impacts the initialization speed of
# Although using ``custom_frame_mappings`` only impacts the initialization speed of

import subprocess
import requests

url = "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ Do you think we could be using the same video as in the seek_mode tutorial( url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4")?

I could be wrong but I suspect this is where the difference in execution time comes from: 2 minutes for this tutorial vs 30 seconds for seek_mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using the same video brought the execution time down to 23 seconds. I used a different resource with higher resolution to emphasize the performance gains, but I'll use the same video to reduce the docs execution time.

Comment on lines 482 to 484
all_frames = torch.tensor([x[0] for x in frame_data], dtype=torch.int64)
is_key_frame = torch.tensor([x[1] for x in frame_data], dtype=torch.bool)
duration = torch.tensor([x[2] for x in frame_data], dtype=torch.int64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, it's not worth creating frame_data anymore, since the x[...] indexing becomes un-pythonic. I think it'd be preferable to just define all_frames and the other tensors as e.g.:

all_frames = torch.tensor([int(frame[pts_key]) for frame in input_data["frames"]], dtype=torch.int64)
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above in this file, in docstring description of custom_frames_mappings, let's add a link to the new tutorial, similarly to what was done for the seek_mode docstring.

@Dan-Flores Dan-Flores merged commit af10398 into meta-pytorch:main Sep 30, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants