Add custom_frame_mappings tutorial and optimizations #887

Dan-Flores · 2025-09-10T04:57:40Z

This PR adds the code that will become the custom_frame_mappings tutorial.
Currently, two benchmarks are run on a short video (~3 minutes) and a long video (~13 minutes):

Initializing VideoDecoders
Decoding 10 frames from 10 videos

Additionally, some fixes and optimizations were made to the custom_frame_mappings logic in SingleStreamDecoder.cpp to improve performance.

The page takes 1 minute 22 seconds to execute.

Benchmark results:

Compare performance of initializing VideoDecoder with custom_frame_mappings vs seek_modes

Running benchmarks on short_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 7.68ms +- 20.83
Creating a VideoDecoder object with seek_mode='exact':
med = 17.26ms +- 14.54

Running benchmarks on long_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 23.70ms +- 3.39
Creating a VideoDecoder object with seek_mode='exact':
med = 60.47ms +- 7.30

Decode frames with custom_frame_mappings vs exact seek_mode

Running benchmarks on short_video.mp4
Decoding frames with custom_frame_mappings:
med = 34.21ms +- 11.42
Decoding frames with seek_mode='exact':
med = 39.69ms +- 3.37

Running benchmarks on long_video.mp4
Decoding frames with custom_frame_mappings:
med = 50.53ms +- 8.38
Decoding frames with seek_mode='exact':
med = 83.95ms +- 17.38
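
For readers who want to reproduce numbers in the `med = X ms +- Y` format above, here is a minimal timing sketch. It assumes the value after `+-` is a standard deviation, which the output does not state, and the helper names `bench` and `report` are hypothetical rather than taken from the tutorial.

```python
import statistics
import time


def bench(fn, *args, num_runs=10, **kwargs):
    """Call fn num_runs times and return per-call durations in milliseconds."""
    times_ms = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        times_ms.append((time.perf_counter() - start) * 1e3)
    return times_ms


def report(times_ms):
    """Format timings as 'med = X.XXms +- Y.YY' (median, sample std dev)."""
    med = statistics.median(times_ms)
    # Assumption: the "+-" term is a sample standard deviation.
    std = statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0
    return f"med = {med:.2f}ms +- {std:.2f}"


# Stand-in workload; in the tutorial this would be decoder creation or decoding.
print(report(bench(sum, range(100_000), num_runs=5)))
```

Reporting the median keeps a single slow run from dominating the headline number. That may be relevant to the short-video initialization result, where the spread (20.83) is larger than the median (7.68).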

Dan-Flores · 2025-09-10T14:38:01Z

src/torchcodec/_core/SingleStreamDecoder.cpp

        "Missing frame mappings when custom_frame_mappings seek mode is set.");
    readCustomFrameMappingsUpdateMetadataAndIndex(
-        streamIndex, customFrameMappings.value());
+        activeStreamIndex_, customFrameMappings.value());


This resolves a bug that raised a RuntimeError: bad_optional_access.

That's surprising, streamIndex isn't an optional. Do you remember what was causing the issue? Should we add a test?

I suspect the issue was not with the streamIndex variable itself, but with accessing an unset value in StreamInfo[streamIndex] on videos with multiple streams. I can spend some time digging into this.

In the Python constructor streamIndex is optional. It defaults to -1 in custom_ops.cpp's _add_video_stream.

addStream passes this value to av_find_best_stream as either a real stream index if provided, or as -1, indicating FFmpeg should choose.

So at this point in addVideoStream, activeStreamIndex_ is the stream index we should be accessing, since we have already determined and stored the best stream index.
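
To make the suspected mechanism concrete, here is a simplified Python model of it. It is not the actual C++ code. `StreamInfo`, its `time_base` field, `find_best_stream`, and `time_base_for` are hypothetical stand-ins, and the rule for picking the "best" stream is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StreamInfo:
    # Hypothetical per-stream state; None models an unset std::optional.
    time_base: Optional[float] = None


def find_best_stream(stream_infos: Dict[int, StreamInfo], requested_index: int) -> int:
    """Stand-in for av_find_best_stream: honor an explicit index, else choose one."""
    if requested_index >= 0:
        return requested_index
    # Assumption for illustration only: "best" is the lowest index present.
    return min(stream_infos)


def time_base_for(stream_infos: Dict[int, StreamInfo], index: int) -> float:
    """Read a field that is only set for streams that were actually added."""
    info = stream_infos.get(index, StreamInfo())
    if info.time_base is None:
        # Analogous to calling .value() on an empty std::optional.
        raise RuntimeError("bad_optional_access")
    return info.time_base


stream_infos = {0: StreamInfo(time_base=1 / 30000)}
requested_index = -1  # caller did not choose a stream
active_index = find_best_stream(stream_infos, requested_index)

print(time_base_for(stream_infos, active_index))  # resolved index: works
try:
    time_base_for(stream_infos, requested_index)  # raw -1: nothing is set there
except RuntimeError as err:
    print("error:", err)
```

In this model the lookup only succeeds with the resolved index, which matches the reasoning for passing activeStreamIndex_ instead of streamIndex.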

NicolasHug

Thanks @Dan-Flores, the benchmark results make sense. I left a few comments below.

In terms of narration, the structure of the tutorial makes sense as well; it closely follows the seek_mode one, which is a good thing. I think the main conclusion would apply: passing custom frame mappings speeds up the VideoDecoder instantiation (not the frame decoding itself!). We should try to make that clear throughout the tutorial.
It may also be interesting to draw bridges between this tutorial and the seek_mode one. In the seek_mode tutorial we contrast approximate with exact, basically saying that approximate is faster while being slightly less accurate. We should make the point that passing frame mappings gives you the best of both worlds: fast instantiation and accurate seeking.
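
As a rough illustration of that "best of both worlds" point, the sketch below shows how frame mappings might be produced ahead of time and handed to the decoder. The ffprobe flags are real, but the per-frame field names can differ across FFmpeg versions. The `VideoDecoder(..., custom_frame_mappings=...)` call and the `{"frames": [...]}` JSON shape are assumptions based on this PR, and all helper names are hypothetical.

```python
import json


def ffprobe_frame_mappings_cmd(video_path, stream_index=0):
    """Build (but do not run) an ffprobe command that dumps per-frame metadata as JSON."""
    return [
        "ffprobe",
        "-i", str(video_path),
        "-select_streams", str(stream_index),
        "-show_frames",
        # Field names are an assumption; older FFmpeg builds may use other names.
        "-show_entries", "frame=pts,duration,key_frame",
        "-of", "json",
    ]


def summarize_frame_mappings(mappings_json):
    """Return (num_frames, num_key_frames) from ffprobe-style JSON."""
    frames = json.loads(mappings_json)["frames"]
    num_key_frames = sum(1 for f in frames if int(f["key_frame"]) == 1)
    return len(frames), num_key_frames


def make_decoder(video_path, mappings_json, stream_index=0):
    """Create a decoder from precomputed mappings (assumed API, see lead-in)."""
    # Imported lazily so the rest of this sketch runs without torchcodec installed.
    from torchcodec.decoders import VideoDecoder

    return VideoDecoder(
        video_path,
        stream_index=stream_index,
        custom_frame_mappings=mappings_json,
    )


# Tiny hand-written sample standing in for real ffprobe output.
sample = json.dumps({"frames": [
    {"pts": 0, "duration": 1001, "key_frame": 1},
    {"pts": 1001, "duration": 1001, "key_frame": 0},
    {"pts": 2002, "duration": 1001, "key_frame": 0},
]})
print(summarize_frame_mappings(sample))
```

The cost of scanning the file is paid once, when ffprobe runs, rather than each time a VideoDecoder is created. That is consistent with the benchmarks showing a gain mainly at instantiation.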

NicolasHug · 2025-09-11T11:19:57Z


NicolasHug · 2025-09-11T11:23:04Z