-
Notifications
You must be signed in to change notification settings - Fork 62
Add custom_frame_mappings tutorial and optimizations #887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom_frame_mappings tutorial and optimizations #887
Conversation
"Missing frame mappings when custom_frame_mappings seek mode is set."); | ||
readCustomFrameMappingsUpdateMetadataAndIndex( | ||
streamIndex, customFrameMappings.value()); | ||
activeStreamIndex_, customFrameMappings.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resolves a bug that raised Runtime Error: bad_optional_access
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's surprising, streamIndex
isn't an optional. Do you remember what was causing the issue? Should we add a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect the issue was not with streamIndex
variable itself, but accessing an unset value in StreamInfo[streamIndex]
on videos with multiple streams. I can spend some time digging into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Python constructor streamIndex
is optional. It defaults to -1
in custom_ops.cpp
's _add_video_stream
.
addStream
passes this value to av_find_best_stream
as either a real stream index if provided, or as -1
, indicating FFmpeg should choose.
So at this point in addVideoStream
, activeStreamIndex_
is the stream index we should be accessing, since we have already determined and stored the best stream index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dan-Flores , the benchmark results make sense, I left a few comments below.
In terms of narration, the structure of the tutorial makes sense as well, it closely follows the seek_mode
one which is a good thing. I think the main conclusion would apply: passing custom frame mappings speeds up the VideoDecoder
instanciation (not the frame decoding itself!). We should try to make that clear throughout the tutorial.
It may also be interesting to draw bridges between this tutorial and the seek_mode one. In the seek_mode tutorial we contrast the use of approximate with exact, basically saying approximate is faster while being slightly less accurate. We should make the point that passing frame mappings gives you the best of both worlds: fast instanciation and accurate seeking.
"Missing frame mappings when custom_frame_mappings seek mode is set."); | ||
readCustomFrameMappingsUpdateMetadataAndIndex( | ||
streamIndex, customFrameMappings.value()); | ||
activeStreamIndex_, customFrameMappings.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's surprising, streamIndex
isn't an optional. Do you remember what was causing the issue? Should we add a test?
print(f"Running benchmarks on {Path(video_path).name}") | ||
print("Creating a VideoDecoder object with custom_frame_mappings JSON str from file:") | ||
with open(json_path, "r") as f: | ||
bench(VideoDecoder, source=video_path, stream_index=stream_index, custom_frame_mappings=(f.read())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify the benchmark a bit and only benchmark passing f
instead of f.read()
Ultimately the main difference between these 2 will be the speed of the file system, I don't feel like it's super relevant in this specific tutorial which should be a fairly straightforward entry point
1e24ed6
to
dbd6610
Compare
b97f705
to
b067f48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the great tutorial @Dan-Flores !
I made some small suggestions below, but nothing is blocking so I'll approve.
1. Frame accuracy is critical, so :doc:`approximate mode <approximate_mode>` cannot be used | ||
2. Videos can be preprocessed once and then decoded many times | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great intro 👍 ! It's immediately clear to the reader whether the tutorial is relevant to them
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows | ||
# usually involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows | |
# usually involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance, | |
# :class:`~torchcodec.decoders.VideoDecoder`, decoding workflows | |
# involve creating a :class:`~torchcodec.decoders.VideoDecoder` instance, |
sample_data = json.loads(ffprobe_result.stdout) | ||
print("Data structure of custom frame mappings:") | ||
for frame in sample_data["frames"][:3]: | ||
print(f"{frame}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the above: consider printing the whole ffprobe_cmd
so that the reader can see exactly what is being executed, and so they can copy/paste it.
If it's worth it, we could also refactor the ffprobe_cmd creation, subprocess.run and f.write()
into a function that we call twice? No strong opinion.
QQ what's the side_data_list
on the output?
Data structure of custom frame mappings:
{'key_frame': 1, 'pts': 0, 'duration': 1001, 'side_data_list': [{'side_data_type': 'H.26[45] User Data Unregistered SEI message'}]}
{'key_frame': 0, 'pts': 1001, 'duration': 1001}
{'key_frame': 0, 'pts': 2002, 'duration': 1001}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on printing the ffprobe_cmd
and refactoring into a reusable function, thanks for the suggestions!
The side_data_list
is a field returned when accessing -show_entries
, even if it was not a specified key. I believe its additional metadata related to the codec?
Since it is not important (and I am unable to remove it within the ffprobe command), I'll update the print to only show the fields that we use.
# Performance: Frame decoding with custom frame mappings | ||
# ------------------------------------------------------ | ||
# | ||
# Although using custom_frame_mappings only impacts the initialization speed of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Although using custom_frame_mappings only impacts the initialization speed of | |
# Although using ``custom_frame_mappings`` only impacts the initialization speed of |
import subprocess | ||
import requests | ||
|
||
url = "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ Do you think we could be using the same video as in the seek_mode tutorial( url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4")?
I could be wrong but I suspect this is where the difference in execution time comes from: 2 minutes for this tutorial vs 30 seconds for seek_mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using the same video brought the execution time down to 23 seconds. I used a different resource with higher resolution to emphasize the performance gains, but I'll use the same video to reduce the docs execution time.
all_frames = torch.tensor([x[0] for x in frame_data], dtype=torch.int64) | ||
is_key_frame = torch.tensor([x[1] for x in frame_data], dtype=torch.bool) | ||
duration = torch.tensor([x[2] for x in frame_data], dtype=torch.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, it's not worth creating frame_data
anymore, since the x[...]
indexing becomes un-pythonic. I think it'd be preferable to just define all_frames
and the other tensors as e.g.:
all_frames = torch.tensor([int(frame[pts_key]) for frame in input_data["frames"]], dtype=torch.int64)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above in this file, in docstring description of custom_frames_mappings
, let's add a link to the new tutorial, similarly to what was done for the seek_mode
docstring.
This PR adds the code that will become the
custom_frame_mappings
tutorial.Currently, 2 benchmarks are run using a short video (~3 minutes) and a long video (~13 minutes) :
Additionally, some fixes / optimizations were added to
custom_frame_mappings
logic inSingleStreamDecoder.cpp
to improve the performance.The page takes 1 minute 22 seconds to execute.
Benchmark results:
Compare performance of initializing VideoDecoder with custom_frame_mappings vs seek_modes
Running benchmarks on short_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 7.68ms +- 20.83
Creating a VideoDecoder object with seek_mode='exact':
med = 17.26ms +- 14.54
Running benchmarks on long_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 23.70ms +- 3.39
Creating a VideoDecoder object with seek_mode='exact':
med = 60.47ms +- 7.30
Decode frames with custom_frame_mappings vs exact seek_mode
Running benchmarks on short_video.mp4
Decoding frames with custom_frame_mappings:
med = 34.21ms +- 11.42
Decoding frames with seek_mode='exact':
med = 39.69ms +- 3.37
Running benchmarks on long_video.mp4
Decoding frames with custom_frame_mappings:
med = 50.53ms +- 8.38
Decoding frames with seek_mode='exact':
med = 83.95ms +- 17.38