
Conversation

dvrogozh (Contributor):

As we discussed in #853, this is the version that enables 10-bit support in the CUDA device interface by managing the CUDA filter graph within the interface itself. This PR makes the following changes (a sketch of the version gating follows the list):

  1. Added support for 10-bit (and likely 12-bit, which I did not test) via the scale_cuda FFmpeg filter on FFmpeg >= n5.0
  2. Added support for 10-bit via a CPU fallback on FFmpeg n4.4 (color conversion in scale_cuda only appeared in FFmpeg >= n5.0)
  3. Added support in the CPU device interface to return the input directly if it does not require conversion (used in combination with the n4.4 CUDA fallback)
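
A minimal sketch of how that version gating might look when building the filtergraph description; the filter strings and the p010le fallback format are illustrative assumptions, not necessarily the PR's exact choices:

```cpp
std::stringstream filters;
if (avfilter_version() >= AV_VERSION_INT(8, 0, 103)) {
  // FFmpeg >= n5.0: scale_cuda can do the pixel format conversion on the GPU.
  filters << "scale_cuda=format=nv12";
} else {
  // FFmpeg n4.4: scale_cuda cannot convert color formats yet, so download
  // the frame and let the CPU device interface finish the job (item 3 above).
  filters << "hwdownload,format=p010le";
}
```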

CC: @scotts @NicolasHug

```cpp
auto deleter = [filteredAVFramePtr](void*) {
UniqueAVFrame avFrameToDelete(filteredAVFramePtr);
std::vector<int64_t> strides = {avFrame->linesize[0], 3, 1};
AVFrame* avFrameClone = av_frame_clone(avFrame.get());
```
Contributor:

We weren't calling av_frame_clone() here before. Was that an error?

dvrogozh (Contributor, Author):

No, that was not a mistake. The function has changed. Previously it combined two operations: (1) converting the frame with the filtergraph, and (2) creating a tensor. The frame converted by the filtergraph lived locally inside a UniqueAVFrame, and at the end we released it to the tensor deleter. So, the scheme was:

```cpp
torch::Tensor func(const UniqueAVFrame& input) {
  UniqueAVFrame output = filtergraph->convert(input);
  AVFrame* outputPtr = output.release();
  // note: must read from outputPtr here, since output is empty after release()
  return torch::from_blob(outputPtr->data[0], /*deleter=*/deleter(outputPtr), ...);
}
```

In the new code I moved the filtergraph conversion outside of this function, and the function signature changed: it now accepts just a constant reference to a frame, without knowing whether the frame will be needed afterwards. But we still need to hand a reference to the deleter, so that the tensor unreferences the frame once it is no longer needed. To achieve that, we clone the frame (the clone references the same underlying data) and pass the clone to the deleter. This way the caller can still do something with the frame if needed, or just unreference it (which is what we actually do). So, the scheme now is:

```cpp
torch::Tensor func(const UniqueAVFrame& frame) {
  AVFrame* frameClone = av_frame_clone(frame.get());
  return torch::from_blob(frameClone->data[0], /*deleter=*/deleter(frameClone), ...);
}
```
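
For concreteness, here is a fuller sketch of that clone-and-release pattern, assuming packed RGB24 data in plane 0 of the frame; the function name and the exact from_blob arguments are illustrative, not the PR's actual code:

```cpp
torch::Tensor rgbFrameToTensor(const UniqueAVFrame& avFrame) {
  // The clone refcounts the same underlying buffers, so the caller's
  // UniqueAVFrame stays valid independently of the returned tensor.
  AVFrame* avFrameClone = av_frame_clone(avFrame.get());
  TORCH_CHECK(avFrameClone != nullptr, "av_frame_clone() failed");
  // Drop the clone's reference when the tensor is destroyed.
  auto deleter = [avFrameClone](void*) {
    UniqueAVFrame avFrameToDelete(avFrameClone);
  };
  std::vector<int64_t> shape = {avFrameClone->height, avFrameClone->width, 3};
  std::vector<int64_t> strides = {avFrameClone->linesize[0], 3, 1};
  return torch::from_blob(
      avFrameClone->data[0], shape, strides, deleter, torch::kUInt8);
}
```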

```cpp
std::stringstream filters;

unsigned version_int = avfilter_version();
if (version_int < AV_VERSION_INT(8, 0, 103)) {
```
Contributor:

I don't love that we're doing FFmpeg version checks here, but we also do this in several other places in CudaDeviceInterface.cpp. In the rest of the C++ code, we've kept such checks hidden behind functions in FFMPEGCommon. Most of those, however, are about a particular field of a struct being renamed. This logic is very specific to filtergraph, so it does make sense to keep it here.

There's no action to take based on this comment, I'm just pointing out it's an awkward situation. I might end up refactoring it in some of my decoder-native transform work.
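
One possible shape for such a refactor, sketched as a hypothetical FFMPEGCommon-style helper (the function name is an assumption):

```cpp
// Hide the raw libavfilter version comparison behind a named predicate,
// matching how other FFmpeg version differences are wrapped in FFMPEGCommon.
inline bool filterGraphSupportsCudaFormatConversion() {
  return avfilter_version() >= AV_VERSION_INT(8, 0, 103);
}
```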

scotts (Contributor) left a comment:

@dvrogozh, thank you for fixing this! There are some minor changes to make, the biggest being that I think we're mistakenly comparing pointer values instead of actual objects. But this is great; we should be able to merge after these minor changes.

scotts (Contributor) left a comment:

@dvrogozh, this looks great, thank you for the fix!

scotts merged commit fc60ed6 into meta-pytorch:main on Sep 23, 2025
47 checks passed
```cpp
}
return;
}
```

Contributor:

@dvrogozh QQ - do we expect frames to come out as AV_PIX_FMT_RGB24? I'm a bit surprised that this would be the case, I'd assume most encoded frames are in YUV or something similar?

dvrogozh (Contributor, Author):

This change in the CPU device interface is due to the CPU fallback in the CUDA device interface that handles 10-bit streams on FFmpeg 4.4:
https://github.com/dvrogozh/torchcodec/blob/f298fa7e722e9e0932813545e819e7da1a31994d/src/torchcodec/_core/CudaDeviceInterface.cpp#L242-L247
The filter graph gets set up in the CUDA device interface and outputs RGB24; that output then needs to be converted to a tensor, which happens here in the CPU device interface.
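
A minimal sketch of that pass-through check, assuming it runs before the usual conversion path in the CPU device interface; the names are illustrative, reusing the rgbFrameToTensor() sketch from earlier in this thread:

```cpp
if (avFrame->format == AV_PIX_FMT_RGB24) {
  // The CUDA fallback already produced RGB24; nothing left to convert.
  return rgbFrameToTensor(avFrame);
}
// ...otherwise fall through to the normal color conversion path.
```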

That being said, I think there may be two cases in the future where we see RGB24 coming out of decoders:

  1. Image support and "image" codecs such as Motion JPEG
  2. HW decoders that fuse decoding and color space conversion/scaling into a single pass for optimization purposes
