Make device interface generic #606

Merged: 5 commits, Apr 4, 2025

dvrogozh
Contributor

@dvrogozh dvrogozh commented Mar 28, 2025

Fixes: #605

Changes:

  • Device interface made device-agnostic by introducing class DeviceInterface, from which specific backends inherit their device-specific implementations
  • Implemented CudaDevice derived from DeviceInterface
  • Created device interface registration mechanism (registerDeviceInterface)
  • Created device interface creation mechanism (createDeviceInterface)

These changes allow replacing the CUDA-specific code in VideoDecoder.cpp and VideoDecoderOps.cpp with device-agnostic code.
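For illustration only, here is a minimal sketch of how a registration/creation pair like the one described above could fit together. The names registerDeviceInterface, createDeviceInterface, DeviceInterface and CudaDevice come from this PR; the signatures, the string key, and the name() method are assumptions made for the sketch and are not the PR's actual code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's DeviceInterface. The real class carries
// decoding-specific virtual methods that are not reproduced here.
class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  virtual std::string name() const = 0;  // assumed, for demonstration only
};

using CreateFn = std::function<std::unique_ptr<DeviceInterface>()>;

// A function-local static sidesteps static-initialization-order issues when
// backends register themselves from other translation units.
inline std::map<std::string, CreateFn>& registry() {
  static std::map<std::string, CreateFn> instance;
  return instance;
}

// Returns false if a backend is already registered under this key.
inline bool registerDeviceInterface(const std::string& deviceType, CreateFn fn) {
  return registry().emplace(deviceType, std::move(fn)).second;
}

// Looks up the backend by key and builds a fresh interface object.
inline std::unique_ptr<DeviceInterface> createDeviceInterface(
    const std::string& deviceType) {
  auto it = registry().find(deviceType);
  if (it == registry().end()) {
    throw std::runtime_error("Unsupported device: " + deviceType);
  }
  return it->second();
}

class CudaDevice : public DeviceInterface {
 public:
  std::string name() const override { return "cuda"; }
};

// A backend registers itself once, at static-initialization time.
static const bool cudaRegistered = registerDeviceInterface(
    "cuda", [] { return std::make_unique<CudaDevice>(); });
```

With this shape, the decoder only ever calls createDeviceInterface and never names a concrete backend, which is what lets backend-specific code live outside VideoDecoder.cpp.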

CC: @scotts, @NicolasHug

@facebook-github-bot added the CLA Signed label Mar 28, 2025
@dvrogozh
Contributor Author

dvrogozh commented Apr 1, 2025

Caught by macOS CI:

In file included from /Users/ec2-user/runner/_work/torchcodec/torchcodec/pytorch/torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp:16:
/Users/ec2-user/runner/_work/torchcodec/torchcodec/pytorch/torchcodec/src/torchcodec/decoders/_core/./../../../../src/torchcodec/decoders/_core/DeviceInterface.h:27:1: error: 'DeviceInterface' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
struct DeviceInterface {
^
/Users/ec2-user/runner/_work/torchcodec/torchcodec/pytorch/torchcodec/src/torchcodec/decoders/_core/./../../../../src/torchcodec/decoders/_core/VideoDecoder.h:20:1: note: did you mean struct here?
class DeviceInterface;
^~~~~
struct

Plus, there is a conflict with recent commits in VideoDecoderOps.cpp to resolve; I will do that today. @scotts: let me know if there are other changes you want me to make.
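As a small illustration of the -Wmismatched-tags diagnostic above: the warning goes away once the forward declaration and the definition use the same class-key. The sketch below uses class in both places; using struct in both would work equally well. The class body is a placeholder, not the PR's definition.

```cpp
#include <cassert>
#include <type_traits>

// Header A (e.g. the decoder header): forward declaration with the `class` key.
class DeviceInterface;

// Header B (e.g. the interface header): the definition uses the same key, so
// -Wmismatched-tags has nothing to flag. Declaring `class` in one place and
// `struct` in the other is valid C++, but it is what triggered the error above.
class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
};
```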

@@ -138,7 +140,7 @@ class VideoDecoder {
   std::optional<int> height;
   std::optional<ColorConversionLibrary> colorConversionLibrary;
   // By default we use CPU for decoding for both C++ and python users.
-  torch::Device device = torch::kCPU;
+  std::shared_ptr<DeviceInterface> device;
Contributor

The DeviceInterface object should be a required member of VideoDecoder. It should also be a unique_ptr, as the VideoDecoder should be the only owner. At VideoDecoder construction, it will be empty, and it will only take on a value after we've added a stream.

Contributor Author

Some video clips contain multiple video streams (as well as multiple audio streams, subtitles, etc.). As of now, DeviceInterface is associated with a single stream, which potentially allows processing different streams on different accelerators. Have you considered supporting such a scenario in the future? /We could still handle that if we move DeviceInterface to VideoDecoder, by storing it in a vector or map indexed by stream. That is a design decision to make./

Contributor

The current C++ decoder was originally implemented to support decoding from multiple streams at once, hence why we have methods for adding a stream and tracking info per-stream. But, that functionality never actually worked correctly. Getting it to work correctly would have required a lot more changes and testing, and we only expose single-stream decoding in our public API. So we instead decided to just make it explicit: the C++ decoder only works on one stream at a time. That's more explicit now in the new name.

We actually have a TODO to move some of the current fields of StreamInfo into SingleStreamDecoder itself for this reason. The device interface is another field that belongs in SingleStreamDecoder itself rather than in StreamInfo. In general, we're trying to keep the code as simple as possible, which means not keeping things around that we "might need" in the future. If we need it in the future, we'll implement it then. And if we want to implement multistream decoding, we may end up implementing an entirely new C++ class.
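To make the ownership model discussed in this thread concrete, here is a minimal sketch of a decoder that holds the interface in a std::unique_ptr that is empty at construction and filled when a stream is added. The method names (addVideoStream, hasActiveStream, deviceInterface, deviceType) and the CpuInterface class are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal interface; the real one has decoding-specific methods.
class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  virtual std::string deviceType() const = 0;
};

// Hypothetical backend used only to have something to construct.
class CpuInterface : public DeviceInterface {
 public:
  std::string deviceType() const override { return "cpu"; }
};

class SingleStreamDecoder {
 public:
  // The pointer is null until a stream has been added.
  bool hasActiveStream() const { return deviceInterface_ != nullptr; }

  // The decoder takes sole ownership of the interface object.
  void addVideoStream(std::unique_ptr<DeviceInterface> iface) {
    assert(iface != nullptr);
    deviceInterface_ = std::move(iface);
  }

  const DeviceInterface& deviceInterface() const {
    assert(deviceInterface_ && "addVideoStream() must be called first");
    return *deviceInterface_;
  }

 private:
  std::unique_ptr<DeviceInterface> deviceInterface_;
};
```

Because the member is a unique_ptr rather than a shared_ptr, the type itself documents that no other object is expected to keep the interface alive.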

Contributor

@scotts scotts left a comment

@dvrogozh, this is great progress! I'd also like to see all of the VideoDecoder::*OnCPU() functions moved into a CpuDevice class. A bunch of the initialize and release member functions will be no-ops, and that's fine. That way it's clear what is device-agnostic code and what is device-specific code. I think that will also clean up the dispatch logic inside of VideoDecoder.
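A rough sketch of the idea in this comment, under stated assumptions: the method names initializeContext, releaseContext and convertFrame, the Frame alias, and the decodeOneFrame helper are all hypothetical and stand in for whatever the real interface exposes. The point is only that a CPU backend with no-op setup and teardown lets the caller drop its per-device branches.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // hypothetical stand-in for frame data

// Hypothetical interface with setup, teardown, and a conversion step.
class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  virtual void initializeContext() = 0;
  virtual void releaseContext() = 0;
  virtual Frame convertFrame(const Frame& decoded) = 0;
};

// On CPU there is no device context to manage, so setup and teardown are
// intentionally empty and conversion returns the data unchanged.
class CpuDevice : public DeviceInterface {
 public:
  void initializeContext() override {}
  void releaseContext() override {}
  Frame convertFrame(const Frame& decoded) override { return decoded; }
};

// Device-agnostic caller: no `if (device is CPU)` branches are needed.
inline Frame decodeOneFrame(DeviceInterface& device, const Frame& decoded) {
  device.initializeContext();
  Frame output = device.convertFrame(decoded);
  device.releaseContext();
  return output;
}
```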

@dvrogozh
Contributor Author

dvrogozh commented Apr 1, 2025

> I'd also like to see all of the VideoDecoder::*OnCPU() functions moved into a CpuDevice class.

@scotts: I was planning to do that in a follow-up PR. Do you want me to do it in this one instead?

@dvrogozh dvrogozh force-pushed the device-interface branch 3 times, most recently from 5643d10 to 433d0da Compare April 2, 2025 16:46
@dvrogozh
Contributor Author

dvrogozh commented Apr 2, 2025

Rebased on top of merged #609 and #611.

Fixes: pytorch#605

Changes:
* Device interface made device-agnostic by introducing `class DeviceInterface`,
  from which specific backends inherit their device-specific implementations
* Implemented `CudaDevice` derived from `DeviceInterface`
* Created device interface registration mechanism (`registerDeviceInterface`)
* Created device interface creation mechanism (`createDeviceInterface`)

These changes allow replacing the CUDA-specific code in `VideoDecoder.cpp` and
`VideoDecoderOps.cpp` with device-agnostic code.

Signed-off-by: Dmitry Rogozhkin <[email protected]>
@dvrogozh
Contributor Author

dvrogozh commented Apr 3, 2025

@scotts: could you please review again?

@scotts
Contributor

scotts commented Apr 4, 2025

@dvrogozh, this is looking great! Let's make that one change to where the device lives, and also figure out why the C++ tests are failing.

@scotts
Contributor

scotts commented Apr 4, 2025

Also: making a proper CPU device is fine for a follow-up PR; there's a bunch of changes already in this one, and it's self-consistent.

dvrogozh added 3 commits April 4, 2025 16:25
Signed-off-by: Dmitry Rogozhkin <[email protected]>
Signed-off-by: Dmitry Rogozhkin <[email protected]>
@dvrogozh
Contributor Author

dvrogozh commented Apr 4, 2025

> Let's make that one change to where the device lives

@scotts: done

> and also figure out why the C++ tests are failing.

/usr/include/c++/13/bits/unique_ptr.h:97:23: error: invalid application of ‘sizeof’ to incomplete type ‘facebook::torchcodec::DeviceInterface’

@scotts: the C++ tests were failing in a rather cryptic way; see the error above. I fixed it by including the DeviceInterface.h header in the test's .cpp file. One thing to call out: there is currently a circular dependency. DeviceInterface.h needs some definitions from SingleStreamDecoder.h, namely SingleStreamDecoder::VideoStreamOptions and SingleStreamDecoder::FrameOutput, while SingleStreamDecoder.h needs DeviceInterface. For now I resolved it by having DeviceInterface.h explicitly include SingleStreamDecoder.h, while the latter only has a forward declaration of class DeviceInterface. That was apparently not enough for the test, which needed the full definition of DeviceInterface. I think this is due to implementation details of std::unique_ptr; I did not see the issue with std::shared_ptr (I believe I tried the C++ test with it).

I think we could extract SingleStreamDecoder::VideoStreamOptions and SingleStreamDecoder::FrameOutput into a separate header file (or move them into one of the existing common files) to break the circular dependency described above. I would leave that for a follow-up PR, though.
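For context on the sizeof error above: std::unique_ptr's default deleter needs the complete type wherever the pointer may be destroyed, whereas std::shared_ptr captures its deleter when the object is created, which is consistent with the issue not showing up with shared_ptr. Besides including the full header (the fix used here), a common alternative is to declare the owner's destructor in the header and define it where the type is complete. The sketch below shows that alternative in a single file; it is not what this PR does, and NullDevice and hasDevice are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// "Header" part: only a forward declaration is visible here.
class DeviceInterface;

class SingleStreamDecoder {
 public:
  SingleStreamDecoder();
  // Declared here, defined later where DeviceInterface is complete. An
  // implicitly generated destructor would be instantiated wherever the
  // decoder is destroyed, and would fail there if the type were incomplete.
  ~SingleStreamDecoder();
  bool hasDevice() const;

 private:
  std::unique_ptr<DeviceInterface> deviceInterface_;
};

// "Source file" part: the full definition is now available.
class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
};

// Hypothetical concrete backend, present only so something can be created.
class NullDevice : public DeviceInterface {};

SingleStreamDecoder::SingleStreamDecoder()
    : deviceInterface_(std::make_unique<NullDevice>()) {}

SingleStreamDecoder::~SingleStreamDecoder() = default;

bool SingleStreamDecoder::hasDevice() const {
  return deviceInterface_ != nullptr;
}
```

Code that only includes the "header" part can then construct and destroy a SingleStreamDecoder without ever seeing the full DeviceInterface definition.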

@@ -493,6 +492,7 @@ class SingleStreamDecoder {
   SeekMode seekMode_;
   ContainerMetadata containerMetadata_;
   UniqueDecodingAVFormatContext formatContext_;
+  std::unique_ptr<DeviceInterface> deviceInterface;
Contributor

Class variable members end with an underscore (foo_).

Contributor Author

ah, I overlooked moving it from the struct. Will fix in a follow up PR.

@scotts
Contributor

scotts commented Apr 4, 2025

Agreed on moving SingleStreamDecoder::VideoStreamOptions and SingleStreamDecoder::FrameOutput into separate, new headers in a follow-up PR. Let's put VideoStreamOptions in its own header, and then the entire family of FrameOutput, FrameBatchOutput and AudioFramesOutput into one new header.

This looks great, thanks for making TorchCodec more general for devices!

@scotts
Contributor

scotts commented Apr 4, 2025

@dvrogozh, I'm merging as-is, you can address the class member name in the follow-up PRs.

@dvrogozh
Contributor Author

dvrogozh commented Apr 4, 2025

> Agreed on moving SingleStreamDecoder::VideoStreamOptions <...>

Filed to track:

> Making a proper CPU device is fine for a follow-up PR

Filed to track:


namespace facebook::torchcodec {

class CudaDevice : public DeviceInterface {
Member

The term "device" should be reserved for the concept of pytorch device object. Here, the CudaDevice class name is misleading because it doesn't refer to a device, it refers to a device interface. @dvrogozh , do you mind submitting a PR to rename CudaDevice into CudaDeviceInterface (file names and file classes)?

Contributor Author

@NicolasHug : no problem, will do.

Contributor Author

@NicolasHug : filed #626

dvrogozh added a commit to dvrogozh/torchcodec that referenced this pull request Apr 7, 2025
Renaming as requested in:
* pytorch#606 (comment)

Signed-off-by: Dmitry Rogozhkin <[email protected]>
scotts pushed a commit to scotts/torchcodec that referenced this pull request Apr 10, 2025
…ec][diff_train] Make device interface generic (pytorch#606)" for one test failure

Summary:
This diff reverts D72722867
D72475332: [torchcodec][diff_train] Make device interface generic (pytorch#606) by generatedunixname499836121 causes the following test failure:

Tests affected:
- [cogwheel:cogwheel_fblearner_inferno_hello_world#main](https://www.internalfb.com/intern/test/844425115815788/)

Here's the Multisect link:
https://www.internalfb.com/multisect/25810176
Here are the tasks that are relevant to this breakage:
T191383700: 100+ tests, 10+ build rules, some CI signals unhealthy for offline_inference

The backout may land if someone accepts it.

If this diff has been generated in error, you can Commandeer and Abandon it.

Reviewed By: scotts

Differential Revision: D72775487
Successfully merging this pull request may close these issues.

Make C++ level device interface generic (device agnostic)