RFC-0037-Interoperability-Standard-of-3rd-Backend-Integration-Mechanism #64

Binary file added RFC-0030-assets/3rd_backend_action.png
Binary file added RFC-0030-assets/3rd_backend_architecture.png
Binary file added RFC-0030-assets/3rd_backend_vendor.png
@@ -0,0 +1,91 @@
# The Interoperability Standard of Third-party Backend Integration Mechanism

**Authors:**
* @fffrog
* @hipudding


## **Summary**
As a leading AI framework, PyTorch will see more and more backends wanting to integrate with it. A universal third-party device integration mechanism will be their first choice, which makes the usability, accessibility, and stability of this mechanism particularly important.

The main goals of this RFC include the following three points:
- **CI/CD**: Ensure the quality of the third-party device integration mechanism through an in-tree lightweight backend.
- **Demo**: Standardize the integration of third-party devices and provide an official reference implementation.
- **Docs**: Provide end-to-end documentation covering the full integration process.


## **Motivation**
### **The Quality of Third-party Device Integration Mechanism**
PyTorch has mechanisms in place to ensure the functionality and stability of backends such as **CPU**, **CUDA**, and **ROCm**. However, no corresponding mechanism exists for the universal third-party device integration mechanism, which will be widely used by backends in the future. Because PyTorch is not aware of the downstream accelerators integrated through this mechanism, upstream code changes such as function optimizations and new features may cause functional problems in a downstream accelerator, or even compilation failures.

Take Ascend NPU as an example:

The Ascend NPU (torch_npu) was successfully integrated with PyTorch through the third-party device integration mechanism (PrivateUse1) in October 2023. Starting in November 2023, a set of daily tasks was established in the downstream code repository to test the functional compatibility between the latest PyTorch (main branch) and Ascend NPU (master branch). As of March 2024, these tasks have been running stably for approximately four months. The compatibility breakages found during this period are summarized in the table below:

| Type | Description | Occurrences | Example PR |
| :---: | :--- | :---: | :---: |
| Refactoring | Allocator::allocate was made non-const, but the derived class's override was not updated. | 1 | [#120969](https://github.com/pytorch/pytorch/pull/120969) |
| Refactoring | DeviceIndex replaced int in the CUDA wrappers, but the derived class's override was not updated. | 1 | [#119142](https://github.com/pytorch/pytorch/pull/119142) |
| Refactoring | New trace utils were moved from source to header, so some symbols could no longer be found. | 1 | [#114367](https://github.com/pytorch/pytorch/pull/114367/files) |
| Refactoring | Environment variable checks were migrated to the getCvar* functions, so function names could no longer be found. | 1 | [#113797](https://github.com/pytorch/pytorch/pull/113797) |
| New Features | Support for new data types was added, causing data type assertions to fail. | 2 | [#107586](https://github.com/pytorch/pytorch/pull/107586), [#116594](https://github.com/pytorch/pytorch/pull/116594) |
| New Features | A function to materialize COW storages was added, which introduced the pure virtual function Allocator::copy_data; the derived class did not implement it. | 2 | [#117053](https://github.com/pytorch/pytorch/pull/117053), [#113396](https://github.com/pytorch/pytorch/pull/113396) |
Contributor

Given that these things are going to continue happening, I would be curious what we expect the workflow to be when such a change is needed.

The options that come to mind here would be:

  • a specific channel where such changes are tracked, so that extension writers can subscribe and update their extensions accordingly. I guess this would mean that the extension is pinned to some version of PT and they move forward in lockstep.
  • we define the extension points implemented out of core in such a way that we can preserve BC there even when changing core. Might be tricky to define such an API and would restrict what extension writers can do.

In both cases, I think we might need to continue ensuring that it is ok for extensions not to implement all the features that can be extended, either by having some generic feature flag to say which features each extension supports, or by having a good default implementation that works?

Author

In my opinion, the first approach seems more appropriate and aligns closely with my description in this RFC.

We can provide a way for third-party devices to easily obtain those modifications that have affected third-party devices (including original PR and adaptation methods). This way, when the corresponding PyTorch version switches from Version A to Version B, it will be easy to see which parts need to be re-adapted for third-party devices. The advantages will be more obvious when there are many third-party devices.

This approach has minimal impact and is relatively easy for third-party devices to accept. It also does not impose significant restrictions, obstacles, or additional workload on community developers.

Regarding the mentioned approach, there are two scenarios:

  • In Tree:
    If the related test cases for the Demo as part of CI fail, the developer needs to modify the corresponding implementation of the Demo synchronously. During the final code merge, the modifications to the Demo files will be checked (distinguishing normal modifications to Demo files). If there are modifications, a special tag will be added to the PR.
  • Out of Tree:
    The reviewer can decide, based on the actual situation, whether to add a special tag to the corresponding PR in the PyTorch repository.


Based on the above table, it is evident that upstream modifications do have a negative impact on out-of-tree third-party devices. These modifications directly affect the functionality of third-party devices and greatly increase the cost of adaptation and maintenance.
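
One failure class from the table, a new pure virtual function added to a base class that a downstream backend derives from, can be illustrated with a small Python analogy. This is only a sketch: the real breakage occurs in C++ (Allocator::copy_data), and the class names `UpstreamAllocatorV1`, `UpstreamAllocatorV2`, and `DownstreamAllocator` are hypothetical stand-ins rather than PyTorch APIs.

```python
from abc import ABC, abstractmethod


class UpstreamAllocatorV1(ABC):
    """Hypothetical upstream interface before the change."""

    @abstractmethod
    def allocate(self, nbytes: int) -> bytearray: ...


class UpstreamAllocatorV2(UpstreamAllocatorV1):
    """Hypothetical upstream interface after a new required method is added."""

    @abstractmethod
    def copy_data(self, dst: bytearray, src: bytearray) -> None: ...


def make_downstream(base):
    """Build a downstream allocator that was written against V1 only."""

    class DownstreamAllocator(base):
        def allocate(self, nbytes: int) -> bytearray:
            return bytearray(nbytes)

    return DownstreamAllocator


def try_instantiate(cls) -> str:
    """Report whether the class can still be instantiated."""
    try:
        cls()
        return "ok"
    except TypeError as exc:
        return f"broken: {exc}"


# The same downstream code works against V1 and breaks against V2.
result_v1 = try_instantiate(make_downstream(UpstreamAllocatorV1))
result_v2 = try_instantiate(make_downstream(UpstreamAllocatorV2))
print(result_v1)
print(result_v2)
```

The downstream class is unchanged between the two runs. Only the upstream interface moved, which is why the downstream maintainer cannot see this kind of breakage until something tests the two together.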


### **Standardized Integration of Third-party Devices**
PyTorch is a complex framework with numerous features, making it a significant challenge for third-party devices to adapt to. The quality of integration methods directly impacts the stability and sustainability of the subsequent functionality of third-party backends. Therefore, standardizing the integration for third-party devices not only reduces integration barriers and time costs but also enhances compatibility and stability. This, in turn, solidifies PyTorch's leading position in the field of artificial intelligence frameworks.


## **Proposed Implementation**
![Architecture Outline](./RFC-0030-assets/3rd_backend_architecture.png)
Contributor

I love this diagram!
I think we want to do a couple clarifications here:

  • For the middle case of XPU, showcase what still needs to be in core (allocator, stream, event, etc)
  • Similarly for the out of tree case, would be curious to showcase what is in core and is providing the extension point being used.
  • I would separate this PrivateUse1 device in core (that will look similar to XPU) and make it point to both out of core projects and point to the demo project in core (that is independent from the core integration).

Author

  1. This is my mistake: I didn't notice that XPU has upstreamed some modules into PyTorch. Please refer to the latest diagram below.
  2. The out-of-tree case is similar to that of IPU: only some necessary interfaces, logical branch code, and so on are integrated into the core. However, PrivateUse1 also has its own special features:
  • Completeness: Support all torch functions as much as possible
  • Universality: Give third-party devices as much flexibility as possible, shielding the differences between devices
  3. As you commented below, it's probably best to keep the demo project as a PyTorch-organized project rather than keeping it in the tree.


As shown in the diagram above, an in-tree, lightweight backend will be added, built on the third-party device integration mechanism (PrivateUse1) and a real device. Minimal test suites and end-to-end integration documentation will be added alongside it. It serves both as quality assurance for the third-party device integration mechanism and as a standard official reference implementation for third-party devices. This backend has the following characteristics:

* **Functionality**: Implements only the most essential and minimal functionality while covering the entire feature set.
* **Tests**: Minimal test suite
* **Usage**: Reserved for CI/CD and as the official standard reference implementation; not intended for end users.
* **Compilation**: Its own compilation entry point, independent of other backends.
* **Independence**: The related code is stored separately, without coupling with other functional modules of PyTorch.
Comment on lines +48 to +50
Contributor

I feel like a lot of this can be achieved by making this demo backend another repo on the pytorch/* org that we take as a submodule for testing only and we build. This way, it is:

  • Fully independent codebase from core, just like the real out-of-tree backends
  • A real end-to-end example of how to make a new backend
  • Can be fully tested in our CI and be pinned as needed

Author

Good idea.
Compared to in-tree, this is indeed a better way, thank you.

Author

@FFFrog FFFrog May 9, 2024

In addition, compared with in-tree, out-of-tree will be more troublesome when merging PRs that make the third-party device unavailable, because developers need to coordinate between the two repos by pinning a commit ID or by other methods.

Contributor

Yes there are pros and cons both ways. Looking into what would be the easiest to setup. Will get back to you shortly.

* **In-tree**: Maintained by community developers collectively.

Taking Ascend NPU as an example, PrivateUse1 is used as the exclusive key of Ascend NPU. Based on the PrivateUse1 key, the third-party device integration mechanism, and Ascend NPU, a lightweight in-tree backend is implemented, providing an official reference implementation and quality assurance for third-party devices (in combination with a GitHub action).

Based on the lightweight backend, we will add a new GitHub action task, described below, to verify the functionality related to PrivateUse1. It is similar to other GitHub actions, with some differences:
1. If the validation fails, developers need to attempt to refactor their code to minimize the impact on the PrivateUse1 mechanism. If the impact is unavoidable, they need to modify the implementation of the built-in lightweight backend and resubmit the PR.
2. If the validation passes, the modified files will be checked when the PR is merged into the main branch. If there are modifications related to the built-in lightweight backend, a dedicated label will be added to the PR so that downstream developers can filter and identify it, speeding up downstream adaptation.
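
The labeling step in point 2 could look roughly like the following sketch. The path patterns in `DEMO_BACKEND_PATTERNS` and the label name in `DOWNSTREAM_LABEL` are assumptions made for illustration; the RFC does not specify where the lightweight backend lives or what the label is called.

```python
from fnmatch import fnmatch

# Hypothetical locations of the lightweight backend and a hypothetical label name.
DEMO_BACKEND_PATTERNS = ("test/demo_backend/*", "docs/demo_backend/*")
DOWNSTREAM_LABEL = "affects-privateuse1"


def touches_demo_backend(changed_files):
    """Return True if any changed file belongs to the lightweight backend."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in DEMO_BACKEND_PATTERNS
    )


def labels_for_merged_pr(changed_files):
    """Return the labels to add to a PR at merge time, given its changed files."""
    return [DOWNSTREAM_LABEL] if touches_demo_backend(changed_files) else []


example_labels = labels_for_merged_pr(
    ["c10/core/Allocator.h", "test/demo_backend/csrc/allocator.cpp"]
)
print(example_labels)
```

Downstream developers could then filter merged PRs by that label to find the changes that required adapting the reference backend.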

![Workflow for Contributor](./RFC-0030-assets/3rd_backend_action.png)

Based on the above mechanism, we can improve the compatibility and stability between upstream and downstream while minimizing the impact on upstream developers.

The impact on the third-party device integration process is shown in the following diagram:
![Workflow for Vendor](./RFC-0030-assets/3rd_backend_vendor.png)

Explanation:
1. Third-party device developers can directly copy the official in-tree reference implementation provided by PyTorch.
2. By following the module granularity, they can refer to the corresponding end-to-end documentation and module reference implementation for reuse/modification. This approach minimizes the adaptation threshold and ensures compatibility between upstream and downstream.


## **Drawbacks**
The above mechanism has the following drawbacks:
1. To some extent, it increases the development workload for community developers.


## **Alternatives**
Select a stable, continuously evolving out-of-tree backend built on the third-party device integration mechanism to serve both as the CI/CD validation backend for the mechanism and as the official reference implementation.

We will also add a new GitHub action related to PrivateUse1. The impact on community developers is similar to that of the solution above, but slightly different:
Per-PR workflow (referring to XLA's implementation):
1. Developers perform feature development and validate functionality locally.
2. They submit a pull request, triggering the relevant GitHub actions (we will add a new action here that fetches the out-of-tree backend at its pinned commit ID, compiles it, and runs testing and validation). If the validation fails, developers need to attempt to refactor their code to minimize the impact on the PrivateUse1 mechanism.
3. If the impact is unavoidable, they need to wait for the out-of-tree backend team to adapt to the PR within 24 hours, and then update the pinned commit ID in PyTorch for that out-of-tree backend.
4. Developers retrigger the GitHub action, and if it passes, the PR is merged.
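
The per-PR flow above can be summarized as a small decision function. This is a minimal sketch of the described steps, not an actual CI implementation; the names `NextStep` and `next_step` and the two boolean inputs are hypothetical.

```python
from enum import Enum


class NextStep(Enum):
    MERGE = "merge the PR"
    REFACTOR = "refactor the PR to avoid impacting PrivateUse1"
    WAIT_FOR_BACKEND = "wait for the out-of-tree backend to adapt, then update the pin"


def next_step(validation_passed: bool, refactor_possible: bool) -> NextStep:
    """Map the outcome of the PrivateUse1 action to the developer's next step."""
    if validation_passed:
        return NextStep.MERGE
    if refactor_possible:
        return NextStep.REFACTOR
    # The impact is unavoidable: the backend team adapts and the pin is updated,
    # after which the action is retriggered.
    return NextStep.WAIT_FOR_BACKEND


print(next_step(validation_passed=False, refactor_possible=False).value)
```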

Weekly recurring task:
1. Add a weekly recurring task (GitHub action) to update the pinned commit ID for the corresponding out-of-tree backend. This helps reduce the negative impact of the out-of-tree backend on upstream.
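
One possible policy for the weekly pin update is to move the pin to the newest out-of-tree backend commit that still passes validation, and to keep the current pin otherwise. The sketch below assumes that policy; `select_new_pin`, the commit IDs, and the `passes_ci` callback are hypothetical and stand in for whatever the real GitHub action would run.

```python
from typing import Callable, Sequence


def select_new_pin(
    current_pin: str,
    candidates: Sequence[str],
    passes_ci: Callable[[str], bool],
) -> str:
    """Return the newest candidate commit that passes CI, else keep the current pin.

    `candidates` lists the commits after the current pin, ordered oldest to newest.
    """
    for commit in reversed(candidates):
        if passes_ci(commit):
            return commit
    return current_pin


# Hypothetical data: the newest commit fails validation, so the pin moves to "b2".
passing_commits = {"a1", "b2"}
new_pin = select_new_pin("old0", ["a1", "b2", "c3"], lambda c: c in passing_commits)
print(new_pin)
```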


## Open points
Is it possible to preset a common test set for all out-of-tree devices?