# PCIe Support in Firecracker Community Roadmap

This document describes the high-level changes required to support PCIe and device passthrough in Firecracker,
and the main responsibilities of the maintainers and the community in making the initiative a success.
This document will be discussed during the November 6, 2024 meeting.
I will upload this document as a PR to the [poc/pcie](https://github.com/firecracker-microvm/firecracker/tree/poc/pcie)
branch so that everybody will have the opportunity to leave comments along the way.

## Motivation

Firecracker currently supports only MMIO devices.
By adding support for PCIe we would get the following benefits:

* Increased maximum attached device count - up to 31 devices on a single PCI bus, with up to 256 buses
  if we add support for multiple buses.
* Ability to assign multiple interrupts per device (MSI-X) - opens the door for I/O scalability and multi-queue devices.
  * MSI-X interrupts also improve virtio-pci performance over virtio-mmio, which uses a legacy IRQ.
* Device hot-plugging through PCIe hotplug.
* Pass-through of physical devices, like GPUs or EBS volumes, through VFIO.

## Challenges

Supporting PCIe in Firecracker, and device pass-through in particular, introduces new challenges. Namely:

* **overheads:** supporting the full PCI specification might negatively impact the boot time and
  memory overheads of Firecracker VMs.
  * We can mitigate this by allowing PCIe support to be completely disabled via VM configuration
    when more lightweight virtualization is preferred.
* **oversubscription:** simple PCIe device passthrough using VFIO requires the VMM to allocate the
  entire physical memory of the VM to allow for DMA from the device.
  * Solutions to this exist; the most promising is virtio-iommu, but swiotlb and PCI ATS/PRI are also options.
* **security:** the device has access to the entire guest physical memory, which may change the
  security posture of Firecracker.
  * The device will need to be cleared before being attached to avoid cross-VM interference.
  * Compatibility with the secret-hiding initiative to harden Firecracker's security posture needs
    to be carefully evaluated.
* **snapshot/resume:** it will likely not be possible to snapshot external PCIe devices,
  so snapshot/resume will not be supported for active/online passed-through devices.
  * Support for resumption with an offline device should be possible.
  * An alternative to this could be hotplugging a device after resume.

## Contribution Guidelines

Before diving deeper into the required changes in Firecracker, it’s important to be clear on the
responsibility split between the community contributors and the maintainers.
As this is a community-driven initiative, it will be the responsibility of contributors to propose designs,
make changes, and work with the upstream rust-vmm community.
Firecracker maintainers will provide guidance, code reviews, project organization, and automated testing
of the new features, and will facilitate rust-vmm interactions.

### Contributors

* PCIe-specific development will happen on a separate feature branch, `features/pcie`, which maintainers will set up
  with all the required CI artifacts and infrastructure.
* Code refactors needed to enable PCI features should be split into a refactor merged into main and a PCI-specific part
  merged into the feature branch.
  For example, if we need to rework Firecracker device management to support PCI, the development will need to be done in main
  and then merged into the PCIe feature branch.
* Generic code that is not specific to Firecracker should be discussed with the upstream rust-vmm community and,
  if possible, merged into rust-vmm, unless an explicit exemption is granted by the maintainers.
* Contributors should provide design documents for features spanning multiple PRs to receive
  early guidance from maintainers.
* Contributors should not leave open PRs stale for more than two weeks.
* All usual contribution guidelines apply: [CONTRIBUTING.md](https://github.com/firecracker-microvm/firecracker/blob/main/CONTRIBUTING.md).

### Maintainers

* Maintainers will create a separate feature branch and periodically rebase it on top of main
  (every 3 weeks, or on demand in case of dependencies).
* Maintainers will provide a POC reference implementation showcasing basic PCIe support:
  [poc/pcie](https://github.com/firecracker-microvm/firecracker/tree/poc/pcie).
  The POC is just a scrappy implementation and will need to be rewritten from scratch to meet the quality
  and security bars of Firecracker.
* Maintainers will prepare CI artifacts for PCIe-specific testing, adding separate artifacts with
  PCIe support (e.g. guest kernels).
* Maintainers will set up test-on-PR for the feature branch to run on PCIe-specific artifacts.
* Maintainers will set up nightly functional and performance testing on the PCIe feature branch.
* Maintainers will create a new project on GitHub to track the progress of the project using public GitHub issues.
* Maintainers will organize periodic sync-up meetings with the community to organize the work (proposed every 2 weeks).
* Maintainers will provide guidance around the code changes.
* Maintainers will review new PRs to the feature branch within one week.
  Two approvals from maintainers are required to merge a PR.
  Maintainers should provide the required approvals or guidance to unblock a PR within two weeks.
* Maintainers will work with the internal Amazon security team to review the changes
  before every merge of the feature branch into main.
  Any findings will be shared with the community to help address the issues.

### Acceptance Criteria

A proposal of the different milestones of the project is defined in the following sections.
Each milestone identifies a point in the project where a merge of the developed features into the main branch is possible.
In order to accept the merge:

* All Firecracker features and architectures must be supported for PCIe (for example, snapshot/resume, and ARM).
* All functional and security tests should pass with the PCIe feature enabled on all supported devices.
* Open-source performance tests should not regress with the PCIe feature enabled compared to MMIO devices.
* Internal performance tests should not regress with the PCIe feature enabled.
  In case of regressions, details and reproducers will be shared with the community.
* Approval from the internal Amazon security team needs to be granted.
  In case of blockers, details will be shared with the community.
* Overheads of Firecracker must not increase significantly (by more than 5%).
* Oversubscription of Firecracker VMs should not be impaired by the changes.
  Exceptions can be granted if there is a path forward towards mitigation (for example, in the case of VFIO support).

## Milestones

This section describes a proposed high-level plan of action to be discussed with the community.
A more detailed plan will need to be provided by contributors before starting the implementation,
which maintainers will help refine.

### 0. Proof of Concept and Definition of Goals

It is important that both maintainers and the community build confidence in the changes
and verify that it’s possible to achieve the respective goals with this solution.
For this reason, the Firecracker team has built a public proof of concept with basic PCI passthrough and virtio-pci support:
[poc/pcie](https://github.com/firecracker-microvm/firecracker/tree/poc/pcie).
The implementation of the POC is scrappy and would require a complete rewrite from scratch to meet
Firecracker's quality and security bars, but it showcases the main features (and drawbacks) of
PCIe-passthrough and virtio-pci devices.

Before starting the actual implementation below, we need to be able to answer:

* What are the benefits to internal and external customers of supporting PCIe in Firecracker?
* How is performance going to improve for virtio devices?
* What are the additional overheads to boot time and memory?
* What are the limitations of PCIe-passthrough? How can we avoid them?

### 1. virtio-pci support

The first milestone will be the support of the virtio-pci transport layer for virtio devices.
This is not strictly required for PCIe device passthrough, but we believe it is the easiest way to get
the bulk of the PCI code merged into Firecracker and rust-vmm, as there shouldn’t be any concerns from
the security and oversubscription points of view.

With this milestone, Firecracker customers will be able to configure any device to be attached to the
PCI bus instead of the MMIO bus through a per-device configuration.
If no device in the VM uses PCI, no PCI bus will be created and there will be no changes from the current state.
PCI support will be a first-class citizen of Firecracker and will be compiled into the official releases of Firecracker.

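As an illustration, such a per-device option might look like the following drive configuration. The `transport` field is purely hypothetical and not part of Firecracker's current API; it is only a sketch of what "per-device config" could mean:

```json
{
  "drive_id": "rootfs",
  "path_on_host": "./rootfs.ext4",
  "is_root_device": true,
  "is_read_only": false,
  "transport": "pci"
}
```

Omitting the hypothetical `transport` field (or setting it to `"mmio"`) would keep today's behavior.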
Maintainers will:

* set up a new feature branch
* set up testing artifacts and infrastructure (automated test-on-PR and nightly tests on the new branch)
* provide guidance and reviews to the community
* share performance results from public and internal tests
* drive the security review with Amazon Security

A proposed high-level plan for the contributions is presented below.
A more detailed plan will need to be provided by contributors before starting the implementation.

* Refactor Firecracker device management code to make it more extensible and work with the PCI bus.
* Refactor Firecracker virtio code to abstract the transport layer (MMIO vs PCI).
* Implement PCI-specific code to emulate the PCI root device and the PCI configuration space.
  * If possible, it would be ideal to create a new PCI crate in rust-vmm.
    A good starting point is the cloud-hypervisor implementation.
* (x86) Implement the MMCONFIG extended PCI configuration space.
* (ARM) Expose the PCI root device in the device tree (to be verified).
* Implement the virtio-pci transport code with legacy IRQ.
* Implement MSI-X interrupts.
  * MSI-X is an enhanced way for the device to deliver interrupts to the driver,
    allowing for up to 2048 interrupt lines per device.
* Add support for snapshot/resume for the virtio-pci devices and the PCI bus.

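To make the transport-abstraction step concrete, here is a minimal Rust sketch of what such a layer could look like. All names and signatures are illustrative assumptions, not Firecracker's actual internals:

```rust
// Illustrative sketch only: names and signatures are hypothetical and do not
// reflect Firecracker's real device-management code.

/// Operations a virtio transport (MMIO or PCI) exposes to the device model.
trait VirtioTransport {
    /// Read from the device configuration region.
    fn read_config(&self, offset: usize, data: &mut [u8]);
    /// Write to the device configuration region.
    fn write_config(&mut self, offset: usize, data: &[u8]);
    /// Notify the guest that a queue has used buffers
    /// (a legacy IRQ for MMIO, an MSI-X vector for PCI).
    fn signal_used_queue(&mut self, queue_index: u16);
}

/// A toy MMIO transport that records raised interrupts.
struct MmioTransport {
    config: Vec<u8>,
    raised_irqs: Vec<u16>,
}

impl MmioTransport {
    fn new(config_len: usize) -> Self {
        Self { config: vec![0; config_len], raised_irqs: Vec::new() }
    }
}

impl VirtioTransport for MmioTransport {
    fn read_config(&self, offset: usize, data: &mut [u8]) {
        data.copy_from_slice(&self.config[offset..offset + data.len()]);
    }
    fn write_config(&mut self, offset: usize, data: &[u8]) {
        self.config[offset..offset + data.len()].copy_from_slice(data);
    }
    fn signal_used_queue(&mut self, queue_index: u16) {
        self.raised_irqs.push(queue_index);
    }
}

/// Device code only sees the trait, so it works unchanged over MMIO or PCI.
fn notify_guest<T: VirtioTransport>(transport: &mut T, queue_index: u16) {
    transport.signal_used_queue(queue_index);
}
```

A `PciTransport` implementing the same trait would then slot in without touching the device models themselves.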
Open questions:

* Will it be possible to upstream the PCI crate to rust-vmm?
  Will it require using rust-vmm crates not yet used in Firecracker (vm-devices, vm-allocator, ...)?
  How much work will it be to refactor Firecracker device management to start using those crates as well?
* Do we need to support PCI BAR relocation as well?
* Will we need to maintain both the PCI and MMIO transport layers for virtio devices?

### 2. PCIe-passthrough support design

The second milestone will be the design of the support for VFIO-based PCIe-passthrough,
which will allow passing any host PCIe device through to the guest.
This design will need to answer the still-open questions around snapshot/resume and VM oversubscribability,
and will guide the implementation of the following milestones.

In particular, the main problems to solve are:

* How do we allow for oversubscribability of VMs with VFIO devices?
  * Some ideas are to use virtio-iommu, a swiotlb, or PCI ATS/PRI.
* How do we securely perform DMA from the device if we enable “secret hiding”?
  * “Secret hiding” is the un-mapping of guest physical memory from the host kernel address space
    to remove sensitive information from it, protecting it from speculative execution attacks.
  * One idea is the use of a swiotlb in the guest.
* How do we manage the snapshot/resume of these VFIO devices?
  * Can we snapshot/resume with an offline device? Do we need to support hotplugging?

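The swiotlb idea mentioned above can be sketched in a few lines of Rust. This is a toy model only (not Firecracker or Linux swiotlb code): the device may DMA only into a small, pre-pinned bounce window, and the guest driver copies data between that window and private memory that never has to be exposed to the device:

```rust
// Toy model of a swiotlb bounce buffer; types and names are illustrative.

/// Stands in for the pinned, device-visible bounce region.
struct BounceBuffer {
    window: Vec<u8>,
}

impl BounceBuffer {
    fn new(size: usize) -> Self {
        Self { window: vec![0; size] }
    }

    /// "Device DMA write": the device can only touch the bounce window.
    fn device_write(&mut self, offset: usize, data: &[u8]) {
        self.window[offset..offset + data.len()].copy_from_slice(data);
    }

    /// The guest driver bounces the data out into private guest memory,
    /// so only the window needs to stay mapped/pinned for DMA.
    fn driver_read(&self, offset: usize, dst: &mut [u8]) {
        dst.copy_from_slice(&self.window[offset..offset + dst.len()]);
    }
}
```

The trade-off is an extra copy per transfer, in exchange for keeping the rest of guest memory unpinned and hidden from the device.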
To enable prototyping of this milestone, maintainers will set up test artifacts and infrastructure to
test on NVIDIA GPUs on PR and in nightly runs.
Maintainers will also start early consultation with Amazon Security to identify additional requirements.

### 3. Basic PCIe-passthrough support implementation

This proposed milestone will cover the basic implementation of PCIe device-passthrough via VFIO.
With this milestone, Firecracker customers will be able to attach any number of VFIO devices to the VM before boot.
However, customers will not be able to oversubscribe the memory of VMs with PCIe-passthrough devices,
as the entire guest physical memory needs to be allocated for DMA.
It should be possible, depending on the investigations in milestone 2, to snapshot/resume a VM with an offlined VFIO device.

We expect this change to be fairly modular and self-contained, as it builds upon the first milestone,
adding just an additional device type.
We expect the biggest hurdle to be the thorough security review, as this is a change to the current Firecracker
threat model, along with the considerations around its usefulness for internal customers.
Furthermore, a path forward towards full oversubscribability needs to be identified and prototyped for this milestone to be accepted.

### 4. Oversubscribable PCIe-passthrough VMs

Depending on the investigations in milestone 2, we need to implement a way to oversubscribe the memory
of VMs with PCIe-passthrough devices.
The challenge is that the hypervisor needs to know in advance which guest physical memory ranges will be used for DMA.

One way to do this would be to ask the guest to configure a virtual IOMMU to enable DMA from the device.
In this case, the hypervisor will know which memory ranges the guest is using for DMA, so that they can be granularly pre-allocated.
This could be done through the `virtio-iommu` device.

One alternative could be PCI ATS/PRI or using a swiotlb in the guest.
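
The virtual-IOMMU approach can be sketched as follows. This is an illustrative toy, not the `virtio-iommu` specification or Firecracker code: the point is only that map/unmap requests from the guest tell the hypervisor exactly which ranges must stay resident for DMA:

```rust
// Toy model of hypervisor-side DMA-mapping bookkeeping; all names are
// hypothetical and do not follow the virtio-iommu spec.

use std::collections::BTreeMap;

/// Tracks guest-established DMA mappings: IOVA -> (guest physical addr, len).
#[derive(Default)]
struct DmaMappings {
    ranges: BTreeMap<u64, (u64, u64)>,
}

impl DmaMappings {
    /// The guest maps a range through the virtual IOMMU; this is where the
    /// hypervisor could granularly pre-allocate/pin just that memory.
    fn map(&mut self, iova: u64, gpa: u64, len: u64) {
        self.ranges.insert(iova, (gpa, len));
    }

    /// The guest unmaps; the backing memory could be unpinned again.
    fn unmap(&mut self, iova: u64) -> Option<(u64, u64)> {
        self.ranges.remove(&iova)
    }

    /// Total bytes that must currently stay resident for device DMA.
    fn pinned_bytes(&self) -> u64 {
        self.ranges.values().map(|&(_, len)| len).sum()
    }
}
```

Everything outside the tracked ranges would remain eligible for oversubscription, which is the property this milestone is after.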
