Commit b9f4cb7

Merge pull request #13118 from k8s-infra-cherrypick-robot/cherry-pick-13016-to-release-1.12

[release-1.12] 📖 Update in-place update implementation notes
docs/proposals/20240807-in-place-updates-implementation-notes.md

## Notes about in-place update implementation for KubeadmControlPlane

- In-place updates respect the existing control plane update strategy:
  - KCP controller uses the `rollingUpdate` strategy with `maxSurge` (0 or 1)
  - When `maxSurge` is 0, no new machines are created during rollout; updates are performed only on existing machines via in-place updates or by scaling down outdated machines
  - When `maxSurge` is 1:
    - The controller first scales up by creating one new machine to maximize fault tolerance
    - Once `maxReplicas` (desiredReplicas + 1) is reached, it evaluates whether to update in-place or scale down old machines
  - For each old machine needing rollout, the controller evaluates whether it is eligible for an in-place update. If so, it performs the in-place update on that machine. Otherwise, it scales down the outdated machine (which will be replaced by a new one in the next reconciliation cycle)
  - This pattern repeats until all machines are up to date; the controller then scales back down to the desired replica count

- The implementation preserves the existing division of responsibilities:
  - KCP controller manages control plane Machines directly
  - KCP controller enforces `maxSurge` limits during rolling updates
  - KCP controller decides when to scale up, scale down, or perform in-place updates
  - KCP controller runs preflight checks to ensure the control plane is stable before in-place updates
  - KCP controller calls the `CanUpdateMachine` hook to verify whether extensions can handle the changes
  - When an in-place update is possible, the KCP controller triggers the update by reconciling the desired state
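
The `CanUpdateMachine` hook mentioned above can be sketched as follows. The request/response shapes and the helper below are illustrative assumptions, not the actual runtime hooks API; in particular, real responses carry structured patches rather than strings:

```go
package main

import "fmt"

// Illustrative request/response shapes for the CanUpdateMachine hook
// (hypothetical; the real runtime hooks API may differ).
type CanUpdateMachineRequest struct {
	Current string // serialized current state (Machine, InfraMachine, KubeadmConfig)
	Desired string // serialized desired state
}

type CanUpdateMachineResponse struct {
	Accepted bool     // whether the extension can apply the changes in place
	Patches  []string // patches describing the changes the extension covers
}

// shouldFallBackToRollout interprets the hook answers for the KCP controller:
// if no registered extension accepts the change, the machine is rolled out
// (scale down / recreate) instead of updated in place.
func shouldFallBackToRollout(responses []CanUpdateMachineResponse) bool {
	for _, r := range responses {
		if r.Accepted {
			return false
		}
	}
	return true
}

func main() {
	resp := CanUpdateMachineResponse{Accepted: true}
	fmt.Println(shouldFallBackToRollout([]CanUpdateMachineResponse{resp})) // false
}
```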

- The in-place update decision flow is:
  - If `currentReplicas < maxReplicas` (desiredReplicas + maxSurge), scale up first to maximize fault tolerance
  - If `currentReplicas >= maxReplicas`, select a machine needing rollout and evaluate options:
    - Check if the selected Machine is eligible for an in-place update (determined by the `UpToDate` function)
    - Check if there are already enough up-to-date replicas (if `currentUpToDateReplicas >= desiredReplicas`, skip the in-place update and scale down)
    - Run preflight checks to ensure control plane stability
    - Call the `CanUpdateMachine` hook on registered runtime extensions
    - If all checks pass, trigger the in-place update. Otherwise, fall back to scale down/recreate
  - This flow repeats on each reconciliation until all machines are up to date
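
The decision flow above can be sketched as a single function. This is a simplification under stated assumptions: the boolean inputs stand in for the `UpToDate` check, the preflight checks, and the `CanUpdateMachine` hook answer, and the function is only called while some machine still needs rollout:

```go
package main

import "fmt"

// Action is the outcome of one reconciliation step (illustrative names).
type Action string

const (
	ScaleUp       Action = "scale up"
	ScaleDown     Action = "scale down outdated machine"
	InPlaceUpdate Action = "in-place update"
)

// decide mirrors the decision flow described above for a machine that
// needs rollout.
func decide(currentReplicas, desiredReplicas, maxSurge, currentUpToDateReplicas int,
	eligibleForInPlace, preflightOK, hookAccepted bool) Action {
	maxReplicas := desiredReplicas + maxSurge
	if currentReplicas < maxReplicas {
		// Scale up first to maximize fault tolerance.
		return ScaleUp
	}
	// Already enough up-to-date replicas: skip in-place and scale down.
	if currentUpToDateReplicas >= desiredReplicas {
		return ScaleDown
	}
	if eligibleForInPlace && preflightOK && hookAccepted {
		return InPlaceUpdate
	}
	// Fall back to scale down / recreate.
	return ScaleDown
}

func main() {
	// maxSurge=1, 3 desired, 3 current: scale up first.
	fmt.Println(decide(3, 3, 1, 0, true, true, true)) // scale up
	// At maxReplicas, machine eligible and all checks pass.
	fmt.Println(decide(4, 3, 1, 0, true, true, true)) // in-place update
}
```

With `maxSurge` set to 0, `maxReplicas` equals `desiredReplicas`, so the scale-up branch is never taken and only the in-place/scale-down branches apply, matching the strategy description earlier in this section.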

- Orchestration of in-place updates uses two key annotations:
  - `in-place-updates.internal.cluster.x-k8s.io/update-in-progress` - Marks a Machine as undergoing in-place update
  - `runtime.cluster.x-k8s.io/pending-hooks` - Tracks pending `UpdateMachine` runtime hooks
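
A minimal sketch of how a controller might check these annotations (the annotation keys come from the text above; plain maps stand in for the real API types, and the comma-separated `pending-hooks` format is an assumption):

```go
package main

import (
	"fmt"
	"strings"
)

const (
	updateInProgressAnnotation = "in-place-updates.internal.cluster.x-k8s.io/update-in-progress"
	pendingHooksAnnotation     = "runtime.cluster.x-k8s.io/pending-hooks"
)

// inPlaceUpdateInProgress reports whether a Machine (represented here by its
// annotations map) is still undergoing an in-place update: either the
// update-in-progress marker is set or an UpdateMachine hook is still pending.
func inPlaceUpdateInProgress(annotations map[string]string) bool {
	if _, ok := annotations[updateInProgressAnnotation]; ok {
		return true
	}
	for _, h := range strings.Split(annotations[pendingHooksAnnotation], ",") {
		if strings.TrimSpace(h) == "UpdateMachine" {
			return true
		}
	}
	return false
}

func main() {
	m := map[string]string{updateInProgressAnnotation: ""}
	fmt.Println(inPlaceUpdateInProgress(m))                   // true
	fmt.Println(inPlaceUpdateInProgress(map[string]string{})) // false
}
```

This is the same check the KCP controller performs in Workflow #3 below while waiting for an update to finish.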

The following diagrams provide an overview of the in-place update workflow for KCP.

Workflow #1: The KCP controller determines that a Machine can be updated in-place and triggers the update.

```mermaid
sequenceDiagram
autonumber
participant KCP Controller
participant RX as Runtime Extension
participant M1 as Machine
participant IM1 as InfraMachine
participant KC1 as KubeadmConfig

KCP Controller->>KCP Controller: Select Machine for rollout
KCP Controller->>KCP Controller: Run preflight checks on control plane
KCP Controller->>RX: CanUpdateMachine(current, desired)?
RX-->>KCP Controller: Yes, with patches to indicate supported changes

KCP Controller->>M1: Set annotation "update-in-progress": ""
KCP Controller->>IM1: Apply desired InfraMachine spec<br/>Set annotation "update-in-progress": ""
KCP Controller->>KC1: Apply desired KubeadmConfig spec<br/>Set annotation "update-in-progress": ""
KCP Controller->>M1: Apply desired Machine spec<br/>Set annotation "pending-hooks": "UpdateMachine"
```
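
The trigger step in Workflow #1 amounts to stamping the three objects and applying desired state. A toy sketch of that sequence (the `obj` struct and function names are hypothetical stand-ins for the real typed API objects and SSA calls):

```go
package main

import "fmt"

// obj is a stand-in for Machine/InfraMachine/KubeadmConfig.
type obj struct {
	spec        string
	annotations map[string]string
}

func (o *obj) setAnnotation(k, v string) {
	if o.annotations == nil {
		o.annotations = map[string]string{}
	}
	o.annotations[k] = v
}

// triggerInPlaceUpdate mirrors Workflow #1: mark all three objects as
// updating, apply the desired specs, and record the pending UpdateMachine
// hook on the Machine so the Machine controller picks it up.
func triggerInPlaceUpdate(machine, infraMachine, kubeadmConfig *obj,
	desiredMachineSpec, desiredInfraSpec, desiredConfigSpec string) {
	machine.setAnnotation("update-in-progress", "")
	infraMachine.spec = desiredInfraSpec
	infraMachine.setAnnotation("update-in-progress", "")
	kubeadmConfig.spec = desiredConfigSpec
	kubeadmConfig.setAnnotation("update-in-progress", "")
	machine.spec = desiredMachineSpec
	machine.setAnnotation("pending-hooks", "UpdateMachine")
}

func main() {
	m, im, kc := &obj{}, &obj{}, &obj{}
	triggerInPlaceUpdate(m, im, kc, "v1.31.0", "new-infra-spec", "new-config-spec")
	fmt.Println(m.annotations["pending-hooks"]) // UpdateMachine
}
```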

Workflow #2: The Machine controller detects the pending `UpdateMachine` hook and calls the runtime extension to perform the update.

```mermaid
sequenceDiagram
autonumber
participant Machine Controller
participant RX as Runtime Extension
participant M1 as Machine
participant IM1 as InfraMachine
participant KC1 as KubeadmConfig

Machine Controller-->>M1: Has "update-in-progress" and "pending-hooks: UpdateMachine"?
M1-->>Machine Controller: Yes!

Machine Controller->>RX: UpdateMachine(desired state)
RX-->>Machine Controller: Status: InProgress, RetryAfterSeconds: 30

Note over Machine Controller: Wait and retry

Machine Controller->>RX: UpdateMachine(desired state)
RX-->>Machine Controller: Status: Done

Machine Controller->>IM1: Remove annotation "update-in-progress"
Machine Controller->>KC1: Remove annotation "update-in-progress"
Machine Controller->>M1: Remove annotation "update-in-progress"<br/>Remove "UpdateMachine" from "pending-hooks"
```
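
The retry behavior in Workflow #2 can be sketched as a polling loop. The `UpdateMachineResponse` shape and the extension stub are illustrative assumptions; in the real controller the retries happen across separate reconciliations triggered by requeues, not in a blocking loop:

```go
package main

import "fmt"

// UpdateMachineResponse is an illustrative shape of the hook response.
type UpdateMachineResponse struct {
	Status            string // "InProgress" or "Done"
	RetryAfterSeconds int
}

// reconcileUpdateMachine keeps calling the extension's UpdateMachine hook
// until it reports Done, mimicking the requeue-and-retry behavior of the
// Machine controller. It returns the number of hook calls made.
func reconcileUpdateMachine(callHook func() UpdateMachineResponse) int {
	calls := 0
	for {
		calls++
		resp := callHook()
		if resp.Status == "Done" {
			// Done: the controller would now remove the update-in-progress
			// annotations and the pending UpdateMachine hook.
			return calls
		}
		// InProgress: requeue after resp.RetryAfterSeconds and try again.
	}
}

func main() {
	remaining := 2 // pretend the update needs two in-progress polls
	calls := reconcileUpdateMachine(func() UpdateMachineResponse {
		if remaining > 0 {
			remaining--
			return UpdateMachineResponse{Status: "InProgress", RetryAfterSeconds: 30}
		}
		return UpdateMachineResponse{Status: "Done"}
	})
	fmt.Println(calls) // 3
}
```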

Workflow #3: The KCP controller waits for in-place update to complete before proceeding with further operations.

```mermaid
sequenceDiagram
autonumber
participant KCP Controller
participant M1 as Machine

KCP Controller-->>M1: Is in-place update in progress?
M1-->>KCP Controller: Yes! ("update-in-progress" or "pending-hooks: UpdateMachine")

Note over KCP Controller: Wait for update to complete<br/>Requeue on Machine changes

KCP Controller-->>M1: Is in-place update in progress?
M1-->>KCP Controller: No! (annotations removed)

Note over KCP Controller: Continue with next Machine rollout or other operations
```

## Notes about managedFields refactoring for in-place updates (KCP/MS)

To enable correct in-place updates of BootstrapConfigs and InfraMachines, CAPI v1.12 introduced a refactored managedFields structure. This change was necessary for the following reasons:

- In CAPI <= v1.11, BootstrapConfigs/InfraMachines were only created, never updated
- Starting with CAPI v1.12, BootstrapConfigs/InfraMachines need to be updated during in-place updates. Server-Side Apply (SSA) is used because it properly handles co-ownership of fields and enables unsetting fields during updates
199+
200+
### A "two field managers" approach
201+
202+
The refactoring uses **two separate field managers** to enable different responsibilities:
203+
204+
1. **Metadata manager** (`capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata`):
205+
- Continuously syncs labels and annotations
206+
- Updates on every reconciliation via `syncMachines`
207+
208+
2. **Spec manager** (`capi-kubeadmcontrolplane` / `capi-machineset`):
209+
- Manages the spec and in-place update specific annotations
210+
- Updates only when creating objects or triggering in-place updates

### ManagedFields structure comparison

**CAPI <= v1.11** (legacy):
- Machine:
  - spec, labels, and annotations are owned by `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
- BootstrapConfig / InfraMachine:
  - labels and annotations are owned by `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
  - spec is owned by `manager` (Update)

**CAPI >= v1.12** (new):
- Machine (unchanged):
  - spec, labels, and annotations are owned by `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
- BootstrapConfig / InfraMachine:
  - labels and annotations are owned by `capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata` (Apply)
  - spec is owned by `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)

### Object creation workflow (CAPI >= v1.12)

When creating a new BootstrapConfig/InfraMachine:

1. **Initial creation**:
   - Apply the BootstrapConfig/InfraMachine with its spec (manager: `capi-kubeadmcontrolplane` / `capi-machineset`)
   - Remove managedFields for labels and annotations
   - Result: labels and annotations are orphaned, spec is owned

2. **First `syncMachines` call** (happens immediately after):
   - Apply labels and annotations (manager: `capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata`)
   - Result: the final desired managedFields structure is established

3. **Ready for operations**:
   - Continuous `syncMachines` calls update labels/annotations without affecting the spec
   - In-place updates can now properly update spec fields and unset fields as needed
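
The creation workflow and the resulting ownership can be illustrated with a toy model of SSA field management: a map from field path to owning manager. This is a deliberate simplification of real managedFields (no co-ownership tracking, no Apply-vs-Update distinction), but it captures why separate managers keep metadata and spec updates from interfering:

```go
package main

import "fmt"

// owners maps a field path to the manager that owns it - a toy model of
// SSA managedFields.
type owners map[string]string

// apply records ownership of the given fields for one manager and, matching
// SSA semantics, releases fields that this manager previously owned but no
// longer sets (which is what enables unsetting fields during in-place updates).
func (o owners) apply(manager string, fields ...string) {
	set := map[string]bool{}
	for _, f := range fields {
		set[f] = true
		o[f] = manager
	}
	for f, m := range o {
		if m == manager && !set[f] {
			delete(o, f)
		}
	}
}

func main() {
	o := owners{}

	// 1. Initial creation: the spec manager applies the spec; managedFields
	//    for labels/annotations are removed, so they start orphaned.
	o.apply("capi-machineset", "spec")

	// 2. First syncMachines call: the metadata manager applies labels and
	//    annotations without touching the spec.
	o.apply("capi-machineset-metadata", "labels", "annotations")

	// 3. In-place update trigger: the spec manager re-applies the spec plus
	//    the update-in-progress annotation; metadata ownership is untouched.
	o.apply("capi-machineset", "spec", "annotations.update-in-progress")

	fmt.Println(o["spec"])        // capi-machineset
	fmt.Println(o["labels"])      // capi-machineset-metadata
	fmt.Println(o["annotations"]) // capi-machineset-metadata
}
```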

### In-place update object modifications

When triggering an in-place update:

1. Apply the BootstrapConfig/InfraMachine with:
   - the updated spec (owned by the spec manager)
   - the `update-in-progress` annotation (owned by the spec manager)
   - for InfraMachine: the `cloned-from` annotations (owned by the spec manager)

2. Result after the in-place update trigger:
   - labels and annotations are owned by the metadata manager
   - spec is owned by the spec manager
   - the `update-in-progress` and `cloned-from` annotations are owned by the spec manager
