docs/proposals/20240807-in-place-updates-implementation-notes.md (169 additions, 0 deletions)
@@ -86,3 +86,172 @@ sequenceDiagram
MS2 (NewMS)-->>MS Controller: Yes, M1!
MS Controller->>M1: Remove annotation ".../pending-acknowledge-move": ""
```

## Notes about in-place update implementation for KubeadmControlPlane

- In-place updates respect the existing control plane update strategy:
  - KCP controller uses `rollingUpdate` strategy with `maxSurge` (0 or 1)
  - When `maxSurge` is 0, no new machines are created during rollout; updates are performed only on existing machines via in-place updates or by scaling down outdated machines
  - When `maxSurge` is 1:
    - The controller first scales up by creating one new machine to maximize fault tolerance
    - Once `maxReplicas` (desiredReplicas + 1) is reached, it evaluates whether to in-place update or scale down old machines
    - For each old machine needing rollout, the controller evaluates whether it is eligible for in-place update. If so, it performs the in-place update on that machine. Otherwise, it scales down the outdated machine (which will be replaced by a new one in the next reconciliation cycle)
    - This pattern repeats until all machines are up-to-date; the controller then scales back to the desired replica count

- The implementation respects the existing set of responsibilities:
  - KCP controller manages control plane Machines directly
  - KCP controller enforces `maxSurge` limits during rolling updates
  - KCP controller decides when to scale up, scale down, or perform in-place updates
  - KCP controller runs preflight checks to ensure the control plane is stable before in-place updates
  - KCP controller calls the `CanUpdateMachine` hook to verify if extensions can handle the changes
  - When in-place update is possible, the KCP controller triggers the update by reconciling the desired state

- The in-place update decision flow is:
  - If `currentReplicas < maxReplicas` (desiredReplicas + maxSurge), scale up first to maximize fault tolerance
  - If `currentReplicas >= maxReplicas`, select a machine needing rollout and evaluate options:
    - Check if the selected Machine is eligible for in-place update (determined by the `UpToDate` function)
    - Check if there are already enough up-to-date replicas (if `currentUpToDateReplicas >= desiredReplicas`, skip in-place and scale down)
    - Run preflight checks to ensure control plane stability
    - Call the `CanUpdateMachine` hook on registered runtime extensions
    - If all checks pass, trigger in-place update. Otherwise, fall back to scale down/recreate
  - This flow repeats on each reconciliation until all machines are up-to-date
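
The decision flow above can be sketched as a single pure function. This is a minimal illustration, not the KCP controller's code: `rolloutState`, `nextAction`, and the boolean inputs are hypothetical names that stand in for the results of the eligibility check, the preflight checks, and the `CanUpdateMachine` hook, and the sketch assumes at least one Machine still needs rollout.

```go
package main

import "fmt"

// Action is the next step the rollout takes for the control plane.
type Action string

const (
	ScaleUp       Action = "scale up"
	InPlaceUpdate Action = "in-place update"
	ScaleDown     Action = "scale down"
)

// rolloutState is a hypothetical, flattened view of the inputs named in the
// decision flow. A real controller would derive these from the KCP object,
// its Machines, and the runtime extensions.
type rolloutState struct {
	desiredReplicas         int
	maxSurge                int // 0 or 1
	currentReplicas         int
	currentUpToDateReplicas int
	eligibleForInPlace      bool // result of the eligibility check for the selected Machine
	preflightChecksPassed   bool // result of the control plane stability checks
	canUpdateMachine        bool // result of the CanUpdateMachine hook
}

// nextAction walks the checks in the order listed in the decision flow.
// It assumes at least one Machine still needs rollout.
func nextAction(s rolloutState) Action {
	maxReplicas := s.desiredReplicas + s.maxSurge
	if s.currentReplicas < maxReplicas {
		return ScaleUp
	}
	if !s.eligibleForInPlace {
		return ScaleDown
	}
	if s.currentUpToDateReplicas >= s.desiredReplicas {
		// Enough up-to-date replicas already exist: skip in-place, scale down.
		return ScaleDown
	}
	if !s.preflightChecksPassed || !s.canUpdateMachine {
		return ScaleDown
	}
	return InPlaceUpdate
}

func main() {
	// Three outdated replicas, maxSurge 1: below maxReplicas, so surge first.
	fmt.Println(nextAction(rolloutState{desiredReplicas: 3, maxSurge: 1, currentReplicas: 3}))

	// At maxReplicas with one up-to-date replica and all checks passing.
	fmt.Println(nextAction(rolloutState{
		desiredReplicas: 3, maxSurge: 1, currentReplicas: 4, currentUpToDateReplicas: 1,
		eligibleForInPlace: true, preflightChecksPassed: true, canUpdateMachine: true,
	}))
}
```

Keeping the decision free of side effects makes each branch easy to exercise in isolation; every failed check collapses to the same scale down/recreate fallback described in the list.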

- Orchestration of in-place updates uses two key annotations:
  - `in-place-updates.internal.cluster.x-k8s.io/update-in-progress` - Marks a Machine as undergoing in-place update
Machine Controller->>M1: Remove annotation "update-in-progress"<br/>Remove "UpdateMachine" from "pending-hooks"
```

Workflow #3: The KCP controller waits for in-place update to complete before proceeding with further operations.

```mermaid
sequenceDiagram
autonumber
participant KCP Controller
participant M1 as Machine

KCP Controller-->>M1: Is in-place update in progress?
M1-->>KCP Controller: Yes! ("update-in-progress" or "pending-hooks: UpdateMachine")

Note over KCP Controller: Wait for update to complete<br/>Requeue on Machine changes

KCP Controller-->>M1: Is in-place update in progress?
M1-->>KCP Controller: No! (annotations removed)

Note over KCP Controller: Continue with next Machine rollout or other operations
```
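
The "is in-place update in progress?" question in Workflow #3 amounts to a check on the Machine's annotations. The sketch below is an illustration under assumptions: the diagram only names the second annotation as "pending-hooks", so the full key used here and the comma-separated value format are assumed, and `isInPlaceUpdateInProgress` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Annotation key quoted from the notes above.
	updateInProgressAnnotation = "in-place-updates.internal.cluster.x-k8s.io/update-in-progress"
	// Assumed key: the diagram only refers to this annotation as "pending-hooks".
	pendingHooksAnnotation = "runtime.cluster.x-k8s.io/pending-hooks"
	// Hook name as it appears in the diagram.
	updateMachineHook = "UpdateMachine"
)

// isInPlaceUpdateInProgress reports whether either marker from the diagram is
// present: the update-in-progress annotation, or "UpdateMachine" listed in the
// pending-hooks annotation (assumed to be a comma-separated list).
func isInPlaceUpdateInProgress(annotations map[string]string) bool {
	if _, ok := annotations[updateInProgressAnnotation]; ok {
		return true
	}
	for _, hook := range strings.Split(annotations[pendingHooksAnnotation], ",") {
		if strings.TrimSpace(hook) == updateMachineHook {
			return true
		}
	}
	return false
}

func main() {
	updating := map[string]string{updateInProgressAnnotation: ""}
	// "OtherHook" is a made-up hook name used only for illustration.
	waitingOnHook := map[string]string{pendingHooksAnnotation: "OtherHook,UpdateMachine"}
	done := map[string]string{}

	fmt.Println(isInPlaceUpdateInProgress(updating))
	fmt.Println(isInPlaceUpdateInProgress(waitingOnHook))
	fmt.Println(isInPlaceUpdateInProgress(done))
}
```

A caller would keep requeueing while this returns true for any Machine, and continue with the next rollout step once both markers are gone.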

## Notes about managedFields refactoring for in-place updates (KCP/MS)

To enable correct in-place updates of BootstrapConfigs and InfraMachines, CAPI v1.12 introduced a refactored managedFields structure. This change was necessary for the following reasons:

- In CAPI <= v1.11, BootstrapConfigs/InfraMachines were only created, never updated
- Starting with CAPI v1.12, BootstrapConfigs/InfraMachines need to be updated during in-place updates. Server-Side Apply (SSA) is used because it provides proper handling of co-ownership of fields and enables unsetting fields during updates
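
To make the two SSA properties mentioned above concrete (co-ownership and unsetting fields), here is a toy model of field ownership. It is a sketch of the general server-side apply ownership rules only, not CAPI or API server code: `object`, `apply`, the manager names, and the field paths are all hypothetical, and conflict detection between managers is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a toy stand-in for an API object: flat field paths mapped to
// values, plus the set of managers that own each field.
type object struct {
	fields map[string]string
	owners map[string]map[string]bool // field path -> set of manager names
}

func newObject() *object {
	return &object{fields: map[string]string{}, owners: map[string]map[string]bool{}}
}

// apply mimics the ownership rules for one manager: every field in intent is
// set and owned by the manager; any field the manager owned before but omits
// now is released, and removed once no manager owns it anymore.
// Conflicts between managers are not modeled in this sketch.
func (o *object) apply(manager string, intent map[string]string) {
	for field, value := range intent {
		o.fields[field] = value
		if o.owners[field] == nil {
			o.owners[field] = map[string]bool{}
		}
		o.owners[field][manager] = true
	}
	for field, managers := range o.owners {
		if _, stillWanted := intent[field]; stillWanted || !managers[manager] {
			continue
		}
		delete(managers, manager)
		if len(managers) == 0 {
			delete(o.owners, field)
			delete(o.fields, field)
		}
	}
}

// dump prints the fields in a stable order.
func (o *object) dump(label string) {
	keys := make([]string, 0, len(o.fields))
	for k := range o.fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(label)
	for _, k := range keys {
		fmt.Printf("  %s = %s\n", k, o.fields[k])
	}
}

func main() {
	obj := newObject()
	// Hypothetical managers and field paths, for illustration only.
	obj.apply("manager-a", map[string]string{"spec.version": "v1", "spec.extraArg": "x"})
	obj.apply("manager-b", map[string]string{"metadata.labels.tier": "cp"})
	obj.dump("after create:")

	// manager-a drops spec.extraArg from its intent, so the field is unset,
	// while the label owned by manager-b is left alone.
	obj.apply("manager-a", map[string]string{"spec.version": "v2"})
	obj.dump("after in-place update:")
}
```

The point of the model is the second loop in `apply`: a field disappears only when its last owner stops applying it, which is what lets one manager unset its fields without touching fields owned by another.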

### A "two field managers" approach

The refactoring uses **two separate field managers** to enable different responsibilities: