-
Notifications
You must be signed in to change notification settings - Fork 331
feat: drain and volume detachment status conditions #1876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: drain and volume detachment status conditions #1876
Conversation
4bb4d97
to
fb3ac47
Compare
Pull Request Test Coverage Report for Build 15175010548Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
/assign @engedaam |
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity. |
/remove-lifecycle stale |
b527992
to
21176e1
Compare
/hold |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
d536a96
to
43949ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
39b868c
to
aba9dfe
Compare
aba9dfe
to
3fc5cac
Compare
655f7a5
to
a34eca9
Compare
a34eca9
to
8361363
Compare
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: engedaam, jmdeal, jonathan-innis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
…es-sigs#1876)" This reverts commit 5a6707a.
* test: Lower resource requests for NodeClaim test (kubernetes-sigs#2229) * perf: Don't deepcopy inside of watch handler functions (kubernetes-sigs#2232) * test: Add random name string for NodePool and NodeClass (kubernetes-sigs#2231) * test: Update E2E testing suite to be named Regression (kubernetes-sigs#2234) * refactor: convert validation to an interface (kubernetes-sigs#2220) * fix: allow non-churn empty nodes to be disrupted (kubernetes-sigs#2206) * perf: Only deep copy nodes during GetCandidates once (kubernetes-sigs#2233) * feat: add metrics for disruption candidate validation (kubernetes-sigs#2239) * perf: Only call .Available() once which prevents duplicate allocs (kubernetes-sigs#2241) * docs: update issue triage meeting schedule (kubernetes-sigs#2244) * test: deflake NodeClaim and presubmit tests (kubernetes-sigs#2240) * perf: Avoid deepcopy when get nodePool/cluster health (kubernetes-sigs#2247) * perf: Improve OrderByPrice performance (kubernetes-sigs#2250) * test: add validating admission policy for nodeclass status (kubernetes-sigs#2251) Co-authored-by: Jonathan Innis <[email protected]> * feat: drain and volume detachment status conditions (kubernetes-sigs#1876) * fix: show the cron parse error to users to allow them to debug (kubernetes-sigs#2258) * perf: Don't deep-copy nodes and nodeclaims in our synced check (kubernetes-sigs#2260) * chore: Fix getting current script directory in install-kwok.sh (kubernetes-sigs#2262) * perf: Perform quick checks in node health first (kubernetes-sigs#2264) * chore: Update pod metrics when pod is completed (kubernetes-sigs#2259) * fix: Correctly build nodepool mapping for complex clusters (kubernetes-sigs#2263) * fix: fail open for missing nodeclaims in termination (kubernetes-sigs#2266) * perf: Limit GetInstanceTypes() calls per-NodeClaim (kubernetes-sigs#2271) * perf: Parallelize disruption execution actions (kubernetes-sigs#2270) * fix: Fix node owner reference update (kubernetes-sigs#2274) * perf: Be more resilient to deletion failures in disruption controller (kubernetes-sigs#2272) * chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2277) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Ensure we can stand up multiple partitions with kwok (kubernetes-sigs#2283) * chore: Inject resources into Kwok through a patch (kubernetes-sigs#2285) * chore: Update NodeClaim E2E test to only replace one status condition (kubernetes-sigs#2284) * chore: Avoid validating admission policy for clusters older then 1.30 (kubernetes-sigs#2289) * chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2295) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: bump go version to 1.24.4 (kubernetes-sigs#2298) * chore: Only log that the command succeeded when it actually did (kubernetes-sigs#2302) * fix: Fix bug with MarkForDeletion before creating replacements (kubernetes-sigs#2300) * perf: Refactor the eviction queue to be multithreaded (kubernetes-sigs#2252) * docs: Add Bizfly Cloud provider (kubernetes-sigs#2303) * chore: Bump lifecycle cache expiration to one hour (kubernetes-sigs#2307) * chore: Use cluster state to check replacement NodeClaim existence (kubernetes-sigs#2308) * chore(deps): bump github.com/samber/lo from 1.50.0 to 1.51.0 in the go-deps group (kubernetes-sigs#2315) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: bump operatorpkg (kubernetes-sigs#2314) * chore(deps): bump the k8s-go-deps group across 1 directory with 4 updates (kubernetes-sigs#2317) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Refactor Orchestration Queue and Handle Mark/Unmark Deletion in Queue (kubernetes-sigs#2305) * chore(deps): bump the k8s-go-deps group with 7 updates (kubernetes-sigs#2326) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * perf: multithreaded orchestration queue (kubernetes-sigs#2293) * test: Add nodeclaim name when you have garbage collection (kubernetes-sigs#2333) * perf: Reduce multiple patch calls in instance termination (kubernetes-sigs#2324) * fix: add helm rbac for kwok-provider to update finalizers (kubernetes-sigs#2336) Signed-off-by: Max Cao <[email protected]> * feat: configure CRD status operator with larger histogram buckets (kubernetes-sigs#2328) * chore(deps): bump sigs.k8s.io/yaml from 1.4.0 to 1.5.0 in the k8s-go-deps group (kubernetes-sigs#2339) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump github.com/docker/docker from 28.2.2+incompatible to 28.3.0+incompatible in the go-deps group (kubernetes-sigs#2340) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: Fix re-retrieving object on retry (kubernetes-sigs#2337) * fix: Fix overriding error with patch call (kubernetes-sigs#2338) * fix: add missing rlock to disruption queue (kubernetes-sigs#2348) * test: allow e2e tests to output junit report (kubernetes-sigs#2334) Signed-off-by: Max Cao <[email protected]> * docs: Add Oracle Cloud Infrastructure (OCI) provider (kubernetes-sigs#2342) * fix: no longer allow the same hostname to take multiple capacity (kubernetes-sigs#2356) * feat: support auto relaxing min values (kubernetes-sigs#2299) * fix: update provider ID to ensure that Cloud Provider tests pass (kubernetes-sigs#2363) * fix: remove unsupported capacity_type label from karpenter_nodeclaims… (kubernetes-sigs#2364) * fix: update deletionTimestamp on terminating pods when after nodeDeletionTimestamp (kubernetes-sigs#2316) Co-authored-by: Amanuel Engeda <[email protected]> * chore: promote ReservedCapacity feature gate to beta (kubernetes-sigs#2365) * fix: flakiness in expiration tests (kubernetes-sigs#2366) * test: Bump the termination time for the deletion timestamp (kubernetes-sigs#2367) * chore: cherry-pick kubernetes-sigs#2399 (kubernetes-sigs#2401) --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Max Cao <[email protected]> Co-authored-by: Amanuel Engeda <[email protected]> Co-authored-by: Jonathan Innis <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: DerekFrank <[email protected]> Co-authored-by: Jason Deal <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: Jonathan Innis <[email protected]> Co-authored-by: Todd Neal <[email protected]> Co-authored-by: Jigisha Patil <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lê Minh Quân <[email protected]> Co-authored-by: Max Cao <[email protected]> Co-authored-by: Aidan Rowe <[email protected]> Co-authored-by: Daniel Lopes <[email protected]> Co-authored-by: Saurav Agarwalla <[email protected]> Co-authored-by: cosimomeli <[email protected]>
Fixes #N/A
Description
Adds status conditions for node drain and volume detachment to improve observability for the individual termination stages. This is a scoped down version of #1837, which takes these changes along with splitting each termination stage into a separate controller. I will continue to work on that refactor, but I'm decoupling to work on higher priority work.
Status Conditions:
Drained
VolumesDetached
Drained
transitions to true.How was this change tested?
make presubmit
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.