DO NOT MERGE: ocp-next #2148

Draft

wants to merge 2,813 commits into master

Conversation

bertinatto
Member

No description provided.

k8s-ci-robot and others added 30 commits November 7, 2024 23:33
…ard_reset

Client go port forward reset, error handling and tests
add warnings for cases where one of the projected volume types gets overwritten by the service account token
…nplace-resize-delay

[FG:InPlacePodVerticalScaling] bug(quota): handle resources changed on resource quota filter
[FG:InPlacePodVerticalScaling] kubelet: Propagate error in doPodResizeAction() to the caller
KEP-3926: unsafe deletion of corrupt objects
…-place-pod-vertical-scaling-version-skew

Updated version skew strategy for InPlacePodVerticalScaling
DRA: Implementation of ResourceClaim.Status.Devices (KEP-4817)
kubelet/kuberuntime: switch to runc/libct
Refactor: Move IsRestartableInitContainer to common utility package
1. Add Resources struct to PodSpec struct in both external and internal API packages
2. Adding feature gate and logic for dropping disabled fields for Pod Level Resources
KEP: enhancements/keps/sig-node/2837-pod-level-resource-spec
1. Add support for pod level resources in kubectl
2. Reuse the existing method to describe container resources and generalize it to describe both pod and container level resources
1. If pod-level limit is set, pod-level request is unset and container-level request is set: derive pod-level request from container-level requests
2. If pod-level limit is set, pod-level request is unset and container-level request is unset: set pod-level request equal to pod-level limit
1. The effective container requests cannot be greater than pod-level requests
2. Individual container limits cannot be greater than pod-level limits
3. Only CPU & Memory are supported at pod-level
4. In-place container resource updates are not supported if pod-level resources are set
Note: "effective container requests cannot be greater than pod-level limits" is guaranteed by transitivity: effective container requests <= pod-level requests and pod-level requests <= pod-level limits, therefore effective container requests <= pod-level limits.

Signed-off-by: ndixita <[email protected]>
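
A minimal sketch of the pod-level request defaulting rules described in this commit message, using plain milli-CPU integers in place of the real `resource.Quantity` type; the type and function names are illustrative assumptions, not the actual Kubernetes defaulting code.

```go
package main

import "fmt"

// podResources is a simplified stand-in for the pod-level Resources struct;
// a nil pointer means the value is unset.
type podResources struct {
	limitMilliCPU   *int64
	requestMilliCPU *int64
}

// derivePodRequest applies the defaulting rules from the commit message:
//  1. limit set, request unset, container requests set   -> sum of container requests
//  2. limit set, request unset, container requests unset -> request = limit
func derivePodRequest(pod *podResources, containerRequestsMilliCPU []int64) {
	if pod.limitMilliCPU == nil || pod.requestMilliCPU != nil {
		return // nothing to default
	}
	var sum int64
	for _, r := range containerRequestsMilliCPU {
		sum += r
	}
	if sum > 0 {
		pod.requestMilliCPU = &sum // rule 1: derive from container requests
		return
	}
	limit := *pod.limitMilliCPU
	pod.requestMilliCPU = &limit // rule 2: default to the pod-level limit
}

func main() {
	limit := int64(2000)
	p := &podResources{limitMilliCPU: &limit}
	derivePodRequest(p, []int64{500, 300})
	fmt.Println(*p.requestMilliCPU) // 800
}
```
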
1. Use pod-level resource when feature is enabled and resources are set at pod-level
2. Edge case handling: When a pod defines only CPU or memory limits at pod-level (but not both), and container-level requests/limits are unset, the pod-level requests stay empty for the resource without a pod-limit. The container's request for that resource is then set to the default request value from schedutil.
1. Pod cgroup is configured to use resources from the pod spec if the feature is enabled and resources are set at pod-level
2. Container cgroup limits are defaulted to pod-level limits if container limits are not set
Signed-off-by: ndixita <[email protected]>
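
A hedged sketch of the container cgroup limit fallback described above, with `0` standing in for an unset limit; the names and types are simplified assumptions, not the kubelet's cgroup manager code.

```go
package main

import "fmt"

// effectiveContainerLimit illustrates the fallback: when a container does not
// set its own limit, the pod-level limit (if any) is used for the container
// cgroup, provided the pod-level resources feature is enabled.
func effectiveContainerLimit(containerLimitMilliCPU, podLimitMilliCPU int64, podLevelResourcesEnabled bool) int64 {
	if containerLimitMilliCPU != 0 {
		return containerLimitMilliCPU
	}
	if podLevelResourcesEnabled && podLimitMilliCPU != 0 {
		return podLimitMilliCPU
	}
	return 0 // unlimited
}

func main() {
	fmt.Println(effectiveContainerLimit(0, 2000, true)) // 2000
}
```
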
bertinatto and others added 25 commits December 11, 2024 19:48
…util/managedfields

Some of the code we use in openshift-tests was recently made internal
in kubernetes#115065. This patch
exposes the code we need there.
…rnetes.default.svc, don't wait for aggregated availability
…roups

that have kinds that are served by both CRDs
and external apiservers (e.g. openshift-apiserver)

this includes:
- authorization.openshift.io (rolebindingrestrictions served by a CRD)
- security.openshift.io (securitycontextconstraints served by a CRD)
- quota.openshift.io (clusterresourcequotas served by a CRD)

By merging all sources, we ensure that kinds served by a CRD will have
openapi discovery and spec available even when openshift-apiserver is
unavailable.
…self-SARs that have user:check-access

Otherwise, the request will inherit any scopes that an access token might have
and the scopeAuthorizer will deny the access review if the scopes do not include
user:full
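
A rough sketch of the short-circuit described above, under the assumption that it amounts to skipping the scope authorizer when a self-SubjectAccessReview carries the `user:check-access` scope; the function name and inputs are hypothetical.

```go
package main

import "fmt"

// shouldSkipScopeAuthz sketches the idea: a self-SAR made with a token scoped
// to user:check-access is allowed past the scope authorizer, so it is not
// rejected merely for lacking the user:full scope. Illustrative only.
func shouldSkipScopeAuthz(isSelfSAR bool, scopes []string) bool {
	if !isSelfSAR {
		return false
	}
	for _, s := range scopes {
		if s == "user:check-access" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldSkipScopeAuthz(true, []string{"user:check-access"})) // true
	fmt.Println(shouldSkipScopeAuthz(true, []string{"user:info"}))         // false
}
```
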
This commit renews openshift#327

What has changed compared to the original PR is:
- The retryClient interface has been adapted to storage.Interface.
- The isRetriableEtcdError method has been completely changed; it seems that previously the error we wanted to retry was not being retried. Even the unit tests were failing.

Overall, I still think this is not the correct fix. The proper fix should be added to the etcd client.

UPSTREAM: <carry>: retry etcd Unavailable errors

This is the second commit for the retry logic.
This commit adds unit tests and slightly improves the logging.

During a rebase, squash with the previous one.

UPSTREAM: <carry>: retry_etcdclient: expose retry logic functionality

During the rebase, merge with: UPSTREAM: <carry>: retry etcd Unavailable errors
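
A hedged sketch of the retry idea described in these commits: only gRPC `Unavailable` errors from etcd are retried, everything else fails immediately. Function names, attempt counts, and backoff are assumptions, not the carry patch's actual `retry_etcdclient` code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isRetriableEtcdError treats only transient Unavailable gRPC errors as retriable.
func isRetriableEtcdError(err error) bool {
	return status.Code(err) == codes.Unavailable
}

// withRetry retries fn with a small backoff while it keeps returning a
// retriable error; any other error (or success) is returned immediately.
func withRetry(ctx context.Context, attempts int, fn func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(ctx); err == nil || !isRetriableEtcdError(err) {
			return err
		}
		select {
		case <-ctx.Done():
			return errors.Join(err, ctx.Err())
		case <-time.After(100 * time.Millisecond):
		}
	}
	return err
}

func main() {
	err := withRetry(context.Background(), 3, func(context.Context) error {
		return status.Error(codes.Unavailable, "etcdserver: leader changed")
	})
	fmt.Println(err)
}
```
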
When a PerformanceProfile configures a node for cpu partitioning,
it also lets OVS use all the cpus available to burstable pods.
To be able to do that, OVS was moved to its own slice and that
slice needs to be re-added to cAdvisor for monitoring purposes.
Kubelet should advertise the shared cpus as extended resources.
This has the benefit of limiting the number of containers
that can request access to the shared cpus.

For more information see - openshift/enhancements#1396

Signed-off-by: Talor Itzhak <[email protected]>
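
A minimal sketch of how shared cpus could be advertised as an extended resource in node capacity, so the scheduler naturally bounds how many containers can request them. The resource name is borrowed from the request key mentioned in the next commit and is illustrative; the actual kubelet wiring is omitted.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// advertiseSharedCPUs adds a bounded number of "shared cpu" slots to the
// node's capacity as an extended resource.
func advertiseSharedCPUs(capacity v1.ResourceList, slots int64) {
	capacity[v1.ResourceName("workload.openshift.io/enable-shared-cpus")] =
		*resource.NewQuantity(slots, resource.DecimalSI)
}

func main() {
	capacity := v1.ResourceList{}
	advertiseSharedCPUs(capacity, 10)
	fmt.Println(capacity)
}
```
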
Adding a new mutation plugin that handles the following:

1. In case of a `workload.openshift.io/enable-shared-cpus` request, it
   adds an annotation to hint the runtime about the request. The runtime
   is not aware of extended resources, hence we need the annotation.
2. It validates the pod's QoS class and returns an error if it's not a
   guaranteed QoS class.
3. It validates that no more than a single resource is being requested.
4. It validates that the pod is deployed in a namespace that has the
   mixed-cpus workloads allowed annotation.

For more information see - openshift/enhancements#1396

Signed-off-by: Talor Itzhak <[email protected]>
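
A hedged sketch of the admission checks listed above, using simplified local types instead of the real admission plugin interfaces; the annotation key written at the end is illustrative, not necessarily the exact key the plugin uses.

```go
package main

import (
	"errors"
	"fmt"
)

// podInfo is a simplified view of the inputs the plugin inspects.
type podInfo struct {
	qosClass             string // "Guaranteed", "Burstable", "BestEffort"
	sharedCPURequests    int64  // units of the shared-cpu extended resource requested
	namespaceAllowsMixed bool   // namespace carries the mixed-cpus "allowed" annotation
	annotations          map[string]string
}

// admitSharedCPURequest applies the four checks from the commit message and,
// on success, records a hint annotation for the runtime, since runtimes do
// not see extended resources.
func admitSharedCPURequest(p *podInfo) error {
	if p.sharedCPURequests == 0 {
		return nil // nothing to do
	}
	if p.qosClass != "Guaranteed" {
		return errors.New("shared cpus require a Guaranteed QoS pod")
	}
	if p.sharedCPURequests > 1 {
		return errors.New("no more than a single shared-cpu resource may be requested")
	}
	if !p.namespaceAllowsMixed {
		return errors.New("namespace does not allow mixed-cpus workloads")
	}
	if p.annotations == nil {
		p.annotations = map[string]string{}
	}
	p.annotations["workload.openshift.io/enable-shared-cpus"] = "true"
	return nil
}

func main() {
	p := &podInfo{qosClass: "Guaranteed", sharedCPURequests: 1, namespaceAllowsMixed: true}
	fmt.Println(admitSharedCPURequest(p), p.annotations)
}
```
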

UPSTREAM: <carry>: Update management webhook pod admission logic

Updating the logic for pod admission to allow the creation of a pod with workload partitioning annotations in a namespace that has no workload allow annotations.

The pod will be stripped of its workload annotations and treated as if it were a normal pod; a warning annotation will be placed on the pod to note the behavior.

Signed-off-by: ehila <[email protected]>

UPSTREAM: <carry>: add support for cpu limits into management workloads

Added support for workload partitioning to use the CPU limits for a container. To allow the runtime to make better decisions around workload CPU quotas, we pass down the CPU limit as part of the cpulimit value in the annotation. CRI-O will take that information and calculate the quota per node. This should support situations where workloads might have different CPU period overrides assigned.

Updated the kubelet for static pods and the admission webhook for regular pods to support CPU limits.

Updated unit tests to reflect changes.

Signed-off-by: ehila <[email protected]>
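
A sketch of what the per-container workload annotation payload could look like with the CPU limit passed alongside the shares; the annotation key and field names are assumptions for illustration, not the exact carry-patch format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workloadResourceAnnotation carries the cpu shares plus the container's cpu
// limit (in millicores) so the runtime can compute the quota per node.
type workloadResourceAnnotation struct {
	CPUShares uint64 `json:"cpushares"`
	CPULimit  int64  `json:"cpulimit,omitempty"`
}

func main() {
	value, _ := json.Marshal(workloadResourceAnnotation{CPUShares: 1024, CPULimit: 500})
	annotations := map[string]string{
		// hypothetical key: resources.workload.openshift.io/<container-name>
		"resources.workload.openshift.io/my-container": string(value),
	}
	fmt.Println(annotations)
}
```
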
…ject openshift feature gates into pkg/features

Signed-off-by: Swarup Ghosh <[email protected]>
This is a short-term fix; once we improve the cert rotation logic
in library-go so that it does not depend on this hack, we can
remove this carry patch.

squash with the previous PR during the rebase
openshift#1924

squash with the previous PRs during the rebase
openshift#1924
openshift#1929
…phase and graceful termination phase

This reverts commit 85f0f2c.
…navailable errors for the etcd health checker client

UPSTREAM: <carry>: replace newETCD3ProberMonitor with etcd3RetryingProberMonitor
This commit fixes bug 1919737.

https://bugzilla.redhat.com/show_bug.cgi?id=1919737

* pkg/proxy/iptables/proxier.go (syncProxyRules): Prefer a local endpoint
for the cluster DNS service.
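
A small sketch of the endpoint preference described here: if the cluster DNS service has node-local endpoints, only those are used, otherwise all endpoints are. This is illustrative of the idea, not the actual `proxier.go` change.

```go
package main

import "fmt"

// endpoint is a simplified view of a proxy endpoint: its address and whether
// it is local to the node.
type endpoint struct {
	addr    string
	isLocal bool
}

// pickDNSEndpoints prefers node-local endpoints when any exist, falling back
// to the full endpoint list otherwise.
func pickDNSEndpoints(all []endpoint) []endpoint {
	var local []endpoint
	for _, ep := range all {
		if ep.isLocal {
			local = append(local, ep)
		}
	}
	if len(local) > 0 {
		return local
	}
	return all
}

func main() {
	eps := []endpoint{{"10.0.0.5:53", false}, {"10.0.0.9:53", true}}
	fmt.Println(pickDNSEndpoints(eps)) // only the local endpoint
}
```
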
There are cases when the kubelet is starting where networking or other
components can cause the kubelet to not post the status with the bootId.
The failed status update will cause the Kubelet to queue the
NodeRebooted warning and sometimes cause many events to be created.

This fix wraps the recordEventFunc to only emit one message per kubelet
instantiation.
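
A minimal sketch of the wrapping described above, assuming it behaves like a `sync.Once` around the record function so the warning is emitted at most once per kubelet instantiation; the names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// onceEventRecorder invokes the underlying record function at most once,
// however many times the failed status update re-queues the warning.
type onceEventRecorder struct {
	once   sync.Once
	record func(reason, message string)
}

func (r *onceEventRecorder) Event(reason, message string) {
	r.once.Do(func() { r.record(reason, message) })
}

func main() {
	rec := &onceEventRecorder{record: func(reason, msg string) {
		fmt.Println("emitting event:", reason, msg)
	}}
	for i := 0; i < 5; i++ {
		rec.Event("NodeRebooted", "Node rebooted, detected via bootID change")
	}
	// Only one "emitting event" line is printed.
}
```
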
Similarly to what we do for the managed CPU (aka workload partitioning)
feature, introduce a master configuration file
`/etc/kubernetes/openshift-llc-alignment` which needs to be present for
the LLC alignment feature to be activated, in addition to the policy
option being required.

Note this replaces the standard upstream feature gate check.

This can be dropped when the feature per KEP
kubernetes/enhancements#4800 goes beta.

Signed-off-by: Francesco Romani <[email protected]>
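
A hedged sketch of the file-based gate described here: the feature activates only if the master configuration file exists (the path is taken from the commit message). The real check, and its interaction with the required policy option, is more involved.

```go
package main

import (
	"fmt"
	"os"
)

// llcAlignmentEnabled reports whether the node-level marker file is present,
// which replaces the standard upstream feature-gate check in this carry.
func llcAlignmentEnabled(configPath string) bool {
	_, err := os.Stat(configPath)
	return err == nil
}

func main() {
	fmt.Println(llcAlignmentEnabled("/etc/kubernetes/openshift-llc-alignment"))
}
```
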
@openshift-ci-robot

@bertinatto: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@bertinatto
Member Author

/test all

@dusk125

dusk125 commented Dec 12, 2024

/retest-required


openshift-ci bot commented Dec 12, 2024

@bertinatto: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/okd-scos-e2e-aws-ovn | dc0e758 | link | false | /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
backports/unvalidated-commits: Indicates that not all commits come to merged upstream PRs.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
vendor-update: Touching vendor dir or related files.