Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support TargetLoadPacking strategy #1087

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

binacs
Copy link
Member

@binacs binacs commented Mar 11, 2023

In line with the load aware scheduling plugin trimaran/targetloadpacking, we can support TargetLoadPacking strategy in the descheduler to perform evictions based on real node utilization.

This is a beginner version that works, and I look forward to everyone's review comments to make it better!

Hope it helps to solve the issue #225.

I'll be working on migrating trimaran/loadvariationriskbalancing shortly.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 11, 2023
@k8s-ci-robot k8s-ci-robot requested a review from damemi March 11, 2023 02:53
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@binacs binacs force-pushed the binacs/support-targetloadpacking branch 4 times, most recently from e3e72e5 to 2a3c7b7 Compare March 11, 2023 07:17
@knelasevero
Copy link
Contributor

Hey @binacs, thanks for the contrib!

We want to avoid adding custom plugins like that one here in the main repo. We plan to do the same approach as the kube-scheduler and have a separate repo to host all the out-of-tree plugins that we may add in the future.

If you wouldn't mind waiting for when we are ready to create that other repo, we could review and merge this one there.

@binacs
Copy link
Member Author

binacs commented Mar 13, 2023

@knelasevero Sounds great! We can review and merge in that repo, and please let me know if there is anything I can do to help.

@knelasevero
Copy link
Contributor

Hey! I raised an issue that will serve as a place for us to discuss all this:

#1089

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2023
@AllenZMC
Copy link
Contributor

AllenZMC commented Apr 5, 2023

What is the difference with this PR(#1092), is the effect the same?

@binacs
Copy link
Member Author

binacs commented Apr 5, 2023

What is the difference with this PR(#1092), is the effect the same?

@AllenZMC IMHO: They both aim to reschedule based on real-time utilization, but the principles are different.

  • real utilzation descheduler #1092 straightforwardly uses the real node utilization for rescheduling, but because of the lack of a corresponding scheduling plugin, the scheduler and rescheduler may work in opposite directions.

  • TargetLoadPacking strategy will converts the real node utilization into a score using the same calculation method as the scheduling plugin trimaran/targetloadpacking, and finally achieves cooperation with the scheduler.

@binacs binacs force-pushed the binacs/support-targetloadpacking branch from 2a3c7b7 to f234922 Compare April 23, 2023 13:34
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 23, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wang-xiaowu
Copy link

hi, Is there any progress of this pr?
i am looking forward

@binacs
Copy link
Member Author

binacs commented Dec 21, 2023

@wang-xiaowu The community has not yet decided on how to organize and maintain the new plugins, so this MR cannot be merged yet.

@wang-xiaowu
Copy link

@wang-xiaowu The community has not yet decided on how to organize and maintain the new plugins, so this MR cannot be merged yet.

i see, this is an exciting function, hope it can be integrated into the descheduler as soon as possible

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2024
@binacs
Copy link
Member Author

binacs commented Mar 24, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 24, 2024
@fanhaouu
Copy link
Contributor

I think we should not only consider the upper limit value->targetUtilization, but also have a lower limit value->lowUtilization. If there are no nodes in the cluster below the lowUtilization threshold or if the number of nodes below this threshold is too small, eviction operations should not be performed. This logic is similar to the nodeutilization plugin, which essentially changes the way node usage is calculated, while other logics can be reused.

@rajgani
Copy link

rajgani commented May 14, 2024

Does it support excluding namespaces for pod eviction from Descheduler while using TargetLoadPacking plugin ?

@ingvagabund
Copy link
Contributor

@binacs @fanhaouu @wang-xiaowu would either of you be interested in writing a proposal for the out-of-tree plugins? We are still restricted by time. I have one proposal in progress. Though I can help to guide you in the right direction. We need someone who will put all the discussions and use cases for it into a single place. Starting with a google doc and slowly extending it as we go and resume the discussions. Any takers?

@binacs
Copy link
Member Author

binacs commented May 16, 2024

If there are no nodes in the cluster below the lowUtilization threshold or if the number of nodes below this threshold is too small, eviction operations should not be performed.

@fanhaouu make sense for me

@binacs
Copy link
Member Author

binacs commented May 16, 2024

Does it support excluding namespaces for pod eviction from Descheduler while using TargetLoadPacking plugin ?

@rajgani yes of course. we will follow the Evictor().Filter and it will apply to all plugins (not only TargetLoadPacking plugin)

@fanhaouu
Copy link
Contributor

fanhaouu commented May 16, 2024

@binacs @fanhaouu @wang-xiaowu would either of you be interested in writing a proposal for the out-of-tree plugins? We are still restricted by time. I have one proposal in progress. Though I can help to guide you in the right direction. We need someone who will put all the discussions and use cases for it into a single place. Starting with a google doc and slowly extending it as we go and resume the discussions. Any takers?

I am happy to do this. Within 9 days, I will first complete a draft document, then I will share it for everyone to freely comment on.

@binacs
Copy link
Member Author

binacs commented May 16, 2024

hi @ingvagabund , I'm interested in how to dealing with out-of-tree plugins (not only load aware plugin). I will formulate a proposal soon.

@ingvagabund
Copy link
Contributor

@fanhaouu @binacs thank you

@fanhaouu
Copy link
Contributor

@fanhaouu @binacs thank you

Mark, I'll start writing the KEP now.✌️

@ingvagabund

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 25, 2024
@binacs
Copy link
Member Author

binacs commented Aug 25, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 25, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 27, 2024
@binacs
Copy link
Member Author

binacs commented Nov 30, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2024
@k8s-ci-robot
Copy link
Contributor

@binacs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-descheduler-test-e2e-k8s-master-1-28 f234922 link true /test pull-descheduler-test-e2e-k8s-master-1-28
pull-descheduler-test-e2e-k8s-master-1-29 f234922 link true /test pull-descheduler-test-e2e-k8s-master-1-29
pull-descheduler-test-e2e-k8s-master-1-30 f234922 link true /test pull-descheduler-test-e2e-k8s-master-1-30
pull-descheduler-test-e2e-k8s-master-1-31 f234922 link true /test pull-descheduler-test-e2e-k8s-master-1-31
pull-descheduler-test-e2e-k8s-master-1-32 f234922 link true /test pull-descheduler-test-e2e-k8s-master-1-32

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants