Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPCT-332: introducing 'dedicated controller' to mutate pods failed to schedule #160

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mtulio
Copy link
Contributor

@mtulio mtulio commented Mar 6, 2025

What this PR does / why we need it:

This change is proposing a controller to watch the e2e pods failed to schedule, mutating those to apply the OPCT dedicated toleration, targeting to prevent false-positive failures in the conformance results due the dedicated node introduced as required by OPCT.

The controller is started by a command adm e2e-dedicated controller, distributed with the same image of opct. The deployment will be created in opct namespace when the command run is executed. The destroy flow will remove all resources in opct namespace, not requiring a dedicated logic to remove the controller deployment.

Additionally the command adm setup-node is being deprecated moved to adm e2e-dedicated taint-node.

Tests: openshift/release#62503

Done checklist:

  • Validate if the tests failed to schedule due dedicated node have been resolved
  • Validate CI job using this feature
  • Reproduce if bugs linked to the Epic* have been resolved
  • Collect the controller logs (currently only the object is collected - default behavior)
  • Refine/review scope of Epic*
  • Cleanup code and commits

*Epic OPCT-332

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci openshift-ci bot requested review from jcpowermac and rvanderp3 March 6, 2025 20:01
Copy link

openshift-ci bot commented Mar 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from mtulio. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mtulio mtulio marked this pull request as draft March 6, 2025 20:01
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 6, 2025
@mtulio mtulio added kind/feature Categorizes issue or PR as related to a new feature. target-release/opct-v0.6 labels Mar 6, 2025
@mtulio mtulio force-pushed the e2e-dedicated-controller branch from e442ed7 to d4ccfeb Compare March 6, 2025 20:08
@mtulio mtulio force-pushed the e2e-dedicated-controller branch from d4ccfeb to c2aa784 Compare March 6, 2025 20:10
@mtulio mtulio changed the title introducing e2e dedicated controller to mutate tests failed to schedule OPCT-TBD: introducing e2e dedicated controller to mutate tests failed to schedule Mar 6, 2025
@mtulio mtulio force-pushed the e2e-dedicated-controller branch from 1d6aa70 to b78e7eb Compare March 6, 2025 21:08
@mtulio mtulio requested a review from Copilot March 6, 2025 21:32
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR introduces a dedicated e2e controller to mutate pods that fail to schedule due to dedicated node configuration, preventing false-positive test failures.

  • Renames and refactors the node setup command to "taint-node" under the e2ed package.
  • Adds a new deployment creation for the e2e-dedicated controller in the run package.
  • Introduces the new controller command with informer logic to automatically update pods with the required toleration.

Reviewed Changes

File Description
pkg/cmd/adm/e2ed/taintNode.go Refactors the node setup command and updates help text.
pkg/run/run.go Adds deployment creation logic for the dedicated e2e controller.
pkg/cmd/adm/e2ed/root.go Registers the new e2e-dedicated commands.
pkg/cmd/adm/e2ed/controller.go Implements controller logic to watch and mutate unscheduled pods.
pkg/types.go Introduces constants for controller image and dedicated controller name.
pkg/cmd/adm/root.go Registers the new e2e-dedicated command in the admin command tree.

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


You can automatically accept the selected node with the option --yes.

Alternativaly you can set the custom node with the option --node <node-name>.
Copy link
Preview

Copilot AI Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider correcting the typo 'Alternativaly' to 'Alternatively'.

Suggested change
Alternativaly you can set the custom node with the option --node <node-name>.
Alternatively you can set the custom node with the option --node <node-name>.

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
Comment on lines +50 to +51
var podCounters map[string]*podMutateStatus

Copy link
Preview

Copilot AI Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

podCounters is accessed from multiple goroutines without synchronization, which may lead to race conditions. Consider using a mutex or a concurrent-safe map to protect its access.

Suggested change
var podCounters map[string]*podMutateStatus
var (
podCounters = make(map[string]*podMutateStatus)
podCountersMutex sync.Mutex
)

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
@mtulio mtulio changed the title OPCT-TBD: introducing e2e dedicated controller to mutate tests failed to schedule OPCT-332: introducing e2e dedicated controller to mutate tests failed to schedule Mar 7, 2025
@mtulio mtulio changed the title OPCT-332: introducing e2e dedicated controller to mutate tests failed to schedule OPCT-332: introducing 'dedicated controller' to mutate tests failed to schedule Mar 7, 2025
@mtulio mtulio changed the title OPCT-332: introducing 'dedicated controller' to mutate tests failed to schedule OPCT-332: introducing 'dedicated controller' to mutate pods failed to schedule Mar 7, 2025
@mtulio mtulio force-pushed the e2e-dedicated-controller branch from d3cc001 to 4ce0190 Compare March 7, 2025 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. target-release/opct-v0.6
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants