Changes required make a build after update of component-base #3004

Open
wants to merge 7 commits into
base: master

Contributor

@mszadkow mszadkow commented Feb 11, 2025

Why are these changes needed?

KubeRay v1.3.0 is incompatible with Kueue v0.10.1 and above because the build fails with:

vendor/github.com/ray-project/kuberay/ray-operator/pkg/features/features.go:40:9: featuregatetesting.SetFeatureGateDuringTest(tb, utilfeature.DefaultFeatureGate, f, value) (no value) used as value
make: *** [build] Error 1

The failure comes from a difference in component-base versions:
KubeRay uses k8s.io/component-base v0.29.6
Kueue uses k8s.io/component-base v0.32.1
Between these versions there is a small but significant change in https://github.com/kubernetes/component-base/blob/264c1fd30132a3b36b7588e50ac54eb0ff75f26a/featuregate/testing/feature_gate.go#L47

Because of this we had to update component-base in KubeRay.

As a result:

  • the go and go-toolchain versions were bumped to v1.23 (also in Dockerfiles)
  • k8s.io/api, k8s.io/apimachinery and k8s.io/client-go were bumped to v0.32.1
  • sigs.k8s.io/controller-runtime was bumped to v1.19.0
  • controller-gen was bumped to v0.16.5, and code generation was migrated to use kube_codegen.sh

Related issue number

Relates to: kubernetes-sigs/kueue#3822

Checks

  • [*] I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • [*] Manual tests
    • This PR is not tested :(

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch 7 times, most recently from dbff897 to a43e436 Compare February 17, 2025 10:54
@mszadkow mszadkow marked this pull request as ready for review February 17, 2025 10:54
@mszadkow
Contributor Author

/cc @mimowo @andrewsykim @kevin85421

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch 4 times, most recently from a10c58c to 70b35d6 Compare February 17, 2025 17:19
@mimowo

mimowo commented Feb 18, 2025

LGTM

@andrewsykim @kevin85421 PTAL, this is currently blocking upgrade of KubeRay to 1.3 in Kueue, which is required for the MultiKueue integration (requiring the managedBy field).

I know this is a massive amount of changes, but all of them are due to the new code generation script, which actually simplifies the code generation: https://github.com/ray-project/kuberay/pull/3004/files#diff-791462c57818fbde5da46c5925c2b709d459724d4824513356aea9816167e893R26-R41.

@mimowo

mimowo commented Feb 18, 2025

@mszadkow is there a way to build Kueue on CI using this commit to KubeRay to double check there are no other blockers to bump KubeRay in Kueue?

@mszadkow
Contributor Author

> @mszadkow is there a way to build Kueue on CI using this commit to KubeRay to double check there are no other blockers to bump KubeRay in Kueue?

In terms of compatibility, I have been able to build Kueue with this feature branch of KubeRay.
To test it, I added basic managedBy tests, and they run and pass.
It seems we have no more blockers at this point.

@andrewsykim
Collaborator

This LGTM, but given it bumps the Go version and other deps, it would be great to get another set of eyes on this. @MortalHappiness @kevin85421 can you take a look as well, please?

@andrewsykim
Collaborator

Just realized that KubeRay v1.3 is not compatible with Kueue until this is merged. We're just about to cut v1.3 and it seems risky to include all the changes here, specifically the controller-gen bump. @mszadkow are all the changes in this PR required, or is there a more minimal version of this that can work? I'm specifically worried about the bump to controller-gen, which changes the generated CRD YAML.

-func SetFeatureGateDuringTest(tb testing.TB, f featuregate.Feature, value bool) func() {
-	return featuregatetesting.SetFeatureGateDuringTest(tb, utilfeature.DefaultFeatureGate, f, value)
+func SetFeatureGateDuringTest(tb testing.TB, f featuregate.Feature, value bool) {
+	featuregatetesting.SetFeatureGateDuringTest(tb, utilfeature.DefaultFeatureGate, f, value)
Collaborator

I think I'm missing something here. I was expecting the function signature of featuregatetesting.SetFeatureGateDuringTest(tb, utilfeature.DefaultFeatureGate, f, value) to change here. Why do we need to bump dependencies if this doesn't change?

Contributor Author

It has changed, so we also adapted this wrapper to it.
In 0.29 it returns a cleanup function that you should call with defer, but since 0.31 it sets up the cleanup for you in Ginkgo.
The underlying change also detects parallel overrides of feature gates, which required changes in the tests.
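
To make the difference between the two helper styles concrete, here is a minimal, self-contained sketch. It does not use the real k8s.io/component-base package: `gates`, `cleanupTB`, `fakeTB`, `setGateOld` and `setGateNew` are hypothetical names invented for illustration, and the toy map only stands in for a feature gate. The assumption is that the newer helper registers its restore step through the test object's `Cleanup` method rather than returning it.

```go
package main

import "fmt"

// gates is a toy stand-in for a feature gate registry (hypothetical).
var gates = map[string]bool{"MyFeature": false}

// cleanupTB is the small slice of testing.TB that this sketch needs.
type cleanupTB interface {
	Cleanup(func())
}

// setGateOld mimics the older style: it returns a restore function
// that the caller is expected to defer.
func setGateOld(name string, value bool) func() {
	prev := gates[name]
	gates[name] = value
	return func() { gates[name] = prev }
}

// setGateNew mimics the newer style: nothing is returned, and the
// restore step is registered on the test object instead.
func setGateNew(tb cleanupTB, name string, value bool) {
	prev := gates[name]
	gates[name] = value
	tb.Cleanup(func() { gates[name] = prev })
}

// fakeTB records cleanups and runs them in last-in, first-out order,
// which is how testing.T runs functions passed to Cleanup.
type fakeTB struct{ cleanups []func() }

func (f *fakeTB) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

func (f *fakeTB) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
	f.cleanups = nil
}

func main() {
	// Older style: the caller owns the restore function.
	restore := setGateOld("MyFeature", true)
	fmt.Println("old style, during test:", gates["MyFeature"])
	restore()
	fmt.Println("old style, after restore:", gates["MyFeature"])

	// Newer style: the restore runs when the test object finishes.
	tb := &fakeTB{}
	setGateNew(tb, "MyFeature", true)
	fmt.Println("new style, during test:", gates["MyFeature"])
	tb.finish()
	fmt.Println("new style, after cleanup:", gates["MyFeature"])
}
```

A wrapper written as `return setGateNew(...)` would not compile, because a call with no result cannot be used as a value. That is the same class of error as the "(no value) used as value" failure quoted in the PR description.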

@mimowo

mimowo commented Feb 18, 2025

@mszadkow may know more details, but IIRC controller-gen is a transitive dependency of core k8s, and it was refactored in k8s 1.31. OTOH we need to align the core k8s version with Kueue to be able to compile.

@andrewsykim
Collaborator

I think this would be too risky to add into v1.3 at this point.

To resolve the Kueue compilation issues, we can consider a backport to v1.3.1, or wait until v1.4 as a last resort.

@kevin85421 kevin85421 self-assigned this Feb 19, 2025
@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch 3 times, most recently from 8ed48e1 to 71d36bf Compare February 20, 2025 07:38
@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch from 71d36bf to 3211bb8 Compare February 21, 2025 08:15
Member

@kevin85421 kevin85421 left a comment

Hi @mszadkow, @owenowenisme noticed that it seems this PR didn't run make generate.

I cloned your fork and ran make generate with a small fix in update-codegen.sh, and the changed files are as follows:

[image: screenshot of the changed files]

@@ -557,14 +557,11 @@ var _ = Context("Inside the default namespace", func() {
headFilters := common.RayClusterHeadPodsAssociationOptions(rayCluster).ToListOptions()
allFilters := common.RayClusterAllPodsAssociationOptions(rayCluster).ToListOptions()

BeforeAll(func() {
Member

Are the changes in this file manual or automatic?

Contributor Author

Those are manual.
Some tests required refactoring; e.g., this change was required because it no longer runs in Describe but inside a subject node, It().

@mszadkow
Contributor Author

/test buildkite/ray-ecosystem-ci-kuberay-ci/test-autoscaler-e2e-nightly-operator

@mszadkow
Contributor Author

I will attempt to reproduce the issue of autoscaler e2e tests locally

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch from 994555a to 398d467 Compare February 24, 2025 15:24
@kevin85421
Member

@mszadkow you can rebase with the master branch after #3100 is merged.

@kevin85421
Member

#3100 has already been merged.

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch from 398d467 to b9c7073 Compare February 25, 2025 06:52
@mszadkow
Contributor Author

@kevin85421 sure, rebased


@mimowo mimowo left a comment

LGTM after the recent changes and rebase.

What we could consider to make the diff smaller is:

  • make a small preparatory PR for ray-operator/controllers/ray/rayjob_controller_test.go
  • make a small preparatory PR for bumping golang to 1.23

However, this would mean more PRs to cherry-pick for 1.3.1, and it would not reduce the size of the PR by much, but I'm raising this for consideration.

@andrewsykim @kevin85421 PTAL.

@kevin85421
Member

[image: Screenshot 2025-02-28 at 2 19 44 PM]

@mszadkow would you mind rebasing with the master branch?

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch from b9c7073 to b07fb54 Compare March 3, 2025 07:06
@mszadkow
Contributor Author

mszadkow commented Mar 3, 2025

thanks for approval @kevin85421, it's rebased now

@mszadkow mszadkow force-pushed the feature/upgrade-component-base-version branch from b07fb54 to dbb42bf Compare March 3, 2025 08:08
6 participants