Skip to content

🌱 Add test-operator with custom controller #2070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

perdasilva
Copy link
Contributor

@perdasilva perdasilva commented Jul 3, 2025

Depends on #2080

Description

Adds the scaffolding for the test-operator in versions v1 and v2 using operator-sdk. The test-operator is a basic echo service.

The test-operator supports AllNamespaces, SingleNamespace, and OwnNamespace install modes.
v1.0.0: is just the crd and the existing configmap resource
v1.3.0: has the same crd as v1.0.0 and the updated configmap previously part of v2.0.0
v2.0.0: has validating, mutating and conversion webhooks

I've had to modify an e2e that expected v2.0.0 of the test operator to be installed since v2 now includes webhooks and support is not enabled by default (yet). I've added a v1.3.0 as a stand-in for the previous v2 and updated the catalogs accordingly.

Because the test-operator now has its own controller, I've also had to update testdata/push to handle the controller image building/pushing

As it is, the test-operator should make a good fixture for testing the Single/OwnNamespace and Webhook support features.

Thing we should feel confident in doing in the future

  • Producing different kinds of bundles: helm, registry+v2, etc.
  • Adding more bundles (though we probably want to keep the general number low)

Risks

  • Dependency (CRD) between v1 and v2, i.e. changes in v1 might need to be carried forward to v2

v1

A reconciled test-operator v1 resource would look like this:

apiVersion: testolm.operator-framework.io/v1
kind: TestOperator
metadata:
  name: test-operator
spec:
  message: "hello"
status:
  echo: "hello"

The v1 operator also includes a validating and mutating webhook. The validating webhook validates .spec.message against a keyword and rejects admission if its found. The mutating webhook is used as a "defaulting" webhook that sets .spec.message to the default message of Echo.

v2

A reconciled test-operator v2 resource would look like this:

apiVersion: testolm.operator-framework.io/v2
kind: TestOperator
metadata:
  name: test-operator
spec:
  echoMessage: "hello"
status:
  echo: "hello"

The same webhooks are available for v2 apis + a conversion webhook to support the CRD upgrade.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2025
Copy link

netlify bot commented Jul 3, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 8564214
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/686e43f95215c40008ee0af8
😎 Deploy Preview https://deploy-preview-2070--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link

codecov bot commented Jul 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.43%. Comparing base (eb4b979) to head (8564214).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2070      +/-   ##
==========================================
+ Coverage   73.33%   73.43%   +0.09%     
==========================================
  Files          77       77              
  Lines        7076     7076              
==========================================
+ Hits         5189     5196       +7     
+ Misses       1545     1540       -5     
+ Partials      342      340       -2     
Flag Coverage Δ
e2e 44.97% <ø> (+0.06%) ⬆️
experimental-e2e 51.17% <ø> (+0.01%) ⬆️
unit 58.39% <ø> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@perdasilva perdasilva changed the title In-tree test operator 🌱 In-tree test operator Jul 3, 2025
@perdasilva perdasilva force-pushed the test-operator branch 23 times, most recently from 3fb8ef7 to 693a355 Compare July 8, 2025 16:09
Per Goncalves da Silva added 3 commits July 8, 2025 18:23
Signed-off-by: Per Goncalves da Silva <[email protected]>
Signed-off-by: Per Goncalves da Silva <[email protected]>
@perdasilva perdasilva changed the title 🌱 In-tree test operator 🌱 Add test-operator with custom controller Jul 8, 2025
@perdasilva perdasilva marked this pull request as ready for review July 8, 2025 16:25
@perdasilva perdasilva requested a review from a team as a code owner July 8, 2025 16:25
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 8, 2025
@openshift-ci openshift-ci bot requested review from gavinmbell and OchiengEd July 8, 2025 16:25
@@ -246,11 +246,18 @@ test-unit: $(SETUP_ENVTEST) envtest-k8s-bins #HELP Run the unit tests
$(UNIT_TEST_DIRS) \
-test.gocoverdir=$(COVERAGE_UNIT_DIR)

TEST_OPERATOR_CONTROLLERS_HOME=./testdata/images/controllers
TEST_OPERATOR_CONTROLLERS=v1.0.0 v2.0.0
Copy link
Contributor

@tmshort tmshort Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing v1.3.0? Or is it not needed because it's not actually built?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, this is just for building the controller binaries. The 1.3.0 version uses the 1.0.0 controller.

Copy link
Contributor

@camilamacedo86 camilamacedo86 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not understood it as well because

  • We need to have the "operator"
  • Then, we need run the commands from this project to generate the manager (it will be image)
  • Then, we need generate the bundle for each case scenario (using SDK or OPM for example)
  • Then, we need add those bundles to a catalogue
  • Then, create a Catalog Source for those

The 1.3.0 version uses the 1.0.0 controller
we have a verion that we use the bin generated by another?
I did not understood it

Copy link
Contributor Author

@perdasilva perdasilva Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the currently existing e2e tests. The v1.3.0 bundle in the PR substitutes the v2.0.0 bundle in main [this] commit.

There's only a change in the included manifests (namely the configmap) between v1.0.0 and v1.3.0. I.e. v1.3.0 still uses the same controller as v1.0.0. This make target builds only the controllers. The bundle image itself is built in testdata/push - which is how it's currently done anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these tests even run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the Makefile, I don't think so. I wasn't too concerned though, they are fairly simplistic and the code will anyway be exercised by the e2es. Let me see if I can add them in the Makefile.

Copy link
Contributor Author

@perdasilva perdasilva Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added them here: https://github.com/operator-framework/operator-controller/pull/2070/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R230

I don't want to pollute the o-c project's coverage numbers with test-operator unit tests. So, I've added a separate make target and CI unit test run step.

Though, I still don't know if its worth the CO2 emissions for something a) will rarely change b) will be exercised by the e2e tests anyway.

Comment on lines +211 to +213
if _, err := w.Write(fileBytes); err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned about filesize here? In that we only read/write once per file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get it. I wouldn't be too worried about the size of the files we're packing. They tend to be relatively small text files with the exception of the controller binaries, which are currently around 50MB. We can't pipe the bytes directly from reader to writer because we need the file size for the header. I think all of this code will go away soon anyway (given our convo with Joe).

.PHONE: test-test-operator
test-test-operator:
KUBEBUILDER_ASSETS="$(shell $(SETUP_ENVTEST) use -p path $(ENVTEST_VERSION) $(SETUP_ENVTEST_BIN_DIR_OVERRIDE))" \
CGO_ENABLED=1 go test -tags '$(GO_BUILD_TAGS)' -count=1 -race -short ./testdata/images/bundles/test-operator/...
Copy link
Contributor

@camilamacedo86 camilamacedo86 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! 👋

I think we should mock the scenarios just like our users would experience them. This means:

  • The flow to generate a project and a bundle should mirror what a real user does. ( create project with SDK or Kubebuilder, inject the code, run the commands and generate a BUNDLE )
  • We should have a script that automates those steps, so it's repeatable.

Therefore, we can keep our samples well-maintained.

Now, doing all this at once might be a bit much, so I’m totally fine starting with fixed samples, and then we can evolve toward full automation in follow-ups and discuss all properly.

However: We need to change the file structure:

We can’t organize it as bundles/test-operator | versions because:

  1. We don’t have bundles — we have the actual projects.
  2. The current path is strange — why are we putting projects under images/?
  3. ⚠️ The directory name matters — tools like Kubebuilder or the SDK treat that as the project name. So if we use v1.0.0 as the folder name, automation breaks.

Proposal

Could we change it here to be like:

testdata/operators/
  ├── v1.0.0/
  │    └── test-operator/
  ├── v1.3.0/
  │    └── test-operator/
  └── v2.0.0/
       └── test-operator/

This way:

  • Each version gets its own isolated test setup.
  • The project directory stays named test-operator, so tooling works as expected.
  • It’s clean, easy to automate, and mirrors user behavior more closely.

Let me know what you think!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By following my suggestion here for we have a real Operator that can be maintained via automation #2070 (comment), could we please use a name like memcached-operator instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda get what you're trying to do, but I don't think that's scalable or maintainable given the necessary controller, webhook, and crd code that will be required by the test-operators (also between versions of it), which will either require patching or copying over generated code, which can be brittle, error prone, and create problems that are annoying to debug and fix - the only benefit being that we'll know when the tools no longer work for building the fixtures.

The point of the test-operator isn't to exercise the steps involved in creating an operator and to make sure that works with OLMv1. We already have a test that exercises this developer flow. The point is to have a small, simple, test-operator which we own and can be used to exercise different features through out our e2e tests.

I'd prefer not to call it memcached-operator since it isn't. What's wrong with test-operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raised problem here is with the path that does not allow we have those samples well maintained, well documented and well generated in the long term, and the names does not reflect to the reality, could we

testdata/operators/
  ├── v1.0.0/
  │    └── test-operator/
  ├── v1.3.0/
  │    └── test-operator/
  └── v2.0.0/
       └── test-operator/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why they can't be well maintained. It seems more natural to group the versions under the operator than the operator under the versions...?

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @perdasilva

Great work 🥇

Following my main concerns/suggestions:

@@ -23,6 +23,10 @@ jobs:
run: |
make test-unit

- name: Run test-operator unit tests
run: |
make test-test-operator
Copy link
Contributor

@camilamacedo86 camilamacedo86 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should introduce a new GitHub Action, something like:

test-testdata / test-samples

This job would be responsible for running tests that validate our test samples.

Why this helps:

  • Keeps test ownership clear — each job has a focused responsibility.
  • Keeps tests running in parallel, so we don’t slow down CI.
  • When something fails, it's easier to identify what broke and why — especially if the failure is specific to testdata samples.

Also, we would have a matrix to run all tests for all samples, calling the make test scaffold for the sample itself see examples:

If we do as our Operator Authors, and use the tool to properly generate the samples we will have have a Makefile in the sample that has the targets to run the tests. See, example: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-with-plugins/Makefile#L60-L63

So, the github action to verify if samples are OK would call those directly and we should not created those in the Makefile of the project operator-controller itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the test over to another job. Don't know that it will be faster since it needs to be scheduled, cloned, go modded, etc. again...

Makefile Outdated
@@ -246,11 +246,24 @@ test-unit: $(SETUP_ENVTEST) envtest-k8s-bins #HELP Run the unit tests
$(UNIT_TEST_DIRS) \
-test.gocoverdir=$(COVERAGE_UNIT_DIR)


.PHONE: test-test-operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO:
We should not have it
Instead we should have a real mock generated with the tools
and this mock will have a Makefile
And we call the targets available in the mock directly.
See, example: https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-with-plugins/Makefile#L60-L63

@@ -0,0 +1,137 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's build the samples with the tooling and have a more real case scenario that can be automatically well-maintained instead. Otherwise it will become a burn to keep those maintained

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds of changes do you foresee? I think the only thing I don't really like is that we don't generate the manifest automatically. I think this is something we can overcome in a follow-up. I'd prefer not to bootstrap entire operator projects as part of fixture building though.

- name: Run test-operator unit tests
run: |
make test-test-operator

- uses: codecov/[email protected]
Copy link
Contributor

@camilamacedo86 camilamacedo86 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use the tools to generate the mocks?
Note that we have a plugin that generates a project with controller implementation, controller tests, which are a sample/mock, much better than this test operator, which has zero implementation.

Following the steps:

  • operator-sdk | kubebuilder init
  • operator-sdk | kubebuilder create api --group example.com --version v1alpha1 --kind Memcached --image=memcached:1.6.26-alpine3.19 --image-container-command="memcached,--memory-limit=64,-o,modern,-v" --image-container-port="11211" --run-as-user="1001" --plugins="deploy-image/v1-alpha"
  • make all

That will generate a FULL example.

However, if we use kubebuilder to do it, Kubuilder already validate the mocks, so we would not need to have an action to test them out; we could skip that.

Furthermore, in the future, we can indeed mock webhooks with the tooling, and a conversion, for example, is almost 100% scaffolded with the tool. Example

 # Create API to check webhook --conversion from v1 to v2
    $kubebuilder create api --group example.com --version v1 --kind Wordpress --controller=true --resource=true 
    $kubebuilder create api --group example.com --version v2 --kind Wordpress --controller=false --resource=true
    $kubebuilder create webhook --group example.com --version v1 --kind Wordpress --conversion --spoke v2

The above will result in all scaffolds doing the conversion webhook with the spoke approach.

(I would suggest we use kubebuilder and SDK to do the bundle only as the users have been doing after SDK deprecation) In the future, we replace SDK to use KPM ( new tooling to do bundle only ) and we have lower maintainance burn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered this in the previous comment, but I'll re-iterate. The point of the test-operator isn't to exercise the tools required to generate an operator. I think this approach works for simple cases, but as soon as you have custom code: reconciler implementations, webhook implementations, changes to CRDs it will become brittle and horrible to maintain as it inevitably will require either code overwriting or patching. Also, given that we already have a test that exercises the dev workflow, I don't know what this buys us...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also rather not use the memcached implementation since part of the exercise it to use something we own and have control over.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here is not to just exercise tooling at all.
The idea is to create a sample in the same way users would — with real-case scenarios —
and ensure they are maintainable over time via automation.
This approach allows us to:

  • Keep samples well maintained through automation
  • Clearly show what changed from one version to another
  • Easily understand the generation process by reading the code
  • Update samples quickly and at scale (without manual work)

We know these samples will need updates. Doing that manually doesn't scale.
With automation, we can:

  • Make changes safely and consistently
  • Always retain control over the output
  • Keep every change explicit — both what changed and why

Example By looking at the automation, we can see:

  • What code do we inject ( with a description of why )
  • What commands were called ( we create webhooks and etc with the tooling, save time, save the burn to manually maintain, ensure that we are doing as the huge majority of our users )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to go down this path. So, I'll close this and focus on something else.

Copy link

openshift-ci bot commented Jul 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from camilamacedo86. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@perdasilva
Copy link
Contributor Author

perdasilva commented Jul 9, 2025

Hi @perdasilva

Great work 🥇

Following my main concerns/suggestions:

* 1. Could we please ensure that the paths are better organised and allow automation: https://github.com/operator-framework/operator-controller/pull/2070/files#r2194412567 before we merge this one?

* 2. Could we please use the tool to generate a mock sample so that we have a functional Operator with a real-world case that we can expand upon? By utilising the tools, we can automate many aspects and help ensure that the solution is well-maintained in the long term. See;  [#2070 (comment)](https://github.com/operator-framework/operator-controller/pull/2070#discussion_r2194498704)

Thanks ^^

I'd rather not make the developer workflow something that's in the critical path of creating our fixtures. Part of the motivation was to have control over the operator (so, I'd rather not use memcached). The point of the fixtures is the test olm and not the developer workflow + commingling custom and generated code is brittle. So, I'd rather not pursue this path if we can avoid it.

@perdasilva perdasilva requested a review from camilamacedo86 July 9, 2025 10:17
Signed-off-by: Per Goncalves da Silva <[email protected]>
@@ -246,11 +246,24 @@ test-unit: $(SETUP_ENVTEST) envtest-k8s-bins #HELP Run the unit tests
$(UNIT_TEST_DIRS) \
-test.gocoverdir=$(COVERAGE_UNIT_DIR)


.PHONY: test-test-operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a function of the Operator/Mock sample itself.
It should not be part of the operator-controller Makefile at all — it belongs inside the sample.

Placing it in the main Makefile goes against important separation-of-concerns principles.
It introduces coupling that will make these components harder to maintain over time.
Keeping this logic self-contained in the sample is key to preserving clarity and long-term maintainability.

@@ -1,10 +1,6 @@
annotations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the bundle generated?
What happens when we change the mock sample?
Shouldn't the bundle be re-generated to reflect those changes?

This is exactly why we need automation in place — to ensure that:

  • When we push a PR, the samples are properly updated
  • The bundle reflects those changes accurately
  • We avoid manual steps that are error-prone and hard to maintain

Without automation, it's too easy for samples and bundles to get out of sync.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants