chore: Optimize event controller cache hydration for high-volume clusters #124

engedaam · 2025-01-24T01:12:17Z

Issue: N/A

Problem:
In clusters that generate high volumes of events, the initial cache hydration in controller-runtime places significant load on the API server. This happens because controller-runtime performs a LIST operation on all historical events during startup.

Solution:
Since we only care about events that occur after our controller starts, we're optimizing this by:

- Removing the initial LIST call performed by controller-runtime
- Implementing a direct watch on event objects
- Processing only new events from controller start time

Benefits:

- Reduces API server load during controller startup
- Eliminates unnecessary processing of historical events
- Improves API server performance in large clusters running the events controller

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jonathan-innis · 2025-02-03T06:08:19Z

Small nit about commit cleanliness: consider squashing your commits so you don't end up with a 22-commit PR. It doesn't really matter here, since we squash commits when merging the PR anyway, but it's something to note for other repos that might use rebase merges.

events/controller.go

jonathan-innis · 2025-02-07T05:56:25Z

events/suite_test.go

+	client := fake.NewSimpleClientset()
+	eventChannel = make(chan watch.Event, 1000)
+	controller = events.NewController[*test.CustomObject](ctx, kubeClient, fakeClock, client)
+	controller.EventWatchChannel = eventChannel


jonathan-innis · 2025-02-07T06:10:33Z

events/controller.go


-func (c *Controller[T]) Reconcile(ctx context.Context, event *v1.Event) (reconcile.Result, error) {
+func (c *Controller[T]) Reconcile(ctx context.Context) (reconcile.Result, error) {
+	e := <-c.EventWatchChannel


I just took a look at the underlying code, and the implementation of ResultChan uses an unbuffered channel. That means new writes will block until they are read here. I'm concerned that we may block the watch stream if we don't either pull these events into a buffered channel or use multiple goroutines to process them.

I'm trying to think of a good way to handle this while keeping the same interface, so that we keep the same logging behavior and metrics. Nothing comes to mind immediately, but I think it's something we should explore. At a minimum, we can kick off a goroutine in the register that pulls data off the watch channel and puts it in a separate buffered channel. Another option is to keep a static cache of events as we see them, enqueue a reconcile.Request, and then read from that local cache (this is basically what controller-runtime does with a read cache, minus the deletions).

jonathan-innis

LGTM 🚀

Accidentally submitted approval -- meant to just submit comments

engedaam requested a review from a team as a code owner January 24, 2025 01:12

engedaam changed the title ~~chore: Avoid Event List call by using Clinet go Watcher~~ chore: Remove Event List Call from controller-runtime by using Clinet go Watcher Jan 24, 2025

jonathan-innis reviewed Jan 24, 2025

events/controller.go (outdated)

singleton/controller.go (outdated)

singleton/controller.go (outdated)

rschalo reviewed Jan 24, 2025

events/suite_test.go

engedaam force-pushed the use-clientgo-watcher branch 2 times, most recently from a882103 to ce1c5cc Compare January 25, 2025 00:06

jonathan-innis reviewed Jan 27, 2025

singleton/controller.go (outdated)

singleton/controller.go (outdated)

singleton/controller.go (outdated)

engedaam changed the title ~~chore: Remove Event List Call from controller-runtime by using Clinet go Watcher~~ chore: Remove Event List Call from controller-runtime by using Client go Watcher Jan 27, 2025

engedaam changed the title ~~chore: Remove Event List Call from controller-runtime by using Client go Watcher~~ chore: Remove Event List Call from controller-runtime by using Client-go Watcher Jan 27, 2025

engedaam changed the title ~~chore: Remove Event List Call from controller-runtime by using Client-go Watcher~~ chore: Optimize event controller cache hydration for high-volume clusters Jan 27, 2025

jonathan-innis reviewed Feb 3, 2025

events/controller.go (outdated)

engedaam force-pushed the use-clientgo-watcher branch from 7b05b23 to 6e47626 Compare February 3, 2025 17:25

engedaam added 17 commits February 3, 2025 09:54

Avoid Event cache hydration by using clinet go watcher

062432a

Fix conditional

1b94c3b

Update tests

fb096da

Add kubebuilder installation for presubmit

6d06c8b

Fix file name typo

d39b60d

Add setup-envtest

0289cab

Clean-up for presubmit

9ab9d62

Clean-up for presubmit

e7be7d0

Adding

cc8fc7d

Removed the source queue and pulled object from the channel

cfb097f

Removed the source queue and pulled object from the channel

3e10d24

Removed the source queue and pulled object from the channel

f8ec3e9

Removed the source queue and pulled object from the channel

589d83c

Removed the source queue and pulled object from the channel

df22a8c

Removed the source queue and pulled object from the channel

8025df2

Drop controller-runtime deps

13bc543

Drop controller-runtime deps

a4cf242

engedaam added 4 commits February 3, 2025 09:54

Drop controller-runtime deps

daf6a22

Drop controller-runtime deps

3a1c475

Drop controller-runtime deps

8a1bc5e

Drop controller-runtime deps

56c5c39

engedaam force-pushed the use-clientgo-watcher branch from 3e6e5c0 to 56c5c39 Compare February 3, 2025 17:55

jonathan-innis reviewed Feb 7, 2025

jonathan-innis previously approved these changes Feb 7, 2025
