Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Optimize event controller cache hydration for high-volume clusters #124

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

engedaam
Copy link
Contributor

@engedaam engedaam commented Jan 24, 2025

*Issue N/A

Problem:
When dealing with clusters that generate high volumes of events, the initial cache hydration in controller-runtime creates significant API Server load. This occurs because controller-runtime performs a LIST operation on all historical events during startup.

Solution:
Since we only care about events that occur after our controller starts, we're optimizing this by:

  1. Removing the initial LIST call performed by controller-runtime
  2. Implementing a direct watch on event objects
  3. Processing only new events from controller start time

Benefits:

  • Reduces API server load during controller startup
  • Eliminates unnecessary processing of historical events
  • Improves API Server performance in large clusters running the events controller

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@engedaam engedaam requested a review from a team as a code owner January 24, 2025 01:12
@engedaam engedaam changed the title chore: Avoid Event List call by using Clinet go Watcher chore: Remove Event List Call from controller-runtime by using Clinet go Watcher Jan 24, 2025
@engedaam engedaam force-pushed the use-clientgo-watcher branch 2 times, most recently from a882103 to ce1c5cc Compare January 25, 2025 00:06
@engedaam engedaam changed the title chore: Remove Event List Call from controller-runtime by using Clinet go Watcher chore: Remove Event List Call from controller-runtime by using Client go Watcher Jan 27, 2025
@engedaam engedaam changed the title chore: Remove Event List Call from controller-runtime by using Client go Watcher chore: Remove Event List Call from controller-runtime by using Client-go Watcher Jan 27, 2025
@engedaam engedaam changed the title chore: Remove Event List Call from controller-runtime by using Client-go Watcher chore: Optimize event controller cache hydration for high-volume clusters Jan 27, 2025
@jonathan-innis
Copy link
Contributor

Small nit just about commit cleanliness: You should maybe consider squashing commits so you don't end-up with a 22-commit PR -- doesn't really matter since we end-up squashing commits in the PR anyways, but something to note for other repos that might use rebase merges

@engedaam engedaam force-pushed the use-clientgo-watcher branch from 7b05b23 to 6e47626 Compare February 3, 2025 17:25
@engedaam engedaam force-pushed the use-clientgo-watcher branch from 3e6e5c0 to 56c5c39 Compare February 3, 2025 17:55
client := fake.NewSimpleClientset()
eventChannel = make(chan watch.Event, 1000)
controller = events.NewController[*test.CustomObject](ctx, kubeClient, fakeClock, client)
controller.EventWatchChannel = eventChannel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

}

func (c *Controller[T]) Reconcile(ctx context.Context, event *v1.Event) (reconcile.Result, error) {
func (c *Controller[T]) Reconcile(ctx context.Context) (reconcile.Result, error) {
e := <-c.EventWatchChannel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a look at the underlying code and the implementation of the ResultChan uses an unbuffered channel -- that means new writes are going to get blocked on reads here -- I'm concerned that we may block the watch stream if we don't either pull these events into a buffered channel OR use multiple goroutines to process these events.

I'm trying to think of a good way to handle this while keeping the same interface so that we keep the same logging behavior and metrics -- nothing comes to mind immediately, but I think it's something that we should explore. Minimally, we can kick off a goroutine in the register that pulls data off of the watch channel and puts it in a separate buffered channel -- another option is to keep a static cache of events as we see them, enqueue a reconcile.Request and then read from that local cache (this is basically exactly what controller-runtime does with a read cache without the deletions)

jonathan-innis
jonathan-innis previously approved these changes Feb 7, 2025
Copy link
Contributor

@jonathan-innis jonathan-innis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@jonathan-innis jonathan-innis dismissed their stale review February 7, 2025 06:11

Accidentally submitted approval -- meant to just submit comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants