Add WatchList support with GenericListWatcher abstraction #10187

Open
liuxu623 wants to merge 1 commit into projectcalico:master from liuxu623:watch-list

Conversation

@liuxu623
Contributor

@liuxu623 liuxu623 commented Apr 10, 2025

Description

This PR introduces Kubernetes WatchList synchronization mode support and refactors the list-watch logic into a reusable GenericListWatcher abstraction. WatchList mode provides more efficient initial synchronization by streaming resources through watch events instead of loading all resources at once via the List API.

Motivation

The traditional List+Watch pattern has a significant limitation: during initial synchronization, the entire dataset must be loaded into memory via a single List API call. For large clusters with thousands of resources, this creates:

  1. Memory pressure spikes during initial sync
  2. Longer time-to-first-sync as the entire list must complete before processing begins
  3. Potential OOM issues in resource-constrained environments

The WatchList feature (GA in Kubernetes 1.32) addresses these issues by streaming initial data incrementally through watch events.

Refactoring Summary

This PR performs a significant refactoring of the list-watch infrastructure:

Before Refactoring

watcherCache (500+ lines)
├── List/Watch logic embedded
├── Error handling mixed with event processing
├── Retry/throttle logic duplicated
├── oldResources map for deletion detection
└── Tightly coupled to k8s client

After Refactoring

watcherCache (278 lines, -45%)
├── Implements EventHandler interface only
├── Clean event handling: OnAdd/OnUpdate/OnDelete/OnSync
├── resyncEpoch for deletion detection
└── Delegates list-watch to Client.ListAndWatch()

GenericListWatcher (new)
├── Common retry/throttle logic
├── Revision tracking
├── Error counting
└── Shared by k8s and etcd backends

k8s.ListWatcher (new)
├── WatchList mode support
├── Bookmark handling
├── Automatic fallback to List+Watch
└── K8s-specific error handling

etcdv3.ListWatcher (new)
├── Traditional List+Watch
└── etcd-specific error handling

Design

Architecture

flowchart TB
    subgraph WatcherSyncer["WatcherSyncer Layer"]
        WS[WatcherSyncer<br/>Coordinates multiple watchers]
        WC1[watcherCache #1<br/>e.g., Pods]
        WC2[watcherCache #2<br/>e.g., Services]
        WC3[watcherCache #N<br/>e.g., NetworkPolicies]
        WS --> WC1
        WS --> WC2
        WS --> WC3
    end

    subgraph EventHandling["Event Handling"]
        EH{{EventHandler Interface}}
        EH1[OnResyncStarted]
        EH2[OnAdd / OnUpdate / OnDelete]
        EH3[OnSync]
        EH4[OnError]
        EH --> EH1
        EH --> EH2
        EH --> EH3
        EH --> EH4
    end

    subgraph ClientInterface["api.Client Interface"]
        LIST[List]
        WATCH[Watch]
        LAW[ListAndWatch<br/>NEW in this PR]
    end

    subgraph K8sBackend["Kubernetes Backend"]
        K8SC[k8s.KubeClient]
        K8SLW[k8s.ListWatcher]
        
        subgraph K8sModes["Sync Modes"]
            WLM[WatchList Mode<br/>K8s 1.32+ GA]
            LWM[List+Watch Mode<br/>Fallback]
        end
        
        K8SC --> K8SLW
        K8SLW --> WLM
        K8SLW -.->|fallback| LWM
    end

    subgraph EtcdBackend["etcd Backend"]
        ETCDC[etcdv3.Client]
        ETCDLW[etcdv3.ListWatcher]
        ETCDC --> ETCDLW
        ETCDLW --> LWM2[List+Watch Mode]
    end

    subgraph GenericLW["GenericListWatcher (Shared)"]
        GLW[Common Logic]
        GLW1[Retry Throttling<br/>MinResyncInterval]
        GLW2[Revision Tracking<br/>CurrentRevision]
        GLW3[Error Counting<br/>MaxErrorsPerRevision]
        GLW4[Connection Timeout<br/>WatchRetryTimeout]
        GLW5[Event Loop<br/>LoopReadingFromWatcher]
        GLW --> GLW1
        GLW --> GLW2
        GLW --> GLW3
        GLW --> GLW4
        GLW --> GLW5
    end

    subgraph DataSources["Data Sources"]
        ETCD[(etcd)]
        CACHE[(API Server<br/>Watch Cache)]
    end

    WC1 -.->|implements| EH
    WC2 -.->|implements| EH
    WC3 -.->|implements| EH

    WC1 -->|calls| LAW
    WC2 -->|calls| LAW
    WC3 -->|calls| LAW

    LAW --> K8SC
    LAW --> ETCDC

    K8SLW --> GLW
    ETCDLW --> GLW

    WLM -->|rv=empty| CACHE
    WLM -->|streaming| CACHE
    LWM -->|rv=empty| ETCD
    LWM2 -->|List+Watch| ETCD

    style LAW fill:#9f9,stroke:#393
    style WLM fill:#9cf,stroke:#369
    style GLW fill:#fc9,stroke:#963

Data Flow

sequenceDiagram
    participant WS as WatcherSyncer
    participant WC as watcherCache
    participant Client as api.Client
    participant LW as ListWatcher
    participant GLW as GenericListWatcher
    participant Server as K8s/etcd

    WS->>WC: run(ctx)
    WC->>Client: ListAndWatch(ctx, list, handler)
    Client->>LW: create ListWatcher
    LW->>GLW: embed GenericListWatcher
    
    loop Main Loop
        GLW->>GLW: RetryThrottleC()
        GLW->>LW: PerformInitialSync()
        
        alt WatchList Mode (K8s)
            LW->>WC: OnResyncStarted()
            LW->>Server: Watch(SendInitialEvents=true)
            loop Streaming Events
                Server-->>LW: WatchAdded/Modified/Deleted
                LW->>GLW: UpdateRevision()
                LW->>WC: OnAdd/OnUpdate/OnDelete
            end
            Server-->>LW: Bookmark(InitialEvents=true)
            LW->>WC: OnSync()
        else List+Watch Mode (etcd/fallback)
            LW->>WC: OnResyncStarted()
            LW->>Server: List()
            Server-->>LW: KVPairList
            loop Send Events
                LW->>WC: OnAdd(kvp)
            end
            LW->>WC: OnSync()
            LW->>Server: Watch(revision)
        end
        
        loop Watch Loop
            Server-->>LW: Events
            LW->>GLW: HandleBasicWatchEvent()
            GLW->>WC: OnAdd/OnUpdate/OnDelete
        end
        
        alt Error Occurred
            Server-->>LW: Error
            LW->>GLW: HandleWatchError()
            GLW->>GLW: IncrementErrorCount / ResetForFullResync
        end
    end

Key Interfaces

EventHandler Interface

type EventHandler interface {
    OnResyncStarted()           // Called before resync begins
    OnAdd(kvp *model.KVPair)    // Resource added
    OnUpdate(kvp *model.KVPair) // Resource modified  
    OnDelete(kvp *model.KVPair) // Resource deleted
    OnSync()                    // Initial sync complete
    OnError(err error)          // Critical error occurred
}

ListWatchBackend Interface

type ListWatchBackend interface {
    PerformList(ctx context.Context) (*model.KVPairList, error)
    CreateWatch(ctx context.Context, isInitialSync bool) (WatchInterface, error)
    HandleWatchEvent(event WatchEvent) error
    HandleListError(err error)
    HandleWatchError(err error)
    PerformInitialSync(ctx context.Context, g *GenericListWatcher) error
}

Implementation Details

1. GenericListWatcher (api/listwatcher.go)

The GenericListWatcher provides common functionality for all backend implementations:

  • Retry Throttling: Configurable minimum intervals between retry attempts
  • Revision Tracking: Maintains current revision to resume watches efficiently
  • Error Counting: Tracks consecutive errors at the same revision, triggers full resync after MaxErrorsPerRevision (default: 5)
  • Connection Timeout: Detects prolonged disconnection (default: 600s) and signals errors

Key state management:

type GenericListWatcher struct {
    CurrentRevision        string    // Last known revision
    ErrorCountAtCurrentRev int       // Consecutive errors at current revision
    InitialSyncPending     bool      // Whether initial sync is required
    RetryBlockedUntil      time.Time // Throttle retry attempts
}

2. Kubernetes ListWatcher (k8s/listwatcher.go)

Implements WatchList mode with automatic fallback:

WatchList Mode (default):

  • Sets SendInitialEvents=true and ResourceVersionMatch=NotOlderThan in watch options
  • Initial data streams through watch events incrementally
  • Sync completion signaled by bookmark with InitialEventsAnnotationKey annotation
  • Memory-efficient: no need to hold entire list in memory

Fallback to List+Watch:

  • Automatic fallback when server returns IsInvalid error (older K8s versions)
  • Uses traditional List API followed by Watch
  • Transparent to the caller

K8s-specific Error Handling:

  • CRD not installed: Marks as synced, retries after 30 minutes
  • ResourceExpired/Gone: Triggers full resync
  • Connection refused/TooManyRequests: Retries with connection timeout check

3. etcd ListWatcher (etcdv3/listwatcher.go)

Implements traditional List+Watch pattern:

  • Always uses PerformListSync() for initial synchronization
  • Simpler error handling without bookmark or CRD concerns
  • Reuses all common functionality from GenericListWatcher

4. Simplified watcherCache (watchersyncer/watchercache.go)

The watcherCache now implements EventHandler interface directly:

Before: ~500 lines with embedded list-watch logic
After: ~278 lines (a 45% reduction)

Key simplifications:

  • Removed embedded watch/list logic (moved to GenericListWatcher)
  • Replaced oldResources map with resyncEpoch counter for deletion detection
  • Removed duplicate retry/revision tracking code
  • Clean separation between event handling and list-watch mechanics

Resync Epoch Mechanism

type cacheEntry struct {
    revision    string
    key         model.Key
    resyncEpoch uint64  // NEW: tracks which resync cycle this entry belongs to
}

type watcherCache struct {
    resyncEpoch            uint64  // Incremented on each resync
    lastHandledResyncEpoch uint64  // Tracks processed resyncs
    // ...
}

On resync start:

  1. Increment resyncEpoch
  2. Process all incoming events, updating entries with current epoch
  3. After sync: delete entries with resyncEpoch < current (stale entries)

This is more memory-efficient than the previous approach of copying all resources to an oldResources map.

sequenceDiagram
    participant Cache as watcherCache
    participant Resources as resources map
    participant Handler as EventHandler
    
    Note over Resources: Initial State (epoch=1)<br/>pod-a:1, pod-b:1, pod-c:1
    
    Note over Cache: Connection lost!
    
    Cache->>Cache: OnResyncStarted()<br/>resyncEpoch = 2
    
    Note over Cache: Receive events during resync<br/>(pod-b was deleted while disconnected)
    
    Cache->>Resources: OnAdd(pod-a) → epoch=2
    Cache->>Resources: OnAdd(pod-c) → epoch=2
    Note over Resources: pod-a:2, pod-b:1 (STALE!), pod-c:2
    
    Cache->>Cache: OnSync() called
    
    Cache->>Resources: Scan for stale entries<br/>(epoch < current)
    Resources-->>Cache: pod-b.epoch(1) < current(2)
    
    Cache->>Handler: Send delete for pod-b
    Cache->>Resources: Remove pod-b
    
    Note over Resources: Final State<br/>pod-a:2, pod-c:2

ResourceVersion Management

Overview

ResourceVersion (revision) is a critical concept in Kubernetes/etcd that ensures consistency during list-watch operations. This PR improves revision management by:

  1. Starting with empty revision for initial sync (instead of "0")
  2. Tracking revision per-event for accurate resumption
  3. Resetting revision on errors to trigger full resync when needed

Revision Lifecycle

stateDiagram-v2
    [*] --> Empty: NewGenericListWatcher
    Empty --> Syncing: PerformInitialSync
    
    state Syncing {
        [*] --> WatchList: K8s default
        [*] --> ListWatch: etcd / fallback
        
        WatchList --> ReceiveEvents: Stream events
        ReceiveEvents --> UpdateRev: Each event updates revision
        UpdateRev --> ReceiveEvents
        UpdateRev --> Synced: Bookmark with annotation
        
        ListWatch --> ListAll: PerformList
        ListAll --> UpdateRevFromList: Get list revision
        UpdateRevFromList --> Synced: OnSync
    }
    
    Synced --> Watching: CreateWatch(revision)
    
    state Watching {
        [*] --> WaitEvent
        WaitEvent --> ProcessEvent: Event received
        ProcessEvent --> UpdateRevision: Update CurrentRevision
        UpdateRevision --> WaitEvent
    }
    
    Watching --> Empty: Error / Resync needed
    Watching --> Syncing: Connection lost

Revision Update Points

Scenario                                    Revision Source                              Action
Initial state                               Empty string ""                              Request latest data from server
WatchList event (Added/Modified/Deleted)    event.New.Revision or event.Old.Revision     UpdateRevision()
WatchList bookmark                          event.New.Revision                           UpdateRevision()
List completion                             list.Revision                                UpdateRevision()
ResourceExpired error                       Reset to ""                                  ResetForFullResync()
Too many errors                             Reset to ""                                  ResetForFullResync()
Connection timeout                          Reset to ""                                  ResetForFullResync()

Revision Flow Diagram

sequenceDiagram
    participant GLW as GenericListWatcher
    participant Backend as k8s/etcd Backend
    participant Server as API Server
    
    Note over GLW: CurrentRevision = ""
    
    GLW->>Backend: PerformInitialSync()
    
    alt WatchList Mode
        Backend->>Server: Watch(SendInitialEvents=true)
        loop Initial Events
            Server-->>Backend: WatchAdded(rev=100)
            Backend->>GLW: UpdateRevision("100")
            Server-->>Backend: WatchAdded(rev=101)
            Backend->>GLW: UpdateRevision("101")
            Server-->>Backend: WatchAdded(rev=102)
            Backend->>GLW: UpdateRevision("102")
        end
        Server-->>Backend: Bookmark(rev=102, InitialEvents=true)
        Backend->>GLW: UpdateRevision("102")
        Note over GLW: InitialSyncPending = false
    else List+Watch Mode
        Backend->>Server: List()
        Server-->>Backend: KVPairList(rev=102)
        Backend->>GLW: UpdateRevision("102")
        Note over GLW: InitialSyncPending = false
        Backend->>Server: Watch(revision=102)
    end
    
    Note over GLW: CurrentRevision = "102"
    
    loop Watch Events
        Server-->>Backend: WatchModified(rev=103)
        Backend->>GLW: UpdateRevision("103")
        Server-->>Backend: WatchDeleted(rev=104)
        Backend->>GLW: UpdateRevision("104")
        Server-->>Backend: Bookmark(rev=110)
        Backend->>GLW: UpdateRevision("110")
    end
    
    Note over GLW: CurrentRevision = "110"
    
    alt Error: ResourceExpired
        Server-->>Backend: Error(410 Gone)
        Backend->>GLW: ResetForFullResync()
        Note over GLW: CurrentRevision = ""
        Note over GLW: InitialSyncPending = true
    end

Why Empty Revision Instead of "0"

In the previous implementation, revision was initialized to "0". This PR changes it to empty string:

// Before
currentWatchRevision: "0"

// After
CurrentRevision: ""

Reasons for using empty revision:

  1. Data source difference (see Kubernetes API Concepts - Resource Versions):

    resourceVersion     Data Source               Consistency
    "" (empty/unset)    etcd (quorum read)        Strong consistency, returns latest data from etcd
    "0"                 API server watch cache    May return stale data, but faster

    Using empty resourceVersion ensures we get the most recent data from etcd with strong consistency guarantees, which is critical for the initial sync to be accurate.

  2. Pagination bypass with rv=0: When resourceVersion=0 is used, the API server returns all data at once from watch cache, bypassing pagination parameters. This defeats the purpose of memory-efficient streaming:

    // With rv=0: API server ignores limit parameter, returns everything from cache
    GET /api/v1/pods?resourceVersion=0&limit=500
    → Returns ALL pods from watch cache (could be 10,000+), ignoring limit=500
    
    // With rv="" (empty): Pagination works correctly, reads from etcd
    GET /api/v1/pods?resourceVersion=&limit=500
    → Returns 500 pods with continue token, consistent read from etcd
    
  3. etcd behavior: According to etcd client documentation:

    "When passed WithRev(rev) with rev > 0, Get retrieves keys at the given revision"

    This means in etcd Get/List operations, WithRev(0) and not specifying WithRev have the same behavior - both return the latest data from etcd. The Calico etcd backend handles this by not passing WithRev option when revision is empty:

    // From etcdv3.go List():
    if len(revision) != 0 {
        rev, err := parseRevision(revision)
        ops = append(ops, clientv3.WithRev(rev))  // Only add WithRev if revision specified
    }
    // Empty revision = no WithRev option = get latest from etcd

    Key difference from Kubernetes: While in Kubernetes rv=0 vs rv="" have different semantics (cache vs etcd), in native etcd client they are equivalent. The Calico implementation uses empty string consistently for both backends to request latest data.

  4. Modern API server optimizations: Recent Kubernetes versions (1.28+) have implemented significant performance improvements for consistent reads that work best with empty resourceVersion:

    • ConsistentListFromCache (KEP-3157): Allows serving consistent List requests directly from the watch cache instead of requiring a quorum read from etcd. When using empty resourceVersion with ResourceVersionMatch=NotOlderThan, the API server can serve the request from cache while still guaranteeing consistency.

    • ListFromCacheSnapshot (KEP-3926): Creates a point-in-time snapshot of the watch cache for List operations, reducing memory allocations and improving performance for large lists.

    These optimizations are designed around the pattern of requesting "latest" data (empty resourceVersion) rather than a specific version, making empty revision the recommended approach for new implementations.

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add WatchList synchronization mode support for Kubernetes backend. WatchList mode streams initial resources incrementally through watch events instead of loading all resources at once via List API, significantly reducing memory pressure during initial sync and improving scalability for large clusters. The feature automatically falls back to traditional List+Watch mode on older Kubernetes versions that don't support WatchList.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@liuxu623 liuxu623 requested a review from a team as a code owner April 10, 2025 07:02
@marvin-tigera marvin-tigera added this to the Calico v3.31.0 milestone Apr 10, 2025
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Apr 10, 2025
@liuxu623 liuxu623 force-pushed the watch-list branch 2 times, most recently from 7a86dbf to c31cd88 Compare April 10, 2025 12:10
@liuxu623 liuxu623 changed the title [WIP] support use WatchList for k8s support use WatchList for k8s Apr 10, 2025
@liuxu623 liuxu623 force-pushed the watch-list branch 6 times, most recently from b2de890 to f4dc1ff Compare April 11, 2025 06:26
@fasaxc
Member

fasaxc commented Apr 11, 2025

D'oh, think we're duplicating some work here; I've been working on the same. #9593 was an enabling PR to be able to add this feature. You're right; it's a big win for us. It actually moves the k8s API to work the same way that Calico's internal "Syncer" API works(!)

I need to dust off my work in progress and see if there were any gotchas and compare with your approach.

@fasaxc fasaxc self-assigned this Apr 11, 2025
@fasaxc fasaxc added docs-not-required Docs not required for this change and removed docs-pr-required Change is not yet documented labels Apr 11, 2025
@fasaxc
Member

fasaxc commented Apr 11, 2025

/sem-approve

@liuxu623
Contributor Author

D'oh, think we're duplicating some work here; I've been working on the same. #9593 was an enabling PR to be able to add this feature. You're right; it's a big win for us. It actually moves the k8s API to work the same way that Calico's internal "Syncer" API works(!)

I need to dust off my work in progress and see if there were any gotchas and compare with your approach.

I noticed #9593 before; the WatchList feature depends on bookmarks. With bookmark support in place, a lot of the work here was reduced.

@liuxu623 liuxu623 force-pushed the watch-list branch 2 times, most recently from 88db18c to c5ecc46 Compare April 14, 2025 07:10
@liuxu623
Contributor Author

@fasaxc Felix now has a K8S_USE_WATCH_LIST env var to control whether WatchList is used, PTAL.

@fasaxc
Member

fasaxc commented May 1, 2025

Sorry for the radio silence, I finally got a chance to have a look. A few high-level thoughts rather than jumping into line-by-line review:

  • Overall, this looks like it'd work.

  • Can we detect specifically that the server doesn't support the watch-list operation and fall back only in that case? I think k8s reflector/informer does manage to do this somehow; I think the error we expect from an old API server is distinct.

  • I think introducing the func (wc *watcherCache) watchList(ctx context.Context) method might be too cautious. We shouldn't need to turn the first part of the watch into a simulated "list" call because the output of the watchersyncer uses Calico's "Syncer" API, which is actually almost identical to the "send initial events" approach that k8s has now adopted(!).

    • Record that we're resyncing
    • Start a "send initial events" watch.
    • Handle each event as normal, send it downstream straight away (downstream can handle duplicates).
    • When we get the first bookmark:
      • Use the cache of previously seen resources to figure out any deletions and send those downstream.
      • If this is the first time we've been in sync, send an "in-sync" message downstream. This is Calico's equivalent to the first bookmark; it tells the downstream "ok you've got everything now".
  • For config, I'd prefer to go through felix/typha's config_params.go, which is how those components are generally configured. This code normally runs in Typha in a large cluster, offloading it from Felix. That said, there isn't a great way to configure Typha in most deployments because the operator locks down its env vars.

  • We need to test this thoroughly. Bugs in this area tend to be catastrophic. It'd be good to:

    • Run the test suite both with and without the feature enabled.
    • If we can, test trying to use the feature when it's disabled at the API server so we get an error and test the fallback code.
    • See if we can figure out a way to test corner cases like getting disconnected mid-resync.
  • Calico still supports the etcdv3 backend, I think you need to make sure that the etcdv3 backend returns a "not supported" error if passed the new watch options that it doesn't support.

For reference, here's a gist with my proof-of-concept version: https://gist.github.com/fasaxc/79da02a2ab91c3bcd9644761e6abfc79. It was based on a much earlier version of the watch bookmarks PR and it's hard to rebase now but there might be something worth borrowing from it. Some things I did differently:

  • I didn't handle fallback from watch to list, but I was planning on doing that.

  • Split the initial resync into two versions, one using list and one not. I was hoping that the watch list version would be a lot simpler due to being closer to Calico's model. Looking at it now, I'd move the for loop up into the calling function since it's the same in both. I added a bunch of documentation while I was trying to understand this code thoroughly again.

  • To do the "mark and sweep" GC of deleted keys after a resync, I introduced an epoch counter that was incremented once per resync. This made it easy to scan the resources map and find all the keys that should be deleted. If I remember correctly, the oldResources map got very fiddly when doing watch-based resyncs due to losing the atomicity of doing a full list. You've preserved the atomicity by simulating the list operation, but that means that you have to buffer the whole output in memory rather than stream it.

  • I introduced a method wc.client.SupportedWatchOptions() to let the watchercache ask the client if it supported the feature. This was how I planned to detect etcd not supporting the feature. Not sure this panned out, better to just try it and have the backend respond with a "not supported" error, since there's no easy way to detect if k8s supports it without trying.

  • I was experimenting with the options pattern on wc.client.Watch(... WithBookmarks()), I think this was a dead end so I went with an options struct in the watch bookmarks PR.

@liuxu623
Contributor Author

liuxu623 commented May 7, 2025

  • Can we detect specifically that the server doesn't support the watch-list operation and fall back only in that case? I think k8s reflector/informer does manage to do this somehow; I think the error we expect from an old API server is distinct.

Before K8s 1.27, watch did not support resourceVersionMatch at all, so ValidateListOptions returns an error.
https://github.com/kubernetes/kubernetes/blob/release-1.26/staging/src/k8s.io/apimachinery/pkg/apis/meta/internalversion/validation/validation.go#L30

// ValidateListOptions returns all validation errors found while validating the ListOptions.
func ValidateListOptions(options *internalversion.ListOptions) field.ErrorList {
	allErrs := field.ErrorList{}
	if match := options.ResourceVersionMatch; len(match) > 0 {
		if options.Watch {
			allErrs = append(allErrs, field.Forbidden(field.NewPath("resourceVersionMatch"), "resourceVersionMatch is forbidden for watch"))
		}
                 ......
	}
	return allErrs
}

From K8s 1.27 onwards, sendInitialEvents is not supported if the WatchList feature gate is disabled, so ValidateListOptions still returns an error.
https://github.com/kubernetes/kubernetes/blob/release-1.27/staging/src/k8s.io/apimachinery/pkg/apis/meta/internalversion/validation/validation.go#L61

func validateWatchOptions(options *internalversion.ListOptions, isWatchListFeatureEnabled bool) field.ErrorList {
	allErrs := field.ErrorList{}
	match := options.ResourceVersionMatch
	if options.SendInitialEvents != nil {
		if match != metav1.ResourceVersionMatchNotOlderThan {
			allErrs = append(allErrs, field.Forbidden(field.NewPath("resourceVersionMatch"), fmt.Sprintf("sendInitialEvents requires setting resourceVersionMatch to %s", metav1.ResourceVersionMatchNotOlderThan)))
		}
		if !isWatchListFeatureEnabled {
			allErrs = append(allErrs, field.Forbidden(field.NewPath("sendInitialEvents"), "sendInitialEvents is forbidden for watch unless the WatchList feature gate is enabled"))
		}
	}
	......
	return allErrs
}

So the error is the same for new and old API servers, and it is an Invalid error; we can use the error type to determine whether the API server supports WatchList.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go#L202

metainternalversion.SetListOptionsDefaults(&opts, utilfeature.DefaultFeatureGate.Enabled(features.WatchList))
if errs := metainternalversionvalidation.ValidateListOptions(&opts, utilfeature.DefaultFeatureGate.Enabled(features.WatchList)); len(errs) > 0 {
    err := errors.NewInvalid(schema.GroupKind{Group: metav1.GroupName, Kind: "ListOptions"}, "", errs)
    scope.err(err, w, req)
    return
}

@liuxu623
Contributor Author

liuxu623 commented May 7, 2025

  • Calico still supports the etcdv3 backend, I think you need to make sure that the etcdv3 backend returns a "not supported" error if passed the new watch options that it doesn't support.

Yes, I plan to make the etcd watch return the same error as the API server if ResourceVersionMatch or SendInitialEvents is specified.

// Watch entries in the datastore matching the resources specified by the ListInterface.
func (c *etcdV3Client) Watch(cxt context.Context, l model.ListInterface, options api.WatchOptions) (api.WatchInterface, error) {
	allErrs := field.ErrorList{}
	if len(options.ResourceVersionMatch) > 0 {
		allErrs = append(allErrs, field.Forbidden(field.NewPath("resourceVersionMatch"), "resourceVersionMatch is forbidden for etcdv3 backend"))
	}
	if options.SendInitialEvents != nil && *options.SendInitialEvents {
		allErrs = append(allErrs, field.Forbidden(field.NewPath("sendInitialEvents"), "sendInitialEvents is forbidden for etcdv3 backend"))
	}
	if len(allErrs) > 0 {
		return nil, apierrors.NewInvalid(schema.GroupKind{Group: metav1.GroupName, Kind: "ListOptions"}, "", allErrs)
	}

@liuxu623
Contributor Author

liuxu623 commented May 7, 2025

https://github.com/projectcalico/calico/blob/v3.30.0/libcalico-go/lib/backend/etcdv3/watcher.go#L93-L115

if wc.initialRev == 0 {
	// No initial revision supplied, so perform a list of current configuration
	// which will also get the current revision we will start our watch from.
	var kvps *model.KVPairList
	var err error
	if kvps, err = wc.listCurrent(); err != nil {
		log.Errorf("failed to list current with latest state: %v", err)
		// Error considered as terminating error, hence terminate watcher.
		wc.sendError(err)
		return
	}

	// If we're handling profiles, filter out the default-allow profile.
	if len(kvps.KVPairs) > 0 && (key == profilesKey || key == defaultAllowProfileKey) {
		wc.removeDefaultAllowProfile(kvps)
	}

	// We are sending an initial sync of entries to the watcher to provide current
	// state.  To the perspective of the watcher, these are added entries, so set the
	// event type to WatchAdded.
	log.WithField("NumEntries", len(kvps.KVPairs)).Debug("Sending create events for each existing entry")
	wc.sendAddedEvents(kvps)
}

I found that the etcdv3 watcher lists first if initialRev == 0, so maybe we needn't return a "not supported" error for the etcdv3 backend. Could we assume that etcdv3 supports WatchList?

We shouldn't reuse this code to implement WatchList for etcdv3, though, because there is no point at which onInSync would be called.

@liuxu623 liuxu623 changed the title support use WatchList for k8s [WIP] support use WatchList for k8s May 7, 2025
@liuxu623 liuxu623 force-pushed the watch-list branch 2 times, most recently from 367be48 to 7809dfb Compare May 7, 2025 12:28
@liuxu623
Contributor Author

https://kubernetes.io/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/

kube-apiserver disables the beta WatchList feature by default in 1.33 in favor of the StreamingCollectionEncodingToJSON and StreamingCollectionEncodingToProtobuf features. kube-controller-manager no longer opts into enabling the WatchListClient feature in 1.33. (https://github.com/kubernetes/kubernetes/pull/131359, [@deads2k](https://github.com/deads2k)) [SIG API Machinery]

@fasaxc The WatchList feature has been disabled by default in 1.33 because StreamingCollectionEncodingToJSON and StreamingCollectionEncodingToProtobuf appear to work better; maybe we don't need WatchList anymore.

@fasaxc
Member

fasaxc commented May 12, 2025

Agree, let's put it on hold. It's a shame; the streaming decode is better for Calico on the client side. We avoid a big resource spike while we parse/validate the very large List object.

@liuxu623
Contributor Author

Agree, let's put it on hold. It's a shame; the streaming decode is better for Calico on the client side. We avoid a big resource spike while we parse/validate the very large List object.

@fasaxc Kubernetes 1.34 has re-promoted the WatchList feature to Beta, and we can resume this work now.

@mazdakn
Member

mazdakn commented Oct 21, 2025

@fasaxc @liuxu623 fyi, we bumped k8s version to 1.34.1 recently: #11200
Just wanted to point it out here as it might have unblocked this PR.

@liuxu623 liuxu623 force-pushed the watch-list branch 2 times, most recently from 0d4667e to 1770165 Compare December 13, 2025 07:41
@liuxu623 liuxu623 changed the title support use WatchList for k8s Add WatchList support with GenericListWatcher abstraction Dec 13, 2025
@liuxu623
Contributor Author

@fasaxc I apologize for the lack of updates to this PR recently due to work commitments. I've made some refactorings to the ListWatch code; details can be found in the PR description. Looking forward to your feedback.

@github-actions
Contributor

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale Issues without recent activity label Feb 13, 2026
Introduce Kubernetes WatchList synchronization mode (GA in K8s 1.32),
enabling efficient initial sync by streaming data via watch events.

Key changes:
- Extract list-watch logic into GenericListWatcher with ListWatchBackend
  and EventHandler interfaces for clean separation of concerns
- Add k8s ListWatcher with WatchList mode, bookmark handling, and
  automatic fallback to List+Watch for older K8s versions
- Add etcd ListWatcher using traditional List+Watch pattern
- Use empty revision for List/Watch to avoid stale revision errors
- Replace oldResources map with resyncEpoch for memory efficiency
- Simplify watcherCache from ~500 to ~278 lines (-45%)

WatchList mode reduces memory pressure by streaming resources
incrementally instead of loading all at once via List API.
@github-actions github-actions bot removed the stale Issues without recent activity label Mar 3, 2026

Labels

docs-not-required Docs not required for this change
release-note-required Change has user-facing impact (no matter how small)
