How is shared-state implemented? #37
Comments
Based on the list-watch mechanism.
Sorry, I've learned about list-watch but still have difficulties. The list-watch mechanism mainly works by monitoring events such as create and delete. What puzzles me is how multiple schedulers get the global cluster view. Does every scheduler watch all of these events so that they don't need synchronization, or do they synchronize with a central global cluster view at a certain frequency (real-time synchronization)? And which struct in the code serves as the central global cluster view? I'm not sure about that. Is it the commonCache in the Binder struct, the generationstore, or some other struct?
@katoomegumi Each scheduler instance watches all events from the apiserver (etcd); they don't need to sync up with each other.
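In other words, each scheduler builds and maintains its own in-memory cluster view from the same apiserver watch stream. Below is a minimal sketch of that pattern using client-go informers; it is illustrative only, and the `localClusterView` type and its methods are hypothetical stand-ins, not godel's actual cache.

```go
// Minimal sketch (not godel's actual code): every scheduler instance keeps its own
// cluster view up to date by list-watching the apiserver through client-go informers.
// localClusterView and its methods are hypothetical stand-ins for a scheduler cache.
package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

type localClusterView struct{ /* nodes, pods, assumed state ... */ }

func (c *localClusterView) AddNode(n *v1.Node)    { /* record node */ }
func (c *localClusterView) UpdateNode(n *v1.Node) { /* refresh node */ }
func (c *localClusterView) RemoveNode(n *v1.Node) { /* drop node */ }

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	view := &localClusterView{}

	// List once, then watch: every scheduler instance receives the same event stream,
	// so all instances converge on the same global view without talking to each other.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	factory.Core().V1().Nodes().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { view.AddNode(obj.(*v1.Node)) },
		UpdateFunc: func(_, newObj interface{}) { view.UpdateNode(newObj.(*v1.Node)) },
		DeleteFunc: func(obj interface{}) {
			if n, ok := obj.(*v1.Node); ok { // obj may be a DeletedFinalStateUnknown tombstone
				view.RemoveNode(n)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching
}
```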
@NickrenREN Thanks. I think it's impossible to sync the scheduler's cache for every single event, so I assume the code defines a time interval to sync the cache from events. Is that true?

```go
// pkg/scheduler/scheduler.go
// func Run
if utilfeature.DefaultFeatureGate.Enabled(features.SchedulerCacheScrape) {
	// The metrics agent scrapes the endpoint every 5s and flushes the results to the
	// metrics server every 30s. To be more precise, scrape cache metrics every 5s.
	go wait.Until(func() {
		sched.commonCache.ScrapeCollectable(sched.metricsRecorder)
		sched.metricsRecorder.UpdateMetrics()
	}, 5*time.Second, sched.StopEverything)
}
```
@katoomegumi No, the scheduler receives every event and reacts to it (updating the cache and queue based on events). The code you posted is for collecting metrics, not the cache-syncing logic.
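To illustrate the event-driven update path described above, here is a minimal sketch, not godel's actual code: an informer callback updates the in-memory cache immediately and, when new capacity appears, moves previously unschedulable pods back into the active queue. All type and method names (`Cache`, `Queue`, `onNodeAdd`) are hypothetical.

```go
// Minimal sketch (not godel's actual code) of reacting to a single event: the handler
// updates the in-memory cache right away and also adjusts the scheduling queue, so no
// periodic "cache sync" pass is needed. Cache, Queue and onNodeAdd are hypothetical.
package sched

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

type Cache struct {
	mu    sync.Mutex
	nodes map[string]*v1.Node
}

type Queue struct {
	mu            sync.Mutex
	active        []*v1.Pod // pods ready to be tried by the scheduling loop
	unschedulable []*v1.Pod // pods that did not fit on any node earlier
}

type Scheduler struct {
	cache *Cache
	queue *Queue
}

// onNodeAdd would be registered as the informer's AddFunc for Nodes.
func (s *Scheduler) onNodeAdd(obj interface{}) {
	node, ok := obj.(*v1.Node)
	if !ok {
		return
	}

	// 1. Update the cluster view immediately for this one event.
	s.cache.mu.Lock()
	if s.cache.nodes == nil {
		s.cache.nodes = map[string]*v1.Node{}
	}
	s.cache.nodes[node.Name] = node
	s.cache.mu.Unlock()

	// 2. New capacity may make previously unschedulable pods fit again,
	//    so move them back into the active queue.
	s.queue.mu.Lock()
	s.queue.active = append(s.queue.active, s.queue.unschedulable...)
	s.queue.unschedulable = nil
	s.queue.mu.Unlock()
}
```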
@NickrenREN Thanks for the reply. Actually, we are interested in the "watch delay" in the godel scheduler, which refers to the duration between an event in etcd (e.g., a cluster resource change) and each scheduler actually watching the event (updating its cache). Obviously, with higher QPS and a larger cluster, the "watch delay" becomes more severe... Admittedly, it is an inherent problem of K8s itself, but we wonder whether godel has characterized or specifically optimized the "watch delay"? FYI, the related discussion in the k8s repo: kubernetes/kubernetes#108556
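(As an aside, one rough client-side way to approximate this watch delay for newly created objects is to compare the informer delivery time with the object's apiserver-assigned creationTimestamp. The sketch below is a hypothetical measurement helper, not something godel ships; creationTimestamp only has second granularity, and the initial list also replays ADDs for pre-existing objects, so the numbers are only a coarse upper bound.)

```go
// Hypothetical measurement helper (not part of godel): approximate the watch delay for
// newly created Pods by comparing informer delivery time with the apiserver-assigned
// creationTimestamp. Coarse by design; see the caveats in the note above.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*v1.Pod)
			delay := time.Since(pod.CreationTimestamp.Time)
			// The initial list replays ADDs for pre-existing pods; ignore anything old.
			if delay < time.Minute {
				fmt.Printf("pod %s/%s observed %v after creation\n", pod.Namespace, pod.Name, delay)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}
```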
@Wang-Xinkai Hello, we optimize the "event latency" from two aspects.
Thanks. I have checked the client-side optimizations.
@Wang-Xinkai My understanding is that event latency depends on three parts: 1. apiserver and etcd processing efficiency; 2. the network conditions between the apiserver and the client; 3. client processing efficiency. We are now optimizing 1 and 3 to accelerate the event processing flow. But we can't guarantee that everything (on the server and client sides) will always be OK, and we can't say the network condition has nothing to do with the k8s ecosystem. In the future, though, we can explore whether we can simplify the interaction between godel scheduler components. For example, right now all godel scheduler components get events from the apiserver; could we let them talk to each other directly? ...
Interesting idea lol. I agree with you on the event delay decomposition! Do you have a cursory estimate of the scale of the event delay: tens of ms, hundreds of ms, or seconds, under normal scenarios and extremely high-load scenarios? I mean, if the event delay is large, the scheduler would be blind to some "free resource" in the cluster during the delay, which causes great waste of cluster resources (if there are tasks waiting to be scheduled). That's why we are interested in this metric. Thanks.
@Wang-Xinkai IIUC, you are worried about the latency of Node resource update events?
Right, do you have some thoughts about this issue, or experience with the actual latency in realistic clusters? We suspect it affects the resource visibility of schedulers…
@Wang-Xinkai In Kubernetes, different resources (nodes, pods, ...) have different event transmission links. The number of nodes is not that large, so node resources are less likely to cause latency issues. At least at ByteDance, we have never met this kind of problem (our largest single cluster size: 20k nodes, 1000k pods).
Okay, thanks for your generous replies. We will use Godel to study shared-state schedulers further. Keep in touch!
@Wang-Xinkai Cool, if you have any questions, feel free to reach out to me.
According to the paper, godel-scheduler is a shared-state scheduler. Where can I find the implementation in the code? In particular, how is the global cluster view synchronized?