How is shared-state implemented? #37
Comments
Based on the list-watch mechanism.
Sorry, I've learned about list-watch but still have difficulties. The list-watch mechanism mainly works by monitoring events such as create and delete. What puzzles me is how multiple schedulers get the global cluster view. Does every scheduler watch all of these events so that they don't need synchronization, or do they synchronize with a central global cluster view at a certain frequency (real-time synchronization)? And which struct in the code serves as the central global cluster view? I'm not sure about that. Is it the commonCache in the Binder struct, the generationstore, or some other struct?
@katoomegumi Each scheduler instance watches all events from the apiserver (etcd); they don't need to sync up with each other.
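In other words, each scheduler builds and maintains its own in-memory cluster view from the same apiserver watch stream. Below is a minimal sketch of that pattern using client-go informers; it is illustrative only, and the `localClusterView` type and its methods are hypothetical stand-ins, not godel's actual cache.

```go
// Minimal sketch (not godel's actual code): every scheduler instance keeps its own
// cluster view up to date by list-watching the apiserver through client-go informers.
// localClusterView and its methods are hypothetical stand-ins for a scheduler cache.
package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

type localClusterView struct{ /* nodes, pods, assumed state ... */ }

func (c *localClusterView) AddNode(n *v1.Node)    { /* record node */ }
func (c *localClusterView) UpdateNode(n *v1.Node) { /* refresh node */ }
func (c *localClusterView) RemoveNode(n *v1.Node) { /* drop node */ }

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	view := &localClusterView{}

	// List once, then watch: every scheduler instance receives the same event stream,
	// so all instances converge on the same global view without talking to each other.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	factory.Core().V1().Nodes().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { view.AddNode(obj.(*v1.Node)) },
		UpdateFunc: func(_, newObj interface{}) { view.UpdateNode(newObj.(*v1.Node)) },
		DeleteFunc: func(obj interface{}) {
			if n, ok := obj.(*v1.Node); ok { // obj may be a DeletedFinalStateUnknown tombstone
				view.RemoveNode(n)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching
}
```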
@NickrenREN Thanks. I think it's impossible to sync the scheduler's cache for every single event, so I assume the code defines a time interval to sync the cache from events. Is that true?

```go
// pkg/scheduler/scheduler.go
// func Run
if utilfeature.DefaultFeatureGate.Enabled(features.SchedulerCacheScrape) {
	// The metrics agent scrapes the endpoint every 5s and flushes the results to the
	// metrics server every 30s. To be more precise, scrape cache metrics every 5s.
	go wait.Until(func() {
		sched.commonCache.ScrapeCollectable(sched.metricsRecorder)
		sched.metricsRecorder.UpdateMetrics()
	}, 5*time.Second, sched.StopEverything)
}
```
@katoomegumi No, the scheduler receives every event and reacts to it (updating the cache and queue based on events). The code you posted is for collecting metrics, not the cache-syncing logic.
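To illustrate the event-driven update path described above, here is a minimal sketch, not godel's actual code: an informer callback updates the in-memory cache immediately and, when new capacity appears, moves previously unschedulable pods back into the active queue. All type and method names (`Cache`, `Queue`, `onNodeAdd`) are hypothetical.

```go
// Minimal sketch (not godel's actual code) of reacting to a single event: the handler
// updates the in-memory cache right away and also adjusts the scheduling queue, so no
// periodic "cache sync" pass is needed. Cache, Queue and onNodeAdd are hypothetical.
package sched

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

type Cache struct {
	mu    sync.Mutex
	nodes map[string]*v1.Node
}

type Queue struct {
	mu            sync.Mutex
	active        []*v1.Pod // pods ready to be tried by the scheduling loop
	unschedulable []*v1.Pod // pods that did not fit on any node earlier
}

type Scheduler struct {
	cache *Cache
	queue *Queue
}

// onNodeAdd would be registered as the informer's AddFunc for Nodes.
func (s *Scheduler) onNodeAdd(obj interface{}) {
	node, ok := obj.(*v1.Node)
	if !ok {
		return
	}

	// 1. Update the cluster view immediately for this one event.
	s.cache.mu.Lock()
	if s.cache.nodes == nil {
		s.cache.nodes = map[string]*v1.Node{}
	}
	s.cache.nodes[node.Name] = node
	s.cache.mu.Unlock()

	// 2. New capacity may make previously unschedulable pods fit again,
	//    so move them back into the active queue.
	s.queue.mu.Lock()
	s.queue.active = append(s.queue.active, s.queue.unschedulable...)
	s.queue.unschedulable = nil
	s.queue.mu.Unlock()
}
```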
@NickrenREN Thanks for the reply. Actually, we are interested in the "watch delay" in the godel scheduler, which refers to the duration between an event in etcd (e.g., a cluster resource change) and each scheduler actually watching the event (updating its cache). Obviously, with higher QPS and a larger cluster, the "watch delay" becomes more severe... Admittedly, it is an inherent problem of K8s itself, but we wonder whether godel has characterized or specifically optimized the "watch delay"? FYI, the related discussion in the k8s repo: kubernetes/kubernetes#108556
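(As an aside, one rough client-side way to approximate this watch delay for newly created objects is to compare the informer delivery time with the object's apiserver-assigned creationTimestamp. The sketch below is a hypothetical measurement helper, not something godel ships; creationTimestamp only has second granularity, and the initial list also replays ADDs for pre-existing objects, so the numbers are only a coarse upper bound.)

```go
// Hypothetical measurement helper (not part of godel): approximate the watch delay for
// newly created Pods by comparing informer delivery time with the apiserver-assigned
// creationTimestamp. Coarse by design; see the caveats in the note above.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*v1.Pod)
			delay := time.Since(pod.CreationTimestamp.Time)
			// The initial list replays ADDs for pre-existing pods; ignore anything old.
			if delay < time.Minute {
				fmt.Printf("pod %s/%s observed %v after creation\n", pod.Namespace, pod.Name, delay)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}
```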
@Wang-Xinkai Hello, we optimize the "event latency" from two aspects.
Thanks. I have checked the client-side optimizations.
@Wang-Xinkai My understanding is that event latency depends on three parts: 1. apiserver and etcd processing efficiency; 2. the network conditions between the apiserver and the client; 3. client processing efficiency. We are now optimizing 1 and 3 to accelerate the event processing flow. But we can't guarantee that everything (on the server and client sides) will always be OK, and we can't say the network condition has nothing to do with the k8s ecosystem. In the future, though, we can explore whether we can simplify the interaction between godel scheduler components. For example, right now all godel scheduler components get events from the apiserver; could we let them talk to each other directly? ...
Interesting idea lol. I agree with you on the event delay decomposition! Do you have a cursory estimate of the scale of the event delay: tens of ms, hundreds of ms, or seconds, under normal scenarios and extremely high-load scenarios? I mean, if the event delay is large, the scheduler would be blind to some "free resource" in the cluster during the delay, which causes great waste of cluster resources (if there are tasks waiting to be scheduled). That's why we are interested in this metric. Thanks.
@Wang-Xinkai IIUC, you are worried about the latency of Node resource update events?
Right, do you have some thoughts about this issue, or experience with the actual latency in realistic clusters? We suspect it affects the resource visibility of schedulers…
@Wang-Xinkai In Kubernetes, different resources (nodes, pods, ...) have different event transmission links. The number of nodes is not that large, so node resources are less likely to cause latency issues. At least at ByteDance, we have never met this kind of problem (our largest single cluster size: 20k nodes, 1000k pods).
Okay, thanks for your generous replies. We will use Godel to study shared-state schedulers further. Keep in touch!
@Wang-Xinkai Cool, if you have any questions, feel free to reach out to me.
According to the paper, godel-scheduler is a shared-state scheduler. Where can I find the implementation in the code? In particular, how is the global cluster view synchronized?