Why another scheduler? What purpose of this project? #26
Comments
@Cdayz The main goal of this project is to provide a unified scheduler for online and offline workloads, so that it is easier to colocate them and to improve resource utilization and elasticity. Volcano is an offline scheduler, and YuniKorn only provides a Kubernetes adaptor. At ByteDance, the cluster scale is very large (20k nodes and 1000k pods in a single cluster) and the business scenarios are complex, so it is difficult to use an existing scheduler directly, and the development effort needed to build on top of one is not acceptable.
@NickrenREN, can you please describe the difference between online and offline workloads, and what you mean by those terms? I am asking because, as I understand it, an online scheduler is one that does not know when a new task will arrive or when a running task will finish. As far as I know, Volcano can be used in that environment. Maybe I am wrong, and if so, I apologize for wasting your time. However, I think it is important to clarify the goals of this project and possibly write a decision record covering the pros and cons of other solutions, along with an explanation of why this one is necessary.
@Cdayz Generally speaking, online workloads are SLA- and latency-sensitive, such as microservices and RPC services, while offline workloads are mostly throughput-oriented and care more about job completion time, such as Hadoop batch apps and ML training tasks. They care about different metrics, so their scheduling requirements differ. For example, Hadoop batch apps need high scheduling throughput (1k pods per second in our prod env), and ML training tasks need
Yeah, my question is pretty simple: why did you start building a new k8s scheduler?
There are many different production-grade solutions like:
They offer the same functionality, plus many extra features that are already built in.
What is the purpose of this project?