Commit e82beb6

Update KServe 2024-2025 Roadmap (kserve#3810)

* Update ROADMAP.md
* Add llm gateway
* Update ROADMAP.md
* Update ROADMAP.md

Signed-off-by: Dan Sun <[email protected]>

1 parent 87cf2cd commit e82beb6

1 file changed: ROADMAP.md (+37 -26 lines)
@@ -1,16 +1,38 @@
-# KServe 2023 Roadmap
+# KServe 2024-2025 Roadmap
+## Objective: "Support GenAI inference"
+- LLM Serving Runtimes
+  * Support Speculative Decoding with vLLM runtime [https://github.com/kserve/kserve/issues/3800].
+  * Support LoRA adapters [https://github.com/kserve/kserve/issues/3750].
+  * Support LLM Serving runtimes for TensorRT-LLM, TGI and provide benchmarking comparisons [https://github.com/kserve/kserve/issues/3868].
+  * Support multi-host, multi-GPU inference runtime [https://github.com/kserve/kserve/issues/2145].
+
+- LLM Autoscaling
+  * Support Model Caching with automatic PV/PVC provisioning [https://github.com/kserve/kserve/issues/3869].
+  * Support Autoscaling settings for serving runtimes.
+  * Support Autoscaling based on custom metrics [https://github.com/kserve/kserve/issues/3561].
+
+- LLM RAG/Agent Pipeline Orchestration
+  * Support declarative RAG/Agent workflow using KServe Inference Graph [https://github.com/kserve/kserve/issues/3829].
+
+- Open Inference Protocol extension to GenAI Task APIs
+  * Community-maintained Open Inference Protocol repo for OpenAI schema [https://docs.google.com/document/d/1odTMdIFdm01CbRQ6CpLzUIGVppHSoUvJV_zwcX6GuaU].
+  * Support vertical GenAI Task APIs such as embedding, Text-to-Image, Text-To-Code, Doc-To-Text [https://github.com/kserve/kserve/issues/3572].
+
+- LLM Gateway
+  * Support multiple LLM providers.
+  * Support token based rate limiting.
+  * Support LLM router with traffic shaping, fallback, load balancing.
+  * LLM Gateway observability for metrics and cost reporting
 
 ## Objective: "Graduate core inference capability to stable/GA"
-- Promote `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRD from v1beta1 to v1
+- Promote `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRD to v1
   * Improve `InferenceService` CRD for REST/gRPC protocol interface
-  * Unify model storage spec and implementation between KServe and ModelMesh
-  * Add Status to `ServingRuntime` for both ModelMesh and KServe, surface `ServingRuntime` validation errors and deployment status
-  * Deprecate `TrainedModel` CRD and use `InferenceService` annotation to allow dynamic model updates as alternative option to storage initializer
-  * Collocate transformer and predictor in the pod to reduce sidecar resources and networking latency
-  * Stablize `RawDeployment` mode with comprehensive testing for supported features
-
-- All model formats to support v2 inference protocol including custom serving runtime
-  * TorchServe to support v2 gRPC inference protocol
+  * Improve model storage interface
+  * Deprecate `TrainedModel` CRD and add multiple model support for co-hosting, draft model, LoRA adapters to InferenceService.
+  * Improve YAML UX for predictor and transformer container collocation.
+  * Close the feature gap between `RawDeployment` and `Serverless` mode.
+
+- Open Inference Protocol
   * Support batching for v2 inference protocol
   * Transformer and Explainer v2 inference protocol interoperability
   * Improve codec for v2 inference protocol
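The GenAI objective above extends the Open Inference Protocol toward the OpenAI schema. As a rough illustration of what that implies for clients (the field names follow the public OpenAI chat schema; whether a KServe LLM runtime accepts exactly this shape, and under which path, is an assumption here, and the model name is made up), a request payload might be built like this:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> str:
    """Build an OpenAI-style chat-completions payload as a JSON string.

    The keys (model, messages, max_tokens) follow the OpenAI chat schema;
    that a KServe LLM runtime accepts this exact shape is an assumption.
    """
    payload = {
        "model": model,  # name of the served LLM (hypothetical)
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("llama-3-8b", "Summarize the KServe roadmap.")
print(json.loads(body)["messages"][0]["role"])  # -> user
```

A schema shared across providers is what would let the planned LLM Gateway route one request format to multiple backends.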
@@ -19,30 +41,19 @@ Reference: [Control plane issues](https://github.com/kserve/kserve/issues?q=is%3
 
 ## Objective: "Graduate KServe Python SDK to 1.0“
 
-- Improve KServe Python SDK dependency management with Poetry
-- Create standarized model packaging API
-- Improve KServe model server observability with metrics and distruted tracing
+- Create standardized model packaging API
+- Improve KServe model server observability with metrics and distributed tracing
 - Support batch inference
 
 Reference:[Python SDK issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akserve%2Fsdk), [Storage issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akfserving%2Fstorage)
 
-## Objective: "Graduate ModelMesh to beta"
-- Support TorchServe ServingRuntime
-- Add PVC support and unify storage implementation with KServe
-- Add optional ingress for ModelMesh deployments
-- Etcd secret security for multi-namespace mode
-- Add estimated model size field
-
-Reference: [ModelMesh issues](https://github.com/kserve/modelmesh-serving/issues?page=1&q=is%3Aissue+is%3Aopen)
-
-## Objective: "Graduate InferenceGraph to beta"
+## Objective: "Graduate InferenceGraph"
 - Improve `InferenceGraph` spec for replica and concurrency control
-- Allow setting resource limits per `InferenceGraph`
 - Support distributed tracing
 - Support gRPC for `InferenceGraph`
 - Standalone `Transformer` support for `InferenceGraph`
 - Support traffic mirroring node
-- Support `RawDeployment` mode for `InferenceGraph`
+- Improve `RawDeployment` mode for `InferenceGraph`
 
 Reference: [InferenceGraph issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akserve%2Finference_graph)
 
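"Support batching for v2 inference protocol" and "Support batch inference" both build on the v2 (Open Inference Protocol) tensor layout, where the leading dimension of an input's `shape` carries the batch. A minimal sketch, using the v2 REST field names (`inputs`, `outputs`, `shape`, `datatype`, `data`) with a placeholder model in place of any real runtime:

```python
def predict_v2(request: dict) -> dict:
    """Toy v2-protocol predictor: doubles every element of the input tensor.

    Field names follow the Open Inference Protocol v2 REST schema; the
    "model" is a placeholder, not a KServe serving runtime.
    """
    inp = request["inputs"][0]
    batch, width = inp["shape"]  # leading dimension is the batch size
    doubled = [x * 2 for x in inp["data"]]  # stand-in for real inference
    return {
        "model_name": request.get("model_name", "toy"),
        "outputs": [{
            "name": "output0",
            "shape": [batch, width],
            "datatype": inp["datatype"],
            "data": doubled,
        }],
    }

req = {
    "model_name": "toy",
    "inputs": [{"name": "input0", "shape": [2, 2],
                "datatype": "FP32", "data": [1.0, 2.0, 3.0, 4.0]}],
}
resp = predict_v2(req)
print(resp["outputs"][0]["data"])  # -> [2.0, 4.0, 6.0, 8.0]
```

Because the batch rides in the tensor shape, a server can fuse several single-row requests into one call without changing the wire schema.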
@@ -58,5 +69,5 @@ Reference: [Auth related issues](https://github.com/kserve/kserve/issues?q=is%3A
 - Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
 - Unify the data plane v1 and v2 page formats
 - Improve v2 data plane docs to tell the story why and what changed
-- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for example documentation
+- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for documentation
 - Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive
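The roadmap's LLM Gateway item lists "token based rate limiting", which for LLM traffic usually means budgeting the LLM tokens a client may consume rather than counting requests. A minimal sketch of that idea (the class and its parameters are illustrative, not KServe's gateway implementation):

```python
import time

class TokenBudgetLimiter:
    """Illustrative token-based rate limiter for an LLM gateway.

    Each client holds a budget of LLM tokens that refills at a fixed
    rate; a request is admitted only if its token cost fits the
    remaining budget. A sketch of the concept, not KServe's gateway.
    """

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)       # start with a full budget
        self.last = time.monotonic()

    def allow(self, cost: int) -> bool:
        # Refill the budget for the time elapsed since the last check.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False

limiter = TokenBudgetLimiter(capacity=1000, refill_per_sec=50)
print(limiter.allow(400))  # True: the budget starts full
print(limiter.allow(700))  # False: only ~600 tokens remain
```

Charging by token cost rather than request count is what makes one short prompt and one long completion count differently against the same quota.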
