Merged
@@ -51,7 +51,7 @@ spec:
description: |-
`model` in the LLM request, it could be a base model name, lora adapter name or even
a virtual model name. This field is used to match scenarios other than model adapter name and
-this field could be empty, but it and `ModelAdapters` can't both be empty.
+this field could be empty, but it and `ModelAdapters` can't both be empty.
type: string
x-kubernetes-validations:
- message: modelName is immutable
@@ -170,7 +170,7 @@ _Appears in:_

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
-| `modelName` _string_ | `model` in the LLM request, it could be a base model name, lora adapter name or even<br />a virtual model name. This field is used to match scenarios other than model adapter name and<br />this field could be empty, but it and `ModelAdapters` can't both be empty. | | |
+| `modelName` _string_ | `model` in the LLM request, it could be a base model name, lora adapter name or even<br />a virtual model name. This field is used to match scenarios other than model adapter name and<br />this field could be empty, but it and `ModelAdapters` can't both be empty. | | |
| `loraAdapters` _string array_ | `model` in the LLM request could be lora adapter name,<br />here is a list of Lora Adapter Names to match. | | MaxItems: 10 <br /> |
| `parentRefs` _ParentReference array_ | ParentRefs references the Gateways that this ModelRoute should be attached to.<br />If empty, the ModelRoute will be attached to all Gateways in the same namespace. | | |
| `rules` _[Rule](#rule) array_ | An ordered list of route rules for LLM traffic. The first rule<br />matching an incoming request will be used.<br />If no rule is matched, an HTTP 404 status code MUST be returned. | | MaxItems: 16 <br /> |
2 changes: 1 addition & 1 deletion pkg/apis/networking/v1alpha1/modelroute_types.go
@@ -26,7 +26,7 @@ import (
type ModelRouteSpec struct {
// `model` in the LLM request, it could be a base model name, lora adapter name or even
// a virtual model name. This field is used to match scenarios other than model adapter name and
-// this field could be empty, but it and `ModelAdapters` can't both be empty.
+// this field could be empty, but it and `ModelAdapters` can't both be empty.
//
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="modelName is immutable"
ModelName string `json:"modelName,omitempty"`
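The rule described in the comment above (the model name may be empty, but not at the same time as the adapter list) can be sketched as a plain Go check. This is only an illustration, not code from the PR: `routeSpec` and `validateRouteSpec` are hypothetical names, and it assumes that the `ModelAdapters` mentioned in the comment refers to the `loraAdapters` field listed in the docs table, whose `MaxItems: 10` limit is also mirrored here.

```go
package main

import (
	"errors"
	"fmt"
)

// routeSpec is a hypothetical, trimmed-down stand-in for ModelRouteSpec.
// Only the two fields discussed in the comment are modeled.
type routeSpec struct {
	ModelName    string
	LoraAdapters []string
}

// validateRouteSpec sketches the documented constraints:
//   - modelName and loraAdapters must not both be empty
//   - loraAdapters holds at most 10 entries (MaxItems: 10 in the docs table)
func validateRouteSpec(s routeSpec) error {
	if s.ModelName == "" && len(s.LoraAdapters) == 0 {
		return errors.New("modelName and loraAdapters cannot both be empty")
	}
	if len(s.LoraAdapters) > 10 {
		return errors.New("loraAdapters accepts at most 10 items")
	}
	return nil
}

func main() {
	specs := []routeSpec{
		{},                                      // neither field set
		{ModelName: "base-model"},               // model name only
		{LoraAdapters: []string{"adapter-one"}}, // adapters only
	}
	for _, s := range specs {
		fmt.Printf("%+v -> %v\n", s, validateRouteSpec(s))
	}
}
```

In the actual CRD such constraints would more likely be expressed as kubebuilder markers or CEL rules, like the `self == oldSelf` immutability rule shown in this hunk, rather than as a hand-written function.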
2 changes: 1 addition & 1 deletion pkg/kthena-router/backend/metrics/metrics.go
@@ -34,7 +34,7 @@ func HTTPClient() *http.Client {
return httpClient
}

-// This function refer to aibrix(https://github.com/vllm-project/aibrix/blob/main/pkg/metrics/utils.go)
+// This function refers to aibrix(https://github.com/vllm-project/aibrix/blob/main/pkg/metrics/utils.go)
func ParseMetricsURL(url string) (map[string]*dto.MetricFamily, error) {
resp, err := httpClient.Get(url)
if err != nil {
2 changes: 1 addition & 1 deletion pkg/kthena-router/scheduler/plugins/least_request.go
@@ -75,7 +75,7 @@ func (l *LeastRequest) Score(ctx *framework.Context, pods []*datastore.PodInfo)
baseScores := make(map[*datastore.PodInfo]float64)
maxScore := 0.0
for _, info := range pods {
-// The weight of waiting requests is 100. It's a magic number just to sinificantly lower the score of the pod when there are waiting reqs.
+// The weight of waiting requests is 100. It's a magic number just to significantly lower the score of the pod when there are waiting reqs.
base := info.GetRequestRunningNum() + 100*info.GetRequestWaitingNum()
baseScores[info] = base
if base > maxScore {
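To make the effect of the weight of 100 concrete, here is a minimal standalone sketch of the weighting shown in this hunk. The hunk only shows the base value and the running maximum, so the normalization step below (lowest load maps to the highest score on a 0 to 100 scale) is an assumption for illustration, and `podLoad`, `baseLoad`, and `normalizedScores` are hypothetical names rather than part of the kthena-router code.

```go
package main

import "fmt"

// podLoad is a hypothetical stand-in for datastore.PodInfo, holding only
// the two counters used by the expression in the hunk.
type podLoad struct {
	Name    string
	Running float64
	Waiting float64
}

// waitingWeight mirrors the magic number 100 from the source comment.
const waitingWeight = 100.0

// baseLoad mirrors `running + 100*waiting` from the hunk.
func baseLoad(p podLoad) float64 {
	return p.Running + waitingWeight*p.Waiting
}

// normalizedScores is an ASSUMED normalization, not taken from the PR:
// the pod with the highest base load scores 0, and an idle pod scores 100.
func normalizedScores(pods []podLoad) map[string]float64 {
	base := make(map[string]float64, len(pods))
	maxBase := 0.0
	for _, p := range pods {
		b := baseLoad(p)
		base[p.Name] = b
		if b > maxBase {
			maxBase = b
		}
	}
	scores := make(map[string]float64, len(pods))
	for name, b := range base {
		if maxBase == 0 {
			scores[name] = 100 // every pod is idle
			continue
		}
		scores[name] = (1 - b/maxBase) * 100
	}
	return scores
}

func main() {
	pods := []podLoad{
		{Name: "busy-but-no-queue", Running: 5, Waiting: 0},
		{Name: "one-request-queued", Running: 0, Waiting: 1},
	}
	for name, score := range normalizedScores(pods) {
		fmt.Printf("%s: %.2f\n", name, score)
	}
}
```

With these inputs a single waiting request outweighs five running ones, which is the behavior the source comment describes.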