Configuring concurrent requests for Knative Serving autoscaling

You can specify how many concurrent requests each instance of an application (revision container) should handle by adding the `autoscaling.knative.dev/target` annotation or the `containerConcurrency` field to the revision template.

The following example sets the `target` annotation in a revision template:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # Annotation values must be strings, so the number is quoted
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
      - image: myimage

The following example sets the `containerConcurrency` field in a revision template:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    spec:
      containerConcurrency: 100
      containers:
      - image: myimage

If you set both `target` and `containerConcurrency`, the autoscaler aims for the `target` number of concurrent requests per instance, while `containerConcurrency` imposes a hard limit on how many requests an instance handles at once.

For example, with a `target` of 50 and a `containerConcurrency` of 100, the autoscaler targets 50 concurrent requests per instance, but no instance handles more than 100 requests at the same time.
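A manifest combining both settings might look like the following sketch. It reuses the placeholder names `myapp` and `myimage` from the earlier examples and assumes the `serving.knative.dev/v1` API.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # Soft goal: the autoscaler aims for about 50 concurrent requests per instance
        autoscaling.knative.dev/target: "50"
    spec:
      # Hard limit: an instance never handles more than 100 requests at once
      containerConcurrency: 100
      containers:
      - image: myimage
```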

If `containerConcurrency` is lower than `target`, the effective target is reduced to the `containerConcurrency` value, because there is no point in targeting more concurrent requests than an instance can actually handle.
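As an illustration of this rule, in the following sketch `containerConcurrency` (10) is lower than the requested `target` (50), so the effective target would be reduced to 10. The values and the placeholder names `myapp` and `myimage` are chosen only for illustration.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # Requested target of 50, higher than the hard limit below
        autoscaling.knative.dev/target: "50"
    spec:
      # Hard limit of 10, so the effective target is reduced to 10
      containerConcurrency: 10
      containers:
      - image: myimage
```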

Note	Use `containerConcurrency` only when the application needs an enforced limit on how many requests reach it at a given time.

Configuring concurrent requests using the target annotation

The default target is 100 concurrent requests. To override it, add or modify the `autoscaling.knative.dev/target` annotation in the revision template.

For example, the following annotation in the revision template sets the target to 50:

autoscaling.knative.dev/target: "50"

Configuring concurrent requests using the containerConcurrency field

The `containerConcurrency` field sets a hard limit on the number of concurrent requests that each instance of the revision container handles. It accepts the following values:

containerConcurrency: 0 | 1 | 2-N

0: allows unlimited concurrent requests.
1: guarantees that only one request is handled at a time by a given instance of the revision container.
2 or more: limits request concurrency to that value.

Note	If there is no `target` annotation, autoscaling is configured as if `target` is equal to the value of `containerConcurrency`.
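For example, an application that must process one request at a time could use a revision template like the following sketch. It sets `containerConcurrency: 1` and omits the `target` annotation, so, per the note above, autoscaling would treat the target as 1. The names `myapp` and `myimage` are placeholders.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    spec:
      # Each instance handles only one request at a time;
      # with no target annotation, the target also defaults to 1
      containerConcurrency: 1
      containers:
      - image: myimage
```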
