Commit 91919bd

Merge pull request #46798 from fasaxc/patch-1
Add more suggestions for avoiding deadlocks to webhook docs
2 parents d79daa2 + 0c40ece commit 91919bd

1 file changed

+46
-11
lines changed

content/en/docs/reference/access-authn-authz/extensible-admission-controllers.md

Lines changed: 46 additions & 11 deletions
@@ -1247,17 +1247,52 @@ object.
12471247

12481248
### Avoiding deadlocks in self-hosted webhooks
12491249

1250-
A webhook running inside the cluster might cause deadlocks for its own deployment if it is configured
1251-
to intercept resources required to start its own pods.
1252-
1253-
For example, a mutating admission webhook is configured to admit `CREATE` pod requests only if a certain label is set in the
1254-
pod (e.g. `"env": "prod"`). The webhook server runs in a deployment which doesn't set the `"env"` label.
1255-
When a node that runs the webhook server pods
1256-
becomes unhealthy, the webhook deployment will try to reschedule the pods to another node. However the requests will
1257-
get rejected by the existing webhook server since the `"env"` label is unset, and the migration cannot happen.
1258-
1259-
It is recommended to exclude the namespace where your webhook is running with a
1260-
[namespaceSelector](#matching-requests-namespaceselector).
1250+
There are several ways that webhooks can cause deadlocks, where the cluster cannot make progress in
1251+
scheduling pods:
1252+
1253+
* A webhook running inside the cluster might cause deadlocks for its own deployment if it is configured
1254+
to intercept resources required to start its own pods.
1255+
1256+
For example, a mutating admission webhook is configured to admit **create** Pod requests only if a certain label is set in the
1257+
pod (such as `env: "prod"`). However, the webhook server runs as a Deployment that doesn't set the `env` label.
1258+
When a node that runs the webhook server pods
1259+
becomes unhealthy, the webhook deployment will try to reschedule the pods to another node. However, the requests will
1260+
get rejected by the existing webhook server since the `env` label is unset, and the replacement Pod
1261+
cannot be created. Eventually, the entire Deployment for the webhook server may become unhealthy.
1262+
1263+
If you use admission webhooks to check Pods, consider excluding the namespace where your webhook
1264+
listener is running, by specifying a
1265+
[namespaceSelector](#matching-requests-namespaceselector).
1266+
1267+
* If the cluster has multiple webhooks configured (possibly from independent applications deployed on
1268+
the cluster), they can form a cycle. Webhook A must be called to process startup of webhook B's
1269+
pods and vice versa. If both webhook A and webhook B ever become unavailable at the same time (for
1270+
example, due to a cluster-wide outage or a node failure where both pods run on the same node),
1271+
deadlock occurs because neither webhook pod can be recreated without the other already running.
1272+
1273+
One way to prevent this is to exclude webhook A's pods from being acted on by webhook B. This
1274+
allows webhook A's pods to start, which in turn allows webhook B's pods to start. If you had a
1275+
third webhook, webhook C, you'd need to exclude both webhook A and webhook B's pods from
1276+
webhook C. This ensures that webhook A can _always_ start, which then allows webhook B's pods
1277+
to start, which in turn allows webhook C's pods to start.
1278+
1279+
If you want to ensure protection that avoids these risks, [ValidatingAdmissionPolicies](/docs/reference/access-authn-authz/validating-admission-policy/)
1280+
can
1281+
provide many protection capabilities without introducing dependency cycles.
1282+
1283+
* Admission webhooks can intercept resources used by critical cluster add-ons, such as CoreDNS,
1284+
network plugins, or storage plugins. These add-ons may be required to schedule or successfully run the
1285+
pods for a particular admission webhook on the cluster. This can cause a deadlock if both the
1286+
webhook and the critical add-on are unavailable at the same time.
1287+
1288+
You may wish to exclude cluster infrastructure namespaces from webhooks, or make sure that
1289+
the webhook does not depend on the particular add-on that it acts on. For example, running
1290+
a webhook as a host-networked pod ensures that it does not depend on a networking plugin.
1291+
1292+
If you want to ensure protection for a core add-on or its namespace,
1293+
[ValidatingAdmissionPolicies](/docs/reference/access-authn-authz/validating-admission-policy/)
1294+
can
1295+
provide many protection capabilities without any dependency on worker nodes and Pods.
12611296

12621297
### Side effects
12631298
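As an illustration of the namespace exclusion recommended in the added text, a webhook configuration could look roughly like the sketch below. This is not part of the commit. The names `pod-policy.example.com`, `pod-policy-webhook`, and `webhook-system` are hypothetical placeholders, and the sketch assumes the namespaces carry the `kubernetes.io/metadata.name` label that the control plane sets automatically on recent Kubernetes versions.

```yaml
# Sketch only: all names and namespaces here are hypothetical placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
  - name: pod-policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        # Assumed namespace and Service of the webhook server itself.
        namespace: webhook-system
        name: pod-policy-webhook
        path: /validate
      # caBundle omitted in this sketch; set it if the serving
      # certificate is not signed by a CA the API server already trusts.
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
        scope: "Namespaced"
    namespaceSelector:
      matchExpressions:
        # Skip the webhook's own namespace (avoids the self-deadlock case)
        # and kube-system (avoids intercepting critical add-ons).
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["webhook-system", "kube-system"]
```

The same `namespaceSelector` field exists on `MutatingWebhookConfiguration`. For the multi-webhook cycle described in the diff, webhook B's selector would additionally list the namespace that holds webhook A's pods, assuming the two run in separate namespaces.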
