
Admission control threat model. #2

Merged
1 commit merged into main on May 24, 2022

Conversation

jvanz
Member

@jvanz commented Apr 18, 2022

Adds an RFC to discuss the admission control threat model and its mitigations.

Fixes kubewarden/kubewarden-controller#129

Contributor

@raulcabello left a comment

LGTM, thanks @jvanz! I just left some minor comments.

timeouts as the admission controller uses compute power to process the workloads

**Mitigation**
The webhook fails closed and authenticates callers. We are safe; Kubewarden already does that.
Contributor

Just to mention that someone could change failurePolicy to Ignore. But the default is Fail, so I agree that we are safe here.
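For illustration, here is roughly where that knob lives in the webhook object the controller generates — a hedged sketch; the names, paths, and rules are placeholders, not the exact objects Kubewarden creates:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy            # placeholder; generated by the controller
webhooks:
  - name: example.kubewarden.io   # placeholder webhook name
    # Fail (the default) rejects requests when the webhook is unreachable.
    # Anyone with RBAC access to this object could flip it to Ignore and
    # make the policy fail open.
    failurePolicy: Fail
    clientConfig:
      service:
        name: policy-server       # assumed service name
        namespace: kubewarden
        path: /validate/example-policy
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE", "UPDATE"]
    sideEffects: None
    admissionReviewVersions: ["v1"]
```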

Member

Same comment as in Threat 1.

Member

Also in this case, I think the only reasonable way to mitigate this issue is to allow each policy to run for at most X seconds. Once the timeout is reached, the evaluation response will be a rejection accompanied by a certain error code and message.
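Kubernetes webhooks already offer a coarse, per-webhook version of this knob; the per-policy evaluation timeout discussed here would be a finer-grained control inside the policy-server. A sketch of the existing field:

```yaml
# Existing coarse-grained control: the API server gives up on the webhook
# call after timeoutSeconds (1-30, default 10). Combined with
# failurePolicy: Fail, a timed-out evaluation results in a rejection.
webhooks:
  - name: example.kubewarden.io   # placeholder
    failurePolicy: Fail
    timeoutSeconds: 10
```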

Member Author

Yes, users with permissions can change that. This is also mapped in threat #4. But I still believe we should keep failing closed; this is a good practice to stop bad actors from deploying malicious apps during a DoS. Even if we add some timeout, given enough load, that may not be sufficient. Furthermore, as far as I can remember, users can change this behavior with Helm values.


**Mitigation**
All rules are reviewed and tested.

Contributor

pod-privileged-policy can also be used here

Member

As @raulcabello mentions, if an explicit pod-privileged-policy is installed (validating only), then this cannot happen. Regardless of what the mutating policy did, the validating policy has to approve the request.

The situation is harder if, for example, a user deploys a policy that does mutation and validation in the same policy (e.g. to save time by reducing the requests made by the API server). If a policy does many things, then it's up to the policy author to make sure this kind of situation does not happen.
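A sketch of that safety net, assuming the ClusterAdmissionPolicy schema of the time; the module reference and version are illustrative:

```yaml
apiVersion: policies.kubewarden.io/v1alpha2
kind: ClusterAdmissionPolicy
metadata:
  name: pod-privileged
spec:
  # Illustrative module reference; pin a real, vetted version in practice.
  module: registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.9
  # Validating only: even if a mutating policy (by design or mistake) marks
  # a container as privileged, this policy still has to approve the final
  # object, so the request gets rejected.
  mutating: false
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
```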

Member

Still, it sounds like a gotcha that deserves a paragraph in the policy authors docs.

Member

If our "PSP best practices" selection deploys the pod-privileged-policy, this is not going to be an issue. Even if the user provides a mutating policy that by design/mistake makes a container privileged.

Member Author

Yes, the "PSP best practices" includes the pod-privileged-policy.

Member

@ereslibre left a comment

Thank you @jvanz!


Webhook fails closed. In other words, if the webhook does not respond in time,
for any reason, the API server should reject the request. We are safe on this:
Kubewarden's default behavior already does that.
Member

Given the policy-server service is reachable from within the cluster, it would be possible to attempt a denial of service by accessing the service from within the cluster and issuing a large number of requests.

If the policy is configured to fail open (failurePolicy: Ignore), then policies will have no effect if the endpoint is not reachable. If the policy is configured to fail closed (failurePolicy: Fail), then it can cause a denial of service if the webhooks are not reachable.

Network policies would mitigate this threat.
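A minimal sketch of such a NetworkPolicy, assuming the policy-server pods carry an `app: kubewarden-policy-server` label, listen on port 8443, and that API-server traffic arrives from the control-plane node CIDR — all assumptions to adjust to the actual deployment:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-policy-server-ingress
  namespace: kubewarden
spec:
  podSelector:
    matchLabels:
      app: kubewarden-policy-server   # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Webhook calls come from the API server, typically via the
        # control-plane nodes' host network; placeholder CIDR.
        - ipBlock:
            cidr: 10.0.0.0/24
      ports:
        - protocol: TCP
          port: 8443   # assumed policy-server port
```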

Member

I think the only reasonable way to mitigate this issue is to allow each policy to run for at most X seconds. Once the timeout is reached, the evaluation response will be a rejection accompanied by a certain error code and message.

Member Author

I think we should keep failing closed. This is a good way to stop bad actors from deploying malicious apps during a DoS.


**Mitigation**
N/A

I cannot see how Kubewarden can help here.
Member

It is also not very clear what the original description means by "via a number of mechanisms."

**To do**
Kubewarden should implement mutual TLS authentication.
We can add to the recommended policies in the `kubewarden-defaults` Helm
chart a policy to drop the `NET_RAW` capability.
Member

NET_RAW containers will always exist though. CNI plugins are a typical example.

Member

I suppose we then drop the NET_RAW capability everywhere except in a reject-list of namespaces (control plane).

Member

We can prevent the creation of containers that have this capability via this policy. Is that one of the policies we enable via our "PSP best practices" selection?

Member Author
@jvanz Apr 22, 2022

Yes, that's what I had in mind: add the capabilities-psp-policy to the recommended policies, skipping the control-plane namespaces already listed in the chart.
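A sketch of what that recommended policy could look like; the module version, the settings key, and the namespace list are assumptions to check against the real capabilities-psp-policy and chart:

```yaml
apiVersion: policies.kubewarden.io/v1alpha2
kind: ClusterAdmissionPolicy
metadata:
  name: drop-net-raw
spec:
  module: registry://ghcr.io/kubewarden/policies/capabilities-psp:v0.1.9  # illustrative
  mutating: false
  settings:
    required_drop_capabilities:   # assumed settings key
      - NET_RAW
  # Skip control-plane namespaces, where NET_RAW workloads such as CNI
  # plugins legitimately exist.
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: ["kube-system", "kubewarden"]
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
```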



**To do**
I think most of the RBAC is not our responsibility. But we can help our users if we:
- Warn them in our docs and *suggest* some minimum RBAC to be used.
- Provide a policy which detects RBAC changes and **maybe** blocks them. Is this possible?
Member

Also, I think it is a good practice to deploy ClusterAdmissionPolicy policies with a reject list by default. This ensures that these policies are enforced everywhere except for a well-known list of namespaces where they are not enforced.

This makes Threat 11 harder to exploit.
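A sketch of such a default reject list, relying on the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace (the namespace names are placeholders):

```yaml
spec:
  # Enforce the policy everywhere except a well-known reject list.
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
          - kube-system   # control plane
          - kubewarden    # Kubewarden itself
```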

Member

Agree here.
I wonder if we could provide that reject list in a ConfigMap deployed via kubewarden-defaults, that all ClusterAdmissionPolicies can use. We could even configure the default policy-server to apply that reject list to all ClusterAdmissionPolicies that users create, leaving the control plane and Kubewarden undisturbed.

Comment on lines +193 to +191
version not covered by the policy. We should warn policy developers to cover
all the supported API versions in their tests and reject all others.
Member

Yeah, this is tricky. Imagine a policy already built some time ago. Now the API server adds support for ephemeral containers, but this policy only checks init and regular containers: there's no way we could have warned the policy author when they wrote the policy, because the Kubernetes API didn't have ephemeral containers yet.

I think the best we can do is document in our blog when this kind of API addition requires reviewing existing policies on common APIs.

Member

Maybe we can add metadata to policies, so authors explicitly state which range of API versions they cover. We as authors can review and bump as needed, but yep, it's a cat-and-mouse game.

I suppose we either cover all of the K8s API for all supported versions with policies, or we create a policy that rejects all K8s objects besides a very specific API allowlist.

Member Author

I think it's nice to allow our users to choose whether they want to reject requests from unexpected API versions or just accept them. As @viccuad said, this can be done using policies. Which is nice, because we do not need to change the core of the project and still give our users an option to protect themselves from this.

I need to check, but if the context-aware policies have read access to the Kubewarden objects, they can read the mapped API versions dynamically and reject requests for API versions that are not listed there.

Comment on lines 207 to 209
- We may need to define all the API versions covered by the recommended policies
- We can create a configuration to reject by default requests where the API version is not covered by the policy.
- We should warn policy developers to cover all the supported API versions in their tests and reject all others.
Member

These don't apply here. Leftover?

Member

We provide policies that can help with this and that can be deployed in the default policy-server namespace, where all policy server deployments are created.

Member

I agree with what rafa wrote

Member Author

Yes, copy-and-paste issue. 🤦‍♂️

Member Author

Furthermore, our policies cannot help with this. The threat refers to a container deployed on the cluster node, and Kubewarden does not have access to this information. Imagine a bad actor with access to the node running the controller: we cannot block them from running a privileged container on the node.


Member

@viccuad left a comment

LGTM, thanks :). Left some comments on the ideas, which I suppose would go into default policies, yet for me the RFC is ok.



Comment on lines 69 to 71
We may provide some mechanism to notify operators when a webhook is changed.
Maybe the controller can watch its webhooks, fix the configuration when it is
changed, and notify the operator.
Member

We already have quite a list of things to do; I would not give the impression we're going to do something unless there's a clear responsibility on our side or something concrete we can do.

In this case, I think there's nothing we can do. The mitigation proposed by the paper makes sense. Our users can also leverage kwctl run to keep their configuration tested and look out for these edge cases.

Member

This comment from Flavio makes me want to have all entries in this document with "test configuration" be "test configuration with kwctl", just to be explicit.

Member Author

Actually, the To do section does not seem to make sense in the context of this threat. The controller cannot know which is the "right" configuration, so it cannot notify about changes in the cluster that it isn't managing. This seems similar to the mitigation of other threats, so it may be a mix-up from editing the document.

Comment on lines +82 to +81
I think most of the RBAC is not Kubewarden's responsibility. But we can help our users if we:
- Warn them in our docs and *suggest* some minimum RBAC to be used.
- Provide a policy which detects RBAC changes and **maybe** blocks them.
Member

I think there are two things to take into account for this scenario.

The attacker changes one of our CRs

An attacker could delete a ClusterAdmissionPolicy or an AdmissionPolicy object. There's nothing we can do in that case. We might want to highlight how important it is to grant the RBAC privileges to change these objects only to trusted users.

The attacker changes a WebhookConfiguration object

Other than stating how important it is to control via RBAC who can "touch" these Kubernetes objects... there's one question I have: is our controller able to identify drift between what is described inside a ClusterAdmissionPolicy/AdmissionPolicy and its related webhook configuration object? If our controller is able to notice that, we would be able to restore the policy.
This is of course not enough, but it would be helpful.
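For the RBAC side, a minimal sketch of the kind of least-privilege role the docs could suggest (the name and grouping are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubewarden-policy-viewer   # illustrative name
rules:
  # Most users only need read access to the policy objects; reserve
  # create/update/delete for a small, trusted group of operators.
  - apiGroups: ["policies.kubewarden.io"]
    resources: ["clusteradmissionpolicies", "admissionpolicies"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["validatingwebhookconfigurations", "mutatingwebhookconfigurations"]
    verbs: ["get", "list", "watch"]
```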

Member Author
@jvanz Apr 22, 2022

I think the controller can compare the YAML configuration of the current webhook with the YAML it generates from the policy definition. If they differ, it can reapply it. Can't it?

Member Author

@jvanz left a comment

@kubewarden/kubewarden-developers, JFYI: most of the mitigations came from the official k8s document.



@jvanz force-pushed the rfc-threat-model branch 2 times, most recently from e1f9db2 to aa979ec on April 22, 2022 18:42
@jvanz requested review from ereslibre and flavio on April 22, 2022 18:45
Member Author

@jvanz commented May 17, 2022

@kubewarden/kubewarden-developers, I've reviewed the conversations in this PR and came up with some actionable items for the whole team. We can discuss the items, change them, or even drop some of them. They are:

  1. Document the importance of properly configured RBAC privileges. docs#115

  2. Document the importance of properly configured RBAC privileges. Only the right users should be allowed to manipulate CRD objects. docs#116

  3. Add documentation in the policy author section warning about the security threat of the same policy performing mutation and validation. docs#117

  4. Add the capabilities-psp-policy to the recommended policies list in the kubewarden-defaults chart. helm-charts#90

  5. Add the hostpaths-psp-policy to the recommended policies list in the kubewarden-defaults chart. helm-charts#91

  6. Add a timeout to policy evaluation. policy-server#254

  7. Research whether the controller is able to detect and fix drift between the webhook object and its related ClusterAdmissionPolicy/AdmissionPolicy configuration. kubewarden-controller#224

  8. Implement mutual TLS authentication. kubewarden-controller#225

  9. Global reject list for ClusterAdmissionPolicy, to allow operators to define the namespaces that policies should ignore; or document a best practice to add this reject list in the policy definition. kubewarden-controller#226

  10. Research a solution to allow operators to reject requests that use disallowed Kubernetes API versions. kubewarden-controller#227

  11. Provide a Software Bill Of Materials. kubewarden-controller#228

  12. Sign Helm charts. helm-charts#92

I believe we should do items 1 to 5 now; they are easy to do. All the others should be reviewed with more care.

Please let me know if I forgot something.

Member

@viccuad commented May 17, 2022

That list of action items sounds fine to me. We could create cards already for each of them and discuss them in the planning.

The PR is also complete from my point of view. Nice job!

Contributor

@raulcabello left a comment

It looks great! Thanks @jvanz.

Member

@ereslibre left a comment

Thanks for this @jvanz.

Although I have some doubts on how to approach some specific issues, I think the split looks great and actionable.

Adds an RFC to discuss the admission control threat model and its
mitigations.
@jvanz force-pushed the rfc-threat-model branch 2 times, most recently from 7c772a5 to d87391c on May 24, 2022 12:18
@jvanz merged commit 465203e into main on May 24, 2022
Development

Successfully merging this pull request may close these issues.

Re-check security posture against the k8s admission control threat model