Admission control threat model. #2
Conversation
LGTM, thanks @jvanz! I just left some minor comments.
> timeouts as the admission controller uses compute power to process the workloads
>
> **Mitigation**
> Webhook fail closed and authenticate callers. We are safe, Kubewarden already does that.
Just to mention that someone could change `failurePolicy` to `ignore`. But by default it is `fail`, so I agree that we are safe here.
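For reference, `failurePolicy` lives on the webhook registration object itself, so anyone with RBAC rights to edit that object can flip it. A minimal illustrative excerpt (names, paths, and the service reference are made up, not Kubewarden's actual values):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy               # illustrative name
webhooks:
  - name: example.kubewarden.io      # illustrative
    # "Fail" rejects requests when the webhook is unreachable (fail closed);
    # flipping this to "Ignore" makes the webhook fail open.
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: policy-server          # assumed service name
        namespace: kubewarden        # assumed namespace
        path: /validate/example
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE", "UPDATE"]
```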
Same comment as in Threat 1.
Also in this case, I think the only reasonable way to mitigate this issue is to allow each policy to run for at most X seconds. Once the timeout is reached, the evaluation response will be a rejection accompanied by a certain error code and message.
Yes, users with permissions can change that. This is also mapped in threat #4. But I still believe we should keep failing closed. This is good practice to prevent bad actors from deploying malicious apps during a DoS. Even if we add some timeout, given enough load, that may not be sufficient. Furthermore, as far as I can remember, users can change this behavior with Helm values.
> **Mitigation**
> All rules are reviewed and tested.
`pod-privileged-policy` can also be used here.
As @raulcabello mentions, if an explicit `pod-privileged-policy` is installed (validating only), then this cannot happen. Regardless of what the mutating policy did, the validating policy has to approve the request.
The situation is harder if, for example, a user deploys a policy that does mutation and validation in the same policy (e.g. to save time by reducing requests made by the API server). If a policy does many things, then it's up to the policy author to make sure that this kind of situation does not happen.
Still, it sounds like a gotcha that deserves a paragraph in the policy authors docs.
If our "PSP best practices" selection deploys the pod-privileged-policy, this is not going to be an issue. Even if the user provides a mutating policy that by design/mistake makes a container privileged.
Yes, the "PSP best practices" includes the pod-privileged-policy
.
Thank you @jvanz!
> Webhook fails closed. In other words, if the webhook does not respond in time,
> for any reason, API server should reject the request. We are safe on this.
> Kubewarden default behavior already does that.
Given the policy-server service is reachable from within the cluster, it would be possible to attempt a denial of service by flooding the service with requests from within the cluster.
If the policy is configured to fail open (`failurePolicy: ignore`), then policies will have no effect if the endpoint is not reachable. If the policy is configured to fail closed (`failurePolicy: fail`), then it can cause a denial of service if the webhooks are not reachable.
Network policies would mitigate this threat.
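A sketch of that network-policy mitigation: a default-deny ingress policy scoped to the policy-server pods cuts off in-cluster callers. The labels and namespace below are assumptions, and whether API-server-to-webhook traffic is affected depends on the CNI and cluster topology, so treat this as an illustration rather than a drop-in manifest:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: policy-server-deny-ingress
  namespace: kubewarden                  # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: kubewarden-policy-server      # assumed pod label
  policyTypes:
    - Ingress
  ingress: []                            # no rules: deny all in-cluster ingress
```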
I think the only reasonable way to mitigate this issue is to allow each policy to run for at most X seconds. Once the timeout is reached, the evaluation response will be a rejection accompanied by a certain error code and message.
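Worth noting that, independently of a per-policy evaluation timeout inside the policy-server, Kubernetes already bounds the whole webhook call with `timeoutSeconds` on the webhook registration (allowed range 1–30 seconds, default 10). Combined with failing closed, a timed-out call already becomes a rejection. Illustrative fragment (webhook name assumed):

```yaml
webhooks:
  - name: example.kubewarden.io   # illustrative
    timeoutSeconds: 10            # API server gives up after 10s (range: 1-30)
    failurePolicy: Fail           # fail closed: a timed-out call is rejected
```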
I think we should keep failing closed. This is a good way to prevent bad actors from deploying malicious apps during a DoS.
> **Mitigation**
> N/A
>
> I cannot see how Kubewarden can help here.
The original description is also not very clear: "via a number of mechanisms".
> **To do**
> Kubewarden should implement mutual TLS authentication
> We can add in the recommended policies from the `kubewarden-defaults` Helm
> chart a policy to drop the `NET_RAW` capability.
`NET_RAW` containers will always exist, though; CNI plugins are a typical example.
I suppose we then drop the `NET_RAW` capability everywhere except in a rejectlist of namespaces (control plane).
We can prevent the creation of containers that have this capability via this policy. Is that one of the policies we enable via our "PSP best practices" selection?
Yes, that's what I had in mind: add the `capabilities-psp-policy` to the recommended policies, skipping the control-plane namespaces already listed in the chart.
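A sketch of what that could look like as a `ClusterAdmissionPolicy`; the module reference, settings key, and namespace rejectlist below are illustrative assumptions, not the final chart values:

```yaml
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: drop-net-raw
spec:
  # assumed module reference; the chart should pin a real, reviewed version
  module: registry://ghcr.io/kubewarden/policies/capabilities-psp:latest
  mutating: true
  settings:
    required_drop_capabilities: ["NET_RAW"]   # assumed settings key
  # skip the control-plane namespaces already listed in the chart
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: ["kube-system", "kubewarden"] # illustrative rejectlist
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
```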
> **To do**
> I think most of the RBAC is not our responsibility. But we can help our users if we:
> - Warn them in our docs and *suggest* some minimum RBAC to be used.
> - Provide a policy which detects RBAC changes and **maybe** blocks them. Is this possible?
Also, I think it is good practice to deploy `ClusterAdmissionPolicy` policies with a reject list by default. This ensures that these policies are enforced everywhere except for a well-known list of namespaces where they are not enforced.
This makes Threat 11 harder to exploit.
Agree here.
I wonder if we could provide that rejectlist in a ConfigMap deployed via `kubewarden-defaults`, that all ClusterAdmissionPolicies can use. We could even configure the default policy-server to apply that rejectlist to all ClusterAdmissionPolicies that users create, leaving the control plane and Kubewarden undisturbed.
> version not covered by the policy. We should warn policy developers to cover
> all the supported API versions in their tests and reject all others.
Yeah, this is tricky. Imagine a policy built some time ago. Now the API server adds support for ephemeral containers, but this policy only checks init and regular containers: there's no way we could have warned the policy author when they wrote the policy, because the Kubernetes API didn't have ephemeral containers back then.
I think the best we can do is document in our blog when this kind of API addition requires reviewing existing policies against common APIs.
Maybe we can add metadata to policies, so authors explicitly state which range of API versions they cover. We as authors can review and bump as needed but, yep, it's a cat-and-mouse game.
I suppose we either cover all of the K8s API for all supported versions with policies, or we create a policy that rejects all k8s objects besides a very specific API allowlist.
I think it's nice to allow our users to choose whether they want to reject requests from unexpected API versions or just accept them. As @viccuad said, this can be done using policies. Which is nice, because we do not need to change the core of the project and still give our users an option to protect themselves from this.
I need to check, but if context-aware policies have read access to the Kubewarden objects, they can read the mapped API versions dynamically and reject requests for API versions that are not listed there.
rfc/0006-threat-model.md (outdated)
> - We may need to define all the API versions covered by the recommended policies
> - We can create a configuration to reject by default requests where the API version is not covered by the policy.
> - We should warn policy developers to cover all the supported API versions in their tests and reject all others.
These don't apply here. Leftover?
We provide policies that can help with this, and they can be deployed in the default `policy-server` namespace, where all policy-server deployments are created.
I agree with what rafa wrote
Yes, copy-and-paste issue. 🤦♂️
Furthermore, our policies cannot help with this. The threat refers to a container deployed on the cluster node, and Kubewarden does not have access to this information. Imagine a bad actor with access to the node running the controller: we cannot block them from running a privileged container on the node.
LGTM, thanks :). I left some comments on the ideas, which I suppose would go into default policies, but for me the RFC is OK.
rfc/0006-threat-model.md (outdated)
> We may provide some mechanism to notify operators when a webhook is changed.
> Maybe the controller can watch its webhooks, fix when the configuration is
> changed and notify the operator.
We already have quite a list of things to do; I would not give the impression we're going to do something unless there's a clear responsibility on our side or something concrete we can do.
In this case, I think there's nothing we can do. The mitigation proposed by the paper makes sense. Our users can also leverage `kwctl run` to keep their configuration tested and look out for these edge cases.
This comment from Flavio makes me want to have all entries in this document that say "test configuration" read "test configuration with kwctl", just to be explicit.
Actually, the `To do` section does not seem to make sense in the context of this threat. The controller cannot know which is the "right" configuration, thus it cannot notify about changes in the cluster that it's not taking care of. This seems similar to the mitigation of other threats, so it may be a mix-up introduced while editing the document.
> I think most of the RBAC is not Kubewarden responsibility. But we can help our users if we:
> - Warn them in our docs and *suggest* some minimum RBAC to be used.
> - Provide a policy which detects RBAC changes and **maybe** blocks them.
I think there are two things to take into account for this scenario.

**The attacker changes one of our CRs**

An attacker could delete a `ClusterAdmissionPolicy` or an `AdmissionPolicy` object. There's nothing we can do in that case. We might want to highlight how important it is to grant the RBAC privileges to change these objects only to trusted users.

**The attacker changes a WebhookConfiguration object**

Other than stating how important it is to control via RBAC who can "touch" these Kubernetes objects... there's one question I have: is our controller able to identify the drift between what is described inside of a `ClusterAdmissionPolicy`/`AdmissionPolicy` and its related webhook configuration object? If our controller is able to notice that, we would be able to restore the policy.
This is of course not enough, but it would be helpful.
I think the controller can compare the YAML configuration of the current webhook with the YAML it generates from the policy definition. If they differ, it can reapply it. Can't it?
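A minimal sketch of that comparison, using simplified local types instead of the real `admissionregistration/v1` ones (the type and field names here are illustrative, not the controller's actual code):

```go
// Sketch of webhook drift detection: compare the configuration the
// controller would generate from the policy definition against the
// live object, and re-apply on mismatch.
package main

import (
	"fmt"
	"reflect"
)

// Webhook captures only the fields the controller would render from a
// ClusterAdmissionPolicy; names are illustrative.
type Webhook struct {
	FailurePolicy string
	Rules         []string
}

// driftDetected reports whether the live webhook no longer matches the
// configuration generated from the policy definition.
func driftDetected(desired, live Webhook) bool {
	return !reflect.DeepEqual(desired, live)
}

func main() {
	desired := Webhook{FailurePolicy: "Fail", Rules: []string{"CREATE", "UPDATE"}}
	live := desired
	live.FailurePolicy = "Ignore" // e.g. an attacker flipped the webhook to fail open

	if driftDetected(desired, live) {
		fmt.Println("drift detected: re-applying desired webhook configuration")
	}
}
```

In a real reconciler the comparison would have to ignore server-populated fields (defaulted values, `caBundle` rotation, managed fields), which is why a naive deep-equal is only a starting point.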
@kubewarden/kubewarden-developers, JFYI: most of the mitigations came from the official k8s document.
Force-pushed from e1f9db2 to aa979ec
@kubewarden/kubewarden-developers, I've reviewed the conversations in this PR and came up with some actionable items for the whole team. We can discuss the items, change them, or even drop some of them. They are:
I believe we should do items 1 to 5 now; they are easy to do. All the others should be reviewed with more care. Please let me know if I forgot something.
That list of action items sounds fine to me. We could already create cards for each of them and discuss them in the planning. The PR is also complete from my point of view. Nice job!
It looks great! Thanks @jvanz
Thanks for this @jvanz.
Although I have some doubts on how to approach some specific issues, I think the split looks great and actionable.
Adds an RFC to discuss the admission control threat model and its mitigations.
Force-pushed from 7c772a5 to d87391c
Adds an RFC to discuss the admission control threat model and its mitigations.
Fixes kubewarden/kubewarden-controller#129