ArangoDB Operator fail with unprivileged PodSecurityPolicy #808

Closed
ognjen-it opened this issue Oct 5, 2021 · 18 comments

@ognjen-it

Hello all,

I have a problem when I try to install the ArangoDB operator on Kubernetes with an unprivileged policy.
The error looks like:
Warning FailedCreate 5m35s (x18 over 16m) replicaset-controller Error creating: pods "arango-ts-operator-7c8cf4cf7d-" is forbidden: PodSecurityPolicy: unable to admit pod: []

The policy is:

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  labels:
    kubernetes.io/cluster-service: "true"
    eks.amazonaws.com/component: pod-security-policy
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
    # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'runtime/default'
    # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - 'KILL'
    - 'MKNOD'
    - 'SETUID'
    - 'SETGID'
  # Allow core volume types.
  volumes:
    - configMap
    - downwardAPI
    - emptyDir
    - persistentVolumeClaim
    - projected
    - secret
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false

CRD:
helm -n arangodb-operator install arangodb-operator-crd https://github.com/arangodb/kube-arangodb/releases/download/1.2.3/kube-arangodb-crd-1.2.3.tgz
Operator:
helm -n arangodb-operator install ts https://github.com/arangodb/kube-arangodb/releases/download/1.2.3/kube-arangodb-1.2.3.tgz

Does anyone know how to resolve that? Or is it possible to resolve that?
This problem may be related to #677.

@ajanikow
Collaborator

ajanikow commented Oct 6, 2021

Hello!

ArangoMember Pods can be configured to work in a fully protected environment, but I see that something is missing at the Operator level.

Can you say what is missing in the Deployment template? The operator does not use anything on the filesystem, so the change can be propagated easily.

Best Regards,
Adam.

@ognjen-it
Author

Hello Adam,

thank you for your message!
I hope you're doing well! :)

I haven't found a way to resolve the problem, but I can share all the information I have.

The kubectl describe output for the arangodb-operator Deployment is:

Name:               arango-ts-operator
Namespace:          arangodb-operator
CreationTimestamp:  Wed, 06 Oct 2021 18:14:01 +0200
Labels:             app.kubernetes.io/instance=ts
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-arangodb
                    helm.sh/chart=kube-arangodb-1.2.3
                    release=ts
Annotations:        deployment.kubernetes.io/revision: 1
                    meta.helm.sh/release-name: ts
                    meta.helm.sh/release-namespace: arangodb-operator
Selector:           app.kubernetes.io/instance=ts,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=kube-arangodb,release=ts
Replicas:           2 desired | 0 updated | 0 total | 0 available | 2 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:           app.kubernetes.io/instance=ts
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-arangodb
                    helm.sh/chart=kube-arangodb-1.2.3
                    release=ts
  Service Account:  arango-ts-operator
  Containers:
   operator:
    Image:      arangodb/kube-arangodb:1.2.3
    Port:       8528/TCP
    Host Port:  0/TCP
    Args:
      --scope=legacy
      --operator.deployment
      --operator.deployment-replication
      --chaos.allowed=false
    Limits:
      cpu:     1
      memory:  256Mi
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   http-get https://:8528/health delay=5s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:8528/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MY_POD_NAMESPACE:                (v1:metadata.namespace)
      MY_POD_NAME:                     (v1:metadata.name)
      MY_POD_IP:                       (v1:status.podIP)
      RELATED_IMAGE_UBI:              alpine:3.11
      RELATED_IMAGE_METRICSEXPORTER:  arangodb/arangodb-exporter:0.1.7
      RELATED_IMAGE_DATABASE:         arangodb/arangodb:latest
    Mounts:                           <none>
  Volumes:                            <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Progressing      True    NewReplicaSetCreated
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
OldReplicaSets:    <none>
NewReplicaSet:     arango-ts-operator-7c8cf4cf7d (0/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  34s   deployment-controller  Scaled up replica set arango-ts-operator-7c8cf4cf7d to 2

Also, the kubectl describe output for the ReplicaSet is:

Name:           arango-ts-operator-7c8cf4cf7d
Namespace:      arangodb-operator
Selector:       app.kubernetes.io/instance=ts,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=kube-arangodb,pod-template-hash=7c8cf4cf7d,release=ts
Labels:         app.kubernetes.io/instance=ts
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=kube-arangodb
                helm.sh/chart=kube-arangodb-1.2.3
                pod-template-hash=7c8cf4cf7d
                release=ts
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
                meta.helm.sh/release-name: ts
                meta.helm.sh/release-namespace: arangodb-operator
Controlled By:  Deployment/arango-ts-operator
Replicas:       0 current / 2 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=ts
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-arangodb
                    helm.sh/chart=kube-arangodb-1.2.3
                    pod-template-hash=7c8cf4cf7d
                    release=ts
  Service Account:  arango-ts-operator
  Containers:
   operator:
    Image:      arangodb/kube-arangodb:1.2.3
    Port:       8528/TCP
    Host Port:  0/TCP
    Args:
      --scope=legacy
      --operator.deployment
      --operator.deployment-replication
      --chaos.allowed=false
    Limits:
      cpu:     1
      memory:  256Mi
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   http-get https://:8528/health delay=5s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:8528/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MY_POD_NAMESPACE:                (v1:metadata.namespace)
      MY_POD_NAME:                     (v1:metadata.name)
      MY_POD_IP:                       (v1:status.podIP)
      RELATED_IMAGE_UBI:              alpine:3.11
      RELATED_IMAGE_METRICSEXPORTER:  arangodb/arangodb-exporter:0.1.7
      RELATED_IMAGE_DATABASE:         arangodb/arangodb:latest
    Mounts:                           <none>
  Volumes:                            <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  60s (x15 over 2m22s)  replicaset-controller  Error creating: pods "arango-ts-operator-7c8cf4cf7d-" is forbidden: PodSecurityPolicy: unable to admit pod: []

@informalict
Contributor

Hi @ognjen-it

Can you show me your ClusterRole and RoleBinding which you created for the PodSecurityPolicy purposes?

@ognjen-it
Author

Hi @informalict

Thank you for your response!
First of all, I delete privileged PodSecurityPolicy (PSP) and then create unprivileged PSP. After that, I create Role and RoleBinding witch is apply the unprivileged policy to all authenticated users.

Role and RoleBing:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ognjen-allow-role
  namespace: ognjen-restrict
  labels:
    kubernetes.io/cluster-service: "true"
    eks.amazonaws.com/component: pod-security-policy
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - restricted
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ognjen-allow-rolebinding
  namespace: ognjen-restrict
  labels:
    kubernetes.io/cluster-service: "true"
    eks.amazonaws.com/component: pod-security-policy
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ognjen-allow-role

@informalict
Contributor

informalict commented Oct 13, 2021

@ognjen-it

I was able to reproduce the same issue with your settings.

I can see that the namespace for the Role and RoleBinding is ognjen-restrict, while the Deployment was launched in the namespace arangodb-operator. They must be in the same namespace.

When I changed the Role and RoleBinding namespace to arangodb-operator, it worked.
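
For reference, a sketch of the corrected manifests, identical to the excerpt above except for metadata.namespace:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ognjen-allow-role
  namespace: arangodb-operator  # was ognjen-restrict; must match the operator Deployment's namespace
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - restricted
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ognjen-allow-rolebinding
  namespace: arangodb-operator  # was ognjen-restrict
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ognjen-allow-role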

@ognjen-it
Author

ognjen-it commented Oct 13, 2021

@informalict

Great, the problem is resolved. It was my mistake.

However, now I'm getting a new error when I try to deploy the cluster. The error is related to PSP.

Error: container has runAsNonRoot and image will run as root (pod: "example-arangodb-cluster-id-f8307a_arangodb-operator(2cc0d47e-1f4c-4abe-b487-ed82492489c7)", container: server)

Also, here are some commands I ran to investigate what happened.

 ~/Desktop/test-restrict: k -n arangodb-operator  get pods                                                                                                                    
NAME                                  READY   STATUS                       RESTARTS   AGE
arango-ts-operator-7c8cf4cf7d-2qsrb   1/1     Running                      0          10m
arango-ts-operator-7c8cf4cf7d-thxzw   1/1     Running                      0          10m
example-arangodb-cluster-id-f8307a    0/1     CreateContainerConfigError   0          8m17s

 ~/Desktop/test-restrict: k -n arangodb-operator logs example-arangodb-cluster-id-f8307a                                                                                      
Error from server (BadRequest): container "server" in pod "example-arangodb-cluster-id-f8307a" is waiting to start: CreateContainerConfigError

 ~/Desktop/test-restrict: k -n arangodb-operator get events --sort-by='.metadata.creationTimestamp' |grep pod/example-arangodb-cluster-id-f8307a  
19m         Normal    Scheduled               pod/example-arangodb-cluster-id-f8307a                         Successfully assigned arangodb-operator/example-arangodb-cluster-id-f8307a to ip-10-0-4-21.eu-central-1.compute.internal
19m         Normal    Pulling                 pod/example-arangodb-cluster-id-f8307a                         Pulling image "arangodb/arangodb:3.7.15"
19m         Normal    Pulled                  pod/example-arangodb-cluster-id-f8307a                         Successfully pulled image "arangodb/arangodb:3.7.15" in 10.554007891s
16m         Warning   Failed                  pod/example-arangodb-cluster-id-f8307a                         Error: container has runAsNonRoot and image will run as root (pod: "example-arangodb-cluster-id-f8307a_arangodb-operator(2cc0d47e-1f4c-4abe-b487-ed82492489c7)", container: server)
4m8s        Normal    Pulled                  pod/example-arangodb-cluster-id-f8307a                         Container image "arangodb/arangodb:3.7.15" already present on machine

My deployment manifest is:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
  namespace: arangodb-operator
spec:
  mode: Cluster
  environment: Production
  image: "arangodb/arangodb:3.7.15"
  imagePullPolicy: IfNotPresent
  metrics:
    mode: sidecar
    enabled: true
    image: 'arangodb/arangodb-exporter:0.1.8'
    tls: false
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9101'
    prometheus.io/scrape_interval: '5s'
  agents:
    count: 3
    args:
      - --log.level=info
    resources:
      requests:
        storage: 4Gi
    storageClassName: gp2
  dbservers:
    count: 3
    args:
      - --log.level=info
    resources:
      requests:
        storage: 8Gi
    storageClassName: gp2
  coordinators:
    count: 3
    args:
      - --log.level=info

Could you please try to reproduce the issue on your cluster and check whether the error is the same?

@informalict
Contributor

@ognjen-it please try the following excerpt:

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
  namespace: arangodb-operator
spec:
  mode: Cluster
  environment: Production
  id:
    securityContext:
      runAsUser: 1000
  image: "arangodb/arangodb:3.7.15"
  imagePullPolicy: IfNotPresent
  metrics:
    mode: sidecar
    enabled: true
    image: 'arangodb/arangodb-exporter:0.1.8'
    tls: false
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9101'
    prometheus.io/scrape_interval: '5s'
  agents:
    count: 3
    args:
      - --log.level=info
    resources:
      requests:
        storage: 4Gi
    securityContext:
      runAsUser: 1000
    storageClassName: gp2
  dbservers:
    count: 3
    args:
      - --log.level=info
    resources:
      requests:
        storage: 8Gi
    securityContext:
      runAsUser: 1000
    storageClassName: gp2
  coordinators:
    count: 3
    args:
      - --log.level=info
    securityContext:
      runAsUser: 1000

I have added:

    securityContext:
      runAsUser: 1000

to run all containers as non-root. You can change user 1000 to something else if necessary.

@ognjen-it
Author

Thank you @informalict, I checked it and it works in my local environment.
Let's keep the issue open until I check it in a real environment. In my local environment SELinux is not enforced (in the PSP), so that could be an additional problem.

@vbasem

vbasem commented Oct 15, 2021

Sadly, fsGroup is the only parameter that is not considered in the securityContext, and in my environment I end up with permission denied on the PVC, which has a restriction on the root fsGroup.

@informalict
Contributor

Hi @vbasem
can you try the following security context:

securityContext:
  runAsUser: 1000
  fsGroup: 2000
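
For placement, this sits under each server group of the ArangoDeployment, in the same spots where runAsUser was set in the earlier excerpt. A sketch for the dbservers group only (the group ID 2000 is illustrative, and note vbasem's report below that fsGroup is not applied to the ID container):

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
  namespace: arangodb-operator
spec:
  mode: Cluster
  dbservers:
    count: 3
    securityContext:
      runAsUser: 1000
      fsGroup: 2000  # illustrative; must fall inside the PSP's MustRunAs range (1-65535)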

@ognjen-it
Author

@informalict First of all, BIG thank you for your support!

Regarding the deployment of the cluster it's OK, because the securityContext can be set, but what about the operator? How can I change the fsGroup for it?

I didn't find a way to change it in values.yaml:
https://github.com/arangodb/kube-arangodb/blob/master/chart/kube-arangodb/values.yaml

 ~/Desktop/test-restrict: helm -n arangodb-operator install ts https://github.com/arangodb/kube-arangodb/releases/download/1.2.3/kube-arangodb-1.2.3.tgz
NAME: ts
LAST DEPLOYED: Fri Oct 15 17:28:41 2021
NAMESPACE: arangodb-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You have installed Kubernetes ArangoDB Operator in version 1.2.3

To access ArangoDeployments you can use:

kubectl --namespace "arangodb-operator" get arangodeployments

More details can be found on https://github.com/arangodb/kube-arangodb/tree/1.2.3/docs

 ~/Desktop/test-restrict: kubectl -n arangodb-operator get pods 
NAME                                  READY   STATUS    RESTARTS   AGE
arango-ts-operator-7c8cf4cf7d-87k9d   1/1     Running   0          20s
arango-ts-operator-7c8cf4cf7d-mkmns   1/1     Running   0          20s

 ~/Desktop/test-restrict: kubectl -n arangodb-operator get pod arango-ts-operator-7c8cf4cf7d-87k9d -o yaml |grep -i fsgroup
    fsGroup: 1

@vbasem

vbasem commented Oct 15, 2021

While fsGroup from the securityContext is not applied to the ID container, we found out that the problem lies with the Docker image of the ID container itself.

The Alpine image runs echo chmod -R 775 /var/lib/arangodb3 /var/lib/arangodb3-apps, which does nothing but print the command instead of actually making the folders root-group writable.

With that fixed, setting the user's group to 0 solves the issue. Now I have to create an issue in that repo for it!
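
In other words, the entrypoint line only echoes the command; the presumed intent of the image (a sketch of the fix, not the actual upstream patch) is to execute it:

# Broken: prints the command instead of running it
echo chmod -R 775 /var/lib/arangodb3 /var/lib/arangodb3-apps

# Presumed intent: actually make the data directories group-writable
chmod -R 775 /var/lib/arangodb3 /var/lib/arangodb3-apps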

@ognjen-it
Author

@vbasem could you please send us a link to the issue you created?

@vbasem

vbasem commented Nov 8, 2021

Sure, here is the issue.
Sadly, no one from the ArangoDB team has replied to it!

@ajanikow
Collaborator

ajanikow commented Nov 8, 2021

@ognjen-it The security context for the Operator can be skipped; it is a fully limited one anyway. The Operator does not use the filesystem and runs as the user with ID 1000 (that's why it is not exposed). The Operator is able to run in a fully secured environment by default.
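
If you want to confirm that, a plain kubectl check against one of the operator pods listed earlier should do (the pod name comes from the kubectl get pods output above):

# Inspect the effective security context of a running operator pod
kubectl -n arangodb-operator get pod arango-ts-operator-7c8cf4cf7d-87k9d -o yaml | grep -i -A 3 securityContext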

@informalict
Contributor

Hi @ognjen-it

Can this issue be closed?

@ognjen-it
Author

Hi @informalict

The issue could be closed, but it's not resolved. I can't keep researching because I don't have the resources (time) until the end of the year.
I will reopen the issue when my security team gives me feedback on why they don't allow me to deploy it in a production environment.

Big thank you for your support!

@informalict
Contributor

@ognjen-it, so I am closing it. If you encounter the issue again, please reopen it.
