Cannot set tolerations/nodeSelector for temporary arangodb-cluster-id pod #110

Closed
ddelange opened this issue Oct 10, 2022 · 9 comments

ddelange commented Oct 10, 2022

Hi! 👋

I'm trying to deploy an ArangoDeployment to a mixed amd64/arm64 cluster, where the arm64 nodes have a NoSchedule taint.

When I boot a fresh cluster with spec.architecture: [arm64], a temporary arangodb-cluster-id-xxx pod is created which has a hard NodeAffinity for arm64 but no way to add tolerations/nodeSelector. This means the pod won't be scheduled on any node, and the cluster boot sequence hangs.

When I change to spec.architecture: [amd64, arm64], the temporary pod is allowed to schedule on an amd64 node in the cluster, and the rest of the pods get the toleration for arm64 and so schedule on the arm64 nodes (we also have a PreferNoSchedule taint on the amd64 nodes). This is an acceptable workaround for now, but I'd rather specify spec.architecture: [arm64] and get a successful boot.
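
For reference, a trimmed sketch of the kind of spec I mean (the per-group toleration fields are how I understand the ArangoDeployment CRD, and the group names/mode here are a sketch rather than a verbatim copy of our setup; the arch=arm64 taint key is our own):

apiVersion: database.arangodb.com/v1
kind: ArangoDeployment
metadata:
  name: arangodb-cluster
spec:
  mode: Cluster
  architecture:
    - arm64
  # every server group we can configure gets a toleration for our opt-in arm64 taint;
  # the temporary id pod is the only one that cannot be configured this way
  dbservers:          # prmr
    tolerations:
      - key: arch
        operator: Equal
        value: arm64
        effect: NoSchedule
  agents:
    tolerations:
      - key: arch
        operator: Equal
        value: arm64
        effect: NoSchedule
  coordinators:
    tolerations:
      - key: arch
        operator: Equal
        value: arm64
        effect: NoSchedule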

I deleted the ArangoDeployment from above (but with arm64 in the spec) and am now trying to re-create it.

The operator creates the following id Pod, with arm64 nodeAffinity but without arm64 tolerations. We have tolerations in the ArangoDeployment for controller, agent, and prmr, because we have a mixed amd64/arm64 cluster and all arm64 nodes are tainted so that they are opt-in. Now, because of the hard nodeAffinity but missing tolerations on the id Pod, our ArangoDeployment won't boot at all 😅

arangodb-cluster-id-162f0e

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-09-23T07:48:45Z"
  labels:
    app: arangodb
    arango_deployment: arangodb-cluster
    deployment.arangodb.com/member: 162f0e
    role: id
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:arango_deployment: {}
          f:deployment.arangodb.com/member: {}
          f:role: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"6054f633-c606-4f75-b7ee-a28430e1b952"}: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution: {}
          f:podAntiAffinity:
            .: {}
            f:preferredDuringSchedulingIgnoredDuringExecution: {}
        f:containers:
          k:{"name":"server"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":8529,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:capabilities:
                .: {}
                f:drop: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/data"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:hostname: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:subdomain: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"arangod-data"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: arangodb_operator
    operation: Update
    time: "2022-09-23T07:48:45Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"PodScheduled"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-scheduler
    operation: Update
    subresource: status
    time: "2022-09-23T07:48:45Z"
  name: arangodb-cluster-id-162f0e
  namespace: aa-data-api
  ownerReferences:
  - apiVersion: database.arangodb.com/v1
    controller: true
    kind: ArangoDeployment
    name: arangodb-cluster
    uid: 6054f633-c606-4f75-b7ee-a28430e1b952
  resourceVersion: "109818868"
  uid: cf969e3d-6bc1-4395-a774-070b6b629c1e
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - arm64
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: arangodb
              arango_deployment: arangodb-cluster
              role: id
          topologyKey: kubernetes.io/hostname
        weight: 1
  containers:
  - command:
    - /usr/sbin/arangod
    - --server.authentication=false
    - --server.endpoint=tcp://[::]:8529
    - --database.directory=/data
    - --log.output=+
    image: arangodb/arangodb-preview:3.10.0-beta.1
    imagePullPolicy: IfNotPresent
    name: server
    ports:
    - containerPort: 8529
      name: server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: arangod-data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mzlkl
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: arangodb-cluster-id-162f0e
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: arangodb-cluster-int
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.alpha.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  volumes:
  - emptyDir: {}
    name: arangod-data
  - name: kube-api-access-mzlkl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-23T07:48:45Z"
    message: '0/22 nodes are available: 17 node(s) had taint {arch: arm64}, that the
      pod didn''t tolerate, 2 node(s) didn''t match Pod''s node affinity/selector,
      3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn''t
      tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

Originally posted by @ddelange in #53 (comment)

ddelange (Author) commented:

Correction: the cluster boots successfully with my dual-architecture setup, but I think that's also a bug:

With that setup, the prmr pods end up with only one nodeAffinity term, and so there will still be no pods going onto the arm64 nodes:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
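
For comparison, a nodeAffinity that would let those pods land on either architecture could look like this (just a sketch of plain Kubernetes nodeAffinity, not something the operator currently generates):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
            - arm64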

ddelange commented Oct 10, 2022

When I switch the order in spec.architecture, we're back at the original issue, where the temporary pod doesn't schedule. It seems that only the first entry in spec.architecture is respected:

0/13 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 8 node(s) had taint {arch: arm64}, that the pod didn't tolerate.
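
In other words, these two otherwise identical specs behave differently (sketch):

# boots, but only the first entry seems to be respected, so everything stays on amd64
spec:
  architecture:
    - amd64
    - arm64

# id pod gets arm64 affinity but no arm64 toleration, so it never schedules
spec:
  architecture:
    - arm64
    - amd64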

ddelange (Author) commented:

FWIW, in the original implementation there was an error "Only one architecture type is supported currently", but that seems to have disappeared before/during/after merging that PR.

ddelange (Author) commented:

So I now hacked our toleration

  - effect: NoSchedule
    key: arch
    operator: Equal
    value: arm64

into the spec of the temporary pod, and the cluster has now successfully booted on arm64.
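
i.e. the tolerations list on the id pod ends up roughly like this (the first three entries are the defaults from the dump above; the last one is the one added by hand):

tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 5
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 5
- effect: NoExecute
  key: node.alpha.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 5
# added by hand so the id pod tolerates our opt-in arm64 taint
- effect: NoSchedule
  key: arch
  operator: Equal
  value: arm64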

ddelange (Author) commented:

when i switch the order in spec.architecture, we're back at the original issue, where the temporary pod doesn't schedule. it seems that only the first entry in spec.architecture is respected:

0/13 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 8 node(s) had taint {arch: arm64}, that the pod didn't tolerate.

opened arangodb/kube-arangodb#1140

ddelange (Author) commented:

cc @jwierzbo from #53 (comment)

I guess this issue should live in kube-arangodb actually 😅 should I close and re-open there?

dothebart (Contributor) commented:

Yes, I think as long as our Docker container doesn't do anything wrong on this topic, this issue doesn't belong here and should be discussed in the http://github.com/arangodb/kube-arangodb repo.
The multiarch support of the container should be working properly as of ArangoDB 3.10, right?

ddelange (Author) commented:

Yes, all green so far! I'll re-open there.

ddelange (Author) commented:

closing in favor of arangodb/kube-arangodb#1141

ddelange closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 11, 2022