-
@toast-gear I see the pods are again in a terminating state today, even though autoscaling has been removed. I see this in the actions-runner-controller manager logs:

```
2022-10-26T08:19:17Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-new-bqg48-jqgnp"}
```
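A quick way to enumerate which runner pods are actually stuck (i.e. already carry a `deletionTimestamp`) is something like the sketch below; the `actions-runner` namespace is the one used elsewhere in this thread:

```bash
# Show runner pods together with their deletion timestamp;
# pods that are not terminating print "<none>" and are filtered out
kubectl get pods -n actions-runner \
  -o custom-columns='NAME:.metadata.name,DELETION:.metadata.deletionTimestamp' \
  | grep -v '<none>'
```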
-
I can't read your log because it has no indentation and no formatting. Please post it in a code block using ``` (triple backticks), like this:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  replicas: 1
  template:
    spec:
      repository: mumoshu/actions-runner-controller-ci
```
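For anyone unsure what that looks like in the comment editor: a fence is just three backticks on their own line before and after the paste, optionally followed by a language hint such as `yaml` to get syntax highlighting.

````markdown
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
# ...rest of the manifest...
```
````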
-
We're locking this discussion because it has not had recent activity and/or other members have asked for more information to assist you but received no response. Thank you for helping us maintain a productive and tidy community for all our members.
-
Hi Team,
I am using ARC with two RunnerDeployments, one for Java apps and another for MuleSoft apps. Whenever autoscaling happens, only the MuleSoft runners go into a terminating state. I have read the discussions on pods getting stuck in a terminating state, but I have not been able to resolve my issue. Hope you can guide me here.
I am using the latest ARC version, 1.26.
My RunnerDeployment:
```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: mule-runner
  namespace: actions-runner
spec:
  replicas: 6
  template:
    spec:
      organization: my-org
      image: custom-image-here
      labels:
        - mule-runner
```
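Since the trouble starts when the scheduled autoscaling kicks in, it would also help to share the HorizontalRunnerAutoscaler. For reference, a scheduled HRA targeting this deployment typically looks roughly like the sketch below; the name, times, recurrence, and replica counts are illustrative placeholders, not the actual configuration from this cluster:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: mule-runner-autoscaler   # placeholder name
  namespace: actions-runner
spec:
  scaleTargetRef:
    name: mule-runner            # the RunnerDeployment above
  minReplicas: 1
  maxReplicas: 6
  # Example scheduled override that scales the pool down overnight
  scheduledOverrides:
  - startTime: "2022-10-01T20:00:00+00:00"   # placeholder window
    endTime: "2022-10-02T06:00:00+00:00"
    recurrenceRule:
      frequency: Daily
    minReplicas: 0
```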
My `kubectl get pod -o yaml <runner-pod> -n <namespace>` output for one of the stuck pods (parts of the manifest were lost when pasting; the gaps are flagged with comments):

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    actions-runner-controller/token-expires-at: "2022-10-19T17:23:42Z"
    actions-runner/id: "2007"
    actions-runner/runner-completion-wait-start-timestamp: "2022-10-20T16:13:27Z"
    actions-runner/unregistration-failure-message: 'Bad request - Runner "mule-runner-h6chf-9s5d9"
      is still running a job'
    actions-runner/unregistration-start-timestamp: "2022-10-20T16:13:27Z"
    kubernetes.io/psp: eks.privileged
    sync-time: "2022-10-19T16:23:42Z"
  creationTimestamp: "2022-10-19T16:23:42Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-10-20T16:12:59Z"
  finalizers:
  # (finalizer entries were lost when pasting)
  labels:
    actions-runner: ""
    actions-runner-controller/inject-registration-token: "true"
    pod-template-hash: 6c5fd57df9
    runner-deployment-name: mule-runner
    runner-template-hash: b554d6c8
  name: mule-runner-h6chf-9s5d9
  namespace: actions-runner
  ownerReferences:
  - blockOwnerDeletion: true
    controller: true
    kind: Runner
    name: mule-runner-h6chf-9s5d9
    uid: 3286b78e-d6fd-417f-bc33-6b2caa9af704
  resourceVersion: "18492717"
  uid: b2bab42c-b3e6-4d02-9a91-cdb809e25bd3
spec:
  containers:
  - env:
    # (env var names were lost when pasting; values only)
    - value: my-org
    - value: mule-runner
    - value: "true"
    - value: "false"
    - value: https://github.com/
    - value: /runner/_work
    - value: "true"
    - value: "false"
    - value: tcp://localhost:2376
    - value: "1"
    - value: /certs/client
    - value: mule-runner-h6chf-9s5d9
    - value: A3IXUG6DBAMH35N6FNTWT73DKAZJ5AVPNFXHG5DBNRWGC5DJN5XF62LEZYA4HTDDWFUW443UMFWGYYLUNFXW4X3UPFYGLN2JNZ2GKZ3SMF2GS33OJFXHG5DBNRWGC5DJN5XA
    image: my-cutom-image
    imagePullPolicy: Always
    name: runner
    resources: {}
    securityContext:
      privileged: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    # (mountPath entries were lost when pasting)
    - name: runner
    - name: work
    - name: certs-client
      readOnly: true
    - name: kube-api-access-m6v46
      readOnly: true
  - env:
    # (env var name was lost when pasting)
    - value: /certs
    image: docker:dind
    imagePullPolicy: IfNotPresent
    name: docker
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - name: runner
    - name: certs-client
    - name: work
    - name: kube-api-access-m6v46
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-xxxx-us-east-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: runner
  - name: work
  - name: certs-client
  - projected:
      defaultMode: 420
      sources:
      # (several keys in this projected volume were lost when pasting)
      - expirationSeconds: 3607
        path: token
      - items:
          path: ca.crt
        name: kube-root-ca.crt
      - items:
          apiVersion: v1
          fieldPath: metadata.namespace
          path: namespace
status:
  conditions:
  - lastTransitionTime: "2022-10-19T16:35:52Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2022-10-20T16:07:53Z"
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-10-19T16:35:59Z"
    status: "True"
    type: ContainersReady
  - lastTransitionTime: "2022-10-19T16:35:52Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: docker:dind
    imageID: docker-pullable://docker@sha256:999fc127a51b8a86593ff9ba2518f14cbd18555849f8927fd56fa82395effe16
    lastState: {}
    name: docker
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-10-19T16:35:56Z"
  - image: 854311749298.dkr.ecr.us-east-2.amazonaws.com/jb-autoscale-runner-mule-image:dev-0.217.0
    imageID: docker-pullable://854311749298.dkr.ecr.us-east-2.amazonaws.com/jb-autoscale-runner-mule-image@sha256:12a31f0f7214f8788670b5975edea48f9159f637fe068d2b6c4ca3f8c36ca756
    lastState: {}
    name: runner
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-10-19T16:35:56Z"
  hostIP: xxxx
  phase: Running
  podIP: xxxx
  podIPs:
  qosClass: BestEffort
  startTime: "2022-10-19T16:35:52Z"
```
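To see what is actually blocking deletion on a pod like this, the usual things to check are the finalizers, the ARC annotations, the recent events, and the runner container log. A minimal set of commands, using the pod name and namespace from the manifest above:

```bash
# Finalizers and ARC annotations that keep the pod in Terminating
kubectl get pod mule-runner-h6chf-9s5d9 -n actions-runner \
  -o jsonpath='{.metadata.finalizers}{"\n"}{.metadata.annotations}{"\n"}'

# Recent events often say why termination is blocked
kubectl describe pod mule-runner-h6chf-9s5d9 -n actions-runner | tail -n 30

# The runner container log is still readable while the container is running,
# even if the pod is already marked Terminating
kubectl logs mule-runner-h6chf-9s5d9 -n actions-runner -c runner --tail=100
```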
I see the pods are stuck in a terminating state. Please suggest what is wrong with my deployment; I am not able to get any logs from the runners because they are terminating.
The actions-runner-controller log:
```
2022-10-25T06:00:43Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-g9x6v-2qq7s"}
2022-10-25T06:00:43Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-g9x6v-h2sqj"}
2022-10-25T06:00:43Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-h6chf-gn4pf"}
2022-10-25T06:00:43Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-h6chf-xbdxf"}
2022-10-25T06:00:43Z INFO actions-runner-controller.runnerpod Runner pod is annotated to wait for completion, and the runner container is not restarting {"runnerpod": "actions-runner/mule-runner-h6chf-klmvz"}
```
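A longer slice of the controller log can be pulled with kubectl as well. The namespace and deployment name below assume the default Helm install from the ARC docs (`actions-runner-system` / `actions-runner-controller`) and may differ in this cluster:

```bash
# Tail the controller's manager container; adjust namespace/name to your install
kubectl logs -n actions-runner-system deployment/actions-runner-controller \
  -c manager --tail=200
```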
AWS EKS - 1.22
Node group - Managed node group
Autoscaling - automatic scheduled autoscaling
@mumoshu @toast-gear Please assist me. This is causing a lot of jobs to sit in queued status.