
Unable to create the helper pod when specifying multiple TARGET_NODES for experiment. #453

Open
jayzziebone opened this issue Mar 9, 2023 · 1 comment
@jayzziebone

BUG REPORT

What happened:
While running the chaos experiment for node CPU hog, it sometimes fails to bring up some of the helper pods if I specify multiple TARGET_NODES in comma-separated format. In my case I have 4 nodes, and if I specify all 4, it brings up 2 helper pods and then fails to bring up the other 2. I see the error below inside the node-cpu-xxxx-xxx pod:
CPU hog failed, err: unable to create the helper pod, err: Post "https://10.96.0.1:443/api/v1/namespaces/default/pods\": read tcp 192.168.230.167:50174->10.96.0.1:443: read: connection reset by peer"

time="2023-03-09T15:43:36Z" level=info msg="Experiment Name: node-cpu-hog"
time="2023-03-09T15:43:36Z" level=info msg="[PreReq]: Getting the ENV for the node-cpu-hog experiment"
time="2023-03-09T15:43:38Z" level=info msg="[PreReq]: Updating the chaos result of node-cpu-hog experiment (SOT)"
time="2023-03-09T15:43:42Z" level=info msg="The application information is as follows" Node Label= Chaos Duration=60 Target Nodes="node-10-120-127-170,node-10-120-127-171,node-10-120-127-172,node-10-120-127-173" Node CPU Cores=1
time="2023-03-09T15:43:42Z" level=info msg="[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)"
time="2023-03-09T15:43:42Z" level=info msg="[Status]: No appLabels provided, skipping the application status checks"
time="2023-03-09T15:43:42Z" level=info msg="[Status]: Getting the status of target nodes"
time="2023-03-09T15:43:42Z" level=info msg="The Node status are as follows" Ready=true Node=node-10-120-127-170
time="2023-03-09T15:43:42Z" level=info msg="The Node status are as follows" Node=node-10-120-127-171 Ready=true
time="2023-03-09T15:43:42Z" level=info msg="The Node status are as follows" Node=node-10-120-127-172 Ready=true
time="2023-03-09T15:43:42Z" level=info msg="The Node status are as follows" Ready=true Node=node-10-120-127-173
time="2023-03-09T15:43:44Z" level=info msg="[Info]: The chaos tunables are:" Sequence=parallel Node CPU Cores=1 CPU Load=0 Node Affce Perc=0
time="2023-03-09T15:43:44Z" level=info msg="[Info]: Details of Nodes under chaos injection" No. Of Nodes=4 Node Names="[node-10-120-127-170 node-10-120-127-171 node-10-120-127-172 node-10-120-127-173]"
time="2023-03-09T15:43:44Z" level=info msg="[Info]: Details of Node under chaos injection" NodeName=node-10-120-127-170 NodeCPUcores=1
time="2023-03-09T15:43:44Z" level=info msg="[Info]: Details of Node under chaos injection" NodeName=node-10-120-127-171 NodeCPUcores=1
time="2023-03-09T15:43:44Z" level=info msg="[Info]: Details of Node under chaos injection" NodeName=node-10-120-127-172 NodeCPUcores=1
time="2023-03-09T15:43:45Z" level=error msg="[Error]: CPU hog failed, err: unable to create the helper pod, err: Post \"https://10.96.0.1:443/api/v1/namespaces/default/pods\": read tcp 192.168.230.167:50174->10.96.0.1:443: read: connection reset by peer"

This in turn causes the experiment to fail at the end:

kubectl describe chaosresults.litmuschaos.io nginx-chaos-node-cpu-hog
Name:         nginx-chaos-node-cpu-hog
Namespace:    default
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=2.14.0
              chaosUID=9c104680-26c3-49a6-801c-2ee3f9f96505
              controller-uid=f20544d9-b90a-4f08-9438-fbfbdf3c74e5
              job-name=node-cpu-hog-i0wu3z
              name=node-cpu-hog
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2023-03-08T16:17:17Z
  Generation:          4
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentStatus:
        f:history:
    Manager:         experiments
    Operation:       Update
    Time:            2023-03-08T16:17:17Z
  Resource Version:  5704800
  UID:               d53abe6b-e176-4769-9b72-4af35cd7d2ee
Spec:
  Engine:      nginx-chaos
  Experiment:  node-cpu-hog
Status:
  Experiment Status:
    Fail Step:                 [chaos]: Failed inside the chaoslib, err: unable to create the helper pod, err: Post "https://10.96.0.1:443/api/v1/namespaces/default/pods": read tcp 192.168.230.167:50174->10.96.0.1:443: read: connection reset by peer
    Phase:                     Completed
    Probe Success Percentage:  0
    Verdict:                   Fail
  History:
    Failed Runs:   1
    Passed Runs:   1
    Stopped Runs:  0
Events:
  Type     Reason   Age    From                       Message
  ----     ------   ----   ----                       -------
  Normal   Awaited  3m26s  node-cpu-hog-i0wu3z-h7q5j  experiment: node-cpu-hog, Result: Awaited
  Warning  Fail     3m19s  node-cpu-hog-i0wu3z-h7q5j  experiment: node-cpu-hog, Result: Fail

What you expected to happen:
I expect all the helper pods to come up and be Running, and the experiment to succeed.

How to reproduce it (as minimally and precisely as possible):

  1. Install the Litmus Operator

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml

  2. Install the experiment

kubectl apply -f https://github.com/litmuschaos/chaos-charts/raw/v2.14.x/experiments/generic/node-cpu-hog/experiment.yaml

  3. Install the RBAC YAML file

kubectl apply -f https://github.com/litmuschaos/chaos-charts/raw/v2.14.x/experiments/generic/node-cpu-hog/rbac.yaml

  4. Apply the node-cpu-hog-engine.yaml file below

kubectl apply -f node-cpu-hog-engine.yaml

Anything else we need to know?:

Environment:

kubectl get nodes
NAME                  STATUS   ROLES       AGE   VERSION
node-10-120-127-170   Ready    edge,node   8d    v1.22.17
node-10-120-127-171   Ready    edge,node   8d    v1.22.17
node-10-120-127-172   Ready    node        8d    v1.22.17
node-10-120-127-173   Ready    node        8d    v1.22.17

node-cpu-hog-engine YAML File:

cat node-cpu-hog-engine.yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=nginx
  auxiliaryAppInfo: ''
  chaosServiceAccount: node-cpu-hog-sa
  experiments:
    - name: node-cpu-hog
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '60'

            ## ENTER THE NUMBER OF CORES OF CPU FOR CPU HOGGING
            ## OPTIONAL VALUE IN CASE OF EMPTY VALUE IT WILL TAKE NODE CPU CAPACITY
            - name: NODE_CPU_CORE
              value: '1'

            ## LOAD CPU WITH GIVEN PERCENT LOADING FOR THE CPU STRESS WORKERS.
            ## 0 IS EFFECTIVELY A SLEEP (NO LOAD) AND 100 IS FULL LOADING
            - name: CPU_LOAD
              value: '0'

            ## percentage of total nodes to target
            - name: NODES_AFFECTED_PERC
              value: ''

            # provide the comma separated target node names
            - name: TARGET_NODES
              value: 'node-10-120-127-170,node-10-120-127-171,node-10-120-127-172,node-10-120-127-173'
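Since the experiment log reports Sequence=parallel, one mitigation worth trying (assuming the standard Litmus SEQUENCE tunable is honored by node-cpu-hog) is to inject chaos on the targets one at a time, so the helper pod creations are not fired at the API server in a burst:

```yaml
            # run targets one at a time instead of in parallel
            # (assumes the standard Litmus SEQUENCE tunable applies here)
            - name: SEQUENCE
              value: 'serial'
```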
@neelanjan00
Member

Can you please specify in which k8s environment you're facing this?

@neelanjan00 neelanjan00 self-assigned this Mar 14, 2024