DEFAULT_HEALTH_CHECK tunable does not disable all default verifications of the AUT's pods. #638

Open
cavasalcai opened this issue Feb 13, 2023 · 1 comment

@cavasalcai

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

Hi, I am using Litmus to inject chaos into my target environment. However, the default health checks prevent my experiments from finishing, or even from starting. My environment runs multiple replicas of each pod, so I would like to be able to inject chaos and delete a pod even when another replica is not yet in the Ready state.

I can see in the code that there is a tunable that disables the default health check for the Application Under Test (AUT), which is great, and I have already set it. However, there is another default check, which verifies that all containers are in the Running state. See here:

Because of this check, the experiment fails if even a single replica in my target pool of pods is not running at the time of chaos injection. See below:
image

What you expected to happen:

I expect that if I set DEFAULT_HEALTH_CHECK to false, I can inject chaos by deleting a pod regardless of whether one or more pods/containers are not in the Ready/Running state.

How to reproduce it (as minimally and precisely as possible):

  1. Prepare a chaos experiment using the generic/pod-delete template.
  2. Select a label that is shared by multiple pods of your AUT.
  3. Make sure each component has multiple replicas, e.g., an application with 3 components might have 15 pods, with 5 replicas per component.
  4. Set the DEFAULT_HEALTH_CHECK tunable to false.
  5. Start your experiment.

In this scenario, the chaos experiment fails if even a single replica is not in the Running state.
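
For reference, the reproduction steps above might correspond to a ChaosEngine roughly like the following. This is only a sketch of the setup: the engine name, namespace, label, and service account (`engine-nginx`, `default`, `app=nginx`, `pod-delete-sa`) are placeholder assumptions, not values from my environment, and the exact fields may differ between Litmus versions.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx              # placeholder name (assumption)
spec:
  engineState: "active"
  appinfo:
    appns: "default"              # placeholder namespace (assumption)
    applabel: "app=nginx"         # placeholder label shared by multiple AUT pods
    appkind: "deployment"
  chaosServiceAccount: pod-delete-sa   # placeholder service account (assumption)
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        # Disables the default AUT health check; as described in this issue,
        # the container Running-state check still appears to be performed.
        - name: DEFAULT_HEALTH_CHECK
          value: 'false'
```

With a manifest like this, the experiment still fails whenever one of the pods matching the label is not running.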

Anything else we need to know?:

Having read the documentation and searched GitHub, I was not able to find another tunable that disables this default check. Am I missing something, or does the ability to disable this extra check with a tunable not exist at the moment?

Thank you.

@neelanjan00
Member

Can you please check why the container is still not ready? Ideally, all the replica containers should be in the Ready state once the chaos has been injected and the chaos duration has passed; otherwise, it indicates that there is an issue with the scaling of your app.
