Releases: litmuschaos/litmus

1.7.0

15 Aug 16:19
ae0b913

New Features & Enhancements

  • Introduces experiment probes to enable declarative specification of entry/exit (success) criteria via the chaosengine. This release supports the Command, Kubernetes & HTTP probe types that can be configured in SoT (Start of Test), EoT (End of Test) & Edge execution modes. With this, users can reuse generic experiments to test a variety of app-specific/context-specific chaos scenarios.

  • Enhances the chaosresult status schema to include the ProbeSuccessPercentage score that gives an overview of the app/infra resilience to a specific chaos experiment run

  • Refines the operational modes of litmus: introduces namespaced operator support in the helm charts for multi-developer/shared-cluster use cases with dedicated namespaces (such as Okteto Cloud), and updates the admin & standard modes to watch engine resources in the litmus namespace & across namespaces, respectively

  • Adds functionality to look for target applications in the chaosengine resource namespace if the target namespace is not explicitly specified.

  • Validates/prevents malformed application labels in the chaosengine

  • Improves the ChaosEngine status schema to hold more info (experiment pod names, runner names) that can aid other tools/abstractions running the experiment to derive/parse useful info for further reuse (logs extraction, for ex.)

  • Adds Microsoft Azure Kubernetes Service (AKS) as a supported platform for the generic experiment suite.

  • Adds a new chaos experiment to scale pods/test node autoscale functionality

  • Adds the libraries for the execution of AWS chaos using chaostoolkit, orchestrated by Litmus.

  • Adds support for the specification of host file mounts in chaos experiments

  • Allows setting polling intervals and timeouts for status checks via chaosengine to aid tuning execution for slower environments

  • Removes dependencies on multiple experiment “helper” (auxiliary) images and makes the litmus go-runner self-sufficient in handling the required chaos business logic. This eases maintenance, especially in the case of air-gapped environments / downstream projects that build the litmus components in their respective CI/CD pipelines.

  • Enhances the experiment to “fail fast” upon failed app checks in cases where containers are terminated

  • Upgrades the ansible-runner to use python3

  • Enhances the developer experience for litmus chaos experiments by using Okteto CLI to develop & test experiment business logic in-cluster over repeating image-build-job-run cycles

  • Updates the scaffold utils to generate the experiment bootstrap code based on the latest developments in the experiment structure.

  • Adds chaos-instrumented grafana dashboards for the sock-shop application along with details on setting up monitoring for chaos experiment runs.

  • Adds pre-defined/usable workflows for repeatable execution of node resource chaos in the chaos-charts repo

  • Pushes the technical preview / pre-alpha version of the litmus-portal (available on the master branch).

  • Refactors the litmus-e2e repo/code-structure to simplify the addition of new BDD tests (modularization, removal of bash utils, formatted errors, klog usage, scenario coverage parameters)

  • Adds additional stages in litmus-e2e GitLab pipelines to execute both the go-based & ansible-based chaos experiments

  • Improves github-actions based comment-triggered e2e runs with log details

  • Features a completely revamped & improved ChaosHub

  • Improves the project wiki with more information for users and developers (architecture docs, video tutorials, charters for the Litmus Special Interest Groups)
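
The probe execution modes listed above can be pictured with a small shell sketch. It is not the Litmus implementation: `run_with_probe` is a hypothetical helper, the probe and chaos steps are stand-in commands, and it assumes that Edge means the check runs at both edges of the run (before and after chaos).

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: run_with_probe MODE PROBE_CMD CHAOS_CMD
# MODE is SoT, EoT or Edge. Prints "Pass" if the chaos step and every
# executed check succeeded, otherwise "Fail".
run_with_probe() {
    mode=$1; probe=$2; chaos=$3
    verdict=Pass

    # SoT and Edge: run the check before chaos is injected.
    case $mode in
        SoT|Edge) sh -c "$probe" >&2 || verdict=Fail ;;
    esac

    # The chaos step itself (a stand-in command in this sketch).
    sh -c "$chaos" >&2 || verdict=Fail

    # EoT and Edge: run the check after chaos has finished.
    case $mode in
        EoT|Edge) sh -c "$probe" >&2 || verdict=Fail ;;
    esac

    echo "$verdict"
}
```

For example, `run_with_probe Edge "true" "sleep 1"` runs the check twice around the stand-in chaos step. Keeping the check separate from the chaos step is what lets one generic experiment be reused with different app-specific criteria.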

Major Bug Fixes

  • Patches the chaosengine with the right status (‘stopped’) and fixes the event to provide the right reason in cases where app filtering is unsuccessful. This will allow a re-apply of the engine to re-trigger the application.

  • Adds a check to factor-in cordoned (SchedulingDisabled) status of nodes in kubelet & docker-service kill experiments.

  • Provides the tc_image used in network chaos experiments as an experiment tunable over hardcoding in order to support users with internal image registries

  • Decides experiment termination based on chaos container status over that of chaos pod objects to support operations in a service-mesh environment (istio, linkerd) where all pods (including chaos resources) are injected with sidecars. Without this, the experiment runs forever due to the proxy sidecars.

  • Sets the restart policy of the experiments jobs to Never over OnFailure to prevent repeated re-execution for certain experiment failure conditions.

  • Fixes the incorrect eventType for chaos events in cases of failures & skipped executions.

  • Fixes the go-based pod-cpu-hog & pod-memory-hog experiments to execute the chaos processes (commands) in the target container by passing them as args to a shell instance (/bin/sh -c), to account for targets that may run with different entrypoints.

  • Fixes permission issues on the infra helm chart resulting in failed metrics collection
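
As an illustration of the cordoned-node check mentioned above, the following sketch filters node listings by status. It is not the experiment's own logic: `schedulable_nodes` is a hypothetical helper that relies only on the fact that `kubectl get nodes` reports cordoned nodes with a `Ready,SchedulingDisabled` status.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# names of nodes whose STATUS column is exactly "Ready", which excludes
# both NotReady nodes and cordoned ("Ready,SchedulingDisabled") nodes.
schedulable_nodes() {
    awk '$2 == "Ready" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | schedulable_nodes
```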

Breaking Changes

  • Stops support for the ansible-runner/executor (EoL) (Not to be confused with the ansible-based chaos experiments)

  • Removes the following repositories:

    • litmuschaos/pages: The operator manifests are available over gh-pages sourced out of litmuschaos/litmus

    • litmuschaos/chaos-helm: The experiments helm chart is now also in the litmus-helm repo.

    • litmuschaos/community: The demo procedures & community info are now available within the litmus-demo & the litmus repo respectively.

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.7.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos
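
The CRD check can also be scripted so that a missing resource is reported explicitly. This is a sketch only: `check_chaos_crds` is a hypothetical helper, and the three CRD names it looks for are an assumption about what the operator manifest installs, so adjust them to match your cluster.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Reads `kubectl get crds` output on stdin and reports each expected CRD
# that is absent. Returns 0 when all are present, 1 otherwise.
# Assumption: the operator manifest installs these three CRDs.
check_chaos_crds() {
    crds=$(cat)
    missing=0
    for name in chaosengines chaosexperiments chaosresults; do
        if ! printf '%s\n' "$crds" | grep -q "^${name}\.litmuschaos\.io"; then
            echo "missing CRD: ${name}.litmuschaos.io"
            missing=1
        fi
    done
    return "$missing"
}

# Usage against a live cluster:
#   kubectl get crds | check_chaos_crds
```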

For more details refer to the documentation at Docs

1.6.0

15 Jul 15:13
38b701c

New Features and Enhancements

  • Specification of pod and container security context for the experiment resources via chaosexperiment spec
  • Introduces pod scheduling policy support via NodeSelector specification on the chaosengine (instance-specific attribute)
  • Ability to override experiment images from the chaosengine
  • Pushes an experiment execution summary event on the chaosresult resource
  • Adds the network chaos experiment to induce packet duplication
  • Adds node chaos experiment to force pod evictions via taints
  • Adds service chaos experiment to kill docker service on the node
  • Extends the golang chaoslib support for all existing chaos experiments in the generic suite
  • Validation webhook enhancements to verify if application labels specified in the chaosengine are propagated to pod templates of the applications under test (AUT)
  • Adds examples that illustrate litmus chaos-workflows by benchmarking nginx with the apache benchmark tool during parallel pod-kills
  • Migrates the ansible-based chaos experiments to the litmus-ansible repo from litmuschaos/litmus in line with the litmus-go, litmus-python repo structure
  • Improves the unit-test based coverage for chaos operator by 30%
  • Extends the capability to trigger on-demand e2e runs for PRs via GitHub comments to the chaos operator
  • Adds framework to determine e2e coverage percentage based on comparison of executed tests in the pipeline versus test plan
  • Introduces an e2e portal to view e2e pipeline data and coverage
  • Improves the Travis-based CI pipeline of the test-tools repo to build images only if the respective Dockerfile or scripts are modified, instead of building all images irrespective of the nature of the commit.
  • Increases sources for (helm-based) litmus installation to include helm hub & jfrog chartcenter artifact repositories
  • Adds betterci integration to charthub to obtain UI/UX previews for PRs
  • Enhances individual experiment documentation with abort procedure & troubleshooting references
  • Enhances the experiment failure and uninstall troubleshooting sections to include more conditions
  • Includes steps to run chaos experiments on rancher platform
  • Includes missing video links/examples for chaos experiments in the generic suite
  • Updates all the litmuschaos websites (docs, charthub, project website) based on CNCF guidelines
  • Enhances the release guidelines doc with an improved release checklist
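
The e2e coverage idea above (executed tests compared against the test plan) can be sketched as a simple set comparison. This is not the litmus-e2e framework's code: `coverage_percent` and the one-test-name-per-line file format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: coverage_percent PLAN_FILE EXECUTED_FILE
# Each file lists one test name per line. Prints the integer percentage
# of distinct planned tests that also appear in the executed list.
coverage_percent() {
    plan=$1; executed=$2
    # Distinct, non-empty test names from the plan.
    planned=$(grep . "$plan" | sort -u)
    total=$(printf '%s\n' "$planned" | grep -c . || true)
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    # -F: fixed strings, -x: whole-line match, -c: count matching lines.
    covered=$(printf '%s\n' "$planned" | grep -Fxc -f "$executed" || true)
    echo $(( covered * 100 / total ))
}
```

Executed tests that are not in the plan do not raise the score, since only planned names are counted.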

Major Bug Fixes

  • Fixes invalid Jinja template for chaos injection (helper) pod in the kubelet-service-kill experiment
  • Specifies an upper limit for the memory hog experiment docs based on the current resource exhaustion approach via dd
  • Adds instructions in infra (node) chaos experiments to cordon the AUT before the execution of chaos to prevent the restart of litmus pods
  • Fixes a race condition in the pod-delete experiment where the verdict is flagged as fail despite successful execution
  • Fixes Kafka experiment failure while trying to derive leader broker for the test topic (partition) due to missing ns and improper regex
  • Fixes a coredns experiment regression (caused by the introduction of helper pod logic for the pod-delete experiment) due to a missing lib_image in the experiment CR

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.6.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

1.5.1

09 Jul 05:16
8ba20bc
[Cherry Pick to 1.5.1] Inhibit experiment image creation from branches of litmus repo (#1682)

* (chore)releases: updated release artefacts (#1552)

Signed-off-by: ksatchit <[email protected]>

* (chore)roadmap: add item for litmus portal (#1553)

Signed-off-by: ksatchit <[email protected]>

* Add merge label and auto-merge feature in github actions (#1556)

Signed-off-by: Udit Gaurav <[email protected]>

* refactor(readme): Add more details in pod network corruption readme (#1555)

* refactor(readme): Add more details in pod network corruption readme

Signed-off-by: Udit Gaurav <[email protected]>

* Update experiments/generic/pod_network_corruption/README.md

Co-authored-by: Karthik Satchitanand <[email protected]>

Co-authored-by: Karthik Satchitanand <[email protected]>

* refactor(experiment): Add pod memory hog default memory consumption (#1576)

Signed-off-by: Udit Gaurav <[email protected]>

* bug(indentation): Fix the indentation in kubelet service kill experiment (#1581)

Signed-off-by: Udit Gaurav <[email protected]>

* (chore)roadmap: update availability of scaffold scripts to generate experiment code (#1584)

Signed-off-by: ksatchit <[email protected]>

* (chore): update schematic representation of litmus arch (#1589)

* (chore): update schematic representation of litmus arch

Signed-off-by: ksatchit <[email protected]>

* (refactor)demo: add an updated demo video

Signed-off-by: ksatchit <[email protected]>

* (chore)governance: update maintainer email IDs (#1599)

Signed-off-by: ksatchit <[email protected]>

* (chore)content: add folder to discuss chaos engg (#1619)

Signed-off-by: ksatchit <[email protected]>

* Update the backlog in Roadmap with IO-Chaos

* Stopped CircleCi Build for master branch (#1625)

* Stopped CircleCi Build for master

Signed-off-by: gdsoumya <[email protected]>

* Update config.yml

* Update config.yml

* Update config.yml

* Update config.yml

* Update config.yml

* (chore)roadmap: add backlog item on chaos workflows for application benchmarks (#1626)

Signed-off-by: ksatchit <[email protected]>

* (chore)ci: inhibit push of ansible-runner image from litmus (#1660)

Signed-off-by: ksatchit <[email protected]>

Co-authored-by: UDIT GAURAV <[email protected]>
Co-authored-by: Uma Mukkara <[email protected]>
Co-authored-by: Soumya Ghosh Dastidar <[email protected]>

1.5.0

15 Jun 17:41
993d2e7

New Features and Enhancements

  • Features a revamped chaos charthub with a more resilient design and improved user experience

  • Introduces ability (github workflows) to trigger individual/multiple e2e tests or complete e2e test-suite for litmus PRs via GitHub comments

  • Adds a new repo litmuschaos/litmus-demo to provide a fully packaged demo environment to run chaos under 10 min

  • Adds node service kill chaos libraries (& a kubelet kill chaos experiment on specified nodes)

  • Improves the pod cpu hog experiment by adding go chaoslib to support containerd/crio runtime

  • Introduces chaoslib pattern to choose blast radius / percentage (target) pods and abort chaos on target containers

  • Improves the chaos-scheduler controller to halt/resume chaos

  • Enhances the chaos-schedule CR schema to provide dedicated attributes for the schedule modes (now, once, repeat) over mutually-exclusive fields with enhanced OpenAPI schema validation

  • Introduces ImagePullPolicy as a chaosexperiment CR attribute (.spec.definition.imagePullPolicy) to support use cases where the experiments need to run with locally built images, as with PR-triggered e2e

  • Enhances the container-kill experiment to repeat the chaos per an interval over a total duration with support for containerd/crio runtime.

  • Adds go-based helper pods for pod-delete and container-kill chaos libraries

  • Improves the litmus-go scaffold tool to use lighter base images & improved default events

  • Improves the validating webhook-based admission controller to call out missed annotations on target applications

  • Improves unit-test coverage for chaos-operator

  • Enhances the getting started (chaosengine construction) & troubleshooting docs (uninstallation steps)
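
The blast radius / percentage pattern mentioned above boils down to turning a percentage into a number of target pods. The sketch below shows one way to do that. `target_pod_count` is a hypothetical helper, and both the integer rounding and the "at least one pod" floor are assumptions rather than confirmed chaoslib behaviour.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: target_pod_count TOTAL_PODS PERCENTAGE
# Prints how many pods to target for the given percentage.
target_pod_count() {
    total=$1; pct=$2
    # Integer division rounds down.
    count=$(( total * pct / 100 ))
    # Assumption: target at least one pod whenever any pod is available.
    if [ "$count" -lt 1 ] && [ "$total" -gt 0 ]; then
        count=1
    fi
    echo "$count"
}

# target_pod_count 10 50   # prints 5
# target_pod_count 3 10    # prints 1 (floor of one pod applies)
```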

Major Bug Fixes

  • Fixes the missing/clustered event generation in litmus-go chaos experiments

  • Fixes operator behavior of triggering chaos disregarding annotation status on the target application

  • Fixes the cluster level running experiment count metric from chaos-exporter

  • Adds concurrent updates of the event counter for each iteration of chaos injection

  • Fixes chaos experiment failures (securitycontext additions) on OpenShift 4.3

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.5.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

1.4.1

03 Jun 05:42
961c7fa
[Cherry-Pick for 1.4.1]  (#1535)

* (chore)roadmap: update roadmap status (#1530)

Signed-off-by: ksatchit <[email protected]>

* update(helper-pod): Wait till the helper pod come into running state (#1533)

Signed-off-by: shubhamchaudhary <[email protected]>

Co-authored-by: Shubham Chaudhary <[email protected]>

1.4.1-RC1

29 May 20:11
af7ce00
Pre-release
fix(pod-delete): Fixing pod-delete chaoslib (#1526) (#1528)

Signed-off-by: Udit Gaurav <[email protected]>

Co-authored-by: UDIT GAURAV <[email protected]>

1.4.0

15 May 12:41
80c61b1

New Features and Enhancements

  • Introduces the ChaosSchedule CRD & Controller to execute background chaos jobs with a variety of scheduling policies: immediately, at a specific timestamp, or between a defined start & end time. Supports both randomized and strictly scheduled execution of chaos.

  • Introduces argo-based Chaos Workflows as a means to help users construct complex scenarios around chaos experiments such as ability to parallelize benchmark runs with chaos operations. The initial commits include workflows to gauge impact of pod failures on the performance of a multi-replica nginx deployment.

  • Introduces litmus-go - a repo to hold experiments and chaoslib written in golang, with an alpha litmus-go SDK that has the ability to scaffold go experiments, complete with all artefacts, including the chaosexperiment custom resources. Also introduces litmus-python, which primarily holds chaostoolkit-based chaos experiments.

  • Introduces an alpha Validation Webhook for Litmus to offload experiment dependency validation checks from chaos-operator & chaos-runner components.

  • Adds support for chaos on DeploymentConfig resources on OpenShift

  • Introduces ability to insert user-defined annotations into chaos resources (chaos-runner, experiment pods) via chaosengine

  • Adds support for user-defined, instance-specific metadata (id), set via a chaosengine environment variable, to state the purpose of or track a chaos experiment and to lend uniqueness to the chaosresult

  • Refactors the chaos exporter metrics to provide aggregated cluster level chaos metrics with improved naming convention.

  • Introduces a suite of standard observability resources to aid with visualization & monitoring of chaos experiments - including events (heptio eventrouter-prometheus-grafana, metricbeat-elasticsearch-kibana), metrics (chaos-exporter-prometheus-grafana) & logs (promtail-loki-grafana).

  • Homogenizes chaos experiments to use LIB model to invoke chaos injection functions

  • Improves the litmus helm chart to support admin mode installation. Also includes optional install of chaos-exporter.

  • Updates the chaos libraries to use stress-ng over stress for broader chaos support

  • Adds helm chart testing in CI for litmus-helm repo

  • Updates the litmus-e2e gitlab job scripts to function on on-prem Kubernetes clusters over NAT

  • Shifts to Go Modules for dependency management across litmus components

  • Improves general & troubleshooting FAQs on litmus-docs around failed chaos experiment execution.

Major Bug Fixes

  • Fixes inability to run litmus experiment containers in OpenShift due to “AnsibleError: Unable to create local directories” by generating resource manifests from jinja templates into /tmp.

  • Fixes disk-fill experiment execution on Gravity Kubernetes cluster via dynamic container data path.

  • Fixes exceptions seen in chaos-operator due to lack of resource permissions for replicasets

  • Fixes “unable to update resource” / “operation cannot be fulfilled” transient errors on chaos-operator

  • Fixes broken BDD tests in chaos-runner, chaos-operator CI pipelines

  • Enforces hard stop of pod-delete chaos experiment at total_chaos_duration via chaos timestamp comparisons

  • Fixes algolia-based search functionality in litmus-docs

  • Fixes the analytics count round off issue for operator installation & experiment run count in the charthub
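
The timestamp-comparison approach to stopping chaos at total_chaos_duration, mentioned in the fixes above, can be sketched as a loop that re-checks elapsed time before every iteration. This is an illustration rather than the experiment's code: `run_for_duration` is a hypothetical helper, and the action passed to it is a stand-in for a chaos injection step.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: run_for_duration TOTAL_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Repeats COMMAND every INTERVAL_SECONDS and prints the number of
# iterations that ran before TOTAL_SECONDS elapsed.
run_for_duration() {
    total=$1; interval=$2; shift 2
    start=$(date +%s)
    iterations=0
    # Compare the current timestamp with the start time on every pass, so
    # no new iteration begins once the total duration has elapsed.
    while [ $(( $(date +%s) - start )) -lt "$total" ]; do
        # The command's exit status is ignored in this sketch.
        "$@" >&2 || true
        iterations=$(( iterations + 1 ))
        sleep "$interval"
    done
    echo "$iterations"
}
```

An iteration that is already running when the deadline passes still completes here, so the stop is only as precise as the interval and the duration of the action.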

Getting Started

Prerequisites to install

  • Make sure you have a healthy Kubernetes Cluster.
  • Kubernetes 1.12+ is installed

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.4.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

1.4.0-RC2

13 May 17:27
80c61b1
Pre-release
[Cherry-pick for RC2]  (#1506)

* update(pod-delete): Adding type casting in pod-delete experiment (#1503)

Signed-off-by: shubhamchaudhary <[email protected]>

* feat(disk-fill): Adding Dynamic Variable for Container Path (#1502)

* (feat) Adding Dynamic Variable for Container Path

Signed-off-by: Rahul M Chheda <[email protected]>

* Adding appropriate comments

Signed-off-by: Rahul M Chheda <[email protected]>

* feat(image): Using a common image for stress-ng commands (#1462)

Signed-off-by: Udit Gaurav <[email protected]>

Co-authored-by: Rahul M Chheda <[email protected]>
Co-authored-by: UDIT GAURAV <[email protected]>

1.4.0-RC1

11 May 16:24
81bc298
Pre-release
update(disk-loss):Adding lib env in disk-loss experiment (#1457)

Signed-off-by: shubhamchaudhary <[email protected]>

1.3.0

15 Apr 14:06
4248bdb
Compare
Choose a tag to compare

New Features and Enhancements

  • Introduces admin mode of chaos execution by which all chaos resources can be maintained in a single namespace while injecting chaos on applications across multiple namespaces
  • Introduces helm charts for litmus infrastructure components and chaos charts
  • Supports download of versioned chaos chart bundles on the chaoshub
  • Supports custom/user-specified annotation filters to determine application chaos candidates
  • Makes the chaos exporter a cluster-wide component deployed alongside the operator to extract metrics for all chaosengines
  • Adds more Kubernetes events to track failures (e.g., inability to create chaos resources or to access/patch the chaosengine)
  • Adds ability to re-trigger experiments for completed chaosengines via a patch operation
  • Adds OpenEBS NFS provisioner failure experiment with external liveness checks to verify provisioner functionality & data persistence
  • Introduces the Cassandra chaos chart with cassandra node failure experiment along with external liveness checks to perform database CRUD operations during chaos
  • Adds pod level memory hog experiment with provision for users to provide memory to consume (in MB)
  • Enhances the chaostoolkit based pod delete experiment to use python modules with added support for a json (blob) result artifact and different failure modes (i.e., single/multi pod failure)
  • Enhances the node cpu hog experiment to accept cpu core count as a user input
  • Enhances the container kill experiment to repeat chaos actions over a total chaos duration instead of being a single-action test
  • Restructures the chaoslib to categorize chaos injection functions/taskfiles under respective tool-based lib
  • Improves the experiment logs (task banners) based on the category/function performed by the tasks
  • Adds (aquasecurity) trivy based static security scans for all litmus component images as part of respective CI builds
  • Includes lint-checks with custom/project-specific rules for ansible playbooks in litmus CI build
  • Improves the litmus e2e pipelines with addition of new tests around admin mode, multiple parallel chaosengine execution across namespaces, validation for engine status patch
  • Improves e2e infra (scripts) to be able to launch e2e pipelines with custom image versions
  • Adds pipeline history information in the litmus-e2e repo to track experiment status
  • Introduces a new repo to hold charts and experiment icons linked to respective CSVs on the chaoshub.
  • Adds documentation to explain the plugin model in litmus and integration with other chaos tools
  • Adds a new artifact in the litmus repository called releases to track salient resource schema changes and provide references to detailed release notes

Major Bug Fixes

  • Fixes the incorrect experiment status on chaosengine (“Awaited”) despite completion of experiment
  • Fixes failure to schedule auxiliary/helper pods with nodeSelector specification on EKS clusters
  • Fixes ambiguity/missing steps in developer guide and updates experiment artefacts templates with latest changes (since v1.0)
  • Fixes the event source names in case of events generated by chaos-runners to bear chaos-runner pod name
  • Fixes the failure to verify successful app reschedule post drain operation in node-drain experiment
  • Fixes the crash of powerfulseal deployment due to use of improper service account

Getting Started

Prerequisites to install

  • Make sure you have a healthy Kubernetes Cluster.
  • Kubernetes 1.11+ is installed

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.3.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs