1.7.0
New Features & Enhancements
-
Introduces experiment probes to enable declarative specification of entry/exit (success) criteria via the chaosengine. This release supports the Command, Kubernetes & HTTP probe types that can be configured in SoT (Start of Test), EoT (End of Test) & Edge execution modes. With this, users can reuse generic experiments to test a variety of app-specific/context-specific chaos scenarios.
-
Enhances the chaosresult status schema to include the ProbeSuccessPercentage score that gives an overview of the app/infra resilience to a specific chaos experiment run
-
Refines the operational modes of litmus: introduces namespaced operator support in the helm charts for multi-developer/shared-cluster use cases with dedicated namespaces (such as Okteto Cloud), and updates the admin & standard modes to watch engine resources in the litmus namespace & across namespaces, respectively.
-
Adds functionality to look for target applications in the chaosengine resource namespace if the target namespace is not explicitly specified.
-
Validates/prevents malformed application labels in the chaosengine
-
Improves the ChaosEngine status schema to hold more info (experiment pod names, runner names) that other tools/abstractions running the experiment can parse for further reuse (log extraction, for example).
-
Adds Microsoft Azure Kubernetes Service (AKS) as a supported platform for the generic experiment suite.
-
Adds a new chaos experiment to scale pods/test node autoscale functionality
-
Adds the libraries for the execution of AWS chaos using chaostoolkit, orchestrated by Litmus.
-
Adds support for the specification of host file mounts in chaos experiments
-
Allows setting polling intervals and timeouts for status checks via chaosengine to aid tuning execution for slower environments
-
Removes dependencies on multiple experiment “helper” (auxiliary) images and makes the litmus go-runner self-sufficient in handling the required chaos business logic. This eases maintenance, especially in the case of air-gapped environments / downstream projects that build the litmus components in their respective CI/CD pipelines.
-
Enhances the experiment to “fail fast” upon failed app checks in cases where containers are terminated
-
Upgrades the ansible-runner to use python3
-
Enhances the developer experience for litmus chaos experiments by using the Okteto CLI to develop & test experiment business logic in-cluster instead of repeating image-build-job-run cycles.
-
Updates the scaffold utils to generate the experiment bootstrap code based on the latest developments in the experiment structure.
-
Adds chaos-instrumented grafana dashboards for the sock-shop application along with details on setting up monitoring for chaos experiment runs.
-
Adds pre-defined/usable workflows for repeatable execution of node resource chaos in the chaos-charts repo
-
Pushes the technical preview / pre-alpha version of the litmus-portal (available on the master branch).
-
Refactors the litmus-e2e repo/code-structure to simplify the addition of new BDD tests (modularization, removal of bash utils, formatted errors, klog usage, scenario coverage parameters)
-
Adds additional stages in litmus-e2e GitLab pipelines to execute both the go-based & ansible-based chaos experiments
-
Improves github-actions based comment-triggered e2e runs with log details
-
Features a completely revamped & improved ChaosHub
-
Improves the project wiki with more information for users and developers (architecture docs, video tutorials, charters for the Litmus Special Interest Groups)
Major Bug Fixes
-
Patches the chaosengine with the right status (‘stopped’) and fixes the event to provide the right reason in cases where app filtering is unsuccessful. This allows a re-apply of the engine to re-trigger the application.
-
Adds a check to factor-in cordoned (SchedulingDisabled) status of nodes in kubelet & docker-service kill experiments.
-
Provides the tc_image used in network chaos experiments as an experiment tunable instead of hardcoding it, in order to support users with internal image registries.
-
Decides experiment termination based on the chaos container status rather than that of the chaos pod objects, to support operations in a service-mesh environment (istio, linkerd) where all pods (including chaos resources) are injected with sidecars. Without this, the experiment runs forever due to the proxy sidecars.
-
Sets the restart policy of the experiment jobs to Never instead of OnFailure to prevent repeated re-execution for certain experiment failure conditions.
-
Fixes the incorrect eventType for chaos events in cases of failures & skipped executions.
-
Fixes the go-based pod-cpu-hog & pod-memory-hog experiments to execute the chaos processes (commands) in the target container by passing them as args to a shell instance (/bin/sh -c), to account for targets which may run with different entrypoints.
-
Fixes permission issues on the infra helm chart resulting in failed metrics collection
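The shell-wrapping fix for pod-cpu-hog & pod-memory-hog listed above can be pictured with a small sketch. It is not the go-runner's actual code: the function name run_in_target and the example stress command are hypothetical, and it assumes the target container ships /bin/sh. The point is only that the chaos command travels as a single argument to /bin/sh -c, so it no longer depends on the target image's entrypoint.

```shell
#!/bin/sh
# Sketch only: run a chaos command inside a target container by handing it
# to a shell instance, independent of the image's own entrypoint.
# run_in_target is a hypothetical helper, not part of litmus.
run_in_target() {
  ns=$1         # namespace of the target pod
  pod=$2        # target pod name
  container=$3  # target container name
  cmd=$4        # chaos command, passed as one string to the shell

  kubectl exec -n "$ns" "$pod" -c "$container" -- /bin/sh -c "$cmd"
}
```

For example, `run_in_target default my-app-pod my-app "md5sum /dev/zero"` would start a CPU-burning process in the container; the pod, container, and command here are illustrative placeholders.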
Breaking Changes
-
Ends support for the ansible-runner/executor (EoL). This is not to be confused with the ansible-based chaos experiments.
-
Removes the following repositories:
-
litmuschaos/pages: The operator manifests are now available via gh-pages sourced from litmuschaos/litmus.
-
litmuschaos/chaos-helm: The experiments helm chart is now also part of the litmus-helm repo.
-
litmuschaos/community: The demo procedures & community info are now available in the litmus-demo & litmus repos, respectively.
Installation
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.7.0.yaml
Verify your installation
-
Verify if the chaos operator is running
kubectl get pods -n litmus
-
Verify if chaos CRDs are installed
kubectl get crds | grep chaos
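The two manual checks above could also be scripted, for instance in a CI step. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the operator pod name is assumed to contain "chaos-operator", and the manifest is assumed to register the chaosengines, chaosexperiments & chaosresults CRDs in the litmuschaos.io group.

```shell
#!/bin/sh
# Sketch only: scripted form of the manual verification steps.
# Assumption: these are the CRDs registered by the operator manifest.
EXPECTED_CRDS="chaosengines.litmuschaos.io chaosexperiments.litmuschaos.io chaosresults.litmuschaos.io"

# Succeeds if a pod whose name contains "chaos-operator" (assumed naming)
# is in the Running phase in the litmus namespace.
check_operator() {
  kubectl get pods -n litmus --field-selector=status.phase=Running -o name \
    | grep -q "chaos-operator"
}

# Succeeds only if every expected CRD is present; reports missing ones on stderr.
check_crds() {
  installed=$(kubectl get crds -o name)
  status=0
  for crd in $EXPECTED_CRDS; do
    if ! printf '%s\n' "$installed" | grep -qF "$crd"; then
      echo "missing CRD: $crd" >&2
      status=1
    fi
  done
  return "$status"
}
```

A caller might run `check_operator && check_crds && echo "litmus looks installed"`. Note that a pod in the Running phase is not necessarily Ready, so this is a coarse check.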
For more details, refer to the documentation (Docs).