1.7.0

@ksatchit released this 15 Aug 16:19
ae0b913

New Features & Enhancements

  • Introduces experiment probes to enable declarative specification of entry/exit (success) criteria via the chaosengine. This release supports the Command, Kubernetes & HTTP probe types that can be configured in SoT (Start of Test), EoT (End of Test) & Edge execution modes. With this, users can reuse generic experiments to test a variety of app-specific/context-specific chaos scenarios.

  • Enhances the chaosresult status schema to include the ProbeSuccessPercentage score that gives an overview of the app/infra resilience to a specific chaos experiment run

  • Refines the operational modes of litmus: introduces namespaced operator support in the helm charts for multi-developer/shared-cluster use cases with dedicated namespaces (such as Okteto Cloud), and updates the admin & standard modes to watch engine resources in the litmus namespace & across namespaces, respectively.

  • Adds functionality to look for target applications in the chaosengine resource namespace if the target namespace is not explicitly specified.

  • Validates/prevents malformed application labels in the chaosengine

  • Improves the ChaosEngine status schema to hold more info (experiment pod names, runner names) that other tools/abstractions running the experiment can parse for further reuse (log extraction, for example).

  • Adds Microsoft Azure Kubernetes Service (AKS) as a supported platform for the generic experiment suite.

  • Adds a new chaos experiment to scale pods and test node autoscaling functionality.

  • Adds the libraries for the execution of AWS chaos using chaostoolkit, orchestrated by Litmus.

  • Adds support for the specification of host file mounts in chaos experiments

  • Allows setting polling intervals and timeouts for status checks via chaosengine to aid tuning execution for slower environments

  • Removes dependencies on multiple experiment “helper” (auxiliary) images and makes the litmus go-runner self-sufficient in handling the required chaos business logic. This eases maintenance, especially in the case of air-gapped environments / downstream projects that build the litmus components in their respective CI/CD pipelines.

  • Enhances the experiment to “fail fast” upon failed app checks in cases where containers are terminated

  • Upgrades the ansible-runner to use python3

  • Enhances the developer experience for litmus chaos experiments by using the Okteto CLI to develop & test experiment business logic in-cluster instead of repeating image-build-job-run cycles.

  • Updates the scaffold utils to generate the experiment bootstrap code based on the latest developments in the experiment structure.

  • Adds chaos-instrumented grafana dashboards for the sock-shop application along with details on setting up monitoring for chaos experiment runs.

  • Adds pre-defined/usable workflows for repeatable execution of node resource chaos in the chaos-charts repo

  • Pushes the technical preview / pre-alpha version of the litmus-portal (available on the master branch).

  • Refactors the litmus-e2e repo/code-structure to simplify the addition of new BDD tests (modularization, removal of bash utils, formatted errors, klog usage, scenario coverage parameters)

  • Adds additional stages in litmus-e2e GitLab pipelines to execute both the go-based & ansible-based chaos experiments

  • Improves github-actions based comment-triggered e2e runs with log details

  • Features a completely revamped & improved ChaosHub

  • Improves the project wiki with more information for users and developers (architecture docs, video tutorials, charters for the Litmus Special Interest Groups)

Major Bug Fixes

  • Patches the chaosengine with the right (‘stopped’) status and fixes the event to provide the right reason in cases where app filtering is unsuccessful. This will allow a re-apply of the engine to re-trigger the application.

  • Adds a check to factor-in cordoned (SchedulingDisabled) status of nodes in kubelet & docker-service kill experiments.

  • Provides the tc_image used in network chaos experiments as an experiment tunable instead of hardcoding it, in order to support users with internal image registries.

  • Decides experiment termination based on the chaos container status rather than that of the chaos pod objects, to support operations in a service-mesh environment (istio, linkerd) where all pods (including chaos resources) are injected with sidecars. Without this, the experiment runs forever due to the proxy sidecars.

  • Sets the restart policy of the experiment jobs to Never instead of OnFailure to prevent repeated re-execution for certain experiment failure conditions.

  • Fixes the incorrect eventType for chaos events in cases of failures & skipped executions.

  • Fixes the go-based pod-cpu-hog & pod-memory-hog experiments to execute the chaos processes (commands) in the target container by passing them as args to a shell instance (/bin/sh -c), to account for targets which may run with different entrypoints.

  • Fixes permission issues on the infra helm chart resulting in failed metrics collection

Breaking Changes

  • Ends support for the ansible-runner/executor (EoL). This is not to be confused with the ansible-based chaos experiments.

  • Removes the following repositories:

    • litmuschaos/pages: The operator manifests are available over gh-pages sourced out of litmuschaos/litmus

    • litmuschaos/chaos-helm: The experiments helm chart has also moved into the litmus-helm repo.

    • litmuschaos/community: The demo procedures & community info are now available within the litmus-demo & litmus repos, respectively.

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.7.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos
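
The CRD check above can be made stricter than a plain grep. The following is a minimal sketch, not part of the official install steps: the helper name check_chaos_crds is hypothetical, and the three CRD names are an assumption based on the litmuschaos.io API group. It reads the output of kubectl get crds on stdin and reports any expected CRD that is absent.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get crds` output on stdin and
# reports which of the expected Litmus CRDs are absent.
# The CRD names below are assumptions; adjust them if your install differs.
check_chaos_crds() {
  crd_list=$(cat)
  missing=0
  for crd in chaosengines.litmuschaos.io \
             chaosexperiments.litmuschaos.io \
             chaosresults.litmuschaos.io; do
    # -F treats the CRD name as a fixed string, not a regex
    if ! printf '%s\n' "$crd_list" | grep -qF "$crd"; then
      echo "missing CRD: $crd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage against a live cluster (not executed here):
#   kubectl get crds | check_chaos_crds && echo "all chaos CRDs present"
```

The function returns a non-zero exit status when any CRD is missing, so it can gate later steps in a setup script.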

For more details, refer to the documentation (Docs).