In Kubernetes, there are several ways to release an application. Choose the strategy carefully to keep your infrastructure on Microsoft Azure resilient.
- recreate: terminate the old version and release the new one
  - Application Gateway Ingress Controller
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
- ramped: release a new version in a rolling-update fashion, one after the other
  - Application Gateway Ingress Controller
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
- blue/green: release a new version alongside the old version, then switch traffic
  - Application Gateway Ingress Controller
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
- canary: release a new version to a subset of users, then proceed to a full rollout
  - Application Gateway Ingress Controller
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
- a/b testing: release a new version to a subset of users in a precise way (HTTP headers, cookie, weight, etc.). This doesn't come out of the box with Kubernetes; it implies extra work to set up a smarter load-balancing system (Istio, Linkerd, Traefik, custom nginx/haproxy, etc.).
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
- shadow: release a new version alongside the old version. Incoming traffic is mirrored to the new version and doesn't impact the response.
  - Azure Load Balancer + Istio service mesh add-on
  - Azure Load Balancer + Web Application Routing add-on
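The a/b testing strategy in the list above routes each request by a rule (HTTP header, cookie, or weight). The sketch below illustrates that kind of decision in plain Python. It is not how Istio or ingress-nginx implement it; the header name `x-app-version`, the cookie name `app-version`, and the function `choose_version` are hypothetical names chosen for illustration, and header names are assumed to be lowercased already.

```python
import hashlib
from typing import Mapping, Optional


def choose_version(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    user_id: str,
    v2_weight_percent: int = 10,
) -> str:
    """Return "v1" or "v2" for a single request (all rule names are hypothetical)."""
    # Rule 1: an explicit opt-in header wins over everything else.
    if headers.get("x-app-version") == "v2":
        return "v2"
    # Rule 2: a cookie pins a returning user to the version seen before.
    pinned: Optional[str] = cookies.get("app-version")
    if pinned in ("v1", "v2"):
        return pinned
    # Rule 3: weighted split. Hashing the user id keeps the choice stable
    # for the same user across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "v2" if bucket < v2_weight_percent else "v1"
```

For example, `choose_version({"x-app-version": "v2"}, {}, "alice")` returns `"v2"` regardless of the weight, because the header rule is evaluated first.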
Before experimenting, check out the following resources:
- CNCF presentation
- CNCF presentation slides
- Kubernetes deployment strategies
- Six Strategies for Application Deployment
- Canary deployment using Istio and Helm
- Automated rollback of Helm releases based on logs or metrics
Strategy | Use Application Gateway Ingress Controller | Use Istio Service Mesh | Note |
---|---|---|---|
Recreate | Yes | Yes | Regardless of whether the Ingress Controller is selected or not. |
Ramped | Yes | Yes | One of the key functionalities of Kubernetes Deployment. |
Blue/Green | Yes | Yes | At the Kubernetes Deployment level, it is achieved by switching Kubernetes Services. At the Kubernetes cluster level, it is accomplished by switching traffic with Azure Traffic Manager or Azure Front Door. |
Canary | Yes, but manually | Yes | Manually adjusting the number of replicas within the Kubernetes Deployment, or utilizing an Ingress Controller that supports the Traffic Shifting mechanism. |
A/B Testing | No | Yes | The ingress controller needs to have a rule match mechanism (e.g. HTTP headers, cookie, weight, etc.) to determine the direction of traffic. |
Shadow | No | Yes | Currently, achieving this requires the use of a service mesh such as Istio. |
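The Canary row above mentions adjusting the number of replicas manually. As a rough sketch of the arithmetic involved, assuming a Service spreads requests roughly evenly across the pods of both versions, the helpers below (hypothetical names, not part of this repository) split a replica budget for a target canary share and report the share that split actually produces.

```python
from typing import Tuple


def canary_replicas(total_replicas: int, canary_percent: float) -> Tuple[int, int]:
    """Split total_replicas into (stable, canary) counts for a target traffic share."""
    if total_replicas < 2:
        raise ValueError("need at least 2 replicas to run both versions")
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 0 and 100 (exclusive)")
    # Round to the nearest pod, but keep at least one pod of each version.
    canary = round(total_replicas * canary_percent / 100)
    canary = max(1, min(total_replicas - 1, canary))
    return total_replicas - canary, canary


def actual_share(stable: int, canary: int) -> float:
    """Approximate share of requests (in percent) that reach the canary pods."""
    return 100 * canary / (stable + canary)


stable, canary = canary_replicas(4, 10)
print(stable, canary, actual_share(stable, canary))  # 3 1 25.0
```

The example shows the limitation of replica-based canaries: with only 4 pods, the smallest possible canary share is 25%, not the requested 10%. Finer splits need more replicas or an ingress controller that supports traffic shifting.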
Ingress Controller | Application Gateway Ingress Controller | Istio Ingress Gateway add-on | Web Application Routing add-on |
---|---|---|---|
Based on | Azure Application Gateway | Istio Ingress Gateway | Kubernetes Ingress-Nginx Controller |
Docs | AGIC | https://istio.io/ | https://kubernetes.github.io/ingress-nginx/ |
Managed by | Azure | Azure | Azure |
HTTP | Yes | Yes | Yes |
HTTPS | Yes | Yes | Yes |
TCP | No | Yes | Yes |
UDP | No | No | Yes |
Websocket | Yes | Yes | No |
These examples were created and tested on the following services:
Azure Service | Azure Support Status | Version | Dependencies |
---|---|---|---|
Azure Kubernetes Service | GA | v1.27.7 | N/A |
Azure Monitor managed service for Prometheus | GA | N/A | |
Azure Managed Grafana | GA | v9.5.6 (859a2654d3) | Azure Monitor managed service |
Azure Application Gateway Ingress Controller (AGIC) | GA | Standard_v2 | Azure Application Gateway |
Azure Service Mesh add-on (a.k.a Azure Managed Istio Service Mesh) | GA | v1.20 | Azure Load Balancer |
Web Application Routing add-on (a.k.a Azure Managed ingress-nginx) | GA | v1.2.1 | Azure Load Balancer |
Network Observability add-on | Preview | | Azure Monitor managed service for Prometheus / Azure Container Insights |
$ cd ./deploy
$ ./deploy-aks.sh
$ kubectl apply -f ama-metrics-prometheus-config.yml
$ kubectl apply -f ama-metrics-settings-configmap.yml
Create a dashboard with a Time series panel, or import the JSON export.
Use the following query:
sum(rate(http_requests_total{app="my-app"}[2m])) by (version)
Since Azure Managed Prometheus was installed with customized settings, it uses a short scrape interval of 10s, so the query range cannot be lower than that.
To get a better overview of the versions, add {{version}} in the legend field.
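To make the query above more concrete, here is a simplified Python sketch of what `sum(rate(...[2m])) by (version)` computes: a per-second rate for each series over the range, then a sum per `version` label. This is an approximation for illustration only. The real PromQL `rate()` also handles counter resets and extrapolates to the window boundaries, and it returns no data (rather than `0.0`) when fewer than two samples fall inside the range. The names `simple_rate` and `rate_by_version` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample], now: float, window_s: float = 120.0) -> float:
    """Per-second increase of a counter over the last window_s seconds.

    Assumes samples are sorted by timestamp and the counter never resets.
    """
    in_window = [s for s in samples if now - window_s <= s[0] <= now]
    if len(in_window) < 2:
        # A slope needs two points; a range shorter than the scrape
        # interval never contains more than one sample.
        return 0.0
    (t_first, v_first), (t_last, v_last) = in_window[0], in_window[-1]
    return (v_last - v_first) / (t_last - t_first)


def rate_by_version(
    series: Dict[Tuple[str, str], List[Sample]], now: float
) -> Dict[str, float]:
    """Sum the per-series rates by version label. Keys are (pod, version)."""
    totals: Dict[str, float] = defaultdict(float)
    for (_pod, version), samples in series.items():
        totals[version] += simple_rate(samples, now)
    return dict(totals)
```

With samples scraped every 10s, a 5s range holds at most one sample per series, which is why the range in the query has to be wider than the scrape interval.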
The following Bash script retrieves information about the Azure and Kubernetes resources used in these examples and displays it.
#!/bin/bash
RESOURCE_GROUP_NAME="rg-poc-aks"
# Name of the AKS cluster; must be set in the environment before running.
AKS_CLUSTER_NAME="${AKS_CLUSTER_NAME:?Set AKS_CLUSTER_NAME to the name of your AKS cluster}"
AGIC_NAME="agic-poc-aks"
GRAFANA_NAME="grafana-poc-aks"
# Look up the node resource group, where the Application Gateway public IP is queried.
AKS_RESOURCE_GROUPNAME=$(az aks show -n "${AKS_CLUSTER_NAME}" -g "${RESOURCE_GROUP_NAME}" --query "nodeResourceGroup" -o tsv)
APPGW_PIP=$(az network public-ip show --name "${AGIC_NAME}-appgwpip" --resource-group "${AKS_RESOURCE_GROUPNAME}" --query "ipAddress" -o tsv)
GRAFANA_URL=$(az grafana show -g "${RESOURCE_GROUP_NAME}" -n "${GRAFANA_NAME}" --query "properties.endpoint" -o tsv)
ISTIO_INGRESS_GATEWAY_PIP=$(kubectl get service -n aks-istio-ingress aks-istio-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo
echo "Azure Application Gateway IP: ${APPGW_PIP}"
echo "Azure Managed Grafana URL: ${GRAFANA_URL}"
echo "Istio Ingress Gateway IP: ${ISTIO_INGRESS_GATEWAY_PIP}"
echo
This Python script sends HTTP GET requests to the URL built from the AGIC public IP address that you provide. It uses the requests library and handles the different types of exceptions that can occur during a request.
# Install colorama for colorized output if not already installed
pip3 install colorama
# Run the script
# Example:
# ./curl.py x.x.x.x
./curl.py "$AGIC_PUBLIC_IP"
# Run the script with a custom header
# Example:
# ./curl.py x.x.x.x test.aks.aliez.tw
./curl.py "$AGIC_PUBLIC_IP" "$HEADER_HOST"
The script runs indefinitely, sending periodic requests to the URL and monitoring for errors.
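The behaviour described above can be sketched as follows. This is not the repository's `curl.py`: to stay dependency-free it swaps the requests library for the standard library's `urllib`, and the function names `probe` and `watch`, the timeout, and the polling interval are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, host_header: Optional[str] = None, timeout: float = 2.0) -> str:
    """Send one GET request and return a short, printable result."""
    request = urllib.request.Request(url)
    if host_header:
        # Override the Host header so the ingress can match a host-based rule.
        request.add_header("Host", host_header)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return f"{response.status} {body}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return f"HTTP error {exc.code}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, and similar transport problems.
        return f"connection error: {exc.reason}"
    except OSError as exc:
        # Timeouts while reading the response and other low-level socket errors.
        return f"socket error: {exc}"


def watch(
    url: str,
    host_header: Optional[str] = None,
    interval_s: float = 0.5,
    max_requests: Optional[int] = None,
) -> None:
    """Probe url repeatedly; max_requests=None keeps going until interrupted."""
    sent = 0
    while max_requests is None or sent < max_requests:
        print(probe(url, host_header))
        sent += 1
        time.sleep(interval_s)
```

Calling `watch("http://x.x.x.x/", "test.aks.aliez.tw")` during a rollout prints one line per request, so failed or slow responses while pods are being replaced show up immediately.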
Recreate:
Ramped:
Blue/Green:
Canary:
A/B testing:
Shadow:
This is a known issue that is awaiting a fix.
Until it is fixed, work around it by assigning the permission manually:
- Click "Azure Managed Grafana"
- Click "Access control (IAM)"
- Click "Add role assignment"
- Select "Job function role: Grafana Admin" and Next
- Click "+Select members" and choose your user account
- Click "Review + assign"
- Wait about 3 minutes for the role to be assigned
- Log in to the Grafana dashboard
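Based on Troubleshoot collection of Prometheus metrics in Azure Monitor, forward port 9090 of an ama-metrics pod to inspect the Prometheus interface locally (replace ama-metrics-* with the full pod name):
kubectl port-forward ama-metrics-* -n kube-system 9090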
It is strongly recommended to read Minimizing Downtime During Deployments.
- For the spec.terminationGracePeriodSeconds parameter, refer to ramped/app-v1.yaml#30
- For the spec.containers[0].lifecycle.preStop parameter, refer to ramped/app-v1.yaml#L53-L56
- Add the connection draining annotation to the Ingress read by AGIC so that in-flight connections can complete; refer to ramped/app-v1.yaml#L65-L66
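The three settings above work together: the preStop hook delays container shutdown, the grace period has to be long enough to cover that delay, and connection draining lets the Application Gateway finish in-flight requests. The sketch below expresses them as Python dictionaries that mirror the manifest fields and adds a small sanity check. The concrete values (20s sleep, 60s grace period, 30s draining timeout), the image name, and the helper `pre_stop_fits` are assumptions for illustration and are not taken from ramped/app-v1.yaml.

```python
import json

PRE_STOP_SLEEP_S = 20  # hypothetical: time for the ingress to stop sending traffic
GRACE_PERIOD_S = 60    # hypothetical: must cover preStop plus in-flight requests

pod_spec = {
    "terminationGracePeriodSeconds": GRACE_PERIOD_S,
    "containers": [
        {
            "name": "my-app",
            "image": "example.invalid/my-app:v1",  # placeholder image
            "lifecycle": {
                "preStop": {"exec": {"command": ["sleep", str(PRE_STOP_SLEEP_S)]}}
            },
        }
    ],
}

# Annotations read by AGIC on the Ingress resource.
ingress_annotations = {
    "appgw.ingress.kubernetes.io/connection-draining": "true",
    "appgw.ingress.kubernetes.io/connection-draining-timeout": "30",
}


def pre_stop_fits(spec: dict) -> bool:
    """True if every container's preStop sleep is shorter than the grace period."""
    grace = spec["terminationGracePeriodSeconds"]
    for container in spec["containers"]:
        pre_stop = container.get("lifecycle", {}).get("preStop", {})
        command = pre_stop.get("exec", {}).get("command", [])
        if len(command) == 2 and command[0] == "sleep" and int(command[1]) >= grace:
            return False
    return True


print(json.dumps(pod_spec, indent=2))
print(pre_stop_fits(pod_spec))
```

The check matters because the grace period starts counting when the pod begins terminating, and the preStop hook runs inside that period. A preStop sleep that is as long as the grace period leaves the application no time to shut down cleanly before it is killed.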