All notable changes to this project will be documented in this file.
Some of these changes may include:
- Changes to number of pods running in any given namespace
- New namespaces added or namespaces removed (components added/components removed)
- New monitoring or alerting changes and link to relevant SOP
- Any networking-related changes (e.g. not needing the extra router shard in 1.5)
- Any changes to roles, users, or other permissions.
- Any changes to backups, or to restore procedures.
- Changes in resource requirements (num pods, ram, cpu, containers)
- [INTLY-10338] - Remove ServiceMonitor CR for zync component of 3Scale
- [INTLY-10158] - Fixed issue with backup-container where it could sometimes fail because it ran out of inodes.
- [INTLY-10129] - Fixed issue with backup-container where it would needlessly retry the Enmasse PV backup if a file had changed, causing alerts to fire as the backup was taking longer than it should.
- [INTLY-9069] - Update the alert manager routing of the UnifiedPushJavaNonHeapThresholdExceeded alert from "critical" to "null". Note: Will still appear as "critical" in prometheus.
- [INTLY-8750] - Update the alert manager routing of the enmasse "RestartingPods" alert from "critical" to "default" (Warning). Note: Will still appear as "critical" in prometheus.
- [INTLY-9947] - Added TargetDown and BlackboxTargetDown alerts
- [INTLY-9949] - Update keycloak operator version 1.10.1. Lower the severity of sso alerts that do not meet the cssre critical alert criteria to warning.
- [INTLY-9471] - Update Keycloak readiness/liveness probe
- [INTLY-9948] - Lower the severity of threescale alerts that do not meet the cssre critical alert criteria to warning.
- [INTLY-9907] - Lower all generic Kube* alerts to warning in 1.x
- [INTLY-9909] - Creation of RHMI service endpoints alerts and accompanying SOP
- [INTLY-3623] - Refactor of inventories and associated group_vars to support POC, OSD and PDS environments
- [INTLY-3847] - Update Alert Manager emails to include cluster URL and timestamps
- [INTLY-5856] - Improve resiliency of sso/user-sso with a 2nd replica and raise the QoS of postgres pods from BestEffort to Burstable
- [INTLY-2544] - Allow customer-admins to view 3Scale logs in Kibana
- [INTLY-6525] - Updated heimdall version to release-1.0.1
- [INTLY-7813] - New Alerts for Node CPU & Memory utilisation
- [INTLY-8459] - Fix 3Scale probe alerts
- [INTLY-8601] - Add dummy/null receiver for UnifiedPushJavaHeapThresholdExceeded alert
- [INTLY-8413] - Update PV usage alerts to match upstream kubernetes-mixin
- [INTLY-8600] - Updated SSOPodCount alert to check for at least 2 sso pods to allow for scaling of pods
- [INTLY-9132] - Update alertmanager config during upgrade
- Removed unused templates from UPS role
- [INTLY-9048] - Removed CronJobSuspended alert
- [INTLY-8385] - Added SOPs to 5 new alerts in 1.7.0
- [INTLY-8386] - Route RouterMeshConnectivityHealth and RouterMeshUndeliveredHealth to critical receiver