Gangams/ci dev (#359)
* add docker.version file

* move installer files from pal directory

* update configure file

* wip

* clean up makefile and installer datafiles

* fix ai import casing

* re add data files

* update makefile

* update make file

* remove super project git references

* remove legacy host agent code

* merge windows code

* add docs, troubleshoot and alerting

* move windows code to source dir

* refactor the scripts

* remove unnecessary files

* fix path certificategenerator.zip

* bring windows code back

* get latest changes from ci_feature

* remove redundant files

* fix liveness probe path issue

* rename the dir names

* clean up

* clean up

* omi dependencies

* remove redundant files

* update release notes

* merge latest changes for 05222020

* removed weird whitespaces

* update readme with windows version

* update readme with windows agent build instructions

* reorganize scripts

* minor update

* update readme

* re-organize the code structure and add build script for windows agent

* update readme

* updates to readme and clean up

* fix build errors

* fix build errors

* fix build errors

* update readme

* update main and setup scripts

* fix path issue in windows docker file

* readme updates

* windows agent path issue

* resolve ai package name conflict

* rename things

* rename file names to have uniform casing

* get rid of glide files

* add version file

* merge latest changes in ci_feature branch

* do go get as part of the build

* doc updates

* read me update

* update docker file

* reorganize the windows code

* reorganize windows code

* update gitignore to not include files under windows

* clean up files

* update readme file

* update to use version info for windows agent

* merge Makefile.common into Makefile

* fix build issues

* fix build error

* add sudo for go commands

* remove go get from makefile

* update to use go1.14.1

* fix bug in windows Makefile script

* fix build error

* remove weird char in makefile.ps1

* readme update

* fix windows build issue

* read me update

* version update to synch with ciprod05262020 release

* add omsagent-ai-res-id yaml

* build with -buildmode=c-shared for out_oms.so for windows agent

* final yaml updates

* use version info for linux go so file

* fix build error

* readme updates

* update readme

* add shell script for installing for linux agent pre-requisites

* take latest release notes

* readme update

* disable monitoring addon script

* pr feedback

* pr feedback

* update disable addon script

* clean up code

* clean up commented code

* pr feedback

* wip

* wip

* refactor scripts to managed

* wip

* cleanup scripts

* update for aks

* add more validation

* Wip

* update powershell script

* add disable monitoring powershell script

* fix bugs with disable script

* fix bugs in ps scripts

* update readme

* re-arrange source code

* re-arrange test code

* update readme

* update path

* move go code under src dir

* update build dependencies to work on wsl

* update readme with code of conduct

* final readme updates

Co-authored-by: root <[email protected]>
ganga1980 and root authored Jun 8, 2020
1 parent 057ce68 commit 7e83b66
Showing 318 changed files with 15,462 additions and 22,758 deletions.
15 changes: 10 additions & 5 deletions .gitignore
@@ -5,9 +5,14 @@
/build/config.mak

# Unit test files

/test/code/providers/TestScriptPath.h
/test/code/providers/providertestutils.cpp
source/code/go/src/plugins/profiling
source/plugins/go/src/profiling
.vscode/launch.json
source/code/go/src/plugins/vendor/
source/plugins/go/src/vendor/
# go shared object files
*.so
*.zip
# .net code build artificats
build/windows/installer/certificategenerator/bin
build/windows/installer/certificategenerator/obj
# files under omsagentwindows dir since this temp directory to build the docker image
kubernetes/windows/omsagentwindows
4 changes: 2 additions & 2 deletions LICENSE
@@ -1,7 +1,7 @@
-Build-Docker-Provider
+Docker-Provider

 Copyright (c) Microsoft Corporation
-All rights reserved.
+All rights reserved.

MIT License

490 changes: 250 additions & 240 deletions README.md

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions Rakefile
@@ -1,9 +1,9 @@
-require 'rake/testtask'
+require "rake/testtask"

 task default: "test"

 Rake::TestTask.new do |task|
-task.libs << "test"
-task.pattern = './test/code/plugin/health/*_spec.rb'
-task.warning = false
-end
+task.libs << "test"
+task.pattern = "./test/unit-tests/plugins/health/*_spec.rb"
+task.warning = false
+end
277 changes: 277 additions & 0 deletions ReleaseNotes.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions alerts/NotReadyQuery.md
@@ -0,0 +1,25 @@
```
let endDateTime = now();
let startDateTime = ago(1h);
let trendBinSize = 1m;
let clusterName = 'YOURCLUSTERNAME';
KubeNodeInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| distinct ClusterName, Computer, TimeGenerated
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterName, Computer
| join hint.strategy=broadcast kind=inner (
KubeNodeInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| summarize TotalCount = count(), ReadyCount = sumif(1, Status contains ('Ready'))
by ClusterName, Computer, bin(TimeGenerated, trendBinSize)
| extend NotReadyCount = TotalCount - ReadyCount
) on ClusterName, Computer, TimeGenerated
| project TimeGenerated,
ClusterName,
Computer,
ReadyCount = todouble(ReadyCount) / ClusterSnapshotCount,
NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount
| order by ClusterName asc, Computer asc, TimeGenerated desc
```
24 changes: 24 additions & 0 deletions alerts/NotReadyQueryChart.md
@@ -0,0 +1,24 @@
```
let endDateTime = now();
let startDateTime = ago(1h);
let trendBinSize = 1m;
let clusterName = 'YOURCLUSTERNAME'; //can remove references for this from the query to show data for all clusters
KubeNodeInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where ClusterName == clusterName
| distinct ClusterName, TimeGenerated
| summarize ClusterSnapshotCount = count() by Timestamp = bin(TimeGenerated, trendBinSize), ClusterName
| join hint.strategy=broadcast (
KubeNodeInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| summarize TotalCount = count(), ReadyCount = sumif(1, Status contains ('Ready'))
by ClusterName, Timestamp = bin(TimeGenerated, trendBinSize)
| extend NotReadyCount = TotalCount - ReadyCount
) on ClusterName, Timestamp
| project Timestamp,
ReadyCount = todouble(ReadyCount) / ClusterSnapshotCount,
NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount
| render timechart
```
33 changes: 33 additions & 0 deletions alerts/PendingPodCount.md
@@ -0,0 +1,33 @@
```
let endDateTime = now();
let startDateTime = ago(1h);
let trendBinSize = 1m;
let clusterName = 'YOURCLUSTERNAME';
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where ClusterName == clusterName
| distinct ClusterName, TimeGenerated
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterName
| join hint.strategy=broadcast (
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus
| summarize TotalCount = count(),
PendingCount = sumif(1, PodStatus =~ 'Pending'),
RunningCount = sumif(1, PodStatus =~ 'Running'),
SucceededCount = sumif(1, PodStatus =~ 'Succeeded'),
FailedCount = sumif(1, PodStatus =~ 'Failed')
by ClusterName, bin(TimeGenerated, trendBinSize)
) on ClusterName, TimeGenerated
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount
| project TimeGenerated,
TotalCount = todouble(TotalCount) / ClusterSnapshotCount,
PendingCount = todouble(PendingCount) / ClusterSnapshotCount,
RunningCount = todouble(RunningCount) / ClusterSnapshotCount,
SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount,
FailedCount = todouble(FailedCount) / ClusterSnapshotCount,
UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount
| summarize AggregatedValue = avg(PendingCount) by bin(TimeGenerated, trendBinSize)
```
26 changes: 26 additions & 0 deletions alerts/README.md
@@ -0,0 +1,26 @@
# How to set up alerts for performance problems in Azure Monitor for containers

Azure Monitor for containers monitors the performance of container workloads deployed to either Azure Container Instances or managed Kubernetes clusters hosted on Azure Kubernetes Service (AKS). To enable alerting, you first need to create alert rules based on Kusto queries. This article explains how to create alert rules and provides sample alerting queries.

### How to create alert rules
For step-by-step instructions on creating alert rules, see [the alert rule documentation](https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-alerts#create-alert-rule).

### Alerting situations (Queries):
- [Node CPU and memory utilization exceeds your defined threshold](https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-alerts#resource-utilization-log-search-queries)
- [Pod CPU or memory utilization within a controller exceeds your defined threshold as compared to the set limit](https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-alerts#resource-utilization-log-search-queries)
- ["NotReady" Status Node counts](NotReadyQuery.md)
- [Pod phase counts (Failed, Pending, Unknown, Running, Succeeded)](PendingPodCount.md)

#### *Note on the queries*
- Replace the placeholder cluster name with the name of your cluster:
```let clusterName = 'YOURCLUSTERNAME';```

- *Alert by Pod Phases:* To alert on a specific pod phase such as Pending, Failed, or Unknown, modify the last line of the query in [Pod phase counts](PendingPodCount.md).
For example, to alert on `FailedCount`:
```| summarize AggregatedValue = avg(FailedCount) by bin(TimeGenerated, trendBinSize) ```

- *View in Chart*: To see the query results as a chart, go to Log Analytics and replace the last line, which starts with ```| summarize ...```, with ```| render timechart```. You can also change the start time and bin size by modifying the following:
```
let startDateTime = startofday(ago(14d));
let trendBinSize = 1d;
```
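
As a rough illustration of what `AggregatedValue` represents in the [Pod phase counts](PendingPodCount.md) query, the Python sketch below reproduces the arithmetic on a few made-up records: pods in the chosen phase are counted per time bin and divided by the number of distinct inventory snapshots in that bin. The function names, the record layout, and the sample data are assumptions for illustration only and are not part of Azure Monitor. The sketch handles a single cluster and ignores the `Computer` column.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bin_start(ts, size):
    """Floor a timestamp to the start of its bin, similar to Kusto's bin()."""
    return EPOCH + ((ts - EPOCH) // size) * size


def average_phase_count(records, phase, bin_size=timedelta(minutes=1)):
    """Average number of pods in `phase` per inventory snapshot, per time bin.

    `records` is a hypothetical iterable of
    (time_generated, pod_uid, pod_status) tuples for one cluster.
    """
    snapshots = defaultdict(set)  # bin -> distinct snapshot timestamps
    matches = defaultdict(int)    # bin -> records in the requested phase
    for ts, pod_uid, status in set(records):  # set() mirrors `distinct`
        b = bin_start(ts, bin_size)
        snapshots[b].add(ts)
        if status.lower() == phase.lower():   # `=~` compares case-insensitively
            matches[b] += 1
    return {b: matches[b] / len(snapshots[b]) for b in sorted(snapshots)}


# Made-up sample: two snapshots in the 12:00 bin, one in the 12:01 bin.
t0 = datetime(2020, 6, 8, 12, 0, 10)
t1 = datetime(2020, 6, 8, 12, 0, 40)
t2 = datetime(2020, 6, 8, 12, 1, 5)
sample = [
    (t0, "pod-a", "Pending"),
    (t0, "pod-b", "Running"),
    (t1, "pod-a", "Pending"),
    (t1, "pod-b", "Pending"),
    (t2, "pod-a", "Running"),
]
result = average_phase_count(sample, "Pending")
for bin_time, value in result.items():
    print(bin_time.isoformat(), value)
```

In this sample the 12:00 bin holds three Pending records across two snapshots, so its value is 1.5, while the 12:01 bin has none and yields 0.0. Passing a different phase name corresponds to swapping `PendingCount` for `FailedCount` in the last line of the query.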