If you wish to access and manage metrics directly, you can do so via the Hawkular Metrics REST API.
The Hawkular Metrics documentation covers how to use the API, but there are a few differences when dealing with the version of Hawkular Metrics configured for use on OpenShift.
Alerting is now part of the Hawkular Metrics distribution, but it currently lacks any integration with other OpenShift services and has no UI elements exposed in the console. If you wish to access the alerting API endpoints directly, you can do so by following the Hawkular Metrics Alerting documentation.
Note
By default only read access is enabled for accessing the Hawkular Metrics REST API. To enable write access, you will need to add the Hawkular admin cluster policy and grant your user the hawkular-metrics-admin role.
To add the Hawkular cluster policy:
$ oc create -f metrics-cluster-role.yaml
To grant the hawkular-metrics-admin role to a user:
$ oadm policy add-role-to-user hawkular-metrics-admin ${USER}
To grant the hawkular-metrics-admin cluster role to a service account:
$ oadm policy add-cluster-role-to-user hawkular-metrics-admin ${SERVICEACCOUNT}
Note that typically the service account used for management is
Hawkular Metrics is a multi-tenant application. It is configured so that a project in OpenShift corresponds to a tenant in Hawkular Metrics. For example, if you are dealing with metrics in the awesome project, these correspond to a tenant named awesome in Hawkular Metrics.
There is one special _system tenant in Hawkular Metrics which corresponds to system or node level metrics. This is where things such as the CPU usage of a node are stored.
The tenant is sent to the Hawkular Metrics server via the Hawkular-Tenant header.
Hawkular Metrics accepts a bearer token from the client and verifies that token with the OpenShift server using a SubjectAccessReview. If the user has proper read privileges for the project, they will be allowed to read the metrics for that project.
For the _system tenant, the user requesting to read from this tenant must have cluster-reader permission.
The bearer token is sent to the Hawkular Metrics server via the Authorization header.
The following examples assume that the Hawkular Metrics route hostname is set to hawkular-metrics.example.com and that we are dealing with metrics from a project named test. Retrieval of the bearer token is outside the scope of these examples.
These are simple examples of how to query data directly from Hawkular Metrics and are by no means an exhaustive list. For more detailed information, please see the Hawkular Metrics documentation.
If you wish to get a list of all metrics available under the test project, you can run the following command:
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool
This will return a list of metrics in the following format:
[
  ...
  },
  {
    "id": "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/limit",
    "tags": {
      "container_base_image": "openshift/origin-metrics-cassandra:latest",
      "container_base_image_description": "User-defined image name that is run inside the container",
      "container_name": "hawkular-cassandra-1",
      "container_name_description": "User-provided name of the container or full container name for system containers",
      "descriptor_name": "cpu/limit",
      "group_id": "hawkular-cassandra-1/cpu/limit",
      "host_id": "192.168.122.1",
      "host_id_description": "Identifier specific to a host. Set by cloud provider or user",
      "hostname": "192.168.122.1",
      "hostname_description": "Hostname where the container ran",
      "labels": "...",
      "labels_description": "Comma-separated list of user-provided labels",
      "namespace_id": "fd5c9c31-8750-11e5-8b09-3c970e88a56f",
      "namespace_id_description": "The UID of namespace of the pod",
      "pod_id": "7186d9dc-9d30-11e5-90d9-3c970e88a56f",
      "pod_id_description": "The unique ID of the pod",
      "pod_name": "hawkular-cassandra-1-yoe3h",
      "pod_name_description": "The name of the pod",
      "pod_namespace": "openshift-infra",
      "pod_namespace_description": "The namespace of the pod",
      "resource_id_description": "Identifier(s) specific to a metric"
    },
    "tenantId": "openshift-infra",
    "type": "gauge"
  },
  {
  ...
]
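If you prefer to script these queries rather than use curl, the same request can be made from Python. The following is only a minimal sketch: it assumes the requests library is installed, that you are logged in with the oc CLI so that oc whoami -t prints a usable bearer token, and that the route hostname matches the examples above.
# Minimal sketch: list all metrics for the "test" project from Python.
# Assumes `oc whoami -t` returns the bearer token for your current session.
import subprocess

import requests

token = subprocess.check_output(["oc", "whoami", "-t"], text=True).strip()

session = requests.Session()
session.headers.update({
    "Authorization": "Bearer " + token,
    "Hawkular-Tenant": "test",   # the OpenShift project is the Hawkular tenant
})

resp = session.get("https://hawkular-metrics.example.com/hawkular/metrics/metrics")
resp.raise_for_status()
for metric in resp.json():
    print(metric["type"], metric["id"])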
As mentioned previously, there is a special tenant called _system which is used to store the metrics coming from the node itself rather than from a particular container.
Hawkular API access for the _system tenant is exactly the same as for any other metrics. Access to this tenant requires cluster-reader or cluster-admin level access.
For instance, taking the example above, to get all the metrics from the _system tenant, you would just need to change the Hawkular-Tenant header to _system:
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: _system" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool
The system level metrics actually contain a few more metrics than those gathered for containers. Specifically, they include network and disk usage metrics.
The following command will list the ids for all the machine level metrics:
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: _system" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool | grep -i \"id\" | grep -i machine
For more information on what each of these metrics contains, please see the Heapster schema page.
You can further narrow down the results you are looking for by querying based on tags.
Accessing all the cpu/usage metrics
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=descriptor_name:cpu/usage | python -m json.tool
Accessing all the cpu/usage metrics in a pod named myPod
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=descriptor_name:cpu/usage,pod_name:myPod | python -m json.tool
Regular Expressions: Accessing all pods where the container_base_image contains test
Regular expressions can also be used in tag queries. The following example will return all metrics where the container_base_image contains test:
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=container_base_image:.*test.* | python -m json.tool
Regular Expressions: Accessing all pods where the container_base_image starts with test/
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=container_base_image:test/.* | python -m json.tool
From the query results above, you will notice that each metric contains an id value. You can use this value to directly access the metric itself and the data it contains.
Accessing the counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage'
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage | python -m json.tool
Accessing the metric data for a counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage' The following command will return the data for the metric for the last 10 minutes, placed into 5 buckets of 2 minutes each.
Note: date -d -10minutes +%s%3N will return a start time 10 minutes ago in milliseconds.
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage/data?buckets=5\&start=`date -d -10minutes +%s%3N` | python -m json.tool
Accessing the metric rate data for a counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage' The following command is the same as the previous one, but it returns rate data instead of the raw data, where the rate data is the delta between successive values rather than the absolute value.
This is useful for graphing CPU usage data, as it gives you the usage between two points in time and not the absolute usage since the start of the container.
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage/rate?buckets=5\&start=`date -d -10minutes +%s%3N` | python -m json.tool
You may want to consolidate metric data across various individual metrics.
Determining the average CPU usage across multiple pods
For the following example, we want to determine the average CPU usage across all containers within the 'openshift-infra' project.
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_namespace:openshift-infra\&buckets=3\&start=`date -d -10minutes +%s%3N` | python -m json.tool
Get the CPU usage for all containers in a pod
Metric data is stored per individual container, but you may want to get metric data based on pods instead of containers.
The following example shows how to get the cpu/usage metric for all containers within a pod named myPod.
Note that since we are looking for the overall usage of the pod, and not just the average usage, we cannot use something like the previous example. For this we need to use the stacked option, which performs individual queries on the requested tags and then adds the resulting buckets together.
So if the tag query matches two metrics, and the average value for the bucket of the first metric is 5 while the average value for the bucket of the second metric is 10, then with stacked=true the returned bucket will have a value of 15.
$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -H "Hawkular-Tenant: test" -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_name:myPod\&stacked=true\&buckets=3\&start=`date -d -10minutes +%s%3N` | python -m json.tool
From the Heapster schema, the cpu/usage metric stores the amount of time, in nanoseconds, that a CPU core has been in use since the container started.
With just this raw data, it is not intuitive what the data really means or what you can do with it.
The following examples take this data, along with the uptime and cpu/limit metrics, and transform it into something more consumable.
Calculating CPU Core Percentage
A more intuitive measurement would be to calculate the CPU core usage.
Note
The calculations are based on the percentage of use for a single CPU core, therefore machines with multiple cores may report values higher than 100%.
We know from the cpu/usage metric how much of the CPU has been used in nanoseconds, but we also have access to the uptime metric, which specifies how long the container has been running in milliseconds.
From this it is a simple calculation to determine the percentage of a CPU core used:
% core = `cpu/usage` / ( `uptime` * 1000000 )
Performing these steps is up to the client. Hawkular Metrics does not currently support this type of calculation directly; the client has to fetch the metric values from Hawkular Metrics and then perform the calculations itself.
To determine the CPU usage between a particular start and stop time requires the following steps:
- usageStart = the cpu/usage at the start time
- usageEnd = the cpu/usage at the end time
- uptimeStart = the uptime at the start time
- uptimeEnd = the uptime at the end time
core_percentage = ( $usageEnd - $usageStart ) / ( ($uptimeEnd - $uptimeStart) * 1000000 )
If you want to do the calculation for the total since the container has been running, then you only need to fetch the cpu/usage and uptime at the end time. You do not need to fetch anything at the start time, since at the start both of the values will be 0.
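The following Python sketch puts the steps above together for a single container: it fetches the raw cpu/usage and uptime data points for the last 10 minutes and applies the formula. The metric id prefix is the example id used earlier in this section and the bearer token is a placeholder; replace both with your own values. It also assumes uptime is stored as a counter alongside cpu/usage; if your deployment stores it as a gauge, change the endpoint accordingly.
# Sketch: compute CPU core usage for one container over the last 10 minutes, using
# core_percentage = (usageEnd - usageStart) / ((uptimeEnd - uptimeStart) * 1000000)
import time
import urllib.parse

import requests

HAWKULAR = "https://hawkular-metrics.example.com/hawkular/metrics"
HEADERS = {
    "Authorization": "Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "Hawkular-Tenant": "test",
}
# <container name>/<pod id> prefix of the metric ids, as shown in the metric listing.
POD_PREFIX = "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f"

def raw_points(metric_id, start_ms):
    # Fetch the raw data points for a counter metric since start_ms and sort
    # them by timestamp so the oldest and newest points can be picked reliably.
    url = HAWKULAR + "/counters/" + urllib.parse.quote(metric_id, safe="") + "/data"
    resp = requests.get(url, headers=HEADERS, params={"start": start_ms})
    resp.raise_for_status()
    points = resp.json() if resp.content else []
    return sorted(points, key=lambda p: p["timestamp"])

start_ms = int(time.time() * 1000) - 10 * 60 * 1000   # 10 minutes ago, in milliseconds

usage = raw_points(POD_PREFIX + "/cpu/usage", start_ms)   # nanoseconds of CPU time
uptime = raw_points(POD_PREFIX + "/uptime", start_ms)     # milliseconds of uptime

usage_start, usage_end = usage[0]["value"], usage[-1]["value"]
uptime_start, uptime_end = uptime[0]["value"], uptime[-1]["value"]

core_percentage = (usage_end - usage_start) / ((uptime_end - uptime_start) * 1000000)
print("core_percentage:", core_percentage)
print("millicores used:", core_percentage * 1000)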
Important
When a container is restarted, its cpu/usage and uptime values restart from zero. If a restart occurs within the time range you are measuring, you need to take this into account in your calculations.
Calculating CPU Millicore Percentage
If you want to convert the percentage of cores used into a millicores-used metric instead, you just need to multiply the percentage by 1000:
millicore_percentage = core_percentage * 1000
Calculating Percentage Usage Based on Limit
It is possible to set a limit on a container for both CPU and memory usage. For limiting based on CPU, you specify the maximum number of millicores that a container is allowed to use.
For example, the following will start up the ruby-hello-world container and limit its CPU usage to 300 millicores:
apiVersion: "v1" kind: "Pod" metadata: name: test spec: containers: - image: openshift/ruby-hello-world name: test resources: limits: cpu: 300m
The cpu/limit metric returns the CPU limit for that container in millicores.
The previous step showed how to calculate the millicores used. To get the percentage of the limit used, you then just need to take that value and divide it by the cpu/limit value:
limit_percentage = ( millicores used ) / ( `cpu/limit` )
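As a final sketch, the millicores-used value from the previous calculation can be combined with the container's cpu/limit to compute limit_percentage. The metric listing earlier in this section shows cpu/limit with type gauge, so this example assumes the gauge data endpoint parallels the counter endpoint used above; the millicores value is hard-coded here for illustration and would come from the previous calculation in practice.
# Sketch: usage relative to the container's CPU limit.
import time
import urllib.parse

import requests

HAWKULAR = "https://hawkular-metrics.example.com/hawkular/metrics"
HEADERS = {
    "Authorization": "Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "Hawkular-Tenant": "test",
}
POD_PREFIX = "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f"

# Latest cpu/limit value (in millicores) over the last 10 minutes.
start_ms = int(time.time() * 1000) - 10 * 60 * 1000
url = HAWKULAR + "/gauges/" + urllib.parse.quote(POD_PREFIX + "/cpu/limit", safe="") + "/data"
resp = requests.get(url, headers=HEADERS, params={"start": start_ms})
resp.raise_for_status()
cpu_limit = sorted(resp.json(), key=lambda p: p["timestamp"])[-1]["value"]

millicores_used = 250.0   # example value; compute it as shown in the previous sketch

limit_percentage = millicores_used / cpu_limit
print("limit_percentage:", limit_percentage)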