es-313 Added configuration option for Scrape Interval in Target Allocator (#433)

* changelog

* added scrapeInterval default

* added brief note regarding configuring scrape interval in target allocator

* bump chart version

* version bumped to 91

* mdox fmt otel-integration/k8s-helm/README.md

* mdox fmt otel-integration/CHANGELOG.md

* changelog

* changelog

* updated dependencies

* update target allocator version

* version update
daidokoro authored Aug 7, 2024
1 parent 98db4da commit 4b533e8
Showing 4 changed files with 26 additions and 31 deletions.
8 changes: 5 additions & 3 deletions otel-integration/CHANGELOG.md
@@ -2,6 +2,10 @@

## OpenTelemtry-Integration

+ ### v0.0.94 / 2024-08-07
+ - [Feat] add support for configuring scrape interval for target allocator prometheus custom resource
+ - [CHORE] - Updated target allocator version to 0.105.0 in values.yaml

## v0.0.93 / 2024-08-06
- [Feat] Add more defaults for fleet management preset

@@ -233,9 +237,7 @@
- [FIX] Kubelet Stats use Node IP instead of Node name.

### v0.0.37 / 2023-11-27
- * [:warning: BREAKING CHANGE] [FEATURE] Add support for span metrics preset. This replaces the deprecated `spanmetricsprocessor`
- with `spanmetricsconnector`. The new connector is disabled by default, as opposed the replaces processor.
- To enable it, set `presets.spanMetrics.enabled` to `true`.
+ * [:warning: BREAKING CHANGE] [FEATURE] Add support for span metrics preset. This replaces the deprecated `spanmetricsprocessor` with `spanmetricsconnector`. The new connector is disabled by default, as opposed the replaces processor. To enable it, set `presets.spanMetrics.enabled` to `true`.

### v0.0.36 / 2023-11-15
* [FIX] Change statsd receiver port to 8125 instead of 8127
10 changes: 5 additions & 5 deletions otel-integration/k8s-helm/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v2
name: otel-integration
description: OpenTelemetry Integration
- version: 0.0.93
+ version: 0.0.94
keywords:
- OpenTelemetry Collector
- OpenTelemetry Agent
@@ -11,22 +11,22 @@ keywords:
dependencies:
- name: opentelemetry-collector
alias: opentelemetry-agent
version: "0.88.5"
version: "0.88.6"
repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
condition: opentelemetry-agent.enabled
- name: opentelemetry-collector
alias: opentelemetry-agent-windows
version: "0.88.5"
version: "0.88.6"
repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
condition: opentelemetry-agent-windows.enabled
- name: opentelemetry-collector
alias: opentelemetry-cluster-collector
version: "0.88.5"
version: "0.88.6"
repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
condition: opentelemetry-cluster-collector.enabled
- name: opentelemetry-collector
alias: opentelemetry-gateway
version: "0.88.5"
version: "0.88.6"
repository: https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
condition: opentelemetry-gateway.enabled
sources:
33 changes: 12 additions & 21 deletions otel-integration/k8s-helm/README.md
@@ -158,8 +158,7 @@ type: Opaque
# Installation
- > [!NOTE]
- > With some Helm version (< `v3.14.3`), users might experience multiple warning messages during the installation about following:
+ > [!NOTE] With some Helm version (< `v3.14.3`), users might experience multiple warning messages during the installation about following:
>
> ```
> index.go:366: skipping loading invalid entry for chart "otel-integration" \<version> from \<path>: validation: more than one dependency with name or alias "opentelemetry-collector"
@@ -223,8 +222,7 @@ helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-
--render-subchart-notes -f values-crd-override.yaml --set global.clusterName=<cluster_name> --set global.domain=<domain>
```

- > [!NOTE]
- > Users might experience multiple warning messages during the installation about following:
+ > [!NOTE] Users might experience multiple warning messages during the installation about following:
>
> ```
> Warning: missing the following rules for namespaces: [get,list,watch]
@@ -245,8 +243,7 @@ helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-

This change will configure otel-agent pods to send span data to coralogix-opentelemetry-gateway deployment using [loadbalancing exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/loadbalancingexporter). Make sure to configure enough replicas and resource requests and limits to handle the load. Next, you will need to configure [tail sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor) policies with your custom tail sampling policies.
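
As a loose illustration of the tail sampling policies mentioned above, here is a minimal sketch of a `tail_sampling` processor block. The policy names and thresholds are hypothetical examples, not defaults shipped with this chart, and where exactly the block lands in `values.yaml` depends on the tail sampling values file:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s              # buffer spans this long before deciding per trace
    policies:
      - name: keep-errors           # hypothetical policy: always keep traces containing errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-10-percent     # hypothetical policy: keep 10% of the remaining traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```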

- When running in Openshift make sure to set `distribution: "openshift"` in your `values.yaml`.
- When running in Windows environments, please use `values-windows-tailsampling.yaml` values file.
+ When running in Openshift make sure to set `distribution: "openshift"` in your `values.yaml`. When running in Windows environments, please use `values-windows-tailsampling.yaml` values file.

#### Why am I getting ResourceExhausted errors when using Tail Sampling?

@@ -277,6 +274,8 @@ If you're leveraging the Prometheus Operator custom resources (`ServiceMonitor`

If enabled, the target allocator will be deployed as a separate deployment in the same namespace as the collector. It will be responsible for allocating targets for the agent collector on each node, to scrape targets that reside on the given node (a form of simple sharding). If needed, you can run multiple instances of the target allocator for high availability. This can be achieved by setting the `opentelemetry-agent.targetAllocator.replicas` value to a number greater than 1.

+ You can specify the preferred scrape interval for the Prometheus Custom Resource by setting `opentelemetry-agent.targetAllocator.prometheusCR.scrapeInterval`, the default is `30s`

For more details on Prometheus custom resources and target allocator see the documentation [here](https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#discovery-of-prometheus-custom-resources).
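
A minimal `values.yaml` sketch combining the scrape interval and replica settings described above — the `enabled` flag and the replica count are illustrative assumptions; only the `prometheusCR.scrapeInterval` key and its `30s` default come from this change:

```yaml
opentelemetry-agent:
  targetAllocator:
    enabled: true            # assumed flag; deploys the target allocator alongside the agent
    replicas: 2              # illustrative; >1 gives high availability
    allocationStrategy: "per-node"
    prometheusCR:
      enabled: true
      scrapeInterval: 60s    # overrides the 30s default introduced by this change
```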

### Installing the chart on clusters with mixed operating systems (Linux and Windows)
@@ -318,9 +317,7 @@ GKE Autopilot has limited access to host filesystems, host networking and host p
Notable important differences from regular `otel-integration` are:
- Host metrics receiver is not available, though you still get some metrics about the host through `kubeletstats` receiver.
- Kubernetes Dashboard does not work, due to missing Host Metrics.
- - Host networking and host ports are not available, users need to send tracing spans through
- Kubernetes Service. The Service uses `internalTrafficPolicy: Local`, to send traffic to locally
- running agents.
+ - Host networking and host ports are not available, users need to send tracing spans through Kubernetes Service. The Service uses `internalTrafficPolicy: Local`, to send traffic to locally running agents.
- Log Collection works, but does not store check points. Restarting the agent will collect logs from the beginning.
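
For context on the host-networking bullet above, a rough sketch of a Service using `internalTrafficPolicy: Local` — the name, selector and port are placeholders, since the chart defines the actual Service; only the traffic policy reflects the behaviour described:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-agent                                # placeholder; the chart sets the real name
spec:
  internalTrafficPolicy: Local                    # deliver traffic only to agent pods on the same node
  selector:
    app.kubernetes.io/name: opentelemetry-agent   # placeholder selector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
```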

To install otel-integration to GKE/Autopilot follow these steps:
@@ -369,8 +366,7 @@ Applications can send OTLP Metrics and Jaeger, Zipkin and OTLP traces to the loc

### Example Application environment configuration

- The following code creates a new environment variable (`NODE`) containing the node's IP address and then uses that IP in the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable.
- This ensures that each instrumented pod will send data to the local OTEL collector on the node it is currently running on.
+ The following code creates a new environment variable (`NODE`) containing the node's IP address and then uses that IP in the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable. This ensures that each instrumented pod will send data to the local OTEL collector on the node it is currently running on.

```yaml
env:
@@ -586,19 +582,16 @@ processors:

## Picking the right tracing SDK span processor

- OpenTelemetry tracing SDK supports two strategies to create an application traces, a “SimpleSpanProcessor” and a “BatchSpanProcessor.”
- While the SimpleSpanProcessor submits a span every time a span is finished, the BatchSpanProcessor processes spans in batches, and buffers them until a flush event occurs. Flush events can occur when the buffer is full or when a timeout is reached.
+ OpenTelemetry tracing SDK supports two strategies to create an application traces, a “SimpleSpanProcessor” and a “BatchSpanProcessor.” While the SimpleSpanProcessor submits a span every time a span is finished, the BatchSpanProcessor processes spans in batches, and buffers them until a flush event occurs. Flush events can occur when the buffer is full or when a timeout is reached.

- Picking the right tracing SDK span processor can have an impact on the performance of the collector.
- We switched our SDK span processor from SimpleSpanProcessor to BatchSpanProcessor and noticed a massive performance improvement in the collector:
+ Picking the right tracing SDK span processor can have an impact on the performance of the collector. We switched our SDK span processor from SimpleSpanProcessor to BatchSpanProcessor and noticed a massive performance improvement in the collector:

| Span Processor | Agent Memory Usage | Agent CPU Usage | Latency Samples |
|---------------------|--------------------|-----------------|-----------------|
| SimpleSpanProcessor | 3.7 GB | 0.5 | >1m40s |
| BatchSpanProcessor | 600 MB | 0.02 | >1s <10s |

- In addition, it improved the buffer performance of the collector, when we used the SimpleSpanProcessor, the buffer queues were getting full very quickly,
- and after switching to the BatchSpanProcessor, it stopped becoming full all the time, therefore stopped dropping data.
+ In addition, it improved the buffer performance of the collector, when we used the SimpleSpanProcessor, the buffer queues were getting full very quickly, and after switching to the BatchSpanProcessor, it stopped becoming full all the time, therefore stopped dropping data.

#### Example

@@ -693,15 +686,13 @@ Required settings:
- `mountPath`: specifies the path at which to mount the volume. This should correspond the mount path of your MySQL data volume. Provide this parameter without trailing slash.

Optional settings:
- - `logFilesPath`: specifies which directory to watch for log files. This will typically be the MySQL data directory,
- such as `/var/lib/mysql`. If not specified, the value of `mountPath` will be used.
+ - `logFilesPath`: specifies which directory to watch for log files. This will typically be the MySQL data directory, such as `/var/lib/mysql`. If not specified, the value of `mountPath` will be used.
- `logFilesExtension`: specifies which file extensions to watch for. Defaults to `.log`.

### Common issues

- Metrics collection is failing with error `"Error 1227 (42000): Access denied; you need (at least one of) the PROCESS privilege(s) for this operation"`
- - This error indicates that the database user you provided does not have the required privileges to collect metrics. Provide the `PROCESS` privilege to the user, e.g. by running query
- `GRANT PROCESS ON *.* TO 'user'@'%'`
+ - This error indicates that the database user you provided does not have the required privileges to collect metrics. Provide the `PROCESS` privilege to the user, e.g. by running query `GRANT PROCESS ON *.* TO 'user'@'%'`

### Example preset configuration for single instance

6 changes: 4 additions & 2 deletions otel-integration/k8s-helm/values.yaml
@@ -5,7 +5,7 @@ global:
defaultSubsystemName: "integration"
logLevel: "debug"
collectionInterval: "30s"
version: "0.0.93"
version: "0.0.94"

extensions:
kubernetesDashboard:
@@ -54,9 +54,11 @@ opentelemetry-agent:
allocationStrategy: "per-node"
prometheusCR:
enabled: true
+ # The interval at which the target allocator will scrape the Prometheus server
+ scrapeInterval: 30s
image:
repository: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator
- tag: v0.101.0
+ tag: v0.105.0

# Temporary feature gates to prevent breaking changes. Please see changelog for version 0.0.85 for more information.
command:
