agent
An example program-under-observation instrumented with the OpenTelemetry Java agent.

Overview

The tech stack in this subproject:

  • A program-under-observation
    • This is a fictional "data processing" program written in Java, instrumented with the OpenTelemetry Java agent. A sketch of the idea follows this list.
  • A metrics sink/collector (Telegraf)
    • Telegraf acts as a sink for the metrics pushed by the OpenTelemetry agent. It re-formats the metrics into a format the metrics database accepts and then writes them to the database. In OpenTelemetry terminology, Telegraf is acting as the "collector".
  • A metrics database (InfluxDB)
    • InfluxDB is an open source time series database commonly used for metrics. Prometheus is an even more popular alternative, and there are many vendor options too, like Datadog.
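
The "data processing" work is deliberately simple: it only needs to keep the JVM busy enough that memory and garbage collection metrics are interesting. Below is a minimal sketch of that idea, assuming a plain main class that churns memory on a non-daemon thread. It is illustrative, not the actual source of this subproject.

      public class DataProcessor {

          public static void main(String[] args) {
              // A non-daemon thread keeps the JVM alive indefinitely, giving the
              // OpenTelemetry agent a long-running process to observe.
              new Thread(DataProcessor::processForever, "data-processor").start();
          }

          private static void processForever() {
              var batches = new java.util.ArrayList<byte[]>();
              while (true) {
                  // Allocate a "batch" of data so heap usage and GC activity show
                  // up in the jvm.memory.* metrics reported by the agent.
                  batches.add(new byte[256 * 1024]);
                  if (batches.size() > 100) {
                      batches.clear();
                  }
                  try {
                      Thread.sleep(500);
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                      return;
                  }
              }
          }
      }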

OpenTelemetry defines a protocol and a set of conventions, and it ships many client libraries that implement them for metric creation and metric collection. It does not, however, replace the database or the visualization tool. Remember, OpenTelemetry is not a complete observability stack.
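
For a feel of what those client libraries look like, here is a rough sketch of creating a metric with the OpenTelemetry Java API. This subproject needs none of this code because the agent auto-instruments the JVM; the meter and counter names below are made up for illustration.

      import io.opentelemetry.api.GlobalOpenTelemetry;
      import io.opentelemetry.api.metrics.LongCounter;
      import io.opentelemetry.api.metrics.Meter;

      public class ManualMetricSketch {

          public static void main(String[] args) {
              // The agent (or an SDK you configure yourself) supplies the global
              // OpenTelemetry instance; without one, these calls are no-ops.
              Meter meter = GlobalOpenTelemetry.getMeter("data-processor");
              LongCounter processed = meter.counterBuilder("batches.processed")
                      .setUnit("{batch}")
                      .build();
              processed.add(1);
          }
      }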

While OpenTelemetry covers metrics, logs, and spans, I'm only going to implement a metrics example.

Instructions

Follow these instructions to build and run the example system.

  1. Prerequisites: Java and Docker
    • I used Java 21.
  2. Start infrastructure services
    • docker-compose up
    • This starts Telegraf and InfluxDB.
    • Pay attention to the output of these containers as they run. It's a tricky system to set up, and you'll want to know if there are any errors, like if Telegraf is unable to connect to InfluxDB.
  3. Download the OpenTelemetry Java agent
    • AGENT_URL="https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v2.2.0/opentelemetry-javaagent.jar"
      curl --location --output opentelemetry-javaagent.jar "$AGENT_URL"
    • It's important that you use the --location (-L) flag because the GitHub URL redirects to some CDN URL at https://objects.githubusercontent.com/....
    • Note: in a production codebase, it would be better to handle agent-related things (URL config, downloading the agent, Java options) in the Gradle build. Unfortunately, the Gradle code required (e.g. https://stackoverflow.com/a/20968466) is a bit cryptic and distracting, so it's better to do these steps manually for the sake of clarity and "learning the core concepts" instead of learning Gradle.
  4. Build the program distribution
    • ./gradlew installDist
    • The distribution is in build/install/agent/. Notice the "start script" file at bin/agent. This script is generated by Gradle's built-in application plugin, and it provides extension points for us to customize its behavior. In particular, we'll use the JAVA_OPTS environment variable to set the -javaagent JVM option and instrument our program with the OpenTelemetry Java agent.
  5. Run the program with the agent
    • JAVA_OPTS="-javaagent:$(pwd)/opentelemetry-javaagent.jar -Dotel.javaagent.configuration-file=$(pwd)/open-telemetry.properties" ./build/install/agent/bin/agent
    • The program will run indefinitely and continuously submit OTLP-based metrics data to the Telegraf server.
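    • For reference, here is a sketch of what open-telemetry.properties might contain. The property names are real agent options, but the values (especially the endpoint) are assumptions; check the file in this subproject for the actual configuration.
      # Identify the program in the exported metrics.
      otel.service.name=agent
      # Export only metrics; this example doesn't use traces or logs.
      otel.metrics.exporter=otlp
      otel.traces.exporter=none
      otel.logs.exporter=none
      # Push OTLP data to the local Telegraf listener; export every 10 seconds (milliseconds).
      otel.exporter.otlp.endpoint=http://localhost:4317
      otel.metric.export.interval=10000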
  6. Inspect the metrics in InfluxDB directly
    • Start an influx session inside the InfluxDB container with the following command.
    • docker exec -it agent-influxdb-1 influx -precision rfc3339
    • The influx session may remind you of a SQL session. In it, you can run commands like SHOW DATABASES and SHOW MEASUREMENTS to explore the data. We named our database playground. Connect to it by issuing a use playground command, then execute a show measurements command. Hopefully it shows the metrics that have flowed from our program through Telegraf and into the Influx database, something like the following.
    • $ docker exec -it agent-influxdb-1 influx
      Connected to http://localhost:8086 version 1.8.10
      InfluxDB shell version: 1.8.10
      > use playground
      Using database playground
      > show measurements
      name: measurements
      name
      ----
      jvm.class.count
      jvm.class.loaded
      jvm.class.unloaded
      jvm.cpu.count
      jvm.cpu.recent_utilization
      jvm.cpu.time
      jvm.memory.committed
      jvm.memory.limit
      jvm.memory.used
      jvm.memory.used_after_last_gc
      jvm.thread.count
      queueSize
      
    • Let's inspect the memory usage over time for our "data processing" program. This is captured in the jvm.memory.used metric. See the snippet below for an example. The output shows the heap usage in MiB over time, and it traces the typical sawtooth pattern of allocation followed by garbage collection.
    • > SELECT SUM(gauge) / 1024 / 1024 AS "MiB" FROM "jvm.memory.used" WHERE "jvm.memory.type" = 'heap' GROUP BY time(10s)
      name: jvm.memory.used
      time                 MiB
      ----                 ---
      2024-03-30T19:38:00Z 49.924827575683594
      2024-03-30T19:38:10Z 26.35131072998047
      2024-03-30T19:38:20Z 31.070823669433594
      2024-03-30T19:38:30Z 35.52025604248047
      2024-03-30T19:38:40Z 40.108734130859375
      2024-03-30T19:38:50Z 44.671913146972656
      2024-03-30T19:39:00Z 49.11351776123047
      
  7. Stop the Java program
    • Press Ctrl+C to stop the program from the same terminal window where you ran the program.
  8. Stop the infrastructure services
    • docker-compose down
    • It's important to run a proper down command so that the containers and the Compose-managed network are cleaned up. Otherwise, stale resources can cause confusing errors if you change the Docker Compose file and then try to bring the services back up.

Wish List

General clean-ups, TODOs and things I wish to implement for this project:

  • DONE Scaffold
  • DONE Do some "hello world"-style task in an indefinite loop. We must use a non-daemon thread. Our goal is to observe memory and garbage collection metrics, which are affected by this task.
  • DONE Download and wire in the OpenTelemetry Java agent
  • DONE Enable debug logs for the OpenTelemetry Java agent. I wasn't sure if it was working.
  • DONE Use a properties file. The commandline options are getting too long. With a properties file, you can lay out properties neatly on individual lines, and you can use comments.
  • DONE Set up Telegraf and InfluxDB using Docker Compose
    • DONE Step down to Influx v1. v2 is trouble because of Flux.
  • SKIP (No, auto-instrumenting the JVM is pretty good) Consider upgrading from the "hello world"-style task to a more realistic task. Use some framework/library that is instrumented by the OpenTelemetry Java agent. Maybe as simple as a cron job scheduled with Quartz? My goal is to exercise the instrumentation for a third-party library and see what the quality of the metrics is like (naming, volume, etc.). I'm not even sure what I'm looking for exactly; I'm still learning the "what" of OpenTelemetry.
    • I really want this now because the memory usage is so variable that I need to change the description of the output every time I visit this project. Whereas with something like a Quartz job, I can predict the metric pattern.
  • SKIP (For demo, I'd rather curl it from GitHub) Actually the agent is distributed in Maven Central? See this example project in open-telemetry/opentelemetry-java-examples.
  • DONE Forget Grafana? It's enough to just use the InfluxDB CLI to inspect the data. This is a more direct demo.
  • DONE Slow down the simulated data processing. I want a smoother memory usage line.
  • DONE More control and less verbosity with the memory. Do the same thing I just did in manual-instrumentation/.

Reference

  • OpenTelemetry docs: Automatic Instrumentation
    • This is what I'm using in this project.
  • OpenTelemetry: Semantic Conventions

    The benefit to using Semantic Conventions is in following a common naming scheme that can be standardized across a codebase, libraries, and platforms.

    • This is, to me, the strongest selling point of OpenTelemetry. Yet another specification can easily become "yet another abandoned specification on an ever-accumulating pile of noise", but the sheer weight of OpenTelemetry and its adoption across vendors, libraries, marketing, and mind-share means that this "specification of conventions" has staying power. Good!
  • GitHub repo: influxdata/influxdb-observability

    This repository is a reference for converting observability signals (traces, metrics, logs) to/from a common InfluxDB schema.