An example program-under-observation instrumented with OpenTelemetry using manual instrumentation.
In some cases, you may use the OpenTelemetry Java agent to instrument your program because it's powerful and requires no changes to the program's source code. OpenTelemetry refers to this style of instrumentation as Automatic Instrumentation. This is especially useful for third-party programs where you don't have access to the source code. For your own software projects, you may want to exercise more precise control over the exact dependencies, configuration, and behavior of the OpenTelemetry instrumentation. In this project, we instrument an example program the manual way. Refer to the OpenTelemetry docs on Manual Instrumentation for Java.
In the same spirit of exercising more control, we'll also opt out of autoconfiguration and instead configure the OpenTelemetry Java instrumentation directly. To take it a step further, we'll opt out of the OkHttp-based OpenTelemetry sender because we would prefer to use the HTTP client built into the JDK itself: `java.net.http.HttpClient`. We want to keep our dependencies to a minimum so that our software maintenance burden is low. OkHttp itself brings in a dependency on Okio and the Kotlin standard library and runtime. Read more about the dependencies involved in exporting telemetry data in the Dependencies section of the OpenTelemetry Java instrumentation docs.
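Here is a minimal sketch of an OTLP/HTTP metrics request built with `java.net.http.HttpClient`, to show what the JDK's built-in client involves. It is not OpenTelemetry's own sender code (the SDK's `JdkHttpSender` handles this internally). It only illustrates the JDK API surface. The endpoint `http://localhost:4318/v1/metrics` assumes the Collector's OTLP/HTTP receiver listens on the conventional port 4318. The body is assumed to be an already-encoded Protobuf `ExportMetricsServiceRequest`, which this sketch does not produce.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class OtlpHttpPost {
    // Assumption: the Collector's OTLP/HTTP receiver is reachable on the conventional port 4318.
    static final URI METRICS_ENDPOINT = URI.create("http://localhost:4318/v1/metrics");

    // Builds a POST carrying an already-encoded Protobuf payload (assumed; not produced here).
    static HttpRequest buildRequest(byte[] protobufBody) {
        return HttpRequest.newBuilder(METRICS_ENDPOINT)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/x-protobuf")
                .POST(HttpRequest.BodyPublishers.ofByteArray(protobufBody))
                .build();
    }

    // Sends the request and returns the HTTP status code; the response body is discarded.
    static int send(HttpClient client, byte[] protobufBody) throws IOException, InterruptedException {
        HttpResponse<Void> response = client.send(buildRequest(protobufBody), HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // One client can be reused for every export, and it needs no third-party dependencies.
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        HttpRequest request = buildRequest(new byte[0]);
        System.out.println(request.method() + " " + request.uri() + " via " + client.version());
    }
}
```

The `main` method only builds and prints the request. Calling `send` requires the Collector to be running.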
The tech stack in this subproject:
- A program-under-observation
  - This is a fictional "data processing" program written in Java. It is instrumented manually with the OpenTelemetry Java instrumentation libraries.
- An HTTP/Protobuf-OTLP metrics collector (OpenTelemetry Collector)
  - This runs as a Docker container and receives metrics data pushed from the OpenTelemetry instrumentation in the program-under-observation. The OpenTelemetry Collector forwards the metrics data to the Telegraf server using gRPC.
- A gRPC/Protobuf-OTLP metrics collector and ILP converter/forwarder (Telegraf)
  - This runs as a Docker container. It accepts OTLP metrics from the OpenTelemetry Collector via gRPC, converts them into a format the metrics database accepts (Influx Line Protocol), and writes them into the metrics backend (InfluxDB).
- A metrics database (InfluxDB)
  - InfluxDB is an open source time series database that's often used for metrics.
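Here is a minimal sketch that formats one data point as an Influx Line Protocol line (`measurement,tag=value field=value timestamp`), to make Telegraf's conversion step concrete. It follows the InfluxDB 1.x escaping rules for measurements, tag keys and values, and field keys. It is not Telegraf's code, and the exact mapping Telegraf uses is an assumption. The field key `gauge` mirrors the field queried with `SUM(gauge)` in the instructions. The pool name and byte count are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class InfluxLine {
    // Measurement names: escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    static String escapeKey(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Formats one integer field as: measurement,tag1=v1,tag2=v2 field=<value>i <epochNanos>
    static String format(String measurement, Map<String, String> tags,
                         String fieldKey, long value, long epochNanos) {
        StringBuilder line = new StringBuilder(escapeMeasurement(measurement));
        // Tags sorted by key (TreeMap), which InfluxDB recommends for write performance.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            line.append(',').append(escapeKey(tag.getKey()))
                .append('=').append(escapeKey(tag.getValue()));
        }
        // The trailing 'i' marks the field value as an integer.
        line.append(' ').append(escapeKey(fieldKey)).append('=').append(value).append('i');
        line.append(' ').append(epochNanos);
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical heap pool reading; the pool name and byte count are illustrative only.
        System.out.println(format("jvm.memory.used",
                Map.of("jvm.memory.type", "heap", "jvm.memory.pool.name", "G1 Eden Space"),
                "gauge", 16_777_216L, 1711922435868239000L));
    }
}
```

Running it prints `jvm.memory.used,jvm.memory.pool.name=G1\ Eden\ Space,jvm.memory.type=heap gauge=16777216i 1711922435868239000`.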
Note: The fact that we're using two metrics collectors is silly. We're working around a patchy matrix of technology support (gRPC/HTTP/OTLP/ILP) across telemetry and metrics systems (Influx/OpenTelemetry). We want our program-under-observation to be constrained to Protobuf and HTTP, and we don't want to pay for gRPC support in our program. Unfortunately, Telegraf's OpenTelemetry receiver only supports gRPC, so we have to use the OpenTelemetry Collector as an intermediary. Relatedly, in the spirit of "keep it simple", check out OpenTelemetry's support for JSON-encoded OTLP data, which is described in the *JSON Protobuf Encoding* section of the OTLP 1.0 spec. Can we remove the Protobuf dependency from our program-under-observation? We're usually using JSON already. I'd rather send gzipped JSON than pay for the software maintenance of a Protobuf dependency.
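Here is a minimal sketch of the gzipped-JSON idea. It builds a single-gauge OTLP payload and compresses it using nothing but `java.util.zip`. The JSON shape follows the OTLP JSON encoding as I understand it: 64-bit integers such as `timeUnixNano` and `asInt` are written as decimal strings. The scope name, the metric name `example.queue.depth`, and the value are hypothetical. OTLP/HTTP lets a client send such a body with `Content-Type: application/json` and `Content-Encoding: gzip`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class OtlpJsonGzip {
    // Hypothetical single-gauge payload; the scope name, metric name and value are made up.
    static String examplePayload(long timeUnixNano, long value) {
        return """
                {"resourceMetrics":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"manual-instrumentation-server"}}]},\
                "scopeMetrics":[{"scope":{"name":"example-scope"},"metrics":[{"name":"example.queue.depth","unit":"1",\
                "gauge":{"dataPoints":[{"timeUnixNano":"%d","asInt":"%d"}]}}]}]}]}""".formatted(timeUnixNano, value);
    }

    // Compresses UTF-8 JSON into a gzip byte array suitable for a "Content-Encoding: gzip" body.
    static byte[] gzip(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompresses a gzip byte array back into a string (useful for checking the round trip).
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = examplePayload(1711922435868239000L, 42);
        byte[] body = gzip(json);
        System.out.println(json.length() + " chars of JSON -> " + body.length + " gzipped bytes");
    }
}
```

For a payload this small, gzip's header overhead dominates. The savings only show up with realistic batches of metrics.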
Follow these instructions to build and run the example system.
- Pre-requisites: Java and Docker
  - I used Java 21.
- Start infrastructure services
  - `docker-compose up`
  - This starts the OpenTelemetry Collector, Telegraf and InfluxDB.
  - Pay attention to the output of these containers as they run. It's a tricky system to set up, and you'll want to know if there are any errors, like if Telegraf is unable to connect to InfluxDB.
- Build the program distribution
  - `./gradlew installDist`
- Run the program
  - `./build/install/manual-instrumentation/bin/manual-instrumentation`
  - The program will run indefinitely and continuously submit OTLP-based metrics data to the OpenTelemetry Collector, and it will log metrics to the console. The program output should look something like the following.

        17:00:25 [main] INFO dgroomes.manual_instrumentation.Runner - Let's simulate some fictional data processing...
        17:00:25 [main] DEBUG io.opentelemetry.exporter.internal.http.HttpExporterBuilder - Using HttpSender: io.opentelemetry.exporter.sender.jdk.internal.JdkHttpSender
        17:00:25 [main] DEBUG io.opentelemetry.sdk.internal.JavaVersionSpecific - Using the APIs optimized for: Java 9+
        17:00:35 [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - Received a collection of 12 metrics for export.
        17:00:35 [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={service.name="manual-instrumentation-server", service.version="0.1.0", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.36.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.runtime-telemetry-java8, version=2.2.0-alpha, schemaUrl=null, attributes={}}, name=jvm.cpu.time, description=CPU time used by the process as reported by the JVM., unit=s, type=DOUBLE_SUM, data=ImmutableSumData{points=[ImmutableDoublePointData{startEpochNanos=1711922425858898000, epochNanos=1711922435868239000, attributes={}, value=0.669478, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
        ... other metrics omitted ...
- Inspect the metrics in InfluxDB directly
  - Start an `influx` session inside the InfluxDB container with the following command.

        docker exec -it manual-instrumentation-influxdb-1 influx -precision rfc3339

  - The `influx` session may remind you of a SQL session. In it, you can run commands like `show databases` and `show measurements` to explore the data. We named our database `playground`. You should be able to connect to it by issuing a `use playground` command. Then, execute a `show measurements` command. It should list the metrics that have flowed from our program through the OpenTelemetry Collector, then through Telegraf, and finally into the Influx database, something like the following.

        $ docker exec -it manual-instrumentation-influxdb-1 influx
        Connected to http://localhost:8086 version 1.8.10
        InfluxDB shell version: 1.8.10
        > use playground
        Using database playground
        > show measurements
        name: measurements
        name
        ----
        jvm.class.count
        jvm.class.loaded
        jvm.class.unloaded
        jvm.cpu.count
        jvm.cpu.recent_utilization
        jvm.cpu.time
        jvm.gc.duration
        jvm.memory.committed
        jvm.memory.limit
        jvm.memory.used
        jvm.memory.used_after_last_gc
        jvm.thread.count

  - Let's inspect the memory usage over time for our "data processing" program. This is captured in the `jvm.memory.used` metric. The query below adds up the heap-type `jvm.memory.used` points in each 10-second bucket and converts bytes to MiB. The output shows a typical sawtooth pattern: heap usage climbs by roughly 10 MiB every 10 seconds, then drops sharply, presumably when the garbage collector reclaims memory.

        > SELECT SUM(gauge) / 1024 / 1024 AS "MiB" FROM "jvm.memory.used" WHERE "jvm.memory.type" = 'heap' GROUP BY time(10s)
        name: jvm.memory.used
        time                 MiB
        ----                 ---
        2024-03-30T19:29:00Z 15.698493957519531
        2024-03-30T19:29:10Z 25.57617950439453
        2024-03-30T19:29:20Z 35.45075225830078
        2024-03-30T19:29:30Z 11.872085571289062
        2024-03-30T19:29:40Z 21.716766357421875
        2024-03-30T19:29:50Z 30.989227294921875
        2024-03-30T19:30:00Z 41.054840087890625
- Stop the Java program
  - Press `Ctrl+C` in the same terminal window where you ran the program.
- Stop the infrastructure services
  - `docker-compose down`
  - I think it's important to do a proper `down` command so that the network is cleaned up. Otherwise, you might experience some weirdness if you change the Docker Compose file and then try to bring the services back up. I'm not entirely sure, though.
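Here is a minimal sketch that checks the heap sawtooth from the `jvm.memory.used` query in code. It does two things. First, it sums `used` bytes across the JVM's heap memory pools via `ManagementFactory.getMemoryPoolMXBeans()`. I'm assuming this roughly matches what the runtime metrics library reports per pool, which the query then sums for `jvm.memory.type = 'heap'`. Second, it flags the samples where usage drops, using the MiB values from that query output. Reading each drop as a garbage collection is an interpretation, not something the query proves.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapSawtooth {
    static final double BYTES_PER_MIB = 1024.0 * 1024.0;

    // Sums "used" bytes across all valid heap pools (e.g. eden, survivor, old gen) and converts to MiB.
    static double heapUsedMiB() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                used += pool.getUsage().getUsed();
            }
        }
        return used / BYTES_PER_MIB;
    }

    // Returns the indices where a sample is lower than the one before it: the "teeth" of the sawtooth.
    static List<Integer> drops(double[] series) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < series.length; i++) {
            if (series[i] < series[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // MiB values copied from the InfluxDB query output in the instructions.
        double[] mib = {15.698493957519531, 25.57617950439453, 35.45075225830078,
                11.872085571289062, 21.716766357421875, 30.989227294921875, 41.054840087890625};
        System.out.println("drops at indices " + drops(mib));
        System.out.printf("current heap used: %.1f MiB%n", heapUsedMiB());
    }
}
```

For the sample data, the only drop is at index 3, the 19:29:30 bucket, where usage falls from about 35.5 MiB to about 11.9 MiB.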
General clean-ups, TODOs and things I wish to implement for this project:
- DONE Scaffold the project by copy/pasting from the agent project, but configure it with the logging exporter because I need to walk before I can run.
- DONE Export OTLP to Telegraf
- DONE Darn, the Telegraf OTLP receiver doesn't support the HTTP endpoint for OTLP data, only the gRPC endpoint. I'm going to explore the OpenTelemetry Collector instead.
- DONE Configure the metrics export to every 10 seconds instead of every 60 seconds.
- DONE Remove the auto-conf dependencies
- DONE Do we need the semantic conventions dependency declaration? Isn't it already pulled in transitively?
- DONE (done but there's only one lonesome log?) Get JUL-to-SLF4J working. It's nice to be able to debug the OpenTelemetry instrumentation and it's also nice to use SLF4J because we like it.
- DONE (Upgraded to 2.x instrumentation) Are we using the legacy metric conventions? We want the 1.0 semantic conventions and I think you actually need to opt in to that.
- Where is the Protobuf Java implementation shaded? Which of the OpenTelemetry dependencies brings it in?
- Interesting: https://github.com/open-telemetry/opentelemetry-java/blob/f1deb8ec78cd446bc6310b1528a5d71e1d42989e/exporters/common/src/main/java/io/opentelemetry/exporter/internal/http/HttpExporter.java#L24. HTTP/JSON is implemented in the Java library? Can I wholesale use this instead of gRPC/Protobuf?
- Why is there an `opentelemetry-exporter-common` module and an `opentelemetry-exporter-otlp` module at the same time? Doesn't OpenTelemetry only support exporting OTLP data? Or, I think you can export Jaeger and Prometheus data too (I think those have first-class support). I wish there was an Influx Line Protocol exporter. There has to be one out there somewhere.
- DONE I need more predictable memory usage for the sake of the demo. Consider setting JVM min/max heap, and setting the Garbage collector, explicitly using 64-bit oops, etc.
- SKIP (No I'll keep it, I just removed the calculations) Consider dropping the data processor stuff. It's a bit distracting. It's enough to report on memory usage because it varies with the natural work done by the OpenTelemetry Java machinery and other work happening in the JVM.
- DONE Print metrics to the console using the logging exporter. This is convenient for demo purposes. Specifically, I really want to compare OpenTelemetry's toString representation to the Influx Line Protocol representation.
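The 10-second export interval from the DONE item above can be sanity-checked against the sample log output. Subtract `startEpochNanos` from `epochNanos` in the logged `jvm.cpu.time` point, which this sketch does with `java.time`. One caveat: with `CUMULATIVE` aggregation temporality, `startEpochNanos` stays fixed across exports. The difference therefore approximates the interval only for the first export, which is the one shown in the log.

```java
import java.time.Duration;
import java.time.Instant;

class ExportInterval {
    // Elapsed time between two OpenTelemetry epoch-nanosecond timestamps.
    static Duration between(long startEpochNanos, long epochNanos) {
        return Duration.ofNanos(epochNanos - startEpochNanos);
    }

    // Instant.ofEpochSecond normalizes a large nanosecond adjustment into seconds plus nanos.
    static Instant toInstant(long epochNanos) {
        return Instant.ofEpochSecond(0, epochNanos);
    }

    public static void main(String[] args) {
        long start = 1711922425858898000L; // startEpochNanos from the sample log
        long end = 1711922435868239000L;   // epochNanos from the sample log
        System.out.println(toInstant(start) + " -> " + toInstant(end));
        System.out.println("elapsed: " + between(start, end).toMillis() + " ms");
    }
}
```

The difference is 10,009,341,000 ns, or about 10.01 seconds. That is consistent with the 10-second interval and with the log's 17:00:25 and 17:00:35 timestamps.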
- OpenTelemetry docs: Manual Instrumentation for Java
  - > Manual instrumentation is the act of adding observability code to an app yourself.
- OpenTelemetry JVM Runtime Metrics library
- OpenTelemetry Collector
  - > Vendor-agnostic way to receive, process and export telemetry data.