manual-instrumentation

An example program-under-observation instrumented with OpenTelemetry using manual instrumentation.

Overview

In some cases, you may use the OpenTelemetry Java agent to instrument your program because it's powerful and requires no changes to the program's source code (the agent is attached at JVM startup via the -javaagent flag). OpenTelemetry refers to this style of instrumentation as Automatic Instrumentation. It's especially useful for third-party programs where you don't have access to the source code. For your own software projects, though, you may want more precise control over the exact dependencies, configuration, and behavior of the OpenTelemetry instrumentation. In this project, we instrument an example program the manual way. Refer to the OpenTelemetry docs on Manual Instrumentation for Java.

In the same spirit of exercising more control, we'll also opt out of auto-configuration and instead configure the OpenTelemetry Java instrumentation directly. To take it a step further, we'll opt out of the OkHttp-based OpenTelemetry sender because we would prefer to use the HTTP client built into the JDK itself: java.net.http.HttpClient. We want to keep our dependencies to a minimum so that our software maintenance burden is low. OkHttp itself brings in a dependency on Okio and the Kotlin standard library and runtime. Read more about the dependencies involved in exporting telemetry data in the Dependencies section of the OpenTelemetry Java instrumentation docs.
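
To make this concrete, here's a minimal sketch of wiring the SDK by hand: an OTLP/HTTP metric exporter pushed through a periodic reader on a 10-second interval. The endpoint and class name are illustrative assumptions, not necessarily this project's exact code. Notably, there is no code to select the JDK-based sender: the exporter discovers a sender on the classpath, so depending on opentelemetry-exporter-sender-jdk (instead of the default opentelemetry-exporter-sender-okhttp) is all it takes. This matches the "Using HttpSender: ...JdkHttpSender" debug log shown later in the instructions.

    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.api.common.AttributeKey;
    import io.opentelemetry.exporter.otlp.http.metrics.OtlpHttpMetricExporter;
    import io.opentelemetry.sdk.OpenTelemetrySdk;
    import io.opentelemetry.sdk.metrics.SdkMeterProvider;
    import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
    import io.opentelemetry.sdk.resources.Resource;

    import java.time.Duration;

    // Hypothetical class name. Configures the OpenTelemetry SDK directly,
    // with no auto-configuration machinery involved.
    public class TelemetrySetup {

        static OpenTelemetry create() {
            // OTLP over HTTP (Protobuf payload). With opentelemetry-exporter-sender-jdk
            // on the classpath, this sends via java.net.http.HttpClient.
            var exporter = OtlpHttpMetricExporter.builder()
                    .setEndpoint("http://localhost:4318/v1/metrics") // illustrative endpoint
                    .build();

            // Export every 10 seconds instead of the 60-second default.
            var reader = PeriodicMetricReader.builder(exporter)
                    .setInterval(Duration.ofSeconds(10))
                    .build();

            // Identify the program, as seen in the exported resource attributes.
            var resource = Resource.getDefault().toBuilder()
                    .put(AttributeKey.stringKey("service.name"), "manual-instrumentation-server")
                    .put(AttributeKey.stringKey("service.version"), "0.1.0")
                    .build();

            var meterProvider = SdkMeterProvider.builder()
                    .setResource(resource)
                    .registerMetricReader(reader)
                    .build();

            return OpenTelemetrySdk.builder()
                    .setMeterProvider(meterProvider)
                    .build();
        }
    }

From there, the program can create its own instruments via openTelemetry.getMeter(...).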

The tech stack in this subproject:

  • A program-under-observation
    • This is a fictional "data processing" program written in Java. This program is instrumented manually with the OpenTelemetry Java instrumentation libraries.
  • An HTTP/Protobuf-OTLP metrics collector (OpenTelemetry Collector)
    • This runs as a Docker container and receives metrics data pushed from the OpenTelemetry instrumentation in the program-under-observation. The OpenTelemetry Collector forwards the metrics data to the Telegraf server using gRPC.
  • A gRPC/Protobuf-OTLP metrics collector and ILP converter/forwarder (Telegraf)
    • This runs as a Docker container and accepts OTLP metrics from the OpenTelemetry Collector via gRPC, and then re-formats the metrics into an acceptable format for the metrics database (Influx Line Protocol) and then writes the metrics into the metrics backend (InfluxDB).
  • A metrics database (InfluxDB)
    • InfluxDB is an open source time series database that's often used for metrics.

Note: The fact that we're using two metrics collectors is silly. We're working around a patchy matrix of technology support (gRPC/HTTP/OTLP/ILP) among a matrix of telemetry and metrics systems (Influx/OpenTelemetry). We want our program-under-observation to be constrained to Protobuf and HTTP. We don't want to pay for gRPC support in our program. Unfortunately, Telegraf's OpenTelemetry receiver only supports gRPC, so we have to use the OpenTelemetry Collector as an intermediary. Relatedly, in the spirit of "keep it simple", check out OpenTelemetry's support for JSON-encoded OTLP data, which is described in the JSON Protobuf Encoding section of the OTLP 1.0 spec. Could we remove the Protobuf dependency from our program-under-observation? We're usually using JSON already. I'd rather send gzipped JSON than pay for the software maintenance of a Protobuf dependency.
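
As a concrete illustration of that workaround, the Collector and Telegraf configs for this shape of pipeline look roughly like the following sketches. These are assumptions about the general shape, not this repo's exact files; endpoints and hostnames are illustrative. First, the OpenTelemetry Collector receives OTLP over HTTP from the program and exports OTLP over gRPC to Telegraf:

    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318  # the program pushes OTLP/HTTP here

    exporters:
      otlp:
        endpoint: telegraf:4317  # forward over gRPC to Telegraf
        tls:
          insecure: true

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [otlp]

And Telegraf accepts the gRPC OTLP data and writes it as Influx Line Protocol to InfluxDB (the playground database is the one we query later in the instructions):

    # Accept OTLP metrics over gRPC from the OpenTelemetry Collector
    [[inputs.opentelemetry]]
      service_address = "0.0.0.0:4317"

    # Write the metrics to InfluxDB using the Influx Line Protocol
    [[outputs.influxdb]]
      urls = ["http://influxdb:8086"]
      database = "playground"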

Instructions

Follow these instructions to build and run the example system.

  1. Prerequisites: Java and Docker
    • I used Java 21.
  2. Start infrastructure services
    • docker-compose up
    • This starts the OpenTelemetry Collector, Telegraf and InfluxDB.
    • Pay attention to the output of these containers as they run. It's a tricky system to set up, and you'll want to know if there are any errors, like if Telegraf is unable to connect to InfluxDB.
  3. Build the program distribution
    • ./gradlew installDist
  4. Run the program
    • ./build/install/manual-instrumentation/bin/manual-instrumentation
    • The program will run indefinitely and continuously submit OTLP-based metrics data to the OpenTelemetry Collector, and it will log metrics to the console. The program output should look something like the following.
    • 17:00:25 [main] INFO dgroomes.manual_instrumentation.Runner - Let's simulate some fictional data processing...
      17:00:25 [main] DEBUG io.opentelemetry.exporter.internal.http.HttpExporterBuilder - Using HttpSender: io.opentelemetry.exporter.sender.jdk.internal.JdkHttpSender
      17:00:25 [main] DEBUG io.opentelemetry.sdk.internal.JavaVersionSpecific - Using the APIs optimized for: Java 9+
      17:00:35 [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - Received a collection of 12 metrics for export.
      17:00:35 [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - metric: ImmutableMetricData{resource=Resource{schemaUrl=null, attributes={service.name="manual-instrumentation-server", service.version="0.1.0", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.36.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.runtime-telemetry-java8, version=2.2.0-alpha, schemaUrl=null, attributes={}}, name=jvm.cpu.time, description=CPU time used by the process as reported by the JVM., unit=s, type=DOUBLE_SUM, data=ImmutableSumData{points=[ImmutableDoublePointData{startEpochNanos=1711922425858898000, epochNanos=1711922435868239000, attributes={}, value=0.669478, exemplars=[]}], monotonic=true, aggregationTemporality=CUMULATIVE}}
      ... other metrics omitted ...
      
  5. Inspect the metrics in InfluxDB directly
    • Start an influx session inside the InfluxDB container with the following command.
    • docker exec -it manual-instrumentation-influxdb-1 influx -precision rfc3339
    • The influx session may remind you of a SQL session. In it, you can run commands like show databases and show measurements to explore the data. We named our database playground, so connect to it by issuing a use playground command. Then execute a show measurements command. Hopefully it shows the following metrics, which have flowed from our program through the OpenTelemetry Collector, then through Telegraf, and finally into the Influx database (these are the JVM runtime metrics registered by the program; see the sketch after these instructions). It should look something like the following.
    • $ docker exec -it manual-instrumentation-influxdb-1 influx
      Connected to http://localhost:8086 version 1.8.10
      InfluxDB shell version: 1.8.10
      > use playground
      Using database playground
      > show measurements
      name: measurements
      name
      ----
      jvm.class.count
      jvm.class.loaded
      jvm.class.unloaded
      jvm.cpu.count
      jvm.cpu.recent_utilization
      jvm.cpu.time
      jvm.gc.duration
      jvm.memory.committed
      jvm.memory.limit
      jvm.memory.used
      jvm.memory.used_after_last_gc
      jvm.thread.count
      
      
    • Let's inspect the memory usage over time for our "data processing" program. This is captured in the jvm.memory.used metric. Look at the snippet below for an example. The output shows the heap usage in MiB over time, and it shows a typical sawtooth pattern: usage climbs as objects accumulate and drops when garbage collection runs.
    • > SELECT SUM(gauge) / 1024 / 1024 AS "MiB" FROM "jvm.memory.used" WHERE "jvm.memory.type" = 'heap' GROUP BY time(10s)
      name: jvm.memory.used
      time                 MiB
      ----                 ---
      2024-03-30T19:29:00Z 15.698493957519531
      2024-03-30T19:29:10Z 25.57617950439453
      2024-03-30T19:29:20Z 35.45075225830078
      2024-03-30T19:29:30Z 11.872085571289062
      2024-03-30T19:29:40Z 21.716766357421875
      2024-03-30T19:29:50Z 30.989227294921875
      2024-03-30T19:30:00Z 41.054840087890625
      
  6. Stop the Java program
    • Press Ctrl+C to stop the program from the same terminal window where you ran the program.
  7. Stop the infrastructure services
    • docker-compose down
    • I think it's important to do a proper down command so that the network is cleaned up. Otherwise, you might experience some weirdness if you change the Docker Compose file and then try to bring the services back up. I'm not really sure, though.
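
For reference, the JVM metrics listed in step 5 (jvm.memory.used, jvm.gc.duration, and so on) come from OpenTelemetry's runtime-telemetry-java8 instrumentation library, the 2.x instrumentation mentioned in the wish list below. Here's a minimal sketch of registering those observers, assuming that library's registerObservers entry points; the class name is hypothetical.

    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.instrumentation.runtimemetrics.java8.Classes;
    import io.opentelemetry.instrumentation.runtimemetrics.java8.Cpu;
    import io.opentelemetry.instrumentation.runtimemetrics.java8.GarbageCollector;
    import io.opentelemetry.instrumentation.runtimemetrics.java8.MemoryPools;
    import io.opentelemetry.instrumentation.runtimemetrics.java8.Threads;

    // Hypothetical class name. Registers observers for the JVM runtime metrics:
    // class counts, CPU time, GC duration, memory pools, and thread counts.
    public class JvmMetrics {

        static void register(OpenTelemetry openTelemetry) {
            // Each call registers asynchronous instruments that the periodic
            // metric reader collects on its export interval.
            Classes.registerObservers(openTelemetry);
            Cpu.registerObservers(openTelemetry);
            GarbageCollector.registerObservers(openTelemetry);
            MemoryPools.registerObservers(openTelemetry);
            Threads.registerObservers(openTelemetry);
        }
    }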

Wish List

General clean-ups, TODOs and things I wish to implement for this project:

  • DONE Scaffold the project by copy/pasting from the agent project, but configure it with the logging exporter because I need to walk before I can run.
  • DONE Export OTLP to Telegraf
    • DONE Darn, the Telegraf OTLP receiver doesn't support the HTTP endpoint for OTLP data, only the gRPC endpoint. I'm going to explore the OpenTelemetry Collector instead.
  • DONE Configure the metrics export interval to 10 seconds instead of the default 60 seconds.
  • DONE Remove the auto-conf dependencies
  • DONE Do we need the semantic conventions dependency declaration? Isn't it already pulled in transitively?
  • DONE (done but there's only one lonesome log?) Get JUL-to-SLF4J working. It's nice to be able to debug the OpenTelemetry instrumentation and it's also nice to use SLF4J because we like it.
  • DONE (Upgraded to 2.x instrumentation) Are we using the legacy metric conventions? We want the 1.0 semantic conventions and I think you actually need to opt in to that.
  • Where is the Protobuf Java implementation shaded? Which of the OpenTelemetry dependencies brings it in?
  • DONE I need more predictable memory usage for the sake of the demo. Consider setting JVM min/max heap, and setting the Garbage collector, explicitly using 64-bit oops, etc.
  • SKIP (No I'll keep it, I just removed the calculations) Consider dropping the data processor stuff. It's a bit distracting. It's enough to report on memory usage because it varies with the natural work done by the OpenTelemetry Java machinery and other work happening in the JVM.
  • DONE Print metrics to the console using the logging exporter. This is convenient for demo purposes. Specifically, I really want to compare OpenTelemetry's toString representation to the Influx Line Protocol representation.

Reference