DHIS2 Monitoring

General monitoring infrastructure

The DHIS2 monitoring infrastructure is designed to expose metrics related to the application runtime and other application-related information. Infrastucture related metrics (such as host metrics, Tomcat or Postgres) are not directly exposed by the application monitoring engine and they have to be collected separately.

The DHIS2 monitoring infrastructure is designed to work with Prometheus (https://prometheus.io/). Prometheus is, at its core, a time-series database that scrapes metrics from HTTP endpoints. A time series is a stream of timestamped values that belong to the same metric and the same labels. The labels cause the metrics to be multi-dimensional. Prometheus pulls data from a number of targets (as opposed to a push-based monitoring infrastructure). Prometheus is not an event-based system and this is very different from other time series databases. Prometheus is not designed to catch individual and punctual events in time (such as a service outage for example) but it is designed to gather pre-aggregated metrics about one or more services.

DHIS2 application monitoring

DHIS2 exposes a series of metrics which can be scraped by Prometheus. Currently, the metrics exposed by the application are:

DHIS2 API (response time, number of calls, etc.)
JVM (Heap size, Garbage collection, etc.)
Hibernate (Queries, cache, etc)
C3P0 Database pool
Application uptime
CPU

Metrics are exported by micrometer.io (http://micrometer.io/). Micrometer is a Java based framework that acts as a facade over the instrumentation clients for the most popular monitoring systems. It supports many monitoring engines, including Prometheus.

The complete set of metrics is available from the following API endpoint:

/api/metrics

API Monitoring

API monitoring works by intercepting incoming HTTP requests and records metrics about Spring MVC execution time and results.

The following metrics are exposed for each API endpoint exposed by DHIS2 (/api/*).

name	type	description
`seconds_max`	gauge	a moving window of the maximum recorded value in a client-side configurable interval
`seconds_count`	summary	the number of times the API has been called
`seconds_sum`	summary	the accumulated duration of the API call

Each metric exposes is additionally disaggregated by the following tags:

tag name	description
`method`	the HTTP method (for example, `GET` or `PUT`)
`status`	the numeric HTTP status code (for example, `200`, `201`, `500`)
`uri`	the URI template prior to variable substitution (for example, `/api/analytics/`)
`exception`	the simple name of the exception class thrown (only if an exception is thrown)
`outcome`	request’s outcome based on the status code of the response. `1xx` is INFORMATIONAL, `2xx` is SUCCESS, `3xx` is REDIRECTION, `4xx` CLIENT_ERROR, and `5xx` is SERVER_ERROR

This is an example of metrics for the /29/analytics endpoint:

dhis2_seconds_max{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/29/analytics",} 2.715992829
dhis2_seconds_sum{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/29/analytics",} 15.404253163
dhis2_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/29/analytics",} 40.0

JVM Monitoring

JVM Monitoring exposes a set of metrics related to the JVM used by the application.

name	type	description
`jvm_memory_used_bytes`	gauge	The amount of used memory
`jvm_memory_committed_bytes`	gauge	The amount of memory in bytes that is committed for the Java virtual machine to use
`jvm_memory_max_bytes`	gauge	The maximum amount of memory in bytes that can be used for memory management
`jvm_gc_pause_seconds`	summary	Time spent in GC pause
`jvm_gc_pause_seconds_max`	gauge	Time spent in GC pause
`jvm_gc_max_data_size_bytes`	gauge	Max size of old generation memory pool
`jvm_gc_live_data_size_bytes`	gauge	Size of old generation memory pool after a full GC
`jvm_gc_memory_promoted_bytes_total`	counter	Count of positive increases in the size of the old generation memory pool before GC to after GC
`jvm_gc_memory_allocated_bytes`	counter	Incremented for an increase in the size of the young generation memory pool after one GC to before the next
`jvm_classes_loaded_classes`	gauge	The number of classes that are currently loaded in the Java virtual machine
`jvm_classes_unloaded_classes_total`	counter	The total number of classes unloaded since the Java virtual machine has started execution
`jvm_threads_states_threads`	gauge	The current number of threads having NEW state
`jvm_threads_peak_threads`	gauge	The peak live thread count since the Java virtual machine started or peak was reset
`jvm_threads_live_threads`	gauge	The current number of live threads including both daemon and non-daemon threads
`jvm_threads_daemon_threads`	gauge	The current number of live daemon threads
`jvm_buffer_memory_used_bytes`	gauge	An estimate of the memory that the Java virtual machine is using for this buffer pool
`jvm_buffer_total_capacity_bytes`	gauge	An estimate of the total capacity of the buffers in this pool

Uptime status monitoring

name	type	description
`process_uptime_seconds`	gauge	The uptime of the Java virtual machine
`process_start_time_seconds`	gauge	Start time of the process since unix epoch.

CPU monitoring

name	type	description
`system_load_average_1m`	gauge	The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
`system_cpu_count`	gauge	The number of processors available to the Java virtual machine
`process_cpu_usage`	gauge	The "recent cpu usage" for the Java Virtual Machine process
`system_cpu_usage`	gauge	The "recent cpu usage" for the whole system

C3P0 Connection pool monitoring

name	type	description
`jdbc_connections_idle`	gauge	Number of idle connections
`jdbc_connections_active`	gauge	Number of active connections
`jdbc_connections_max`	gauge	Number of max connections available in the pool
`jdbc_connections_min`	gauge	Number of min connections in the pool

Hibernate monitoring

name	type	description
`hibernate_sessions_open_total`	counter	Sessions opened
`hibernate_sessions_closed_total`	counter	Session closed
`hibernate_transactions_total`	counter	The number of transactions we know to have been successful
`hibernate_transactions_total`	counter	The number of transactions we know to have failed
`hibernate_optimistic_failures_total`	counter	The number of `StaleObjectStateExceptions` that have occurred
`hibernate_flushes_total`	counter	The global number of flushes executed by sessions (either implicit or explicit)
`hibernate_connections_obtained_total`	counter	Get the global number of connections asked by the sessions (the actual number of connections used may be much smaller depending whether you use a connection pool or not)
`hibernate_statements_total`	counter	The number of prepared statements that were acquired
`hibernate_statements_total`	counter	The number of prepared statements that were released
`hibernate_second_level_cache_requests_total`	counter	The number of cacheable entities/collections successfully retrieved from the cache
`hibernate_second_level_cache_requests_total`	counter	The number of cacheable entities/collections not found in the cache and loaded from the database
`hibernate_second_level_cache_puts_total`	counter	The number of cacheable entities/collections put in the cache
`hibernate_entities_deletes_total`	counter	The number of entity deletes
`hibernate_entities_fetches_total`	counter	The number of entity fetches
`hibernate_entities_inserts_total`	counter	The number of entity inserts
`hibernate_entities_loads_total`	counter	The number of entity loads
`hibernate_entities_updates_total`	counter	The number of entity updates
`hibernate_collections_deletes_total`	counter	The number of collection deletes
`hibernate_collections_fetches_total`	counter	The number of collection fetches
`hibernate_collections_loads_total`	counter	The number of collection loads
`hibernate_collections_recreates_total`	counter	The number of collections recreated
`hibernate_collections_updates_total`	counter	The number of collection updates
`hibernate_cache_natural_id_puts_total`	counter	The number of cacheable naturalId lookups put in cache
`hibernate_cache_natural_id_requests_total`	counter	The number of cached naturalId lookups successfully retrieved from cache
`hibernate_cache_natural_id_requests_total`	counter	The number of cached naturalId lookups not found in cache
`hibernate_query_natural_id_executions_max_seconds`	gauge	The maximum query time for naturalId queries executed against the database
`hibernate_query_natural_id_executions_total`	counter	The number of naturalId queries executed against the database
`hibernate_query_executions_total`	counter	The number of executed queries
`hibernate_query_executions_max_seconds`	gauge	The time of the slowest query

DHIS2 Monitoring architecture

The monitoring subsytem revolves around the concept of Micrometer registry. A MeterRegistry collects metrics and exports them in the specified format defined by the chosen monitoring platform. Each supported monitoring system has an implementation of MeterRegistry. Since DHIS2 supports Prometheus, the configuration class org.hisp.dhis.monitoring.prometheus.config.PrometheusMonitoringConfig is responsible for configuring and initializing the PrometheusMeterRegistry.

DHIS2 exports the metrics described above through micrometer exporters, which are configured as Spring @Configuration classes.

All the exporters are located in the dhis-support-system module, under the monitoring package.

The API exporter is configured as a Servlet filter. The webMetricsFilter filter is declared in the web.xml file of the dhis-web-portal module.

Finally, the PrometheusScrapeEndpointController is the controller responsible for exposing the /api/metrics endpoint. The controller simply access the PrometheusMeterRegistry and renders the metrics using the appropriate Prometheus-friendly format.

DHIS2 Monitoring configuration

The monitoring subsystem is disabled by default, and can be enabled by defining a set of properties in the dhis.conf DHIS 2 configuration file.

Each metrics cluster has to be explicitely enabled in order for the metrics to be exported. The metrics can be enabled by setting to on the following configuration keys (the default is off):

key name	value	metrics
`monitoring.api.enabled`	off \| on	API
`monitoring.jvm.enabled`	off \| on	JVM
`monitoring.dbpool.enabled`	off \| on	Connection Pool
`monitoring.hibernate.enabled`	off \| on	Hibernate
`monitoring.uptime.enabled`	off \| on	Uptime
`monitoring.cpu.enabled`	off \| on	CPU

Please note that the Hibernate metrics activates the Hibernate property hibernate.generate_statistics. Hibernate statistics are designed to help debug performance problems and are not supposed to be used on a production environment on a permanent basis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

monitoring.md

monitoring.md

DHIS2 Monitoring

General monitoring infrastructure

DHIS2 application monitoring

API Monitoring

JVM Monitoring

Uptime status monitoring

CPU monitoring

C3P0 Connection pool monitoring

Hibernate monitoring

DHIS2 Monitoring architecture

DHIS2 Monitoring configuration

Files

monitoring.md

Latest commit

History

monitoring.md

File metadata and controls

DHIS2 Monitoring

General monitoring infrastructure

DHIS2 application monitoring

API Monitoring

JVM Monitoring

Uptime status monitoring

CPU monitoring

C3P0 Connection pool monitoring

Hibernate monitoring

DHIS2 Monitoring architecture

DHIS2 Monitoring configuration