Migrate all Datadog metrics to Prometheus (#5471)
**Story card:**
[sc-13408](https://app.shortcut.com/simpledotorg/story/13408/migrate-pending-custom-metrics-to-prometheus-from-datadog)

## Because

We're migrating all our monitoring infrastructure to Prometheus. Custom
metrics that StatsD and Datadog were previously collecting need to be
moved to Prometheus as well.

## This addresses

- Updated the Metrics class to use Prometheus to gather metrics
- Updated the event naming convention from `prefix.event.type` to
`prefix_event{<labels of type>}`
- StatsD supports the metric types `[Gauge, Counter, Timer, Histogram,
Distribution]`, while Prometheus supports only `[Gauge, Counter,
Histogram, Summary]`. The metric types were mapped using [this
exporter](https://github.com/prometheus/statsd_exporter?tab=readme-ov-file#metric-mapping-and-configuration)
as a reference
- Added an `at_exit` hook for the Rake tasks to send metrics before exit
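
The naming change in the list above can be illustrated with a small sketch. `prometheus_sample` is a hypothetical helper written for this illustration and is not part of this PR; it only shows how the variable part of a dotted StatsD event name moves into labels on a single Prometheus metric name.

```ruby
# Hypothetical helper (not from this PR): renders one sample line in the
# Prometheus text format from a prefix, an event name and optional labels.
# Label values are not escaped here, which a real exporter would need to do.
def prometheus_sample(prefix, event, labels = {}, value = 1)
  name = "#{prefix}_#{event}"
  return "#{name} #{value}" if labels.empty?

  rendered = labels.map { |key, val| %(#{key}="#{val}") }.join(",")
  "#{name}{#{rendered}} #{value}"
end

puts prometheus_sample("simple", "appointments_merged", {status: "new"}, 20)
puts prometheus_sample("simple", "twilio_invalid_phone_number_errors")
```

The two printed lines have the same shape as the `simple_appointments_merged` and `simple_twilio_invalid_phone_number_errors` entries under Sample Metrics.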

## Note
- StatsD and Datadog are not being removed in this PR. We still use
Datadog for logging and traces.

## Sample Metrics

```
# HELP simple_reporting_views_refresh_duration_seconds 
# TYPE simple_reporting_views_refresh_duration_seconds gauge
simple_reporting_views_refresh_duration_seconds{view="latest_blood_pressures_per_patient_per_months"} 0.02536500000860542
simple_reporting_views_refresh_duration_seconds{view="latest_blood_pressures_per_patients"} 0.007472000026609749
simple_reporting_views_refresh_duration_seconds{view="blood_pressures_per_facility_per_days"} 0.00413799996022135
simple_reporting_views_refresh_duration_seconds{view="materialized_patient_summaries"} 0.18930999969597906e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_months"} 0.004361000028438866
simple_reporting_views_refresh_duration_seconds{view="reporting_facilities"} 0.18764999986160547e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_patient_blood_pressures"} 0.11729000019840896e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_patient_blood_sugars"} 0.10630999982822686e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_overdue_calls"} 0.004768999991938472
simple_reporting_views_refresh_duration_seconds{view="reporting_patient_visits"} 0.31565000012051314e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_prescriptions"} 0.11693999986164272e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_patient_follow_ups"} 0.007027000014204532
simple_reporting_views_refresh_duration_seconds{view="reporting_patient_states"} 0.10719999996945262e0
simple_reporting_views_refresh_duration_seconds{view="reporting_facility_appointment_scheduled_days"} 0.35069999867118895e-2
simple_reporting_views_refresh_duration_seconds{view="reporting_overdue_patients"} 0.008374999975785613
simple_reporting_views_refresh_duration_seconds{view="reporting_facility_states"} 0.5332110000308603
simple_reporting_views_refresh_duration_seconds{view="reporting_quarterly_facility_states"} 0.21591000026091933e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_facility_daily_follow_ups_and_registrations"} 0.24701999966055155e-1
simple_reporting_views_refresh_duration_seconds{view="reporting_facility_monthly_follow_ups_and_registrations"} 0.06868900003610179
simple_reporting_views_refresh_duration_seconds{view="all"} 0.6937000000034459

# HELP simple_appointments_merged 
# TYPE simple_appointments_merged counter
simple_appointments_merged{status="new"} 20
simple_appointments_merged{status="schema_invalid"} 10
simple_appointments_merged{status="updated"} 21
simple_appointments_merged{status="old"} 1
simple_appointments_merged{status="discarded"} 10

# HELP simple_sync_to_user_operation_duration_seconds 
# TYPE simple_sync_to_user_operation_duration_seconds gauge
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="appointment"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="appointment"} 3.200001083314419e-05
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="bloodpressure"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="bloodpressure"} 2.400000812485814e-05
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="bloodsugar"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="bloodsugar"} 2.700003096833825e-05
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="medicalhistory"} 1.00000761449337e-06
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="medicalhistory"} 2.700003096833825e-05
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="patient"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="patient"} 0.43000036384910345e-4
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="prescriptiondrug"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="prescriptiondrug"} 1.899997005239129e-05
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="callresult"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="callresult"} 0.24999957531690598e-4
simple_sync_to_user_operation_duration_seconds{operation="current_facility_records",model="encounter"} 0.0
simple_sync_to_user_operation_duration_seconds{operation="other_facility_records",model="encounter"} 2.099998528137803e-05

# HELP simple_encounters_merged 
# TYPE simple_encounters_merged counter
simple_encounters_merged{status="new"} 110
simple_encounters_merged{status="updated"} 106
simple_encounters_merged{status="old"} 1
simple_encounters_merged{status="schema_invalid"} 2

# HELP simple_blood_pressures_merged 
# TYPE simple_blood_pressures_merged counter
simple_blood_pressures_merged{status="new"} 75
simple_blood_pressures_merged{status="updated"} 11
simple_blood_pressures_merged{status="old"} 1

# HELP simple_bloodpressures_merged 
# TYPE simple_bloodpressures_merged counter
simple_bloodpressures_merged{status="schema_invalid"} 5

# HELP simple_blood_sugars_merged 
# TYPE simple_blood_sugars_merged counter
simple_blood_sugars_merged{status="new"} 98
simple_blood_sugars_merged{status="updated"} 22
simple_blood_sugars_merged{status="old"} 2

# HELP simple_bloodsugars_merged 
# TYPE simple_bloodsugars_merged counter
simple_bloodsugars_merged{status="schema_invalid"} 12

# HELP simple_medical_histories_merged 
# TYPE simple_medical_histories_merged counter
simple_medical_histories_merged{status="new"} 27
simple_medical_histories_merged{status="updated"} 22
simple_medical_histories_merged{status="old"} 1
simple_medical_histories_merged{status="discarded"} 10

# HELP simple_medicalhistories_merged 
# TYPE simple_medicalhistories_merged counter
simple_medicalhistories_merged{status="schema_invalid"} 5

# HELP simple_patients_merged 
# TYPE simple_patients_merged counter
simple_patients_merged{status="schema_invalid"} 11
simple_patients_merged{status="new"} 34
simple_patients_merged{status="updated"} 27
simple_patients_merged{status="old"} 2
simple_patients_merged{status="discarded"} 2
simple_patients_merged{status="invalid"} 1
simple_patients_merged{status="identical"} 2

# HELP simple_prescription_drugs_merged 
# TYPE simple_prescription_drugs_merged counter
simple_prescription_drugs_merged{status="new"} 41
simple_prescription_drugs_merged{status="updated"} 21
simple_prescription_drugs_merged{status="old"} 1
simple_prescription_drugs_merged{status="discarded"} 10

# HELP simple_prescriptiondrugs_merged 
# TYPE simple_prescriptiondrugs_merged counter
simple_prescriptiondrugs_merged{status="schema_invalid"} 4

# HELP simple_call_results_merged 
# TYPE simple_call_results_merged counter
simple_call_results_merged{status="new"} 25
simple_call_results_merged{status="updated"} 21
simple_call_results_merged{status="old"} 1
simple_call_results_merged{status="discarded"} 10

# HELP simple_callresults_merged 
# TYPE simple_callresults_merged counter
simple_callresults_merged{status="schema_invalid"} 4

# HELP simple_patient_online_lookups 
# TYPE simple_patient_online_lookups counter
simple_patient_online_lookups{retention_type="permanent",current_state_name="Himachal Pradesh",current_user_id="7c2e7f53-4d72-4d47-a54b-6bae2da0b5ea"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="Punjab",current_user_id="b1f8f46d-36e9-42e5-8493-d7b0e9d37cff"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="Himachal Pradesh",current_user_id="ed3314d3-bdbe-4da3-a398-810354d913ca"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="Punjab",current_user_id="24e25370-9463-4cac-bbf7-3bcff876dd12"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="Himachal Pradesh",current_user_id="a824c7ce-76a9-4236-922a-f5b0f50b87b8"} 2
simple_patient_online_lookups{retention_type="permanent",current_state_name="State 1",current_user_id="c2e0ad50-10f9-42da-a10a-b8643363391a"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="State 1",current_user_id="c2e0ad50-10f9-42da-a10a-b8643363391a"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="State 1",current_user_id="f58577f5-d4b8-41a7-9811-4c010ba30f5e"} 2
simple_patient_online_lookups{retention_type="temporary",current_state_name="Karnataka",current_user_id="3ca261da-c652-4207-b069-92bcdbaf77fd"} 1
simple_patient_online_lookups{retention_type="temporary",current_state_name="Karnataka",current_user_id="d1af8a5c-04a2-4718-8d4a-3620337e7035"} 5

# HELP simple_questionnaire_responses_merged 
# TYPE simple_questionnaire_responses_merged counter
simple_questionnaire_responses_merged{status="new"} 18
simple_questionnaire_responses_merged{status="updated"} 16
simple_questionnaire_responses_merged{status="discarded"} 11
simple_questionnaire_responses_merged{status="invalid"} 3

# HELP simple_teleconsultations_merged 
# TYPE simple_teleconsultations_merged counter
simple_teleconsultations_merged{status="schema_invalid"} 4
simple_teleconsultations_merged{status="new"} 17
simple_teleconsultations_merged{status="updated"} 11
simple_teleconsultations_merged{status="old"} 1

# HELP simple_addresses_merged 
# TYPE simple_addresses_merged counter
simple_addresses_merged{status="new"} 35
simple_addresses_merged{status="updated"} 18
simple_addresses_merged{status="old"} 2
simple_addresses_merged{status="discarded"} 1

# HELP simple_patient_phone_numbers_merged 
# TYPE simple_patient_phone_numbers_merged counter
simple_patient_phone_numbers_merged{status="new"} 34
simple_patient_phone_numbers_merged{status="updated"} 18
simple_patient_phone_numbers_merged{status="old"} 2
simple_patient_phone_numbers_merged{status="discarded"} 1

# HELP simple_patient_business_identifiers_merged 
# TYPE simple_patient_business_identifiers_merged counter
simple_patient_business_identifiers_merged{status="new"} 34
simple_patient_business_identifiers_merged{status="updated"} 21
simple_patient_business_identifiers_merged{status="old"} 2
simple_patient_business_identifiers_merged{status="discarded"} 1

# HELP simple_exotel_call_sessions 
# TYPE simple_exotel_call_sessions counter
simple_exotel_call_sessions{call_type="call_attempt",call_status="completed"} 2
simple_exotel_call_sessions{call_type="call_attempt",call_status="unknown"} 1

# HELP simple_twilio_callbacks 
# TYPE simple_twilio_callbacks counter
simple_twilio_callbacks{result="delivered",communication_type="manual_call"} 1
simple_twilio_callbacks{result="sent",communication_type="manual_call"} 4

# HELP simple_questionnaireresponses_merged 
# TYPE simple_questionnaireresponses_merged counter
simple_questionnaireresponses_merged{status="schema_invalid"} 1

# HELP simple_notification_experiments_tasks_duration_seconds 
# TYPE simple_notification_experiments_tasks_duration_seconds gauge
simple_notification_experiments_tasks_duration_seconds{task="enroll_patients"} 0.14685999951325357e-1
simple_notification_experiments_tasks_duration_seconds{task="record_notification_results"} 0.005059999995864928
simple_notification_experiments_tasks_duration_seconds{task="mark_visits"} 0.12293999956455082e-1
simple_notification_experiments_tasks_duration_seconds{task="evict_patients"} 0.10628000018186867e-1
simple_notification_experiments_tasks_duration_seconds{task="monitor"} 0.02180599997518584
simple_notification_experiments_tasks_duration_seconds{task="schedule_notifications"} 0.003771000017877668
simple_notification_experiments_tasks_duration_seconds{task="conduct_daily"} 0.005327999999281019

# HELP simple_dhis2_bangladesh_disaggregated_diabetes_exporter_job_duration_seconds 
# TYPE simple_dhis2_bangladesh_disaggregated_diabetes_exporter_job_duration_seconds gauge
simple_dhis2_bangladesh_disaggregated_diabetes_exporter_job_duration_seconds 0.08753799996338785

# HELP simple_dhis2_bangladesh_disaggregated_hypertension_exporter_job_duration_seconds 
# TYPE simple_dhis2_bangladesh_disaggregated_hypertension_exporter_job_duration_seconds gauge
simple_dhis2_bangladesh_disaggregated_hypertension_exporter_job_duration_seconds 0.08549799997126684

# HELP simple_dhis2_bangladesh_exporter_job_duration_seconds 
# TYPE simple_dhis2_bangladesh_exporter_job_duration_seconds gauge
simple_dhis2_bangladesh_exporter_job_duration_seconds 0.005124000017531216

# HELP simple_dhis2_ethiopia_exporter_job_duration_seconds 
# TYPE simple_dhis2_ethiopia_exporter_job_duration_seconds gauge
simple_dhis2_ethiopia_exporter_job_duration_seconds 0.007929999963380396

# HELP simple_appointments 
# TYPE simple_appointments gauge
simple_appointments 0

# HELP simple_blood_pressures 
# TYPE simple_blood_pressures gauge
simple_blood_pressures 0

# HELP simple_blood_sugars 
# TYPE simple_blood_sugars gauge
simple_blood_sugars 0

# HELP simple_encounters 
# TYPE simple_encounters gauge
simple_encounters 0

# HELP simple_facilities 
# TYPE simple_facilities gauge
simple_facilities 0

# HELP simple_facility_groups 
# TYPE simple_facility_groups gauge
simple_facility_groups 0

# HELP simple_medical_histories 
# TYPE simple_medical_histories gauge
simple_medical_histories 0

# HELP simple_notifications 
# TYPE simple_notifications gauge
simple_notifications 0

# HELP simple_patients 
# TYPE simple_patients gauge
simple_patients 0

# HELP simple_regions 
# TYPE simple_regions gauge
simple_regions 1

# HELP simple_users 
# TYPE simple_users gauge
simple_users 0

# HELP simple_twilio_invalid_phone_number_errors 
# TYPE simple_twilio_invalid_phone_number_errors counter
simple_twilio_invalid_phone_number_errors 1

# HELP simple_http_requests_total Total HTTP requests from web app.
# TYPE simple_http_requests_total counter
simple_http_requests_total{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",status="200"} 1
simple_http_requests_total{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",status="403"} 1
simple_http_requests_total{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html",status="200"} 1
simple_http_requests_total{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html",status="403"} 1
simple_http_requests_total{action="sync_from_user",controller="api/v3/appointments",path="/api/v3/appointments/sync",status="200"} 2
simple_http_requests_total{action="sync_from_user",controller="api/v3/appointments",path="/api/v3/appointments/sync",status="403"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/appointments",path="/api/v3/appointments/sync",status="200"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/appointments",path="/api/v3/appointments/sync",status="403"} 1
simple_http_requests_total{action="sync_from_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync",status="200"} 2
simple_http_requests_total{action="sync_from_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync",status="403"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync",status="200"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync",status="403"} 1
simple_http_requests_total{action="sync_from_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync",status="200"} 2
simple_http_requests_total{action="sync_from_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync",status="403"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync",status="200"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync",status="403"} 1
simple_http_requests_total{action="sync_from_user",controller="api/v3/encounters",path="/api/v3/encounters/sync",status="200"} 2
simple_http_requests_total{action="sync_from_user",controller="api/v3/encounters",path="/api/v3/encounters/sync",status="403"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/encounters",path="/api/v3/encounters/sync",status="200"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/encounters",path="/api/v3/encounters/sync",status="403"} 1
simple_http_requests_total{action="sync_to_user",controller="api/v3/facilities",path="/api/v3/facilities/sync",status="200"} 1

# HELP simple_http_request_duration_seconds Time spent in HTTP reqs in seconds.
# TYPE simple_http_request_duration_seconds summary
simple_http_request_duration_seconds_sum{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics"} 0.08549300004960969
simple_http_request_duration_seconds_count{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics"} 2
simple_http_request_duration_seconds_sum{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html"} 0.15612000005785376
simple_http_request_duration_seconds_count{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html"} 2
simple_http_request_duration_seconds_sum{action="sync_from_user",controller="api/v3/appointments",path="/api/v3/appointments/sync"} 0.16586699994513765
simple_http_request_duration_seconds_count{action="sync_from_user",controller="api/v3/appointments",path="/api/v3/appointments/sync"} 3
simple_http_request_duration_seconds_sum{action="sync_to_user",controller="api/v3/appointments",path="/api/v3/appointments/sync"} 0.04663599998457357
simple_http_request_duration_seconds_count{action="sync_to_user",controller="api/v3/appointments",path="/api/v3/appointments/sync"} 2
simple_http_request_duration_seconds_sum{action="sync_from_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync"} 0.21395599999232218
simple_http_request_duration_seconds_count{action="sync_from_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync"} 3
simple_http_request_duration_seconds_sum{action="sync_to_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync"} 0.037635000015143305
simple_http_request_duration_seconds_count{action="sync_to_user",controller="api/v3/blood_pressures",path="/api/v3/blood_pressures/sync"} 2
simple_http_request_duration_seconds_sum{action="sync_from_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync"} 0.1712660000193864
simple_http_request_duration_seconds_count{action="sync_from_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync"} 3
simple_http_request_duration_seconds_sum{action="sync_to_user",controller="api/v3/blood_sugars",path="/api/v3/blood_sugars/sync"} 0.04341499996371567

# HELP simple_http_request_redis_duration_seconds Time spent in HTTP reqs in Redis, in seconds.
# TYPE simple_http_request_redis_duration_seconds summary


# HELP simple_http_request_sql_duration_seconds Time spent in HTTP reqs in SQL in seconds.
# TYPE simple_http_request_sql_duration_seconds summary
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",quantile="0.99"} 0.013451999810058624
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",quantile="0.9"} 0.013451999810058624
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",quantile="0.5"} 0.00032599997939541936
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",quantile="0.1"} 0.00032599997939541936
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics",quantile="0.01"} 0.00032599997939541936
simple_http_request_sql_duration_seconds_sum{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics"} 0.013777999789454043
simple_http_request_sql_duration_seconds_count{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics"} 2
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html",quantile="0.99"} 0.014554999943356961
simple_http_request_sql_duration_seconds{action="show",controller="api/v3/analytics/user_analytics",path="/api/v3/analytics/user_analytics.html",quantile="0.9"} 0.014554999943356961

# HELP simple_http_request_queue_duration_seconds Time spent queueing the request in load balancer in seconds.
# TYPE simple_http_request_queue_duration_seconds summary


# HELP simple_heap_free_slots Free ruby heap slots.
# TYPE simple_heap_free_slots gauge
simple_heap_free_slots{type="master",pid="79047",hostname="localhost"} 583
simple_heap_free_slots{type="master",pid="21412",hostname="localhost"} 491

# HELP simple_heap_live_slots Used ruby heap slots.
# TYPE simple_heap_live_slots gauge
simple_heap_live_slots{type="master",pid="79047",hostname="localhost"} 2212296
simple_heap_live_slots{type="master",pid="21412",hostname="localhost"} 573422

# HELP simple_rss Total RSS used by process.
# TYPE simple_rss gauge
simple_rss{type="master",pid="79047",hostname="localhost"} 0
simple_rss{type="master",pid="21412",hostname="localhost"} 0

# HELP simple_major_gc_ops_total Major GC operations by process.
# TYPE simple_major_gc_ops_total counter
simple_major_gc_ops_total{type="master",pid="79047",hostname="localhost"} 92
simple_major_gc_ops_total{type="master",pid="21412",hostname="localhost"} 9

# HELP simple_minor_gc_ops_total Minor GC operations by process.
# TYPE simple_minor_gc_ops_total counter
simple_minor_gc_ops_total{type="master",pid="79047",hostname="localhost"} 1316
simple_minor_gc_ops_total{type="master",pid="21412",hostname="localhost"} 64

# HELP simple_allocated_objects_total Total number of allocated objects by process.
# TYPE simple_allocated_objects_total counter
simple_allocated_objects_total{type="master",pid="79047",hostname="localhost"} 584250078
simple_allocated_objects_total{type="master",pid="21412",hostname="localhost"} 7399199
```
danySam authored Oct 1, 2024
1 parent ff176a9 commit 69afefb
Showing 35 changed files with 163 additions and 147 deletions.
16 changes: 16 additions & 0 deletions Rakefile
@@ -8,4 +8,20 @@ def is_running_migration?
Rake.application.top_level_tasks.include?("db:migrate")
end

task :after_hook do
at_exit do
# Rake tasks often exit before the data is sent to the Prometheus
# collector.
# This hook waits up to 10 seconds for the queue to empty, then
# closes the socket.
PrometheusExporter::Client.default.stop(wait_timeout_seconds: 10)
end
end

tasks = Rake.application.tasks
tasks.each do |task|
next if [Rake::Task["after_hook"]].include?(task)
task.enhance([:after_hook])
end

Rake::Task["db:schema:load"].enhance [:support_pg_extensions_in_heroku]
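
Why the Rakefile hook is useful can be seen in a minimal sketch. `BufferedClient` below is a hypothetical stand-in for a client that sends metrics from a background thread; it is not the `prometheus_exporter` implementation, and only the `stop(wait_timeout_seconds:)` call shape is taken from the diff above.

```ruby
# Hypothetical stand-in for a metrics client that sends from a background
# thread. This is an illustration, not the prometheus_exporter client.
class BufferedClient
  attr_reader :sent

  def initialize
    @queue = Queue.new
    @sent = []
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (item = @queue.pop)
        sleep 0.01 # simulate network latency
        @sent << item
      end
    end
  end

  def send_metric(name)
    @queue << name
  end

  # Stops accepting new items and waits up to wait_timeout_seconds for the
  # worker to send whatever is still queued.
  def stop(wait_timeout_seconds: 0)
    @queue.close
    @worker.join(wait_timeout_seconds)
  end
end

client = BufferedClient.new
3.times { |i| client.send_metric("metric_#{i}") }

# Without this call a short-lived process could exit while items are still
# queued, which is the situation the at_exit hook guards against.
client.stop(wait_timeout_seconds: 10)
puts client.sent.length
```

In a Rake task the same `stop` call would sit inside an `at_exit` block, as in the diff, so it runs once the task body has finished.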
3 changes: 1 addition & 2 deletions app/controllers/api/v3/exotel_call_sessions_controller.rb
@@ -69,8 +69,7 @@ def respond_in_plain_text(status, text = "")
end

def report_call_info
Statsd.instance.increment("#{controller_name}.call_type.#{call_type}")
Statsd.instance.increment("#{controller_name}.call_status.#{call_status}")
Metrics.increment("exotel_call_sessions", {call_type: call_type, call_status: call_status})
end

def schedule_call_log_job(user_phone_number, callee_phone_number)
7 changes: 1 addition & 6 deletions app/controllers/api/v3/twilio_sms_delivery_controller.rb
@@ -9,18 +9,13 @@ def create
twilio_message.update(update_params)

communication_type = twilio_message.communication.communication_type
event = [communication_type, twilio_message.result].join(".")
metrics.increment(event)
Metrics.increment("twilio_callbacks", {result: twilio_message.result, communication_type: communication_type})

head :ok
end

private

def metrics
@metrics ||= Metrics.with_prefix("twilio_callback")
end

def update_params
details = {result: message_status}
details[:delivered_on] = DateTime.current if message_status == TwilioSmsDeliveryDetail.results[:delivered]
2 changes: 1 addition & 1 deletion app/controllers/api/v4/patients_controller.rb
@@ -17,7 +17,7 @@ def lookup
json: Oj.dump({
patients: patients.map do |patient|
retention = retention(patient)
Statsd.instance.increment("OnlineLookup.#{retention[:type]}", tags: [current_state.name, current_user.id])
Metrics.increment("patient_online_lookups", {retention_type: retention[:type], current_state_name: current_state.name, current_user_id: current_user.id})
Api::V4::PatientLookupTransformer.to_response(patient, retention)
end
}, mode: :compat),
2 changes: 1 addition & 1 deletion app/controllers/concerns/api/v3/sync_to_user.rb
@@ -90,7 +90,7 @@ def sync_region_modified?
def time(method_name, &block)
raise ArgumentError, "You must supply a block" unless block

Statsd.instance.time("#{method_name}.#{model.name}") do
Metrics.benchmark_and_gauge("sync_to_user_operation_duration_seconds", {operation: method_name, model: model.name.downcase}) do
yield(block)
end
end
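
The body of `Metrics.benchmark_and_gauge` is not shown in this part of the diff. As a rough sketch of what such a helper could look like, assuming it times the block and stores the duration in seconds as a gauge value, consider the following. `MetricsSketch` and its in-memory `GAUGES` hash are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch of a benchmark-and-gauge helper. The real
# Metrics.benchmark_and_gauge may differ; this version keeps values in memory.
module MetricsSketch
  GAUGES = {}

  # Runs the block, stores its duration in seconds under [name, labels],
  # and returns the block's own result. Nothing is recorded if the block
  # raises.
  def self.benchmark_and_gauge(name, labels = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    GAUGES[[name, labels]] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    result
  end
end

labels = {operation: "current_facility_records", model: "patient"}
records = MetricsSketch.benchmark_and_gauge("sync_to_user_operation_duration_seconds", labels) do
  [1, 2, 3] # stands in for the records returned by the timed method
end

puts records.inspect
puts MetricsSketch::GAUGES.keys.inspect
```

A gauge keeps only the most recent observation per label set, which would be consistent with the single value per `operation`/`model` pair in the sample output.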
6 changes: 3 additions & 3 deletions app/jobs/appointment_notification/worker.rb
@@ -21,15 +21,15 @@ def scheduled?(notification)
return true if notification.status_scheduled?

Rails.logger.info "skipping notification #{notification.id}, scheduled already"
Statsd.instance.increment("notifications.skipped.not_scheduled")
Metrics.increment("notifications_skipped", {reason: "not_scheduled"})
end

def flipper_enabled?
Statsd.instance.increment("notifications.attempts")
Metrics.increment("notifications_attempts")
return true if Flipper.enabled?(:notifications) || Flipper.enabled?(:experiment)

Rails.logger.warn "notifications or experiment feature flag are disabled"
Statsd.instance.increment("notifications.skipped.feature_disabled")
Metrics.increment("notifications_skipped", {reason: "feature_disabled"})
false
end
end
3 changes: 2 additions & 1 deletion app/jobs/dhis2/dhis2_exporter_job.rb
@@ -19,7 +19,8 @@ def initialize
end

def perform(facility_identifier_id, total_months)
Statsd.instance.time("#{self.class}.#{__method__}") do
label_prefix = self.class.name.underscore.tr("/", "_")
Metrics.benchmark_and_gauge("#{label_prefix}_duration_seconds") do
facility_identifier = FacilityBusinessIdentifier.find(facility_identifier_id)
region = Region.find_by!(source_id: facility_identifier.facility_id)
periods = last_n_month_periods(total_months)
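
How `label_prefix` turns a job class name into a metric name can be traced with a short sketch. The `underscore` method below is a simplified stand-in for ActiveSupport's `String#underscore` so the example runs without Rails, and the class name is an assumption inferred from the sample metrics rather than taken from the diff.

```ruby
# Simplified stand-in for ActiveSupport's String#underscore so the sketch
# runs without Rails. It ignores acronym inflection rules.
def underscore(camel_cased)
  camel_cased
    .gsub("::", "/")
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')
    .downcase
end

# Assumed class name, inferred from the sample metrics; not shown in the diff.
class_name = "Dhis2::BangladeshExporterJob"

# Same transformation as in the diff: underscore, then replace "/" with "_".
label_prefix = underscore(class_name).tr("/", "_")
puts "#{label_prefix}_duration_seconds"
```

This prints `dhis2_bangladesh_exporter_job_duration_seconds`; the `simple_` prefix seen in the sample output is presumably added elsewhere, for example by the `Metrics` class or the exporter configuration.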
2 changes: 1 addition & 1 deletion app/jobs/request_otp_sms_job.rb
@@ -22,7 +22,7 @@ def handle_twilio_errors(user, &block)
rescue Messaging::Twilio::Error => error
if error.reason == :invalid_phone_number
Rails.logger.warn("OTP to #{user.id} failed because of an invalid phone number")
Statsd.instance.increment("twilio.errors.invalid_phone_number")
Metrics.increment("twilio_invalid_phone_number_errors")
false
else
raise error
2 changes: 1 addition & 1 deletion app/jobs/tracer_job.rb
@@ -5,7 +5,7 @@ class TracerJob
sidekiq_options retry: false # job will be discarded if it fails

def perform(submitted_at, raise_error)
Statsd.instance.increment("tracer_job.count")
Metrics.increment("tracer_jobs")
if raise_error
raise Admin::ErrorTracesController::Boom, "Error trace triggered via sidekiq!"
end
2 changes: 1 addition & 1 deletion app/models/concerns/mergeable.rb
@@ -56,7 +56,7 @@ def existing_record(attributes)
end

def increment_metric(event)
Statsd.instance.increment("merge.#{self}.#{event}")
Metrics.increment("#{table_name}_merged", {status: event})
end

def discarded_record(record)
9 changes: 3 additions & 6 deletions app/models/experimentation/notifications_experiment.rb
@@ -215,15 +215,12 @@ def cancel
def self.time(method_name, &block)
raise ArgumentError, "You must supply a block" unless block

label = "#{name}.#{method_name}"

benchmark(label) do
Statsd.instance.time(label) do
event = "notification_experiments_tasks_duration_seconds"
benchmark(event) do
Metrics.benchmark_and_gauge(event, {task: method_name}) do
yield(block)
end
end

Statsd.instance.flush # The metric is not sent to datadog until the buffer is full, hence we explicitly flush.
end

delegate :time, to: self
6 changes: 3 additions & 3 deletions app/services/duplicate_passport_analytics.rb
@@ -46,7 +46,7 @@ def report
DEFAULT_REPORTABLE_METRICS.values.each do |fn_name|
dupe_count = public_send(fn_name).size
log "#{fn_name} are #{dupe_count}"
gauge "#{fn_name}.size", dupe_count
gauge fn_name, dupe_count
end

legacy_report
Expand All @@ -58,7 +58,7 @@ def legacy_report
Rails.logger.info msg: "#{duplicate_passports_across_facilities.size} passports have duplicate patients across facilities"
end

Statsd.instance.gauge("ReportDuplicatePassports.size", duplicate_passports_across_facilities.size)
Metrics.gauge("duplicate_passports_across_facilities", duplicate_passports_across_facilities.size)
end

def trend(metrics, since, step)
@@ -230,7 +230,7 @@ def log(msg)
end

def gauge(stat, value)
Statsd.instance.gauge("#{self.class.name}.#{stat}", value)
Metrics.gauge(stat.to_s, value)
end

# rubygems implements levenshtein_distance for guessing typos, we can reuse it
3 changes: 2 additions & 1 deletion app/services/merge_patient_service.rb
@@ -118,6 +118,7 @@ def existing_patient
end

def log_update_discarded_patient
Statsd.instance.increment("#{self.class}.update_discarded_patient")
Metrics.increment("discarded_patients_updated_total", {},
"Total number of patients who were discarded and then their record was updated. This is a rare scenario, uptick should be investigated.")
end
end
9 changes: 3 additions & 6 deletions app/services/messaging/channel.rb
@@ -6,11 +6,8 @@ class Messaging::Channel
# to handle known errors properly.

def initialize
@metrics = Metrics.with_object(self)
end

attr_reader :metrics

# The channel implementation is responsible for creating a Communication
# and delivery details. This should return the communication object that was created.
def self.send_message(...)
@@ -31,14 +28,14 @@ def send_message(**opts, &with_communication_do)
end

def track_metrics(&block)
metrics.increment("#{self.class.communication_type}.attempts")
Metrics.increment("#{self.class.communication_type}_attempts")

begin
response = yield block
metrics.increment("#{self.class.communication_type}.send")
Metrics.increment("#{self.class.communication_type}_sent")
response
rescue Messaging::Error => exception
metrics.increment("#{self.class.communication_type}.errors")
Metrics.increment("#{self.class.communication_type}_errors")
raise exception
end
end
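
The `track_metrics` change above keeps the same counting pattern under new metric names: one increment per attempt, then either a "sent" or an "errors" increment. A self-contained sketch of that pattern, where `CounterSketch` and `SketchError` are hypothetical stand-ins for `Metrics` and `Messaging::Error`, and `"sms"` is an assumed communication type:

```ruby
# Hypothetical in-memory counter store; stands in for the project's Metrics class.
module CounterSketch
  COUNTS = Hash.new(0)

  def self.increment(event)
    COUNTS[event] += 1
  end
end

class SketchError < StandardError; end # stands in for Messaging::Error

# Counts every attempt, then either a successful send or an error.
# The error is re-raised so callers still see the failure.
def track_metrics(communication_type)
  CounterSketch.increment("#{communication_type}_attempts")
  response = yield
  CounterSketch.increment("#{communication_type}_sent")
  response
rescue SketchError
  CounterSketch.increment("#{communication_type}_errors")
  raise
end

track_metrics("sms") { :delivered }
begin
  track_metrics("sms") { raise SketchError, "provider rejected the message" }
rescue SketchError
  # Expected: the failure is counted and then propagated to the caller.
end
```

After these two calls the sketch holds two attempts, one send, and one error, so attempts always equals sends plus errors for the rescued error class.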
8 changes: 4 additions & 4 deletions app/services/notification_dispatch_service.rb
@@ -50,7 +50,7 @@ def messaging_channel_data

def log_success
communication_type = messaging_channel.communication_type
Statsd.instance.increment("notifications.sent.#{communication_type}")
Metrics.increment("notifications_sent", {communication_type: communication_type})
Rails.logger.info("notification #{notification.id} communication_type=#{communication_type} sent")
end

@@ -69,18 +69,18 @@ def handle_messaging_errors(&block)
def cancel_no_mobile_notification
notification.status_cancelled!
Rails.logger.info "skipping notification #{notification.id}, patient #{notification.patient_id} does not have a mobile number"
Statsd.instance.increment("notifications.skipped.no_mobile_number")
Metrics.increment("notifications_skipped", {reason: "no_mobile_number"})
end

def cancel_invalid_number_notification
notification.status_cancelled!
Rails.logger.warn("notification #{notification.id} cancelled because of an invalid phone number")
Statsd.instance.increment("notifications.skipped.invalid_phone_number")
Metrics.increment("notifications_skipped", {reason: "invalid_phone_number"})
end

def cancel_no_reference_notification(reason)
notification.status_cancelled!
Rails.logger.warn("notification #{notification.id} cancelled. #{reason}")
Statsd.instance.increment("notifications.skipped.no_reference")
Metrics.increment("notifications_skipped", {reason: "no_reference"})
end
end
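
In this file the variable part of each dotted StatsD name (`notifications.skipped.no_mobile_number`) moves into a label (`notifications_skipped` with `reason: "no_mobile_number"`). To illustrate how such a sample would look in the Prometheus text exposition format, here is a small sketch. `exposition_line` is a hypothetical helper written for this example, the sample values are arbitrary, and any prefix the `Metrics` class may add is left out:

```ruby
# Formats one sample in the Prometheus text exposition style:
#   name{label="value",...} value
# Label values are escaped for backslash, double quote, and newline.
def exposition_line(name, labels, value)
  return "#{name} #{value}" if labels.empty?

  pairs = labels.map do |key, val|
    escaped = val.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
    %(#{key}="#{escaped}")
  end
  "#{name}{#{pairs.join(",")}} #{value}"
end

puts exposition_line("notifications_skipped", {reason: "no_mobile_number"}, 3)
puts exposition_line("notifications_skipped", {reason: "invalid_phone_number"}, 1)
puts exposition_line("notifications_sent", {communication_type: "sms"}, 12)
```

With labels, all skip reasons share one metric name, so they can be summed or filtered by `reason` in a query instead of being matched by name pattern.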
8 changes: 4 additions & 4 deletions app/services/patient_deduplication/stats.rb
@@ -2,10 +2,10 @@ module PatientDeduplication
module Stats
class << self
def report(trigger_type, processed, merged, failures)
opts = {tags: [trigger_type]}
Statsd.instance.count("PatientDeduplication.total_processed", processed, opts)
Statsd.instance.count("PatientDeduplication.total_merged", merged, opts)
Statsd.instance.count("PatientDeduplication.total_failures", failures, opts)
opts = {trigger_type: trigger_type}
Metrics.gauge("patient_deduplications_processed_total", processed, opts)
Metrics.gauge("patient_deduplications_merged_total", merged, opts)
Metrics.gauge("patient_deduplications_failures_total", failures, opts)
end
end
end
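
This hunk replaces `Statsd.instance.count` with `Metrics.gauge`. Assuming `Metrics.gauge` behaves like a Prometheus gauge (the last value written wins) rather than a counter (values accumulate), each `report` call would replace the previous figure for its `trigger_type` instead of adding to it. A minimal sketch of that difference, using hypothetical in-memory classes and an invented `:manual` trigger type:

```ruby
# Hypothetical in-memory metric types; illustration only.
class SketchGauge
  def initialize
    @values = {}
  end

  # A gauge keeps only the most recent value per label set.
  def set(value, labels = {})
    @values[labels] = value
  end

  def value(labels = {})
    @values[labels]
  end
end

class SketchCounter
  def initialize
    @values = Hash.new(0)
  end

  # A counter adds each observation to a running total per label set.
  def add(value, labels = {})
    @values[labels] += value
  end

  def value(labels = {})
    @values[labels]
  end
end

gauge = SketchGauge.new
counter = SketchCounter.new
[5, 7].each do |processed|
  gauge.set(processed, {trigger_type: :manual})
  counter.add(processed, {trigger_type: :manual})
end

puts gauge.value({trigger_type: :manual})
puts counter.value({trigger_type: :manual})
```

After two reports of 5 and 7 processed records, the sketch gauge reads 7 and the sketch counter reads 12.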
14 changes: 4 additions & 10 deletions app/services/record_counter.rb
@@ -17,12 +17,6 @@ def self.call
new.call
end

attr_reader :metrics

def initialize
@metrics ||= Metrics.with_prefix("counts")
end

def call
count_totals
count_per_region_totals
@@ -32,22 +26,22 @@ def call

def count_totals
MODELS_TO_COUNT.each do |model|
metrics.gauge(model.to_s, model.count)
Metrics.gauge(model.table_name, model.count)
end
end

def count_per_region_totals
Region.district_regions.find_each do |district|
count = district.facilities.count
metrics.histogram("facilities_per_district", count)
Metrics.histogram("facilities_per_district", count)
end
Region.facility_regions.find_each do |facility|
count = facility.assigned_patients.count
metrics.histogram("assigned_patients_per_facility", count)
Metrics.histogram("assigned_patients_per_facility", count)
end
Region.block_regions.find_each.each do |block|
count = block.assigned_patients.count
metrics.histogram("assigned_patients_per_block", count)
Metrics.histogram("assigned_patients_per_block", count)
end
end
end
15 changes: 4 additions & 11 deletions app/services/refresh_reporting_views.rb
@@ -109,22 +109,15 @@ def refresh
klass = name.constantize
klass.refresh
end
Statsd.instance.flush
end
end

def benchmark_and_statsd(operation)
name = "refresh_reporting_views.#{operation}"
view = operation == "all" ? "all" : operation.constantize.table_name
name = "reporting_views_refresh_duration_seconds"
result = nil
ms = Benchmark.ms do
Datadog::Tracing.trace("refresh_matview", resource: operation) do |span|
result = yield
end
end
Statsd.instance.timing(name, ms)
if Flipper.enabled?(:prometheus_metrics)
Prometheus.register(:gauge, name) unless Prometheus.exists?(name)
Prometheus.observe(name, ms)
Metrics.benchmark_and_gauge(name, {view: view}) do
result = yield
end
result
end
2 changes: 1 addition & 1 deletion app/services/runner_trace.rb
@@ -12,7 +12,7 @@ def initialize
end

def call
Statsd.instance.increment("runner_trace.count")
Metrics.increment("runner_traces_total")
logger.info msg: "about to raise an error",
sentry_debug_info: sentry_debug_info
raise Error, "Runner trace error"
2 changes: 1 addition & 1 deletion app/validators/api/v3/payload_validator.rb
@@ -34,6 +34,6 @@ def check_invalid?
end

def track_invalid
Statsd.instance.increment("merge.#{model_name}.schema_invalid")
Metrics.increment("#{model_name.underscore.pluralize}_merged", {status: :schema_invalid})
end
end
21 changes: 0 additions & 21 deletions config/initializers/prometheus.rb
@@ -3,27 +3,6 @@

Dir.glob(Rails.root.join("lib", "prometheus_middleware", "**", "*.rb")).sort.each { |f| require f }

CLIENT = PrometheusExporter::Client.default
REGISTERED_COLLECTORS = {}

class Prometheus
attr_reader :client, :registered_collectors

def self.register(type, name, description = nil)
raise "collector: #{name} is already registered" if exists?(name)
REGISTERED_COLLECTORS[name] = CLIENT.register(type, name, description)
end

def self.observe(name, value, labels = {})
raise "collector: #{name} is not registered" unless exists?(name)
REGISTERED_COLLECTORS[name].observe(value, labels)
end

def self.exists?(name)
REGISTERED_COLLECTORS.has_key?(name)
end
end

if Rails.env.production?
# This reports stats per request like HTTP status and timings
Rails.application.middleware.unshift SimplePrometheusMiddleware

0 comments on commit 69afefb