Black Monitoring Watcher is a distributed, reactive monitoring system designed to execute API/TCP scenarios, measure latency metrics (DNS, connection, communication time), and push observability data into Grafana Mimir for centralized analysis.
The project is built as a multi-module Spring Boot (WebFlux) application with:
- Cassandra as scenario storage
- Zookeeper for distributed coordination
- Mimir + Grafana for metrics storage and dashboarding
- Batch workers + simulators executing scenarios in parallel
The system executes two types of monitoring scenarios:
- API Scenarios — triggers HTTP requests using WebClient
- TCP Scenarios — measures DNS lookup, TCP connect, and communication times
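The three TCP timings can be separated with plain JDK sockets. The following is a minimal sketch, not the project's actual simulator code: the class `TcpProbeSketch`, its `probe` method, and the choice to define "communication time" as "send a payload and wait for the first reply byte" are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from the project): times the three phases of one TCP check. */
final class TcpProbeSketch {

    /** Timings of a single probe, in milliseconds. */
    static final class Timings {
        final double dnsMs;
        final double connectMs;
        final double communicationMs;

        Timings(double dnsMs, double connectMs, double communicationMs) {
            this.dnsMs = dnsMs;
            this.connectMs = connectMs;
            this.communicationMs = communicationMs;
        }
    }

    static Timings probe(String host, int port, int timeoutMs, String payload) throws IOException {
        // Phase 1: DNS lookup. The answer may come from the JVM or OS resolver cache.
        long dnsStart = System.nanoTime();
        InetAddress address = InetAddress.getByName(host);
        long dnsEnd = System.nanoTime();

        try (Socket socket = new Socket()) {
            // Phase 2: TCP connect against the already-resolved address,
            // so no second lookup is hidden inside the connect time.
            long connectStart = System.nanoTime();
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            long connectEnd = System.nanoTime();

            // Phase 3 (assumed definition): send the payload and wait for the first reply byte.
            socket.setSoTimeout(timeoutMs);
            long commStart = System.nanoTime();
            OutputStream out = socket.getOutputStream();
            out.write(payload.getBytes(StandardCharsets.UTF_8));
            out.flush();
            InputStream in = socket.getInputStream();
            int firstByte = in.read();
            long commEnd = System.nanoTime();
            if (firstByte == -1) {
                throw new IOException("peer closed the connection without replying");
            }

            return new Timings(
                    (dnsEnd - dnsStart) / 1e6,
                    (connectEnd - connectStart) / 1e6,
                    (commEnd - commStart) / 1e6);
        }
    }
}
```

A DNS failure surfaces here as `UnknownHostException`, a connect timeout as `SocketTimeoutException`, and a refused connection as `ConnectException`, which lines up with the failure causes the watcher reports.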
Whenever any monitoring scenario fails due to:
- DNS lookup failure
- HTTP timeout
- HTTP 4xx / 5xx
- TCP connection timeout
- Connection refused
- Unexpected exception
the watcher automatically sends an alert to the api-server at:
POST /api/v1/alert
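As an illustration of what such an alert call could look like, here is a hedged sketch using the JDK's `java.net.http` client types instead of the project's WebClient, so that it stays self-contained. The JSON field names (`serviceName`, `scenarioName`, `failureMessage`), the 5-second timeout, and the class `AlertRequestSketch` are assumptions; only the `/api/v1/alert` path and the api-server port 7080 come from this README.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper (not from the project): builds the alert request a watcher might send. */
final class AlertRequestSketch {

    /** Escapes characters that may not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Assumed payload shape: the field names are illustrative only. */
    static String alertJson(String serviceName, String scenarioName, String failureMessage) {
        return "{\"serviceName\":\"" + jsonEscape(serviceName) + "\","
                + "\"scenarioName\":\"" + jsonEscape(scenarioName) + "\","
                + "\"failureMessage\":\"" + jsonEscape(failureMessage) + "\"}";
    }

    /** Builds (but does not send) a POST to the alert endpoint. */
    static HttpRequest alertRequest(String apiServerBaseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(apiServerBaseUrl + "/api/v1/alert"))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

With the api-server running locally, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, using `http://localhost:7080` as the base URL.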
The api-server then:
- Looks up the service owner’s email from the Cassandra service table (email column).
- Sends an alert email containing:
  - service name
  - scenario name
  - failure message
This enables real‑time failure visibility and rapid operational response.
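The server-side handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the api-server's real code: a `Map` stands in for the Cassandra service table, and the class `AlertEmailSketch`, the subject format, and the body layout are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not from the project): composes the alert email for a failed scenario. */
final class AlertEmailSketch {

    /** Plain value holder for an outgoing email. */
    static final class Email {
        final String to;
        final String subject;
        final String body;

        Email(String to, String subject, String body) {
            this.to = to;
            this.subject = subject;
            this.body = body;
        }
    }

    /**
     * ownerEmailByService is an in-memory stand-in for the Cassandra service table
     * (service name -> email column). Returns empty when the service is unknown.
     */
    static Optional<Email> compose(Map<String, String> ownerEmailByService,
                                   String serviceName,
                                   String scenarioName,
                                   String failureMessage) {
        String to = ownerEmailByService.get(serviceName);
        if (to == null) {
            return Optional.empty();
        }
        // Assumed subject and body layout; the README only lists the three fields.
        String subject = "[ALERT] " + serviceName + " / " + scenarioName;
        String body = "Service: " + serviceName + "\n"
                + "Scenario: " + scenarioName + "\n"
                + "Failure: " + failureMessage + "\n";
        return Optional.of(new Email(to, subject, body));
    }
}
```

Returning an empty `Optional` for an unknown service keeps the "no owner found" case explicit, so the caller can decide whether to log it or fall back to a default recipient.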
The repository includes a full monitoring stack via docker-compose:
Cassandra stores:
- service definitions
- API scenarios
- TCP scenarios
Runs schema initialization via init.cql.
Zookeeper provides:
- coordination
- distributed locks
- scenario partitioning between multiple batch workers
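This README does not describe how scenarios are split between batch workers. One common approach, shown here purely as an assumption, is to hash each scenario id onto a worker index. In a real deployment the worker's index and the worker count would have to come from the coordination layer; in this sketch they are plain parameters, and the class `ScenarioPartitionSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the project): hash-based scenario partitioning. */
final class ScenarioPartitionSketch {

    /** Maps a scenario id to exactly one worker index in [0, workerCount). */
    static int ownerOf(String scenarioId, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(scenarioId.hashCode(), workerCount);
    }

    /** Returns the subset of scenario ids that the given worker should execute. */
    static List<String> scenariosFor(int workerIndex, int workerCount, List<String> scenarioIds) {
        List<String> mine = new ArrayList<>();
        for (String id : scenarioIds) {
            if (ownerOf(id, workerCount) == workerIndex) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

Because every worker evaluates the same pure function, no scenario is executed twice and none is skipped as long as all workers agree on `workerCount`. The trade-off is that changing the worker count reassigns many scenarios at once.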
Mimir stores all metrics pushed from simulators.
UI: http://localhost:10100
Grafana provides dashboard visualization.
UI: http://localhost:3000
docker compose up -d
./gradlew :api-server:bootRun
./gradlew :api-watcher:bootRun
./gradlew :tcp-watcher:bootRun
Modules run on:
- api-server → 7080
- api-watcher → 7010
- tcp-watcher → 7020
docker build -t black-monitoring-api-server:1.0.0 api-server
docker build -t black-monitoring-api-watcher:1.0.0 api-watcher
docker build -t black-monitoring-tcp-watcher:1.0.0 tcp-watcher
Alert flow:
- watcher detects a failure
- sends alert → /api/v1/alert
- api-server retrieves service info from Cassandra
- sends email to the service owner automatically
This ensures immediate notification for degraded or failing scenarios.