From f7c90f12d4c844088c5e8346e3c0c25853c523f7 Mon Sep 17 00:00:00 2001 From: Kevin Chu Date: Thu, 11 Sep 2025 10:52:47 -0700 Subject: [PATCH 1/6] Add telemetry requirements --- specs/telemetry.md | 78 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 specs/telemetry.md diff --git a/specs/telemetry.md b/specs/telemetry.md new file mode 100644 index 00000000..a94816ae --- /dev/null +++ b/specs/telemetry.md @@ -0,0 +1,78 @@ +# Apollo MCP Server Telemetry Spec + +| Category | Metric / Trace / Event | Type | Attributes | Notes | Priority | +|---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------| +| **Configuration** | `apollo_mcp.config.load_success` | Counter | error_type | Successful config / startup loads | Must Have | +| | `apollo_mcp.config.load_failure` | Counter | error_type | Failed startup (bad schema, manifest, endpoint) | Must Have | +| | `apollo_mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | +| | `apollo_mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | +| | `apollo_mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | +| | `apollo_mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | Must Have | +| | `apollo_mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | Should Have | +| | `apollo_mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | Should Have | +| | `apollo_mcp.schema.source` | Attribute/Event | uplink, 
local_file, introspection | Where schema was loaded from | Must Have | +| | `apollo_mcp.schema.load_success` / `apollo_mcp.schema.load_failure` | Counter | schema_source | Schema load status | Must Have | +| | `apollo_mcp.schema.size` | Gauge | — | # of types/fields in schema | Should Have | +| | `apollo_mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | Must Have | +| **Usage** | `apollo.mcp.calls` | Counter | tool_name, success, error_code, client_type | Total tool invocations | Must Have | +| | `apollo.mcp.calls.latency` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | Must Have | +| | `apollo.mcp.operation.calls` | Counter | tool_name, success, error_code, client_type, operation_name | # of backend GraphQL operations executed | Must Have | +| | `apollo.mcp.operation.latency` | Histogram | tool_name, success, error_code, client_type, operation_name | Latency of GraphQL backend call (excludes tool overhead) | Must Have | +| | `apollo_mcp.operation.type.mix` | Counter | query, mutation, subscription | Breakdown of operation types | Should Have | +| | `apollo_mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | Should Have | +| | `apollo_mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | Nice to Have | +| | `apollo_mcp.clients.active` | Gauge | — | # of active MCP clients | Must Have | +| | `apollo_mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | Should Have | +| **Errors / Reliability** | `apollo_mcp.requests.errors` | Counter | error_type, tool_name, client_type | Failed tool calls (generic catch-all) | Must Have | +| | `apollo_mcp.graphql.backend.errors` | Counter | status_code, operation_name | Errors from 
upstream GraphQL API | Must Have | +| | `apollo_mcp.transport.errors` | Counter | error_type | Invalid JSON-RPC, dropped connections | Should Have | +| | `apollo_mcp.auth.failures` | Counter | reason, client_type | Authentication failures | Must Have | +| | `apollo_mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | Must Have | +| **Traces** | Span: `mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | Must Have | +| | Span: `graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | Must Have | +| | Span: `serialization` | Trace | size_bytes, latency | Encoding/decoding JSON-RPC overhead | Nice to Have | +| **Events** | `apollo_mcp.client.connected` | Event | client_type | Client connection established | Should Have | +| | `apollo_mcp.client.disconnected` | Event | client_type | Client disconnected | Should Have | +| | `apollo_mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | Nice to Have | +| | `apollo_mcp.auth.failed` | Event | client_type, reason | Auth failure | Must Have | +| **Resource Usage** | `process.cpu.usage` | Gauge | — | CPU usage of MCP server process | Nice to Have | +| | `process.memory.usage` | Gauge | — | Memory usage | Nice to Have | +| | `network.bytes.sent` / `network.bytes.received` | Counter | — | Network traffic | Nice to Have | + +## Implementation Notes + +### Client Identification Usage +**`client_type` only:** +- Direct client interactions: calls, operation calls/latency, response size, token estimation, timeouts +- Error analysis: request errors, auth failures +- Connection events: client connected/disconnected, auth failed +- Purpose: Analyze client behavior patterns and identify client-specific issues + +**No client identification:** +- Server configuration: config loads, tool registration, schema info, version info +- System metrics: CPU, memory, 
network, active clients, concurrency +- Backend operations: GraphQL backend errors, operation type mix, transport errors +- Traces: tool invocation spans, GraphQL operations, serialization +- Purpose: Server-wide metrics and request-level tracing independent of client behavior + +### Client Identification Implementation +- **`client_type`**: Static client identifier derived from User-Agent header or configuration + - Examples: `"claude"`, `"chatgpt"`, `"vscode"`, `"custom"`, `"unknown"` + - Used for understanding client behavior patterns and performance differences + - No PII concerns - represents client software type, not individual users + - Optional: Use `"unknown"` if client type cannot be determined or for privacy + +### Privacy & Retention +- Client identification is optional - use `"unknown"` if privacy concerns exist +- No PII concerns with `client_type` - it represents software, not users +- Ensure compliance with local data protection regulations + +### Token & Cost Estimation +- **Real-time**: Use `apollo_mcp.responses.characters` for fast proxy estimation + - Rule of thumb: 1 token ≈ 3-4 characters for most content + - No performance impact - just `response.length` +- **Offline/Optional**: For precise token counts, run tokenization in background jobs + - Sample a subset of responses (e.g., 1-10%) to avoid performance impact + - Use established tokenizers (tiktoken for OpenAI models, similar for others) + - Store results separately from real-time metrics + - Actual token counts will vary by model and tokenizer From a6e61ff04a4e71806ca3253e90916a15074ed5c6 Mon Sep 17 00:00:00 2001 From: Kevin Chu Date: Thu, 11 Sep 2025 13:02:31 -0700 Subject: [PATCH 2/6] Add note about configuration metrics --- specs/telemetry.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/specs/telemetry.md b/specs/telemetry.md index a94816ae..752efc68 100644 --- a/specs/telemetry.md +++ b/specs/telemetry.md @@ -76,3 +76,6 @@ - Use established tokenizers (tiktoken for OpenAI 
models, similar for others) - Store results separately from real-time metrics - Actual token counts will vary by model and tokenizer + +### Configuration Metrics +- Probably useful only for Apollo From 907a414d27b4166bddc6dc9160441dc5504da49f Mon Sep 17 00:00:00 2001 From: Samuel Collard Date: Tue, 16 Sep 2025 15:55:11 -0500 Subject: [PATCH 3/6] apollo_mcp to apollo.mcp, latency to duration, singular call --- specs/telemetry.md | 63 +++++++++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 32 deletions(-) diff --git a/specs/telemetry.md b/specs/telemetry.md index 752efc68..eed4437a 100644 --- a/specs/telemetry.md +++ b/specs/telemetry.md @@ -2,39 +2,38 @@ | Category | Metric / Trace / Event | Type | Attributes | Notes | Priority | |---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------| -| **Configuration** | `apollo_mcp.config.load_success` | Counter | error_type | Successful config / startup loads | Must Have | -| | `apollo_mcp.config.load_failure` | Counter | error_type | Failed startup (bad schema, manifest, endpoint) | Must Have | -| | `apollo_mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | -| | `apollo_mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | -| | `apollo_mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | -| | `apollo_mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | Must Have | -| | `apollo_mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | Should Have | -| | 
`apollo_mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | Should Have | -| | `apollo_mcp.schema.source` | Attribute/Event | uplink, local_file, introspection | Where schema was loaded from | Must Have | -| | `apollo_mcp.schema.load_success` / `apollo_mcp.schema.load_failure` | Counter | schema_source | Schema load status | Must Have | -| | `apollo_mcp.schema.size` | Gauge | — | # of types/fields in schema | Should Have | -| | `apollo_mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | Must Have | -| **Usage** | `apollo.mcp.calls` | Counter | tool_name, success, error_code, client_type | Total tool invocations | Must Have | -| | `apollo.mcp.calls.latency` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | Must Have | -| | `apollo.mcp.operation.calls` | Counter | tool_name, success, error_code, client_type, operation_name | # of backend GraphQL operations executed | Must Have | -| | `apollo.mcp.operation.latency` | Histogram | tool_name, success, error_code, client_type, operation_name | Latency of GraphQL backend call (excludes tool overhead) | Must Have | -| | `apollo_mcp.operation.type.mix` | Counter | query, mutation, subscription | Breakdown of operation types | Should Have | -| | `apollo_mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | Should Have | -| | `apollo_mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | Nice to Have | -| | `apollo_mcp.clients.active` | Gauge | — | # of active MCP clients | Must Have | -| | `apollo_mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | Should Have | -| **Errors / Reliability** | 
`apollo_mcp.requests.errors` | Counter | error_type, tool_name, client_type | Failed tool calls (generic catch-all) | Must Have | -| | `apollo_mcp.graphql.backend.errors` | Counter | status_code, operation_name | Errors from upstream GraphQL API | Must Have | -| | `apollo_mcp.transport.errors` | Counter | error_type | Invalid JSON-RPC, dropped connections | Should Have | -| | `apollo_mcp.auth.failures` | Counter | reason, client_type | Authentication failures | Must Have | -| | `apollo_mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | Must Have | -| **Traces** | Span: `mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | Must Have | -| | Span: `graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | Must Have | +| **Configuration** | `apollo.mcp.config.load.count. ` | Counter | success | config / startup loads | Must Have | +| | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | +| | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | +| | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | +| | `apollo.mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | Must Have | +| | `apollo.mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | Should Have | +| | `apollo.mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | Should Have | +| | `apollo.mcp.schema.source` | Attribute/Event | uplink, local_file, introspection | Where schema was loaded from | Must Have | +| | `apollo.mcp.schema.load` | Counter | schema_source, success | Schema load status 
| Must Have | +| | `apollo.mcp.schema.size` | Gauge | — | # of types/fields in schema | Should Have | +| | `apollo.mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | Must Have | +| **Usage** | `apollo.mcp.tool.call.count` | Counter | tool_name, success, error_code, client_type | Total tool invocations | Must Have | +| | `apollo.mcp.tool.call.duration` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | Must Have | +| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name | # of backend GraphQL operations executed | Must Have | +| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, operation_name | Latency of GraphQL backend call (excludes tool overhead) | Must Have | +| | `apollo.mcp.operation.type.mix` | Counter | query, mutation, subscription | Breakdown of operation types | Should Have | +| | `apollo.mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | Should Have | +| | `apollo.mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | Nice to Have | +| | `apollo.mcp.clients.active` | Gauge | — | # of active MCP clients | Must Have | +| | `apollo.mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | Should Have | +| **Errors / Reliability** | `apollo.mcp.requests.errors` | Counter | error_type, tool_name, client_type | Failed tool calls (generic catch-all) | Must Have | +| | `apollo.mcp.graphql.backend.errors` | Counter | status_code, operation_name | Errors from upstream GraphQL API | Must Have | +| | `apollo.mcp.transport.errors` | Counter | error_type | Invalid JSON-RPC, dropped connections | Should Have | +| | 
`apollo.mcp.auth.failures` | Counter | reason, client_type | Authentication failures | Must Have | +| | `apollo.mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | Must Have | +| **Traces** | Span: `apollo.mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | Must Have | +| | Span: `apollo.mcp.graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | Must Have | | | Span: `serialization` | Trace | size_bytes, latency | Encoding/decoding JSON-RPC overhead | Nice to Have | -| **Events** | `apollo_mcp.client.connected` | Event | client_type | Client connection established | Should Have | -| | `apollo_mcp.client.disconnected` | Event | client_type | Client disconnected | Should Have | -| | `apollo_mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | Nice to Have | -| | `apollo_mcp.auth.failed` | Event | client_type, reason | Auth failure | Must Have | +| **Events** | `apollo.mcp.client.connected` | Event | client_type | Client connection established | Should Have | +| | `apollo.mcp.client.disconnected` | Event | client_type | Client disconnected | Should Have | +| | `apollo.mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | Nice to Have | +| | `apollo.mcp.auth.failed` | Event | client_type, reason | Auth failure | Must Have | | **Resource Usage** | `process.cpu.usage` | Gauge | — | CPU usage of MCP server process | Nice to Have | | | `process.memory.usage` | Gauge | — | Memory usage | Nice to Have | | | `network.bytes.sent` / `network.bytes.received` | Counter | — | Network traffic | Nice to Have | From 3e65ee3e00d64d368f317b3a25f32a26b85ca67f Mon Sep 17 00:00:00 2001 From: Samuel Collard Date: Tue, 16 Sep 2025 20:04:43 -0500 Subject: [PATCH 4/6] Fix typo, add http server metrics --- specs/telemetry.md | 20 ++++++++++---------- 1 file 
changed, 10 insertions(+), 10 deletions(-) diff --git a/specs/telemetry.md b/specs/telemetry.md index eed4437a..9278eee7 100644 --- a/specs/telemetry.md +++ b/specs/telemetry.md @@ -2,7 +2,7 @@ | Category | Metric / Trace / Event | Type | Attributes | Notes | Priority | |---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------| -| **Configuration** | `apollo.mcp.config.load.count. ` | Counter | success | config / startup loads | Must Have | +| **Configuration** | `apollo.mcp.config.load`. | Counter | success | config / startup loads | Must Have | | | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | | | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | | | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | @@ -15,16 +15,12 @@ | | `apollo.mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | Must Have | | **Usage** | `apollo.mcp.tool.call.count` | Counter | tool_name, success, error_code, client_type | Total tool invocations | Must Have | | | `apollo.mcp.tool.call.duration` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | Must Have | -| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name | # of backend GraphQL operations executed | Must Have | -| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, 
operation_name | Latency of GraphQL backend call (excludes tool overhead) | Must Have | -| | `apollo.mcp.operation.type.mix` | Counter | query, mutation, subscription | Breakdown of operation types | Should Have | +| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name, operation_type | # of backend GraphQL operations executed | Must Have | +| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, operation_name, operation_type | Latency of GraphQL backend call (excludes tool overhead) | Must Have | | | `apollo.mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | Should Have | | | `apollo.mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | Nice to Have | | | `apollo.mcp.clients.active` | Gauge | — | # of active MCP clients | Must Have | | | `apollo.mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | Should Have | -| **Errors / Reliability** | `apollo.mcp.requests.errors` | Counter | error_type, tool_name, client_type | Failed tool calls (generic catch-all) | Must Have | -| | `apollo.mcp.graphql.backend.errors` | Counter | status_code, operation_name | Errors from upstream GraphQL API | Must Have | -| | `apollo.mcp.transport.errors` | Counter | error_type | Invalid JSON-RPC, dropped connections | Should Have | | | `apollo.mcp.auth.failures` | Counter | reason, client_type | Authentication failures | Must Have | | | `apollo.mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | Must Have | | **Traces** | Span: `apollo.mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | Must Have | @@ -34,9 +30,13 @@ | | `apollo.mcp.client.disconnected` | Event | client_type | Client disconnected | Should Have | | | `apollo.mcp.config.reload` | Event | schema_source, 
version_hash | Config/schema/manifest/collection reload | Nice to Have | | | `apollo.mcp.auth.failed` | Event | client_type, reason | Auth failure | Must Have | -| **Resource Usage** | `process.cpu.usage` | Gauge | — | CPU usage of MCP server process | Nice to Have | -| | `process.memory.usage` | Gauge | — | Memory usage | Nice to Have | -| | `network.bytes.sent` / `network.bytes.received` | Counter | — | Network traffic | Nice to Have | +| **HTTP Metrics** | `http.server.request.duration` | Histogram | — | Duration of HTTP server requests. | Nice to Have | +| | `http.server.active_requests` | Counter | — | Number of active HTTP server requests. | Nice to Have | +| | `http.server.request.body.size` | Histogram | — | Size of HTTP server request bodies. | Nice to Have | +| | `http.server.response.body.size` | Histogram | — | Size of HTTP server response bodies. | Nice to Have | + + + ## Implementation Notes From 6524521e8d945970d65fb1f9ab3cdcad3b65e55f Mon Sep 17 00:00:00 2001 From: Samuel Collard Date: Tue, 16 Sep 2025 20:06:51 -0500 Subject: [PATCH 5/6] typo: Extra period --- specs/telemetry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/telemetry.md b/specs/telemetry.md index 9278eee7..0ad9a8ff 100644 --- a/specs/telemetry.md +++ b/specs/telemetry.md @@ -2,7 +2,7 @@ | Category | Metric / Trace / Event | Type | Attributes | Notes | Priority | |---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------| -| **Configuration** | `apollo.mcp.config.load`. 
| Counter | success | config / startup loads | Must Have | +| **Configuration** | `apollo.mcp.config.load` | Counter | success | config / startup loads | Must Have | | | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | | | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | | | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | From f1bf72e2d189e3f17da8cafd89619bb1cde11cfa Mon Sep 17 00:00:00 2001 From: Kevin Chu Date: Thu, 18 Sep 2025 15:32:54 -0700 Subject: [PATCH 6/6] Add 3rd party and Apollo columns. Add query analysis telemetry proposal --- specs/telemetry.md | 99 ++++++++++++++++++++++++++++++---------------- 1 file changed, 65 insertions(+), 34 deletions(-) diff --git a/specs/telemetry.md b/specs/telemetry.md index 0ad9a8ff..bf0abf50 100644 --- a/specs/telemetry.md +++ b/specs/telemetry.md @@ -1,39 +1,49 @@ # Apollo MCP Server Telemetry Spec -| Category | Metric / Trace / Event | Type | Attributes | Notes | Priority | -|---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------| -| **Configuration** | `apollo.mcp.config.load` | Counter | success | config / startup loads | Must Have | -| | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | Must Have | -| | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | Must Have | -| | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | Must Have | -| | 
`apollo.mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | Must Have | -| | `apollo.mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | Should Have | -| | `apollo.mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | Should Have | -| | `apollo.mcp.schema.source` | Attribute/Event | uplink, local_file, introspection | Where schema was loaded from | Must Have | -| | `apollo.mcp.schema.load` | Counter | schema_source, success | Schema load status | Must Have | -| | `apollo.mcp.schema.size` | Gauge | — | # of types/fields in schema | Should Have | -| | `apollo.mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | Must Have | -| **Usage** | `apollo.mcp.tool.call.count` | Counter | tool_name, success, error_code, client_type | Total tool invocations | Must Have | -| | `apollo.mcp.tool.call.duration` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | Must Have | -| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name, operation_type | # of backend GraphQL operations executed | Must Have | -| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, operation_name, operation_type | Latency of GraphQL backend call (excludes tool overhead) | Must Have | -| | `apollo.mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | Should Have | -| | `apollo.mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | Nice to Have | -| | `apollo.mcp.clients.active` | Gauge | — | # of active MCP clients | Must Have | -| | 
`apollo.mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | Should Have | -| | `apollo.mcp.auth.failures` | Counter | reason, client_type | Authentication failures | Must Have | -| | `apollo.mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | Must Have | -| **Traces** | Span: `apollo.mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | Must Have | -| | Span: `apollo.mcp.graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | Must Have | -| | Span: `serialization` | Trace | size_bytes, latency | Encoding/decoding JSON-RPC overhead | Nice to Have | -| **Events** | `apollo.mcp.client.connected` | Event | client_type | Client connection established | Should Have | -| | `apollo.mcp.client.disconnected` | Event | client_type | Client disconnected | Should Have | -| | `apollo.mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | Nice to Have | -| | `apollo.mcp.auth.failed` | Event | client_type, reason | Auth failure | Must Have | -| **HTTP Metrics** | `http.server.request.duration` | Histogram | — | Duration of HTTP server requests. | Nice to Have | -| | `http.server.active_requests` | Counter | — | Number of active HTTP server requests. | Nice to Have | -| | `http.server.request.body.size` | Histogram | — | Size of HTTP server request bodies. | Nice to Have | -| | `http.server.response.body.size` | Histogram | — | Size of HTTP server response bodies. 
| Nice to Have |
+| Category | Metric / Trace / Event | Type | Attributes | Notes | 3rd party | Apollo | Priority |
+|---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------|---------------|---------------|
+| **Configuration** | `apollo.mcp.config.load` | Counter | success | Config / startup loads | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | yes | yes | Should Have |
+| | `apollo.mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | yes | yes | Should Have |
+| | `apollo.mcp.schema.source` | Attribute/Event | uplink, local_file, introspection | Where schema was loaded from | yes | yes | Should Have |
+| | `apollo.mcp.schema.load` | Counter | schema_source, success | Schema load status | yes | yes | Should Have |
+| | `apollo.mcp.schema.size` | Gauge | — | # of types/fields in schema | no | yes | Should Have |
+| | `apollo.mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | yes | yes | Should Have |
+| **Usage** | `apollo.mcp.tool.call.count` | Counter | tool_name, success, error_code, client_type | Total tool invocations | yes | yes | Must Have |
+| | `apollo.mcp.tool.call.duration` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | yes | yes | Must Have |
+| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name, operation_type | # of backend GraphQL operations executed | yes | yes | Must Have |
+| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, operation_name, operation_type | Latency of GraphQL backend call (excludes tool overhead) | yes | yes | Must Have |
+| | `apollo.mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | yes | yes | Should Have |
+| | `apollo.mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | yes | yes | Nice to Have |
+| | `apollo.mcp.clients.active` | Gauge | — | # of active MCP clients | yes | yes | Must Have |
+| | `apollo.mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | yes | yes | Should Have |
+| | `apollo.mcp.auth.failures` | Counter | reason, client_type | Authentication failures | yes | yes | Must Have |
+| | `apollo.mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | yes | yes | Must Have |
+| **Traces** | Span: `apollo.mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | yes | yes | Must Have |
+| | Span: `apollo.mcp.graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | yes | yes | Must Have |
+| | Span: `serialization` | Trace | size_bytes, latency | Encoding/decoding JSON-RPC overhead | no | yes | Nice to Have |
+| **Events** | `apollo.mcp.client.connected` | Event | client_type | Client connection established | yes | yes | Should Have |
+| | `apollo.mcp.client.disconnected` | Event | client_type | Client disconnected | yes | yes | Should Have |
+| | `apollo.mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | no | yes | Nice to Have |
+| | `apollo.mcp.auth.failed` | Event | client_type, reason | Authentication failure | yes | yes | Must Have |
+| **HTTP Metrics** | `http.server.request.duration` | Histogram | — | Duration of HTTP server requests. | yes | yes | Nice to Have |
+| | `http.server.active_requests` | UpDownCounter | — | Number of active HTTP server requests. | yes | yes | Nice to Have |
+| | `http.server.request.body.size` | Histogram | — | Size of HTTP server request bodies. | yes | yes | Nice to Have |
+| | `http.server.response.body.size` | Histogram | — | Size of HTTP server response bodies. | yes | yes | Nice to Have |
+| **Query Analysis** | `apollo.mcp.query.depth.max` | Histogram | tool_name, operation_name | Maximum selection depth in query | no | yes | Nice to Have |
+| | `apollo.mcp.query.fields.total` | Histogram | tool_name, operation_name | Total number of fields selected | no | yes | Nice to Have |
+| | `apollo.mcp.query.fields.leaf` | Histogram | tool_name, operation_name | Number of leaf fields selected | no | yes | Nice to Have |
+| | `apollo.mcp.query.breadth.max` | Histogram | tool_name, operation_name | Maximum breadth at any level | no | yes | Nice to Have |
+| | `apollo.mcp.query.shape.pattern` | Counter | tool_name, pattern_type | Categorized patterns: "shallow_broad", "deep_narrow", "mixed" | no | yes | Nice to Have |
+| | `apollo.mcp.query.directives.skip` | Counter | tool_name, operation_name | Usage of `@skip` directive | no | yes | Nice to Have |
+| | `apollo.mcp.query.directives.include` | Counter | tool_name, operation_name | Usage of `@include` directive | no | yes | Nice to Have |
+| | `apollo.mcp.query.aliases.count` | Histogram | tool_name, operation_name | Number of field aliases used | no | yes | Nice to Have |
+| | `apollo.mcp.query.fragments.count` | Histogram | tool_name, operation_name | Number of fragments used | no | yes | Nice to Have |
+| | `apollo.mcp.query.variables.count` | Histogram | tool_name, operation_name | Number of variables used | no | yes | Nice to Have |
@@ -78,3 +88,24 @@
 ### Configuration Metrics
 - Probably useful only for Apollo
+
+### Query Analysis
+**Implementation Requirements**: Add GraphQL AST parsing to the MCP Server to analyze queries before forwarding them to the backend.
+
+**Current Architecture**: The MCP Server acts as a proxy, forwarding query strings to the backend without parsing them. Query complexity analysis therefore requires adding a GraphQL parser dependency (e.g., the `graphql-parser` crate) to parse each query into an AST before execution.
+
+**Alternative: Router-Based Analysis**: Apollo Router already captures query complexity metrics, but correlating Router data with MCP tool calls would require users to configure both systems with matching headers or trace IDs, which is an unrealistic deployment requirement.
+
+**Zero-Configuration Approach**: Implement AST parsing directly in the MCP Server so these insights are available out of the box, without coordinating with external systems.
+
+**Performance Considerations**:
+- AST parsing overhead is minimal compared to network and GraphQL execution time
+- Optional sampling (e.g., analyzing 10% of queries) can further reduce overhead if needed
+- Analysis runs once per tool call, not once per field resolution
+
+**Pattern Detection**:
+- **Shallow vs Deep**: Track max depth and breadth to identify query patterns
+- **Advanced Features**: Count usage of directives, aliases, fragments, and variables
+- **Categorization**: Automatically classify each query as "shallow_broad", "deep_narrow", or "mixed" based on its depth/breadth ratio
+
+This approach provides immediate insights into MCP tool usage patterns without requiring users to configure multiple systems.
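The query-shape analysis described above can be done in a single recursive walk over the selection set. This is a minimal, hypothetical sketch, and it rests on several assumptions:

- `Field`, `QueryShape`, `analyze`, and `classify` are illustrative names.
- `Field` is a simplified stand-in for a real parser's AST. It is not the `graphql-parser` API.
- "Breadth" is taken to mean the most fields selected at any single depth level.
- The ratio thresholds are placeholders, because the spec does not define them.

```rust
// Minimal sketch of the query-shape analysis; all names are hypothetical.
// ASSUMPTION: `Field` is a simplified stand-in for a parsed GraphQL selection.
// It is not the AST type of `graphql-parser` or any other crate.
#[derive(Debug)]
struct Field {
    alias: Option<String>,
    directives: Vec<String>, // directive names without the '@', e.g. "skip"
    selections: Vec<Field>,  // empty for leaf (scalar or enum) fields
}

#[derive(Debug, Default, PartialEq)]
struct QueryShape {
    max_depth: usize,
    max_breadth: usize, // most fields selected at any single depth level
    total_fields: usize,
    leaf_fields: usize,
    aliases: usize,
    skip_directives: usize,
    include_directives: usize,
}

/// Walks the selection tree once, counting fields per depth level.
fn analyze(root: &[Field]) -> QueryShape {
    let mut shape = QueryShape::default();
    let mut fields_per_level: Vec<usize> = Vec::new();
    walk(root, 0, &mut shape, &mut fields_per_level);
    shape.max_depth = fields_per_level.len();
    shape.max_breadth = fields_per_level.iter().copied().max().unwrap_or(0);
    shape
}

fn walk(level: &[Field], depth: usize, shape: &mut QueryShape, per_level: &mut Vec<usize>) {
    for field in level {
        if per_level.len() <= depth {
            per_level.push(0); // first field seen at this depth
        }
        per_level[depth] += 1;
        shape.total_fields += 1;
        if field.selections.is_empty() {
            shape.leaf_fields += 1;
        }
        if field.alias.is_some() {
            shape.aliases += 1;
        }
        for directive in &field.directives {
            match directive.as_str() {
                "skip" => shape.skip_directives += 1,
                "include" => shape.include_directives += 1,
                _ => {}
            }
        }
        walk(&field.selections, depth + 1, shape, per_level);
    }
}

/// ASSUMPTION: the 2.0 and 0.5 thresholds are placeholders; the spec only
/// says classification is based on depth/breadth ratios.
fn classify(shape: &QueryShape) -> &'static str {
    if shape.max_breadth == 0 {
        return "mixed"; // empty selection set: nothing meaningful to classify
    }
    let ratio = shape.max_depth as f64 / shape.max_breadth as f64;
    if ratio >= 2.0 {
        "deep_narrow"
    } else if ratio <= 0.5 {
        "shallow_broad"
    } else {
        "mixed"
    }
}

// Helpers for building selection trees by hand.
fn leaf() -> Field {
    Field { alias: None, directives: vec![], selections: vec![] }
}

fn node(selections: Vec<Field>) -> Field {
    Field { alias: None, directives: vec![], selections }
}

fn main() {
    // Shaped like `{ user { profile { address { city } } } }`
    let deep = vec![node(vec![node(vec![node(vec![leaf()])])])];
    let shape = analyze(&deep);
    println!("{:?} -> {}", shape, classify(&shape));
}
```

The walk visits each field exactly once, so its cost is linear in the number of selected fields. This matches the expectation above that analysis adds little overhead compared with backend execution. Fragment and variable counts are left out for brevity. They would need the operation's fragment spreads and variable definitions, which this simplified `Field` type does not model.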
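The optional sampling mentioned under Performance Considerations could be as simple as a shared 1-in-N counter. This is a hypothetical sketch: `AnalysisSampler` and `should_analyze` are illustrative names, not existing MCP Server APIs. It uses a deterministic counter rather than randomness, so the sampled fraction is exact and no RNG dependency is needed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical 1-in-N sampler deciding whether a tool call's query is analyzed.
struct AnalysisSampler {
    every_n: u64,
    seen: AtomicU64,
}

impl AnalysisSampler {
    fn new(every_n: u64) -> Self {
        assert!(every_n > 0, "every_n must be at least 1");
        Self { every_n, seen: AtomicU64::new(0) }
    }

    /// Returns true for every `every_n`-th call; safe to share across threads.
    fn should_analyze(&self) -> bool {
        // fetch_add returns the previous count, so the first call is sampled.
        self.seen.fetch_add(1, Ordering::Relaxed) % self.every_n == 0
    }
}

fn main() {
    let sampler = AnalysisSampler::new(10); // roughly the 10% rate from the spec
    let sampled = (0..100).filter(|_| sampler.should_analyze()).count();
    println!("analyzed {sampled} of 100 tool calls");
}
```

With `every_n = 10`, exactly 10 of every 100 tool calls are analyzed. `Ordering::Relaxed` is sufficient here because the counter only needs atomic increments. It does not need ordering relative to other memory operations.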