-
Notifications
You must be signed in to change notification settings - Fork 43
Telemetry Spec #352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Telemetry Spec #352
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
f7c90f1
Add telemetry requirements
kbychu a6e61ff
Add note about configuration metrics
kbychu 907a414
apollo_mcp to apollo.mcp, latency to duration, singular call
swcollard 3e65ee3
Fix typo, add http server metrics
swcollard 6524521
typo: Extra period
swcollard f1bf72e
Add 3rd party and Apollo columns. Add query analysis telemetry proposal
kbychu File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
# Apollo MCP Server Telemetry Spec | ||
|
||
| Category | Metric / Trace / Event | Type | Attributes | Notes | 3rd party | Apollo | Priority | | ||
|---------------------------|--------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------|-------------------------------------------------------------------------|---------------|---------------|---------------| | ||
| **Configuration** | `apollo.mcp.config.load` | Counter | success | config / startup loads | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="builtin:introspect"}` | Gauge | — | Introspect tool present if enabled (always =1) | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="builtin:search"}` | Gauge | — | Search tool present if enabled (always =1) | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="persisted_query"}` | Gauge | — | # of tools from persisted query manifest | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="operation_collection"}` | Gauge | — | # of tools from operation collections | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="graphql_file"}` | Gauge | — | # of tools from `.graphql` files | yes | yes | Should Have | | ||
| | `apollo.mcp.tools.registered{source="introspection_generated"}` | Gauge | — | # of tools auto-generated from schema introspection | yes | yes | Should Have | | ||
| | `apollo.mcp.schema.source` | Attribute/Event | uplink, local_file, introspection | Where schema was loaded from | yes | yes | Should Have | | ||
| | `apollo.mcp.schema.load` | Counter | schema_source, success | Schema load status | yes | yes | Should Have | | ||
| | `apollo.mcp.schema.size` | Gauge | — | # of types/fields in schema | no | yes | Should Have | | ||
| | `apollo.mcp.version.info` | Attribute/Event | server_version, schema_hash, manifest_version, manifest_source | Server binary version, GraphQL schema hash, manifest version, manifest type (persisted_query/operation_collection) | yes | yes | Should Have | | ||
| **Usage** | `apollo.mcp.tool.call.count` | Counter | tool_name, success, error_code, client_type | Total tool invocations | yes | yes | Must Have | | ||
| | `apollo.mcp.tool.call.duration` | Histogram | tool_name, success, error_code, client_type | End-to-end request latency | yes | yes | Must Have | | ||
| | `apollo.mcp.graphql.operation.count` | Counter | tool_name, success, error_code, client_type, operation_name, operation_type | # of backend GraphQL operations executed | yes | yes | Must Have | | ||
| | `apollo.mcp.graphql.operation.duration` | Histogram | tool_name, success, error_code, client_type, operation_name, operation_type | Latency of GraphQL backend call (excludes tool overhead) | yes | yes | Must Have | | ||
| | `apollo.mcp.responses.size` | Histogram | tool_name, client_type | Size of responses (bytes) | yes | yes | Should Have | | ||
| | `apollo.mcp.responses.characters` | Histogram | tool_name, client_type | Character count of response payloads (proxy for token estimation) | yes | yes | Nice to Have | | ||
| | `apollo.mcp.clients.active` | Gauge | — | # of active MCP clients | yes | yes | Must Have | | ||
| | `apollo.mcp.concurrency.current_requests` | Gauge | — | # of concurrent tool executions | yes | yes | Should Have | | ||
| | `apollo.mcp.auth.failures` | Counter | reason, client_type | Authentication failures | yes | yes | Must Have | | ||
| | `apollo.mcp.timeouts` | Counter | tool_name, client_type | Tool or backend operation timed out | yes | yes | Must Have | | ||
| **Traces** | Span: `apollo.mcp.tool_invocation` | Trace | tool_name, latency, success | Span for each tool invocation | yes | yes | Must Have | | ||
| | Span: `apollo.mcp.graphql.operation` | Trace | operation_name, latency, success, error_code | Child span for backend GraphQL operation | yes | yes | Must Have | | ||
| | Span: `serialization` | Trace | size_bytes, latency | Encoding/decoding JSON-RPC overhead | no | yes | Nice to Have | | ||
| **Events** | `apollo.mcp.client.connected` | Event | client_type | Client connection established | yes | yes | Should Have | | ||
| | `apollo.mcp.client.disconnected` | Event | client_type | Client disconnected | yes | yes | Should Have | | ||
| | `apollo.mcp.config.reload` | Event | schema_source, version_hash | Config/schema/manifest/collection reload | no | yes | Nice to Have | | ||
| | `apollo.mcp.auth.failed` | Event | client_type, reason | Auth failure | yes | yes | Must Have | | ||
| **HTTP Metrics** | `http.server.request.duration` | Histogram | — | Duration of HTTP server requests. | yes | yes | Nice to Have | | ||
| | `http.server.active_requests` | Counter | — | Number of active HTTP server requests. | yes | yes | Nice to Have | | ||
| | `http.server.request.body.size` | Histogram | — | Size of HTTP server request bodies. | yes | yes | Nice to Have | | ||
| | `http.server.response.body.size` | Histogram | — | Size of HTTP server response bodies. | yes | yes | Nice to Have | | ||
| **Query Analysis** | `apollo.mcp.query.depth.max` | Histogram | tool_name, operation_name | Maximum selection depth in query | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.fields.total` | Histogram | tool_name, operation_name | Total number of fields selected | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.fields.leaf` | Histogram | tool_name, operation_name | Number of leaf fields selected | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.breadth.max` | Histogram | tool_name, operation_name | Maximum breadth at any level | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.shape.pattern` | Counter | tool_name, pattern_type | Categorized patterns: "shallow_broad", "deep_narrow", "mixed" | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.directives.skip` | Counter | tool_name, operation_name | Usage of @skip directive | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.directives.include` | Counter | tool_name, operation_name | Usage of @include directive | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.aliases.count` | Histogram | tool_name, operation_name | Number of field aliases used | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.fragments.count` | Histogram | tool_name, operation_name | Number of fragments used | no | yes | Nice to Have | | ||
| | `apollo.mcp.query.variables.count` | Histogram | tool_name, operation_name | Number of variables used | no | yes | Nice to Have | | ||
|
||
|
||
|
||
|
||
## Implementation Notes | ||
|
||
### Client Identification Usage | ||
**`client_type` only:** | ||
- Direct client interactions: calls, operation calls/latency, response size, token estimation, timeouts | ||
- Error analysis: request errors, auth failures | ||
- Connection events: client connected/disconnected, auth failed | ||
- Purpose: Analyze client behavior patterns and identify client-specific issues | ||
|
||
**No client identification:** | ||
- Server configuration: config loads, tool registration, schema info, version info | ||
- System metrics: CPU, memory, network, active clients, concurrency | ||
- Backend operations: GraphQL backend errors, operation type mix, transport errors | ||
- Traces: tool invocation spans, GraphQL operations, serialization | ||
- Purpose: Server-wide metrics and request-level tracing independent of client behavior | ||
|
||
### Client Identification Implementation | ||
- **`client_type`**: Static client identifier derived from User-Agent header or configuration | ||
- Examples: `"claude"`, `"chatgpt"`, `"vscode"`, `"custom"`, `"unknown"` | ||
- Used for understanding client behavior patterns and performance differences | ||
- No PII concerns - represents client software type, not individual users | ||
- Optional: Use `"unknown"` if client type cannot be determined or for privacy | ||
|
||
### Privacy & Retention | ||
- Client identification is optional - use `"unknown"` if privacy concerns exist | ||
- No PII concerns with `client_type` - it represents software, not users | ||
- Ensure compliance with local data protection regulations | ||
|
||
### Token & Cost Estimation | ||
- **Real-time**: Use `apollo_mcp.responses.characters` for fast proxy estimation | ||
- Rule of thumb: 1 token ≈ 3-4 characters for most content | ||
- No performance impact - just `response.length` | ||
- **Offline/Optional**: For precise token counts, run tokenization in background jobs | ||
- Sample a subset of responses (e.g., 1-10%) to avoid performance impact | ||
- Use established tokenizers (tiktoken for OpenAI models, similar for others) | ||
- Store results separately from real-time metrics | ||
- Actual token counts will vary by model and tokenizer | ||
|
||
### Configuration Metrics | ||
- Probably useful only for Apollo | ||
|
||
### Query Analysis | ||
**Implementation Requirements**: Add GraphQL AST parsing to the MCP Server to analyze queries before forwarding them to the backend. | ||
|
||
**Current Architecture**: MCP Server acts as a proxy, forwarding query strings without parsing. Query complexity analysis requires adding a GraphQL parser dependency (e.g., `graphql-parser` crate) to parse queries into AST before execution. | ||
|
||
**Alternative: Router-Based Analysis**: Apollo Router already captures query complexity metrics, but correlating Router data with MCP tool calls would require users to configure both systems with matching headers/trace IDs - an unrealistic deployment requirement. | ||
|
||
**Zero-Configuration Approach**: Implement AST parsing directly in MCP Server for immediate, out-of-the-box insights without external coordination. | ||
|
||
**Performance Considerations**: | ||
- AST parsing overhead is minimal compared to network/GraphQL execution time | ||
- Optional sampling (e.g., 10% of queries) can further reduce overhead if needed | ||
- Analysis happens once per tool call, not per field resolution | ||
|
||
**Pattern Detection**: | ||
- **Shallow vs Deep**: Track max depth and breadth to identify query patterns | ||
- **Advanced Features**: Count usage of directives, aliases, fragments, variables | ||
- **Categorization**: Automatically classify as "shallow_broad", "deep_narrow", or "mixed" based on depth/breadth ratios | ||
|
||
This approach provides immediate insights into MCP tool usage patterns without requiring users to configure multiple systems. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.