Summary
The sampling loop (ctx.sample(), ctx.sample_step()) is where most time and cost is spent in MCP servers that use LLM calls, but it has zero OTEL instrumentation. This issue tracks adding spans, metrics, and opt-in content capture to the sampling internals.
Current state
src/fastmcp/server/sampling/run.py has no telemetry imports, no spans, no metrics. The entire sampling loop — LLM calls, tool execution, validation retries, structured output parsing — is invisible in traces.
Proposed spans
sampling/createMessage — wraps sample_impl()
The full sampling loop from start to final response. Naming the span after the MCP protocol method keeps it immediately recognizable in traces.
Attributes:
mcp.method.name = "sampling/createMessage"
gen_ai.request.model (from model_preferences if available)
gen_ai.request.temperature
gen_ai.request.max_tokens
fastmcp.sampling.tool_count — number of tools provided
fastmcp.sampling.result_type — structured output type name or "str"
fastmcp.sampling.iterations — total iterations before completion (set on span end)
fastmcp.sampling.handler_mode — "fallback" or "client"
sampling/createMessage step — wraps each sample_step_impl() call
Each LLM round-trip within the loop.
Attributes:
fastmcp.sampling.iteration — current iteration number
fastmcp.sampling.stop_reason — "toolUse", "endTurn", "maxTokens"
sampling.execute_tool {name} — wraps each tool execution within the loop
Follows the google-genai pattern of child spans for tool execution during function-calling.
Attributes:
gen_ai.tool.name — the tool being executed
error.type — if the tool errors
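One way to wrap tool execution following that child-span pattern, sketched as a context manager. `tracer` is whatever `trace.get_tracer()` returns; the helper name is an assumption:

```python
from contextlib import contextmanager

@contextmanager
def tool_execution_span(tracer, tool_name: str):
    # Hypothetical wrapper for each tool call inside the sampling loop.
    with tracer.start_as_current_span(f"sampling.execute_tool {tool_name}") as span:
        span.set_attribute("gen_ai.tool.name", tool_name)
        try:
            yield span
        except Exception as e:
            # __qualname__ rather than __name__, matching the
            # httpx/google-genai convention noted below.
            span.set_attribute("error.type", type(e).__qualname__)
            raise
```

The exception is re-raised so the loop's existing error handling is unchanged; the span just records `error.type` on the way out.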
Events on the sampling/createMessage span
sampling.validation_failure — when structured output validation fails (with retry count)
sampling.text_response_retry — when LLM returns text instead of calling final_response
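A hypothetical helper for the validation-failure event; the `fastmcp.sampling.retry_count` attribute name is an assumption, `error.type` follows the convention used elsewhere in this proposal:

```python
def validation_failure_event(retry_count: int, error: Exception) -> tuple[str, dict]:
    # Returns (event_name, attributes), recorded on the
    # sampling/createMessage span via span.add_event(name, attributes=attrs).
    return (
        "sampling.validation_failure",
        {
            "fastmcp.sampling.retry_count": retry_count,
            "error.type": type(error).__qualname__,
        },
    )
```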
Content capture (opt-in)
Controlled by the same env var the google-genai instrumentation uses: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT (default false).
When enabled, record as span events on the sampling/createMessage span:
gen_ai.system.message — system prompt content
gen_ai.user.message — user message content
gen_ai.choice — LLM response content
This follows the GenAI semantic conventions for content capture.
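The gate itself is simple; a sketch, where the env var is the real one from the google-genai instrumentation but both helper names are assumptions:

```python
import os

def capture_content_enabled() -> bool:
    # Content is never recorded unless this is explicitly set to "true".
    return os.environ.get(
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", "false"
    ).strip().lower() == "true"

def maybe_content_events(system_prompt, user_message, response) -> list[tuple[str, dict]]:
    # Returns (event_name, attributes) pairs to add to the
    # sampling/createMessage span, or nothing when capture is off.
    if not capture_content_enabled():
        return []
    return [
        ("gen_ai.system.message", {"content": system_prompt}),
        ("gen_ai.user.message", {"content": user_message}),
        ("gen_ai.choice", {"content": response}),
    ]
```

Checking the flag before building the event payloads also avoids serializing message content on the hot path when capture is off.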
Metrics
mcp.server.operation.duration histogram for sampling/createMessage (per the MCP semconv)
Consider a gen_ai.client.token.usage histogram if token counts are available from the handler response
Additional improvements (from instrumentation best practices research)
These apply to all existing spans, not just sampling:
span.is_recording() guards before building attribute dicts (every mature OTEL instrumentation does this — httpx, google-genai, Flask)
type(e).__qualname__ instead of type(e).__name__ for error.type (matches httpx and google-genai convention)
References