OTEL: Instrument ctx.sample() / sampling loop with spans and metrics #3891

@strawgate

Description

Summary

The sampling loop (ctx.sample(), ctx.sample_step()) is where most time and cost is spent in MCP servers that use LLM calls, but it has zero OTEL instrumentation. This issue tracks adding spans, metrics, and opt-in content capture to the sampling internals.

Current state

src/fastmcp/server/sampling/run.py has no telemetry imports, no spans, no metrics. The entire sampling loop — LLM calls, tool execution, validation retries, structured output parsing — is invisible in traces.

Proposed spans

sampling/createMessage — wraps sample_impl()

The full sampling loop from start to final response. This is a well-known MCP method name.

Attributes:

  • mcp.method.name = "sampling/createMessage"
  • gen_ai.request.model (from model_preferences if available)
  • gen_ai.request.temperature
  • gen_ai.request.max_tokens
  • fastmcp.sampling.tool_count — number of tools provided
  • fastmcp.sampling.result_type — structured output type name or "str"
  • fastmcp.sampling.iterations — total iterations before completion (set on span end)
  • fastmcp.sampling.handler_mode — "fallback" or "client"

sampling/createMessage step — wraps each sample_step_impl() call

Each LLM round-trip within the loop.

Attributes:

  • fastmcp.sampling.iteration — current iteration number
  • fastmcp.sampling.stop_reason — "toolUse", "endTurn", "maxTokens"

sampling.execute_tool {name} — wraps each tool execution within the loop

Follows the google-genai pattern of child spans for tool execution during function-calling.

Attributes:

  • gen_ai.tool.name — the tool being executed
  • error.type — if the tool errors

Events on the sampling/createMessage span

  • sampling.validation_failure — when structured output validation fails (with retry count)
  • sampling.text_response_retry — when LLM returns text instead of calling final_response

Content capture (opt-in)

Controlled by the same env var used by google-genai instrumentation: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT (default false).

When enabled, record as span events on the sampling/createMessage span:

  • gen_ai.system.message — system prompt content
  • gen_ai.user.message — user message content
  • gen_ai.choice — LLM response content

This follows the GenAI semantic conventions for content capture.

Metrics

  • mcp.server.operation.duration histogram for sampling/createMessage (per the MCP semconv)
  • Consider gen_ai.client.token.usage histogram if token counts are available from the handler response

Additional improvements (from instrumentation best practices research)

These apply to all existing spans, not just sampling:

  • Add span.is_recording() guards before building attribute dicts (every mature OTEL instrumentation does this — httpx, google-genai, Flask)
  • Use type(e).__qualname__ instead of type(e).__name__ for error.type (matches httpx and google-genai convention)
