Summary
The sampling loop (ctx.sample(), ctx.sample_step()) is where most time and cost is spent in MCP servers that use LLM calls, but it has zero OTEL instrumentation. This issue tracks adding spans, metrics, and opt-in content capture to the sampling internals.
Current state
src/fastmcp/server/sampling/run.py has no telemetry imports, no spans, no metrics. The entire sampling loop — LLM calls, tool execution, validation retries, structured output parsing — is invisible in traces.
Proposed spans
sampling/createMessage — wraps sample_impl()
The full sampling loop from start to final response. Naming the span after the MCP protocol method keeps it immediately recognizable in traces.
Attributes:
mcp.method.name = "sampling/createMessage"
gen_ai.request.model (from model_preferences if available)
gen_ai.request.temperature
gen_ai.request.max_tokens
fastmcp.sampling.tool_count — number of tools provided
fastmcp.sampling.result_type — structured output type name or "str"
fastmcp.sampling.iterations — total iterations before completion (set on span end)
fastmcp.sampling.handler_mode — "fallback" or "client"
sampling/createMessage step — wraps each sample_step_impl() call
Each LLM round-trip within the loop.
Attributes:
fastmcp.sampling.iteration — current iteration number
fastmcp.sampling.stop_reason — "toolUse", "endTurn", "maxTokens"
sampling.execute_tool {name} — wraps each tool execution within the loop
Follows the google-genai pattern of child spans for tool execution during function-calling.
Attributes:
gen_ai.tool.name — the tool being executed
error.type — if the tool errors
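One way to wrap tool execution following that child-span pattern, sketched as a context manager. `tracer` is whatever `trace.get_tracer()` returns; the helper name is an assumption:

```python
from contextlib import contextmanager

@contextmanager
def tool_execution_span(tracer, tool_name: str):
    # Hypothetical wrapper for each tool call inside the sampling loop.
    with tracer.start_as_current_span(f"sampling.execute_tool {tool_name}") as span:
        span.set_attribute("gen_ai.tool.name", tool_name)
        try:
            yield span
        except Exception as e:
            # __qualname__ rather than __name__, matching the
            # httpx/google-genai convention noted below.
            span.set_attribute("error.type", type(e).__qualname__)
            raise
```

The exception is re-raised so the loop's existing error handling is unchanged; the span just records `error.type` on the way out.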
Events on the sampling/createMessage span
sampling.validation_failure — when structured output validation fails (with retry count)
sampling.text_response_retry — when LLM returns text instead of calling final_response
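A hypothetical helper for the validation-failure event; the `fastmcp.sampling.retry_count` attribute name is an assumption, `error.type` follows the convention used elsewhere in this proposal:

```python
def validation_failure_event(retry_count: int, error: Exception) -> tuple[str, dict]:
    # Returns (event_name, attributes), recorded on the
    # sampling/createMessage span via span.add_event(name, attributes=attrs).
    return (
        "sampling.validation_failure",
        {
            "fastmcp.sampling.retry_count": retry_count,
            "error.type": type(error).__qualname__,
        },
    )
```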
Content capture (opt-in)
Controlled by the same env var the google-genai instrumentation uses: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT (default false).
When enabled, record as span events on the sampling/createMessage span:
gen_ai.system.message — system prompt content
gen_ai.user.message — user message content
gen_ai.choice — LLM response content
This follows the GenAI semantic conventions for content capture.
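The gate itself is simple; a sketch, where the env var is the real one from the google-genai instrumentation but both helper names are assumptions:

```python
import os

def capture_content_enabled() -> bool:
    # Content is never recorded unless this is explicitly set to "true".
    return os.environ.get(
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", "false"
    ).strip().lower() == "true"

def maybe_content_events(system_prompt, user_message, response) -> list[tuple[str, dict]]:
    # Returns (event_name, attributes) pairs to add to the
    # sampling/createMessage span, or nothing when capture is off.
    if not capture_content_enabled():
        return []
    return [
        ("gen_ai.system.message", {"content": system_prompt}),
        ("gen_ai.user.message", {"content": user_message}),
        ("gen_ai.choice", {"content": response}),
    ]
```

Checking the flag before building the event payloads also avoids serializing message content on the hot path when capture is off.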
Metrics
mcp.server.operation.duration histogram for sampling/createMessage (per the MCP semconv)
Consider a gen_ai.client.token.usage histogram if token counts are available from the handler response
Additional improvements (from instrumentation best practices research)
These apply to all existing spans, not just sampling:
span.is_recording() guards before building attribute dicts (every mature OTEL instrumentation does this — httpx, google-genai, Flask)
type(e).__qualname__ instead of type(e).__name__ for error.type (matches httpx and google-genai convention)
References