TracePilot

From noisy traces to concrete engineering action

A stage-controlled LangGraph workflow for investigating failures, grounding them in source code, and preparing GitHub issues or PRs.

Why This Exists

Modern incident debugging is fragmented:

traces live in observability systems
code lives in GitHub
diagnosis lives in somebody's head
remediation gets rewritten again as an issue or PR

TracePilot compresses that loop into a single stateful graph:

initialize the run
inspect traces and logs
synthesize a diagnosis
pull repo context only when it is justified
prepare fix actions
optionally create a GitHub issue or PR
produce a structured trace tree and final result

What It Does

TracePilot is a Python package centered around TracePilotGraph. It accepts a typed request, builds runtime context, queries Jaeger, reasons over evidence, fetches GitHub source context when needed, and can execute one bounded GitHub action.

Core Capabilities

Investigate incidents from a trace ID or trace-oriented prompt.
Search and normalize Jaeger traces and span logs.
Decide whether the problem is source-code related before touching the repo.
Extract GitHub blob URLs from logs and fetch targeted code context.
Prepare issue and PR payloads from the investigation output.
Execute a single GitHub issue or PR action through a controlled subgraph.
Return a structured state object with diagnosis, evidence, timeline, code context, and action results.
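
As an illustration of the blob-URL extraction capability above, here is a minimal sketch of how line-anchored GitHub links might be pulled out of log text. The names BLOB_URL, BlobRef, and extract_blob_refs are hypothetical and not taken from the TracePilot codebase. The pattern assumes a single-segment ref (a branch name without slashes, or a commit SHA).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for URLs of the form
# https://github.com/<owner>/<repo>/blob/<ref>/<path>#L<start>[-L<end>]
BLOB_URL = re.compile(
    r"https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/blob/"
    r"(?P<ref>[^/\s]+)/(?P<path>[^\s#?]+)"
    r"(?:#L(?P<start>\d+)(?:-L(?P<end>\d+))?)?"
)


@dataclass(frozen=True)
class BlobRef:
    """One source location referenced from a log line (illustrative type)."""
    owner: str
    repo: str
    ref: str
    path: str
    start_line: Optional[int] = None
    end_line: Optional[int] = None


def extract_blob_refs(log_text: str) -> List[BlobRef]:
    """Return unique blob references in order of first appearance."""
    refs: List[BlobRef] = []
    for m in BLOB_URL.finditer(log_text):
        start = int(m["start"]) if m["start"] else None
        # A single-line anchor (#L42) is treated as the range 42-42.
        end = int(m["end"]) if m["end"] else start
        ref = BlobRef(m["owner"], m["repo"], m["ref"], m["path"], start, end)
        if ref not in refs:
            refs.append(ref)
    return refs


if __name__ == "__main__":
    line = "KeyError raised at https://github.com/acme/shop/blob/main/app.py#L42"
    print(extract_blob_refs(line))
```

A sketch like this does not strip trailing punctuation from a URL at the end of a sentence, and it cannot tell a slash-containing branch name from a path segment; a real implementation would need to handle both.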

Architecture

START
  |
  v
initialize_run
  |
  v
observability_agent
  |
  v
diagnosis_synthesizer
  |
  +--> repo_context ----------+
  |                           |
  +--> skip_repo_context      |
                              v
                    fix_action_preparation
                              |
                 +------------+------------+
                 |                         |
                 v                         v
           github_action           skip_github_action
                 |                         |
                 +------------+------------+
                              v
                       build_trace_tree
                              |
                              v
                             END

Design Notes

repo_context only runs when the diagnosis suggests code involvement or PR mode is requested.
github_action only runs for issue/PR workflows.
Each stage writes back into a shared typed GraphState, keeping the run inspectable and testable.
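
The two routing rules in these notes can be sketched as plain functions that map state to the name of the next node. This is an illustrative sketch, not the project's actual code: RoutingInputs and both function names are hypothetical, the node names are taken from the architecture diagram, and the "issue" mode string is an assumption (only "diagnose" and "pr" appear elsewhere in this README).

```python
from dataclasses import dataclass


@dataclass
class RoutingInputs:
    """Hypothetical slice of graph state that the routers need."""
    requested_mode: str   # "diagnose" or "pr"; "issue" is an assumed value
    code_related: bool    # whether the diagnosis points at source code


def route_repo_context(inputs: RoutingInputs) -> str:
    """Fetch repo context only when the diagnosis or the mode calls for it."""
    if inputs.code_related or inputs.requested_mode == "pr":
        return "repo_context"
    return "skip_repo_context"


def route_github_action(inputs: RoutingInputs) -> str:
    """Allow a GitHub mutation only for issue/PR workflows."""
    if inputs.requested_mode in ("issue", "pr"):
        return "github_action"
    return "skip_github_action"


if __name__ == "__main__":
    diagnose_only = RoutingInputs(requested_mode="diagnose", code_related=False)
    print(route_repo_context(diagnose_only), route_github_action(diagnose_only))
```

In LangGraph, functions of this shape are typically registered with a StateGraph through add_conditional_edges. Keeping them as pure functions means the routing can be unit-tested without building the graph.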

Repository Map

tracepilot/
  graph.py                     # top-level graph orchestration
  state/models.py              # typed request, runtime, and graph state
  nodes/                       # stage wrappers
  subgraphs/                   # repo-context and GitHub-action subflows
  services/                    # Jaeger, GitHub, LLM, limits, credentials
tests/
  test_graph.py
  test_*                       # unit coverage across nodes, clients, subgraphs
  e2e/                         # runnable demo scripts
scripts/
  run_graph.py
docker-compose.yml             # local observability stack support

Quickstart

1. Install

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

2. Export credentials

Use one model provider key:

export OPENAI_API_KEY=...
# or
export ANTHROPIC_API_KEY=...

Optional service credentials:

export GITHUB_TOKEN=...
export TRACEPILOT_OBSERVABILITY_TOKEN=...
export TRACEPILOT_JAEGER_BASE_URL=http://localhost:16686

3. Run an end-to-end demo with Jaeger

python3 tests/e2e/run_graph_with_jaeger.py

That script:

waits for Jaeger
emits a demo trace through OTLP
runs TracePilotGraph
prints the final structured state as JSON

4. Run only the repo-context subgraph

python3 tests/e2e/run_repo_context_subgraph.py \
  --git-url "https://github.com/<owner>/<repo>/blob/main/app.py#L42" \
  --mode pr \
  --message "Investigate the code path linked from the logs."

Minimal Usage

from tracepilot import TracePilotGraph
from tracepilot.state import GraphState, RunRequest

state = GraphState(
    request=RunRequest(
        message="Investigate checkout latency service:checkout",
        trace_id="your-trace-id",
        requested_mode="diagnose",
    )
)

result = TracePilotGraph().run(state)

print(result.diagnosis)
print(result.final_response)
print(result.github_result)

Runtime Model

TracePilot normalizes provider and model settings from either the request or environment:

default provider: openai
default OpenAI model: gpt-4.1-mini
default Anthropic model: claude-3-5-sonnet-20241022

Useful env vars:

TRACEPILOT_MODEL_PROVIDER
TRACEPILOT_MODEL
TRACEPILOT_MODEL_TEMPERATURE
TRACEPILOT_MODEL_MAX_OUTPUT_TOKENS
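
The normalization described above might look roughly like the following sketch. ModelSettings and resolve_model_settings are hypothetical names, and the precedence (request value first, then environment variable, then default) is an assumption; this README states only that settings come from either the request or the environment. Temperature and max output tokens are left out because their defaults are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults as listed in this README.
DEFAULT_PROVIDER = "openai"
DEFAULT_MODELS = {
    "openai": "gpt-4.1-mini",
    "anthropic": "claude-3-5-sonnet-20241022",
}


@dataclass(frozen=True)
class ModelSettings:
    """Illustrative container for the resolved provider and model."""
    provider: str
    model: str


def resolve_model_settings(
    request_provider: Optional[str] = None,
    request_model: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> ModelSettings:
    """Resolve settings with assumed precedence: request > env > default."""
    env = os.environ if env is None else env
    provider = (
        request_provider
        or env.get("TRACEPILOT_MODEL_PROVIDER")
        or DEFAULT_PROVIDER
    ).lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported model provider: {provider!r}")
    model = request_model or env.get("TRACEPILOT_MODEL") or DEFAULT_MODELS[provider]
    return ModelSettings(provider=provider, model=model)


if __name__ == "__main__":
    print(resolve_model_settings(env={"TRACEPILOT_MODEL_PROVIDER": "anthropic"}))
```

Passing env explicitly, as in the usage line, keeps the resolver testable without mutating the process environment.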

Bounded Execution

The graph is built to stay controlled rather than open-ended.

tool-call limits are resolved into execution limits
GitHub creation limits default to one issue and one PR attempt
repo-context reads are targeted around extracted file locations
GitHub action execution is constrained to one selected payload
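
To make the idea of creation limits concrete, here is a minimal sketch of a counter that refuses an action once its budget is spent. ExecutionLimits, LimitExceeded, and the action names "create_issue" and "create_pr" are hypothetical; only the defaults of one issue and one PR attempt come from this README.

```python
from dataclasses import dataclass, field
from typing import Dict


class LimitExceeded(RuntimeError):
    """Raised when an action would exceed its configured budget."""


@dataclass
class ExecutionLimits:
    """Illustrative per-run budget for mutating actions."""
    max_per_action: Dict[str, int] = field(
        default_factory=lambda: {"create_issue": 1, "create_pr": 1}
    )
    used: Dict[str, int] = field(default_factory=dict)

    def consume(self, action: str) -> None:
        """Record one attempt, or raise if the budget is exhausted."""
        # Actions without a configured limit get a budget of zero.
        limit = self.max_per_action.get(action, 0)
        count = self.used.get(action, 0)
        if count >= limit:
            raise LimitExceeded(f"{action}: limit of {limit} reached")
        self.used[action] = count + 1


if __name__ == "__main__":
    limits = ExecutionLimits()
    limits.consume("create_issue")      # first attempt is allowed
    try:
        limits.consume("create_issue")  # second attempt is refused
    except LimitExceeded as exc:
        print(exc)
```

Counting attempts rather than successes means a failed creation call still spends the budget, which matches the "one PR attempt" wording above. Whether TracePilot counts this way is not confirmed here.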

Testing

Unit tests cover the graph, clients, nodes, and subgraphs.

python3 -m unittest discover -s tests

If that fails locally with an import error, the likely cause is a missing dependency such as langgraph or langchain_core. Reinstall with pip install -e . inside the activated virtual environment.

Project Character

TracePilot is not a chat wrapper around observability APIs. It is a staged investigation system with explicit routing:

observability first
source context only when earned
GitHub mutation only when requested
structured state all the way through

Because every routing decision is explicit and every stage records its output in state, a run is easier to inspect and test than a free-form agent loop.
