riturajFi/TracePilot

TracePilot

From noisy traces to concrete engineering action

A stage-controlled LangGraph workflow for investigating failures, grounding them in source code, and preparing GitHub issues or PRs.

Python 3.12+ | LangGraph | Jaeger | GitHub


Why This Exists

Modern incident debugging is fragmented:

  • traces live in observability systems
  • code lives in GitHub
  • diagnosis lives in somebody's head
  • remediation gets rewritten again as an issue or PR

TracePilot compresses that loop into a single stateful graph:

  1. initialize the run
  2. inspect traces and logs
  3. synthesize a diagnosis
  4. pull repo context only when it is justified
  5. prepare fix actions
  6. optionally create a GitHub issue or PR
  7. produce a structured trace tree and final result
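The last step, producing a structured trace tree, can be pictured as nesting flat span records under their parents. The following is a minimal sketch, not TracePilot's implementation: the function name and the `span_id`, `parent_id`, and `name` keys are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_trace_tree(spans):
    """Nest flat span records under their parents.

    Each span is a dict with hypothetical keys: span_id, parent_id
    (None for a root span), and name.
    """
    known_ids = {span["span_id"] for span in spans}
    children = defaultdict(list)
    for span in spans:
        parent = span.get("parent_id")
        # A span whose parent is not in this batch is treated as a root.
        key = parent if parent in known_ids else None
        children[key].append(span)

    def attach(span):
        return {
            "span_id": span["span_id"],
            "name": span["name"],
            "children": [attach(child) for child in children[span["span_id"]]],
        }

    return [attach(root) for root in children[None]]


spans = [
    {"span_id": "a", "parent_id": None, "name": "checkout"},
    {"span_id": "b", "parent_id": "a", "name": "db.query"},
    {"span_id": "c", "parent_id": "missing", "name": "orphan"},
]
tree = build_trace_tree(spans)
```

Treating spans with an unknown parent as roots keeps partial traces visible instead of silently dropping them.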

What It Does

TracePilot is a Python package centered around TracePilotGraph. It accepts a typed request, builds runtime context, queries Jaeger, reasons over evidence, fetches GitHub source context when needed, and can execute one bounded GitHub action.

Core Capabilities

  • Investigate incidents from a trace ID or trace-oriented prompt.
  • Search and normalize Jaeger traces and span logs.
  • Decide whether the problem is source-code related before touching the repo.
  • Extract GitHub blob URLs from logs and fetch targeted code context.
  • Prepare issue and PR payloads from the investigation output.
  • Execute a single GitHub issue or PR action through a controlled subgraph.
  • Return a structured state object with diagnosis, evidence, timeline, code context, and action results.
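As an illustration of the blob-URL capability, the sketch below pulls GitHub blob URLs out of log text with a regular expression. It is an assumption about how such extraction could work, not the package's actual code: `BLOB_URL`, `CodeLocation`, and `extract_code_locations` are hypothetical names, and the pattern assumes the ref contains no slashes.

```python
import re
from typing import NamedTuple, Optional

# Matches https://github.com/<owner>/<repo>/blob/<ref>/<path>#L<line>.
# Assumption: the ref (branch, tag, or SHA) contains no "/" characters.
BLOB_URL = re.compile(
    r"https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/blob/"
    r"(?P<ref>[^/\s]+)/(?P<path>[^\s#]+)(?:#L(?P<line>\d+))?"
)


class CodeLocation(NamedTuple):
    owner: str
    repo: str
    ref: str
    path: str
    line: Optional[int]


def extract_code_locations(log_text):
    """Return every blob URL found in log_text as a CodeLocation."""
    locations = []
    for match in BLOB_URL.finditer(log_text):
        line = int(match["line"]) if match["line"] else None
        locations.append(
            CodeLocation(match["owner"], match["repo"], match["ref"], match["path"], line)
        )
    return locations


log = "charge failed at https://github.com/acme/shop/blob/main/app/checkout.py#L42 while retrying"
found = extract_code_locations(log)
```

A line anchor such as `#L42` gives a repo-context step a precise place to read from, which is what makes targeted fetching possible.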

Architecture

START
  |
  v
initialize_run
  |
  v
observability_agent
  |
  v
diagnosis_synthesizer
  |
  +--> repo_context ----------+
  |                           |
  +--> skip_repo_context      |
                              v
                    fix_action_preparation
                              |
                 +------------+------------+
                 |                         |
                 v                         v
           github_action           skip_github_action
                 |                         |
                 +------------+------------+
                              v
                       build_trace_tree
                              |
                              v
                             END

Design Notes

  • repo_context only runs when the diagnosis suggests code involvement or PR mode is requested.
  • github_action only runs for issue/PR workflows.
  • Each stage writes back into a shared typed GraphState, keeping the run inspectable and testable.
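The two routing decisions in the diagram can be expressed as plain predicates. The sketch below is a simplified, standalone model of that routing rather than the LangGraph wiring in graph.py. The function name `plan_stages` and the `"issue"` mode string are assumptions; `"diagnose"` and `"pr"` are the mode values that appear in this README.

```python
def plan_stages(code_related: bool, requested_mode: str) -> list[str]:
    """Return the node names a run would visit, in order."""
    stages = ["initialize_run", "observability_agent", "diagnosis_synthesizer"]

    # repo_context runs when the diagnosis points at code or PR mode is requested.
    needs_repo = code_related or requested_mode == "pr"
    stages.append("repo_context" if needs_repo else "skip_repo_context")

    stages.append("fix_action_preparation")

    # github_action runs only for issue/PR workflows ("issue" is an assumed value).
    needs_action = requested_mode in {"issue", "pr"}
    stages.append("github_action" if needs_action else "skip_github_action")

    stages.append("build_trace_tree")
    return stages


diagnose_only = plan_stages(code_related=False, requested_mode="diagnose")
pr_run = plan_stages(code_related=False, requested_mode="pr")
```

Keeping the routing in small pure functions like these is one way to make each branch testable without running a model or touching GitHub.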

Repository Map

tracepilot/
  graph.py                     # top-level graph orchestration
  state/models.py              # typed request, runtime, and graph state
  nodes/                       # stage wrappers
  subgraphs/                   # repo-context and GitHub-action subflows
  services/                    # Jaeger, GitHub, LLM, limits, credentials
tests/
  test_graph.py
  test_*                       # unit coverage across nodes, clients, subgraphs
  e2e/                         # runnable demo scripts
scripts/
  run_graph.py
docker-compose.yml             # local observability stack support

Quickstart

1. Install

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

2. Export credentials

Use one model provider key:

export OPENAI_API_KEY=...
# or
export ANTHROPIC_API_KEY=...

Optional service credentials:

export GITHUB_TOKEN=...
export TRACEPILOT_OBSERVABILITY_TOKEN=...
export TRACEPILOT_JAEGER_BASE_URL=http://localhost:16686

3. Run an end-to-end demo with Jaeger

python3 tests/e2e/run_graph_with_jaeger.py

That script:

  • waits for Jaeger
  • emits a demo trace through OTLP
  • runs TracePilotGraph
  • prints the final structured state as JSON
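The "waits for Jaeger" step is essentially a readiness poll. Here is a minimal sketch of that idea using only the standard library. The names `http_probe` and `wait_until_ready`, the retry counts, and the choice to probe the base URL are assumptions, not details taken from the demo script.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, probe=http_probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next try
    return False


# Simulated run: the service becomes reachable on the third probe.
answers = iter([False, False, True])
ready = wait_until_ready(
    "http://localhost:16686",
    probe=lambda url: next(answers),
    sleep=lambda seconds: None,
)
```

Injecting `probe` and `sleep` keeps the loop testable without a running Jaeger instance.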

4. Run only the repo-context subgraph

python3 tests/e2e/run_repo_context_subgraph.py \
  --git-url "https://github.com/<owner>/<repo>/blob/main/app.py#L42" \
  --mode pr \
  --message "Investigate the code path linked from the logs."

Minimal Usage

from tracepilot import TracePilotGraph
from tracepilot.state import GraphState, RunRequest

# Describe the investigation as a typed request wrapped in the shared graph state.
state = GraphState(
    request=RunRequest(
        message="Investigate checkout latency service:checkout",
        trace_id="your-trace-id",  # replace with a real Jaeger trace ID
        requested_mode="diagnose",
    )
)

# Run the graph and collect the final state.
result = TracePilotGraph().run(state)

print(result.diagnosis)
print(result.final_response)
print(result.github_result)

Runtime Model

TracePilot normalizes provider and model settings from either the request or environment:

  • default provider: openai
  • default OpenAI model: gpt-4.1-mini
  • default Anthropic model: claude-3-5-sonnet-20241022

Useful env vars:

  • TRACEPILOT_MODEL_PROVIDER
  • TRACEPILOT_MODEL
  • TRACEPILOT_MODEL_TEMPERATURE
  • TRACEPILOT_MODEL_MAX_OUTPUT_TOKENS
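To make the normalization concrete, the sketch below resolves settings from a request mapping first and the environment second, then falls back to the defaults listed above. It is an assumption-laden illustration: `ModelSettings`, `resolve_model_settings`, the request keys, and the request-over-environment precedence are all hypothetical. Only the environment variable names and default values come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_PROVIDER = "openai"
DEFAULT_MODELS = {
    "openai": "gpt-4.1-mini",
    "anthropic": "claude-3-5-sonnet-20241022",
}


@dataclass(frozen=True)
class ModelSettings:
    provider: str
    model: str
    temperature: Optional[float]
    max_output_tokens: Optional[int]


def resolve_model_settings(request: Mapping[str, object],
                           env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Merge request values, environment values, and defaults (assumed order)."""

    def pick(key, env_name):
        value = request.get(key)
        return value if value is not None else env.get(env_name)

    provider = str(pick("provider", "TRACEPILOT_MODEL_PROVIDER") or DEFAULT_PROVIDER).lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported provider: {provider}")

    model = pick("model", "TRACEPILOT_MODEL") or DEFAULT_MODELS[provider]
    temperature = pick("temperature", "TRACEPILOT_MODEL_TEMPERATURE")
    max_tokens = pick("max_output_tokens", "TRACEPILOT_MODEL_MAX_OUTPUT_TOKENS")

    return ModelSettings(
        provider=provider,
        model=str(model),
        temperature=float(temperature) if temperature not in (None, "") else None,
        max_output_tokens=int(max_tokens) if max_tokens not in (None, "") else None,
    )


defaults = resolve_model_settings({}, env={})
```

Passing `env` explicitly, as in the last line, keeps the resolution deterministic in tests.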

Bounded Execution

The graph is built to stay controlled rather than open-ended.

  • tool-call limits are resolved into execution limits
  • GitHub creation limits default to one issue and one PR attempt
  • repo-context reads are targeted around extracted file locations
  • GitHub action execution is constrained to one selected payload
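One way to picture these bounds is a small budget object that each stage must consume from before acting. This is a sketch under stated assumptions, not the limits service in the package: `ExecutionLimits`, `LimitExceeded`, and the tool-call default of 20 are invented for illustration. The one-issue and one-PR defaults mirror the list above.

```python
from dataclasses import dataclass, field


class LimitExceeded(RuntimeError):
    """Raised when a stage asks for more than its budget allows."""


@dataclass
class ExecutionLimits:
    max_tool_calls: int = 20  # assumed value; not stated in this README
    max_issues: int = 1       # one issue attempt
    max_prs: int = 1          # one PR attempt
    used: dict = field(
        default_factory=lambda: {"tool_calls": 0, "issues": 0, "prs": 0}
    )

    def consume(self, kind: str) -> None:
        """Record one use of kind, or raise if the budget is spent."""
        limit = getattr(self, f"max_{kind}")
        if self.used[kind] >= limit:
            raise LimitExceeded(f"{kind} limit of {limit} reached")
        self.used[kind] += 1


limits = ExecutionLimits(max_tool_calls=2)
limits.consume("issues")  # first issue attempt is allowed
try:
    limits.consume("issues")  # a second attempt exceeds the default budget
    second_issue_allowed = True
except LimitExceeded:
    second_issue_allowed = False
```

Raising on overflow, rather than silently skipping, makes a runaway loop fail loudly and leaves the counters available for inspection afterwards.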

Testing

Unit tests cover the graph, clients, nodes, and subgraphs.

python3 -m unittest discover -s tests

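If test discovery fails locally with import errors, the likely cause is a missing dependency such as langgraph or langchain_core. Re-running pip install -e . inside the active virtual environment usually resolves it.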

Project Character

TracePilot is not a chat wrapper around observability APIs. It is a staged investigation system with explicit routing:

  • observability first
  • source context only when earned
  • GitHub mutation only when requested
  • structured state all the way through

Explicit routing and bounded actions keep each run predictable and inspectable, which makes TracePilot a better fit for production debugging workflows than a free-form agent loop.

About

LangGraph agent for production incident investigation. It reads Jaeger traces and logs, forms hypotheses and collects evidence, explores related code, proposes fixes, and can open GitHub issues or PRs.
