Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-Tracing-Visualization #47

Draft
wants to merge 11 commits into
base: staging
Choose a base branch
from
Draft

Feature-Tracing-Visualization #47

wants to merge 11 commits into from

Conversation

Abby010
Copy link

@Abby010 Abby010 commented Mar 24, 2025

Description

This program visualizes real-time execution flow of distributed spans in a terminal-based UI (TUI) using React Ink. It allows users to toggle between time-based (--sample 1s) and logical event-based (--sample logical) views while displaying structured span hierarchies with box-drawing characters.

Issues Fixed

Tasks

  • 1. We now have a working visualization (Attaching terminal recordings for reference at the end).
  • 2. The core library has been implemented

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

Terminal 1: --sample logical

  • Preview (https://asciinema.org/a/rsIX7rt6teGgXFUyOJsLctsoR)
  • Runs cli.tsx, utilizing React-Ink to display spans in a hierarchical tree structure based on logical parent-child relationships.
  • Organizes spans according to their causal dependencies (e.g., child spans appear under their respective parent spans).
  • Ensures that spans are grouped logically rather than by strict timestamp order, helping to visualize the structured execution flow.

Terminal 2: --sample 1s

  • Preview (https://asciinema.org/a/4u3ZkqPQu1i40ZfzlJ89Pyl91)
  • Runs cli.tsx, utilizing React-Ink to display spans sorted by their start time rather than logical hierarchy.
  • Orders spans sequentially, meaning spans that start earlier appear first, regardless of parent-child relationships.
  • Provides a time-centric view, useful for debugging latency and analyzing performance bottlenecks in real-time execution.

Copy link

linear bot commented Mar 24, 2025

ENG-234

@Abby010 Abby010 requested a review from brynblack March 24, 2025 02:24
@CMCDragonkai
Copy link
Member

Any new visualisations based on lots of lots of data??? @Abby010 and make sure you are copying over and synthesizing the details I have earlier.

@Abby010
Copy link
Author

Abby010 commented Mar 24, 2025

Any new visualisations based on lots of lots of data??? @Abby010 and make sure you are copying over and synthesizing the details I have earlier.

Not yet, I starting getting some dependency errors so I needed to fix that but I have started working with lots of data.

@Abby010 Abby010 mentioned this pull request Mar 25, 2025
8 tasks
@Abby010
Copy link
Author

Abby010 commented Mar 25, 2025

New video links have been added with more data in the description

@CMCDragonkai
Copy link
Member

Add comments rather than editing spec for that stuff.

Copy link
Member

@aryanjassal aryanjassal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes are just code style changes. As a summary:

  • use undefined for null-ish fields
  • check if a value is undefined or not by doing a == null or != null check (yes, even though we have undefined as the type)
  • use single quotes
  • ensure import formatting
  • for array types, always wrap it in Array<>
  • for potentially undefined fields use ?, unless you are unable to use ? for the parameter type, in which case you always use Type | undefined (as opposed to null)

spans.json Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that you run the following commands before pushing.

  • npm run build
  • npm run lint
  • npm run test

You can omit the first command in most cases if either linting or tests pass. The build command basically reveals any typescript errors which are caused by violating the typescript rules, meaning the program can't be compiled.

If you pushed something in which the CI is failing, then push up the next commit to address the failed CI. This fixing commit should be rebased and squashed into the previous commit, so they look like a single commit and keep the history clean.

Try to ensure the CI is passing at all times. If it is not, and it is intended to not pass, then add the following in a new line in the commit message to skip the CI workflow. You normally do it in WIP commits where it is expected for the CI to fail, and try to avoid skipping CI unless you have a good reason to do so.

wip: working on this new feature
[ci skip]

The lint command reveals a bunch of linting issues, but problems like incorrect indentation or a missing semicolon can be addressed automatically. Thus, in these cases, prefer to run npm run lintfix to resolve automatically fixable lint issues. If there are issues not automatically resolvable, then it will perform all possible fixes and give you a list for manual fixing.

Make sure to resolve all linting errors and warnings. Even a warning will fail the linting step in the CI.

spans.json Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this file is designed to act as dummy data for testing out your program. Usually, fast-check is used for writing tests like this.

Refer to these links for a quick guide on writing tests using fast-check.

You can also dynamically generate JSON data using this method. Refer to the implementation here for an example of doing it.

Refer to this page for a more comprehensive documentation on available arbitraries. We might need to have a meeting to go through using fast-check to generate dummy data.

On the other hand, using pre-set data is also useful for testing. If you're doing a show-case, then using these 'fixtures' makes perfect sense. In that case, it would need to be under a place like tests/fixtures/span.json.

Basically, if you're using this for testing, then use fast-check instead. If you're using this for demo, then restructure the file in the project tree.

@CMCDragonkai
Copy link
Member

@Abby010 can you:

  1. Get the CI working
  2. Start integrating this into the js-logger src code under a new class object Tracer.
  3. Plan out integration into Polykey-CLI for testing.
  4. Get @aryanjassal or @tegefaulkes to review to see how it can hook into object lifecycles and node native async contexts.

I think it's important for testing why seed nodes are crashing.

Also why use -+ and not \?

@CMCDragonkai
Copy link
Member

Also you should be assigning yourself.

@CMCDragonkai
Copy link
Member

Start adding tests too.

@CMCDragonkai
Copy link
Member

And benchmarks.

I want a plan as of next cycle.

src/lib/span.ts Outdated
Comment on lines 9 to 17
constructor(name: string, parentSpanId: string | null = null) {
this.spanId = `span-${Date.now()}-${Math.random().toString(36).substr(2, 5)}`;
this.name = name;
this.startTime = Date.now();
this.endTime = null;
this.parentSpanId = parentSpanId;
this.children = [];
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only things that need to be set in the constructor are things that need the be passed through the constructor. Any other property can be defaulted when defined above with public startTime: number = Date.now()

The spanId should be an IdSortable from @matrixai/id library.

@CMCDragonkai
Copy link
Member

Make sure that the span structure can be streamed into an incremental jsonl format. This is what we will be using to then stream visualise on the browser. @shafiqihtsham can weigh in if you talk to him about it.

But I want you to write out a plan for implementation in Polykey-CLI by today.

@Abby010
Copy link
Author

Abby010 commented Apr 1, 2025

Tracing Integration Plan

Phase 1: Integrate Tracing into js-logger as a First-Class Component

  • Introduce a Tracer class that encapsulates all span-related logic.
  • Replace the standalone openSpan, closeSpan functions with methods on this class.
  • Support structured span creation: startSpan, endSpan, and traced wrappers with context tracking.
  • Provide methods for exporting span data:
    • getActiveSpans()
    • flush()
    • getTraceJSON()

Add Incremental Span Streaming to JSONL

  • Each span, when closed, is written as a JSON object to a .jsonl file (e.g., spans.jsonl).

  • Enable atomic streaming via fs.appendFileSync or streaming writers.

  • Supports real-time debugging and integration with external tools.

  • Update the current TUI visualizer to consume spans from the new Tracer instance.

This phase formalizes span management, embeds JSONL observability, and prepares for broader CLI and system integration.


Phase 2: Integrate Tracer with Polykey CLI and Lifecycle Events

  • Initialize and embed the Tracer into the Polykey CLI runtime.
  • Use Tracer.traced() to instrument top-level CLI commands and subcommands.
  • Extend tracing into object lifecycle tracking: vaults, daemons, agents, etc.
  • Hook into Node.js async context (AsyncLocalStorage) to propagate spans implicitly across async calls and promise chains.
  • Stream span data from real Polykey CLI runs to the .jsonl file.
  • Analyze and visualize spans in the TUI to confirm accurate instrumentation.
  • Use the streamed data for:
    • Automated debugging
    • Replay analysis
    • Bug discovery

This phase validates the integration by capturing real CLI behaviors and preparing for long-term observability and tracing.

@CMCDragonkai
Copy link
Member

Your phase 2 doesn't mention https://github.com/MatrixAI/js-async-init at all. You both need to talk about the implementation plan with @tegefaulkes since I think you're missing some contextual details critical to understanding how tracing can be applied to Polykey's runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Integrate Tracing (derived from OpenTelemetry)
4 participants