I'm not familiar with OpenTelemetry at all, but what's the purpose of spans vs. events? Is it useful to group the events emitted in a method into spans (or maybe we have to?), as opposed to just having all the events in a single top-level span?
I think I remember that spans can be used for things that happen on different services (different processes or servers), where it would make sense to have one span per service.
A span represents a single operation within a trace, and each span can have attributes and events. I think the right approach is to have one span per method: it's the easiest to implement, and it lets us clearly see the call stack that a transaction went through.
Every horizontal line is a span in that stack, and you can click on it to view its events.
norswap
left a comment
Seems great, left a question!

Description
This PR includes a proof of concept on how to implement traces in the transaction manager. Traces are useful because they allow us to answer questions such as:
What happened in block X that caused the absence of a drand value in that block?
Why is a transaction in the "interrupted" status?
Why are we not including transactions fast enough to reveal randomness?
Metrics, combined with alerts, provide a generic overview that helps us understand if everything is working correctly, while traces offer more concrete data to understand why something is happening. This allows us to debug and fix production issues much faster.
To simplify the implementation, I created a special decorator that allows us to propagate traces without having to call the startActiveSpan method directly. That method is somewhat clunky because the code being traced has to run inside the callback passed to it, so that the span is propagated through async local storage, which is how OpenTelemetry establishes the hierarchical relationship between multiple spans.
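To illustrate the idea, here is a minimal sketch of how wrapping a method can propagate a parent/child relationship through async local storage. It is not the decorator from this PR and it does not use the OpenTelemetry API: `SpanRecord`, `withSpan`, `traceMethod`, `addEvent` and `TxSubmitter` are hypothetical names, and a plain Node `AsyncLocalStorage` stands in for OpenTelemetry's context manager.

```typescript
import { AsyncLocalStorage } from "node:async_hooks"

// Hypothetical stand-in for a span: just enough to show parent/child links and events.
interface SpanRecord {
    name: string
    parent?: string
    events: string[]
}

// Spans are collected here once they end (a real tracer would export them instead).
const finished: SpanRecord[] = []

// Holds the span that is active in the current async context.
const activeSpan = new AsyncLocalStorage<SpanRecord>()

// Runs `fn` inside a new span whose parent is the span active at call time.
function withSpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const span: SpanRecord = { name, parent: activeSpan.getStore()?.name, events: [] }
    return activeSpan.run(span, async () => {
        try {
            return await fn()
        } finally {
            finished.push(span)
        }
    })
}

// Patches a method so that every call runs in its own span, named after the method.
// A method decorator would do the same patching; it is applied by hand here so the
// sketch does not depend on a particular decorator compiler setting.
function traceMethod(proto: any, key: string): void {
    const original = proto[key]
    proto[key] = function (this: unknown, ...args: unknown[]) {
        return withSpan(key, () => original.apply(this, args))
    }
}

// Attaches an event to whichever span is currently active, if any.
function addEvent(message: string): void {
    activeSpan.getStore()?.events.push(message)
}

// Hypothetical class: the method bodies contain no tracing plumbing at all.
class TxSubmitter {
    async submit(): Promise<string> {
        addEvent("submitting")
        await this.sign()
        return "ok"
    }

    async sign(): Promise<void> {
        await new Promise((resolve) => setTimeout(resolve, 1))
        addEvent("signed")
    }
}

traceMethod(TxSubmitter.prototype, "submit")
traceMethod(TxSubmitter.prototype, "sign")
```

Calling `new TxSubmitter().submit()` records a `sign` span whose parent is `submit`, even though neither method mentions spans. That is the property the decorator is after: the hierarchy comes from the async context, not from nesting callbacks by hand.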
Since we are using OpenTelemetry, we can leverage this feature:
Grafana Exemplars
This feature allows us to correlate traces with metrics. For example, if we notice in Grafana that some transactions are taking longer than expected, we can directly jump from Grafana to the specific trace where the delay occurred and quickly understand the root cause.
To visualize the traces, I deployed Tempo locally. It works well and is natively integrated with Grafana, making it the best option for hosting traces.