feat(cu): introduce an EventVacuum that parses well-formatted event logs for transport to other services #1017
+5,395
−1,499
Motivation:
Processes are essentially applications, and applications need various forms of observability tools - where "Observability" can be defined as "the ability to answer novel, open-ended questions about a system". The AO team is continuing to develop solutions for monitoring generic performance metrics for Processes, but a gap currently exists in the ability to measure richer contextual information from the internals of a Process.
Message handling in Processes is similar in many ways to handling HTTP requests on a server. A great way to get observability over a system like that is to use wide, "Structured Events" that are rich with relevant information about the inner workings of the process that you wouldn't be able to get from Process inputs (e.g. searchable and aggregatable from GQL data) or generic performance metrics. For more background reading on this approach and its benefits see:
https://charity.wtf/2022/08/15/live-your-best-life-with-structured-events/
and
https://docs.honeycomb.io/get-started/basics/observability/concepts/events-metrics-logs/
The challenge with extracting this type of information from Processes is that they run in a sandbox environment without access to a network or file system that can be connected to the outside world. Therefore, existing intra-AO-Process solutions such as AO subscribables don't quite fit this model AND would require gas for the messaging necessary to facilitate it. However, AO CUs have direct access to Process memory and outputs, including Process log streams. As such, log streams can be used as a transport mechanism to shuttle observability data out of AO and to the outside world.
Technical Contributions
This pull request introduces an EventVacuum that parses Process log streams for well-formatted events carrying an _e: 1 key/value flag and sends them off to a transport layer.
Results From Preliminary Testing
I created a utility module to produce and print compliant ndjson events and used it to instrument a new AO token built from the token.lua blueprint. You can find the code for both here: permaweb/aos#350
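To make the event format and the vacuum step concrete, here is a minimal Python sketch of the round trip: one side prints an event as a single ndjson line carrying the _e: 1 flag, the other scans raw log output and forwards only flagged lines. This is an illustration under stated assumptions, not the code in this PR or in permaweb/aos#350. The function names (emit_event, vacuum, send_to_transport) and the sample fields (action, quantity) are hypothetical; only the _e: 1 flag and the ndjson framing come from the description above.

```python
import json

EVENT_FLAG = "_e"  # flag key from the PR description; a value of 1 marks an event


def emit_event(fields: dict) -> str:
    """Serialize one event as a single ndjson line carrying the flag."""
    # The flag is written last so caller-supplied fields cannot override it.
    return json.dumps({**fields, EVENT_FLAG: 1}, separators=(",", ":"))


def vacuum(log_output: str) -> list:
    """Collect flagged events from raw log output, ignoring everything else."""
    events = []
    for line in log_output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log text, not a JSON object
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was malformed
        if isinstance(parsed, dict) and parsed.get(EVENT_FLAG) == 1:
            events.append(parsed)
    return events


def send_to_transport(events, transport) -> None:
    """Hand each event to a transport callable (hypothetical interface)."""
    for event in events:
        transport(event)


# Usage: mixed log output in, only the flagged event out.
log_output = "\n".join([
    "handling message",
    emit_event({"action": "Transfer", "quantity": 100}),
    '{"note": "JSON without the flag"}',
    "{broken json",
])
sent = []
send_to_transport(vacuum(log_output), sent.append)
```

Keeping one JSON object per line means a malformed line can be skipped without affecting the events around it.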
Preliminary test results using the Honeycomb Transport have been great. Here are some examples of what you can do with the integration:
List the errors that have been raised during processing, grouping by nonce, sender, Action, and error reason:
Aggregate the total value of the token that has been transferred in the last 48 hours:
Surface internal analytics for how many times each specific handler has been successfully triggered on the Process:
... and that's just the start of what's possible.
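For readers without access to a query UI, the last two queries above can be approximated over a list of collected events in a few lines of Python. This is only a rough sketch: the field names (handler, action, quantity, ts_ms, error) and the sample events are made up for illustration and are not taken from the instrumented token or from Honeycomb.

```python
from collections import Counter

WINDOW_MS = 48 * 60 * 60 * 1000  # 48 hours in milliseconds


def total_transferred(events, now_ms):
    """Sum quantity over successful Transfer events from the last 48 hours."""
    return sum(
        e.get("quantity", 0)
        for e in events
        if e.get("action") == "Transfer"
        and "error" not in e
        and 0 <= now_ms - e["ts_ms"] <= WINDOW_MS
    )


def handler_success_counts(events):
    """Count events per handler, skipping events that recorded an error."""
    return Counter(
        e["handler"] for e in events if "handler" in e and "error" not in e
    )


# Made-up sample events, for illustration only.
NOW_MS = 1_000_000_000
sample = [
    {"_e": 1, "handler": "transfer", "action": "Transfer",
     "quantity": 100, "ts_ms": NOW_MS - 1_000},
    {"_e": 1, "handler": "transfer", "action": "Transfer",
     "quantity": 50, "ts_ms": NOW_MS - WINDOW_MS - 1},  # outside the window
    {"_e": 1, "handler": "balance", "action": "Balance",
     "ts_ms": NOW_MS - 2_000},
    {"_e": 1, "handler": "transfer", "action": "Transfer",
     "error": "Insufficient Balance", "ts_ms": NOW_MS - 3_000},
]
recent_total = total_transferred(sample, NOW_MS)
counts = handler_success_counts(sample)
```

Because each event is a flat record, new questions usually need only a different filter or grouping key, not new instrumentation.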
I strongly believe that when other builders see that this kind of open-ended introspection is possible with these kinds of tools, they will want to give it a try! I'm also open to discussing other means of achieving this form of event transportation in AO.