LGTM. Those are good to add, but if we can share the same base as Arweave (with additional HB-specific metrics on top), that would be good.
Other things to track:
Members in the pg groups for scheduling and compute
Ideally live message-pushing processes, too, but we don't register those with pg right now (nor should we, ideally). We could achieve this by having the logger process generate some sort of event when a push starts and when its registered worker count drops to zero.
Size of the message cache (either in element count or bytes if necessary -- just as long as it is performant)
The same for the computed outputs directory's subdirectories (specifically, /computed/*/*/[slot_number])
Stretch goal, only if it will take <1 day: counts of the atoms used as the first element of tuples sent to ?event(...), regardless of whether debug logging is enabled.
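For the pg item above, a minimal sketch of what I have in mind, assuming the prometheus.erl library (which Arweave already depends on) and hypothetical group names `scheduling` and `compute`:

```erlang
%% Sketch: export pg group sizes as Prometheus gauges.
%% Group names here are placeholders, not the real HB group names.
-module(hb_pg_metrics).
-export([setup/0, collect/0]).

setup() ->
    prometheus_gauge:declare([{name, hb_pg_group_members},
                              {labels, [group]},
                              {help, "Number of members in a pg group"}]).

collect() ->
    lists:foreach(
      fun(Group) ->
              %% pg:get_members/1 uses the default scope.
              Members = pg:get_members(Group),
              prometheus_gauge:set(hb_pg_group_members, [Group],
                                   length(Members))
      end,
      [scheduling, compute]).
```

`collect/0` could run on a timer, or lazily whenever the metrics endpoint is scraped.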
I think the base Arweave stuff should cover all of the core HTTP metrics (counts of response types, average response time, etc.), but if not, please add them. Please be careful not to break metrics down by endpoint using the Cowboy router patterns at the moment, as the HTTP API is in flux (see the feat/device-rework branch). In the ideal case (and maybe Cowboy works like this anyway?) we would be able to match on simple string patterns (wildcards but no regex) rather than the normal route structures, which won't map well onto the devices underneath.
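Cowboy does get partway there: its router supports a trailing `[...]` catch-all segment, so metrics could be grouped by coarse path prefix without encoding the full (in-flux) route structure. A sketch, with hypothetical handler module names:

```erlang
%% Sketch: coarse, prefix-based Cowboy routes. "[...]" is Cowboy's
%% catch-all segment; '_' matches any remaining path. Handler module
%% names are placeholders.
Dispatch = cowboy_router:compile([
    {'_', [
        {"/metrics",        hb_metrics_handler, []},
        {"/computed/[...]", hb_generic_handler, [computed]},
        {'_',               hb_generic_handler, [other]}
    ]}
]).
```

Anything finer-grained than this (full regex, mid-path wildcards) would need matching outside the router, so sticking to prefixes seems like the safe option until the API settles.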
MVP first, then we can see where we are and iterate.
We want the system metrics to be pulled by an external monitoring system (e.g. Prometheus, with Grafana on top).
The task is to select essential node metrics and expose them via HTTP. Right now, it looks like we should start with:
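For the HTTP exposure side, a minimal sketch of a pull-style `/metrics` endpoint, assuming prometheus.erl for the registry and its text exposition format (the handler name is hypothetical):

```erlang
%% Sketch: a Cowboy handler serving the Prometheus text format so an
%% external scraper can pull node metrics on demand.
-module(hb_metrics_handler).
-behaviour(cowboy_handler).
-export([init/2]).

init(Req0, State) ->
    %% Render every metric in the default registry as plain text.
    Body = prometheus_text_format:format(),
    Req = cowboy_req:reply(200,
        #{<<"content-type">> => <<"text/plain; version=0.0.4">>},
        Body, Req0),
    {ok, Req, State}.
```

Since the scraper pulls, the node never needs to know about the monitoring system; anything that speaks the Prometheus exposition format can consume this.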