Missing runtime metrics #6321

Open
morningspace opened this issue Nov 11, 2024 · 15 comments

@morningspace

morningspace commented Nov 11, 2024

Description

Some runtime metrics, e.g.: memStats.Alloc, memStats.Sys, are missing.

Steps To Reproduce

  • Checking the code for the deprecated runtime implementation, we found no runtime metrics for memStats.Alloc or memStats.Sys.
  • In the new runtime metrics implementation, the code that collects the metrics is also different. What would be the equivalents of the missing runtime metrics in the new implementation?

Expected behavior

Can the missing runtime metrics be added? If a PR is allowed, we'd be happy to contribute.

@morningspace morningspace added the bug Something isn't working label Nov 11, 2024
@ysomad

ysomad commented Nov 21, 2024

any updates?

@dmathieu
Member

The non-deprecated runtime uses runtime/metrics to retrieve metrics data.
The metrics we exposed are documented in semantic conventions.

It looks like you're looking for go.memory.allocated and go.memory.used.

The current runtime does expose those two metrics on its latest version, even though it uses the runtime/metrics package, not runtime.MemStats.

@morningspace
Author

Thanks @dmathieu for your reply!

  • The semantic conventions say that go.memory.allocated is the memory allocated to the heap by the application, while runtime.MemStats has both memStats.HeapAlloc and memStats.Alloc available. So I assume go.memory.allocated maps to memStats.HeapAlloc, not memStats.Alloc.
  • memStats.Sys is the memory obtained from the system. When I checked the code in the non-deprecated runtime, I found it actually reads the Go heap memory defined here, and I'm not sure whether that is identical to memStats.Sys.

@dmathieu
Member

cc @dashpole who led this.

@dashpole dashpole self-assigned this Nov 22, 2024
@dashpole
Contributor

Based on https://github.com/prometheus/client_golang/blob/76b74e25d5660965000a74cf2e918c217ed76da2/prometheus/go_collector.go#L26, this is the mapping from memStats to go runtime metrics:

  • memStats.Alloc is the same as the /memory/classes/heap/objects:bytes runtime metric.
  • memStats.Sys is the same as the /memory/classes/total:bytes runtime metric.

/memory/classes/total:bytes isn't useful on its own, as it includes released memory. For the new runtime metrics, we provide the go.memory.used metric, which excludes released memory.
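The mapping above can be checked locally with a small sketch that reads both sources side by side. The helper name readBytes is made up for illustration, and the two sources are sampled at slightly different instants, so the printed pairs may not match exactly. The last line computes total minus released, which is how this thread describes go.memory.used.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// readBytes (hypothetical helper) returns the current value of each named
// uint64 runtime metric. Names unknown to this Go version are skipped.
func readBytes(names ...string) map[string]uint64 {
	samples := make([]metrics.Sample, len(names))
	for i, name := range names {
		samples[i].Name = name
	}
	metrics.Read(samples)

	values := make(map[string]uint64, len(names))
	for _, s := range samples {
		if s.Value.Kind() == metrics.KindUint64 {
			values[s.Name] = s.Value.Uint64()
		}
	}
	return values
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	v := readBytes(
		"/memory/classes/heap/objects:bytes",
		"/memory/classes/total:bytes",
		"/memory/classes/heap/released:bytes",
	)
	total := v["/memory/classes/total:bytes"]
	released := v["/memory/classes/heap/released:bytes"]

	fmt.Println("memStats.Alloc:", ms.Alloc, "heap/objects:", v["/memory/classes/heap/objects:bytes"])
	fmt.Println("memStats.Sys:  ", ms.Sys, "total:       ", total)
	fmt.Println("total - released:", total-released)
}
```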

You can see golang/go#67120 for why we don't provide live + unswept heap memory via /memory/classes/heap/objects:bytes:

live+unswept heap memory isn't a terribly useful metric since it tends to be noisy and misleading, subject to sweep scheduling nuances. The heap goal is a much more reliable measure of total heap footprint.

In keeping with that, we would recommend using the go.memory.gc.goal metric, which measures the heap goal.

@dashpole dashpole added enhancement New feature or request and removed bug Something isn't working labels Nov 22, 2024
@morningspace
Author

morningspace commented Dec 4, 2024

Hi @dashpole, thanks for your reply and sorry for the delay. I spent a few hours today revisiting the materials mentioned in the thread above. Here are my findings; feel free to correct me.

/memory/classes/total:bytes isn't useful on its own, as it includes released memory ...

While I understand that go.memory.used is useful because it excludes released memory, according to golang/go#67120 I don't think /memory/classes/total:bytes is useless. It is part of the "Proposed initial metrics", with the following rationale:

This metric is necessary for tuning GOMEMLIMIT. It's also useful for identifying "other" memory, and together with /memory/classes/heap/released:bytes, what the runtime believes the physical memory footprint of the application is.

You can see golang/go#67120 for why we don't provide live + unswept heap memory via /memory/classes/heap/objects:bytes

According to golang/go#67120, I think the reason /memory/classes/heap/objects:bytes is not included in the "Proposed initial metrics" is different. It's because this metric can be derived from total allocations and frees, which makes it redundant: its value is essentially the difference of the two other metrics.

Re: the discussion "live+unswept heap memory isn't a terribly useful", it actually refers to /gc/heap/frees:bytes, as this metric, combined with /gc/heap/allocs:bytes, can be used to compute the total amount of live + unswept heap memory. But since that total includes unswept memory, the proposal recommends using /gc/heap/goal:bytes, i.e. go.memory.gc.goal, to measure total heap footprint.

With that, I still think there is room to enhance the new runtime metrics. WDYT?

Also, I'm wondering when the deprecated runtime metrics will be dropped completely, given that there is room to enhance the deprecated ones as well. It looks like the feature flag OTEL_GO_X_DEPRECATED_RUNTIME_METRICS is currently still enabled by default, which means the deprecated runtime metrics are used by default. BTW: I am working on a more detailed investigation of potential gaps in the deprecated runtime metrics, which I will share here a bit later.

@dashpole
Contributor

dashpole commented Dec 4, 2024

This is definitely the right time to make changes to the new runtime metrics. I don't think there is a rush with disabling the old metrics, and we won't remove them for a while. In particular, we could probably add some metrics which are disabled by default, with options to enable them. A few follow-up questions:

This metric is necessary for tuning GOMEMLIMIT

I believe you would want to compare /memory/classes/total:bytes - /memory/classes/heap/released:bytes (i.e. go.memory.used) to GOMEMLIMIT, right? From my understanding, /memory/classes/total:bytes on its own wouldn't be useful in that case.

It's also useful for identifying "other" memory

This seems like the primary use for it, but I'm fuzzy on the actual value here. It would just let us calculate the "released" memory, which from my reading doesn't sound that useful.

It's because this metric can be derived from total allocations and frees, which makes it redundant, as its value is essentially a subtraction of the two other metrics.

Do you think we should expose it anyway? It's possible it could be an opt-in metric. But given that the Go maintainers recommended using /gc/heap/goal:bytes instead, it seems like it could mislead users.

BTW: I am working on a more detailed investigation of potential gaps for the deprecated runtime metrics, which I will share here a bit later.

Looking forward to it! We really appreciate the feedback.

@morningspace
Author

Thanks @dashpole !

I like the idea of "add some metrics which are disabled by default, with options to enable them", so that it's the end user who decides whether a metric is needed. A few more comments below:

you would want to compare /memory/classes/total:bytes - /memory/classes/heap/released:bytes (i.e. go.memory.used) ...

Just to confirm: reading the code, it looks like go.memory.used has two variations with different attribute values, go.memory.type = stack vs. other. And other equals /memory/classes/total:bytes - /memory/classes/heap/released:bytes - /memory/classes/heap/stacks:bytes, not /memory/classes/total:bytes - /memory/classes/heap/released:bytes. So it appears there's no equivalent.

This seems like the primary use for it ...

I'd think this might be one candidate of opt-in metrics, because the total memory usage /memory/classes/total:bytes includes heap, plus other runtime components like stacks, metadata, and cache, which might contribute to the application’s memory footprint to some extent, depending on the nature of user's application and its workload.

Do you think we should expose it anyways? ...

I agree that it could be an opt-in metric. Strictly, /gc/heap/goal:bytes and /memory/classes/heap/objects:bytes are different. The first one represents the target heap size at which the garbage collector will trigger the next cycle, while the second one shows the current live heap memory and reflects the real-time memory usage. In general, /memory/classes/heap/objects:bytes should stay below /gc/heap/goal:bytes. If it frequently approaches or exceeds the goal, it indicates the application is under memory pressure, potentially triggering frequent garbage collections.
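The comparison described here can be sketched by reading both values in one call and reporting their ratio. The function name heapVersusGoal is hypothetical, and a ratio near or above 1.0 is meant only as the signal this comment describes, not as an established threshold.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// heapVersusGoal (hypothetical helper) returns the bytes occupied by heap
// objects and the current heap goal, sampled together.
func heapVersusGoal() (objects, goal uint64) {
	samples := []metrics.Sample{
		{Name: "/memory/classes/heap/objects:bytes"},
		{Name: "/gc/heap/goal:bytes"},
	}
	metrics.Read(samples)
	return samples[0].Value.Uint64(), samples[1].Value.Uint64()
}

func main() {
	objects, goal := heapVersusGoal()
	ratio := float64(objects) / float64(goal)
	fmt.Printf("heap objects: %d bytes, heap goal: %d bytes, ratio: %.2f\n",
		objects, goal, ratio)
}
```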

Does it make sense?

@dashpole
Contributor

dashpole commented Dec 5, 2024

So, it appears there's no equivalent.

If you do not group by the go.memory.type label, you would get /memory/classes/total:bytes - /memory/classes/heap/released:bytes, which is what we want. It is common to do this for metrics where the total is a sum of parts (heap vs other in this case).
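To make the arithmetic concrete, here is a rough sketch that builds the two go.memory.used values as described earlier in this thread (stack and other) and shows that summing them, which is what a query does when it does not group by go.memory.type, gives total minus released. The function name usedByType is made up for illustration and is not taken from the instrumentation code.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// usedByType (hypothetical helper) splits used memory into the "stack" and
// "other" values described in this thread, from a single metrics.Read call.
func usedByType() (stack, other, used uint64) {
	samples := []metrics.Sample{
		{Name: "/memory/classes/total:bytes"},
		{Name: "/memory/classes/heap/released:bytes"},
		{Name: "/memory/classes/heap/stacks:bytes"},
	}
	metrics.Read(samples)
	total := samples[0].Value.Uint64()
	released := samples[1].Value.Uint64()
	stack = samples[2].Value.Uint64()

	used = total - released
	other = used - stack
	return stack, other, used
}

func main() {
	stack, other, used := usedByType()
	fmt.Println("go.memory.type=stack:", stack)
	fmt.Println("go.memory.type=other:", other)
	// Summing over the attribute recovers total - released.
	fmt.Println("sum:", stack+other, "total - released:", used)
}
```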

@dashpole
Contributor

dashpole commented Dec 5, 2024

If we want to add /memory/classes/heap/objects:bytes, we would need to propose it here: https://github.com/open-telemetry/semantic-conventions/blob/106f880ccbf26443f115ce1f48c236ec6a0b6f1b/docs/runtime/go-metrics.md?plain=1#L4

@dashpole
Contributor

dashpole commented Dec 5, 2024

We should involve the Go folks that worked on the proposal from the Go side.

@morningspace
Author

Hi @dashpole, my comments are below.

If you do not group by the go.memory.type label, ...

Thanks for the clarification.

If we want to add /memory/classes/heap/objects:bytes, we would need to propose it here ...
We should involve the Go folks that worked on the proposal from the Go side.

Sure, I'd be happy to propose something there, and we should definitely ask the Go folks for advice.

P.S.:

Below is a detailed summary of the potential gaps, based on what I learned from the current deprecated metrics combined with the discussion above in this thread. Let me know if it makes sense; if so, I'd be happy to prepare a PR for the semantic convention proposal first.

| Metric | Comments |
| --- | --- |
| memStats.TotalAlloc | Equivalent to /gc/heap/allocs:bytes. Included as "proposed initial metric" in golang/go#67120. Propose as opt-in metric, as it can be used to derive an allocation rate in bytes/second, which is useful in understanding GC resource cost impact. |
| memStats.Alloc | Equivalent to /memory/classes/heap/objects:bytes. Not included as "proposed initial metric" in golang/go#67120, as it can be derived from total allocations and frees. Propose as opt-in metric, as it is different from /gc/heap/goal:bytes. In general, it should stay below /gc/heap/goal:bytes; if it frequently approaches or exceeds the goal, it indicates the application is under memory pressure, potentially triggering frequent garbage collections. |
| memStats.Sys | Equivalent to /memory/classes/total:bytes. Included as "proposed initial metric" in golang/go#67120. Propose as opt-in metric, as the total memory usage includes heap and other runtime components like stacks, metadata, and cache, which might contribute to the application’s memory footprint to some extent, depending on the nature of the user's application and its workload. |
| memStats.GCCPUFraction | No equivalent in runtime metrics. Not found in either the deprecated or the new metrics. Not that useful and often misleading, because it's an average over the lifetime of the process. |
| memStats.Mallocs | Equivalent to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects. Used with memStats.Frees to compute live objects in deprecated metrics, process.runtime.go.mem.live_objects. Propose /gc/heap/allocs:objects as opt-in metric, as it is included as "proposed initial metric" in golang/go#67120. It can be used to derive an allocation rate in objects/second, which is useful in understanding memory allocation resource cost impact. |
| memStats.Frees | Equivalent to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects. Used with memStats.Mallocs to compute live objects in deprecated metrics, process.runtime.go.mem.live_objects. /gc/heap/frees:objects is not included as "proposed initial metric" in golang/go#67120, as it's not that useful on its own. It's used with /gc/heap/allocs:objects to compute live objects, like process.runtime.go.mem.live_objects in deprecated metrics, but the number of live objects on its own also isn't that useful, as it includes unswept memory, which is subject to sweep scheduling nuances. |
| memStats.NextGC | Equivalent to /gc/heap/goal:bytes, i.e. go.memory.gc.goal in new metrics, but not found in deprecated metrics. |
| memStats.LastGC | No equivalent in runtime metrics. Propose as opt-in metric, as it can be used to track the intervals between garbage collections by calculating the difference between successive values of LastGC. For example, if intervals shrink significantly under increased load, it might indicate a memory pressure issue where the GC is struggling to keep up. |
| memStats.StackInuse | Equivalent to /memory/classes/heap/stacks:bytes, i.e. go.memory.used with attribute go.memory.type equal to stack in new metrics, but not found in deprecated metrics. |
| memStats.PauseNs | No equivalent in runtime metrics. Exposed as process.runtime.go.gc.pause_ns in deprecated metrics, but not found in new metrics. |
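As a rough illustration of the Mallocs and Frees rows, the sketch below computes a live object count both ways. With the equivalences listed above, the tiny allocation term appears on both sides and cancels, so the runtime/metrics version reduces to allocs minus frees. The function names are hypothetical, and the two sources are sampled at different instants, so the printed values can differ slightly.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// liveObjectsFromMemStats (hypothetical helper) mirrors Mallocs - Frees.
func liveObjectsFromMemStats() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.Mallocs - ms.Frees
}

// liveObjectsFromMetrics (hypothetical helper) applies the equivalences from
// the table: (allocs + tiny) - (frees + tiny), where the tiny term cancels.
func liveObjectsFromMetrics() uint64 {
	samples := []metrics.Sample{
		{Name: "/gc/heap/allocs:objects"},
		{Name: "/gc/heap/frees:objects"},
		{Name: "/gc/heap/tiny/allocs:objects"},
	}
	metrics.Read(samples)
	allocs := samples[0].Value.Uint64()
	frees := samples[1].Value.Uint64()
	tiny := samples[2].Value.Uint64()
	return (allocs + tiny) - (frees + tiny)
}

func main() {
	fmt.Println("live objects (MemStats):       ", liveObjectsFromMemStats())
	fmt.Println("live objects (runtime/metrics):", liveObjectsFromMetrics())
}
```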

@dashpole
Contributor

dashpole commented Dec 11, 2024

memStats.TotalAlloc

This is equivalent to go.memory.allocated

memStats.Alloc

This makes sense to me as a potential opt-in metric.

memStats.Sys

We could consider exposing released memory as a separate metric (e.g. go.memory.released). Users could then use go.memory.used + go.memory.released to get memStats.Sys. But I'm still not quite sure what the use cases are for the released memory or the total memory.

memStats.GCCPUFraction

That one doesn't sound very useful...

memStats.Mallocs

This one is interesting. We do support /gc/heap/allocs:objects through go.memory.allocations. If tiny allocations are useful to know about, we could consider making go.memory.allocations the sum of /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects instead of just heap allocs. Alternatively, we could differentiate tiny and non-tiny allocs using a label on the go.memory.allocations metric.

memStats.Frees

the number of live objects on its own also isn't that useful, as it includes unswept memory which is subject to sweep scheduling nuances.

It does seem like having an opt-in metric for the number of live objects would be a reasonable idea. I don't think we want frees as its own metric, though.

memStats.LastGC

Interesting. It's too bad we don't have the previous GC interval as a metric on its own, but it seems reasonable as an opt-in metric.

memStats.PauseNs

Can you say more about what this is useful for?

@morningspace
Author

@dashpole Sorry for the delay. Below are my replies to your comments. Also, I'm preparing a PR for the semantic convention proposal. Let me know whether that makes sense.

memStats.TotalAlloc
This is equivalent to go.memory.allocated

Yes, it's supported in new metrics, but not in old metrics. If we want to keep both new and old in sync, we could add it to the old metrics too.

memStats.Sys
We could consider exposing released memory ... what use-cases are for the released memory or the total memory.

Yes, since both total and released are collected, we can either expose total, or expose released as go.memory.released and then derive total by combining it with go.memory.used. The reason I consider total or released useful is that, even after Go has “released” the memory, the OS may not immediately reclaim it. This may affect physical memory pressure and overall system behavior. For example, if released memory keeps increasing while total memory does not shrink accordingly, or heap usage doesn’t decrease, it could indicate that OS reclaiming lags behind the heap memory Go released after a GC cycle, or it could signal memory fragmentation or inefficiencies in how the runtime reuses freed memory. These are not observable by monitoring go.memory.used alone.

memStats.Mallocs
... If tiny allocations are useful to know about, ... we could differentiate tiny and non-tiny allocs using a label ...

I like the idea of using a label. By specifying different label values, or no label at all, it can serve different query purposes. Compared with regular allocations, tiny allocations are the allocations satisfied from the tiny allocation pool, i.e. usually under 16 bytes, managed by special optimizations. If tiny allocations dominate, it might indicate an opportunity to optimize code, e.g. it may reveal fragmentation or inefficiencies in how memory is utilized.

memStats.PauseNs
Can you say more about what this is useful for?

memStats.PauseNs is used to track the duration (in nanoseconds) of each GC pause. By observing it, we can identify trends and issues in GC pauses that may cause latency spikes. If GC pauses are becoming too frequent or too long, the application might need optimizations either in memory management (e.g.: allocating less frequently) or GC tuning.

@morningspace
Author

Hi @dashpole, any comments on my last reply? Thanks!
