Proposal: support for third party OSS to use otel-ebpf-profiler #192
Nice! Since the project has an enforced squash rule for PRs, it would be useful to have one PR per logical feature instead of one big PR. Could you split this into logical pieces?
I have earlier done some private experiments with something similar. Since we are also looking to support off-CPU profiling (see #144), it would make sense to instead compile the eBPF code twice: once as
The PR looks wrong on this part. There is already support to access the files via the
If something is missing or done wrong, it should be done in the proper abstraction level, and not by adjusting the mappings path.
Let's start by splitting this into logical PRs per feature/thing. And remove the
I believe vendors do different things on this part. @christos68k or @florianl can perhaps comment from the Elastic / devfiler side on what the roadmap/plan is.
Thanks @fabled, I pushed the changes so that running
So the direction to go in is that the Go code then also embeds and loads both variants. There should be no build tag.
My concern around the proposed changes is that one has to decide if they want their solution running in #144 is facing a similar challenge but might tackle it differently (by sharing information between kprobe and perf event eBPF programs) - so if there is no immediate rush to get this resolved, can we postpone this for a bit?
I will agree with @fabled here and add that a solution that splits the agent into separate deliverables with divergent functionality seems at this point hard to justify. So my suggestion would be to work towards keeping a single agent that can operate in both modes (sampling, on-demand) at the same time. This is the kind of feature that we'd typically produce a design document for as there are performance concerns and multiple trade-offs and possible implementations to think about.
We are currently discussing a native symbol uploading protocol implementation (see here for some context), but there is no backend symbol processing architecture specified as part of OpenTelemetry profiling, as we're leaving that up to each implementor.
I see what you all are saying. I misinterpreted that, my bad. I will try to push a fix and ping you, thanks!
Hey @fabled, please take a look at the newly pushed changes. We have added a new map to hold the kprobes, and now both program types can be loaded together.
This is the code that backs open-telemetry#144. It can be reused to add features like requested in open-telemetry#33 and therefore can be an alternative to open-telemetry#192.
The idea that enables off CPU profiling is, that perf event and kprobe eBPF programs are quite similar and can be converted. This allows, with the dynamic rewrite of tail call maps, the reuse of existing eBPF programs and concepts.
This proposal adds the new flag '-off-cpu-threshold' that enables off CPU profiling and attaches the two additional hooks, as discussed in Option B in open-telemetry#144.
Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side
Signed-off-by: Florian Lehner <[email protected]>
This does not entirely answer our needs, for two reasons:
BTW, not sure if this is only on our side, but we weren't able to run the agent because of errors when loading the eBPF programs.
The purpose of #196 is not to resolve #33, as #196 is focused on off-CPU profiling. But the concept of (a) not having to rewrite eBPF code, (b) dynamically reusing perf event eBPF programs as kprobe eBPF programs and (c) allowing multiple types of profiling simultaneously satisfies the request described in #33: it allows a custom kprobe to be attached to something that is defined by a configuration flag. So the rough outline of what is missing in #196 to resolve #33 is
It is hard to provide feedback on this without information on the error, verifier logs, environment and kernel version. As mentioned in #192 (comment), my personal thinking is that one form of profiling should not limit the other, and that having multiple forms of profiling on the system should not multiply the required resources by the number of different forms of profiling.
I understand. So are the three bullets needed to resolve #33 something you want to implement as part of #196 or in a different PR? Regarding the issue of loading the eBPF programs, I will send it over in the Slack channel so we don't spam this PR.
Hey, in addition to the three bullets mentioned by @florianl: currently, our setup monitors both kprobes and tracepoints, but we need a reliable way to uniquely identify events in our monitoring system so that they can be correlated with your traces as enrichment data. To resolve this, we suggest generating unique identifiers for trace events and our events using the following components:
Stack pointer: from
We would appreciate your feedback on the following:
Is our proposed method for generating unique event identifiers effective?
Thank you for your insights!
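To make the question concrete, here is a minimal sketch of how such an identifier could be derived in user space. Only the stack pointer comes from the list above; the `PID`, `TID` and `KTimeNs` fields, the `EventKey` type and the choice of FNV-1a are assumptions made purely for illustration, not something the profiler provides today.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// EventKey holds the fields that are combined into an identifier.
// Only StackPointer is taken from the proposal; PID, TID and KTimeNs
// are hypothetical additions for this sketch.
type EventKey struct {
	StackPointer uint64
	PID          uint32
	TID          uint32
	KTimeNs      uint64
}

// ID hashes a fixed-width little-endian encoding of the key with FNV-1a.
// Both sides (the profiler and the third-party tool) would have to
// encode the same fields in the same order to get matching IDs.
func (k EventKey) ID() uint64 {
	var buf [24]byte
	binary.LittleEndian.PutUint64(buf[0:8], k.StackPointer)
	binary.LittleEndian.PutUint32(buf[8:12], k.PID)
	binary.LittleEndian.PutUint32(buf[12:16], k.TID)
	binary.LittleEndian.PutUint64(buf[16:24], k.KTimeNs)

	h := fnv.New64a()
	h.Write(buf[:]) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	a := EventKey{StackPointer: 0x7ffc1234, PID: 100, TID: 101, KTimeNs: 42}
	b := a
	b.KTimeNs = 43 // same thread and stack pointer, later event

	fmt.Printf("a=%016x b=%016x equal=%v\n", a.ID(), b.ID(), a.ID() == b.ID())
}
```

Whether a 64-bit hash is unique enough depends on the event rate and on which fields are actually available in both eBPF programs, which is exactly the part we would like feedback on.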
There is no ETA for #196; it needs more feedback and approvals. First, #144 needs to get accepted and merged, and feedback is very welcome. With #196 (comment), some discussion around the design of sampling got started, which needs to be resolved first.
The points mentioned in #192 (comment) are just high-level points. Being able to distinguish between the events that triggered profiling is fundamental, not only for your use case but also for on- vs off-CPU sampling. So there needs to be some way to do it. What information will be used and how it is fetched is up for discussion.
Hey, continuing the #33 discussion and the work from the IG team, we (Kubescape) took some time to do a POC that we think can align well with the project roadmap.
The goal of this PR is to introduce the option to integrate this awesome project as a package in other OSS projects such as Kubescape and Inspektor Gadget.
In order to do it we added two main things:
`kprobes` instead of `perf_events`, in order to have the ability to trigger the unwinding capabilities from a tail call when we want a stack trace (e.g. when we see syscall xyz, we want the stack to see who triggered it).
The flow of a third-party OSS project can be: grab the `main.go` and modify it to have the `fd` of the `native_tracer_entry` and register a custom reporter instead of the default ones. Then we can do: `I want stack trace -> tail_call (native_tracer_entry) -> custom_reporter`.
First, we would love to have feedback from you on how we can push this to be part of the project and what is missing or needs to be changed.
Second, we don't see any releases, so we added support in the Makefile to compile with the `EXTERNAL` flag, which triggers the compilation of the `kprobes` instead of the `perf_events`. We are going to need some sort of release process for both methods, I assume, so I would love to have your thoughts on it.
In addition, the support for native symbols (C/C++/Go etc.) is implemented in your backend software, for which we didn't find the code, so we wonder whether we can have the protocol to talk to your symbol server in some way, to be able to resolve native symbols without using the backend software.
Thanks!
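For reference, the user-space end of the `custom_reporter` step could look roughly like the sketch below. The `Trace` type, the `TraceReporter` interface and the `deliver` helper are hypothetical names invented for this illustration; they are not the profiler's actual reporter API, and the eBPF side (the tail call into `native_tracer_entry`) is not modelled here.

```go
package main

import "fmt"

// Trace is a hypothetical, simplified stack trace record.
type Trace struct {
	PID    uint32
	Frames []uint64 // return addresses, innermost frame first
}

// TraceReporter is a hypothetical interface standing in for whatever
// the embedding project registers instead of the default reporters.
type TraceReporter interface {
	ReportTrace(t Trace)
}

// countingReporter is an example custom reporter: instead of exporting
// traces, it only counts how many were seen per PID.
type countingReporter struct {
	perPID map[uint32]int
}

func newCountingReporter() *countingReporter {
	return &countingReporter{perPID: make(map[uint32]int)}
}

func (r *countingReporter) ReportTrace(t Trace) {
	r.perPID[t.PID]++
}

// deliver stands in for the user-space loop that would read unwound
// traces and hand each one to the registered reporter.
func deliver(r TraceReporter, traces []Trace) {
	for _, t := range traces {
		r.ReportTrace(t)
	}
}

func main() {
	r := newCountingReporter()
	deliver(r, []Trace{
		{PID: 10, Frames: []uint64{0x401000, 0x401200}},
		{PID: 10, Frames: []uint64{0x401000}},
		{PID: 20, Frames: []uint64{0x500000}},
	})
	fmt.Println(r.perPID[10], r.perPID[20])
}
```

The point of the interface is that a project like Kubescape would only need to supply its own `ReportTrace` implementation, for example one that attaches the stack to the syscall event that requested it.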