Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: support for third party OSS to use otel-ebpf-profiler #192

Merged
0 commits merged into from
Jan 2, 2025

Conversation

amitschendel
Copy link

Hey, in continue to #33 discussion and work from the IG team, we (Kubescape) took some time to do a POC that we think can align well with the project roadmap.
The goal of this PR is to introduce the option to integrate this awesome project as a pkg in other OSS like Kubescape and Inspektor gadget.
In order to do it we added two main things:

  1. Support for kprobes instead of perf_events in order to have the ability to trigger the unwinding capabilities from a tail call when we want a stack trace (e.g when we see syscall xyz we want the stack to see who triggered it).
  2. Support for running from within a container, meaning to support accessing other processes fs.

The flow of a third party OSS project can be:
Grab the main.go and modify it to have the fd of the native_tracer_entry and register a custom reporter instead of the default ones. Then we can do:
I want stack trace->tail_call (native_tracer_entry)->custom_reporter.

First, we would love to have feedback from you on how we can push this to be part of the project and what is missing/need to be changed.
Second, we don't see any releases and so we added support in the Makefile to compile with the EXTERNAL flag which will trigger the compilation of the kprobes instead of the perf_events but we are going to need some sort of release process for both methods I assume so I would love to have your thoughts on it.

In addition the support for native symbols (C/C++/GO etc...) are implemented in your backend software that we didn't find the code to, and so we wonder whether we can have the protocol to talk to your symbol server in some way to be able to resolve native symbols without using the backend software.

Thanks!

Copy link

linux-foundation-easycla bot commented Oct 15, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@fabled
Copy link
Contributor

fabled commented Oct 15, 2024

Hey, in continue to #33 discussion and work from the IG team, we (Kubescape) took some time to do a POC that we think can align well with the project roadmap. The goal of this PR is to introduce the option to integrate this awesome project as a pkg in other OSS like Kubescape and Inspektor gadget. In order to do it we added two main things:

Nice!

Since the project has an enforced squash rule for PRs, it would be useful to have one PR per logical feature instead of one big PR. Could you split this to logical pieces?

1. Support for `kprobes` instead of `perf_events` in order to have the ability to trigger the unwinding capabilities from a tail call when we want a stack trace (e.g when we see syscall xyz we want the stack to see who triggered it).

I have done earlier some private experiments with something similar. Since we are also looking to support off-cpu profiling (see #144). It would make sense to instead compile the eBPF things twice: once as perf_event and another time as kprobe type. We will likely have the kprobe things used by the agent itself. And the same will be useful to be attached as uprobe so the backtrace can be triggered from usermode.

2. Support for running from within a container, meaning to support accessing other processes fs.

The PR looks wrong on this part. There is already support to access the files via the /proc/PID/root by prepending it in the places the file is opened. See:

If something is missing or done wrong, it should be done in the proper abstraction level, and not by adjusting the mappings path.

First, we would love to have feedback from you on how we can push this to be part of the project and what is missing/need to be changed. Second, we don't see any releases and so we added support in the Makefile to compile with the EXTERNAL flag which will trigger the compilation of the kprobes instead of the perf_events but we are going to need some sort of release process for both methods I assume so I would love to have your thoughts on it.

Let's start by splitting this to logical PRs per feature/thing. And remove the EXTERNAL tag by compiling the ebpf twice to produce both variants.

In addition the support for native symbols (C/C++/GO etc...) are implemented in your backend software that we didn't find the code to, and so we wonder whether we can have the protocol to talk to your symbol server in some way to be able to resolve native symbols without using the backend software.

I believe vendors do different things on this part. @christos68k or @florianl can perhaps comment from elastic / devfiler side on what the roadmap/plan is.

@amitschendel
Copy link
Author

Thanks @fabled I pushed the changes so that running make ebpf compiles both versions and moved the control of the embedding to go tags so it will be easy to choose in what to use. Let me know if this answer your needs.

@fabled
Copy link
Contributor

fabled commented Oct 15, 2024

Thanks @fabled I pushed the changes so that running make ebpf compiles both versions and moved the control of the embedding to go tags so it will be easy to choose in what to use. Let me know if this answer your needs.

So the direction where to go, is that the Go code then also embeds and loads both variants. There should be no build tag.

@florianl
Copy link
Contributor

My concern around the proposed changes is, that one has to decide if they want their solution running in GetNativeTracerEntry() mode or sampling/default mode. So if someone wants to have both options available, they are required to run the code twice and information is not shared between both modes - e.g. stack delta extraction happens then multiple times for each executable and the footprint (memory/cpu usage) duplicates as well.

#144 is facing a similar challenge but might tackle it differently (by sharing information between kprobes and perf event eBPF programs) - so if there is no immediate rush to get this resolved, can we postpone this for a bit?

@christos68k
Copy link
Member

christos68k commented Oct 15, 2024

Let's start by splitting this to logical PRs per feature/thing. And remove the EXTERNAL tag by compiling the ebpf twice to produce both variants.

I will agree with @fabled here and add that a solution that splits the agent into separate deliverables with divergent functionality seems at this point hard to justify. So my suggestion would be to work towards keeping a single agent that can operate in both modes (sampling, on-demand) at the same time. This is the kind of feature that we'd typically produce a design document for as there are performance concerns and multiple trade-offs and possible implementations to think about.

In addition the support for native symbols (C/C++/GO etc...) are implemented in your backend software that we didn't find the code to, and so we wonder whether we can have the protocol to talk to your symbol server in some way to be able to resolve native symbols without using the backend software.

I believe vendors do different things on this part. @christos68k or @florianl can perhaps comment from elastic / devfiler side on what the roadmap/plan is.

We are currently discussing a native symbol uploading protocol implementation (see here for some context) but there is no backend symbol processing architecture specified as part of OpenTelemetry profiling as we're leaving that up to each implementor.

@amitschendel
Copy link
Author

I see what you all are saying, I misinterpreted that, my bad. I will try to push a fix and ping you, thanks!

@amitschendel
Copy link
Author

Hey @fabled please take a look on the new pushed changes, we have added a new map to hold the kprobes and now both program types can be loaded together.

florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Oct 20, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
@florianl florianl mentioned this pull request Oct 20, 2024
@florianl
Copy link
Contributor

With #196 I have pushed an alternative option that also enables the use case as described in #33.

@amitschendel
Copy link
Author

With #196 I have pushed an alternative option that also enables the use case as described in #33.

This is not entirely answers our need because of two reasons:

  1. We don't want any kprobes being triggered when we are not the trigger source (we want to trigger unwinding only when we do a tail call). In your PR if we understood correctly we would need to do a tail call to kprobe/finish_task_switch which is also triggered by the OS.
  2. We don't see that you expose the FD of the kprobe so we can use it in our main to do a tail call.

BTW, Not sure if this is only on our side but we wasn't able to run the agent because of errors when loading the eBPF.
@florianl

@florianl
Copy link
Contributor

The purpose of #196 is not to resolve #33, as #196 is focused on off CPU profiling. But the concept, to (a) not have to rewrite eBPF code, (b) dynamically reuse perf event eBPF programs as kprobe eBPF programs and (c) allow multiple times of profiling simultaneously , satisfies the request as described in #33 - it allows to have a custom kprobe attached to something that is defined by a configuration flag.

So the rough outline, that is missing in #196 to resolve #33 is

  • a costum kProbe similar to SEC("kprobe/finish_task_switch")
  • a CLI flag that defines where this costum kprobe should be attached
  • a way to expose the FD to this kprobe

BTW, Not sure if this is only on our side but we wasn't able to run the agent because of errors when loading the eBPF.
@florianl

Hard to provide feedback on this without information on the error, verifier logs, environment and kernel version.

As mentioned in #192 (comment), my personal thinking is, that one form of profiling should not limit the other and having multiple forms of profiling on the system should not increase the number of required resources by this number of different forms of profiling.

@amitschendel
Copy link
Author

amitschendel commented Oct 21, 2024

I understand, so the 3 bullets that is needed to resolve #33 is something you want to implement as part of #196 or in a different PR?
From our perspective if the 3 bullets are resolved we are good to go (we can change this PR to address them as well).

Regarding the issue of loading the eBPF I will send it over in the slack channel so we don't spam this PR.
Thanks!

@florianl
Copy link
Contributor

I understand, so the 3 bullets that is needed to resolve #33 is something you want to implement as part of #196 or in a different PR?
From our perspective if the 3 bullets are resolved we are good to go!

No, these three bullets should not be part of #196. But they can be a follow up to #196.

@amitschendel
Copy link
Author

Okay so let's put this on hold till #196 is merged and then we will add the missing pieces.
What's the ETA for #196 ?

@afek854 afek854 force-pushed the main branch 2 times, most recently from 075cc30 to 64c3f2c Compare October 27, 2024 08:48
@afek854
Copy link

afek854 commented Oct 27, 2024

Hey, in addition to the three bullets mentioned by @florianl .
We encountered another challenge while attempting to enhance our monitored events with OTEL eBPF profiler traces.

Currently, our setup monitors both kprobes and tracepoints, but we need a reliable way to uniquely identify events in our monitoring system for correlation with your trace as enrichment data.

To resolve this, we suggest generating unique identifiers for trace events and our events using the following components:

Stack pointer: from task_struct->thread_struct
PID and TID Combination: The combination of the process ID and thread ID.
Syscall Number: The specific syscall being executed.

We would appreciate your feedback on the following:

Is our proposed method for generating unique event identifiers effective?
Are there alternative methods you would recommend for this purpose?

Thank you for your insights!
CC: @Amit Schendel

@florianl
Copy link
Contributor

What's the ETA for #196 ?

There is no ETA for #196, there needs to be more feedback and approvals. First #144 needs to get accepted and merged - feedback is very welcomed. With #196 (comment) some discussion around the design of sampling got started, which needs to be resolved first.

We encountered another challenge while attempting to enhance our monitored events with OTEL eBPF profiler traces.

Currently, our setup monitors both kprobes and tracepoints, but we need a reliable way to uniquely identify events in our monitoring system for correlation with your trace as enrichment data.

The points mentioned in #192 (comment) are just high level points. Being able to distinguish between events that triggered profiling is fundamental, not only for your use case but also for on- vs off-CPU sampling. So there needs to be some way. What information will be used and how it is fetched is up for discussion.

florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Oct 29, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Nov 1, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Nov 8, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Nov 18, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Nov 26, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Dec 11, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Dec 16, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Dec 23, 2024
This is the code that backs
open-telemetry#144.
It can be reused to add features like requested in
open-telemetry#33 and
therefore can be an alternative to
open-telemetry#192.

The idea that enables off CPU profiling is, that perf event and kprobe eBPF
programs are quite similar and can be converted. This allows, with the
dynamic rewrite of tail call maps, the reuse of existing eBPF programs and
concepts.

This proposal adds the new flag '-off-cpu-threshold' that enables off CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
@afek854 afek854 force-pushed the main branch 2 times, most recently from 53c1387 to 0958f0d Compare January 1, 2025 12:18
@christos68k christos68k closed this pull request by merging all changes into open-telemetry:main in 77da04f Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants