Possibility of integrating otel-profiling-agent as a library #33

Open
i-Pear opened this issue May 26, 2024 · 5 comments

Comments

@i-Pear

i-Pear commented May 26, 2024

Thank you for developing such an amazing profiling agent!

I am interested in integrating this project into other eBPF projects to support unwinding for multiple languages, replacing the bpf_get_stackid() helper function. Due to the complex architecture of this project, I would prefer not to fork the code and create my own version. However, it seems that this project does not account for being called by other eBPF programs.

Considering that the eBPF part of this project requires significant assistance from the userspace agent (such as extracting eh_frame information), I envision two possible integration methods:

  1. The userspace component of otel-profiling-agent runs as an independent daemon process, providing the necessary support for the eBPF part. Other projects could obtain a file descriptor pointing to native_tracer_entry from the daemon process and use eBPF extensions or tail calls to invoke it on the eBPF side, similar to calling bpf_get_stackid(). The daemon process would then provide an API for reading the required call stack information.

  2. Other Golang projects integrate otel-profiling-agent as a package, with this package creating separate threads to maintain the information needed for the eBPF program at runtime. This approach might be easier in terms of interaction but less friendly for projects written in languages other than Go.
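
One way a daemon could hand a file descriptor to another process, as in method 1, is SCM_RIGHTS passing over a Unix domain socket. The sketch below is only an illustration under that assumption: it passes the read end of a pipe rather than a real eBPF program descriptor, and the helper names (`socketPair`, `sendFD`, `recvFD`, `roundTrip`) are hypothetical and not part of otel-profiling-agent.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
)

// socketPair returns two connected Unix domain socket endpoints, standing in
// for the daemon side and the client side of a real connection.
func socketPair() (*net.UnixConn, *net.UnixConn, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	var conns [2]*net.UnixConn
	for i, fd := range fds {
		f := os.NewFile(uintptr(fd), "socketpair")
		c, err := net.FileConn(f) // FileConn duplicates the descriptor
		f.Close()
		if err != nil {
			return nil, nil, err
		}
		uc, ok := c.(*net.UnixConn)
		if !ok {
			c.Close()
			return nil, nil, fmt.Errorf("unexpected connection type %T", c)
		}
		conns[i] = uc
	}
	return conns[0], conns[1], nil
}

// sendFD sends one open descriptor as an SCM_RIGHTS control message.
func sendFD(conn *net.UnixConn, fd int) error {
	// A single payload byte accompanies the control message.
	_, _, err := conn.WriteMsgUnix([]byte{0}, syscall.UnixRights(fd), nil)
	return err
}

// recvFD receives one descriptor sent by sendFD.
func recvFD(conn *net.UnixConn) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	_, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return fds[0], nil
}

// roundTrip passes the read end of a pipe across the socket pair and reads
// msg back through the received descriptor.
func roundTrip(msg string) (string, error) {
	daemon, client, err := socketPair()
	if err != nil {
		return "", err
	}
	defer daemon.Close()
	defer client.Close()

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close() // harmless second close on the success path

	if err := sendFD(daemon, int(r.Fd())); err != nil {
		return "", err
	}
	fd, err := recvFD(client)
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	defer received.Close()

	if _, err := w.WriteString(msg); err != nil {
		return "", err
	}
	w.Close() // closing the only write end lets the reader see EOF
	data, err := io.ReadAll(received)
	return string(data), err
}

func main() {
	got, err := roundTrip("stack unwinder fd")
	fmt.Println(got, err)
}
```

In a real integration the descriptor would refer to a loaded eBPF program, and the receiving project would still need to place it in a tail call map of a compatible program type.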

What do you think about this? I am curious if you are interested in making the unwinder part of this project a general-purpose component. If some work is needed to achieve this, I am more than willing to help.

@florianl
Contributor

Hi @i-Pear

Thank you for your interest in this project. At the moment we are focused on donating this project to OTel. In general, it should already be possible to use this project as a dependency: one can take this project's main.go as an example of how to start and load the components.
Overall I think there is one limitation: this project uses BPF_PROG_TYPE_PERF_EVENT eBPF programs, so custom triggers also need to be able to call this eBPF program type.

@alban

alban commented Sep 19, 2024

Hi,
I am one of the Inspektor Gadget maintainers (a CNCF project that would benefit from opentelemetry-ebpf-profiler's stack unwinding abilities). I am new to opentelemetry-ebpf-profiler.
We discussed this topic in the OTel Profiling SIG Meeting today. I talked about the integration method number 1 from above (also described in inspektor-gadget/inspektor-gadget#3457). The feedback I gathered was:

  • It should be technically feasible.
  • It would make sense for Open Source to enable component reuse.
  • But we should consider maintainability. The OpenTelemetry project has a specific scope: mainly profiling and distributed traces. Being consumed by a third-party project that does not fit this scope should not distract OpenTelemetry from its focus. However, other projects under the OpenTelemetry umbrella would benefit from reusing opentelemetry-ebpf-profiler's eBPF programs to unwind stacks in uprobes, for example eBPF programs that look up distributed traces.
  • It should be discussed in a GitHub issue for people who were not in the meeting. So here it is :)

@slashben

slashben commented Oct 6, 2024

Hey,
I just wanted to introduce myself as Alban did. I am one of the maintainers of Kubescape, a Kubernetes security project under the CNCF/LF umbrella. I/we are also contributing to Inspektor Gadget and in general using eBPF for security observability.

I agree with @alban on the points he recorded in the previous comment. The only question I'd leave open is how to integrate with the profiling agent.

Correct me if I am wrong, but as I see it, options no. 1 and no. 2 are not mutually exclusive.

Projects that use Go should be able to enjoy the benefits of the ecosystem if it doesn't create an extra burden.

cc: @florianl @i-Pear

@i-Pear
Author

i-Pear commented Oct 6, 2024

> options no. 1 and no. 2 are not mutually exclusive.
> Projects, that use Go should be able to enjoy the benefits of the ecosystem if it doesn't create extra burden.

@slashben I also think both options are viable, each with its own advantages.

Option 1 may focus more on "isolation," as the profiler requires some userspace agent threads to assist the eBPF program. If the profiler and other applications are compiled into the same process, I'm not sure if this would impose limitations on syscalls like fork. Additionally, if there are many containers in a k8s cluster that rely on the profiler, the "profiler as a service" design of Option 1 could save resources.

The clear advantage of Option 2 is the elimination of communication overhead, as there would be no need to design a complex protocol. It also allows for easier and deeper customization. Moreover, the tail call approach in Option 1 doesn’t seem to have been proven feasible for all types of eBPF programs, so Option 2 might offer better compatibility.
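
As a rough illustration of the in-process pattern behind Option 2, the sketch below has a package own a background goroutine that maintains per-process state, while the host program queries it with plain function calls instead of a protocol. Everything here is hypothetical: `Agent`, `Start`, `Notify`, `Info`, and the placeholder strings are not the profiler's API, and the stored value merely stands in for whatever unwinding data a real agent would maintain.

```go
package main

import (
	"fmt"
	"sync"
)

// Agent is a hypothetical in-process component. It keeps a table of
// per-process data that a background goroutine fills in.
type Agent struct {
	mu     sync.RWMutex
	tables map[int]string // pid -> placeholder for unwinding data
	events chan int       // pids of newly observed processes
	done   chan struct{}
}

// Start creates the agent and launches its background goroutine.
func Start() *Agent {
	a := &Agent{
		tables: make(map[int]string),
		events: make(chan int, 16),
		done:   make(chan struct{}),
	}
	go a.run()
	return a
}

// run processes events until the events channel is closed and drained.
func (a *Agent) run() {
	defer close(a.done)
	for pid := range a.events {
		info := fmt.Sprintf("unwind-info-for-pid-%d", pid)
		a.mu.Lock()
		a.tables[pid] = info
		a.mu.Unlock()
	}
}

// Notify tells the agent about a process it should track.
func (a *Agent) Notify(pid int) { a.events <- pid }

// Stop shuts the agent down after all pending events are handled.
func (a *Agent) Stop() {
	close(a.events)
	<-a.done
}

// Info is a direct in-process lookup; no IPC or wire protocol is involved.
func (a *Agent) Info(pid int) (string, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	info, ok := a.tables[pid]
	return info, ok
}

func main() {
	agent := Start()
	agent.Notify(42)
	agent.Stop()
	fmt.Println(agent.Info(42))
}
```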

For the Inspektor Gadget project, another reason to favor Option 1 is that IG has disabled CGO, while the profiler heavily relies on this feature.

@amitschendel

Hey @florianl @alban @i-Pear @slashben, see our proposal in the attached PR #192. We would be happy to get your feedback.

florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Oct 20, 2024
This is the code that backs open-telemetry#144. It can be reused to add
features like those requested in open-telemetry#33 and can therefore be an
alternative to open-telemetry#192.

The idea that enables off-CPU profiling is that perf event and kprobe eBPF
programs are quite similar and can be converted. Together with the dynamic
rewrite of tail call maps, this allows the reuse of existing eBPF programs
and concepts.

This proposal adds the new flag '-off-cpu-threshold', which enables off-CPU
profiling and attaches the two additional hooks, as discussed in Option B
in open-telemetry#144.

Outstanding work:
- [ ] Handle off CPU traces in the reporter package
- [ ] Handle off CPU traces in the user space side

Signed-off-by: Florian Lehner <[email protected]>
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Oct 29, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Nov 1, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Nov 8, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Nov 18, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Nov 26, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Dec 11, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Dec 16, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Dec 23, 2024
florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this issue Jan 6, 2025
5 participants