[RFC] Improve startup time #502

Open: wants to merge 8 commits into main

atenart (Contributor) commented Mar 6, 2025

Currently, Retis can take quite some time to set things up before starting a collection. Investigating where that time is spent points to three main contributors:

  • A good amount of time is spent reading /proc/kallsyms; not much we can do here.
  • Removing noops when loading BPF programs turns out to be far from optimized. This is fixed by [RFC] btf/meta filter: introduce boolean expressions #496.
  • Loading and attaching BPF programs takes a lot of time and is the major contributor to slowness. Reducing it, mainly by loading fewer programs, is the goal of this PR.

Combining this PR with #496 and using the generic profile, startup time goes from ~6.5s to ~2.5s. In addition, using kprobe multi helps when using lots of kprobes, especially at exit time (1 link instead of 1 per probe). Raw tracepoints benefit from an optimization too, which helps a lot when many of them are used. This was a major pain point in the past and one of the main reasons Retis could be slow to start.

Note that the last commit, grouping targeted probes with no hooks, might not show much improvement at the moment but could in the future. Nevertheless, it is more logical and future-proof, and the changes are not invasive.

This is based on multiple libbpf-rs changes which have already been merged upstream but not released yet. This is the only reason this PR is marked as an RFC.

More details are available in the commit logs.

atenart added 7 commits March 6, 2025 17:29
This allows using the newly added program attach types (kprobe w/ opts,
raw tp w/ opts and kprobe multi).

Signed-off-by: Antoine Tenart <[email protected]>
The kernel symbol address is used as the primary probe configuration
key. It could be retrieved dynamically in k(ret)probes but not in raw
tracepoints, where it was instead set as rodata. In an effort to move
away from probe-specific rodata, set the symbol address as the BPF
cookie when attaching all probes and use the cookie to retrieve it. No
functional change intended.

Signed-off-by: Antoine Tenart <[email protected]>
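A minimal Rust model of the cookie-as-key idea, assuming a config map keyed by symbol address; this is not Retis' or libbpf's code. One "program" is attached at several points, each attachment carries the symbol address as its cookie, and the per-hit lookup uses only that cookie. `ProbeConfig`, `Attachment` and `on_hit` are hypothetical names and the addresses are made up; in real BPF C code the cookie would be read with the `bpf_get_attach_cookie()` helper.

```rust
use std::collections::HashMap;

// Hypothetical per-probe configuration (not Retis' actual type).
#[derive(Debug, Clone, PartialEq)]
struct ProbeConfig {
    capture_stack: bool,
}

// One attachment of a shared program: the cookie carries the symbol address.
struct Attachment {
    symbol: &'static str,
    cookie: u64,
}

// Models what the shared program does on each hit: it reads its cookie
// and uses it as the key into the config map, so the program itself
// needs no probe-specific rodata.
fn on_hit<'a>(configs: &'a HashMap<u64, ProbeConfig>, att: &Attachment) -> Option<&'a ProbeConfig> {
    configs.get(&att.cookie)
}

fn main() {
    // Made-up kernel symbol addresses, for illustration only.
    let addr_a: u64 = 0xffff_ffff_8100_1000;
    let addr_b: u64 = 0xffff_ffff_8100_2000;

    let mut configs: HashMap<u64, ProbeConfig> = HashMap::new();
    configs.insert(addr_a, ProbeConfig { capture_stack: true });
    configs.insert(addr_b, ProbeConfig { capture_stack: false });

    // The same program attached twice; only the cookie differs.
    let attachments = [
        Attachment { symbol: "kfree_skb_reason", cookie: addr_a },
        Attachment { symbol: "consume_skb", cookie: addr_b },
    ];
    for att in &attachments {
        println!("{} -> {:?}", att.symbol, on_hit(&configs, att));
    }
}
```

Because the key travels with the attachment rather than with the program, nothing in the program differs between probes, which is what later allows sharing it.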
Instead of attaching all probes one by one, add them to the builders
(transferring ownership, which is more logical) and only then attach all
probes at once. Attaching the probes added to a builder does not consume
the builder; it can be reused for later probes.

This does not change the current behavior but opens the door for
optimizations.

Signed-off-by: Antoine Tenart <[email protected]>
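A sketch of the add-then-attach builder shape described in this commit, using hypothetical `Probe`, `Link` and `Builder` types rather than Retis' real ones. Probes are moved into the builder, `attach` takes `&mut self` so the builder is not consumed, and each probe still yields its own link, matching the "no behavior change" note.

```rust
// Hypothetical probe description (not Retis' actual type).
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    symbol: String,
}

// Stand-in for an attached BPF link.
#[derive(Debug, PartialEq)]
struct Link {
    symbol: String,
}

// Hypothetical builder: probes are added (ownership moves in) and only
// attached when `attach` is called.
#[derive(Default)]
struct Builder {
    pending: Vec<Probe>,
}

impl Builder {
    fn add_probe(&mut self, probe: Probe) {
        self.pending.push(probe);
    }

    // Attaches every pending probe at once. This takes `&mut self`, not
    // `self`, so the builder survives and can be reused for later probes.
    // Each probe still yields its own link here.
    fn attach(&mut self) -> Vec<Link> {
        self.pending
            .drain(..)
            .map(|p| Link { symbol: p.symbol })
            .collect()
    }
}

fn main() {
    let mut builder = Builder::default();
    builder.add_probe(Probe { symbol: "kfree_skb_reason".into() });
    builder.add_probe(Probe { symbol: "consume_skb".into() });
    println!("first batch: {} links", builder.attach().len());

    // The same builder is reused for a later probe.
    builder.add_probe(Probe { symbol: "napi_consume_skb".into() });
    println!("second batch: {} links", builder.attach().len());
}
```

Collecting probes before attaching is what gives a later step the chance to attach a whole batch in a single operation.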
Instead of attaching kprobes one by one, use kprobe_multi to attach them
all at once. This speeds up attaching a lot when using many kprobes.

Signed-off-by: Antoine Tenart <[email protected]>
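A toy comparison of link counts, assuming only what the description states (kprobe_multi gives 1 link instead of 1 per probe). `Link`, `attach_one_by_one` and `attach_multi` are hypothetical stand-ins, not libbpf-rs APIs. In the multi variant the program is conceptually retrieved once and one attach call covers every symbol, so there is a single link to set up and tear down at exit.

```rust
// Stand-in for attached BPF links (not the libbpf-rs types).
#[derive(Debug, PartialEq)]
enum Link {
    // One classic kprobe link per function.
    Single(String),
    // One kprobe_multi link covering many functions.
    Multi(Vec<String>),
}

// Previous approach: one attach call, hence one link, per kprobe.
fn attach_one_by_one(symbols: &[&str]) -> Vec<Link> {
    symbols.iter().map(|s| Link::Single(s.to_string())).collect()
}

// kprobe_multi approach: a single attach call for every symbol.
fn attach_multi(symbols: &[&str]) -> Vec<Link> {
    if symbols.is_empty() {
        return Vec::new();
    }
    vec![Link::Multi(symbols.iter().map(|s| s.to_string()).collect())]
}

fn main() {
    let symbols = ["kfree_skb_reason", "consume_skb", "napi_consume_skb"];
    println!("one by one: {} links", attach_one_by_one(&symbols).len());
    println!("kprobe_multi: {} link(s)", attach_multi(&symbols).len());
}
```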
We can get the program once and use it for all attachments.

Signed-off-by: Antoine Tenart <[email protected]>
…me nargs

Raw tracepoints are special as the verifier checks at load time that no
out-of-bounds argument is accessed, which prevents us from loading a
program once and reusing it for all raw tracepoint probes. However, since
we moved the symbol address to the BPF cookie, the only probe-specific
variable in our tracepoints is the number of arguments. This one has to
be rodata for the verifier to work, but nothing prevents us from sharing
the same program between tracepoints that have the same number of
arguments.

Do this. This speeds up loading time a lot when using tracepoints, as
loading a raw tracepoint program always takes quite some time. E.g. the
generic profile startup time is halved, as its tracepoints are grouped
as follows:

 nargs 1: ["tp:net:napi_gro_receive_entry",
           "tp:net:napi_gro_frags_entry",
           "tp:net:netif_rx",
           "tp:net:net_dev_queue",
           "tp:net:netif_receive_skb"]
 nargs 2: ["tp:net:net_dev_start_xmit"]
 nargs 4: ["tp:skb:kfree_skb"]

Signed-off-by: Antoine Tenart <[email protected]>
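A small sketch of the grouping step, using the generic profile's tracepoints and argument counts listed in this commit message. `group_by_nargs` is a hypothetical helper, not Retis code; it assumes one loaded program per distinct argument count, with nargs kept as that program's rodata.

```rust
use std::collections::BTreeMap;

// Groups raw tracepoints by argument count: each key needs one loaded
// program, shared by all tracepoints in its value.
fn group_by_nargs<'a>(tps: &[(&'a str, u32)]) -> BTreeMap<u32, Vec<&'a str>> {
    let mut groups: BTreeMap<u32, Vec<&'a str>> = BTreeMap::new();
    for &(name, nargs) in tps {
        groups.entry(nargs).or_default().push(name);
    }
    groups
}

fn main() {
    // The generic profile's tracepoints and their argument counts.
    let tps: [(&str, u32); 7] = [
        ("tp:net:napi_gro_receive_entry", 1),
        ("tp:net:napi_gro_frags_entry", 1),
        ("tp:net:netif_rx", 1),
        ("tp:net:net_dev_queue", 1),
        ("tp:net:netif_receive_skb", 1),
        ("tp:net:net_dev_start_xmit", 2),
        ("tp:skb:kfree_skb", 4),
    ];
    let groups = group_by_nargs(&tps);
    // Prints "programs to load: 3 (instead of 7)".
    println!("programs to load: {} (instead of {})", groups.len(), tps.len());
    for (nargs, names) in &groups {
        println!("nargs {nargs}: {names:?}");
    }
}
```

With this profile, 3 raw tracepoint programs are loaded instead of 7, which is consistent with the halved startup time reported above given how costly each raw tracepoint load is.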
atenart added the run-functional-tests (Request functional tests to be run by CI) label Mar 6, 2025
atenart force-pushed the at/startup-perfs branch from 4585dbd to 7300217 March 6, 2025 17:27
Targeted probes use a single-use builder. But when they have no hook
attached, they can use shared builders instead. This speeds up attaching
time for those (e.g. skb tracking probes).

Signed-off-by: Antoine Tenart <[email protected]>
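A sketch of the builder choice this commit describes, assuming (as the commit implies but does not spell out) that hooks are what make a targeted probe's program specific to it. `TargetedProbe`, `BuilderKind`, `pick_builder` and `builders_needed` are hypothetical names, and counting at most one shared builder is a simplification of this model, not a statement about Retis.

```rust
// Hypothetical targeted probe: a target symbol plus a number of hooks.
struct TargetedProbe {
    symbol: &'static str,
    hooks: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuilderKind {
    // Hooks make the probe's program specific to it: dedicated builder.
    SingleUse,
    // No hooks: the probe can join a builder shared with other probes.
    Shared,
}

fn pick_builder(probe: &TargetedProbe) -> BuilderKind {
    if probe.hooks == 0 {
        BuilderKind::Shared
    } else {
        BuilderKind::SingleUse
    }
}

// Builders needed in this model: one per hooked probe, plus at most one
// shared builder for all hook-less probes.
fn builders_needed(probes: &[TargetedProbe]) -> usize {
    let single = probes
        .iter()
        .filter(|p| pick_builder(p) == BuilderKind::SingleUse)
        .count();
    let shared = probes.iter().any(|p| pick_builder(p) == BuilderKind::Shared);
    single + usize::from(shared)
}

fn main() {
    let probes = [
        TargetedProbe { symbol: "skb_clone", hooks: 0 },
        TargetedProbe { symbol: "consume_skb", hooks: 0 },
        TargetedProbe { symbol: "kfree_skb_reason", hooks: 1 },
    ];
    for p in &probes {
        println!("{}: {:?}", p.symbol, pick_builder(p));
    }
    println!("builders needed: {}", builders_needed(&probes));
}
```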
atenart force-pushed the at/startup-perfs branch from 7300217 to e315fee March 6, 2025 17:35
atenart (Contributor, Author) commented Mar 7, 2025

I just realized that support for getting the BPF cookie in raw tracepoints was only added a year ago[1]. On kernels without it, we could skip using the cookie in raw tracepoints and keep the legacy one-program-per-probe logic.

[1] 68ca5d4eebb8 ("bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs")
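A sketch of the fallback suggested in this comment, modeled as a plain boolean for whether the running kernel supports BPF cookies in raw tracepoints; how that support would be detected is left out and not implied. `raw_tp_programs` is a hypothetical helper that only counts how many raw tracepoint programs each strategy would load.

```rust
use std::collections::BTreeSet;

// Raw tracepoint programs to load for the given per-probe argument counts.
fn raw_tp_programs(nargs: &[u32], cookie_supported: bool) -> usize {
    if cookie_supported {
        // Symbol address comes from the cookie: one program per distinct nargs.
        nargs.iter().collect::<BTreeSet<_>>().len()
    } else {
        // Legacy logic: symbol address in rodata, one program per probe.
        nargs.len()
    }
}

fn main() {
    // Argument counts of the generic profile's raw tracepoints.
    let nargs = [1, 1, 1, 1, 1, 2, 4];
    println!("with cookie support: {}", raw_tp_programs(&nargs, true));
    println!("legacy: {}", raw_tp_programs(&nargs, false));
}
```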
