[RFC] Improve startup time #502
Open
atenart wants to merge 8 commits into main from at/startup-perfs
Allow using the newly added program attach types (kprobe with opts, raw tracepoint with opts and kprobe multi). Signed-off-by: Antoine Tenart <[email protected]>
The kernel symbol address is used as the primary probe configuration key. It could be retrieved dynamically in k(ret)probes but not in raw tracepoints, where it was therefore set as rodata. In an effort to move away from probe-specific rodata, set the symbol address as the BPF cookie when attaching all probes and read it back from the cookie in the BPF programs. No functional change intended. Signed-off-by: Antoine Tenart <[email protected]>
Instead of attaching all probes one by one, add them to the builders (transferring ownership, which is more logical) and only then attach all probes at once. Attaching the probes added to a builder does not consume the builder; it can be reused for later probes. This does not change the current behavior but opens the door for optimizations. Signed-off-by: Antoine Tenart <[email protected]>
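The add-then-attach flow described in this commit can be sketched as follows. This is a minimal illustration, not the Retis code: `Probe`, `ProbeBuilder`, `add_probe` and `attach` are hypothetical names, and the actual attachment through the BPF library is replaced by moving probes between two lists.

```rust
/// Hypothetical stand-in for a probe; only the target symbol is modelled.
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    symbol: String,
}

/// Hypothetical builder: probes are queued first and attached in a batch.
#[derive(Default)]
struct ProbeBuilder {
    pending: Vec<Probe>,
    attached: Vec<Probe>,
}

impl ProbeBuilder {
    /// Takes ownership of the probe and queues it; nothing is attached yet.
    fn add_probe(&mut self, probe: Probe) {
        self.pending.push(probe);
    }

    /// Attaches everything queued so far and returns how many probes were
    /// attached. It borrows the builder mutably instead of consuming it, so
    /// the builder can be reused for probes added later.
    fn attach(&mut self) -> usize {
        let count = self.pending.len();
        // Assumption: a real implementation would call into the BPF library
        // here; this sketch only records the probes as attached.
        self.attached.append(&mut self.pending);
        count
    }
}
```

Because `attach` takes `&mut self`, a caller can queue a first batch, attach it, then queue and attach more probes on the same builder.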
Signed-off-by: Antoine Tenart <[email protected]>
Instead of attaching kprobes one by one, use kprobe_multi to attach them all at once. This speeds up attaching a lot when using lots of kprobes. Signed-off-by: Antoine Tenart <[email protected]>
We can get the program once and use it for all attachments. Signed-off-by: Antoine Tenart <[email protected]>
…me nargs

Raw tracepoints are special: the verifier checks at load time that no out-of-bounds argument is accessed, which prevents us from loading a program once and reusing it for all raw tracepoint probes. However, since we moved the symbol address to the BPF cookie, the only probe-specific variable in our tracepoints is the number of arguments. This one has to be rodata for the verifier to work, but nothing prevents us from sharing the same program between tracepoints that have the same number of arguments. Do this.

This speeds up loading time a lot when using tracepoints, as loading a raw tracepoint program always takes quite some time. E.g. the generic profile startup time is divided by two, as we have the following groups:

nargs 1: ["tp:net:napi_gro_receive_entry", "tp:net:napi_gro_frags_entry", "tp:net:netif_rx", "tp:net:net_dev_queue", "tp:net:netif_receive_skb"]
nargs 2: ["tp:net:net_dev_start_xmit"]
nargs 4: ["tp:skb:kfree_skb"]

Signed-off-by: Antoine Tenart <[email protected]>
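The grouping from the commit message above can be illustrated with a small standalone sketch. It only shows the bucketing step, under the assumption that each probe is known as a (name, argument count) pair; `group_by_nargs` and `generic_profile_tracepoints` are hypothetical helpers, not Retis functions, and no BPF program is loaded here.

```rust
use std::collections::BTreeMap;

/// Groups raw tracepoint probes by argument count, so that one program could
/// be loaded per group instead of one per probe.
fn group_by_nargs<'a>(probes: &[(&'a str, u32)]) -> BTreeMap<u32, Vec<&'a str>> {
    let mut groups: BTreeMap<u32, Vec<&'a str>> = BTreeMap::new();
    for &(name, nargs) in probes {
        groups.entry(nargs).or_default().push(name);
    }
    groups
}

/// The tracepoints and argument counts listed in the commit message.
fn generic_profile_tracepoints() -> Vec<(&'static str, u32)> {
    vec![
        ("tp:net:napi_gro_receive_entry", 1),
        ("tp:net:napi_gro_frags_entry", 1),
        ("tp:net:netif_rx", 1),
        ("tp:net:net_dev_queue", 1),
        ("tp:net:netif_receive_skb", 1),
        ("tp:net:net_dev_start_xmit", 2),
        ("tp:skb:kfree_skb", 4),
    ]
}
```

With these seven tracepoints the map has three keys (1, 2 and 4), i.e. three program loads instead of seven.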
4585dbd to 7300217
Targeted probes use a single-use builder. But when they have no hook attached, they can use shared builders instead. This speeds up attaching time for those (e.g. skb tracking probes). Signed-off-by: Antoine Tenart <[email protected]>
7300217 to e315fee
I just realized that support for getting the BPF cookie in raw tracepoints was only added a year ago [1]. On kernels without that support, we could skip the cookie in raw tracepoints and keep the legacy one-program-per-probe logic. [1] 68ca5d4eebb8 ("bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs")
As of now Retis can take quite some time to handle various things before starting a collection. Investigating where time is spent boils down to three main points:
/proc/kallsyms; not much we can do here.

Combining this and #496 and using the generic profile, we can go from ~6.5s to ~2.5s starting time. In addition, using kprobe multi helps when using lots of kprobes, especially at exit time (1 link vs 1 per probe). The raw tracepoints benefit from an optimization too, which helps a lot when using many of them; this was a major pain point in the past and one of the important reasons Retis could be slow to start.

Note that the last commit, grouping targeted probes with no hooks, might not show a lot of improvement at the moment but could in the future. Nevertheless it is more logical and future-proof, and the changes are really not invasive.
This is based on multiple libbpf-rs changes, which have been merged upstream already but not released yet. This is the only reason this is marked as an RFC.

Some extra details in the commit logs.