Integrate with syzkaller #19

Open
dvyukov opened this issue May 26, 2023 · 5 comments

Comments

@dvyukov

dvyukov commented May 26, 2023

syzkaller is a coverage-guided OS kernel fuzzer. The BPF programs it generates today are of low quality, so it could benefit from a high-quality BPF generator. It won't be as efficient as Buzzer at stressing the BPF subsystem itself, but it will uncover bugs in more complex interactions between BPF programs and other kernel subsystems.

I have a prototype for integration, which you can mostly ignore except for the proposed interface between syzkaller and Buzzer:

// This is supposed to be buzzer.Generate.
// progType is the program type (BPF_PROG_TYPE_SOCKET_FILTER/KPROBE/...).
// oldInsns is the program for mutation, if empty, generate a new one.
// Returns new/mutated program and number of used map fd's in bpf_attr::fd_array.
func BuzzerGenerate(progType int, rnd *rand.Rand, oldInsns []uint64) (insns []uint64, maps int)

Buzzer and syzkaller have a model mismatch: syzkaller generates and mutates programs offline. So it's not possible to embed actual map fd's into the program (they don't exist yet, and we are not even running on the target machine).
So I restricted it to using only bpf_attr::fd_array, which syzkaller will fill with the requested number of fd's later.
It is possible to use fd's embedded in the program, but that will require some form of ELF-style relocations (Buzzer will need to say which offsets in the program should contain fd's and which of these refer to the same/different maps). I decided to leave this out for now.
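For illustration, the relocation alternative described above could look roughly like the sketch below. Everything here is an assumption made for the example: the `Reloc` type, the `applyRelocs` helper, and the premise that each `uint64` holds one raw `struct bpf_insn` in little-endian layout (so the `imm` field is the upper 32 bits). Neither Buzzer nor syzkaller defines these names.

```go
package main

import "fmt"

// Reloc is a hypothetical relocation record: it names one instruction whose
// imm field must receive a map fd, and which logical map that fd belongs to.
type Reloc struct {
	InsnIndex int // index of the instruction word to patch
	MapID     int // logical map number; equal IDs refer to the same map
}

// applyRelocs patches the imm field (upper 32 bits of the word, assuming a
// little-endian struct bpf_insn) with the fd chosen for each logical map.
func applyRelocs(insns []uint64, relocs []Reloc, fds []int32) error {
	for _, r := range relocs {
		if r.InsnIndex < 0 || r.InsnIndex >= len(insns) {
			return fmt.Errorf("reloc instruction index %d out of range", r.InsnIndex)
		}
		if r.MapID < 0 || r.MapID >= len(fds) {
			return fmt.Errorf("reloc map id %d out of range", r.MapID)
		}
		// Keep code, registers and off (low 32 bits); replace imm.
		low := insns[r.InsnIndex] & 0xffffffff
		insns[r.InsnIndex] = low | uint64(uint32(fds[r.MapID]))<<32
	}
	return nil
}

func main() {
	// 0x1118: BPF_LD|BPF_IMM|BPF_DW, dst_reg=1, src_reg=BPF_PSEUDO_MAP_FD (1).
	// The second word is the upper half of the 64-bit immediate.
	insns := []uint64{0x1118, 0}
	relocs := []Reloc{{InsnIndex: 0, MapID: 0}}
	if err := applyRelocs(insns, relocs, []int32{5}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%#x\n", insns[0])
}
```

The fd_array approach avoids this extra bookkeeping, because the program bytes never depend on concrete fd values.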

Any other ideas on how to design the interface? Anything I've missed? Do we need attach type here? Or map types? Or anything to make it possible to call C functions from the BPF program?

CC @a-nogikh @tarasmadan

@thatjiaozi
Collaborator

Hey @dvyukov

Thanks for opening this, let's pursue this idea!

I was chatting offline with @meadori about how we could achieve this and here are some initial thoughts:

  1. From both a Buzzer and an eBPF generation perspective, the eBPF program generation is already kind of self-contained. We think it would make sense to export it as a Go module/library that could then be imported and used in syzkaller. While we considered the option of fully integrating Buzzer into syzkaller and deprecating some other features/logic, we think there is a path forward where both fuzzers share a common "core" and coexist in harmony :)

  2. Thanks for your prototype! I reviewed it quickly and it overall looks good to me. I am still learning how to use the fd_arrays in eBPF but I am confident it would not pose a problem. I can't come up with anything else that would be needed as part of that interface.

  3. On the Buzzer side, the eBPF generation library needs a bit of work: we would like to increase unit test coverage, refactor it into an easier-to-use interface, etc. I can certainly squeeze in the work needed for a syzkaller-friendly interface as part of that.

  4. Regarding map types: perhaps instead of returning the number of map fds the program uses from fd_array, we could return an array of map types, where each position in the array corresponds to a map fd and each value is the expected map type for that fd.

@meadori is there anything else you think I am missing from here?

Thanks again for opening this and pursuing this idea!

@meadori
Collaborator

meadori commented Jun 13, 2023

@thatjiaozi you covered everything that we talked about. At first glance, the integration steps seem reasonable. I will look more at the specifics this week.

@dvyukov
Author

dvyukov commented Jun 13, 2023

Re 1: totally fine with me.

Re 2: AFAIU when using fd_array, the program refers to maps by index (0, 1, 2, ...), and the actual map FDs are supplied in a separate array. This allows the program to stay constant regardless of the actual FD values.
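A minimal sketch of what such an index-based map reference might look like at the instruction level, assuming each `uint64` is one raw `struct bpf_insn` in little-endian layout. The `encode` and `loadMapByIndex` helpers are made up for this example; the opcode and `BPF_PSEUDO_MAP_IDX` values are taken from `linux/bpf.h`.

```go
package main

import "fmt"

const (
	opLdImm64    = 0x18 // BPF_LD | BPF_IMM | BPF_DW
	pseudoMapIdx = 5    // BPF_PSEUDO_MAP_IDX: imm is an index into fd_array
)

// encode packs the fields of struct bpf_insn into one little-endian word:
// code (8 bits), dst_reg (4), src_reg (4), off (16), imm (32).
func encode(code, dst, src uint8, off int16, imm int32) uint64 {
	return uint64(code) |
		uint64(dst&0xf)<<8 |
		uint64(src&0xf)<<12 |
		uint64(uint16(off))<<16 |
		uint64(uint32(imm))<<32
}

// loadMapByIndex emits the two-slot BPF_LD_IMM64 instruction that loads the
// map found at fd_array[idx] into register dst. The second slot carries the
// upper half of the immediate, which is zero here.
func loadMapByIndex(dst uint8, idx int32) [2]uint64 {
	return [2]uint64{encode(opLdImm64, dst, pseudoMapIdx, 0, idx), 0}
}

func main() {
	// r1 = map at fd_array[2]; the words do not depend on any real fd.
	w := loadMapByIndex(1, 2)
	fmt.Printf("%#x %#x\n", w[0], w[1])
}
```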

Re 3: Mostly up to you, but I would prefer earlier integration to parallelize the work and shake out the interface details (you provide a trivial implementation early but with the final interface, and then we improve and integrate in parallel).
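A trivial placeholder behind the proposed signature might look like the sketch below. It is only an assumption of what "trivial" could mean here: it ignores `progType`, `rnd` and `oldInsns`, always returns the two-instruction program `r0 = 0; exit`, and reports that no maps are used. The little-endian `struct bpf_insn` packing is assumed as well.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	opMov64Imm = 0xb7 // BPF_ALU64 | BPF_MOV | BPF_K
	opExit     = 0x95 // BPF_JMP | BPF_EXIT
)

// BuzzerGenerate is a placeholder with the signature proposed in this issue.
// Mutation is not implemented: oldInsns is ignored and a fresh program is
// returned every time. The program uses no maps, so maps is always 0.
func BuzzerGenerate(progType int, rnd *rand.Rand, oldInsns []uint64) (insns []uint64, maps int) {
	insns = []uint64{
		opMov64Imm, // r0 = 0 (dst_reg, off and imm are all zero)
		opExit,     // exit
	}
	return insns, 0
}

func main() {
	rnd := rand.New(rand.NewSource(1))
	// progType 1 is BPF_PROG_TYPE_SOCKET_FILTER in linux/bpf.h.
	insns, maps := BuzzerGenerate(1, rnd, nil)
	fmt.Printf("%#x, maps=%d\n", insns, maps)
}
```

With a stub like this in place, the caller side can be wired up and tested before the real generator is exported.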

Re 4: Looks reasonable to me. But I don't know if different map types have significantly different interfaces or not. If they do (program validation will fail too often with wrong map type), then it makes sense.

@thatjiaozi
Collaborator

Sounds good to me on point 3. Let's shake out the details of the interface ASAP and then we can split the work.

I'll sync with @meadori on this and submit a PR soon.

@thatjiaozi
Collaborator

Also, I have to admit I had no idea how to use fd_array in eBPF, but I just figured it out. The documentation is not very helpful, but this comment (https://github.com/libbpf/libbpf/blob/master/include/uapi/linux/bpf.h#L1185) gave me all the clues I needed.

After some local hacking I managed to get it to work with Buzzer. I think that, on the Buzzer side, we should entirely ditch the hardcoded map_fd values in favor of fd_array, as it looks way more flexible.
