-
-
Notifications
You must be signed in to change notification settings - Fork 40
Run fuzzing tools to discover safety holes #30
Comments
To clarify, I have listed a bunch of approaches to software verification, and I do not expect fuzzing specifically to discover many interesting bugs, unless there is gross negligence in FFI. Still, for something that takes half an hour to set up, it provides a really good return on investment, so there's no reason not to do it. https://github.com/rust-fuzz/targets already provides a fuzzing target for Do not forget to use address and memory sanitizers if you're looking for bugs in unsafe code, otherwise they will not be discovered. You can omit sanitizers when looking for panics or building the initial corpus. |
All I've looked at what kind of unsafes there are in serde and serde-json, here's what I've found: There is a number of
|
I don't think you have accurately characterized the assumptions around utf8 in Serde:
The key observation, which is called out in a comment, is that there is no way to emit non-utf8 encoded bytes through any API in std::fmt. You can see this by looking at the methods of |
@dtolnay than you for the clarification! I'd really appreciate if you could add a code comment explaining why a poorly implemented macro_rules! serialize_display_bounded_length {
($value:expr, $max:expr, $serializer:expr) => {{
let mut buffer: [u8; $max] = unsafe { mem::uninitialized() };
let remaining_len = {
let mut remaining = &mut buffer[..];
write!(remaining, "{}", $value).unwrap();
remaining.len()
};
let written_len = buffer.len() - remaining_len;
let written = &buffer[..written_len];
// <snip> By reading this code I observe that:
This leads me to believe that in order to verify that this is safe, I need to inspect the implementation of the This makes some critical invariants non-local. So an explanation in a code comment on why |
Thanks for the feedback @Shnatsel. I would love to get to the root of where our perspective differs. In response to your observations:
I guess I don't see this use of [Emphasis mine in the following quote]
As above, I'm not sure there is anything we could write in Serde that would justify why
I've tried not to distinguish between "correct" and "incorrect" implementations of Display because the reasoning applies equally well to any possible way that Display could be implemented. Maybe I'm missing the sort of incorrectness you have in mind -- could you give an example of what you would consider an incorrect implementation of |
I was wondering if it was possible for an implementation of Display to actually write a different amount of bytes than is eventually reported to this code via slice mutation in Turns out this is not possible because Display passes a string to the Write implementation in the formatter, which handles writing and updating the length. Display cannot mess up the length of the string it passes without abusing unsafe code, so this code is safe for any safe implementation of Display. |
Thanks a lot to @dtolnay for clarifications and for bearing with me. This attitude to safety is commendable. Apologies to mre for somewhat unrelated discussion on the issue. |
I quite enjoyed it tbh. 😃 |
We definetly need more tests and fuzzing is a good start, in Haskell i really love quickcheck and i was happy to see Rust-Quickcheck what about pyo3?
so maybe add hypothesis-python |
pyo3 worries me both due to inherent unsafety of FFI and past track record of some of its developers. I would love to see it audited, but I have to admit my competence is insufficient for a meaningful audit. |
Thanks to @RSabet for integrating our first hypothesis-based tests. This should make the codebase quite a bit more robust with regards to edge-cases. |
I've checked pyo3 bugtracker, and it seems the library needs some serious work before you can rely on it to be safe for passing arbitrary data. Check this out: PyO3/pyo3#159 Some of these issues are over a year old. And these are just the bugs that have been sufficiently dramatic to cause a segfault. You need something to go very, very wrong to trigger a segfault - most memory corruption is not detected unless you build the binary with address sanitizer. TL;DR: you would need to audit the relevant parts of pyo3 before you can claim any kind of safety. |
Two years later, drive by hello from a new (ish) pyo3 maintainer. 😄 I treat the soundness of It's a shame that pyo3 has historical safety issues; I have been working to see them removed. As of today I think I have submitted PRs to resolve the last of the issues labelled I wouldn't be surprised if there are still skeletons in the closet though - if you do end up doing any fuzzing and find any issues related to pyo3, please do post them: I'll work to clean them up asap! |
In a discussion on Reddit, user /u/Shnatsel suggested the use of fuzzing to test hyperjson for safety issues and panics. There are a few tools around, which could be helpful:
(List assembled by @memoryruins taken from here)
If somebody wants to take a stab at it, please comment here with any questions.
The text was updated successfully, but these errors were encountered: