-
Notifications
You must be signed in to change notification settings - Fork 390
(more) precisely track memory accesses and allocations across FFI #4326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
6df5ade
to
45232b8
Compare
Write-up of design options and choices: https://hackmd.io/@scZ140SZTaOb1VASX_ju-Q/SJUzIeo-eg |
This doesn't build or pass tests right now since it builds against my fork of rustc, but against it it passes all tests on x86 linux and (assuming I didn't mess them up) on other platforms as well! |
@rustbot ready |
b2cb76e
to
85189c3
Compare
@rustbot author |
Reminder, once the PR becomes ready for a review, use |
b856447
to
ae13c9a
Compare
6ac4531
to
3d6ca8f
Compare
I think this is ready for at least a preliminary review? The rustc side I'm pretty settled on, though, so I'll push and mark that as ready also. There's every chance I missed something really obvious and a second pair of eyes probably helps. @rustbot ready |
0ea7416
to
f97ebcf
Compare
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (possibly 8344967) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started idly scrolling through this, but then realized this is way too much for me at the moment.^^ It is unlikely I'll be able to review a PR of this size and complexity before mid-July. I expect I have to book at least half a day for this, and I don't have that much time to spare earlier than that. If there's some way we could arrange real-time communication (e.g. via zoom) for a quicker feedback cycle, that could be useful.
Is there any way to split this up? Like
- first land some basic ptrace setup that does basically nothing
- then add catching of memory events in the server (but just logging the info to stderr or so without communicating them to Miri proper)
- then add a communication channel so that core Miri learns about these events and uses them for something useful
Also, the IPC protocol between server and client should be explicitly documented somewhere: what messages are there, what is their format, what do they mean, when can which message be sent by whom and what is the reply. Since it seems there are three channels, such docs are needed for each of them. Probably the file that defines the message types it the best place for that. Currently that's just the crate root of trace
I think; this should be a proper module on its own.
src/bin/miri.rs
Outdated
#[cfg(target_os = "linux")] | ||
if !config.native_lib.is_empty() && !config.force_old_native_lib { | ||
// FIXME: This should display a diagnostic / warning on error | ||
// SAFETY: No other threads have spawned yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is almost certainly wrong. rustc can spawn an arbitrary amount of threads before calling us here.
The forking should happen way earlier, probably right after argument parsing but before we invoke rustc_driver.
no_alloc.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file probably should not be here. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gulp
I'd be more than happy to do that, yeah! I've also been quite busy the last couple of weeks and so haven't had much time to address reviews, but I'm okay with talking directly and splitting this up further. Let me know what and when works for you, I've had a bit of trouble with Zoom on my device in the past so I have a slight preference for Jitsi but it's not a huge deal. I'm likely to be free again and address review comments + be able to talk sometime early next week |
88170db
to
ef5b4c7
Compare
@rustbot author |
Okay, sounds good! Let's see if it will be needed with the split-up approach and Oli helping with reviews. |
fix review comments ?? docs move init to arg parsing docs again
This greatly expands the information we have access to when code is executed across the FFI bound, including tracking exact allocations/deallocations (where possible) and accesses to Miri memory exposed to foreign code.
The current architecture forks Miri into 2 processes at the point of the first FFI call, with the parent
ptrace
ing the child. Communication is done via IPC channels; the child process signals when to start or stop tracing (during FFI), and the parent returns information about memory accesses that occurred, if any.Alternatives considered:
mmap
ing files from there to act as the machine's memory. The main benefits of this approach would be that it would be more portable as we'd no longer need to disassemble the foreign code to examine it (though we will still need to read arbitrary registers on the fly) and that it would give us slightly more accurate information about the sizes and types of memory accesses where the current approach lightly conservatively overestimates. However, I decided not to do this because it requires FUSE utils to actually be installed on the host system and would add a lot more complexity to the logicsigaction
s: We can also get data by registeringsigactions
to trigger when foreign code accesses our memory, but even though this is much simpler it would only give us page-grained information about what was performed (where the current approach is more or less byte-grained)As it stands, this PR is not ready for merging pending at least the following being resolved. Still, I am opening this because significant sections of the code are ready for (preliminary, this is still very WIP) review. In ~decreasing order of severity as I see it:
ptrace
is disabledOpen questions / fixable current limitations:
iced_x86
with something else (e.g. go ahead with the FUSE idea, or use Capstone/llvm-sys for the better compatibility and eat the binary size & compile time increase)? (Switched to Capstone)Zulip thread: https://rust-lang.zulipchat.com/#narrow/channel/269128-miri/topic/Initializing.20Miri.20memory.20from.20C.20FFI/with/518542321
Works toward (I hesitate to say closes) #11