Skip to content

(more) precisely track memory accesses and allocations across FFI #4326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nia-e
Copy link
Contributor

@nia-e nia-e commented May 17, 2025

This greatly expands the information we have access to when code is executed across the FFI bound, including tracking exact allocations/deallocations (where possible) and accesses to Miri memory exposed to foreign code.

The current architecture forks Miri into 2 processes at the point of the first FFI call, with the parent ptraceing the child. Communication is done via IPC channels; the child process signals when to start or stop tracing (during FFI), and the parent returns information about memory accesses that occurred, if any.

Alternatives considered:

  • FUSE: It would be possible to obtain most of the same information about accesses to our memory (but not allocations done on C's side) by registering a FUSE fs which lives entirely in memory and mmaping files from there to act as the machine's memory. The main benefits of this approach would be that it would be more portable as we'd no longer need to disassemble the foreign code to examine it (though we will still need to read arbitrary registers on the fly) and that it would give us slightly more accurate information about the sizes and types of memory accesses where the current approach lightly conservatively overestimates. However, I decided not to do this because it requires FUSE utils to actually be installed on the host system and would add a lot more complexity to the logic
  • Plain sigactions: We can also get data by registering sigactions to trigger when foreign code accesses our memory, but even though this is much simpler it would only give us page-grained information about what was performed (where the current approach is more or less byte-grained)

As it stands, this PR is not ready for merging pending at least the following being resolved. Still, I am opening this because significant sections of the code are ready for (preliminary, this is still very WIP) review. In ~decreasing order of severity as I see it:

  • Actually incorporate the obtained information into Miri's memory tracking instead of just printing to stderr
  • General cleanup such as removing hardcoded unwraps, adding more documentation, etc
  • Dynamically fall back to the old behaviour without panicking if the parent fails to trace properly, e.g. if ptrace is disabled
  • Add more tests
  • Add logic for non-x86 architectures
  • Handle the foreign code spawning multiple threads

Open questions / fixable current limitations:

  • Do we want to pull in an existing allocator instead of having our own for managing the machine memory? (No; we merged our own)
  • Can this be made to be combineable with a multi-seeded Miri run? (Yes!)
  • What happens if the FFI code forks further (do we even allow it)?
  • Should we switch out iced_x86 with something else (e.g. go ahead with the FUSE idea, or use Capstone/llvm-sys for the better compatibility and eat the binary size & compile time increase)? (Switched to Capstone)

Zulip thread: https://rust-lang.zulipchat.com/#narrow/channel/269128-miri/topic/Initializing.20Miri.20memory.20from.20C.20FFI/with/518542321

Works toward (I hesitate to say closes) #11

@rustbot

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label May 19, 2025
@nia-e nia-e force-pushed the ffi-ptrace branch 2 times, most recently from 6df5ade to 45232b8 Compare May 19, 2025 21:03
@RalfJung
Copy link
Member

Write-up of design options and choices: https://hackmd.io/@scZ140SZTaOb1VASX_ju-Q/SJUzIeo-eg

@nia-e
Copy link
Contributor Author

nia-e commented May 21, 2025

This doesn't build or pass tests right now since it builds against my fork of rustc, but against it it passes all tests on x86 linux and (assuming I didn't mess them up) on other platforms as well!

@nia-e
Copy link
Contributor Author

nia-e commented May 21, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels May 21, 2025
@nia-e nia-e marked this pull request as ready for review May 21, 2025 11:03
@nia-e nia-e changed the title WIP: (more) precisely track memory accesses and allocations across FFI (more) precisely track memory accesses and allocations across FFI May 21, 2025
@nia-e nia-e force-pushed the ffi-ptrace branch 5 times, most recently from b2cb76e to 85189c3 Compare May 22, 2025 11:56
@RalfJung
Copy link
Member

@rustbot author
based on discussion on Zulip re: splitting up the PR

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels May 22, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 22, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@nia-e nia-e force-pushed the ffi-ptrace branch 3 times, most recently from 6ac4531 to 3d6ca8f Compare May 31, 2025 06:42
@nia-e
Copy link
Contributor Author

nia-e commented May 31, 2025

I think this is ready for at least a preliminary review? The rustc side I'm pretty settled on, though, so I'll push and mark that as ready also. There's every chance I missed something really obvious and a second pair of eyes probably helps.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels May 31, 2025
@nia-e nia-e force-pushed the ffi-ptrace branch 3 times, most recently from 0ea7416 to f97ebcf Compare June 3, 2025 21:13
@rustbot

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Jun 6, 2025

☔ The latest upstream changes (possibly 8344967) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started idly scrolling through this, but then realized this is way too much for me at the moment.^^ It is unlikely I'll be able to review a PR of this size and complexity before mid-July. I expect I have to book at least half a day for this, and I don't have that much time to spare earlier than that. If there's some way we could arrange real-time communication (e.g. via zoom) for a quicker feedback cycle, that could be useful.

Is there any way to split this up? Like

  • first land some basic ptrace setup that does basically nothing
  • then add catching of memory events in the server (but just logging the info to stderr or so without communicating them to Miri proper)
  • then add a communication channel so that core Miri learns about these events and uses them for something useful

Also, the IPC protocol between server and client should be explicitly documented somewhere: what messages are there, what is their format, what do they mean, when can which message be sent by whom and what is the reply. Since it seems there are three channels, such docs are needed for each of them. Probably the file that defines the message types it the best place for that. Currently that's just the crate root of trace I think; this should be a proper module on its own.

src/bin/miri.rs Outdated
#[cfg(target_os = "linux")]
if !config.native_lib.is_empty() && !config.force_old_native_lib {
// FIXME: This should display a diagnostic / warning on error
// SAFETY: No other threads have spawned yet
Copy link
Member

@RalfJung RalfJung Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is almost certainly wrong. rustc can spawn an arbitrary amount of threads before calling us here.

The forking should happen way earlier, probably right after argument parsing but before we invoke rustc_driver.

no_alloc.json Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file probably should not be here. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gulp

@nia-e
Copy link
Contributor Author

nia-e commented Jun 15, 2025

I'd be more than happy to do that, yeah! I've also been quite busy the last couple of weeks and so haven't had much time to address reviews, but I'm okay with talking directly and splitting this up further. Let me know what and when works for you, I've had a bit of trouble with Zoom on my device in the past so I have a slight preference for Jitsi but it's not a huge deal. I'm likely to be free again and address review comments + be able to talk sometime early next week

@nia-e nia-e force-pushed the ffi-ptrace branch 2 times, most recently from 88170db to ef5b4c7 Compare June 15, 2025 19:58
@nia-e
Copy link
Contributor Author

nia-e commented Jun 16, 2025

@rustbot author
based on discussion above re: splitting this up further

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Jun 16, 2025
@RalfJung
Copy link
Member

Okay, sounds good! Let's see if it will be needed with the split-up approach and Oli helping with reviews.

nia-e added 6 commits June 19, 2025 15:04
fix review comments

??

docs

move init to arg parsing

docs again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants