Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid SIGBUS #139

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Forbid SIGBUS #139

wants to merge 1 commit into from

Conversation

dfoxfranke
Copy link

SIGBUS is similar to SIGSEGV in regards to when it is raised and the sort of special handling required for recovery.

I'm planning (haven't started) a pagefault crate specifically designed for handling SIGSEGV and SIGBUS, and want to avoid stepping on each other's toes.

SIGBUS is similar to SIGSEGV in regards to when it is raised and the
sort of special handling required for recovery.
@vorner
Copy link
Owner

vorner commented Feb 3, 2023

Hello

I understand the argument. Nevertheless, this would be a breaking change and it doesn't seem worth doing a „big version“ jump because of that.

Let me note, however, that:

  • signal-hook will not touch a signal unless instructed by user. The chance of user doing so and not noticing it doesn't do what it should is pretty low (but right now, this crate won't stop the user from trying and going through the pain of discovering what's wrong) ‒ so I don't think we would be stepping on each other toes in practice.
  • The FORBIDDEN signals don't actually outright prevent the user from using them, only doing so through the „common“ user-friendly APIs.

Which leads me to the detail that you could (maybe, depending on what you do?) use signal-hook-registry's register_sigaction_unchecked as your backend, to a) avoid dealing with some of the stuff (unlikely to be the bigger part of the problem in your case, to be fair), b) potentially multiplex the signals.

What is it you're actually planning on doing with the signal in the pagefault crate? Only some kind of reporting of what happened and terminating?

I would be OK with adding a documentation note that SIGBUS probably should be part of the FORBIDDEN set, though.

@dfoxfranke
Copy link
Author

So, there are basically two ways of dealing with page faults which don't involve terminating the program:

  1. "Patch and continue": mmap a valid page into the faulting address and then return out of the signal handler back to the instruction which faulted. This is what you want, e.g., for certain GC techniques.
  2. "Catch and recover": siglongjmp out of the handler back to a previously-set recovery point. This is what you want, e.g., if you're doing shared-memory IPC and another process has done something unexpected to the shm segment.

I want to support both use cases but my interest is mainly in the second. There will be two parts to the crate's API. One will look like signal_hook_registry::register_sigaction, except:

  • It only allows registering handlers for SIGSEGV, SIGBUS, SIGILL, and SIGFPE, and before calling any hooks it will check si_code to make sure it really arose from a fault and not from kill(2).
  • It'll digest the fields of siginfo_t relevant to those signals into a friendlier structure that covers over OS differences and pass that to the hook.
  • The hooks return a bool. It'll call them in reverse order of registration until one either doesn't return (probably because it aborted or longjmped out) or returns true. A true return means the fault has been patched and we should return out of the signal handler. A false return means the hook didn't recognize the fault and didn't do anything about it. If every hook returns false or if the signal came from kill(2), the handler will invoke whatever handler was previously installed before it. Finally, if there was no previous handler or if that handler returns, it will reinstall SIG_DFL and then return, which should re-trigger the fault and dump core.

The second part of the API will be a catch_fault function shaped like std::panic::catch_unwind, combined with a throw function that can be used as or from a signal hook. Under the hood, catch_fault will call sigsetjmp and save the jump buffer in thread-local storage (this will be implemented in C given the problems with returning twice into a Rust function), and the throw function will look up that buffer and siglongjmp back to it. This will be less graceful than unwinding from a panic because destructors won't run, but that's valid behavior for Rust in general.

Anyway, this brings me to the problem with signal_hook permitting registration of SIGBUS handlers from safe code. pagefault's register function will be unsafe, even though, in isolation, it's always safe. But if you're triggering any of the faults that pagefault handles, then you must be writing unsafe code somewhere, and you need to reason about its soundness. That reasoning is likely to involve an expectation that if a fault does occur, it will handled in a particular way by trusted code. But signal_hook allowing safe registration of SIGBUS handlers means that untrusted code can inject itself into that path, so the invariant you're relying on becomes an assertion about the global behavior of the entire process rather than something that can be encapsulated into a safe interface. Basically, you can have a safe interface for registering SIGBUS handlers, or you can have a safe interface to code that triggers bus faults, but you can't have both in the same program.

@dfoxfranke
Copy link
Author

Perhaps a backward-compatible way out here is add a new low-level call to signal_hook_registry for demanding exclusive control over the handling of particular signals. So, signal_hook_registry would maintain an initially-empty global list of exclusively-claimed signals. A call to claim_exculsive(signo) would add signo to the list of claimed signals. The call will fail if anything has already registered an action for that signal, or if anything else has already called claim_exclusive for it. Afterward, any attempt to register an action for that signal (even via the _unchecked variants) will panic. So, a crate like mine can call claim_exclusive(SIGBUS) and then (without any further interaction with signal_hook_registry) directly call sigaction(2) in the knowledge that this call can no longer be undone by any safe code.

@vorner
Copy link
Owner

vorner commented Feb 9, 2023

🤔

This approach seems rather fragile. I mean, we can set up some protocol to talk between these two crates, like the exclusive flag you mentioned. But that doesn't stop some third crate we don't know about from messing with the signal handler.

And I'm a bit reluctant to pollute the crate with more functionality (I've had some more bigger plans around that, what I'm not sure how this fits together) and I don't really have much time to think about these things.

But I wonder if it was possible to check for the exclusivity on your side. Considering someone can do the classical chaining of signal handlers (as does signal-hook, but independently) after you check exclusivity and register, it would have to happen both during your registration and during the signal handler, but something like this might work:

  • When you call sigaction to register your signal handler, check that oldact is empty.
  • During the handler, call sigaction with act = null (to not change anything) and see that your own signal handler is returned as oldact. If not, go for that „dump core“ fallback.

Would that thing work? I haven't checked this in practice / against the spec that this is fully supported everywhere, though.

@vorner
Copy link
Owner

vorner commented Feb 9, 2023

Nevertheless, you make a point that SIGBUS is, in fact, a signal somewhat dangerous to mess with. I'll have to think about what are the (safety / soundness) implications with eg. „safely“ replacing the default handler with eg. setting a boolean flag.

@dfoxfranke
Copy link
Author

dfoxfranke commented Feb 9, 2023

This approach seems rather fragile. I mean, we can set up some protocol to talk between these two crates, like the exclusive flag you mentioned. But that doesn't stop some third crate we don't know about from messing with the signal handler.

Sure, but since nothing in std provides a safe way to manipulate signals, that third crate would have to be using unsafe code somewhere. I can't stop that, any more than I can stop a third-party crate from doing anything else unsafe. But since your crate is the Rust ecosystem's de facto standard way of allocating signal handlers, I can ensure compatibility with anything that respects that standard.

But I wonder if it was possible to check for the exclusivity on your side. Considering someone can do the classical chaining of signal handlers (as does signal-hook, but independently) after you check exclusivity and register, it would have to happen both during your registration and during the signal handler.

No, this wouldn't work. Unsafe code needs to know before it causes a segfault what the signal handler is going to do with it. Checking that before every operation would be both racy and prohibitively expensive.

@vorner
Copy link
Owner

vorner commented Feb 9, 2023

that third crate would have to be using unsafe code somewhere. I can't stop that, any more than I can stop a third-party crate from doing anything else unsafe.

I'm simply wondering. Your crate will be offering a safe interface to do your thing. That third party crate would be offering a safe interface. But if put together, they could cause UB. So, which one of them would be unsound?

Anyway, I'm going to think about some future-proof API of incorporating both claiming of the signals and the stuff hinted at in #82.

The problems is, I don't really have the time to write the code right now. So, I can sketch the API and the ideas behind it, but would probably need someone to do the coding.

@dfoxfranke
Copy link
Author

I'm simply wondering. Your crate will be offering a safe interface to do your thing. That third party crate would be offering a safe interface. But if put together, they could cause UB. So, which one of them would be unsound?

That's the trouble with global state — there's no meaningful answer to this question. If we wanted to reason about this formally using separation logic, we'd want to construct a frame of everything that interacts with signal handler state, and ask whether the frame is sound. But we don't know what's in that frame until we link the whole program together.

@dfoxfranke
Copy link
Author

dfoxfranke commented Sep 21, 2023

I've released the pagefault crate, under the (more accurate) name hw-exception: https://docs.rs/hw-exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants