This repository has been archived by the owner on Mar 24, 2022. It is now read-only.

Revert #626 (Refactor run_async custom Future): unsound #643

Merged

1 commit merged into main on Feb 26, 2021

Conversation

pchickey
Contributor

This reverts commit bf24022, reversing
changes made to ddaf91e.

Unfortunately, in testing, we discovered that #626 has unsound interactions with asynchronous signal handling. We want to find a sound way to get these changes back in, but we need to revert the merge while we figure that out.

#626 introduced a change to lucet_runtime_internals::State where a variant now contains an Arc:

    Running {
        async_context: Option<std::sync::Arc<AsyncContext>>,
    }

When a signal handler terminates an instance, https://github.com/BytecodeAlliance/lucet/blob/main/lucet-runtime/lucet-runtime-internals/src/instance/signals.rs#L310 drops this Arc from a signal context, which in turn calls free on the underlying memory. If the system allocator is not async-signal-safe (e.g. jemalloc), this can result in a segfault.
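To make the hazard concrete, here is a minimal, self-contained sketch (the State and AsyncContext names follow this PR, but the Drop impl and the flag are illustrative only, standing in for the allocator's free): a plain reassignment of the state value runs Drop on the old value, which is exactly what happens when the signal handler overwrites a Running state.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

static DROPPED: AtomicBool = AtomicBool::new(false);

// Illustrative stand-in for lucet's AsyncContext.
struct AsyncContext;

impl Drop for AsyncContext {
    fn drop(&mut self) {
        // In the real runtime this is where the allocator's free() runs;
        // free() is not async-signal-safe in general (e.g. under jemalloc).
        DROPPED.store(true, Ordering::SeqCst);
    }
}

enum State {
    Running { async_context: Option<Arc<AsyncContext>> },
    Terminated,
}

// Mimics what the signal handler does: overwrite the state. The old
// `Running` value is dropped implicitly by the assignment, and if its
// Arc was the last reference, the AsyncContext is freed right there,
// inside the (hypothetical) signal context.
fn terminate(state: &mut State) {
    *state = State::Terminated;
}

fn drop_happens_on_reassign() -> bool {
    let mut state = State::Running {
        async_context: Some(Arc::new(AsyncContext)),
    };
    terminate(&mut state);
    DROPPED.load(Ordering::SeqCst)
}
```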

@pchickey
Contributor Author

cc @benaubin

@benaubin
Contributor

Uh oh! I bet it would be an easy fix to put the async context as a field on the instance instead of on the state. I'd be happy to put together a fix :)
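A sketch of the proposed shape (type and field names are illustrative, not the actual lucet definitions): with the Arc living on the instance rather than inside the state enum, terminating from a signal handler becomes a plain enum assignment that drops nothing heap-allocated.

```rust
use std::sync::Arc;

// Illustrative stand-in; not the real lucet type.
struct AsyncContext;

// After the proposed fix, the state enum owns no allocation for the
// async case, so overwriting it from a signal handler never calls free().
enum State {
    Running,
    Terminated,
}

struct Instance {
    state: State,
    // The Arc now lives on the instance and survives state transitions.
    async_context: Option<Arc<AsyncContext>>,
}

impl Instance {
    fn terminate_from_signal(&mut self) {
        // Only a plain enum assignment; no Drop of an Arc in signal context.
        self.state = State::Terminated;
    }
}

fn still_holds_context_after_terminate() -> bool {
    let mut inst = Instance {
        state: State::Running,
        async_context: Some(Arc::new(AsyncContext)),
    };
    inst.terminate_from_signal();
    matches!(inst.state, State::Terminated) && inst.async_context.is_some()
}
```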

@iximeow (Contributor) left a comment

😞 I'm very excited about these changes and very not excited about finding out that State has to be signal-safe on drop. The exact drop is obscured by a reassignment rather than some value going out of scope, and I'm not sure we'd even realize that State has any kind of signal-safety requirement in the first place.

This is also a good time to just rip out the async signalling mechanism; the fuel-based timeout machinery is easy to keep safe by comparison, but might need some thought first.

@benaubin I think a patch to move this to Instance would be great! I think rebasing #626 on the merge of this PR plus that change would be great for reviewability. Looking at State, I see we had (and will have again) Boxes on the Yielding and Yielded states, so generally I don't love how close we've been to hitting this for unrelated reasons; we'll have to keep an eye out for how State changes interact with signal safety in the future. Lucet will never SIGALRM a Yielding or Yielded state, but we have very little type-system guarantee that we wouldn't hit this later. :(

@benaubin
Contributor

I mean, good to know, at least. #644 should fix it (it required minimal changes), and I'd be happy to rebase it on top of this PR.

In terms of the out-of-gas mechanism, I believe an effective solution may be to allow a user-defined hostcall to run when the bound expires (an "on_bound_expired" hook)? I'm not sure about the implications of that from a soundness perspective, but I'd imagine doing so would enable things like "check whether the CPU time limit expired" without the overhead of a context switch.

@acfoltzer
Contributor

acfoltzer commented Feb 26, 2021

this is also a good time to just rip out the async signalling mechanism, the fuel-based timeout machinery is easy to keep safe by comparison but might need some thought first.

This is how I would favor fixing this in the long run, after the immediate revert to back out the unsoundness. It does mean that folks will have to use an async runtime to do timeouts, even if they're not doing any other async stuff. But that can be pretty lightweight between futures::executor::block_on() and futures-timer.
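For a sense of how lightweight such an executor can be, here is a minimal std-only sketch of a block_on (an illustration of roughly what futures::executor::block_on does, not the actual futures implementation): poll the future, and park the thread on a condvar until the waker fires.

```rust
use std::future::Future;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Parked-thread state: a "woken" flag guarded by a mutex, plus a condvar.
struct Signal {
    woken: Mutex<bool>,
    cv: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cv.notify_one();
    }
}

// Minimal single-future executor: poll, then sleep until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let signal = Arc::new(Signal {
        woken: Mutex::new(false),
        cv: Condvar::new(),
    });
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => {
                let mut woken = signal.woken.lock().unwrap();
                while !*woken {
                    woken = signal.cv.wait(woken).unwrap();
                }
                *woken = false;
            }
        }
    }
}
```

Pairing an executor like this with a timer future is enough to drive a timeout without pulling in a full async runtime.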

Of course, the current KillSwitch mechanism is more general and powerful than just a means to implement timeouts, so removing it could break those use cases. Are we aware of any such use cases? Is this worth a BCA RFC?

@pchickey pchickey merged commit 62ad436 into main Feb 26, 2021
@pchickey pchickey deleted the pch/revert_626 branch February 26, 2021 18:17
@pchickey
Contributor Author

There are several ways we can proceed here; let's take our time figuring out the best path over on #644.
