Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluefin is susceptible to #24189 #30

Open
LSLeary opened this issue Dec 6, 2024 · 8 comments
Open

Bluefin is susceptible to #24189 #30

LSLeary opened this issue Dec 6, 2024 · 8 comments

Comments

@LSLeary
Copy link

LSLeary commented Dec 6, 2024

Due to the implementation of GHC's exception mechanism, catching under unsafePerformIO has a bad interaction with async exceptions. The issue and some workarounds are detailed in #24189.

A port of lexi-lambda's reproducer to bluefin:

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Bluefin.Eff (runPureEff)
import Bluefin.Exception (handle)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      handle (\() -> pure False) \_ -> do
        delay n do
          pure True
  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

This program terminates with an exception, but will conclude successfully if you comment out the redundant handle.

@tomjaguarpaw
Copy link
Owner

tomjaguarpaw commented Dec 6, 2024

Thanks for reporting this intriguing issue. Below is the version of your script that I could make run.

To check my understanding, is the problem that the async exception thrown to eval_thread (by killThread) should eventually be raised to the body of the forkIO, and thus swallowed, but instead the exception somehow propagates to the main thread?

EDIT: Oh, I guess the problem is that the async exception gets stored in the thunk, so it is thrown when then thunk is later evaluated during the print.


#!/usr/bin/env cabal
{- cabal:
build-depends: base, bluefin
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Bluefin.Eff (runPureEff)
import Bluefin.Exception (handle)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      handle (\() -> pure False) \_ -> do
        delay n do
          pure True
  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x
% cabal run test28.hs
cabal-script-test28.hs: thread killed

@tomjaguarpaw
Copy link
Owner

effectful does not seem to suffer from this issue, though I haven't worked out why:

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        Left (_, l) -> l
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x
% cabal run test28.hs
True

@tomjaguarpaw
Copy link
Owner

tomjaguarpaw commented Dec 6, 2024

Oh no, a subtle change shows that effectful does suffer from the issue, but I don't understand why that change makes a difference!

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        -- Difference here: we catch `()` and return `False`
        Left (_, ()) -> False
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x
% cabal run test28.hs
cabal-script-test28.hs: thread killed

@tomjaguarpaw
Copy link
Owner

tomjaguarpaw commented Dec 6, 2024

Actually, the situation is weirder with effectful, because printing something before print eff_thunk makes it terminate correctly. (The same does not make the Bluefin code terminate correctly.)

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        Left (_, ()) -> False
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  putStrLn "After killThread"
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x
% cabal run test28.hs
After killThread
True

@tomjaguarpaw
Copy link
Owner

The best I have been able to come up with is runPureEffAsyncSafe:

runPureEffAsyncSafe :: forall r. (forall es. Eff es r) -> r
runPureEffAsyncSafe effBody = unsafePerformIO $ mask \restore -> do
tidVar <- newEmptyMVar @ThreadId
done <- newEmptyMVar @(Either SomeException r)
let body = do
tid <- forkIO do
putStrLn "Child running"
r <- Control.Exception.try @SomeException $ restore do
unsafeUnEff do
effBody
case r of
Left l -> do
putStrLn "Child received async exception:"
print l
Right {} -> putStrLn "Child terminated normally"
putMVar done r
putMVar tidVar tid
takeMVar done
r <- fix \again -> restore $
Control.Exception.catch @SomeException body $ \ex ->
-- I'm not sure if this mask is necessary
mask $ \_ -> do
putStrLn "Eff thunk handling async exception"
tid <- takeMVar tidVar
putStrLn "Killing child"
killThread tid
putStrLn "Waiting for child"
_ <- takeMVar done
putStrLn "Throwing to self"
myself <- myThreadId
throwTo myself ex
putStrLn "Looping"
again
case r of
Left l -> do
putStrLn "In L 2"
throwIO l
Right r' -> do
putStrLn "Returning"
pure r'

The equivalent of your point of Alexis's example is at:

putStrLn "Starting"
let n = 200_000_000
eff_thunk = runPureEffAsyncSafe do
handle (\() -> pure False) \_ -> do
delay n do
pure True
putStrLn "Launching thread to evaluate Eff thunk"
eval_thread <- forkIO do
Control.Exception.try @SomeException (evaluate eff_thunk) >>= \case
Left {} -> putStrLn "Eff thunk thread threw exception"
Right {} -> putStrLn "UNEXPECTED: Eff thunk thread did not throw exception"
delay (n `div` 2) do
killThread eval_thread
putStrLn "Killed thread evaluating Eff thunk"
putStrLn "Forcing Eff thunk again"
print eff_thunk
where
delay :: Int -> a -> a
delay n x = sum [1 .. n] `seq` x

Then it runs as desired:

% cabal test all --test-show-details=always
[...]
Starting
Launching thread to evaluate Eff thunk
Child running
Killed thread evaluating Eff thunk
Forcing Eff thunk again
Eff thunk handling async exception
Killing child
Waiting for child
Child received async exception:
thread killed
Throwing to self
Eff thunk thread threw exception
Looping
Child running
Child terminated normally
Returning
True

Test suite bluefin-test: PASS

I think what happens, based on Alexis's explanation on GitLab, is that when the async exception arrives the go loop is suspended, and the next time the eff_thunk is forced the go loop is resumed. I didn't find the need to wrap/unwrap body in unsafePerformIO/evaluate. I'm not sure why. Perhaps because, as explained below, effBody runs in its own thread so we're not trying to share work.

body runs effBody in a thread of its own, which seems sensible because then it's completely immune from further async exceptions because no one knows its ThreadId1. The downside is that its evaluation will have to be completely restarted if eff_thunk is forced a second time. I don't think there's any way around that, other than wrapping every Bluefin catch in this special logic, which is probably prohibitively expensive, and certainly intolerably messy.

I have no idea if this is actually correct or just seems to work so it would be good to try to get confirmation somehow.

Footnotes

  1. With the exception of listThreads and other abstraction-violating primitives, which should probably be considered as dangerous as unsafeCoerce.

@tomjaguarpaw
Copy link
Owner

Oh, and it would be nice if I could have done that with an existing structured concurrency abstraction like async, but withAsync's use of catch suffers the same undesirable behavior, so it seems here we have to write our own.

@LSLeary
Copy link
Author

LSLeary commented Dec 7, 2024

Both the issue and the workaround are complicated and subtle.
I don't possess deep knowledge of the RTS so my own understanding is somewhat murky, but I'll try to clarify as much as I can.

The Issue

The issue can be broken down into two key points:

  • When catch receives an unexpected exception, it handles it by rethrowing it synchronously.
  • When delivering an async exception, the RTS suspends the thunks whose evaluation it cancels, such that subsequent evaluations can continue where these ones left off.

In the original reproducer, the first point interferes disastrously with the second, resulting not only in lost work but even an exploding update.

Working Around It

To craft a workaround, you just need to address these points—lets rederive lexi-lambda's solution.

Rethrowing Asynchronously

For the first point, we want an alternative catch to explicitly handle async exceptions by rethrowing them asynchronously.

rethrowAsync :: Exception e => e -> IO ()
rethrowAsync e = do
  me <- myThreadId
  throwTo me e

That looks good.
Now we want to write something like:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e
  ]

But throwTo isn't non-returning like a regular throwIO, so it can't stand in for an IO a.
You might think that it really won't return so we can just force it with $> undefined, but that only applies to this evaluation—the thunk itself will still be updated to resume from here.
That suggests the next iteration:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = fix \reset -> catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e >> reset
  ]

But as lexi notes, a recursive call in the handler is implicitly masked; we want one last tweak to unmask it:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = mask \restore -> fix \reset -> restore $ catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e >> reset
  ]

Though imperfect, this is already a functioning workaround!
It doesn't prevent loss of work in the enclosing thunk, but it will at least avoid the bad update that threatens program correctness.

Conserving Work

Though catchPreserveAsync can't save the thunk in question, other thunks are fair game.
To leverage this, we just need a bit of indirection: instead of executing an IO action directly, we shove it into a thunk and evaluate it:

conserve :: IO a -> IO a
conserve act = evaluate thunk
 where
  {-# NOINLINE thunk #-}
  thunk = unsafePerformIO act

Now when catchPreserveAsync (conserve act) resets the thunk, it doesn't perform act all over again, it just continues to evaluate thunk!

At this point the rederivation is already complete; if you put these pieces together and inline their definitions you'll arive at something equivalent to what lexi wrote.

Solutions

Unlike their Control.Exception equivalents, bluefin's try, handle, and catch are expected to run under unsafePerformIO, so their implementation ought to account for its foibles.

The simplest approach is to give up on conserving work and just implement them in terms of catchPreserveAsync.

A more complete and adventurous option would be to use catchPreserveAsync . conserve.
However, conserve is essentially unsafeInterleaveIO'.
To render conserve safe it must be made much stricter, but this imposes an NFData constraint:

conserve :: NFData a => IO a -> IO a
conserve act = evaluate (force thunk)
 where
  {-# NOINLINE thunk #-}
  thunk = unsafePerformIO act

@tomjaguarpaw
Copy link
Owner

Very clear derivation, thanks! I wonder why we don't catch all exceptions in the second branch, i.e.

-- Not SomeAsyncException
Handler \(e :: SomeException) -> rethrowAsync e >> reset

It seems to me that is still correct (because it will do the correct thing on synchronous exceptions too) and safer (because throwTo does not require its argument to be a SomeAsyncException).

In any case, I think that's moot for Bluefin, since I'm happy with runPureEffAsyncSafe (unless someone can point out some reason the "separate thread" strategy is doomed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants