Bluefin is susceptible to #24189 #30

LSLeary · 2024-12-06T09:02:53Z

Due to the implementation of GHC's exception mechanism, catching under unsafePerformIO has a bad interaction with async exceptions. The issue and some workarounds are detailed in #24189.

A port of lexi-lambda's reproducer to bluefin:

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Bluefin.Eff (runPureEff)
import Bluefin.Exception (handle)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      handle (\() -> pure False) \_ -> do
        delay n do
          pure True
  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

This program terminates with an exception, but will conclude successfully if you comment out the redundant handle.

The text was updated successfully, but these errors were encountered:

tomjaguarpaw · 2024-12-06T10:41:08Z

Thanks for reporting this intriguing issue. Below is the version of your script that I could make run.

To check my understanding, is the problem that the async exception thrown to eval_thread (by killThread) should eventually be raised to the body of the forkIO, and thus swallowed, but instead the exception somehow propagates to the main thread?

EDIT: Oh, I guess the problem is that the async exception gets stored in the thunk, so it is thrown when then thunk is later evaluated during the print.

#!/usr/bin/env cabal
{- cabal:
build-depends: base, bluefin
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Bluefin.Eff (runPureEff)
import Bluefin.Exception (handle)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      handle (\() -> pure False) \_ -> do
        delay n do
          pure True
  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

% cabal run test28.hs
cabal-script-test28.hs: thread killed

tomjaguarpaw · 2024-12-06T10:52:17Z

effectful does not seem to suffer from this issue, though I haven't worked out why:

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        Left (_, l) -> l
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

% cabal run test28.hs
True

tomjaguarpaw · 2024-12-06T10:54:18Z

Oh no, a subtle change shows that effectful does suffer from the issue, but I don't understand why that change makes a difference!

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        -- Difference here: we catch `()` and return `False`
        Left (_, ()) -> False
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

% cabal run test28.hs
cabal-script-test28.hs: thread killed

tomjaguarpaw · 2024-12-06T11:10:48Z

Actually, the situation is weirder with effectful, because printing something before print eff_thunk makes it terminate correctly. (The same does not make the Bluefin code terminate correctly.)

#!/usr/bin/env cabal
{- cabal:
build-depends: base, effectful-core
-}

{-# OPTIONS_GHC -fno-omit-yields #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE GHC2021 #-}

-- base
import Data.Functor (void)
import Control.Concurrent (forkIO, killThread)
import Control.Exception (evaluate)

-- bluefin
import Effectful (runPureEff)
import Effectful.Error.Static (runError)

main :: IO ()
main = do
  let
    n = 200_000_000
    eff_thunk = runPureEff do
      r <- runError do
        delay n do
          pure True

      pure $ case r of
        Left (_, ()) -> False
        Right r -> r

  eval_thread <- forkIO do
    void (evaluate eff_thunk)
  delay (n `div` 2) do
    killThread eval_thread
  putStrLn "After killThread"
  print eff_thunk
 where
  delay :: Int -> a -> a
  delay n x = sum [1..n] `seq` x

% cabal run test28.hs
After killThread
True

tomjaguarpaw · 2024-12-06T18:07:38Z

The best I have been able to come up with is runPureEffAsyncSafe:

bluefin/bluefin-internal/src/Bluefin/Internal.hs

Lines 1322 to 1368 in d6f8995

    
           runPureEffAsyncSafe :: forall r. (forall es. Eff es r) -> r 
        
           runPureEffAsyncSafe effBody = unsafePerformIO $ mask \restore -> do 
        
             tidVar <- newEmptyMVar @ThreadId 
        
             done <- newEmptyMVar @(Either SomeException r) 
        
             let body = do 
        
                   tid <- forkIO do 
        
                     putStrLn "Child running" 
        
                     r <- Control.Exception.try @SomeException $ restore do 
        
                       unsafeUnEff do 
        
                         effBody 
        
                     case r of 
        
                       Left l -> do 
        
                         putStrLn "Child received async exception:" 
        
                         print l 
        
                       Right {} -> putStrLn "Child terminated normally" 
        
                     putMVar done r 
        
                   putMVar tidVar tid 
        
                   takeMVar done 
        
             r <- fix \again -> restore $ 
        
               Control.Exception.catch @SomeException body $ \ex -> 
        
                 -- I'm not sure if this mask is necessary 
        
                 mask $ \_ -> do 
        
                   putStrLn "Eff thunk handling async exception" 
        
                   tid <- takeMVar tidVar 
        
                   putStrLn "Killing child" 
        
                   killThread tid 
        
                   putStrLn "Waiting for child" 
        
                   _ <- takeMVar done 
        
                   putStrLn "Throwing to self" 
        
                   myself <- myThreadId 
        
                   throwTo myself ex 
        
                   putStrLn "Looping" 
        
                   again 
        
             case r of 
        
               Left l -> do 
        
                 putStrLn "In L 2" 
        
                 throwIO l 
        
               Right r' -> do 
        
                 putStrLn "Returning" 
        
                 pure r'

The equivalent of your point of Alexis's example is at:

bluefin/bluefin-internal/test/Main.hs

Lines 44 to 68 in d6f8995

    
           putStrLn "Starting" 
        
           let n = 200_000_000 
        
               eff_thunk = runPureEffAsyncSafe do 
        
                 handle (\() -> pure False) \_ -> do 
        
                   delay n do 
        
                     pure True 
        
           putStrLn "Launching thread to evaluate Eff thunk" 
        
           eval_thread <- forkIO do 
        
             Control.Exception.try @SomeException (evaluate eff_thunk) >>= \case 
        
               Left {} -> putStrLn "Eff thunk thread threw exception" 
        
               Right {} -> putStrLn "UNEXPECTED: Eff thunk thread did not throw exception" 
        
           delay (n `div` 2) do 
        
             killThread eval_thread 
        
           putStrLn "Killed thread evaluating Eff thunk" 
        
           putStrLn "Forcing Eff thunk again" 
        
           print eff_thunk 
        
           where 
        
             delay :: Int -> a -> a 
        
             delay n x = sum [1 .. n] `seq` x

Then it runs as desired:

% cabal test all --test-show-details=always
[...]
Starting
Launching thread to evaluate Eff thunk
Child running
Killed thread evaluating Eff thunk
Forcing Eff thunk again
Eff thunk handling async exception
Killing child
Waiting for child
Child received async exception:
thread killed
Throwing to self
Eff thunk thread threw exception
Looping
Child running
Child terminated normally
Returning
True

Test suite bluefin-test: PASS

I think what happens, based on Alexis's explanation on GitLab, is that when the async exception arrives the go loop is suspended, and the next time the eff_thunk is forced the go loop is resumed. I didn't find the need to wrap/unwrap body in unsafePerformIO/evaluate. I'm not sure why. Perhaps because, as explained below, effBody runs in its own thread so we're not trying to share work.

body runs effBody in a thread of its own, which seems sensible because then it's completely immune from further async exceptions because no one knows its ThreadId¹. The downside is that its evaluation will have to be completely restarted if eff_thunk is forced a second time. I don't think there's any way around that, other than wrapping every Bluefin catch in this special logic, which is probably prohibitively expensive, and certainly intolerably messy.

I have no idea if this is actually correct or just seems to work so it would be good to try to get confirmation somehow.

With the exception of listThreads and other abstraction-violating primitives, which should probably be considered as dangerous as unsafeCoerce. ↩

tomjaguarpaw · 2024-12-06T18:19:15Z

Oh, and it would be nice if I could have done that with an existing structured concurrency abstraction like async, but withAsync's use of catch suffers the same undesirable behavior, so it seems here we have to write our own.

LSLeary · 2024-12-07T17:58:42Z

Both the issue and the workaround are complicated and subtle.
I don't possess deep knowledge of the RTS so my own understanding is somewhat murky, but I'll try to clarify as much as I can.

The Issue

The issue can be broken down into two key points:

When catch receives an unexpected exception, it handles it by rethrowing it synchronously.
When delivering an async exception, the RTS suspends the thunks whose evaluation it cancels, such that subsequent evaluations can continue where these ones left off.

In the original reproducer, the first point interferes disastrously with the second, resulting not only in lost work but even an exploding update.

Working Around It

To craft a workaround, you just need to address these points—lets rederive lexi-lambda's solution.

Rethrowing Asynchronously

For the first point, we want an alternative catch to explicitly handle async exceptions by rethrowing them asynchronously.

rethrowAsync :: Exception e => e -> IO ()
rethrowAsync e = do
  me <- myThreadId
  throwTo me e

That looks good.
Now we want to write something like:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e
  ]

But throwTo isn't non-returning like a regular throwIO, so it can't stand in for an IO a.
You might think that it really won't return so we can just force it with $> undefined, but that only applies to this evaluation—the thunk itself will still be updated to resume from here.
That suggests the next iteration:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = fix \reset -> catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e >> reset
  ]

But as lexi notes, a recursive call in the handler is implicitly masked; we want one last tweak to unmask it:

catchPreserveAsync :: Exception e => IO a -> (e -> IO a) -> IO a
catchPreserveAsync act h = mask \restore -> fix \reset -> restore $ catches act
  [ Handler h
  , Handler \(e :: SomeAsyncException) -> rethrowAsync e >> reset
  ]

Though imperfect, this is already a functioning workaround!
It doesn't prevent loss of work in the enclosing thunk, but it will at least avoid the bad update that threatens program correctness.

Conserving Work

Though catchPreserveAsync can't save the thunk in question, other thunks are fair game.
To leverage this, we just need a bit of indirection: instead of executing an IO action directly, we shove it into a thunk and evaluate it:

conserve :: IO a -> IO a
conserve act = evaluate thunk
 where
  {-# NOINLINE thunk #-}
  thunk = unsafePerformIO act

Now when catchPreserveAsync (conserve act) resets the thunk, it doesn't perform act all over again, it just continues to evaluate thunk!

At this point the rederivation is already complete; if you put these pieces together and inline their definitions you'll arive at something equivalent to what lexi wrote.

Solutions

Unlike their Control.Exception equivalents, bluefin's try, handle, and catch are expected to run under unsafePerformIO, so their implementation ought to account for its foibles.

The simplest approach is to give up on conserving work and just implement them in terms of catchPreserveAsync.

A more complete and adventurous option would be to use catchPreserveAsync . conserve.
However, conserve is essentially unsafeInterleaveIO'.
To render conserve safe it must be made much stricter, but this imposes an NFData constraint:

conserve :: NFData a => IO a -> IO a
conserve act = evaluate (force thunk)
 where
  {-# NOINLINE thunk #-}
  thunk = unsafePerformIO act

tomjaguarpaw · 2024-12-07T18:33:33Z

Very clear derivation, thanks! I wonder why we don't catch all exceptions in the second branch, i.e.

-- Not SomeAsyncException
Handler \(e :: SomeException) -> rethrowAsync e >> reset

It seems to me that is still correct (because it will do the correct thing on synchronous exceptions too) and safer (because throwTo does not require its argument to be a SomeAsyncException).

In any case, I think that's moot for Bluefin, since I'm happy with runPureEffAsyncSafe (unless someone can point out some reason the "separate thread" strategy is doomed).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bluefin is susceptible to #24189 #30

Bluefin is susceptible to #24189 #30

LSLeary commented Dec 6, 2024

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading

tomjaguarpaw commented Dec 6, 2024

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading

tomjaguarpaw commented Dec 6, 2024

tomjaguarpaw commented Dec 6, 2024

LSLeary commented Dec 7, 2024

tomjaguarpaw commented Dec 7, 2024

Bluefin is susceptible to #24189 #30

Bluefin is susceptible to #24189 #30

Comments

LSLeary commented Dec 6, 2024

tomjaguarpaw commented Dec 6, 2024 • edited Loading

tomjaguarpaw commented Dec 6, 2024

tomjaguarpaw commented Dec 6, 2024 • edited Loading

tomjaguarpaw commented Dec 6, 2024 • edited Loading

tomjaguarpaw commented Dec 6, 2024

Footnotes

tomjaguarpaw commented Dec 6, 2024

LSLeary commented Dec 7, 2024

The Issue

Working Around It

Rethrowing Asynchronously

Conserving Work

Solutions

tomjaguarpaw commented Dec 7, 2024

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading

tomjaguarpaw commented Dec 6, 2024 •

edited

Loading