-
Notifications
You must be signed in to change notification settings - Fork 9
Follow-up from review on PR #554 #572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
b3ec628
2a9cd7c
f2d5bde
7b0d85a
b22f64c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -231,6 +231,14 @@ duplicateRuns (DeRef mr) = | |
V.mapM (\r -> withRollback reg (dupRef r) releaseRef) rs | ||
|
||
-- | Take a snapshot of the state of a merging run. | ||
-- | ||
-- TODO: this is not concurrency safe! The inputs runs to the merging run could | ||
-- be released concurrently by another thread that completes the merge, while | ||
-- the snapshot is taking place. The solution is for snapshot here to duplicate | ||
-- the runs it returns _while_ holding the mergeState MVar (to exclude threads | ||
-- that might concurrently complete the merge). And then the caller of course | ||
-- must be updated to release the extra references. | ||
-- | ||
snapshot :: | ||
(PrimMonad m, MonadMVar m) | ||
=> Ref (MergingRun m h) | ||
|
@@ -267,7 +275,7 @@ work to do). | |
The implementation is similar but somewhat more complex. We also accumulate | ||
unspent credits until they reach a threshold at which point we do a batch of | ||
merging work. Unlike the prototype, the implementation tracks both credits | ||
spent credits as yet unspent. We will elaborate on why and how below. | ||
spent and credits as yet unspent. We will elaborate on why and how below. | ||
|
||
In the prototype, the credits spent equals the merge steps performed. The | ||
same holds in the real implementation, but making it so is more complicated. | ||
|
@@ -296,7 +304,8 @@ Thus we track two things: | |
* credits unspent ('UnspentCredits'): credits supplied that are not yet spent | ||
and are thus available to spend. | ||
|
||
The credits supplied is the sum of the credits spent and unspent. | ||
The credits supplied is the sum of the credits spent and unspent. We guarantee | ||
that the supplied credits never exceeds the total debt. | ||
|
||
The credits spent and the steps performed (or in the process of being | ||
performed) will typically be equal. They are not guaranteed to be equal in the | ||
|
@@ -330,7 +339,7 @@ numEntriesToTotalDebt (NumEntries n) = Credits n | |
-- Note that ideally the batch size for different LSM levels should be | ||
-- co-prime so that merge work at different levels is not synchronised. | ||
-- | ||
newtype CreditThreshold = CreditThreshold Credits | ||
newtype CreditThreshold = CreditThreshold UnspentCredits | ||
|
||
-- | The supplied credits is simply the sum of all the credits that have been | ||
-- (successfully) supplied to a merging run via 'supplyCredits'. | ||
|
@@ -559,8 +568,8 @@ atomicDepositAndSpendCredits (CreditsVar !var) !totalDebt | |
|
||
-- 2. not case 1, but enough unspent credits have accumulated to do | ||
-- a batch of merge work; | ||
| (\(UnspentCredits x)->x) unspent' >= batchThreshold | ||
= spendBatchCredits spent unspent' | ||
| unspent' >= batchThreshold | ||
= spendBatchCredits spent unspent' batchThreshold | ||
|
||
-- 3. not case 1 or 2, not enough credits to do any merge work. | ||
| otherwise | ||
|
@@ -587,14 +596,15 @@ atomicDepositAndSpendCredits (CreditsVar !var) !totalDebt | |
assert (leftover >= 0) $ | ||
(supplied', UnspentCredits unspent', leftover) | ||
|
||
spendBatchCredits (SpentCredits !spent) (UnspentCredits !unspent) = | ||
spendBatchCredits (SpentCredits !spent) (UnspentCredits !unspent) | ||
(UnspentCredits unspentBatchThreshold) = | ||
-- numBatches may be zero, in which case the result will be zero | ||
let !nBatches = unspent `div` batchThreshold | ||
!spend = nBatches * batchThreshold | ||
let !nBatches = unspent `div` unspentBatchThreshold | ||
!spend = nBatches * unspentBatchThreshold | ||
!spent' = spent + spend | ||
!unspent' = unspent - spend | ||
in assert (spend >= 0) $ | ||
assert (unspent' < batchThreshold) $ | ||
assert (unspent' < unspentBatchThreshold) $ | ||
assert (spent' + unspent' == spent + unspent) $ | ||
(spend, SpentCredits spent', UnspentCredits unspent') | ||
|
||
|
@@ -702,11 +712,10 @@ performMergeSteps :: | |
-> Credits | ||
-> m Bool | ||
performMergeSteps mergeVar creditsVar (Credits credits) = | ||
assert (credits >= 0) $ | ||
withMVar mergeVar $ \case | ||
CompletedMerge{} -> pure False | ||
OngoingMerge _rs m -> do | ||
-- We have dealt with the case of credits <= 0 above, | ||
-- so here we know credits is positive | ||
let stepsToDo = credits | ||
(stepsDone, stepResult) <- Merge.steps m stepsToDo | ||
assert (stepResult == MergeDone || stepsDone >= stepsToDo) (pure ()) | ||
|
@@ -743,8 +752,9 @@ completeMerge mergeVar mergeKnownCompletedVar = do | |
(OngoingMerge rs m) -> do | ||
-- first try to complete the merge before performing other side effects, | ||
-- in case the completion fails | ||
--TODO: Run.fromMutable claims not to be exception safe | ||
-- may need to use uninteruptible mask | ||
--TODO: Run.fromMutable (used in Merge.complete) claims not to be | ||
-- exception safe so we should probably be using the resource registry | ||
-- and test for exception safety. | ||
Comment on lines
+755
to
+757
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. We could use I'm hoping we can make |
||
r <- Merge.complete m | ||
V.forM_ rs releaseRef | ||
-- Cache the knowledge that we completed the merge | ||
|
@@ -768,16 +778,14 @@ expectCompleted (DeRef MergingRun {..}) = do | |
let totalDebt = numEntriesToTotalDebt mergeNumEntries | ||
suppliedCredits = spentCredits + unspentCredits | ||
!credits = assert (suppliedCredits == totalDebt) $ | ||
assert (unspentCredits >= 0) $ | ||
unspentCredits | ||
|
||
--TODO: what about exception safety: check if it is ok to be interrupted | ||
-- between performMergeSteps and completeMerge here, and above. | ||
weFinishedMerge <- performMergeSteps mergeState mergeCreditsVar credits | ||
-- If an async exception happens before we get to perform the | ||
-- completion, then that is fine. The next 'expectCompleted' will | ||
-- complete the merge. | ||
when weFinishedMerge $ completeMerge mergeState mergeKnownCompleted | ||
-- TODO: can we think of a check to see if we did not do too much work | ||
-- here? <-- assert (suppliedCredits == totalDebt) ought to do it! | ||
-- A related question is if we finished the merge too early, could have | ||
-- spread out the work better. | ||
withMVar mergeState $ \case | ||
CompletedMerge r -> dupRef r -- return a fresh reference to the run | ||
OngoingMerge{} -> do | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise that you are changing the credit scaling in #571, so maybe this comment does not apply anymore, but I'll post it anyway:
I don't think this assertion will hold in the presence of exceptions, even if we fix credit scaling (a priori) to precisely add up to the a priori merging debt.
Say a table needs
n
updates until the next flush. We do some updates but an exception happens, so we roll back the database internal state to a consistent state. This means undoing changes to the write buffer for example, but not undoing the credits we supplied to merging runs already. Afterwards, the table still needsn
updates until the next flush, but some or all of its merging runs already have an elevated count of supplied credits. This means thatsupplyCredits
during normal operation can also return leftover credits.Moreover, if multiple tables concurrently supply credits, we'll in general try to supply even more credits, leading to more leftovers.
If we really want to assert this, maybe we have to:
(i) ensure that
scaleCreditsForMerge
accounts for the elevated counts of supplied credits(ii) undo supplied credits
Both options sound like they would be rather complex to do in a concurrent setting