Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stress test counts wrong when killing workers #642

Closed
keith-turner opened this issue Apr 12, 2016 · 3 comments
Closed

Stress test counts wrong when killing workers #642

keith-turner opened this issue Apr 12, 2016 · 3 comments

Comments

@keith-turner
Copy link
Contributor

I have been running the stress test and stopping and starting Fluo while the test was running. This causes YARN to kill alll of the workers and the Oracle. After the test completes, sometimes the counts are wrong. However doing a diagnostic scan of the entire table to diff levels of the tree, will cause the counts to become correct. This full scan causes some partially completed transactions to roll forward which causes notifications to be written. These notifications trigger observers which complete the computation.

The problem is that notifications are written in the 2nd phase of commit. I think the solution to this problem is to write notifications in the 1st phase of commit. This way if a transaction is partially successful with the 2nd phase observers will still trigger and roll forward.

@keith-turner keith-turner self-assigned this Apr 12, 2016
@keith-turner
Copy link
Contributor Author

Seems I ran into this problem before #456 and fixed it incorrectly. The reason I did not properly fix it then is I was making the following incorrect assumption :

  • If a notification causes an observer to create other notifications, then in the case of failures rerunning the observer with the same notification will always recreate the same notifications.

However this is not true if the observer runs again and finds there is nothing to do. Mainly in the case where it failed previously, but did complete part of the 2nd phase of commit. In this case it may not recreate the notifications it created last time and those notifications will never be created until something tries to read the locked data (which may never happen).

Since the solution in #456 was based on an incorrect premise I may be able to delete the notifications earlier as was done before the "fix" for #456 in addition to setting notifications in the lock phase.

@keith-turner
Copy link
Contributor Author

Thinking about this, if setting notifications during 1st phase of commit then would use start timestamp. Need to be careful that notifications are not deleted when they should not be. Notifications are currently deleted using the start timestamp of a transactions. When writing notification using startTs (instead of commitTs as is currently done) need to do something to avoid a situation like the following where notification created by TX3 is lost.

TX1 snotify C1 with startTs 10 commitTs 12
TX3 starts with startTs 13
TX2 triggered by C1 with startTs 14
TX2 ACK C1 with startTs 14 (deletes notifcation using startTs 14)
TX3 snotify C1 with startTs 13 commitTs 17

Another issue is that even if the notification was not deleted in the situation above, the current code would consider it already acknowledged because this based on ACK timestamp vs Notification timestamp. So TX2 would write an ACK with timestamp 14 which would supersede the notification from TX3.

@keith-turner
Copy link
Contributor Author

Was thinking of writing notifications while writing locks, however there is a problem with this. If a transaction writes a notification w/o writing all of its lock, then an observer triggered by the transaction is not guaranteed to see everything the transaction wrote. Can only write a notification after all locks are written. Beginning to think that strong notifications should be written in the same way that weak notifications are currently written. Weak notifications are written after all locks are written, but before finishing the commit on the primary column. In the case of failures the transactions will be rolled back and all notifications rewitten. For strong notifications, need to ensure observers fail when triggered by a notification written for a transaction that fails.

keith-turner added a commit to keith-turner/fluo that referenced this issue Apr 13, 2016
mikewalch added a commit that referenced this issue Apr 14, 2016
fixes #642 write strong notifcations at same place as weak
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant