Commit e934c43
[SPARK-54585][SS] Fix State Store rollback when thread is in interrupted state
### What changes were proposed in this pull request?
1. Modifies `ChecksumCancellableFSDataOutputStream.cancel()` to cancel both the main stream and checksum stream synchronously instead of using Futures with awaitResult.
2. Moves `changelogWriter.foreach(_.abort())` and `changelogWriter = None` into a `try`/`finally` block within `RocksDB.rollback()`.
### Why are the changes needed?
For fix 1:
When cancel() is called while the thread is in an interrupted state (e.g., during task cancellation), the previous implementation would fail. The code submitted Futures to cancel each stream, then called awaitResult() to wait for completion. However, awaitResult() checks the thread's interrupt flag and throws InterruptedException immediately if the thread is interrupted.
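The failure mode for fix 1 can be reproduced outside Spark. The following is a minimal sketch, not the code from this commit: it uses the standard library's `Await.result` in place of Spark's `awaitResult`, and `FakeStream`, `awaitFailsWhenInterrupted`, and `cancelBoth` are hypothetical names introduced for illustration. The `try`/`finally` ordering in `cancelBoth` is also an assumption about how a synchronous cancel might be written.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object InterruptedCancelSketch {
  // Hypothetical stand-in for a cancellable output stream.
  final class FakeStream {
    @volatile var cancelled: Boolean = false
    def cancel(): Unit = { cancelled = true }
  }

  // Waits on a future that never completes while the calling thread is
  // interrupted. Returns true if the wait fails with InterruptedException.
  def awaitFailsWhenInterrupted(): Boolean = {
    val pending = Promise[Unit]().future
    Thread.currentThread().interrupt()
    try {
      Await.result(pending, 5.seconds)
      false
    } catch {
      case _: InterruptedException => true
    } finally {
      // Clear the interrupt flag so the caller is unaffected.
      Thread.interrupted()
    }
  }

  // Synchronous alternative: no waiting is involved, so the interrupt flag
  // is never consulted. The finally clause (an assumption of this sketch)
  // still cancels the checksum stream if the main stream's cancel throws.
  def cancelBoth(main: FakeStream, checksum: FakeStream): Unit = {
    try {
      main.cancel()
    } finally {
      checksum.cancel()
    }
  }
}
```

Calling `cancelBoth` on an interrupted thread cancels both fake streams and leaves the interrupt flag set for the caller to handle, whereas the await-based path aborts before the pending work is observed.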
For fix 2:
Consider the case where `abort()` is called on `RocksDBStateStoreProvider`. This calls `rollback()` on the `RocksDB` instance, which in turn calls `changelogWriter.foreach(_.abort())` and then sets `changelogWriter = None`.
However, if `changelogWriter.abort()` throws an exception, the `finally` block inside `abort()` still sets `backingFileStream` and `compressedStream` to `null`. The exception then propagates out of `rollback()`, and we never reach the line that sets `changelogWriter = None`.
This leaves the RocksDB instance in an inconsistent state:
- changelogWriter = Some(changelogWriterWeAttemptedToAbort)
- changelogWriterWeAttemptedToAbort.backingFileStream = null
- changelogWriterWeAttemptedToAbort.compressedStream = null
Now consider calling `RocksDB.load()` again. This calls `replayChangelog()`, which calls `put()`, which calls `changelogWriter.put()`. At this point, the assertion `assert(compressedStream != null)` fails, causing an exception while loading the StateStore.
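The inconsistent state described for fix 2 can be sketched with small stand-in classes. This is an illustration under stated assumptions, not the Spark source: `FakeChangelogWriter`, `FakeDb`, `rollbackUnsafe`, and `rollbackSafe` are hypothetical names, and the simulated `IOException` stands in for whatever `abort()` might throw.

```scala
import java.io.IOException

object RollbackSketch {
  // Hypothetical stand-in for the changelog writer. abort() always nulls its
  // stream in a finally block, even when it fails.
  final class FakeChangelogWriter(failOnAbort: Boolean) {
    var compressedStream: AnyRef = new Object

    def abort(): Unit = {
      try {
        if (failOnAbort) throw new IOException("simulated abort failure")
      } finally {
        compressedStream = null
      }
    }
  }

  // Hypothetical stand-in for the RocksDB wrapper.
  final class FakeDb {
    var changelogWriter: Option[FakeChangelogWriter] = None

    // Old shape: the reset is skipped when abort() throws.
    def rollbackUnsafe(): Unit = {
      changelogWriter.foreach(_.abort())
      changelogWriter = None
    }

    // New shape: the reset runs whether or not abort() throws.
    def rollbackSafe(): Unit = {
      try {
        changelogWriter.foreach(_.abort())
      } finally {
        changelogWriter = None
      }
    }
  }
}
```

With `rollbackUnsafe`, a failing `abort()` leaves `changelogWriter` defined while its `compressedStream` is already `null`, which is the state that later trips the assertion. With `rollbackSafe`, the exception still propagates, but `changelogWriter` is `None` afterwards.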
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added test `"SPARK-54585: Interrupted task calling rollback does not throw an exception"`, which simulates a thread that is in the interrupted state when it begins a rollback.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53313 from dylanwong250/SPARK-54585.
Authored-by: Dylan Wong <[email protected]>
Signed-off-by: Anish Shrigondekar <[email protected]>
1 parent 7df7dad commit e934c43
File tree
3 files changed, +84 -19 lines changed
- sql/core/src
  - main/scala/org/apache/spark/sql/execution/streaming
    - checkpointing
    - state
  - test/scala/org/apache/spark/sql/execution/streaming/state
Lines changed: 7 additions & 8 deletions
Lines changed: 17 additions & 11 deletions
Lines changed: 60 additions & 0 deletions