Flink exactly-once Reader and Writer #4

StephanEwen · 2017-05-02T21:53:57Z

Moved code for the exactly-once reader and writer.

Fixes #3

StephanEwen · 2017-05-03T11:17:44Z

Addressed the comments that were left on the original pull request on pravega/pravega and made checkstyle pass.

skrishnappa · 2017-05-04T09:31:44Z

src/main/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaReader.java

+    private volatile boolean running = true;
+
+    // checkpoint trigger callback, invoked when a checkpoint event is received.
+    // no need to be volatile, the source is driven by but one thread


text - driven by only one thread

Fixing that. I thought that was valid English, though...

skrishnappa · 2017-05-04T09:51:40Z

src/main/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaWriter.java

+            try {
+                Exceptions.handleInterrupted( txn::abort );
+            } catch (Exception e) {
+                suppressed = e;


add a log here

There is actually a bug in my code that the suppressed exception is not always re-thrown. Fixing that.

Do you want me to log it anyway? That would double-log it, because suppressed exceptions are printed in the stack trace of the parent exception anyway.

I thought not rethrowing was on purpose, which is why I wanted it logged. No need to log now, since the exception is no longer lost.
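
For readers following this thread, here is a minimal sketch of the pattern being discussed: a cleanup failure is attached to the primary exception with `Throwable.addSuppressed` and the primary exception is always rethrown, so nothing is lost and no separate log call is needed. The class and the `commit`/`abort` methods are hypothetical stand-ins, not the connector's actual code.

```java
import java.io.IOException;

public class SuppressedExceptionSketch {

    // Hypothetical stand-in for the primary operation that fails.
    public static void commit() throws Exception {
        throw new IllegalStateException("commit failed");
    }

    // Hypothetical stand-in for cleanup (such as aborting a transaction) that also fails.
    public static void abort() throws Exception {
        throw new IOException("abort failed");
    }

    // Runs the primary operation; on failure, attempts cleanup and always
    // rethrows the primary exception, with any cleanup failure attached to it.
    public static void commitOrAbort() throws Exception {
        try {
            commit();
        } catch (Exception primary) {
            try {
                abort();
            } catch (Exception cleanupFailure) {
                // Attach instead of logging: the JVM prints suppressed
                // exceptions as part of the primary exception's stack trace.
                primary.addSuppressed(cleanupFailure);
            }
            throw primary;
        }
    }

    public static void main(String[] args) {
        try {
            commitOrAbort();
        } catch (Exception e) {
            // The printed trace contains a "Suppressed:" entry for the abort failure.
            e.printStackTrace();
        }
    }
}
```

Because the rethrow sits outside the inner `try`, the primary exception propagates whether or not the cleanup itself fails.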

skrishnappa · 2017-05-04T09:56:04Z

src/main/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaWriter.java

+        final Transaction<T> txn = this.currentTxn;
+        Preconditions.checkState(txn != null, "bug: no transaction object when performing state snapshot");
+
+        if (log.isDebugEnabled()) {


we could remove this check

Sure. I often add these guards when a log statement takes more than two arguments, because the arguments then get packaged into an array. The overhead is admittedly small, and readability is probably more important here.
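
The trade-off mentioned here can be illustrated with a small sketch. `MiniLogger` below is a hypothetical logger written only for this example (it is not a real logging library class); it has a varargs `debug` method, so every call packages its arguments into an `Object[]` before the logger can decide whether debug output is enabled. The guard skips the call, and with it the array, when debug is off. The parameter names are also illustrative.

```java
import java.util.Arrays;

public class LogGuardSketch {

    // Hypothetical minimal logger with a varargs debug method.
    public static final class MiniLogger {
        private final boolean debugEnabled;
        public int debugCalls = 0; // counts calls that reached debug(), enabled or not

        public MiniLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        public boolean isDebugEnabled() {
            return debugEnabled;
        }

        public void debug(String format, Object... args) {
            // By the time we get here, the caller has already built the args array.
            debugCalls++;
            if (debugEnabled) {
                System.out.println(format + " " + Arrays.toString(args));
            }
        }
    }

    // Unguarded: the argument array is built even when debug is disabled.
    public static void snapshotUnguarded(MiniLogger log, long checkpointId, String txnId, int subtask) {
        log.debug("snapshot: checkpoint={} txn={} subtask={}", checkpointId, txnId, subtask);
    }

    // Guarded: the call (and the array) is skipped when debug is disabled.
    public static void snapshotGuarded(MiniLogger log, long checkpointId, String txnId, int subtask) {
        if (log.isDebugEnabled()) {
            log.debug("snapshot: checkpoint={} txn={} subtask={}", checkpointId, txnId, subtask);
        }
    }

    public static void main(String[] args) {
        MiniLogger off = new MiniLogger(false);
        snapshotUnguarded(off, 7L, "txn-1", 0);
        snapshotGuarded(off, 7L, "txn-1", 0);
        System.out.println("debug() calls with debug disabled: " + off.debugCalls);
    }
}
```

As the comment above says, the saving is small, so dropping the guard in favor of readability is a reasonable choice.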

skrishnappa · 2017-05-04T09:58:34Z

src/main/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaWriter.java

+        // ==> There should never be a case where we have no pending transaction here
+        //
+
+        if (txnsPendingCommit.isEmpty()) {


Preconditions.checkState

yes, that's nicer
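
A rough sketch of what the suggested change could look like, assuming the intent of the empty check is to fail fast on an invariant violation (the body of the original `if` is not shown in this excerpt). To keep the example self-contained, `checkState` is a local stand-in that throws `IllegalStateException` when the condition is false; the queue element type and the method name `nextPendingTransaction` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CheckStateSketch {

    // Local stand-in for a precondition helper: throws IllegalStateException
    // with the given message when the condition does not hold.
    public static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    // Hypothetical method: returns the oldest pending transaction id,
    // treating an empty queue as a bug rather than a normal branch.
    public static String nextPendingTransaction(Queue<String> txnsPendingCommit) {
        checkState(!txnsPendingCommit.isEmpty(), "bug: no pending transaction to commit");
        return txnsPendingCommit.peek();
    }

    public static void main(String[] args) {
        Queue<String> pending = new ArrayDeque<>();
        pending.add("txn-1");
        System.out.println(nextPendingTransaction(pending));
    }
}
```

Compared with an explicit `if` block, the one-line check states the invariant and its failure message in the same place.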

skrishnappa · 2017-05-04T09:59:30Z

src/main/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaWriter.java

+
+            // the big assumption is that this now actually works and that the transaction has not timed out, yet
+
+            // TODO: currently, if this fails, there is actually data loss


Please create an issue for this and link it here; it will be easier to track. This is the approach we take in Pravega for TODOs.

Done, see #5

skrishnappa · 2017-05-04T10:22:40Z

src/main/java/io/pravega/connectors/flink/ReaderCheckpointHook.java

+        // checkpoint can be null when restoring from a savepoint that
+        // did not include any state for that particular reader name
+        if (checkpoint != null) {
+            this.readerGroup.resetReadersToCheckpoint(checkpoint);


For the case where the checkpoint is null (if a failure happened even before the first checkpoint succeeded), the reader group still needs to be reset to its initial position, right? The readers might have already read some data that will be discarded, so they have to reread it.

True, it is one of the followup questions that I also raised.

Unfortunately, we cannot do the call in that particular place, because if there was never a complete checkpoint, no checkpoint will be restored and the hooks will not be called.

Please file an issue for this and we can address it as a separate task.

skrishnappa · 2017-05-04T10:24:35Z

src/test/java/io/pravega/connectors/flink/FlinkExactlyOncePravegaReaderTest.java

+    // Setup utility.
+    private static final SetupUtils SETUP_UTILS = new SetupUtils();
+
+    //Ensure each test completes within 30 seconds.


600 seconds

good catch, thanks!

skrishnappa · 2017-05-04T10:26:13Z

src/main/java/io/pravega/connectors/flink/CheckpointSerializer.java

+
+        return (Checkpoint) SerializationUtils.deserialize(bytes);
+    }
+}


Add a trailing newline for consistency, in the other files too.

StephanEwen · 2017-05-04T16:26:53Z

Addressed all comments except one, for which I would need some input:
making sure readers are reset to the beginning when recovering before a checkpoint was taken.
It looks like there is no call on a reader group to do that.

Is the only way to do this to create a new reader group each time a recovery happens? Or should we call 'readerOffline' with a zero position on each reader (currently the checkpoint coordinator has no view on how many readers exist - all pravega structures are sort of opaque to it).

StephanEwen changed the title ~~Flink reader writer~~ Flink exactly-once Reader and Writer May 2, 2017

skrishnappa reviewed May 4, 2017

skrishnappa assigned StephanEwen May 4, 2017

StephanEwen added 2 commits May 4, 2017 18:21

Add a Flink exactly-once reader for Pravega

d83dba5

Add a Flink exactly-once writer for Pravega

29fcc47

skrishnappa merged commit 9418482 into pravega:master May 4, 2017

crazyzhou pushed a commit to crazyzhou/flink-connectors that referenced this pull request Oct 20, 2020

Merge pull request pravega#4 from pravega/master

b790f9c
