Improve error handling for transactional writer commit #376

Open
crazyzhou opened this issue May 22, 2020 · 0 comments


Problem description
We implement end-to-end exactly-once semantics with a two-phase commit built on Flink checkpoints and the Pravega transactional writer; see #5 for details. In the second phase, we call transaction.commit to finalize the checkpoint, but errors from that call are not handled, so data can be lost when the commit does not go through.

The commit call throws a TxnFailedException if something goes wrong. This can happen in either of two situations:

  1. The server accepted the request, but a later problem caused the failure.
  2. The server failed to accept the request at all.

The second case can be identified from the status of the transaction. In that case, the client should retry the commit to avoid data loss.

With pravega/pravega#4822 fixed, the Flink connector can handle such cases more reliably.
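
As a rough illustration of the retry described above, here is a minimal Java sketch. It is not the connector's code. `Txn`, `Status`, and `TxnFailedException` are local stand-ins that mirror the `commit()` and `checkStatus()` methods of Pravega's `Transaction`, so the snippet compiles without the client library. `commitWithRetry` and `FlakyTxn` are hypothetical names. The sketch also assumes that an unaccepted request leaves the transaction in the `OPEN` state, which the issue does not confirm.

```java
class CommitRetrySketch {

    // Stand-in mirroring the values of Pravega's Transaction.Status.
    enum Status { OPEN, COMMITTING, COMMITTED, ABORTING, ABORTED }

    // Stand-in for Pravega's checked TxnFailedException.
    static class TxnFailedException extends Exception {
        TxnFailedException(String message) { super(message); }
    }

    // Stand-in for the two Transaction methods this sketch relies on.
    interface Txn {
        void commit() throws TxnFailedException;
        Status checkStatus();
    }

    /**
     * Hypothetical helper: tries to commit up to maxAttempts times and
     * returns the number of attempts used. Assumption: a transaction that
     * is still OPEN after a failed commit was never accepted by the server.
     */
    static int commitWithRetry(Txn txn, int maxAttempts) throws TxnFailedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TxnFailedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                txn.commit();
                return attempt;
            } catch (TxnFailedException e) {
                last = e;
                Status status = txn.checkStatus();
                if (status == Status.COMMITTING || status == Status.COMMITTED) {
                    // The server did take the request; nothing left to retry.
                    return attempt;
                }
                if (status != Status.OPEN) {
                    // ABORTING or ABORTED: a retry cannot help, so surface the error.
                    throw e;
                }
                // Still OPEN: assume the request was not accepted and try again.
            }
        }
        throw last;
    }

    // Hypothetical fake that rejects the first few commit requests.
    static class FlakyTxn implements Txn {
        private int rejectionsLeft;
        private Status status = Status.OPEN;

        FlakyTxn(int rejections) {
            this.rejectionsLeft = rejections;
        }

        @Override
        public void commit() throws TxnFailedException {
            if (rejectionsLeft > 0) {
                rejectionsLeft--;
                throw new TxnFailedException("commit request not accepted");
            }
            status = Status.COMMITTED;
        }

        @Override
        public Status checkStatus() {
            return status;
        }
    }
}
```

For example, `CommitRetrySketch.commitWithRetry(new CommitRetrySketch.FlakyTxn(2), 5)` commits on the third attempt. A real implementation would probably add a backoff between attempts and log each failed attempt with the transaction status.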

Problem location
FlinkPravegaWriter

Suggestions for an improvement
Some debug logs can be added for better monitoring.
