Fix AppState when Engine connection is terminated #6722
🔍 Description
Issue References 🔗
This issue was noticed a few times: the batch state was set to ERROR, but the appState kept a non-terminal state (e.g. RUNNING) forever, even though the application (in this case a YARN application) had finished.
It seems that this happens when there is an intermittent failure during the monitoring step and the batch ends with ERROR, leaving the application metadata without an update. This can give the false impression that the application is still running. We need to set the appState to UNKNOWN in this case to avoid such errors.
Describe Your Solution 🔧
This is a simple fix: during the batch metadata update, if the batch state is ERROR and the appState is not in a terminal state, the appState is changed to UNKNOWN.
Types of changes 🔖
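The check described under Describe Your Solution can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code changed by this pull request: the class name AppStateFix, the method resolveAppState, both enums, and the choice of which states count as terminal are all hypothetical stand-ins.

```java
import java.util.EnumSet;
import java.util.Set;

class AppStateFix {
    // Hypothetical stand-ins for the batch and application states named in this PR.
    enum BatchState { PENDING, RUNNING, FINISHED, ERROR, CANCELED }
    enum AppState { PENDING, RUNNING, FINISHED, FAILED, KILLED, UNKNOWN }

    // Assumption: these are the terminal application states; the real project may differ.
    static final Set<AppState> TERMINAL =
        EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    // Returns the appState to persist during the batch metadata update.
    static AppState resolveAppState(BatchState batchState, AppState lastKnown) {
        // Batch ended with ERROR but the last known appState is non-terminal:
        // the application can no longer be trusted to be in that state.
        if (batchState == BatchState.ERROR && !TERMINAL.contains(lastKnown)) {
            return AppState.UNKNOWN;
        }
        // Otherwise keep whatever was last observed.
        return lastKnown;
    }

    public static void main(String[] args) {
        System.out.println(resolveAppState(BatchState.ERROR, AppState.RUNNING));   // UNKNOWN
        System.out.println(resolveAppState(BatchState.ERROR, AppState.FINISHED));  // FINISHED
        System.out.println(resolveAppState(BatchState.RUNNING, AppState.RUNNING)); // RUNNING
    }
}
```

Keeping the rule in a side-effect-free function like this would also make it testable without a live YARN connection, since the failure only has to be represented by the resulting ERROR batch state.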
Test Plan 🧪
Behavior Without This Pull Request ⚰️
If there is an error in a request between Kyuubi and the application (e.g. in the YARN client), the batch finishes in the ERROR state and the application keeps its last known state (e.g. RUNNING).
Behavior With This Pull Request 🎉
If there is an error in a request between Kyuubi and the application (e.g. in the YARN client), so that the batch finishes in the ERROR state while the application is still in a non-terminal state, the appState is forced to UNKNOWN.
Related Unit Tests
I tried to implement a unit test that replicates this behavior, but I did not succeed. We need to force an exception in the engine request (e.g. YarnClient.getApplication), but only after the application has reached the RUNNING state, or alternatively block the connection between Kyuubi and the engine.
Checklist 📝
Be nice. Be informative.