Fix bug in uploadHandle rotation with persisted ACKs causing timeout #278
+198
−5
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue
Our application utilizes the Gstreamer plugin kvssink from the Kinesis producer-c library. Our higher level application (Java) sends MKV containers into the gstreamer pipeline as available. They currently are available once every 10 seconds. We wait for the persisted ACK signal from kvssink before allowing the next fragment to be sent, as we have a use case for knowing 100% for sure that each fragment is persisted.
In aid of this, we send an EoFr frame at the end of each MKV cluster.
Recent behavior in regard to this was discovered frequently failing pipelines due to kvssink. On investigation it was discovered the root cause of this was uploadHandles timing out on stream token rotation. (Application uses IoT Credential Provider).
Root cause was discovered in PIC code on token rotation, if all Persisted ACKs have been received and processed for the fragments that were handled by uploadHandle(N), it would enter state UPLOAD_HANDLE_STATE_TERMINATED and never pass through UPLOAD_HANDLE_STATE_AWAITING_ACK (appropriately). Though on getStreamData:CleanUp that would cause PIC to not hit the code path that pokes the subsequent handle uploadHandle(N+1). uploadHandle(N+1) will be stuck in UPLOAD_HANDLE_STATE_READY until it times out due to inactivity, and uploadHandle(N+2) starts up and successfully picks up data.
https://github.com/grantralston/amazon-kinesis-video-streams-pic/blob/c98c2a256a7bb3dfc4db41ae26d45e28ac7eec56/src/client/src/Stream.c#L1535
What was changed?
getStreamData:CleanUp path adjusted condition to trigger next upload handle poke when uploadHandle reaches UPLOAD_HANDLE_STATE_TERMINATED and has not poked the subsequent handle yet. This condition is tracked as a BOOL on the __UploadHandleInfo struct.
Why was it changed?
Fix was implemented so that edge case on stream token transition does not causes intermittent failures to pipeline and disruptions to streaming operations.
How was it changed?
getStreamData:CleanUp path adjusted condition to trigger next upload handle poke when uploadHandle reaches UPLOAD_HANDLE_STATE_TERMINATED and has not poked the subsequent handle yet. This condition is tracked as a BOOL on the __UploadHandleInfo struct.
What testing was done for the changes?
./tst/kvspic_test
| TimerQueueFunctionalityTest.kickTimerQueueTest was failing prior to change.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.