Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ringbuffer hardware race condition bug. #3240

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

liarokapisv
Copy link
Contributor

When attempting to get a snapshot of the completion count and position, the current ringbuffer implementation uses a critical section. This does indeed prevent the completion_count from updating but does not prevent the position from updating since that is fetched directly from the DMA register.

When the code is called right at the end of the DMA cycle, the position that is read may be that of the next cycle, while the completion_count is still that of the previous cycle. As a result it looks to the code as if we went backwards and an overrun error is emitted.

In order to fix this we remove the critical section and get two readings, comparing the two in order to deduce that a wrap-around occured.

@liarokapisv liarokapisv force-pushed the ringbuffer-race-condition-fix branch from ea675b3 to f869d8c Compare August 8, 2024 13:54
@liarokapisv
Copy link
Contributor Author

@Dirbaio The commit breaks the tests due to the mocker requiring more expectations. Any thoughts on adapting the tests ?

When attempting to get a snapshot of the completion count and
position, the current ringbuffer implementation uses a critical section.
This does indeed prevent the completion_count from updating but does not
prevent the position from updating since that is fetched directly
from the DMA register.

When the code is called right at the end of the DMA cycle,
the position that is read may be that of the next cycle, while the
completion_count is still that of the previous cycle. As a result it
looks to the code as if we went backwards and an overrun error is
emitted.

In order to fix this we remove the critical section and get two
readings, comparing the two in order to deduce that a wrap-around
occured.
@liarokapisv liarokapisv force-pushed the ringbuffer-race-condition-fix branch from f869d8c to dd837dc Compare August 9, 2024 13:37
@Dirbaio
Copy link
Member

Dirbaio commented Aug 13, 2024

LGTM to the fix.

I'm not familiar with the tests, I'll be OK with whatever fix you think is best.

@liarokapisv
Copy link
Contributor Author

LGTM to the fix.

I'm not familiar with the tests, I'll be OK with whatever fix you think is best.

I think the tests would benefit from the introduction of model based testing. Would you mind adding proptest-state-machine as test dependency?

@Dirbaio
Copy link
Member

Dirbaio commented Aug 20, 2024

that'd require rewriting all tests, which surely is not needed for this PR. Please keep PRs focused, if you want to fix the race condition make a PR that does just that. Then afterwards we can discuss what to do with the tests. Otherwise PRs drag on forever, reviewing big refactors takes much more time than small fixes.

@liarokapisv
Copy link
Contributor Author

Yea definitely would need a separate PR for that, I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants