Issue #22: Fix concurrent bulk generation #23

Open
wants to merge 2 commits into
base: master
anubhav-pandey1

This PR attempts to fix issue #22.

The duplicate IDs produced by concurrent bulk generation appear to come from how the sequence variable is managed within the Snowflake struct. I think the problem is the lack of synchronisation around reads and updates of shared state (here, the sequence and last_timestamp fields of the Snowflake struct) when it is accessed by multiple threads.

Why Does This Happen?
In a concurrent environment, multiple threads might call the get_unique_id method on the same Snowflake instance at the same microsecond. Since the current implementation does not include any form of locking or synchronisation, there is a race condition on the sequence field: multiple threads read the same last_timestamp, see that it hasn't changed, and then concurrently attempt to increment the sequence. Because the read, increment, and write are not performed as one atomic step, two threads can both start from the same sequence value and write back the same result, so the same sequence value ends up in multiple IDs.
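To make the interleaving concrete, here is a minimal single-threaded sketch that replays the problematic ordering deterministically. The `State` struct, the helper functions, and the 12-bit sequence layout are assumptions for illustration only and are not taken from the crate's actual code.

```rust
/// Hypothetical, simplified shared state (not the crate's real struct).
struct State {
    last_timestamp: u64,
    sequence: u64,
}

/// Step 1 of an unsynchronised ID generation: read the shared state and
/// decide which sequence value to use.
fn read_sequence(state: &State, now: u64) -> u64 {
    if now == state.last_timestamp {
        state.sequence + 1
    } else {
        0
    }
}

/// Step 2: write the chosen values back and build the ID.
/// The 12-bit sequence width is an assumption for this sketch.
fn write_back(state: &mut State, now: u64, seq: u64) -> u64 {
    state.last_timestamp = now;
    state.sequence = seq;
    (now << 12) | seq
}

/// Replays the bad interleaving: both callers finish step 1 before
/// either of them performs step 2.
fn interleaved_ids() -> (u64, u64) {
    let mut state = State { last_timestamp: 100, sequence: 5 };
    let now = 100;
    let seq_a = read_sequence(&state, now); // "thread A" reads sequence 5
    let seq_b = read_sequence(&state, now); // "thread B" reads it before A writes
    let id_a = write_back(&mut state, now, seq_a);
    let id_b = write_back(&mut state, now, seq_b);
    (id_a, id_b)
}
```

Both callers compute sequence 6 from the same starting value, so `interleaved_ids()` returns two identical IDs.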

To fix this, we need to introduce thread-safety into the ID generation process to ensure that concurrent accesses to the sequence and last_timestamp fields are correctly synchronised. In Rust, this can be achieved with the synchronisation primitives in std::sync, such as Mutex, or with the atomic types in std::sync::atomic. Since ID generation is performance-critical and must sustain high throughput, atomic operations are preferable here because they typically incur less overhead than taking a mutex lock.
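As a rough sketch of the atomic approach (not necessarily the code in this PR's commits), one option is to pack the timestamp and the sequence into a single `AtomicU64` and update both with a compare-and-swap loop. The `AtomicGen` type, its `next` method, the caller-supplied `now` argument, and the 12-bit sequence width are all assumptions made for this example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

const SEQ_BITS: u64 = 12; // assumed sequence width

/// Hypothetical generator: one atomic word holds
/// (timestamp << SEQ_BITS) | sequence.
pub struct AtomicGen {
    state: AtomicU64,
}

impl AtomicGen {
    pub fn new() -> Self {
        Self { state: AtomicU64::new(0) }
    }

    /// Returns a packed (timestamp, sequence) value that is unique
    /// across threads. `now` is a caller-supplied timestamp.
    pub fn next(&self, now: u64) -> u64 {
        let mut current = self.state.load(Ordering::Relaxed);
        loop {
            let last_ts = current >> SEQ_BITS;
            let next = if now > last_ts {
                now << SEQ_BITS // new timestamp: sequence restarts at 0
            } else {
                // Same (or older) timestamp: bump the sequence. On overflow
                // this carries into the timestamp bits; a real implementation
                // would probably wait for the next clock tick instead.
                current + 1
            };
            match self.state.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return next,
                Err(observed) => current = observed, // lost the race: retry
            }
        }
    }
}

/// Spawns `threads` workers that each draw `per_thread` values with the
/// same timestamp, and collects everything they produced.
pub fn generate_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let generator = Arc::new(AtomicGen::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let generator = Arc::clone(&generator);
            thread::spawn(move || {
                (0..per_thread).map(|_| generator.next(100)).collect::<Vec<u64>>()
            })
        })
        .collect();
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}
```

Because every successful compare-and-swap moves the packed state strictly forward, no two callers can obtain the same value, even when they all pass the same timestamp. A `Mutex` around both fields would give the same guarantee with simpler code, at the cost of blocking under contention.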

Fix concurrent bulk generation issues
@anubhav-pandey1

@tangledbytes Please take a look at this PR which might fix issue #22. Please feel free to use the code and mould it to suit your coding standards and style.
