Issue #22: Fix concurrent bulk generation #23
This PR attempts to fix issue #22.
The root cause of concurrent bulk generation producing duplicate Snowflake IDs appears to be how the sequence field is managed inside the Snowflake struct. I think the problem is that nothing synchronises reads and updates of the shared state (here, the sequence and last_timestamp fields of the Snowflake struct) when several threads access it at once.
Why Does This Happen?
In a concurrent environment, multiple threads may call the get_unique_id method on the same Snowflake instance within the same microsecond. Because the current implementation has no locking or other synchronisation, the sequence field is subject to a race condition: several threads read the same last_timestamp, see that it has not changed, and each increment the sequence value they read. This read-increment-write is not atomic, so two threads can read the same sequence value before either writes its update back. Both then produce the same sequence value, and therefore the same ID.
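The lost update can be reproduced without real threads. The sketch below is a deterministic, single-threaded replay of the interleaving just described. It uses a hypothetical SharedState struct holding only the two fields in question and is not the crate's actual code.

```rust
// Hypothetical shared state holding the two fields discussed above.
struct SharedState {
    last_timestamp: u64,
    sequence: u64,
}

// Replays, step by step, the unsynchronised interleaving described above:
// both "threads" read the shared state before either writes back.
fn replay_interleaving() -> (u64, u64, u64) {
    let mut shared = SharedState { last_timestamp: 1_000, sequence: 5 };
    let now = 1_000; // both threads observe the same timestamp

    // 1. Thread A and thread B each read the current state.
    let (ts_a, seq_seen_by_a) = (shared.last_timestamp, shared.sequence);
    let (ts_b, seq_seen_by_b) = (shared.last_timestamp, shared.sequence);

    // 2. Both see an unchanged timestamp, so both take "the next" sequence.
    let seq_a = if ts_a == now { seq_seen_by_a + 1 } else { 0 };
    let seq_b = if ts_b == now { seq_seen_by_b + 1 } else { 0 };

    // 3. Both write back; B's write silently overwrites A's (a lost update).
    shared.sequence = seq_a;
    shared.sequence = seq_b;

    (seq_a, seq_b, shared.sequence)
}

fn main() {
    let (a, b, stored) = replay_interleaving();
    // Prints: thread A used 6, thread B used 6, stored sequence is 6
    println!("thread A used {a}, thread B used {b}, stored sequence is {stored}");
}
```

Both threads hand out sequence 6 even though two IDs were requested, which is exactly the duplicate described in the issue.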
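To fix this, the ID generation process needs to be thread-safe, so that concurrent accesses to the sequence and last_timestamp fields are correctly synchronised. In Rust, this can be done with the synchronisation primitives in the std::sync module: either a Mutex or the atomic types in std::sync::atomic. Because ID generation must sustain high throughput, atomic operations are preferable, as they typically incur less overhead than acquiring a mutex lock. Since the two fields must change together, they also need to be updated as a single unit (for example, packed into one atomic word and updated with a compare-and-swap); two independent atomics would still let threads observe a mismatched timestamp/sequence pair.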
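A minimal sketch of the atomic approach follows. It assumes a hypothetical bit layout (a millisecond timestamp counted from an illustrative 2020-01-01 epoch, a 10-bit worker id and a 12-bit sequence) and a hypothetical AtomicSnowflake type rather than the crate's actual struct. last_timestamp and sequence are packed into a single AtomicU64, so one compare_exchange_weak reads and replaces both at once. A thread that loses the race retries from the value the winner stored, so no two successful calls can claim the same (timestamp, sequence) pair.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical layout (an assumption, not the crate's actual format):
// timestamp (ms since EPOCH_MS) | 10-bit worker id | 12-bit sequence.
const EPOCH_MS: u64 = 1_577_836_800_000; // 2020-01-01T00:00:00Z, illustrative
const WORKER_BITS: u64 = 10;
const SEQUENCE_BITS: u64 = 12;
const SEQUENCE_MASK: u64 = (1 << SEQUENCE_BITS) - 1;

pub struct AtomicSnowflake {
    worker_id: u64,
    // last_timestamp and sequence packed into one word, so a single
    // compare-exchange reads and updates both fields together.
    state: AtomicU64,
}

impl AtomicSnowflake {
    pub fn new(worker_id: u64) -> Self {
        assert!(worker_id < (1 << WORKER_BITS), "worker id out of range");
        Self { worker_id, state: AtomicU64::new(0) }
    }

    fn now() -> u64 {
        let ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before 1970")
            .as_millis() as u64;
        ms.saturating_sub(EPOCH_MS)
    }

    // Takes &self, so one instance can be shared between threads via Arc.
    pub fn get_unique_id(&self) -> u64 {
        let mut current = self.state.load(Ordering::Relaxed);
        loop {
            let last_ts = current >> SEQUENCE_BITS;
            let last_seq = current & SEQUENCE_MASK;
            let now = Self::now();

            let (ts, seq) = if now > last_ts {
                (now, 0) // new tick: restart the sequence
            } else if last_seq < SEQUENCE_MASK {
                (last_ts, last_seq + 1) // same tick (or clock went back): next sequence
            } else {
                // Every sequence number for this tick is used: wait for the clock.
                std::hint::spin_loop();
                current = self.state.load(Ordering::Relaxed);
                continue;
            };

            let next = (ts << SEQUENCE_BITS) | seq;
            match self.state.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Relaxed) {
                // Only the thread whose exchange succeeds owns (ts, seq).
                Ok(_) => {
                    return (ts << (WORKER_BITS + SEQUENCE_BITS))
                        | (self.worker_id << SEQUENCE_BITS)
                        | seq;
                }
                // Another thread updated the state first: retry from its value.
                Err(actual) => current = actual,
            }
        }
    }
}

// Hypothetical helper: generates IDs from several threads sharing one generator.
pub fn generate_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let generator = Arc::new(AtomicSnowflake::new(1));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let g = Arc::clone(&generator);
            thread::spawn(move || (0..per_thread).map(|_| g.get_unique_id()).collect::<Vec<u64>>())
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let ids = generate_concurrently(8, 10_000);
    let unique: HashSet<u64> = ids.iter().copied().collect();
    println!("generated {} ids, {} unique", ids.len(), unique.len());
}
```

A Mutex guarding the same two fields would be a simpler and equally correct alternative. The compare-and-swap loop avoids blocking, but it has to spin whenever the 4,096 sequence numbers available in one millisecond (under this assumed layout) are exhausted.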