-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent bulk generation of Snowflake IDs from same machine is failing #22
Comments
It seems that the sequence ID is not being updated after creating the first snowflake. This would make it a critical issue as the library would essentially stop performing well in highly-concurrent horizontally scaled environments. |
I've raised a PR that seems to fix the issue. Please feel free to use the code and mould it to suit your coding standards and style. |
Hi @anubhav-pandey1, I am not sure if I understand the issue here. One In the above example, it seems that 10 instances of Am I missing something here? |
I also have a test in the CI which tests exactly this to make sure we don't generate duplicate IDs: https://github.com/tangledbytes/nodejs-snowflake/blob/master/tests/snowflake_core.rs#L144 |
If a single machine decides to bulk generate 10 or 1000 objects (for instance for a bulk insert SQL query for payments), they should have different identifiers - this is a fundamental requirement for any unique identification system. The Snowflake ID system is supposed to support this by having an atomically incremented (thread-safe) sequence number so that 1000s of Snowflake IDs can be concurrently bulk generated by the same machine. If all the IDs generated by the same machine within the same millisecond are the same, then I think it is not a unique identifier (UID) and also not Snowflake. |
Also, your test seems to be running on a sub-second granularity in a single-threaded environment. This could be why it missed this issue. High-throughput systems with high-concurrency would work on a sub-millisecond granularity ie. they can generate thousands of objects within the same millisecond. Such systems are usually multi-threaded too and might run into race conditions with the current version of Snowflake ID generation. I would recommend using my shared Node.js code to attempt bulk generating many IDs concurrently and the issue will be replicated. I've also raised a PR (#23) to attempt fixing this issue. |
Hey @tangledbytes, a gentle request to review the PR #23 and merge the fix if it is acceptable. |
Hi @anubhav-pandey1, how do you plan on adding multi threads to nodejs? If worker threads or other things are supposed to be used then each thread should gets its own instance. There is absolutely no reason to have this thread-safe because there is no reason to share states across different threads. Just let each thread have their own instance (different machine ids though). As for your code, from my perception, it is not the right way to use, the correct code would look like: const CURRENT_MACHINE_ID = 1; // Assumed
const config = {
instance_id: CURRENT_MACHINE_ID,
custom_epoch: new Date("2001-01-01").getTime(),
};
const uid = new Snowflake(config);
const generateSnowflakeId = () => {
const snowflakeId = uid.getUniqueID();
return snowflakeId;
};
const COUNT = 10;
const snowflakeIds = [];
for (let i = 0; i < COUNT; i++) {
const snowflakeId = generateSnowflakeId();
snowflakeIds.push(objectId);
} Also, I would like you to clarify your use case here. |
Yeah I ran into this exact problem when designing our system. I would run into multiple ids of the same millisecond when creating records on the same database instance using AWS Lambda. This is why my database contains a So what @tangledbytes suggested is correct. If you are using like nodejs cluster or something of that sort. Each "worker thread" should have it's own unique id that is generated per configuration instance_id. Based on your setup, you would probably want to do something like a snowflake cache of a single instance per instance_id. An example of what I have done is this. Using Sequelize as the database ORM.
Then I have something like
In my main DbSchema class object. I call this right after I initialize the database and load the server's configuration. In this case I then bind that instance. Example of an |
If we try to bulk generate multiple snowflake IDs concurrently from the same machine, all the IDs are same:
All the IDs are the same. Am I doing something incorrectly?
Node.js version: 16.17 and 18.17
Package version: 2.0.1
The text was updated successfully, but these errors were encountered: