Node fallback for Put requests #187

Open
gavin-norman-sociomantic opened this issue Nov 13, 2018 · 3 comments

@gavin-norman-sociomantic

Resilience to node outage for Put requests is actually quite simple.

On the client side:

  • If a Put request fails due to a no_node error or a connection error, the client picks another node (using some deterministic algorithm; see the sketch after this list) and sends the record there. (Possibly repeating, if multiple nodes are out.)
  • Ditto for Get requests.
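A minimal sketch of the client-side fallback, in Python for illustration only; `put_to_node`, `NoNodeError`, and the `responsible` argument are hypothetical stand-ins for the real client API and its normal hash-range lookup:

```python
import hashlib

class NoNodeError(Exception):
    """Stands in for the client's no_node error (hypothetical name)."""

def candidate_nodes(key: bytes, nodes: list, responsible: str) -> list:
    # Responsible node (by hash range) first, then the remaining nodes in
    # a deterministic per-key order, so every client agrees on the same
    # fallback sequence for a given record.
    rest = sorted((n for n in nodes if n != responsible),
                  key=lambda n: hashlib.sha1(key + n.encode()).digest())
    return [responsible] + rest

def put_with_fallback(key, value, nodes, responsible, put_to_node):
    # Try each candidate in turn, repeating while nodes are out.
    for node in candidate_nodes(key, nodes, responsible):
        try:
            put_to_node(node, key, value)   # the real client's Put request
            return node
        except (NoNodeError, ConnectionError):
            continue                        # pick the next node
    raise RuntimeError("no node accepted the record")
```

A Get that misses on the responsible node would walk the same deterministic sequence, which is what lets it find records that were Put to a fallback node.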

On the node side:

  • The node has a full DHT client, and loads the standard .nodes file at startup, connecting to all other nodes.
  • The node keeps a separate store of "orphan records" (i.e. records that do not fall under its normal hash range and are being temporarily kept for another node).
  • Any record received via a Put request whose key is outside the node's hash range is placed in the store of orphaned records.
  • Get requests for keys outside the node's hash range are looked up in the store of orphaned records.
  • Periodically, the node iterates over the store of orphaned records and tries to send each record to the correct node, using normal Put requests. On success, a record is removed from the store of orphaned records. On failure, it stays there until the next forwarding period. (A sketch of this store follows the list.)
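A sketch of the node-side behaviour described above; `in_hash_range`, `dht_put` (a Put via the node's embedded DHT client), and the `local_put`/`local_get` callbacks are hypothetical stand-ins for the node's real internals:

```python
import threading, time

class OrphanStore:
    """Temporary store for records outside this node's hash range."""

    def __init__(self, in_hash_range, dht_put, forward_period=60.0):
        self.records = {}                   # (channel, key) -> value
        self.in_hash_range = in_hash_range  # key -> bool
        self.dht_put = dht_put              # Put via the node's DHT client
        self.forward_period = forward_period
        self.lock = threading.Lock()

    def handle_put(self, channel, key, value, local_put):
        # Out-of-range Puts are kept as orphans instead of being rejected.
        if self.in_hash_range(key):
            local_put(channel, key, value)
        else:
            with self.lock:
                self.records[(channel, key)] = value

    def handle_get(self, channel, key, local_get):
        # Out-of-range Gets fall back to the orphan store.
        if self.in_hash_range(key):
            return local_get(channel, key)
        with self.lock:
            return self.records.get((channel, key))

    def forward_loop(self):
        # Periodically try to forward each orphan to its correct node;
        # remove it on success, keep it until the next period on failure.
        while True:
            time.sleep(self.forward_period)
            with self.lock:
                pending = list(self.records.items())
            for (channel, key), value in pending:
                try:
                    self.dht_put(channel, key, value)
                except ConnectionError:
                    continue                # still unreachable; retry later
                with self.lock:
                    self.records.pop((channel, key), None)
```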
@nemanja-boric-sociomantic

Note that we don't need the periodic iteration over the orphaned records: we can use the node's client's connection notifier to sync them when the connection to the lost node is re-established.
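A sketch of that variant, reusing the `OrphanStore` above; `owns_key(node, key)` and the notifier hook are hypothetical stand-ins for the client's connection notifier machinery:

```python
def on_connection_established(store, node_address, owns_key):
    # Invoked via the client's connection notifier when `node_address`
    # comes back up; flush only the orphans belonging to that node.
    with store.lock:
        pending = [(ck, v) for ck, v in store.records.items()
                   if owns_key(node_address, ck[1])]
    for (channel, key), value in pending:
        try:
            store.dht_put(channel, key, value)
        except ConnectionError:
            return  # dropped again; the next notification will retry
        with store.lock:
            store.records.pop((channel, key), None)
```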

One thing to worry about here (and maybe it's not worth pursuing at this level) is the network partition problem: if you have two nodes A and B which can each talk only to a subset of clients, and which can't talk to each other, both will receive updates for their own records and for the records belonging to the node that's not available. Then, on network recovery, both node A and node B will have new (and possibly conflicting) state for B's records.
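A toy illustration of that split-brain scenario (plain dicts standing in for the stores): key `k` is owned by node B, but during the partition clients on A's side can only reach A:

```python
a_orphans = {}  # node A's orphan store
b_store = {}    # node B's primary store

# During the partition, both sides accept writes for B's key:
a_orphans["k"] = "value-from-A-side-client"  # Put via fallback to A
b_store["k"] = "value-from-B-side-client"    # direct Put to B

# On recovery, A forwards its orphan with a normal Put, blindly
# overwriting whatever B accepted during the partition:
b_store["k"] = a_orphans.pop("k")

assert b_store["k"] == "value-from-A-side-client"  # B-side write is lost
```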

@gavin-norman-sociomantic

The redistribution system could probably be reworked to use this too: when the hash range of the node changes, it iterates over its channels, puts out-of-range records into the orphaned records store, and lets the magic do the rest.
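A sketch of that idea, reusing the `OrphanStore` above; `channels` (channel name -> key/value mapping) is a hypothetical stand-in for the node's storage engine:

```python
def on_hash_range_changed(store, channels, new_in_hash_range):
    # Swap in the new range check, then move every record that is now
    # out of range into the orphan store; the normal forwarding logic
    # (periodic or notifier-driven) takes care of the rest.
    store.in_hash_range = new_in_hash_range
    for channel, records in channels.items():
        for key in [k for k in records if not new_in_hash_range(k)]:
            with store.lock:
                store.records[(channel, key)] = records.pop(key)
```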

@nemanja-boric-sociomantic

Ah, yes, that's a nice consequence.
