Resilience to node outage for Put requests is actually quite simple.
On the client side:
If a Put request fails due to a no_node error or a connection error, the client picks another node (using some deterministic algorithm) and sends the record there. (Possibly repeating, if multiple nodes are out.)
Ditto for Get requests.
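The client-side fallback could be sketched roughly as follows. This is only an illustration, not the actual client code: `NoNodeError`, `candidate_nodes`, `put_with_failover` and the `send_put` callback are hypothetical names, and the "deterministic algorithm" is assumed here to be "start at the node picked by hashing the key, then walk the sorted node list".

```python
import hashlib


class NoNodeError(Exception):
    """Hypothetical stand-in for the client's no_node error."""


def candidate_nodes(key, nodes):
    """Return all nodes in a deterministic order for this key.

    Assumption: the first candidate is chosen by hashing the key modulo the
    node count, and the fallbacks are the following nodes in sorted order.
    """
    ordered = sorted(nodes)
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(ordered)
    return ordered[start:] + ordered[:start]


def put_with_failover(key, value, nodes, send_put):
    """Try each candidate in turn; return the node that accepted the record."""
    for node in candidate_nodes(key, nodes):
        try:
            send_put(node, key, value)  # hypothetical transport callback
            return node
        except (NoNodeError, ConnectionError):
            continue  # this node is out, try the next candidate
    raise ConnectionError("no node accepted the record")
```

Because the order depends only on the key and the node list, a Get for the same key walks the same candidates and can find a record that was parked on a fallback node.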
On the node side:
The node has a full DHT client, and loads the standard .nodes file at startup, connecting to all other nodes.
The node keeps a separate store of "orphan records" (i.e. records that do not fall under its normal hash range and are being temporarily kept for another node).
Any Put record that falls outside the node's hash range is placed in the store of orphaned records.
Get requests that are outside the node's hash range are looked up in the store of orphaned records.
Periodically, the node iterates over the store of orphaned records and tries to send them to the correct node, using normal Put requests. On success, a record is removed from the store of orphaned records. On failure, it stays there until the next forwarding period.
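A minimal sketch of the node-side behaviour described above, assuming an inclusive integer hash range and a hypothetical `forward_put` callback that raises `ConnectionError` when the correct node is still unreachable. The class and method names are made up for illustration.

```python
class Node:
    def __init__(self, hash_range, forward_put):
        self.lo, self.hi = hash_range    # assumed inclusive range of key hashes
        self.forward_put = forward_put   # hypothetical: sends a normal Put to the owner
        self.records = {}                # records this node is responsible for
        self.orphans = {}                # records kept temporarily for another node

    def in_range(self, key_hash):
        return self.lo <= key_hash <= self.hi

    def put(self, key_hash, value):
        store = self.records if self.in_range(key_hash) else self.orphans
        store[key_hash] = value

    def get(self, key_hash):
        store = self.records if self.in_range(key_hash) else self.orphans
        return store.get(key_hash)

    def forward_orphans(self):
        """One forwarding period: try to hand every orphan to its owner."""
        for key_hash, value in list(self.orphans.items()):
            try:
                self.forward_put(key_hash, value)
            except ConnectionError:
                continue                 # stays until the next forwarding period
            del self.orphans[key_hash]   # removed only after a successful Put
```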
Note that we don't need to periodically iterate over the set of nodes: we can use the connection notifier of the node's client to sync the orphaned records when the node reconnects with the lost node.
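The notifier-driven variant might look something like this. It is a sketch under assumptions: `on_connection_change(node, connected)` is an invented callback signature rather than the real notifier API, and orphans are assumed to be grouped by the node that owns them.

```python
from collections import defaultdict


class OrphanStore:
    def __init__(self, send_put):
        self.send_put = send_put            # hypothetical: normal Put to a given node
        self.by_owner = defaultdict(dict)   # owner node -> {key: value}

    def add(self, owner, key, value):
        self.by_owner[owner][key] = value

    def on_connection_change(self, node, connected):
        """Assumed notifier callback: flush orphans when their owner is back."""
        if not connected:
            return
        pending = self.by_owner.get(node, {})
        for key, value in list(pending.items()):
            try:
                self.send_put(node, key, value)
            except ConnectionError:
                break                       # connection dropped again, wait for next notification
            del pending[key]
```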
One thing to worry about here (and maybe it's not worth pursuing at this level) is the network partition problem: if two nodes A and B can each talk only to a subset of clients and cannot talk to each other, each will receive updates for its own records and for the records of the node it cannot reach. Then, when the network recovers, both node A and node B will hold new state for B's records.
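To make the concern concrete, here is a toy simulation of the scenario. It assumes that forwarding uses a plain Put that overwrites without any version check; the dictionaries and the function name are purely illustrative.

```python
def simulate_partition():
    a_orphans = {}   # records node A holds on behalf of B
    b_records = {}   # node B's own store

    # During the partition, clients on each side write the same key of B's range.
    a_orphans["k"] = "written via A"
    b_records["k"] = "written via B"
    before = b_records["k"]

    # After recovery, A forwards its orphan with an ordinary Put.
    for key, value in list(a_orphans.items()):
        b_records[key] = value   # assumed: plain overwrite, no conflict detection
        del a_orphans[key]

    return before, b_records["k"]
```

Under that assumption, whichever write reaches B last wins, regardless of which one was actually newer.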
The redistribution system could probably be rephrased to work with this too: When the hash range of the node changes, it iterates over its channels, puts out-of-range records into the orphaned records store and lets magic do the rest.
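A rough sketch of that idea, assuming channels are modelled as a mapping of channel name to `{key_hash: value}` and the hash range is inclusive. `apply_new_range` is a hypothetical helper; the subsequent forwarding is left to the normal orphan mechanism.

```python
def apply_new_range(channels, orphans, new_range):
    """Move every record outside new_range into the orphan store.

    channels: {channel_name: {key_hash: value}}   (assumed layout)
    orphans:  {(channel_name, key_hash): value}   (assumed layout)
    """
    lo, hi = new_range
    for name, records in channels.items():
        out_of_range = [k for k in records if not lo <= k <= hi]
        for key_hash in out_of_range:
            orphans[(name, key_hash)] = records.pop(key_hash)
```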