Skip to content

Add support for transactions spanning multiple shards #2

@bdeggleston

Description

@bdeggleston

Queries with multiple keys (touching replicas that do not have overlapping token ranges), or queries that touch every replica (system changes, etc) will need to be serialized.

Multi-replica instances aren't difficult to commit, since once the coordinator gets dependencies from the relevant nodes, it can run a commit. The difficult part is executing the instance, since it will depend on instances that it doesn't know about, and won't, without some complicated and network heavy 'get instance' procedure.

Since the replica won't be executing instances that fall outside of it's token range, it doesn't need to worry about them. Subsequent interfering instances will gain dependencies on the multi-shard instances, serializing instances on each shard. However, the multi-shard instance needs to know that it doesn't need to worry about instances that are controlled by other shards, since they could be instances it hasn't seen yet.

The best solution I've been able to come up with is to put the hashed key(s) in the instance id. So a replica could quickly determine if it needs to wait for an instance to be committed/executed before executing the multi-shard instance.

One thing to keep in mind about multi-key instances is that a basic quorum won't work, since it could easily skip over entire shard replicas. Given a replication factor of 3, an instance with 3 keys, each living on different replicas (9 total), would only require 5 responses. The 4 instances it ignores, could be the replicas for an entire key. The solution to this is to require a quorum of responses for each set of replicas for each key.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions