Conversation

@jdanbrown
Contributor

In Cluster.connectionWatcher a KeeperState.Disconnected doesn't trigger forceShutdown, which is responsible for calling listener.shutdownWork and listener.onLeave. My guess as to why is that it's waiting for a KeeperState.Expired, which does trigger forceShutdown and is otherwise handled identically (modulo logging).
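
For concreteness, here's a minimal sketch of a connection watcher that treats Disconnected the same as Expired, written against the raw ZooKeeper Watcher API; the callback names (forceShutdown, onConnect) are illustrative, not Ordasity's actual members:

```scala
import org.apache.zookeeper.{WatchedEvent, Watcher}
import org.apache.zookeeper.Watcher.Event.KeeperState

// Sketch only: handler names are illustrative, not Ordasity's actual code.
class ConnectionWatcher(forceShutdown: () => Unit, onConnect: () => Unit) extends Watcher {
  override def process(event: WatchedEvent): Unit = event.getState match {
    case KeeperState.SyncConnected =>
      onConnect()
    case KeeperState.Disconnected | KeeperState.Expired =>
      // Shut down claimed work as soon as the session is locally considered
      // lost, instead of waiting for the server-side Expired notification.
      forceShutdown()
    case _ =>
      () // ignore other transitions (e.g. AuthFailed)
  }
}
```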

Based on my recent experience and the zk session state-transition docs, this allows the following behavior:

  1. Client partitions from zk cluster
  2. Client-side zk session timeout triggers, client receives KeeperState.Disconnected
  3. Client stays partitioned from the network for arbitrarily long, but ordasity continues running its previously-claimed work
  4. Client eventually rejoins network and regains route to zk cluster
  5. Client attempts to reconnect to the zk cluster, cluster says no, client receives KeeperState.Expired, ordasity finally shuts down work
  6. Client establishes new session, ordasity claims new work

This is harmful in my application since I want work ownership to be best-effort exclusive, i.e. nodes should minimize their overlap in work.

Is this a bug, or was ordasity intentionally designed to tolerate this kind of overlap when nodes partition? I can imagine that being a useful, or at least not harmful, behavior in some settings. If so, maybe I can extend this proposed change with a config option to choose between maximizing and minimizing oblivious work overlap?
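
As a sketch of what such an option might look like (PartitionPolicy and shutdownOnDisconnect are hypothetical names, not part of Ordasity's actual config):

```scala
import org.apache.zookeeper.Watcher.Event.KeeperState

// Hypothetical knob, not part of Ordasity's actual ClusterConfig.
case class PartitionPolicy(shutdownOnDisconnect: Boolean)

def handleStateChange(state: KeeperState, policy: PartitionPolicy)(forceShutdown: () => Unit): Unit =
  state match {
    case KeeperState.Expired =>
      forceShutdown()  // always shut down once the session is expired
    case KeeperState.Disconnected if policy.shutdownOnDisconnect =>
      forceShutdown()  // minimize overlap: shut down eagerly on partition
    case _ =>
      ()               // maximize availability: keep running claimed work
  }
```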

@jdanbrown
Contributor Author

Ok, this change appears to violate some important assumptions—see the two FIXME commits above.

I'd be interested to hear your thoughts on how this could be made to work.
