Restarting a running server in a server cluster causes connection issues #121

@ghost

Description

Quick Description

Restarting a running server in a server cluster causes subsequent connection attempts to that server from other groups to fail, leaving them in an endless loop of reconnection attempts.

Explanation

This was experienced using:
OS: Windows 10
DarkRift Version: 2.10.1 (Pro)

Cluster.config file used:

<?xml version="1.0" encoding="utf-8" ?>
<cluster>
  <groups>
    <group name="SubServer" visibility="external">
      <connectsTo name="MainServer" />
    </group>

    <group name="MainServer" visibility="internal" />
  </groups>
</cluster>

Steps to reproduce:

(Assuming Consul is running)

  1. Start the MainServer
  2. Start the SubServer

At this point the MainServer correctly picks up the SubServer, and the SubServer performs a connection attempt that results in "Connected to server 0 on 127.0.0.1:4000" as expected. All the events on the MainServer are fired as they should be. If I restart the SubServer, it keeps working as expected.

Now, to replicate the issue (I've tried it a few times now and this works every time):

  1. Stop the MainServer
  2. Start the MainServer again
  3. Start the SubServer

This is where it goes wrong: the SubServer ends up making reconnection attempts indefinitely (note that the logs say "Attempt 1" after every attempt). The only fix I have found is to restart the machine. After that, steps 1 and 2 of the first list work again, and the three steps of the second list break it again.

Logs

When the issue occurs, the following traces are continuously spammed on the MainServer:

[Trace]   DefaultNetworkListener Accepted TCP connection from 127.0.0.1:50099.
[Trace]   DefaultNetworkListener Accepted UDP connection from 127.0.0.1:62471.
[Trace]   RemoteServerManager   New server connected, awaiting identification [127.0.0.1:50099|127.0.0.1:62471].
[Trace]   RemoteServerManager   Server at [127.0.0.1:50099|127.0.0.1:62471] has identified as server 4.
[Trace]   RemoteServerManager   Server at [127.0.0.1:50099|127.0.0.1:62471 connected and identified itself as server 4 however the registry has not yet propgated information about that server. The connection has been dropped.

And the following traces are continuously spammed on the SubServer:

[Trace]   UpstreamServerGroup   Lost connection to server 3 on 127.0.0.1:4000.
[Trace]   UpstreamServerGroup   Reconnecting to server 3 on 127.0.0.1:4000. Attempt 1.
[Info]    UpstreamServerGroup   Reconnected to server 3 on 127.0.0.1:4000.
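For anyone trying to confirm they are hitting the same loop, here is a rough sketch of a script that scans SubServer log output and counts how often "Attempt 1" recurs for the same server. It is not part of DarkRift; the function names `count_attempt_one` and `looks_like_reconnect_loop` and the `threshold` parameter are made up for illustration, and the pattern assumes the log lines look exactly like the ones quoted above.

```python
import re

# Matches the SubServer trace quoted above. The exact wording of the log
# line is an assumption taken from this report, not from DarkRift's source.
RECONNECT = re.compile(
    r"Reconnecting to server (\d+) on ([\d.]+:\d+)\. Attempt (\d+)\."
)


def count_attempt_one(lines):
    """Count, per (server id, address), how many times 'Attempt 1' appears."""
    counts = {}
    for line in lines:
        match = RECONNECT.search(line)
        if match and int(match.group(3)) == 1:
            key = (int(match.group(1)), match.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts


def looks_like_reconnect_loop(lines, threshold=3):
    """Return True if any server shows 'Attempt 1' at least `threshold` times.

    The threshold is an arbitrary, hypothetical cut-off: a healthy cluster
    may legitimately log 'Attempt 1' once after a real disconnect.
    """
    return any(n >= threshold for n in count_attempt_one(lines).values())
```

Feeding it the lines of a saved SubServer log (for example `looks_like_reconnect_loop(open(path))`, where `path` is wherever the log was written) should return `True` when the attempt counter keeps resetting to 1 as described in this issue.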
