Skip to content

Commit

Permalink
Re-use request clients
Browse files Browse the repository at this point in the history
  • Loading branch information
dwoz committed Jan 11, 2023
1 parent 20edbfd commit 5249eac
Showing 1 changed file with 9 additions and 12 deletions.
21 changes: 9 additions & 12 deletions salt/minion.py
Original file line number Diff line number Diff line change
Expand Up @@ -3315,12 +3315,9 @@ def _send_req_sync(self, load, timeout):
minion_privkey_path, salt.serializers.msgpack.serialize(load)
)
load["sig"] = sig

with salt.transport.client.AsyncReqChannel.factory(self.opts) as channel:
ret = yield channel.send(
load, timeout=timeout, tries=self.opts["return_retry_tries"]
)
return ret
return self.req_channel.send(
load, timeout=timeout, tries=self.opts["return_retry_tries"]
)

@salt.ext.tornado.gen.coroutine
def _send_req_async(self, load, timeout):
Expand All @@ -3331,12 +3328,10 @@ def _send_req_async(self, load, timeout):
minion_privkey_path, salt.serializers.msgpack.serialize(load)
)
load["sig"] = sig

with salt.transport.client.AsyncReqChannel.factory(self.opts) as channel:
ret = yield channel.send(
load, timeout=timeout, tries=self.opts["return_retry_tries"]
)
raise salt.ext.tornado.gen.Return(ret)
ret = yield self.async_req_channel.send(
load, timeout=timeout, tries=self.opts["return_retry_tries"]
)
return ret

def fire_master_syndic_start(self):
# Send an event to the master that the minion is live
Expand Down Expand Up @@ -3367,6 +3362,8 @@ def tune_in_no_block(self):

# add handler to subscriber
self.pub_channel.on_recv(self._process_cmd_socket)
self.req_channel = salt.channel.client.ReqChannel.factory(self.opts)
self.async_req_channel = salt.channel.client.ReqChannel.factory(self.opts)

def _process_cmd_socket(self, payload):
if payload is not None and payload["enc"] == "aes":
Expand Down

4 comments on commit 5249eac

@szjur
Copy link

@szjur szjur commented on 5249eac Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwoz Are you absolutely sure that replacing salt.transport.client.AsyncReqChannel.factory() with salt.channel.client.ReqChannel.factory() for _send_req_async() is the right way to go? I'm investigating poor syndic performance after 3003 -> 3005 upgrade and this change kindly backported to 3003 with #64032 is now on my short list of the performance bottlenecks.
To be honest I can't say why both _send_req_sync() and _send_req_async() used AsyncReqChannel.factory() before but salt-syndic used _send_req_async() to for _return_pub_multi() so chances are _send_req_sync() was not used at all.

@szjur
Copy link

@szjur szjur commented on 5249eac Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running many tests I'm quite sure _forward_events() blocks for pretty much as long as the actual send takes when it uses that salt.channel.client.ReqChannel.factory() as self.async_req_channel. With AsyncReqChannel.factory() it returns promptly which allows the syndic to do other stuff such as send events back to other masters (that actually happens in the same loop, just with that pseudo-asynchronous code each MoM needs to wait its turn for long) or receive new job results from the local master while send() still goes on asynchronously.

Tested with big job return payloads and syndics/targets located on a different continent than the MoM.

@szjur
Copy link

@szjur szjur commented on 5249eac Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if @dwoz or anyone else reads it but syndic performance with this patch is terrible under load - changing the above code to use AsyncReqChannel.factory() for self.async_req_channel helps a bit but it's still far behind 3003 performance-wise. I'm talking an installation with ~100k minions, dozens of syndics and many MoMs. It ends up with timeouts, duplicate events, big delays etc. Funnily enough despite keeping a connection to 4506 open to each MoM syndic keeps opening new ones too (they get closed why that initial one is kept open). I could not figure out the exact reason why this is happening.

Reverting Syndic._send_req_async() to what it was in 3003 seems to be a real game changer. Not only fixes it performance but also stops these dodgy "Got events for closed stream None" warnings from appearing in the log.

Maybe it all works smooth in 3006, I only tried 3005 with #64032 but I haven't noticed any revolutionary changes in the relevant parts of the latest Salt code.

Mind you, I doubt Syndic uses send_req_sync for anything.

@szjur
Copy link

@szjur szjur commented on 5249eac Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We reverted this change in 3005 (#64032 backport) and made syndic send results as it did in 3003 and it started flying. For some reason these persistent req channels are counter-productive, despite common sense. Not only doing it over these persistent channels results in these dodgy "Got events for closed stream None" warnings but it causes long delays and duplicate results sent back to MoMs under stress.

Please sign in to comment.