Description
I have been testing the suggested HAProxy configuration from the Master Cluster tutorial and have found that the suggested client/server timeout values of 1m cause unstable minion communication, specifically with the publisher port (4505).
I am using the default transport with 3 masters and 50 minions all running 3007.1. My HAProxy version is 2.6.12-1 (Debian 12), though I have tested older and newer versions with the same results.
Adjusting TCP keepalive values on the masters and minions does not seem to affect HAProxy closing TCP sessions after 1 minute of inactivity. Reducing tcp_keepalive_idle does speed up minions reconnecting after HAProxy closes the connection though.
It seems that no matter how frequently the master and minions send keepalives, HAProxy will close the sessions after 1 minute if no data is sent through the session. If I run something like salt '*' test.ping every 30 seconds, this keeps the sessions with the publisher port alive for longer than 1 minute.
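To make the behavior I am describing concrete, here is a minimal Python sketch of it. This is a toy model, not HAProxy code: it assumes the proxy's inactivity timer is reset only by payload data passing through the session, and that TCP keepalive probes never reset it, which is what I am observing. The function name `session_survives` and the event format are made up for illustration.

```python
def session_survives(events, timeout=60.0, duration=300.0):
    """Return True if a proxied session stays open for `duration` seconds.

    `events` is a list of (time_in_seconds, kind) tuples, where kind is
    either "data" (payload through the proxy) or "keepalive" (TCP probe).
    Assumption: only "data" events reset the proxy's inactivity timer.
    """
    last_activity = 0.0
    for t, kind in sorted(events):
        if t > duration:
            break
        # The proxy closes the session once the idle time exceeds the timeout.
        if t - last_activity > timeout:
            return False
        if kind == "data":
            last_activity = t
        # "keepalive" events are deliberately ignored by this model.
    return duration - last_activity <= timeout


# Keepalive probes every 10 seconds, but no payload at all.
keepalives_only = [(float(t), "keepalive") for t in range(10, 301, 10)]

# A publish (for example test.ping) every 30 seconds.
ping_every_30s = [(float(t), "data") for t in range(30, 301, 30)]

print(session_survives(keepalives_only))  # session is dropped
print(session_survives(ping_every_30s))   # session stays open
```

In this model, raising `timeout` to 12 hours (43200 seconds) also keeps the keepalive-only session open for the simulated window, which matches the workaround I am using.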
To Reproduce:
On the HAProxy server, run watch "netstat -nalpt | grep 4505" and watch for TCP sessions to switch from "ESTABLISHED" to "TIME_WAIT". This should happen within a minute. Run salt '*' test.ping from the master while sessions are in this condition and you'll see minions fail to respond as they did not see the event published from the master.
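If watching the netstat output by eye is awkward, a small Python helper along these lines can summarize the session states instead. It is only a sketch: it assumes the usual Linux `netstat -nalpt` column layout (local address in the fourth column, foreign address in the fifth, state in the sixth), and the sample output and addresses below are made up for illustration, not captured from my environment.

```python
from collections import Counter


def count_states(netstat_output: str, port: int = 4505) -> Counter:
    """Count TCP states for sessions whose local or foreign port is `port`."""
    suffix = f":{port}"
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:4505            0.0.0.0:*               LISTEN      812/haproxy
tcp        0      0 10.0.0.5:4505           10.0.1.21:51514         ESTABLISHED 812/haproxy
tcp        0      0 10.0.0.5:4505           10.0.1.22:40022         TIME_WAIT   -
tcp        0      0 10.0.0.5:4506           10.0.1.22:40100         ESTABLISHED 812/haproxy
"""

counts = count_states(SAMPLE)
print(dict(counts))
```

On a real HAProxy server the input could come from `subprocess.run(["netstat", "-nalpt"], capture_output=True, text=True).stdout`; a growing `TIME_WAIT` count for port 4505 after about a minute of inactivity is the condition described above.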
I am currently using timeout values of 12h on the publisher and request server ports to reduce how often TCP sessions are killed off. While this probably isn't the best solution, it does keep minion communication stable, as it greatly reduces how often minions have to re-establish their connection with the master.
Suggested Fix
Is there other configuration expected to be set on the master and minion to allow stable minion communication with the suggested 1 minute timeouts in HAProxy?
Would you recommend leaving publish_session at the default of 24 hours, or setting it to something lower? Additionally, would it be a good idea to set ping_on_rotate: True in this configuration?
Type of documentation
Tutorial
Location or format of documentation
https://docs.saltproject.io/en/latest/topics/tutorials/master-cluster.html