Commit 0c561cd

JohnGarbutt authored and jovial committed
Support batching up commands
When you have around 60 baremetal nodes attached to a single switch, it
takes a long time to execute all those commands. This gets worse when
you limit the number of concurrent SSH connections.

Here we look to batch up commands to send to the switch together using
a single connection. The results of each port's commands are returned
when available.

This is implemented using etcd as a queueing system. Commands are added
to an input key, then a worker thread processes the available commands
for a particular switch device. We pull off the queue using the version
at which the keys were added, giving a FIFO style queue. The result of
each command set is added to an output key, which the original request
thread is watching. Distributed locks are used to serialise the
processing of commands for each switch device.

Various neat etcd features are used here to alleviate some of the
issues of distributed task coordination, including transactions,
leases, watches, historical key/value tracking, etc.

Co-Authored-By: Mark Goddard <[email protected]>
Change-Id: I8c458bbc94df5630cfede5434bcdbe527988059c
(cherry picked from commit 45b237b)
(cherry picked from commit 465c979)
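The queueing flow described in the commit message can be sketched in a few lines of Python. This is only an in-memory illustration, not the driver's code: ``FakeEtcd``, ``add_batch``, ``process_device`` and ``fake_send`` are hypothetical names, a ``threading.Lock`` stands in for the distributed lock, a counter stands in for etcd's key revisions, and the switch commands are made up. The real implementation also has the request thread watch its output key, which is reduced here to a dictionary lookup.

```python
import itertools
import threading
from collections import defaultdict


class FakeEtcd:
    """In-memory stand-in for etcd: every queued item records a creation revision."""

    def __init__(self):
        self._revision = itertools.count(1)
        self.input = defaultdict(dict)   # device -> {request_id: (revision, commands)}
        self.output = {}                 # request_id -> result
        self.locks = defaultdict(threading.Lock)  # stand-in for distributed locks

    def add_batch(self, device, request_id, commands):
        """Queue one port's command set under the device's input prefix."""
        self.input[device][request_id] = (next(self._revision), list(commands))


def process_device(store, device, send):
    """Drain the device's queue in FIFO order using one simulated connection."""
    with store.locks[device]:
        # Sort by creation revision so the oldest request is handled first.
        pending = sorted(store.input[device].items(), key=lambda kv: kv[1][0])
        if not pending:
            return 0
        # One call to `send` models one SSH session carrying every queued set.
        results = send(device, [cmds for _, (_, cmds) in pending])
        for (request_id, _), result in zip(pending, results):
            store.output[request_id] = result
            del store.input[device][request_id]
        return len(pending)


sent = []  # one entry per simulated connection


def fake_send(device, batches):
    """Hypothetical transport: records the call and reports success per batch."""
    sent.append((device, batches))
    return [f"ok: {len(cmds)} commands" for cmds in batches]


store = FakeEtcd()
store.add_batch("switch1", "req-a", ["interface 1", "switchport access vlan 10"])
store.add_batch("switch1", "req-b", ["interface 2"])
processed = process_device(store, "switch1", fake_send)
```

Both requests are served by a single call to ``fake_send``, which is the saving the batching aims for: the connection overhead is paid once per drain rather than once per port change.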
1 parent 507aae2 commit 0c561cd

9 files changed, +985 -11 lines changed


doc/source/configuration.rst

Lines changed: 42 additions & 4 deletions
@@ -108,8 +108,9 @@ for the Dell PowerConnect device::
     ngs_switchport_mode = access
 
 Dell PowerConnect devices have been seen to have issues with multiple
-concurrent configuration sessions. See :ref:`synchronization` for details on
-how to limit the number of concurrent active connections to each device.
+concurrent configuration sessions. See :ref:`synchronization` and
+:ref:`batching` for details on how to limit the number of concurrent active
+connections to each device.
 
 for the Brocade FastIron (ICX) device::
 
@@ -191,8 +192,16 @@ connection URL for the backend should be configured as follows::
     [ngs_coordination]
     backend_url = <backend URL>
 
-The default is to limit the number of concurrent active connections to each
-device to one, but the number may be configured per-device as follows::
+The backend URL format includes the Tooz driver as the scheme, with driver
+options passed using query string parameters. For example, to use the
+``etcd3gw`` driver with an API version of ``v3`` and a path to a CA
+certificate::
+
+    [ngs_coordination]
+    backend_url = etcd3+https://etcd.example.com?api_version=v3,ca_cert=/path/to/ca/cert.crt
+
+The default behaviour is to limit the number of concurrent active connections
+to each device to one, but the number may be configured per-device as follows::
 
     [genericswitch:device-hostname]
     ngs_max_connections = <max connections>
@@ -206,6 +215,35 @@ timeout of 60 seconds before failing. This timeout can be configured as follows
     ...
     acquire_timeout = <timeout in seconds>
 
+.. _batching:
+
+Batching
+========
+
+For many network devices there is a significant SSH connection overhead which
+is incurred for each network or port configuration change. In a large scale
+system with many concurrent changes, this overhead adds up quickly. Since the
+Antelope release, the Generic Switch driver includes support to batch up switch
+configuration changes and apply them together using a single SSH connection.
+
+This is implemented using etcd as a queueing system. Commands are added
+to an input key, then a worker thread processes the available commands
+for a particular switch device. We pull off the queue using the version
+at which the keys were added, giving a FIFO style queue. The result of
+each command set is added to an output key, which the original request
+thread is watching. Distributed locks are used to serialise the
+processing of commands for each switch device.
+
+The etcd endpoint is configured using the same ``[ngs_coordination]
+backend_url`` option used in :ref:`synchronization`, with the limitation that
+only ``etcd3gw`` is supported.
+
+Additionally, each device that will use batched configuration should include
+the following option::
+
+    [genericswitch:device-hostname]
+    ngs_batch_requests = True
+
 Disabling Inactive Ports
 ========================
 
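As an illustration outside the diff itself, the two settings that the new Batching section describes might sit together in one configuration file roughly as follows. This is a sketch under assumptions: the etcd host ``etcd.example.com`` is reused from the example URL in the diff with its driver options left out, and ``device-hostname`` is the same placeholder the documentation uses, not a real device.

```ini
[ngs_coordination]
# etcd endpoint reached through the Tooz etcd3gw driver (example host)
backend_url = etcd3+https://etcd.example.com

[genericswitch:device-hostname]
# Opt this device in to batched configuration changes
ngs_batch_requests = True
```

Devices without ``ngs_batch_requests = True`` are not covered by this fragment; the diff only states that each device that will use batching should set the option.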