Skip to content

Conversation

@Chr1st0ph3rTurn3r
Copy link
Contributor

No description provided.

Chr1st0ph3rTurn3r and others added 30 commits August 7, 2025 15:16
… Security Encryption for Mike Adams' review.
…ntation' of github.com:128technology/docs into 7.1.0-documentation

## Download Failover Resiliency

SSR images can be downloaded from a variety of sources, depending on software access mode (eg. internet-only, prefer-conductor, conductor-only, offline-mode): the HA peer, both conductor nodes, artifactory, and the mist proxy to artifactory (cloud deployments only).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Mist be capitalized?


SSR images can be downloaded from a variety of sources, depending on software access mode (eg. internet-only, prefer-conductor, conductor-only, offline-mode): the HA peer, both conductor nodes, artifactory, and the mist proxy to artifactory (cloud deployments only).

To improve resiliency to network connectivity issues, the SSR queries available versions from all sources before beginning the download. It compiles a list of sources where the requested version is available and begins the download. If more than 50% of requests to a source fail within a window of 10 requests, the SSR marks that source unavailable and moves on to the next source. The following priority order is used for sources:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind the size of the window is more of an implementation detail and may be subject to change based on tuning. We may want to be less specific about that in case we decide to adjust it in the future. But this may be fine too. Not sure how likely we are to need to adjust it


In the event that all sources have reached the threshold of consecutive failures and a download attempt has returned an error, the SSR can be configured to wait for a specified amount of time and then retry the download. If a connection is successfully made, the download will resume where it left off.

When the timeout is enabled, the SSR waits for a configurable amount of time (default is 10800s) for the download to complete. When the timeout value is reached, the download is marked as **Failed** and the retry delay begins.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite accurate. The retry delay will begin once we have marked all download sources as unavailable, as described in the failover resilience section. If enabled, once this timeout is hit, the download will be entirely stopped and marked as a failure. Or in other words, the retries happen inside of this timeout, not after it.


### Sequenced HA Download

The SSR supports sequenced downloading; one node of an HA pair downloads an image from the remote repository, and the other node waits for it to complete. Once that download is complete, the second node downloads it from the first. When targeting an HA router, the download is sequenced by default. To disable this sequencing, use `request system software download simultaneous disable`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once note about the second node downloads it from the first. The peer is the first place that an HA router will attempt to download from, so in most cases this would be the case, but if for whatever reason the connection to the peer went down, the router would move on and continue downloading from the conductor or remote sources. Not sure if that needs to be clarified or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants