Merge pull request #527 from vitobotta/masters-different-locations
Masters in different locations
vitobotta authored Jan 30, 2025
2 parents 20362f1 + 9695aa9 commit dd802e8
Showing 20 changed files with 276 additions and 72 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -58,6 +58,8 @@ See my public profile with links for connecting with me [here](https://vitobotta

- [Installation](docs/Installation.md)
- [Creating a cluster](docs/Creating_a_cluster.md)
- [Masters in different locations](docs/Masters_in_different_locations.md)
- [Upgrading a 1.x cluster to 2.x](docs/Upgrading_a_cluster_from_1x_to_2x.md)
- [Setting up a cluster](docs/Setting%20up%20a%20cluster.md)
- [Recommendations](docs/Recommendations.md)
- [Maintenance](docs/Maintenance.md)
7 changes: 5 additions & 2 deletions docs/Creating_a_cluster.md
@@ -57,8 +57,11 @@ schedule_workloads_on_masters: false

masters_pool:
instance_type: cpx21
instance_count: 3
location: nbg1
instance_count: 3 # 3 masters for HA; a single-master cluster is also possible for dev/testing (not recommended for production)
locations: # specify a single location for single-master clusters or to keep all masters in the same location; for regional clusters (eu-central network zone only), each master must be in a different location
- fsn1
- hel1
- nbg1

worker_node_pools:
- name: small-static
8 changes: 4 additions & 4 deletions docs/Installation.md
@@ -33,15 +33,15 @@ You need to install these dependencies first:
##### Intel / x86

```bash
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.2/hetzner-k3s-macos-amd64
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.3/hetzner-k3s-macos-amd64
chmod +x hetzner-k3s-macos-amd64
sudo mv hetzner-k3s-macos-amd64 /usr/local/bin/hetzner-k3s
```

##### Apple Silicon / ARM

```bash
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.2/hetzner-k3s-macos-arm64
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.3/hetzner-k3s-macos-arm64
chmod +x hetzner-k3s-macos-arm64
sudo mv hetzner-k3s-macos-arm64 /usr/local/bin/hetzner-k3s
```
@@ -51,15 +51,15 @@ sudo mv hetzner-k3s-macos-arm64 /usr/local/bin/hetzner-k3s
#### amd64

```bash
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.2/hetzner-k3s-linux-amd64
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.3/hetzner-k3s-linux-amd64
chmod +x hetzner-k3s-linux-amd64
sudo mv hetzner-k3s-linux-amd64 /usr/local/bin/hetzner-k3s
```

#### arm

```bash
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.2/hetzner-k3s-linux-arm64
wget https://github.com/vitobotta/hetzner-k3s/releases/download/v2.2.3/hetzner-k3s-linux-arm64
chmod +x hetzner-k3s-linux-arm64
sudo mv hetzner-k3s-linux-arm64 /usr/local/bin/hetzner-k3s
```
63 changes: 63 additions & 0 deletions docs/Masters_in_different_locations.md
@@ -0,0 +1,63 @@
# Masters in Different Locations

You can set up a regional cluster for maximum availability by placing each master in a different European location: the first master in Falkenstein (fsn1), the second in Helsinki (hel1), and the third in Nuremberg (nbg1), with locations assigned in alphabetical order. This setup is only possible in network zones with multiple locations; currently the only such zone is `eu-central`, which includes these three locations. Other regions support only zonal clusters. Regional clusters are also limited to 3 masters, since only three locations are available.

To create a regional cluster, simply set the `instance_count` for the masters pool to 3 and specify the `locations` setting as `fsn1`, `hel1`, and `nbg1`.
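
For reference, a minimal `masters_pool` sketch for a regional cluster, matching the example in the [Creating a cluster](Creating_a_cluster.md) docs (the instance type is just an example):

```yaml
masters_pool:
  instance_type: cpx21 # example instance type
  instance_count: 3    # regional clusters require exactly 3 masters
  locations:
    - fsn1 # Falkenstein
    - hel1 # Helsinki
    - nbg1 # Nuremberg
```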

## Converting a Single Master or Zonal Cluster to a Regional One

If you already have a cluster with a single master or three masters in the same European location, converting it to a regional cluster is straightforward. Just follow these steps carefully and be patient. Note that this requires hetzner-k3s version 2.2.3 or higher.

Before you begin, make sure to back up all your applications and data! This is crucial. While the migration process is relatively simple, there is always some level of risk involved.

- [ ] Set the `instance_count` for the masters pool to 3 if your cluster currently has only one master.
- [ ] Update the `locations` setting for the masters pool to include `fsn1`, `hel1`, and `nbg1`, like this:

```yaml
locations:
- fsn1
- hel1
- nbg1
```
The locations are always processed in alphabetical order, regardless of how you list them in the `locations` property. This ensures consistency, especially when replacing a master due to node failure or other issues.

- [ ] If your cluster currently has a single master, run the `create` command with the updated configuration. This will create `master2` in Helsinki and `master3` in Nuremberg. Wait for the operation to complete and confirm that all three masters are in a ready state.
- [ ] If `master1` is not in Falkenstein (fsn1):
  - Drain `master1`.
  - Delete `master1` from Kubernetes with `kubectl delete node {cluster-name}-master1`.
  - Remove the `master1` instance via the Hetzner Console or the `hcloud` utility (see: https://github.com/hetznercloud/cli). A sketch of these commands is shown after the etcd output below.
  - Run the `create` command again. This will recreate `master1` in Falkenstein.
  - SSH into each master and run the following commands to verify that `master1` has joined the cluster correctly:

```bash
sudo apt-get update
sudo apt-get install etcd-client
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key
etcdctl member list
```

The last command should display something like this if everything is working properly:

```
285ab4b980c2c8c, started, test-master2-d25722af, https://10.0.0.3:2380, https://10.0.0.3:2379, false
aad3fac89b68bfb7, started, test-master1-5e550de0, https://10.0.0.4:2380, https://10.0.0.4:2379, false
c11852e25aef34e8, started, test-master3-0ed051a3, https://10.0.0.2:2380, https://10.0.0.2:2379, false
```
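
For reference, a sketch of the drain/delete/recreate steps above, assuming a cluster named `test`, a config file named `cluster_config.yaml`, and the `hcloud` CLI configured for your project (adjust the names to match your cluster):

```bash
# Drain master1 so workloads move to the other nodes
kubectl drain test-master1 --ignore-daemonsets --delete-emptydir-data

# Remove master1 from the Kubernetes cluster
kubectl delete node test-master1

# Delete the actual instance in Hetzner Cloud
hcloud server delete test-master1

# Recreate master1 in the correct location
hetzner-k3s create --config cluster_config.yaml
```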

- [ ] If `master2` is not in Helsinki, follow the same steps as with `master1` but for `master2`. This will recreate `master2` in Helsinki.
- [ ] If `master3` is not in Nuremberg, repeat the process for `master3`. This will recreate `master3` in Nuremberg.

That’s it! You now have a regional cluster, which will keep operating even if one of the Hetzner locations suffers a temporary failure. I also recommend setting `create_load_balancer_for_the_kubernetes_api` to `true` if you don’t already have a load balancer for the Kubernetes API.
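
If you enable it, a minimal sketch of that setting (assuming it is a top-level option, as the name suggests; check the [Creating a cluster](Creating_a_cluster.md) docs for the exact placement in the config file):

```yaml
create_load_balancer_for_the_kubernetes_api: true
```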

## Performance Considerations

This feature has been frequently requested, but I delayed implementing it until I could thoroughly test the configuration. I was concerned about latency issues, as etcd is sensitive to delays, and I wanted to ensure that the latency between the German locations and Helsinki wouldn’t cause problems.

It turns out that the default heartbeat interval for etcd is 100ms, while the latency between Helsinki and Falkenstein/Nuremberg is only 25-27ms, so the total round-trip time (RTT) for the Raft consensus is around 60-70ms, which is well within etcd's acceptable limits. Benchmarks confirmed that everything runs smoothly, so there's no need to adjust the etcd configuration for this setup.
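
If you want to verify the latency and etcd health yourself, a quick sketch to run from one of the masters (the private IP is an example; the etcd environment variables are the same as in the verification steps above):

```bash
# Round-trip latency over the private network to another master
ping -c 5 10.0.0.3

# etcd's own view of the cluster and request latency
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key
etcdctl endpoint status -w table
etcdctl check perf
```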
88 changes: 88 additions & 0 deletions docs/Upgrading_a_cluster_from_1x_to_2x.md
@@ -0,0 +1,88 @@
# Upgrading a cluster created with hetzner-k3s v1.x to v2.x

The v1 version of hetzner-k3s is quite old and hasn't been supported for a while, but I know that some people haven't upgraded to v2 because until now there wasn't a straightforward process to do this.

The migration is now straightforward, provided you follow these instructions carefully and are patient. It also gives you the opportunity to replace deprecated instance types (the `CX` series) with newer ones. It requires hetzner-k3s v2.2.3 or higher.

## Prerequisites

- [ ] I recommend installing the [hcloud utility](https://github.com/hetznercloud/cli) so you can delete old instances more easily and quickly (see the sketch below)
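
A quick sketch of the `hcloud` commands used later in this guide (the server name is an example; you first need a context with your Hetzner Cloud API token):

```bash
# One-time setup: create a context with your Hetzner Cloud API token
hcloud context create my-cluster

# List servers to find the one to delete
hcloud server list

# Delete an old instance after it has been drained and removed from Kubernetes
hcloud server delete my-cluster-master1
```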

## Upgrading configuration and first steps

- [ ] ==Back up apps and data== - as with all migrations, there is some risk involved, so be prepared in case something doesn't go according to plan
- [ ] ==Back up the kubeconfig and the old config file==
- [ ] Uninstall the System Upgrade Controller
- [ ] Create a resolv file on the existing nodes, either manually or automatically with the `hcloud` CLI:
```bash
hcloud server list | awk '{print $4}' | tail -n +2 | while read ip; do
echo "Setting DNS for ${ip}"
ssh -n root@${ip} "echo nameserver 8.8.8.8 | tee /etc/k8s-resolv.conf"
ssh -n root@${ip} "cat /etc/k8s-resolv.conf"
done
```
- [ ] Convert the config file to the new format (see https://github.com/vitobotta/hetzner-k3s/releases/tag/v2.0.0)
- [ ] Comment out or remove empty node pools from the config file
- [ ] Set `embedded_registry_mirror: enabled: false` if needed, depending on the current version of k3s (https://docs.k3s.io/installation/registry-mirror)
- [ ] Add `legacy_instance_type` to ==ALL== node pools, both masters and workers, set to the current instance type (regardless of whether it's deprecated or not). ==This is crucial for the migration== (see the example config after this list)
- [ ] Run the `create` command ==with the latest hetzner-k3s, using the new config file==
- [ ] Wait for all CSI pods in `kube-system` to restart and ==ensure everything is running==
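
For illustration, a partial config sketch covering the two settings above (pool names and instance types are just examples; keep the rest of your converted config unchanged):

```yaml
embedded_registry_mirror:
  enabled: false # only if required by your current k3s version

masters_pool:
  instance_type: cpx21       # the new instance type to migrate to
  legacy_instance_type: cx21 # the instance type the masters currently use
  instance_count: 3
  # ...rest of your masters_pool settings unchanged

worker_node_pools:
  - name: small-static
    instance_type: cpx31
    legacy_instance_type: cx31 # the instance type these workers currently use
    instance_count: 3
    location: nbg1
    # ...rest of the pool settings unchanged
```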

## Rotating control plane instances with the new instance type

One master at a time (==switch context before rotating master1==, unless your cluster has a load balancer for the Kubernetes API):

- [ ] Drain the master, then delete it both with kubectl and from the Hetzner Console (or with the `hcloud` CLI) so that the actual instance is removed as well
- [ ] Rerun the `create` command to recreate the master with the new instance type, wait for it to join the control plane and be in "ready" status
- [ ] SSH into each master and verify that the etcd members have been updated correctly and are in sync
```bash
sudo apt-get update
sudo apt-get install etcd-client

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key

etcdctl member list
```

Repeat the process for each master carefully. After the three masters have been replaced:

- [ ] Rerun the `create` command once or twice to ensure the config is stable and the masters no longer get restarted
- [ ] [Debug DNS resolution](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/). If there are issues, restart the k3s agents with the command below, then restart CoreDNS (a sketch for that follows the script)
```bash
hcloud server list | grep worker | awk '{print $4}' | while read ip; do
  echo "${ip}"
  ssh -n root@${ip} "systemctl restart k3s-agent"
  sleep 10
done
```
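
To restart CoreDNS afterwards, a rollout restart of its deployment is usually enough (a sketch; on a standard k3s install the deployment is called `coredns` in the `kube-system` namespace):

```bash
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system rollout status deployment coredns
```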
- [ ] Address any issues with your workloads before proceeding with the rotation of the worker nodes

## Rotating a worker node pool

- [ ] Increase the `instance_count` for the pool by 1 (see the sketch below)
- [ ] Run the `create` command to create the extra node required during the pool rotation
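
For example, a pool that currently has three workers would temporarily go to four (pool name, instance type and location are illustrative):

```yaml
worker_node_pools:
  - name: small-static
    instance_type: cpx31
    instance_count: 4 # temporarily 3 + 1 for the rotation
    location: nbg1
```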

One worker node at a time (excluding the extra node you've just added):

- [ ] Drain a node
- [ ] Delete the drained node both with kubectl and from the Hetzner console (or using the `hcloud` CLI)
- [ ] Rerun the `create` command to recreate the deleted node
- [ ] Verify that all works as expected before proceeding with the next node in the pool

Once all the existing nodes have been rotated:

- [ ] Drain the very last node in the pool (the extra one we added earlier)
- [ ] Verify that all looks good
- [ ] Delete the very last node both with kubectl and from the Hetzner console (or using the `hcloud` CLI)
- [ ] Decrease the `instance_count` for the node pool by 1
- [ ] Proceed with the next pool

## Finalizing

- [ ] Remove the `legacy_instance_type` setting from both master and worker node pools
- [ ] Re-run the `create` command once more to double-check that everything is stable
- [ ] Optionally, convert the current zonal cluster to a regional one with masters in different locations (see [Masters in different locations](Masters_in_different_locations.md))
42 changes: 29 additions & 13 deletions src/cluster/create.cr
@@ -18,7 +18,7 @@ class Cluster::Create
private getter configuration : Configuration::Loader
private getter hetzner_client : Hetzner::Client { configuration.hetzner_client }
private getter settings : Configuration::Main { configuration.settings }
private getter autoscaling_worker_node_pools : Array(Configuration::NodePool) { settings.worker_node_pools.select(&.autoscaling_enabled) }
private getter autoscaling_worker_node_pools : Array(Configuration::WorkerNodePool) { settings.worker_node_pools.select(&.autoscaling_enabled) }
private getter ssh_client : Util::SSH { Util::SSH.new(settings.networking.ssh.private_key_path, settings.networking.ssh.public_key_path) }
private getter network : Hetzner::Network?
private getter ssh_key : Hetzner::SSHKey
@@ -102,16 +102,16 @@ class Cluster::Create
"#{settings.cluster_name}-#{instance_type_part}#{prefix}#{index + 1}"
end

private def create_master_instance(index : Int32, placement_group : Hetzner::PlacementGroup?) : Hetzner::Instance::Create
legacy_instance_type = settings.masters_pool.legacy_instance_type
instance_type = settings.masters_pool.instance_type
private def create_master_instance(index : Int32, placement_group : Hetzner::PlacementGroup?, location : String) : Hetzner::Instance::Create
legacy_instance_type = masters_pool.legacy_instance_type
instance_type = masters_pool.instance_type

legacy_instance_name = build_instance_name(legacy_instance_type, index, true)
instance_name = build_instance_name(instance_type, index, settings.include_instance_type_in_instance_name)

image = settings.masters_pool.image || settings.image
additional_packages = settings.masters_pool.additional_packages || settings.additional_packages
additional_post_create_commands = settings.masters_pool.post_create_commands || settings.post_create_commands
image = masters_pool.image || settings.image
additional_packages = masters_pool.additional_packages || settings.additional_packages
additional_post_create_commands = masters_pool.post_create_commands || settings.post_create_commands

Hetzner::Instance::Create.new(
settings: settings,
@@ -125,16 +125,20 @@ class Cluster::Create
network: network,
placement_group: placement_group,
additional_packages: additional_packages,
additional_post_create_commands: additional_post_create_commands
additional_post_create_commands: additional_post_create_commands,
location: location
)
end

private def initialize_master_instances
masters_pool = settings.masters_pool
placement_group = create_placement_group_for_masters
location_counts = Hash(String, Int32).new(0)

Array(Hetzner::Instance::Create).new(masters_pool.instance_count) do |i|
create_master_instance(i, placement_group)
location = masters_locations.min_by { |loc| location_counts[loc] }
location_counts[location] += 1

create_master_instance(i, placement_group, location)
end
end

@@ -157,7 +161,7 @@
instance_name: instance_name,
instance_type: instance_type,
image: image,
location: node_pool.location,
location: node_pool.location || default_masters_Location,
ssh_key: ssh_key,
network: network,
placement_group: placement_group,
@@ -304,7 +308,7 @@ class Cluster::Create
@load_balancer = Hetzner::LoadBalancer::Create.new(
settings: settings,
hetzner_client: hetzner_client,
location: configuration.masters_location,
location: default_masters_Location,
network_id: network.try(&.id)
).run

@@ -332,10 +336,18 @@ class Cluster::Create
settings: settings,
hetzner_client: hetzner_client,
network_name: settings.cluster_name,
locations: configuration.locations
network_zone: ::Configuration::Settings::NodePool::Location.network_zone_by_location(default_masters_Location)
).run
end

private def masters_locations
masters_pool.locations.sort
end

private def default_masters_Location
masters_locations.first
end

private def find_or_create_network
find_existing_network(settings.networking.private_network.existing_network_name) || create_new_network
end
@@ -359,4 +371,8 @@ class Cluster::Create
settings: settings
).run
end

private def masters_pool
settings.masters_pool
end
end
14 changes: 7 additions & 7 deletions src/configuration/loader.cr
@@ -46,15 +46,15 @@ class Configuration::Loader
Path[settings.kubeconfig_path].expand(home: true).to_s
end

getter masters_location : String | Nil do
settings.masters_pool.try &.location
getter masters_pool : Configuration::MasterNodePool do
settings.masters_pool
end

getter instance_types : Array(Hetzner::InstanceType) do
hetzner_client.instance_types
end

getter locations : Array(Hetzner::Location) do
getter all_locations : Array(Hetzner::Location) do
hetzner_client.locations
end

@@ -135,9 +135,9 @@ class Configuration::Loader
errors: errors,
pool: settings.masters_pool,
pool_type: :masters,
masters_location: masters_location,
masters_pool: masters_pool,
instance_types: instance_types,
locations: locations,
all_locations: all_locations,
datastore: settings.datastore
).validate
end
@@ -172,9 +172,9 @@ class Configuration::Loader
errors: errors,
pool: worker_node_pool,
pool_type: :workers,
masters_location: masters_location,
masters_pool: masters_pool,
instance_types: instance_types,
locations: locations,
all_locations: all_locations,
datastore: settings.datastore
).validate
end
7 changes: 4 additions & 3 deletions src/configuration/main.cr
@@ -1,6 +1,7 @@
require "yaml"

require "./node_pool"
require "./master_node_pool"
require "./worker_node_pool"
require "./datastore"
require "./manifests"
require "./embedded_registry_mirror"
@@ -15,8 +16,8 @@ class Configuration::Main
getter k3s_version : String
getter api_server_hostname : String?
getter schedule_workloads_on_masters : Bool = false
getter masters_pool : Configuration::NodePool
getter worker_node_pools : Array(Configuration::NodePool) = [] of Configuration::NodePool
getter masters_pool : Configuration::MasterNodePool
getter worker_node_pools : Array(Configuration::WorkerNodePool) = [] of Configuration::WorkerNodePool
getter post_create_commands : Array(String) = [] of String
getter additional_packages : Array(String) = [] of String
getter kube_api_server_args : Array(String) = [] of String
5 changes: 5 additions & 0 deletions src/configuration/master_node_pool.cr
@@ -0,0 +1,5 @@
require "./node_pool"

class Configuration::MasterNodePool < Configuration::NodePool
property locations : Array(String) = ["fsn1"] of String
end
3 changes: 1 addition & 2 deletions src/configuration/node_pool.cr
@@ -4,13 +4,12 @@ require "./node_label"
require "./node_taint"
require "./autoscaling"

class Configuration::NodePool
abstract class Configuration::NodePool
include YAML::Serializable

property name : String?
property legacy_instance_type : String = ""
property instance_type : String
property location : String
property image : String | Int64 | Nil
property instance_count : Int32 = 1
property labels : Array(::Configuration::NodeLabel) = [] of ::Configuration::NodeLabel