Scaling a Bare Metal Cluster into the Cloud

This is a guest post by Mathias Pius, who has dual experience as a Software Engineer and in Operations & Infrastructure, giving him keen insight into the unique challenges of each discipline and, importantly, into the contact point between them.

With the pendulum of Cloud vs. On-premises swinging slightly towards on-premises again, some might be considering moving their Kubernetes clusters onto bare metal while nursing their scars from the last time they tried managing their control plane using kubeadm. Others have been using bare metal the whole time, but have been hamstrung by lead times on hardware or regulatory pressure to keep sensitive information on-premises.

With a hybrid cluster, you can have your cake and eat it too – running the main cluster on-premises, but allowing it to expand and scale into a public cloud – and Talos Linux makes it easy. Talos is a Linux-based operating system built specifically for running Kubernetes. It lets you manage your entire machine state through a single configuration file, significantly reducing the maintenance burden of running and upgrading your cluster using its talosctl command line utility.

Starting Point

In this article, you’ll extend an existing 3-node Kubernetes cluster onto a cloud-hosted virtual machine over the internet. If you don’t already have a cluster, you can set one up using Docker, QEMU or VirtualBox. The process for getting a Kubernetes cluster running on Talos is the same regardless of the platform it runs on:

  • boot machines off the Talos Linux image
  • define the endpoint for the Kubernetes API and generate your machine configurations
  • configure Talos Linux by applying machine configurations to the machines
  • configure talosctl
  • bootstrap Kubernetes

For more information, see the Getting Started guide.
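
Compressed into commands, that flow looks roughly like the sketch below, where 10.0.0.2 is a placeholder control plane address and my-cluster a placeholder cluster name:

# generate controlplane.yaml, worker.yaml and talosconfig
$ talosctl gen config my-cluster https://10.0.0.2:6443
# apply the config to the node, which is still in maintenance mode
$ talosctl apply-config --insecure --nodes 10.0.0.2 --file controlplane.yaml
# point talosctl at the new node, then bootstrap etcd and Kubernetes
$ talosctl --talosconfig ./talosconfig config endpoint 10.0.0.2
$ talosctl --talosconfig ./talosconfig config node 10.0.0.2
$ talosctl --talosconfig ./talosconfig bootstrap
# retrieve a kubeconfig for use with kubectl
$ talosctl --talosconfig ./talosconfig kubeconfig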

In this case, the cluster is running Talos v1.6.4 and Kubernetes v1.29.1 and is made up of three bare metal nodes, all acting as control planes:

$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE     VERSION   INTERNAL-IP      OS-IMAGE         KERNEL-VERSION
n1      Ready    control-plane   250d    v1.29.1   159.69.60.182    Talos (v1.6.4)   6.1.74-talos
n2      Ready    control-plane   247d    v1.29.1   88.99.105.56     Talos (v1.6.4)   6.1.74-talos
n3      Ready    control-plane   247d    v1.29.1   46.4.77.66       Talos (v1.6.4)   6.1.74-talos

Inter-node communication happens over public IPs. No private networking is configured; each node in the cluster is therefore configured to explicitly block access to the Talos (50000-50001/tcp) and Kubernetes API (6443/tcp) ports from anything but the nodes within the cluster and the workstation we’ll be using to access the cluster.

All other traffic is allowed through the external firewall, so that more fine-grained control can be configured via network policies.
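
As an aside, if you’d rather keep this allow-list on the nodes themselves instead of (or in addition to) an external firewall, Talos 1.6 introduced a built-in ingress firewall that can express it as extra machine config documents. A minimal sketch, with 203.0.113.10 standing in for our workstation’s address – note that with the default action set to block, you would also need rules for any other traffic the nodes should accept:

apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: talos-and-kubernetes-api
portSelector:
  ports:
    - 6443
    - 50000
    - 50001
  protocol: tcp
ingress:
  - subnet: 159.69.60.182/32   # n1
  - subnet: 88.99.105.56/32    # n2
  - subnet: 46.4.77.66/32      # n3
  - subnet: 203.0.113.10/32    # workstation (placeholder)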

Potential Pitfalls

There are great reasons to hybridize your cluster, but there are a couple of things to watch out for:

  1. Latency. Your control plane nodes should always be close to each other! Etcd (the key-value store control planes use internally to track state) is very sensitive to latency, and geographically distributing these nodes might render your cluster unusable in the worst case. Even your workloads need to take latency into account. Running your web service on a 128-core 2TB memory node in the cloud is not going to do anything to compensate for an 800 millisecond round trip time to your on-premises database.
  2. Security. While all Kubernetes API traffic uses HTTPS by default and is therefore encrypted, it is entirely up to the Container Network Interface (CNI) plugin whether standard pod-to-pod traffic is encrypted when crossing the host boundary. Many network plugins like Cilium, Flannel, and Calico support this, but it has to be enabled and generally comes with some caveats (see the sketch below). As we shall see, Talos Linux has a solution for this, too.
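
For example, if your cluster runs Cilium installed via its Helm chart, Wireguard-based pod-to-pod encryption can be switched on with a couple of chart values – a sketch of the toggle, not a full installation guide:

$ helm upgrade cilium cilium/cilium --namespace kube-system \
    --reuse-values \
    --set encryption.enabled=true \
    --set encryption.type=wireguard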

The Chicken-and-Egg Problem

With a single-platform cluster, locking it down is pretty simple. Once you start opening the door to traffic between platforms, managing your firewall(s) suddenly becomes a lot more complicated, and complexity opens the gate for misconfiguration and expands your attack surface. A great way to solve this is to manage trust between nodes using end-to-end encryption directly.

This became a lot easier with the introduction of Wireguard. Instead of relying on your router supporting IPsec or OpenVPN, and hoping that your cloud provider supports it (without charging you an arm and a leg), you can now configure an extremely high-performance, full-mesh, node-to-node encrypted solution.

However, this solution does present a unique challenge: how do you establish an encrypted connection to a node you don’t yet know, when the only way it is allowed to communicate is over that very encryption layer?

There are only three solutions to this problem:

  • Scenario 1: Escape Hatch. Let some traffic bypass the encryption layer and communicate with the cluster directly. This is what Cilium does. It’s a decent solution, but it can potentially expose unencrypted traffic under some circumstances.
  • Scenario 2: Manually configured. Establish a trusted network beforehand, effectively doing all the heavy lifting yourself.
  • Scenario 3: Discovery Service. If you already have a shared secret that all nodes know when they’re provisioned, you can transmit encrypted node information (public key, endpoints) between nodes using a third party as a relay. As it turns out, this is exactly how Talos’ KubeSpan works!

Node-to-node encrypted mesh

KubeSpan is a feature built into Talos Linux which establishes a Wireguard-based mesh network between all nodes, similar to products like Tailscale, except all aspects of the mesh are managed by Talos itself. When Talos Linux nodes are configured to join a specific cluster, they receive cluster-specific security keys, which they use to encrypt the information they send to the discovery service – only other members of the same cluster can decrypt it, and the discovery service itself cannot decrypt the endpoint information. This sidesteps Cilium’s chicken-and-egg problem by using an external discovery service to distribute end-to-end encrypted public keys and node information, meaning all inter-node traffic is always encrypted. The only traffic you must allow across your firewall is the standard 51820/udp Wireguard port.

Enable KubeSpan Overlay Network

Turning the KubeSpan feature on is simple. If you’re deploying a new node (which we’ll do later), you can set the --with-kubespan flag when generating the MachineConfig using the talosctl gen config utility.

For an established cluster, setting the KubeSpan and Discovery feature selectors in each of your MachineConfigs to true is all it takes. For a small cluster, the simplest way to achieve this is to apply the following patch to all your nodes:

$ talosctl -n 159.69.60.182 patch machineconfig --patch '
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true'

Repeat the above for each node in your cluster, and you’re done!
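
With three nodes this is quick enough by hand, but it also loops nicely if you save the patch to a file first – a sketch using the node IPs above:

$ cat > kubespan-patch.yaml <<'EOF'
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true
EOF
$ for node in 159.69.60.182 88.99.105.56 46.4.77.66; do
    talosctl -n "$node" patch machineconfig --patch @kubespan-patch.yaml
  done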

From here, Talos will take care of all the public and private key management, routing and interface configuration required to establish the Wireguard connection between all the nodes.

We can inspect the resources Talos uses to manage this connection to verify that it’s working.

  1. Checking that our node has produced a KubeSpanIdentity, which corresponds more or less to a standard Wireguard configuration: a public key and a unique, locally addressable IPv6 address used to route traffic within the mesh network:
$ talosctl -n 159.69.60.182 get kubespanidentities
NODE            ADDRESS                                      PUBLICKEY
159.69.60.182   fd57:1f05:2cd4:2402:921b:eff:febd:9bda/128   zU7/U2J0Y8hfIakmsdT+lP7GrcbYrBoKsQbbNbMt20k=
  2. Seeing that it is aware of the other nodes via KubeSpanPeerSpecs:
$ talosctl -n 159.69.60.182 get kubespanpeerspecs
NODE            ID              LABEL   ENDPOINTS
159.69.60.182   GPQBNyo3Tv...   n2      ["88.99.105.56:51820"]
159.69.60.182   PU9nhYz7Ew...   n3      ["46.4.77.66:51820"]
  3. Verifying that traffic is getting routed between them using the KubeSpanPeerStatuses object:
$ talosctl -n 159.69.60.182 get kubespanpeerstatuses
NODE            ID              LABEL   ENDPOINT             STATE   RX         TX
159.69.60.182   GPQBNyo3Tv...   n2      88.99.105.56:51820   up      34464488   63721816
159.69.60.182   PU9nhYz7Ew...   n3      46.4.77.66:51820     up      9079656    22087720

With a peer state of up and both tx (transmitted bytes) and rx (received bytes) already showing megabytes of traffic, it’s clear that our mesh network is configured and working!

If the peers don’t show as up, check to ensure you’ve allowed traffic to each node on port 51820/udp.
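
You can also check what each node has learned about its peers from the discovery service, which Talos tracks as affiliates (output omitted here):

$ talosctl -n 159.69.60.182 get affiliates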

Security: Unlike services such as secure shell (SSH), which uses stateful TCP to establish and keep its connection to the target server alive, Wireguard uses UDP and silently drops incoming packets that present keys the node is not configured to accept. This means that allowing traffic from the internet to this port is perfectly safe and reasonable, assuming you can keep your Wireguard keys safe. To a potential attacker, a closed Wireguard port is indistinguishable from an open Wireguard port probed with invalid credentials. Wireguard’s cryptographic keys are more than sufficiently secure to ensure they can’t be broken.

This comes in handy in the next section, where we’ll provision a cloud VPS with an as-yet-unknown IP address – allowing access from anywhere saves us the trouble of going back and modifying our firewall policies every time we provision a new server.

Creating the Talos Image for Hetzner

We’ll configure a virtual server in Hetzner Cloud. For some platforms, Talos provides ready-made images – official AMIs for setting up EC2 instances in AWS, for example – but Hetzner Cloud has no equivalent of the AMI marketplace, so we’ll have to get our Talos image into the cloud ourselves.

Hetzner offers three simple ways of doing this:

  1. Boot a virtual machine in Rescue Mode, which starts the server with an ephemeral operating system loaded into RAM. This lets us overwrite the server’s boot disk, take a snapshot of the disk, and then delete the server itself.
  2. Use HashiCorp’s Packer utility to do all this for us automatically.
  3. Email Hetzner and ask them to upload the image for you, making it available in their cloud console, though this might take quite a bit longer.

Packer is an excellent solution to this problem: it makes upgrading your base image to newer versions of Talos easier and much less error-prone than manually booting and wget-ing files onto a disk every time a minor version of Talos is released.

Go ahead and install Packer using your package manager, or get the binary itself directly from the Packer website.
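
For example, on macOS via Homebrew or on Debian/Ubuntu from HashiCorp’s apt repository – adjust for your own platform:

# macOS (Homebrew)
$ brew tap hashicorp/tap && brew install hashicorp/tap/packer

# Debian/Ubuntu (HashiCorp apt repository)
$ curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
$ echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
$ sudo apt-get update && sudo apt-get install -y packer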

Next, we’ll need a Packer configuration describing how our image will be built. The official guide for deploying Talos to Hetzner Cloud has a great Packer config file for creating the image. We’ll use it in a slightly abbreviated form, since we don’t care about building our images for platforms other than amd64, and all our servers will be running in the Helsinki (hel1) datacenter:

# hcloud.pkr.hcl
packer {
  required_plugins {
    hcloud = {
      source  = "github.com/hetznercloud/hcloud"
      version = "~> 1"
    }
  }
}

variable "talos_version" {
  type    = string
  default = "v1.6.4"
}

locals {
  image = "https://github.com/siderolabs/talos/releases/download/${var.talos_version}/hcloud-amd64.raw.xz"
}

source "hcloud" "talos" {
  rescue       = "linux64"
  image        = "debian-11"
  location     = "hel1"
  server_type  = "cx11"
  ssh_username = "root"

  snapshot_name   = "talos-${var.talos_version}"
  snapshot_labels = {
    type    = "infra",
    os      = "talos",
    version = "${var.talos_version}",
    arch    = "amd64",
  }
}

build {
  sources = ["source.hcloud.talos"]

  provisioner "shell" {
    inline = [
      "apt-get install -y wget",
      "wget -O /tmp/talos.raw.xz ${local.image}",
      "xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync",
    ]
  }
}

With Packer installed and our configuration saved to hcloud.pkr.hcl, we’re almost ready to build the image. The only thing left to do is to create an API token for Hetzner Cloud with Read & Write permissions so Packer can provision servers and store snapshots on our behalf. Creating the token is done through the Console under Security -> API Tokens. Keep it saved somewhere safe; we’ll need it for Packer and later use it to provision our servers.

Let’s export our new token and build our image:

$ export HCLOUD_TOKEN=pW0p2lsDPwNhluykbkAlLPoLQQHLuVUaIN5Naf1ykB1Cq97zyuun5k3I0etDpblm
$ packer init .
$ packer build .

Side note: Packer is designed similarly to Terraform: most non-core functionality is implemented by “plugins” that Packer communicates with to do all the provider-specific tasks. The packer init . command above looks at our config file, notices that we’re using github.com/hetznercloud/hcloud, and automatically fetches that plugin for us.

At this point, your terminal should be full of colored output, hopefully concluding in something like this:

==> Builds finished. The artifacts of successful builds are:
--> hcloud.talos: A snapshot was created: 'talos-v1.6.4' (ID: 150744957)

If we now look under Server -> Snapshots in the Hetzner Cloud Console, we should see a recently created image named talos-v1.6.4, with a size of around 300MB.

Because Talos is completely unconfigured and boots into Maintenance Mode by default, we can re-use this one image for all the virtual machines we’ll need.

Setting up the firewall

Since we’re using KubeSpan, the only port we’ll need to have open is the 51820/udp Wireguard port, open to traffic from anywhere.

Hetzner Cloud provides its own command line interface tool, hcloud, which makes it easy to manage your servers and firewalls directly from the terminal. These steps can also be done through the Hetzner Cloud Console, but the process is more straightforward to explain using a terminal.

The first step is configuring a context, which is just a way to associate our API Key with an easily recognizable name. We’ll call ours talos:

$ hcloud context create talos

If the HCLOUD_TOKEN environment variable is still set from when we built the image using Packer, hcloud will offer to import it for you; otherwise, you will have to enter the key again.

This new context will automatically get set as the active one, which means we’re all set. As a trial run, let’s make sure our image is listed:

$ hcloud image list --type snapshot
ID          TYPE       DESCRIPTION    ARCHITECTURE   IMAGE SIZE   DISK SIZE   CREATED
150744957   snapshot   talos-v1.6.4   x86            0.29 GB      20 GB       Wed Feb 21 11:43:26 CET 2024

Specifying --type snapshot will exclude all the official Hetzner Cloud images, which we aren’t interested in.
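
Since we attached labels to the snapshot in the Packer config, we can also filter on those – handy once snapshots for multiple Talos versions accumulate:

$ hcloud image list --type snapshot --selector os=talos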

Next, let’s create a firewall that blocks all traffic except KubeSpan. Since the node can be configured entirely via user data later, we won’t need access to the Talos API. With only one rule, we can do it all from the command line:

$ hcloud firewall create --name talos-workers --rules-file - <<< '[
    {
        "description": "Allow KubeSpan Traffic",
        "direction": "in",
        "port": "51820",
        "protocol": "udp",
        "source_ips": [
            "0.0.0.0/0"
        ]
    }
]'
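
As a quick sanity check, we can confirm the firewall and its single rule were created as intended:

$ hcloud firewall describe talos-workers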

Joining a node

Now comes the fun part: extending the cluster past its current environment.

Let’s create a MachineConfig for our new worker node. When you created your cluster initially, the talosctl gen config command would have output files for both control plane and worker nodes. If you no longer have the worker.yaml file, you can recreate it with the same command, as long as you still have access to your cluster’s secrets.yaml file:

$ talosctl gen config                 \
  --with-kubespan                     \
  --with-secrets secrets.yaml         \
  --output-types worker               \
  your-cluster-name-here              \
  https://<Cluster API Endpoint>:6443
generating PKI and tokens
Created worker.yaml

Our worker’s MachineConfig has now been written to worker.yaml, which means we’re almost done!

From here, spinning up the new server is the easy part:

$ hcloud server create --name talos-worker-1 \
    --image 150744957         \
    --type cx21               \
    --location hel1           \
    --firewall talos-workers  \
    --user-data-from-file worker.yaml

The server will now be provisioned, after which the CLI tool will print some information about it; most of it we won’t need, since all the configuration happens automatically via the user data.
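
The one piece of that output worth noting is the server’s public IP; if you missed it, hcloud can repeat it for you (this is the address we query with talosctl below):

$ hcloud server ip talos-worker-1
95.216.166.208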

We should see our worker node joining our cluster if everything goes as planned:

$ watch kubectl get nodes
NAME             STATUS   ROLES           AGE     VERSION
n1               Ready    control-plane   243d    v1.29.1
n2               Ready    control-plane   239d    v1.29.1
n3               Ready    control-plane   239d    v1.29.1
talos-worker-1   Ready    <none>          7m55s   v1.29.1

And there it is, our talos-worker-1 server is Ready! Checking out the KubeSpanPeerStatuses again, we can see that the node has joined the mesh network and is communicating with our control plane nodes, with all traffic within the cluster automatically encrypted on the wire, node to node:

$ talosctl -n 95.216.166.208 get kubespanpeerstatuses
NODE             NAMESPACE   ID                                             LABEL   ENDPOINT              STATE   RX        TX
95.216.166.208   kubespan    GPQBNyo3TvUtCitG6hVTaEVYeQuxe+/V87xr5MCxnSo=   n2      88.99.105.56:51820    up      3490040   650808
95.216.166.208   kubespan    PU9nhYz7Ew/BVBZA1c6D6xBY9ZE2iwJURBPT8cWpxw0=   n3      46.4.77.66:51820      up      135604    80020
95.216.166.208   kubespan    zU7/U2J0Y8hfIakmsdT+lP7GrcbYrBoKsQbbNbMt20k=   n1      159.69.60.182:51820   up      101908    75824

And with that, we’re all set! Installing tools, configuring the image, and setting up the firewall took some legwork, but deploying the server itself was painless – and we can now quickly scale to 1, 3, 20, or even hundreds of workers! Of course, managing these nodes long-term and keeping them all updated, while easy to automate with the talosctl CLI, is much simpler with Sidero Labs’ Omni solution.

Encore

Since all node-to-node traffic now happens over the encrypted KubeSpan/Wireguard tunnel, we can simplify our network setup by removing all the rules that allow node-to-node traffic directly – the Wireguard 51820/udp port is already open.

Without KubeSpan, we would have had to configure firewall rules for every node pair, meaning each node we add requires a new rule on every existing node!
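
Expressed in the same JSON rule format we used for the cloud firewall, the entire per-node allow-list shrinks to a sketch like this, with 203.0.113.10 again standing in for the workstation address:

[
    {
        "description": "Allow KubeSpan Traffic",
        "direction": "in",
        "port": "51820",
        "protocol": "udp",
        "source_ips": ["0.0.0.0/0"]
    },
    {
        "description": "Talos API from the workstation only",
        "direction": "in",
        "port": "50000-50001",
        "protocol": "tcp",
        "source_ips": ["203.0.113.10/32"]
    },
    {
        "description": "Kubernetes API from the workstation only",
        "direction": "in",
        "port": "6443",
        "protocol": "tcp",
        "source_ips": ["203.0.113.10/32"]
    }
]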

Summary

We’ve taken an ordinary Talos Linux Kubernetes cluster, running on bare metal, and extended it onto a single Hetzner node. Just by enabling KubeSpan, we no longer had to be concerned about security – all node-to-node traffic within the cluster is fully encrypted, while traffic exiting the cluster is not. All done automatically.

This allows you to take advantage of the strengths of different providers – bare metal for predictable cost control and performance, and cloud providers for bursting out on demand, or for geolocating workloads for low latency.
