Executive Summary

This document serves as a foundational reference for deploying highly available Kubernetes clusters with Talos Linux, ensuring security, performance, and operational excellence.

Kubernetes is the leading container orchestration platform, enabling organizations to efficiently deploy, manage, and scale applications. Talos Linux brings simplicity and security to bare-metal and edge Kubernetes, making infrastructure secure by default, easier to use, and more reliable to operate.

This architecture document provides a blueprint for deploying a standard Kubernetes cluster using Talos Linux, outlining best practices for management, security, high availability, and disaster recovery to deliver a scalable and resilient platform for cloud native applications. While there are many options for deploying functions in Kubernetes, this is our recommended architecture.

Technology Overview

Talos Linux

Talos Linux is an open source Linux operating system (OS) purpose-built for Kubernetes. It operates entirely through an API, with no SSH or shell access, providing a highly secure and minimal operating system for Kubernetes clusters.

Key features include:

Vanilla Upstream Kubernetes

Talos Linux deploys upstream Kubernetes without modifications, ensuring full compatibility with the Kubernetes ecosystem.

By running pure upstream Kubernetes, Talos Linux provides a reliable, community-aligned foundation for cloud native workloads.

Solution Overview

This reference architecture document targets high availability (HA) for Kubernetes, leveraging Talos Linux as the OS. Other architectures are possible with Talos Linux (including single node clusters, clusters that allow workload scheduling on control planes, and even clusters that span datacenters), but this document focuses on a standard HA cluster with dedicated control plane nodes.

Cluster Architecture

Control Plane Nodes

Control plane nodes host the critical components responsible for managing the cluster’s state, coordinating nodes, and providing the API server for interaction. As such, it is essential that they are secured, available, and performant.

Kubernetes uses etcd on control plane nodes as a distributed datastore that provides high availability, fault tolerance, and performance.

Best Practices for Provisioning Control Plane Nodes

Beyond this, sizing depends on the workload. We recommend gradually scaling up the workload on the cluster while monitoring the control plane nodes. If memory or CPU usage exceeds 60% of capacity, increase the CPU or memory available to the control plane nodes. This ensures that the control plane has sufficient headroom to handle resource spikes without compromising the stability of the cluster.
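As an illustration, the sketch below shows Prometheus alerting rules that fire when control plane CPU or memory usage stays above 60%. It assumes node-exporter metrics are scraped from the control plane nodes and that those series carry a role="control-plane" label; the threshold, label, and rule names are placeholders to adapt to your environment.

# Hypothetical Prometheus alerting rules for control plane capacity.
# Assumes node-exporter metrics labelled with role="control-plane".
groups:
  - name: control-plane-capacity
    rules:
      - alert: ControlPlaneCPUHigh
        # Average CPU utilization over 5 minutes above 60%
        expr: >
          1 - avg by (instance)
          (rate(node_cpu_seconds_total{role="control-plane",mode="idle"}[5m])) > 0.60
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Control plane node {{ $labels.instance }} CPU usage above 60%"
      - alert: ControlPlaneMemoryHigh
        # Memory utilization above 60% of total
        expr: >
          (1 - node_memory_MemAvailable_bytes{role="control-plane"}
          / node_memory_MemTotal_bytes{role="control-plane"}) > 0.60
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Control plane node {{ $labels.instance }} memory usage above 60%"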

Best Practices for Configuring Control Plane Nodes

These practices are the default configuration on Talos Linux. We mention them to ensure that they are not overridden in deployment.

Networking and CNI

By default, Talos Linux will install the Flannel CNI. 

Flannel is an appropriate choice for many enterprises:

Cilium is a supported option on Talos Linux and is selected by many enterprises with the following requirements:

We advise against changing the CNI after it has been deployed; although possible, doing so can cause major disruption unless handled with great care. Talos Linux installs Flannel by default. To install Cilium instead, override the machine config to specify that no CNI should be installed initially.
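A machine config patch along these lines (a sketch based on the standard Talos machine configuration schema) tells Talos not to install a CNI; disabling kube-proxy is optional and only appropriate when Cilium's kube-proxy replacement will be used:

# Machine config patch: do not install a CNI automatically.
cluster:
  network:
    cni:
      name: none
  # Optional: disable kube-proxy only if Cilium's kube-proxy replacement is used.
  proxy:
    disabled: true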

It is then recommended to deploy Cilium using one of these four documented methods.

Storage

While there are many different Kubernetes storage options, and Talos Linux will work with most of them, we generally recommend:

Load Balancing

In IaaS (Infrastructure as a Service) environments (AWS, GCP, Azure, etc), we recommend use of the cloud provider’s native load balancing services.

In bare metal environments, we recommend MetalLB or KubeVIP for most use cases.
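As an illustration, a minimal MetalLB Layer 2 configuration might look like the sketch below; metallb-system is MetalLB's default namespace, and the address range is a placeholder to replace with addresses routable in your environment.

# Hypothetical MetalLB Layer 2 setup; the address pool is an example range.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.200-192.168.10.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool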

Monitoring and Logging

We recommend Prometheus for small deployments or a small number of clusters, and VictoriaMetrics for larger or more complex deployments.

We recommend Grafana and Loki for observability and logging.

Note that systems running Talos Linux are compatible with most monitoring solutions. The tools above are simply good choices for Kubernetes infrastructure with which we and our customers have had success; they do not preclude the use of other systems.

Tuning

If performance is more important than minimizing power consumption, we recommend applying the appropriate performance settings, as documented in the latest performance tuning page.

Talos Linux Extensions

Talos Linux extensions provide additional functionality beyond the base OS. They are recommended for:

To install an extension, a custom Talos Linux installer image must be built with the desired extensions bundled into it. Such an image can be produced through the hosted service at factory.talos.dev. This means that, aside from configuration, the deployed Talos OS cannot be modified without replacing the image via an upgrade to a newly built custom image.

More information can be found at:

Security Considerations

Talos Linux is secure with a default install, but there are additional capabilities that can be enabled. Note that there is overhead with additional security features, whether in operational complexity or node performance.

SecureBoot and Disk Encryption with TPM support

Talos Linux supports SecureBoot in order to protect against boot-level malware attacks. Talos Linux implements a fully signed execution path from firmware to userspace. When used in conjunction with the TPM-based disk encryption that Talos Linux supports, this provides very strong protection, guaranteeing that only trusted Talos Linux can run even against an attacker with physical access to the server.

SecureBoot does complicate OS installation and upgrades, so it is not recommended for every case and requires specific planning.
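As a sketch, and assuming a Talos release with TPM-based disk encryption support, encrypting the STATE and EPHEMERAL partitions with keys sealed in the TPM looks roughly like the machine config below; consult the disk encryption documentation for the exact options available in your version.

# Encrypt the STATE and EPHEMERAL partitions, sealing the keys in the TPM.
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}
    ephemeral:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}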

Ingress Firewall

The Talos Ingress Firewall provides an additional security layer by controlling inbound network traffic at the OS level. The Ingress Firewall defaults to allowing all traffic. For a production setup, we recommend a default-deny posture: block all traffic that is not explicitly permitted and allow only the required traffic to and from classes of machines, such as control plane nodes and workers. This may be appropriate where:

For an example of recommended rules, see the documentation.
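As a sketch only, a default-deny posture that still admits Kubernetes API and Talos API traffic from a trusted subnet could be expressed with machine config documents along these lines; the subnet is a placeholder, and the full recommended rule set is in the documentation.

# Default: block all ingress traffic that is not explicitly allowed.
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
# Allow Kubernetes API server access from a trusted subnet (example range).
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kubernetes-api
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 10.0.0.0/24
---
# Allow Talos API (apid) access from the same subnet.
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: apid-ingress
portSelector:
  ports:
    - 50000
  protocol: tcp
ingress:
  - subnet: 10.0.0.0/24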

KubeSpan

KubeSpan provides transparent wire-level network encryption between all nodes in a cluster and simplifies network management by joining the nodes into a private, secure, cluster-wide mesh VPN. KubeSpan is well suited for use cases that involve bursting from one network (e.g. bare metal) to another (e.g. a cloud provider) for extra capacity. Because KubeSpan currently creates a full mesh network, it is not recommended for clusters with more than 100 nodes.
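Enabling KubeSpan is a single machine config setting, sketched below; it should be applied to every node in the cluster, and node discovery (enabled by default) must remain enabled.

# Enable KubeSpan on this node; apply the same setting to all nodes.
machine:
  network:
    kubespan:
      enabled: true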

OS and Kubernetes Authentication

The Kubernetes API server supports the use of an external authentication provider, so that only authenticated and authorized users can access a given cluster. Adopting such a mechanism is highly recommended for production clusters. Any OAuth or OIDC provider is supported, albeit with varying levels of supporting configuration and authentication proxies. This eliminates the risk of an employee leaving the enterprise with admin-level kubeconfig access to a cluster, and simplifies compliance.
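As an illustrative sketch, OIDC authentication can be configured on the API server through the machine config roughly as follows; the issuer URL, client ID, and claim names are placeholders for your identity provider's values.

# Hypothetical OIDC settings; replace with your identity provider's values.
cluster:
  apiServer:
    extraArgs:
      oidc-issuer-url: https://idp.example.com
      oidc-client-id: kubernetes
      oidc-username-claim: email
      oidc-groups-claim: groups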

PodSecurity Standards

Since Kubernetes v1.23, Talos Linux enables the baseline PodSecurity Standard by default. This means that workloads cannot use privileged SecurityContexts or host-level resources, such as host network access, hostPath volumes, and privileged containers.

For more information, see here.
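Profiles can also be tightened per namespace using the standard Pod Security Admission labels; for example, a namespace could opt into the stricter restricted profile as sketched below (the namespace name is a placeholder).

# Hypothetical namespace enforcing the stricter "restricted" profile.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted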

Cluster Upgrades

Since Talos Linux is image-based and uses an A/B boot system, upgrades between OS releases will either succeed and boot into the new release or fail and revert to the previous one, never leaving the node in a broken state.

Kubernetes upgrades are managed separately from Talos upgrades: the Kubernetes version is upgraded by replacing components in the running cluster with new versions and migrating any related resources.

Cluster Reproducibility

Bringing up a Kubernetes cluster with Talos Linux starts with configuration. Because the cluster configuration is declarative, it fully describes the cluster: given an existing configuration, a cluster can simply be redeployed.

Application Management

We recommend ArgoCD for declarative GitOps-based management of applications, where one or more git repositories act as the source-of-truth for what is deployed on one or more clusters.

We recommend using Omni cluster templates for the initial configuration and deployment of ArgoCD (as demonstrated here) and then using Argo to manage the applications.
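As a sketch, an ArgoCD Application tracking a Git repository looks roughly like the following; the repository URL, path, and names are placeholders.

# Hypothetical ArgoCD Application; repoURL, path and names are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-apps.git
    targetRevision: main
    path: apps/example-app
  destination:
    server: https://kubernetes.default.svc
    namespace: example-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true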

Further Configuration

There is an ever-expanding range of choices available when setting up Kubernetes clusters. While common best practices work in some scenarios, they may not be suitable for others.

Sidero Professional Services can work with you to architect a configuration that is tailored to your deployment.

Additional Software

The Kubernetes ecosystem contains many options for other functions that may or may not be desired for any particular deployment. Additional Kubernetes software can be used to address certain needs such as container security and compliance, namespace controls, policy compliance and governance, CI/CD tooling, secret management, and more. This reference architecture document does not express opinions on these functions. However, because Talos Linux deploys vanilla upstream Kubernetes, such clusters are compatible with virtually any of the options enterprises may be using for these functions. For enterprises that would like specific recommendations for their use cases, either Sidero Labs or one of our consulting partners can be engaged for consultation.

Progressive migration from Virtual Machines

KubeVirt on Talos Linux is a proven, reliable way to migrate legacy workloads from Virtual Machines into Kubernetes to run alongside containerized applications. With KubeVirt, Virtual Machines can talk to Pods (and vice versa) and can also be exposed like regular Pods through Services, Ingress, the Gateway API, and more. Retaining Virtual Machines and running them on Kubernetes is a helpful way to balance the needs of your organization and to ease into modernizing. Existing virtualized workloads can be imported from providers such as VMware vSphere, OVA, oVirt, and OpenStack using tooling like Forklift.

To run Virtual Machines effectively in KubeVirt, it is best to have the following:

Because KubeVirt configuration is declarative, workloads running through KubeVirt can be managed with GitOps tools like ArgoCD. Running Virtual Machines on Talos Linux through KubeVirt is suitable for bare metal, datacenter, and edge deployments.
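To give a flavor of what this looks like, a minimal KubeVirt VirtualMachine manifest might resemble the sketch below; the container disk image and sizing are placeholders, and a workload imported from an existing hypervisor would typically reference an imported disk (for example a CDI DataVolume) rather than a containerDisk.

# Hypothetical minimal VirtualMachine; image and resources are placeholders.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: legacy-app-vm
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: legacy-app-vm
    spec:
      domain:
        cpu:
          cores: 2
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest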

Contact us

We would love to hear from you about how you are using Talos Linux, and to support your deployments.

Please find contact details at siderolabs.com/contact.

Hobby

For home labbers
$10 per month for 10 nodes
  • Includes 10 nodes in base price
  • Limited to 10 nodes, 1 user
  • Community Support

Startup

Build right
$250 per month for 10 nodes
  • Includes 10 nodes in base price
  • Additional nodes priced per node, per month
  • Scales to unlimited Clusters, Nodes and Users
  • Community Support

Business

Expert support
$600 per month for 10 nodes
  • Volume pricing
  • Scales to unlimited Clusters, Nodes and Users
  • Talos Linux, Omni and Kubernetes support from our experts
  • Business hours support with SLAs
  • Unlimited users with RBAC and SAML

Enterprise

Enterprise Ready
$1,000 per month for 10 nodes
  • Business plan features, plus...
  • Volume pricing
  • 24 x 7 x 365 Support
  • Fully Managed Option
  • Can Self Host On Prem
  • Supports Air-Gapped
  • Private Slack Channel

Edge

Manage scale
Call for pricing, starting at 100 nodes
  • Pricing designed for edge scale
  • 24 x 7 x 365 Support with SLAs
  • Only outgoing HTTPS required
  • Secure node enrollment flows
  • Reliable device management
  • Can Self Host On Prem
  • Private Slack Channel