KubePrism – Improving Kubernetes workload availability by preventing Kubernetes API endpoint outages

If you are running a production, highly available Kubernetes cluster, you want multiple control plane nodes for fault tolerance. In this scenario, it is common to have an external load balancer to provide high availability for the Kubernetes API endpoint, distributing requests amongst the control plane nodes, ensuring requests are sent only to nodes that are responding, and not to nodes that are unavailable. (Control plane nodes will each run a copy of the kube-apiserver.)

Load balancers are generally highly available, but they can fail due to misconfiguration, overload, or network problems between the cluster and the load balancer. So what happens when the load balancer has issues?

If the Kubernetes cluster is otherwise healthy, but the load balancer for the Kubernetes API endpoint has failed, we might hope that we would just lose kubectl access to the cluster while everything else kept working perfectly – but we would be disappointed!

Processes like CoreDNS, which maps service names (e.g. kubernetes.default.svc) to service IPs, and kube-proxy, which maps service IPs to pod IPs, are essential to Kubernetes operation – and both use the Kubernetes API endpoint to query the kube-apiserver on the control plane nodes for the correct mappings. Similarly, the kubelet running on each node queries the API endpoint to determine which pods to run – if the kubelet can’t reach the API endpoint, no new pods will be started on that node, and so on. The same is true of other essential processes in the cluster. So, if the API endpoint is unresponsive because the load balancer is down, many functions of an otherwise healthy cluster will stop working. (Note that some functions will keep working for a while, due to caching.)

A very visible impact of the API endpoint being unresponsive is that ingress controllers will go down. Effectively, you would have an outage of the applications served through your ingress controllers, even though your cluster is healthy, simply because an external load balancer (which is not even in the path of the application traffic itself) is down.

KubePrism, introduced in version 1.5 of Talos Linux, our operating system for Kubernetes, solves this problem by enabling a tiny load balancer on every machine in the cluster, bound to localhost. This load balancer proxies traffic to the API server in an intelligent way to increase availability and performance.

KubePrism sends API traffic not just to the external load balancer: it also considers all reachable control plane nodes and, on control plane nodes, the localhost API endpoint, and selects the set of endpoints with the lowest latency. This ensures that, so long as a worker can reach any of the control plane nodes directly, it will keep functioning correctly even if the external API endpoint fails. It also improves cluster performance, thanks to lower latency on API requests and responses, and less network traffic.

KubePrism doesn’t prevent you from losing external access to the cluster API (e.g. for kubectl) when the external load balancer fails, but all internal functions of the cluster will keep working.

How it works

The KubePrism feature of Talos spins up a TCP load balancer on every machine, listening on localhost on the specified port, which automatically picks one of the following endpoints as the destination for API traffic:

  • the external cluster endpoint, as specified in the machine configuration
  • for control plane machines, the local port of the API server running on that machine
  • the API port of every control plane machine (based on information from Cluster Discovery)

KubePrism automatically filters out unhealthy (or unreachable) endpoints, and selects lower-latency endpoints over higher-latency endpoints. (For control plane nodes, this will be their local endpoint, which means that API traffic never even leaves the node.)

Enabling KubePrism is as simple as applying a patch to a cluster running Talos version 1.5 or later:

machine:
  features:
    kubePrism:
      enabled: true
      port: 7445
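
For example, if the patch above is saved as kubeprism.yaml, one way to apply it is with talosctl (the node IP here is a placeholder for one of your machines):

talosctl --nodes <node IP> patch machineconfig --patch @kubeprism.yaml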

Talos automatically reconfigures the kubelet, kube-scheduler, and kube-controller-manager to use the KubePrism endpoint. The kube-proxy manifest is also reconfigured to use the KubePrism endpoint by default. (Note that when enabling KubePrism on a running cluster, the manifest should be updated with the talosctl upgrade-k8s command.)
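
In practice, this means that on each machine these components talk to the Kubernetes API via https://localhost:7445 (assuming the default port shown above). As a rough illustration – not the exact file Talos generates – the relevant fragment of such a kubeconfig looks like this:

apiVersion: v1
kind: Config
clusters:
  - name: local
    cluster:
      server: https://localhost:7445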

When using CNI components that require direct access to the Kubernetes API server (e.g. the Cilium or Canal CNIs), you should ensure that the KubePrism endpoint is passed to the CNI configuration.
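
For example, when installing Cilium with Helm, pointing it at the KubePrism endpoint might look like the Helm values below (a sketch assuming the default KubePrism port of 7445 – check the Cilium and Talos documentation for your versions):

k8sServiceHost: localhost
k8sServicePort: 7445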

We think that KubePrism alone is a compelling reason to upgrade to version 1.5 of Talos Linux – it can significantly improve the availability of applications running on Kubernetes, by insulating them from issues in the load balancer used for the Kubernetes API endpoint, and it also improves cluster responsiveness by minimizing latency for API requests.

We particularly recommend it for Omni users, as Omni, our SaaS for Kubernetes, creates the Kubernetes API endpoint in the highly available infrastructure of Sidero Labs – but that is separated from the workers and control plane nodes by the Internet, increasing the chance of a transient reachability issue that could make API calls to the endpoint fail.

And when you consider all the other good stuff in Talos Linux 1.5 (TrustedBoot, anyone?) we recommend everyone upgrade.

For more details on KubePrism, see the documentation. For a deep dive, including a demonstration of enabling KubePrism on a live cluster and watching an ingress recover, see the video below:
