Scheduling workloads on control plane nodes in Kubernetes – a bad idea?

Yes, running workloads on control plane nodes is generally a bad idea, but it’s a trade-off. If you have no constraints on cost or hardware availability and want to maximize security and stability, three dedicated control plane nodes are best. Control plane nodes run the critical components of a Kubernetes cluster, which manage the overall state of the cluster: scheduling workloads, monitoring the health of nodes and pods, and maintaining the desired state of the system.

Scheduling workloads on control plane nodes can have negative consequences. Workloads compete with the control plane components for CPU and memory, which can degrade the performance of the whole cluster: slower response times, increased latency to launch pods, and a worse user experience.

Running workloads on control plane nodes can also make the cluster less stable. If a workload has no memory limit and consumes all the memory on the node, it can slow etcd down or get etcd killed by the out-of-memory killer, and either outcome is bad for your cluster. It can even make it hard to recover from workload issues without creating further risk. If a workload is thrashing the CPU or exhausting memory on a worker node, it’s easy to recover by simply shutting that node down. If that node is a control plane node, doing so can put your etcd quorum at risk.
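To make the quorum risk concrete: etcd needs a majority of its members to be available in order to keep accepting writes. For an etcd cluster of n members, the quorum size q and the number of member failures f it can tolerate are:

$$
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f(n) = n - q(n)
$$

Assuming one etcd member per control plane node, three control plane nodes give q(3) = 2 and f(3) = 1: shutting down one misbehaving control plane node still leaves a quorum, but there is no margin for another failure until it comes back. A single control plane node gives f(1) = 0, and two nodes also give f(2) = 0, which is why three is the usual minimum for a highly available control plane.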

Allowing workloads on your control plane also reduces security. In theory, a workload container could exploit a local vulnerability on a control plane node and obtain secrets it should not have access to. (If you control all the workloads, this may be less of a concern.)

However, if you are constrained in how many machines you have available, or are willing to pay for, you can run workloads on the control plane nodes. This is common among the many people who run Kubernetes with Talos Linux at home on Raspberry Pis or other single-board computers (SBCs), or at the edge. If you do, be strict about setting resource requests and limits on your workloads. With Talos Linux, you allow workloads to run on control plane nodes by setting:

   cluster:
     allowSchedulingOnControlPlanes: true

in the machine config. (Before Talos 1.3, this setting was called allowSchedulingOnMasters.) This setting stops Talos from applying the control plane taint to the node at boot. If you change it after the node has booted, you have to remove the taint yourself or reboot the node.
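As a minimal illustration of the advice above to be strict about resource requests and limits, here is a sketch of a Pod whose container declares both. The Pod name, image, and CPU and memory values are hypothetical placeholders, not recommendations; size them for your own workloads and hardware.

```yaml
# Sketch only: name, image, and resource values are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        # Requests: what the scheduler reserves on the node for this container.
        requests:
          cpu: "250m"
          memory: "128Mi"
        # Limits: hard caps enforced on the running container.
        limits:
          cpu: "500m"
          memory: "256Mi"
```

With a memory limit in place, a container that exceeds it is OOM-killed within its own cgroup instead of pushing the whole node, and etcd with it, into memory exhaustion. A CPU limit throttles the container rather than letting it consume all the CPU the control plane components need.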

You can also run a single control plane node and allocate all other machines as workers, though that means no high availability for the control plane. However, if you have good systems to back up and restore your control plane reliably and quickly, and your workloads are relatively static (so they can keep running happily while you recreate the control plane), then that can be OK too.

So three dedicated control plane nodes is the conservative answer but, as always, it depends…
