
Why Equinix switched from Kubespray to Talos Linux, and the adventures that followed

Deployment of Kubernetes went from 45 minutes with Kubespray to under 10 minutes with Talos Linux.

Equinix is the world’s digital infrastructure company: publicly traded (Nasdaq: EQIX), with over 250 datacenters across 27 countries on 5 continents, and $7 billion in revenue.

Jorik Jonker’s team at Equinix offers managed Kubernetes and other managed services to enterprise customers who want a focus on security and compliance. Their initial managed Kubernetes offering was built on Kubespray and Flatcar. They initially thought Kubespray was very good – but their opinion changed over time.

They first found Talos Linux in 2019. They realised it was a very different operating system, designed specifically for Kubernetes: you could not even log in to nodes. It was different, but it felt very powerful, offering declarative configuration, a reduced attack surface, and management through an API.

However, Talos was so different that adopting it felt like a challenge, as did demonstrating the security compliance their enterprise customers required, so they did not pursue Talos at that point.

In 2020, they were running into issues with Kubespray, a system of very convoluted Ansible scripts: effective, but not efficient. Upgrades and deployments took a lot of time and tied up the SREs for too long. As adoption grew, the team did not get more staff, so they needed to become more efficient.

They did a proof of concept with Talos Linux and found that, because it only does Kubernetes and is API managed, it is architecturally simpler and faster. Deployment of Kubernetes on virtual machines went from 45 minutes with Kubespray to under 10 minutes with Talos Linux.

During the proof of concept, the team liked Talos Linux. But – could they operate it, given how different it was? A background in Linux doesn’t help much with Talos.

This was one of the adventures they had in adopting Talos Linux: it was Unknown – their technical staff didn’t know it, and had no experience with it. A few team members used Talos on their home equipment, but to the rest of the team, it was Different. People don’t like different, especially if they are under the pressure of having to resolve an outage on a system they are not familiar with.

So, Equinix forced exposure to Talos Linux: they scheduled time on the backlog for team members to deploy Talos, and then had someone with experience break the deployment, so that everyone could practise debugging Talos Linux. They found that within hours people were comfortable with the new operating system.

Eventually everyone became confident they could support Talos, and they all liked the architecture. This led them to build their first product on Talos Linux, a new generation of their Managed Kubernetes service, in 2021.

They did have some other challenges, because Equinix adopted Talos Linux early in its lifecycle, before version 1.0. They encountered some issues with buggy NTP and with needing custom kubelet DNS settings, but the Sidero Labs engineering team addressed these quickly. Equinix also had to do some coding to make vCloud Director support deploying Talos Linux.

It was also a bit of a challenge to keep up with the rapid release cycle of Talos Linux, but they were able to address this with automation, which was helped by Talos Linux being fully API managed.

Another issue was demonstrating security compliance. The problem was not in actual compliance, but in assessing it. They use kube-bench to assess their platform against the CIS hardening guidelines. On Talos Linux, kube-bench would report some failed tests, because it could not confirm that certain files had restricted permissions or that certain packages were disabled: those files and packages did not even exist! So the team submitted patches to kube-bench to resolve all the false positives. They similarly submitted PRs so that the Dutch government security compliance standards would correctly recognize Talos Linux as secure.
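To make the false-positive problem concrete, here is a minimal Python sketch of the general idea. It is not kube-bench's actual implementation (kube-bench is a separate tool with its own check definitions), and the function names `naive_check` and `absence_aware_check` are hypothetical, invented for this illustration. It assumes a check that passes only when a file's permission bits are no more permissive than a given maximum.

```python
import os
import stat


def naive_check(path: str, max_mode: int) -> str:
    """Hypothetical check: FAIL whenever the file cannot be inspected,
    even if the reason is simply that it does not exist."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        return "FAIL"  # a missing file is reported as a failure
    # PASS only if no permission bit is set beyond those allowed by max_mode.
    return "PASS" if (mode & ~max_mode) == 0 else "FAIL"


def absence_aware_check(path: str, max_mode: int) -> str:
    """Hypothetical fix: a file that does not exist cannot have
    overly permissive settings, so its absence counts as a pass."""
    if not os.path.exists(path):
        return "PASS"
    return naive_check(path, max_mode)
```

With a path that does not exist on the node, `naive_check(path, 0o600)` reports a failure while `absence_aware_check(path, 0o600)` reports a pass; for files that do exist, both behave identically.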

Equinix has now retired its Kubespray-based Kubernetes offering and solely supports its Talos-based Kubernetes product. There was no customer pushback, and there were no issues with the migration to the new platform.

They have seen several benefits from switching to Talos Linux. The time saved in deployments and updates is significant, which allows Equinix to iterate on releases faster. Talos gets you more of the benefits that Kubernetes brings to containers. Talos is an amplifier of the concepts of Kubernetes.

When there is an issue that needs troubleshooting, simply replacing the node makes things work again. You can’t do that with Kubespray. Talos Linux encourages you to treat infrastructure as cattle, which has systematic advantages in all parts of operations.

This article is based on the talk Jorik gave at TalosCon 2023 – watch the full talk below: