Civo: Building a public cloud powered by Talos Linux

Civo is a public cloud provider that sets out to challenge the narrative of what a hyperscaler can offer and deliver. They launched with the aim of being developer-focused and of delivering a Kubernetes-focused public cloud.

Getting servers up and running used to involve ordering hardware, waiting a few weeks for delivery, unboxing, racking, cabling, installing the operating system, setting up switches and routers, and so on – the whole process used to take months. The agility that comes from being able to provision seemingly unlimited storage and compute on demand drives a lot of the move to the public cloud. However, if you are the one delivering the public cloud, you are still affected by the time it takes to get servers running, and you have to deal with all the operational complexity behind that: datacenter choice, WAN connectivity, what hardware is in use, the operating system, how you manage your cloud and what runs on your own control plane to manage it, and much more.

Civo had the goal of never having to visit a datacenter, and also wanted a single interface to provision switches, routers and compute hardware. They decided that PXE booting bare servers, using cloud-init for the OS install and DHCP options for the networking hardware, would give them that.

As Civo was launching as a cloud native public cloud, they evaluated OpenStack, OpenShift and CloudStack – but none of these options felt designed for modern cloud native infrastructure, and all of them imposed overhead that wasn’t needed just to get Kubernetes running.

CoreOS was a good project that was focused on running Kubernetes (K8s). It was early days, though, and it had rough edges around things like TLS certificate management. CoreOS was then acquired by Red Hat and, to some extent, lost its focus. In comparison, the “regular” Linux distributions – Ubuntu, Red Hat, Alpine – all had things that got in the way of just delivering Kubernetes: excess packages that needed to be removed, insecure defaults, and so on. Nevertheless, Civo selected Ubuntu for the first iteration of their Kubernetes cluster service.

Civo wanted to give their internal teams the “it just works” experience for managing the cloud environment. With other systems, there was a lot of setup and configuration to get things right and to make it work. Their goal was to make it easy for their engineers, which would flow through to the customers having a really nice experience with everything just working.

They initially had a problem with configuration drift: an engineer would see a problem, log in to the machine, and make a manual tweak to fix it. When you are building systems designed to scale massively, such one-off tweaks make the fleet very hard to manage.
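To make the idea of drift concrete, here is a minimal sketch that compares a declared configuration with what is actually observed on a machine. The function name and all of the settings are hypothetical, chosen purely for illustration; this is not how Civo or Talos Linux detects drift.

```python
from typing import Any


def find_drift(
    declared: dict[str, Any], observed: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return each setting whose observed value differs from the declared one.

    The result maps a setting name to a (declared, observed) pair.
    A setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        want = declared.get(key)
        have = observed.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical settings: the declared state lives in version control,
# the observed state is what is found on the machine after a manual tweak.
declared = {"kernel.panic": "10", "ntp_server": "time.example.com"}
observed = {
    "kernel.panic": "10",
    "ntp_server": "time.example.com",
    "vm.swappiness": "1",  # added by hand on the machine, never declared
}

print(find_drift(declared, observed))
```

Every entry in the result is a change that exists on one machine but not in the declared configuration, which is exactly what becomes unmanageable across a large fleet.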

They used Ansible to manage the tenant workload and address configuration drift, but found themselves having to create more and more operators to manage the infrastructure.

They came across Talos Linux, and immediately fell in love with it. It solved all the problems they had wanted to solve with CoreOS, and that they had been patching Ubuntu to work around. Talos Linux is immutable and driven declaratively by a configuration file, so it eliminated configuration drift. Talos Linux was like a modern version of CoreOS, designed just for Kubernetes, so it was secure, stayed out of the way, and let K8s do what it is meant to do. The fact that it is API managed, even for things like node reboots and upgrades, fit really well with the Kubernetes operator pattern.

They “absolutely loved it”, and started trying to get it into their platform as soon as possible.

Their first step was to offer Talos Linux as an option for end users to provision immutable, secure Kubernetes clusters with a one-click install process. With Talos Linux, it takes about 90 seconds to launch a new cluster.

Civo loves Talos Linux from an operations point of view – it works seamlessly. It’s a drop-in replacement for their Ubuntu-based server operating systems, but much simpler to keep running. It also dropped straight into their PXE-based build system, which sends Talos kernel flags for configuration. Talos Linux delivers the “it just works” experience they were looking for for their engineers.
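As a rough illustration of what passing Talos kernel flags over PXE can look like, the sketch below renders an iPXE boot script. The `talos.platform` and `talos.config` kernel parameters are documented by Talos Linux; the function name and every URL are assumptions made for this example, and this is not Civo’s actual build system. A real deployment would normally need further kernel arguments, such as console settings.

```python
def talos_ipxe_script(kernel_url: str, initrd_url: str, config_url: str) -> str:
    """Render an iPXE script that boots Talos and points it at a machine config.

    All three URLs are supplied by the caller; nothing here is fetched.
    """
    kernel_args = " ".join(
        [
            "talos.platform=metal",       # bare-metal platform
            f"talos.config={config_url}",  # where the node fetches its config
        ]
    )
    lines = [
        "#!ipxe",
        f"kernel {kernel_url} {kernel_args}",
        f"initrd {initrd_url}",
        "boot",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical boot server; replace with real artifact locations.
print(
    talos_ipxe_script(
        kernel_url="http://boot.example.com/talos/vmlinuz",
        initrd_url="http://boot.example.com/talos/initramfs.xz",
        config_url="http://boot.example.com/talos/worker.yaml",
    )
)
```

Because the whole node configuration is referenced from the kernel command line, the machine needs no manual steps after it powers on, which matches the hands-off provisioning described above.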

With Talos Linux, they have achieved the dream of never having to go to the datacenter. Within 20 minutes of servers arriving on site and being connected to the network, the region is up and serving customers. They turn on the new hardware, and it self-registers, builds and configures itself, and becomes available.

Talos Linux also fully eliminates configuration drift, and is natively aligned with the operator paradigm of Kubernetes.

They currently have one of their regions fully built on Talos, and are planning the migration of existing infrastructure at other datacenters to Talos Linux from Ubuntu.

All new regions will be Talos Linux, and Talos will be the default tenant offering, instead of k3s.

Civo is working with Intel, investigating running the Talos API within an SGX enclave.

This article is a summary of the talk Civo gave at TalosCon 2023. Watch the full talk below. All the talks of TalosCon are available here: TalosCon 2023 Kubernetes talks playlist