I’m happy to announce that we have reached beta for our v0.3 release of Talos OS, our immutable operating system for Kubernetes! We have some exciting changes in this release that I’m thrilled to share. In this post I will give an overview of the changes coming.
Let’s get started!
In previous versions of Talos we stuck to using static pods for the control plane, but over time we started to see that this approach didn’t quite fit the Talos philosophy. Modifying the control plane in any way meant that we had to expose a way for users to edit files in /etc/kubernetes/manifests, and this didn’t jibe with our read-only approach. So instead, we now use bootkube to bootstrap a self-hosted Kubernetes cluster. This allows our users to manage Kubernetes using Kubernetes primitives.
We ❤️ open source, so to give back to the community we have become maintainers of bootkube. You can expect the project to get a lot of ❤️ and upkeep. Please join us in making bootkube the de facto standard tool to create self-hosted clusters!
Upgrades are a huge story for any Linux distribution. In this release of Talos, we have fixed a number of 🐛 that prevented upgrades of control plane nodes. We’re delighted to announce that upgrades of Talos now work for any machine type. This means that migrations to newer versions of Talos (starting with v0.3) are now possible.
With the 🐛 out of the way, we can now offer our users an upgrade path that allows them to do canary-type deployments of Talos. Starting with v0.3 we offer five channels:
The latest channel is aggressive and meant more for development environments. This channel is updated on every successful merge into master. It is probably not something the typical user will want, but we have found it useful for catching 🐛 as early as possible while developing Talos.
The edge channel is released once a day from master if and only if the code passes full conformance tests. Both Talos and Kubernetes conformance tests are performed.
Alpha, Beta, Stable
The alpha, beta, and stable channels are derived from semantic versioning. Alpha and beta are pre-releases, while a stable release carries only a major, minor, and patch version.
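To make the mapping from a semantic version to a channel concrete, here is a minimal sketch in Go. The function name `channelFor`, the example version strings, and the rule that the pre-release tag starts with `alpha` or `beta` are assumptions made for illustration; this is not the code Talos uses to pick a channel.

```go
package main

import (
	"fmt"
	"strings"
)

// channelFor returns the release channel implied by a semantic version
// string. Hypothetical helper: it assumes pre-release tags begin with
// "alpha" or "beta" and treats a version with no pre-release tag as stable.
func channelFor(version string) string {
	v := strings.TrimPrefix(version, "v")

	// Build metadata (everything after "+") does not affect the channel.
	if i := strings.Index(v, "+"); i >= 0 {
		v = v[:i]
	}

	// The pre-release tag, if any, follows the first hyphen.
	i := strings.Index(v, "-")
	if i < 0 {
		return "stable"
	}

	pre := v[i+1:]
	switch {
	case strings.HasPrefix(pre, "alpha"):
		return "alpha"
	case strings.HasPrefix(pre, "beta"):
		return "beta"
	default:
		return "unknown"
	}
}

func main() {
	// Illustrative version strings, not actual Talos release tags.
	for _, v := range []string{"v0.3.0-alpha.7", "v0.3.0-beta.0", "v0.3.0"} {
		fmt.Printf("%s -> %s\n", v, channelFor(v))
	}
}
```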
As we stabilized our upgrades and worked on upgrade channels, we were also busy working on a controller that can roll out upgrades automatically across your cluster. With the controller you can carve up a cluster into what we are calling “pools.” A pool is implemented as a Custom Resource, and allows for fine-grained tweaking of upgrade paths for a subset of nodes within your cluster.
Here is an example of a pool:
---
apiVersion: upgrade.talos.dev/v1alpha1
kind: Pool
metadata:
  name: control-plane
  namespace: talos-system
spec:
  name: control-plane
  channel: beta
  registry: https://registry-1.docker.io
  repository: autonomy/installer
  concurrency: 1
  onFailure: Pause
  checkInterval: 24h
Now, to add nodes to the pool, run a command like:
kubectl label node -l node-role.kubernetes.io/master='' v1alpha1.upgrade.talos.dev/pool=control-plane
The controller will now attempt to upgrade every 24 hours, serially, and use the beta channel to do so. You’re welcome 😉.
Note: The controller is currently considered experimental. Use at your own risk, and contact us if you run into any problems!
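To illustrate what the concurrency and onFailure settings in the pool above imply, here is a rough sketch in Go of a rollout loop. The `pool` type, the `rollout` function, and the `failOn` helper are hypothetical names invented for this example, and the node names are made up; the real controller may work quite differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// pool mirrors two fields of the Pool spec shown earlier. The Go type itself
// is hypothetical and exists only for this sketch.
type pool struct {
	Concurrency int
	OnFailure   string // "Pause" stops the rollout after a failed batch
}

// rollout upgrades nodes in batches of at most p.Concurrency and returns the
// nodes that were upgraded successfully. With OnFailure set to "Pause" it
// stops after the first batch that contains a failure.
func rollout(p pool, nodes []string, upgrade func(node string) error) ([]string, error) {
	if p.Concurrency < 1 {
		return nil, errors.New("concurrency must be at least 1")
	}

	var done []string
	for start := 0; start < len(nodes); start += p.Concurrency {
		end := start + p.Concurrency
		if end > len(nodes) {
			end = len(nodes)
		}
		batch := nodes[start:end]

		// Upgrade every node of the batch in parallel.
		errs := make([]error, len(batch))
		var wg sync.WaitGroup
		for i, node := range batch {
			wg.Add(1)
			go func(i int, node string) {
				defer wg.Done()
				errs[i] = upgrade(node)
			}(i, node)
		}
		wg.Wait()

		// Record successes and remember the first failure of the batch.
		var failed error
		for i, node := range batch {
			if errs[i] == nil {
				done = append(done, node)
			} else if failed == nil {
				failed = fmt.Errorf("upgrade of %s failed: %w", node, errs[i])
			}
		}
		if failed != nil && p.OnFailure == "Pause" {
			return done, failed
		}
	}
	return done, nil
}

// failOn returns a fake upgrade function that fails for one named node.
func failOn(bad string) func(string) error {
	return func(node string) error {
		if node == bad {
			return errors.New("simulated failure")
		}
		return nil
	}
}

func main() {
	p := pool{Concurrency: 1, OnFailure: "Pause"}
	done, err := rollout(p, []string{"cp-1", "cp-2", "cp-3"}, failOn("cp-2"))
	fmt.Println("upgraded:", done)
	fmt.Println("error:", err)
}
```

With a concurrency of 1 the batches degenerate to single nodes, which is the serial behavior described above.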
In v0.3 our goal was to get Talos running completely in RAM. Needless to say, we have 🎉! At only ~55 MB, our squashfs is mounted in RAM and nothing from Talos ever touches the disk. If you’re asking why that helps with security, I urge you to take a look at the slides from an awesome talk by Kelly Shortridge and Dr. Nicole Forsgren at Black Hat 2019.
Did you read it? Great! With the new ephemeral approach of Talos, we complete the D.I.E. principle. With Kubernetes we get a distributed system, and with Talos we get immutable and ephemeral. We’re excited to align with D.I.E. and the great thing about it is that you get this all out of the box.
Patch Management: a new tool called “bldr”
In the process of building Talos, we have needed special tooling around the build and packaging of the various libraries used by Talos (e.g. libseccomp, socat, iptables). Our solution is a project we are calling bldr. I could probably go on for a while about what bldr is and how it works, but I will save that for a later blog post. Suffice it to say that it is an awesome tool that can be used to build any Linux distribution purely in containers. An added benefit is that it also provides a way to create minimal container images that have only the files they need, very similar to distroless.
With bldr, we are now able to support multiple versions of Talos. This means that as we work on v0.4 and beyond, we can provide security patches to v0.3.
Linux Integrity Measurement Architecture (IMA)
In our quest to build a super reliable operating system, we decided to make Talos immutable. We have always thought it would be awesome if we could enforce the immutability to the degree that Talos would refuse to operate if any evidence of tampering was found. We thought we would have to implement something ourselves, but, come to find out, the kernel already has powerful tooling built into it. That tooling is called “Integrity Measurement Architecture”, or IMA.
The short version is that IMA allows for the enforcement of immutability, and more. The kernel first measures a file and writes extended attributes specific to IMA to the file. From there it is possible to appraise a file: if the hash of a file that has already been measured no longer matches, access to the file is denied. This is an extremely powerful way to ensure immutability in addition to the read-only squashfs that Talos is already running on.
It is important to note that in v0.3 we will only measure and not appraise. Our intention is to appraise files in a future release.
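The difference between measuring and appraising can be sketched in a few lines of Go. This is only a user-space analogy: real IMA runs inside the kernel and keeps its data in extended attributes, while the `store` type, its in-memory map, and the choice of SHA-256 here are assumptions made for the example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// store stands in for the per-file attributes that hold a recorded hash.
// Hypothetical type: here the hashes simply live in a map keyed by file name.
type store map[string][]byte

// errTampered is returned when content no longer matches its measurement.
var errTampered = errors.New("hash mismatch: access denied")

// measure records the hash of the content under the given name.
func (s store) measure(name string, content []byte) {
	sum := sha256.Sum256(content)
	s[name] = sum[:]
}

// appraise recomputes the hash and compares it with the recorded one.
func (s store) appraise(name string, content []byte) error {
	want, ok := s[name]
	if !ok {
		return errors.New("file has not been measured")
	}
	sum := sha256.Sum256(content)
	if !bytes.Equal(want, sum[:]) {
		return errTampered
	}
	return nil
}

func main() {
	s := store{}
	original := []byte("#!/bin/init")

	// Measuring only records the hash; nothing is enforced yet.
	s.measure("/sbin/init", original)

	// Appraising enforces it: unchanged content passes, modified content fails.
	fmt.Println("unchanged:", s.appraise("/sbin/init", original))
	fmt.Println("modified: ", s.appraise("/sbin/init", []byte("#!/bin/evil")))
}
```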
Official support for Ed25519 landed in Go 1.13. We now use Ed25519, instead of P-521, for all API traffic.
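For readers who have not used it yet, here is a small sketch of the Ed25519 primitive using the standard library's crypto/ed25519 package. The `roundTrip` helper and the sample messages are made up for this example, and it does not show how Talos wires Ed25519 into its API certificates.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// roundTrip generates a fresh key pair, signs msg, and reports whether the
// signature verifies against msg and against a different message.
// Hypothetical helper written for this example.
func roundTrip(msg, other []byte) (okOriginal, okOther bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}

	sig := ed25519.Sign(priv, msg)

	okOriginal = ed25519.Verify(pub, msg, sig)
	okOther = ed25519.Verify(pub, other, sig)
	return okOriginal, okOther, nil
}

func main() {
	ok, bad, err := roundTrip([]byte("hello talos"), []byte("hello tampered"))
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	fmt.Println("original message verifies:", ok)
	fmt.Println("different message verifies:", bad)
	fmt.Println("signature size in bytes:", ed25519.SignatureSize)
}
```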
In this version of Talos, we are shipping containerd with seccomp enabled. This allows users to set seccomp profiles in pods.
v1alpha1 Machine Config
We have stabilized our v1alpha1 config, and have added a number of fields that should cover a wide range of needs. With the expansion we also added some validation logic to ensure that the config doesn’t contain any errors and is valid for a particular platform. To try it locally, run something like:
osctl config generate
osctl validate --config init.yaml --mode cloud
There are times when you want to get some information from more than one machine at once. Maybe you want to compare the route table, or the health of the services running on the machines. To address this, we’ve added the ability to aggregate responses from multiple machines using a single command.
For example, say you want to look at the etcd logs across all your control plane nodes at once. To do so, simply specify multiple nodes:
osctl --nodes $ip1,$ip2,$ip3 logs etcd
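The general idea of fanning one request out to several machines and collecting the answers can be sketched in Go as follows. The `response` type, the `aggregate` function, the `fakeLogs` stand-in, and the IP addresses are all hypothetical; this is not how osctl or the Talos API implements aggregation.

```go
package main

import (
	"fmt"
	"sync"
)

// response pairs a node with the answer (or error) it returned.
type response struct {
	Node string
	Body string
	Err  error
}

// aggregate sends the same query to every node concurrently and returns the
// responses in the same order as the nodes slice.
func aggregate(nodes []string, query func(node string) (string, error)) []response {
	out := make([]response, len(nodes))

	var wg sync.WaitGroup
	for i, node := range nodes {
		wg.Add(1)
		go func(i int, node string) {
			defer wg.Done()
			body, err := query(node)
			// Each goroutine writes to its own slot, so no locking is needed.
			out[i] = response{Node: node, Body: body, Err: err}
		}(i, node)
	}
	wg.Wait()

	return out
}

// fakeLogs stands in for a real API call to a node.
func fakeLogs(node string) (string, error) {
	return "etcd logs from " + node, nil
}

func main() {
	// Example addresses only.
	nodes := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	for _, r := range aggregate(nodes, fakeLogs) {
		fmt.Printf("%s: %s (err: %v)\n", r.Node, r.Body, r.Err)
	}
}
```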
The osctl CLI has been revamped with better error handling and extra config options for specifying endpoints (proxying nodes) and nodes (the target nodes to perform API requests against).
For any project to be successful, documentation must be in order. In this release we have created a site dedicated to the Talos project: www.talos.dev.
This is a site for engineers. Learning just what makes Talos different, understanding the design, how to deploy Talos, and how to connect to our community are all things you can expect to take away from this site. We have a lot more content on the way, but for the beta we documented as much as we could. Please 🙏, take a look and tell us how it could be better!
In addition to AWS, Azure, GCP, Packet, KVM, and Docker, we now support Talos in:
- and Digital Ocean
Latest Stable Kubernetes
True to our promise to track upstream Kubernetes as closely and quickly as possible, Talos ships with Kubernetes 1.17!
Latest Stable Linux
In this version of Talos we started to ship with the latest stable version of Linux (5.3.x at the time of this writing), and we will continue this in future versions. We took inspiration from Greg Kroah-Hartman in this decision.
To quote him:
I rely on the development and latest stable releases to ensure that my machines are running the fastest and most secure releases that we know how to create at this point in time.
A More Robust Networking Stack
We spent a lot of time getting our networking stack done right in this version of Talos. Lots of stability improvements, but one thing I’d like to point out is that we now have support for bonding. This is a feature we have had users ask for in bare-metal clusters.
As always, we are looking for ways to make Talos more stable. In this release we have ironed out a number of 🐛.
In addition to 🐛 fixes we have changed the architecture of Talos to rely on an external load balancer for HA Kubernetes. This has improved the reliability of the system, and made the mental model simpler.
Download osctl from our beta release and use it to follow our quick start. If you’re eager to try it out on the cloud or on bare metal, we’ve got you covered. Our guides can show you how to get a cluster in AWS, Azure, Digital Ocean, GCP, Packet, VMware, or on Metal!
As you try Talos out, we would appreciate any kind of feedback you may have, bad or good. Please 🙏, tell us about your experience by creating GitHub issues and/or by joining our Slack.
Throughout the development of v0.3 we have had some awesome members join our community that have been extremely valuable. The feedback has helped guide the project into a direction that makes Talos compelling. We look forward to growing this community!
Enjoy v0.3! 👋