SSH is like opening the hood of your car while driving 70 mph to adjust the engine. It works fine, until it doesn’t…
Consider this: You go weeks with everything running smoothly. You follow the process, write great code, and don’t SSH into a node right before going to bed on Friday night. One day, an issue hits. Your manager is demanding a fix, and suddenly there’s no time to follow protocols, even if you’re the one who wrote the protocols in the first place. You need to get this done and move on to the next thing. You sigh and do a manual fix.
I’ve heard that even at Amazon, engineers still SSH into nodes when they need to fix something fast. That’s not surprising. When the system lets you take a shortcut, and you’re under pressure, you will. I will. I’ve done it, and rarely does anyone notice.
Years ago, when I was managing Kubernetes clusters, I really believed I could change the culture around SSH. I told my team not to do it, I told myself not to do it, and we all did it anyway. The system made it easy. SSH let us scratch the itch and get the job done immediately. It wasn’t so much a technical issue as a human one.
Of course, this is how teams suddenly end up with a 20-node cluster in which five machines are all slightly different and no one is brave enough to touch any of them.
On top of that, these practices make auditing harder, increase the chance of human error, and open the door to security issues, but we just can’t stop doing them.
Asking the hard question: do you really need it?
At first, the answer always feels like “yes.”
Of course, we need to fix things fast.
Of course, we need flexibility.
Yes, we need this once in a while, maybe not for everyone else, but at least for me!
I recommend taking a step back to ask, “Is this really about SSH?” Or is it about sticking to the known, assuming we have to do it this way because we always have?
When we challenge that mentality, we open the door to a whole new way of thinking. Because the truth is, if you really need SSH to manage your infrastructure, you’re building on hope.
Talos Linux reimagines the operating system with APIs instead of shells. That might sound like it takes power away from developers, but it’s the exact opposite.
Without the ability to tinker around, you’re forced to rely on the system’s real strength: automation, consistency, and predictability. Changes are validated, reproducible, and traceable by design. You configure it, provision it, move on, and trust that it will work just fine. No more babysitting.
When you remove SSH, you remove the excuse
I wouldn’t really call Talos Linux opinionated. We just believe you shouldn’t have a bunch of stuff you don’t actually need.
Removing SSH isn’t just a technical decision; it’s a cultural one. With Talos Linux, you don’t need to debate whether or how to enforce best practices. Best practices become the only practices.
And in return, you get machines that behave the same every time, upgrades that don’t break, and infrastructure that’s easier to understand and harder to mess up. Plus, you never again have to get up on Monday morning and wonder if someone quietly SSH’d into prod last night.
Instead, we rely on APIs. This gives us a controlled, declarative way to interact with the system. That means changes are validated, consistent, and auditable by default. Simply define what the system should look like, and the Talos API enforces that state.
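To make "define what the system should look like" a little more concrete, here is a minimal Python sketch of the declarative pattern: validate the desired state, compute what differs, apply only that, and keep a record of the change. This is not the Talos API. The function names (`validate`, `plan`, `reconcile`) and the config keys are hypothetical, made up purely for illustration.

```python
from typing import Any

# Hypothetical schema for illustration only; NOT the Talos machine config format.
ALLOWED_KEYS = {"hostname", "kubelet_version", "install_disk"}


def validate(desired: dict[str, Any]) -> None:
    """Reject unknown keys up front, before anything touches a machine."""
    unknown = set(desired) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")


def plan(actual: dict[str, Any], desired: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (current_value, desired_value)} for every key that differs."""
    validate(desired)
    return {
        key: (actual.get(key), value)
        for key, value in desired.items()
        if actual.get(key) != value
    }


def reconcile(
    actual: dict[str, Any], desired: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, tuple[Any, Any]]]:
    """Apply the plan; return the new state plus an audit record of what changed."""
    changes = plan(actual, desired)
    new_state = {**actual, **{key: new for key, (_, new) in changes.items()}}
    return new_state, changes


actual = {"hostname": "node-1", "kubelet_version": "1.29.0"}
desired = {"hostname": "node-1", "kubelet_version": "1.30.0"}
state, audit = reconcile(actual, desired)
```

Running `reconcile` a second time with the same desired state produces an empty change set. That is the property that matters: the same input always lands you in the same place, and every change leaves a record instead of living in someone's shell history.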
This choice is also part of what makes Omni powerful. As the enterprise-grade control plane built specifically for Talos Linux, Omni is built to take full advantage of Talos’s API-driven architecture. Together, they form a tightly integrated system that’s consistent, reliable, and effortless to manage, because there’s no hidden state and no manual drift to fight against.
In rare situations, we know you might feel you have to get a shell. That’s why we created a documented fallback using kubectl debug to access a node via an Alpine container. It’s not a full shell, it’s not ideal, and it’s not the default, but if you need it, we won’t stop you.
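As a rough idea of what that fallback looks like, here is a small Python sketch that builds a `kubectl debug` command for a node. The helper `node_debug_command` and the node name `worker-1` are hypothetical, and the sketch only assembles the command rather than running it. The documented Talos procedure may call for extra flags (a namespace or a debug profile, for example), so treat this as a starting point and check the docs before relying on it.

```python
import shlex


def node_debug_command(node: str, image: str = "alpine") -> list[str]:
    """Build the argv for an interactive debug container on a node."""
    if not node or node.startswith("-"):
        raise ValueError("node name must be non-empty and must not look like a flag")
    # `kubectl debug node/<name> -it --image=<image>` starts a debug pod on that node.
    return ["kubectl", "debug", f"node/{node}", "-it", f"--image={image}"]


# "worker-1" is a placeholder node name.
print(shlex.join(node_debug_command("worker-1")))
# prints: kubectl debug node/worker-1 -it --image=alpine
```

Building the command as a list rather than a formatted string keeps a stray node name from being interpreted as extra flags or shell syntax.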