How Nokia Runs One of the World's Largest Private Clouds on Talos Linux

Industry

Communications
IT

Location

Global

Use Cases

Bare Metal

Challenge

Scaling operations by orders of magnitude

Strict security requirements

Consistent Kubernetes across global data centers and edge

Environment

11,000 servers in 350 racks across 7 data centers

320 Kubernetes clusters across 130K cores

Why Sidero and Omni

Efficient for large-scale environments

Secure and reliable with minimal overhead

Impact

13x growth in core count (10,000 to 130,000) without added complexity

Reduced overhead costs and complexities

Reduced system binaries from thousands to tens

Nokia employs some 90,000 individuals across 100+ countries and generates more than $25 billion in revenue across telecommunications, information technology, and other business units. Much of its internal development runs on the Nokia Enterprise and Services Cloud (NESC), a private cloud platform managed by Nokia’s internal engineering team and first developed by Janne Heino. Built to mirror the capabilities of public cloud, NESC delivers infrastructure-as-a-service to business units worldwide.

The platform spans 11,000 servers across 7 data centers and enables engineering teams to build, deploy, and manage workloads without handling the complexity of the underlying infrastructure.

Challenge: Legacy Systems Crack Under Growth

As NESC grew, so did the complexity of operations. Systems that worked at low volume began breaking at scale. Repositories built to support 500 installs per year failed when forced to handle 10,000 installs every morning. Fixing one bottleneck often exposed another, which led to a continual cycle of upgrades.

At the same time, Kubernetes was becoming the platform of choice for internal teams. In 2019, NESC began planning Kubernetes as a managed service, but delivering it at Nokia’s scale required a secure, reliable operating system that wouldn’t add unnecessary overhead.

The NESC team had been running Red Hat and CentOS, which installed a huge number of default services and packages, introducing security risks and making maintenance more complex. They needed tooling that would let them work at scale, and they needed an operating system designed specifically for Kubernetes: minimal, secure, and built for automation.

Solution: Minimal, Purpose-Built and API-Driven with Talos Linux

The NESC team selected Talos Linux as the operating system for their Kubernetes nodes. Talos is minimal by design, removing unnecessary components and interfaces like SSH and shell access. Its API-driven architecture aligned with NESC’s need for full automation, consistency, and security across large-scale environments.
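
To illustrate what "API-driven" means in practice, here is a minimal sketch of how automation might address a Talos node through the `talosctl` client instead of an SSH session. It is not Nokia's tooling: the node address and the `worker.yaml` file name are hypothetical, and the script only builds and prints the command lines rather than running them.

```python
import shlex


def talosctl(node: str, *args: str) -> list[str]:
    """Build a talosctl command line that targets one node over the Talos API."""
    return ["talosctl", "--nodes", node, *args]


# Hypothetical node address and config file, for illustration only.
node = "10.0.0.10"

commands = [
    # Push a declarative machine configuration to the node.
    talosctl(node, "apply-config", "--file", "worker.yaml"),
    # Query the Talos version the node is running.
    talosctl(node, "version"),
    # List the system services on the node.
    talosctl(node, "service"),
]

for cmd in commands:
    # Print rather than execute; pass cmd to subprocess.run to run it for real.
    print(shlex.join(cmd))
```

Because every operation is a request to the same API, the steps can be repeated identically across thousands of nodes, which is the property the NESC team was looking for.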

The same deployment model now runs across Nokia’s global data centers and edge racks: one Kubernetes cluster for the NKS master, which runs the API and local datastores; another cluster for monitoring; and one or more customer clusters.

NESC now offers Kubernetes on Talos Linux both on bare metal and on virtual machines running on OpenStack. With Talos, the NESC team can monitor the nodes in a cluster and repair them automatically. The team also deploys standard logging and monitoring into the clusters and handles upgrades of Kubernetes and of the logging and monitoring apps. Internal customers are responsible for user management, service accounts, and other details. NESC uses Argo CD to deploy the clusters, and end users can also deploy Argo CD themselves.

For consistent cluster lifecycle management, the NESC team developed a custom layer that spans OpenStack, Sidero Metal, and VMware, and standardized on Talos Linux as the Kubernetes OS across all platforms.

Results: 320 Clusters, 130K Cores, 55K Pods – One OS

“Talos Linux is faster; it’s lighter; it does what it is supposed to – and not something else.”

Janne Heino, Head of Nokia Global Services Cloud Architecture

Once the NESC team implemented Talos Linux, Kubernetes adoption surged. In one year, usage jumped from 10,000 to 130,000 cores. The team now operates 320 clusters on 130,000 cores, with 5,400 virtual nodes and 380 bare-metal nodes supporting over 55,000 active pods.
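
As a quick check on the scale figures quoted above, the following sketch derives the growth factor and a few rough averages. The inputs are the numbers from this case study; the averages are back-of-the-envelope values that assume an even spread across clusters and nodes, not figures reported by Nokia.

```python
# Figures quoted in the case study.
cores_before = 10_000
cores_after = 130_000
clusters = 320
virtual_nodes = 5_400
bare_metal_nodes = 380
active_pods = 55_000  # quoted as "over 55,000", so treat as a lower bound

# Growth in core count over one year.
growth_factor = cores_after / cores_before
percent_increase = (cores_after - cores_before) / cores_before * 100

# Rough averages, assuming an even distribution (an illustrative assumption).
total_nodes = virtual_nodes + bare_metal_nodes
cores_per_cluster = cores_after / clusters
pods_per_node = active_pods / total_nodes

print(f"Growth factor: {growth_factor:.0f}x ({percent_increase:.0f}% increase)")
print(f"Total nodes: {total_nodes}")
print(f"Average cores per cluster: {cores_per_cluster:.2f}")
print(f"Average pods per node: {pods_per_node:.1f}")
```

Going from 10,000 to 130,000 cores is a 13-fold growth, which is a 1200% increase over the starting point.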

Talos Linux significantly reduced operational overhead, freeing resources for workloads and improving latency. Systems that previously had thousands of binaries now have only a few dozen, which shrinks the potential attack surface and simplifies system management.

NESC’s total cost of operations, including hardware, staffing, data center space, and power, is now roughly one-third the cost of equivalent public cloud services. By adopting Talos Linux, Nokia’s NESC team delivered a Kubernetes platform that is scalable, secure, and operationally efficient, supporting thousands of engineers every day.

For more details, watch their talk from TalosCon 2023.