France’s National Railway Goes Cloud Native in Four Months, Breaking Through Silos and Slashing Production Incidents

Industry

Transportation

Location

Europe

Use Cases

Data Center Hybrid Edge

Challenge

Internal silos and legacy systems

Massive amounts of real-time data

Environment

200 clusters across Azure and AWS public clouds and on-prem data center

One team brokering services for 400 internal projects

Data from 5000 trains per day

Why Sidero and Omni

Immutable OS with small attack surface

Easy to install new versions

Release management

Impact

90% fewer production incidents between IaaS and CaaS

66% less maintenance effort

Zero configuration drift

La Société Nationale des Chemins de fer Français (SNCF) is France’s state-owned national railway company, responsible for the country’s entire rail network, including high-speed intercity TGV trains.

SNCF relies on its Cloud Native Team to broker services for all the main IT divisions, covering 400 different internal projects across train management, tracks, train stations, rolling stock maintenance, finance, real estate, and more. The team is also active in open source, contributing to CNCF projects, including Harbor, and participating in the Platform Engineering Working Group.

Challenge: Attempts to Modernize Led to Complexity and Ineffective Solutions

The SNCF team processes real-time data from 4,000-5,000 trains daily to support critical passenger information systems across the entire Paris railway network, including information that needs to be shared with the public. They needed to modernize their applications in order to keep up with the endless flow of train data.

Given the volume and complexity of the data it manages each day, SNCF also wanted seamless operations at the node level, with a systematic way to create, destroy, roll out, and autoscale nodes. They wanted an immutable OS to install at the edge, both on trains and in train stations, that would send information immediately back to the on-prem data center for analysis.

Moving from public cloud to an open source Kubernetes platform in SNCF’s data center presented technical and operational challenges and required broader organizational transformation. Silos, years of established processes, and a 200-page security manifesto slowed progress. Teams that had long worked on traditional infrastructure struggled to align with cloud native approaches.

The Cloud Native Team began experimenting with Kubernetes in 2018 but failed to efficiently leverage and implement it at scale. They built a Kubernetes solution using Ubuntu with RKE2, but the year-long project was difficult and ultimately unsuccessful. To move forward, SNCF decided on four key principles. They would need to:

  • Treat nodes as unified OS-Kubernetes pairs with version coupling
  • Manage operations at the node-pool level to prevent configuration drift
  • Simplify security measures by reducing the cyberattack surface
  • Ensure reliable rollback capabilities for Kubernetes upgrades
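
As a rough illustration of the first two principles, the sketch below models a node's OS and Kubernetes versions as a single coupled release and checks for drift at the node-pool level. It is a minimal, hypothetical model and not SNCF's or Talos's implementation: the `Release`, `Node`, and `NodePool` names and all version numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """An OS version and a Kubernetes version treated as one coupled unit."""
    os_version: str
    kubernetes_version: str


@dataclass
class Node:
    name: str
    release: Release


@dataclass
class NodePool:
    name: str
    desired: Release  # the single pair every node in the pool should run
    nodes: list[Node]

    def drifted_nodes(self) -> list[Node]:
        """Return nodes whose OS/Kubernetes pair differs from the pool's desired pair."""
        return [n for n in self.nodes if n.release != self.desired]


# Hypothetical version numbers, for illustration only.
old_release = Release("1.0.0", "1.30.0")
new_release = Release("1.1.0", "1.31.0")

pool = NodePool(
    name="edge-stations",
    desired=new_release,
    nodes=[Node("n1", new_release), Node("n2", old_release), Node("n3", new_release)],
)

drifted = [n.name for n in pool.drifted_nodes()]
print(drifted)
```

Because the pair is compared as a whole, a node that has the right Kubernetes version but the wrong OS version still counts as drifted, which is the point of coupling the two.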

Solution: Disparate Teams Simplify Infrastructure with Talos Linux

“We don’t need to ask the legacy teams to provide us with a modern solution to run Kubernetes. We’ve got our own out-of-the-box solution for Kubernetes, which is the Talos operating system and Kubernetes. We can just run it through OpenStack and then the magic happens.”

Thomas Comtet, Senior Staff Engineer, SNCF

These principles led SNCF to Talos Linux. Talos’s immutability and built-in security ensured compliance with that 200-page security manifesto. Talos provides release management, makes it easy to install new versions, and erases configuration drift, enabling the team to effectively manage infrastructure at scale.

SNCF uses Talos on-prem in the data center to manage live data: the real-time positioning and localization of trains, and the communication of current train statuses to the public (e.g., ETA and track). The team now manages approximately 200 Kubernetes clusters across their environment.

The SNCF team developed their own tool (https://github.com/mstrohl/talos-cockpit) to replicate AKS auto-upgrade functionality for its on-prem environment, making data center operation possible in locations where direct cloud provider tools aren’t available. By making this open source, they are able to share expertise with the community and ensure others benefit from their learnings.
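
To give a feel for what pool-level auto-upgrade with rollback can involve, here is a minimal in-memory sketch. It is an assumption-laden illustration, not the logic of talos-cockpit or AKS: the `upgrade_pool` function, the release labels, and the `healthy` callback are all hypothetical, and a real tool would call the Talos and Kubernetes APIs instead of updating a dictionary.

```python
from typing import Callable


def upgrade_pool(
    nodes: dict[str, str],
    target: str,
    healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade nodes one at a time; restore every touched node if a check fails.

    `nodes` maps node name -> current release label and is modified in place.
    Returns True if the whole pool reached `target`, False if it was rolled back.
    """
    previous: dict[str, str] = {}
    for name in list(nodes):
        previous[name] = nodes[name]   # remember what to roll back to
        nodes[name] = target           # hypothetical stand-in for a real upgrade call
        if not healthy(name, target):
            for touched, old in previous.items():
                nodes[touched] = old   # roll back every node changed so far
            return False
    return True


# Hypothetical release labels, for illustration only.
good_pool = {"n1": "v1", "n2": "v1", "n3": "v1"}
good_result = upgrade_pool(good_pool, "v2", lambda name, release: True)

bad_pool = {"n1": "v1", "n2": "v1", "n3": "v1"}
bad_result = upgrade_pool(bad_pool, "v2", lambda name, release: name != "n2")

print(good_result, good_pool)
print(bad_result, bad_pool)
```

In the second run the health check fails on `n2`, so both `n1` and `n2` are returned to their previous release and `n3` is never touched, leaving the pool uniform rather than half-upgraded.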

Results: 90% Fewer Incidents and 200 Clusters in 4 Months

“SNCF’s experience demonstrates that simplifying complex systems, rather than adding layers of complexity, leads to more effective outcomes. By embracing cloud native principles and tools like Talos, we created a consistent, efficient infrastructure that supports both its cloud and on-prem operations.”

Thomas Comtet, Senior Staff Engineer, SNCF

The SNCF Cloud Native Team has delivered significant results, including modernizing critical applications and bringing the benefits of a cloud native architecture to applications that are not compatible with the public cloud.

SNCF has improved its technical stability through Talos, reducing production incidents between IaaS and CaaS by 90%, with only minor issues remaining. The team has also increased its efficiency, achieving a 66% reduction in maintenance efforts and eliminating configuration drift.

Talos made it easy for the team to work quickly, reducing development time and enabling them to have a production-ready, cloud-native solution in 4 months. SNCF’s success shows that by standardizing on a secure, minimal operating system and applying cloud native principles across environments, even a large national infrastructure can modernize quickly and operate with greater stability and efficiency.
