How Tremor Video Scaled Kubernetes for Billions of Requests—and Saved Millions Annually

Industry

Ad Tech

Location

United States

Use Cases

Bare Metal
Cloud

Challenge

Large-scale low-latency compute

Complex environment of integrations and microservices

Environment

On-prem and cloud

Managed by three developers part-time

Why Sidero and Omni

Support for cloud bursting

Fast processing speed

Declarative provisioning

Impact

Savings on licensing and cloud usage costs

Unified management

Secure, read-only nodes as cattle

Tremor Video (now part of Nexxen) connects publishers and advertisers through pre-roll video ads and impactful brand stories across all screens. The Nexxen platform includes a demand-side platform (DSP), supply-side platform (SSP), ad server, and data management platform (DMP).

Challenge: Handling Billions of Requests in 100 ms or Less

Tremor Video processes several billion requests a day involving complex algorithmic bidding and matching, each of which must be completed in less than 100 ms. They also operate several microservices with varying resource requirements and must integrate with several acquisitions. Across all of this, the team needs to scale quickly while keeping latency low. To keep costs down, the team runs most of its infrastructure on bare metal but at times needs to burst capacity into the cloud.

This work was managed by a small, busy team, amplifying their need for scalable, cost-efficient, and reliable cluster management.

After running into limitations with Docker Swarm and Consul, the Tremor Video team moved to Kubernetes to deploy and scale apps, with services generally running on bare-metal nodes and advertised to Consul. To minimize latency, they avoided ingresses and load balancers: each pod was assigned its own routable IP address, and kube-router peered with the top-of-rack switches via BGP to announce the pod IPs.

They built their first Kubernetes clusters by hand with kubeadm. When certificate expiration triggered what the team called a “Kubesplosion,” it became clear they needed a more manageable, scalable solution. They turned to Kubespray but found that deployment times grew along with the clusters. With this setup, it took over two hours to deploy one in a 100-node cluster.

Tremor Video needed a solution that would give them low latency at scale with minimum effort.

Solution: From Kubeadm Bottlenecks to Automated Cluster Deployment with Talos Linux

Then the Tremor Video team discovered Talos Linux in a Reddit post. They were drawn to Talos’s emphasis on efficiency, security, and appliance-style management. They wanted to treat their nodes like cattle, not pets, so they moved to Talos.

Tremor Video runs control plane nodes as VMs on VMware, which leaves their bare-metal resources dedicated to their applications. With just three developers, they can now automate deployments, declaratively defining and scaling Kubernetes clusters, including worker nodes, with a single command.

Results: Reliable Infrastructure at Drastically Lower Costs

The team can now have a cluster up and running in 20 minutes.

Talos delivers a streamlined, appliance-like experience, allowing Tremor Video to run low-latency applications efficiently on bare-metal Kubernetes with immutable infrastructure and automatic lifecycle management. Talos is a minimal Linux distribution with the rootfs mounted read-only, ensuring the consistent management of nodes and eliminating manual modification.

By turning to bare metal for most of their processing and using the cloud only to burst capacity, Tremor Video has saved millions of dollars in licensing and cloud usage costs. Performance has improved significantly since the shift to bare metal.

Purpose-built for Kubernetes, Talos simplified the process of bootstrapping highly available Kubernetes clusters on bare metal, reducing the time and complexity of integration. Talos’s automatic machine lifecycle management enables nodes to provision, maintain, and terminate themselves without manual intervention.

For more details, watch their talk from TalosCon 2023.