Tremor Video processes several billion requests a day involving complex algorithmic bidding and matching, each of which must complete in under 100 ms. The company also operates several microservices with varying resource requirements and must integrate systems from several acquisitions. Across all of this, the team needs to scale quickly while keeping latency low. To keep costs down, the team runs most of its infrastructure on bare metal but at times needs to burst capacity.
All of this work fell to a small, busy team, which amplified the need for scalable, cost-efficient, and reliable cluster management.
After running into limitations with Docker Swarm and Consul, the Tremor Video team moved to Kubernetes to deploy and scale apps, with services generally running on bare metal nodes and advertised to Consul. Each pod received its own routable IP address, which let the team avoid ingresses and load balancers and so minimize latency. Kube-router peered with the top-of-rack switches over BGP to announce the pod IPs.
They built their first Kubernetes clusters by hand with Kubeadm. When certificate expiration triggered what the team called a “Kubesplosion,” it became clear they needed a more manageable, scalable solution. They turned to Kubespray but found that deployment times grew along with their clusters: deploying a 100-node cluster took more than two hours.
Tremor Video needed a solution that would deliver low latency at scale with minimal operational effort.