10 practical Kubernetes checks for 7 compliance frameworks


Marcus Ross, CCoE Lead at Hamburg Port Authority, presented this talk at TalosCon 2025 in Amsterdam.

Hamburg Port Authority operates Germany's largest seaport and Europe's largest rail port. It manages a complex trimodal logistics operation spanning rail, inland waterway, and lorry transport. The organization runs critical infrastructure, which means compliance isn't optional.

The HKS/Cloud team responsible for the platform is small: three platform engineers and one SRE. They manage approximately 30 Kubernetes clusters across a hybrid environment that includes edge deployments, on-premises infrastructure, and cloud resources. The operational model adds a layer of complexity: HPA maintains autonomous product teams with both internal and external developers who expect full access to Kubernetes capabilities.

The team must address seven distinct frameworks: the CIS Kubernetes Benchmark, NIST SP 800-53, ISO 27001, PCI DSS, SOC 2, NIS2, and the GDPR.

When technology meets compliance

How does a team of engineers handle compliance without having to become professional auditors or hire a lawyer?

Rather than starting from the regulatory text, the team inverted the problem. They followed a four-step process: review existing best practices to establish a technical baseline, map those practices against framework requirements, identify gaps and prioritize remediation, then fill those gaps with a concrete action plan.

In the end, they found that good engineering and compliance requirements overlap more than most teams realize. The checks that follow are things a diligent platform team would implement anyway: the compliance mapping is a secondary benefit.

Check 1: Restrict API server access

Frameworks: ✅ NIST AC-3/6 ✅ SOC2 CC6.1 ✅ ISO27001 A.9.4.1 ✅ GDPR Art. 32

The first compliance control involves limiting access to the Kubernetes API server to authorized networks only: private VPCs or approved IP ranges.

In vanilla Kubernetes, the static manifest at /etc/kubernetes/manifests/kube-apiserver.yaml controls the bind address and port through command-line arguments. The --bind-address and --secure-port flags (defaulting to 6443) determine where the API server listens. Additional access restriction mechanisms include iptables or nftables rules on nodes, network policies to restrict API server traffic, and custom authorization webhooks for dynamic IP filtering.
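For illustration, a hypothetical excerpt of that static manifest might look like this (the address is a placeholder for a private interface):

```yaml
# Hypothetical excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --bind-address=10.0.0.10   # listen only on the private interface
        - --secure-port=6443         # the default API server port
```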

Talos Linux exposes only the Kubernetes API on port 6443 and the Talos API on port 50000. Since Talos eliminates shell access and direct filesystem manipulation, configuration happens through the machine config pushed to control plane nodes:
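```yaml
# A minimal sketch using Talos's ingress firewall documents (Talos 1.6+);
# the subnet is a placeholder for your approved range.
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kubernetes-api
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 10.0.0.0/16   # placeholder: allow the API only from this range
```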

Check 2: Enable Pod Security Admission (PSA)

Frameworks: ✅ CIS 5.2.X ✅ NIST CM-7/AC-6 ✅ ISO27001 A.8.9 ✅ PCI Req. 2/7 ✅ SOC2 CC6.1/2 ✅ NIS2 Art. 21 ✅ GDPR Art. 25/32

The goal here is to enforce Pod Security Standards, specifically the restricted profile, to block privileged pods, hostProcess access, and unsafe sysctls.

Pod Security Policies were deprecated and then removed in Kubernetes 1.25 in favor of Pod Security Standards enforced through the Pod Security Admission controller, which cleanly separates the definition of security standards from their enforcement mechanism. PSA has been enabled by default since Kubernetes 1.25.

But there's a catch. Enforcement requires labeling namespaces. If a namespace isn't labeled, the default behavior is privileged, which effectively leaves the door wide open. What happens when autonomous product teams simply don't label their namespaces? Nothing stops them, and the SOC or CISO demanding compliance has no visibility into it.

The default can be changed by passing a configuration argument to the API server:
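```yaml
# Sketch of an admission configuration, passed to the API server via
# --admission-control-config-file; the exemptions are illustrative.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"     # unlabeled namespaces now default to restricted
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: ["kube-system"]   # example exemption
```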

There's a deeper problem: PSA doesn't apply to static pods. Anyone who can log into a control plane node and drop a YAML file into /etc/kubernetes/manifests/ can bypass PSA entirely. A malicious pod using hostNetwork: true or privileged: true placed there will run without any enforcement.

The Talos mitigation is structural: no shell access means no way to drop files into /etc/kubernetes/manifests/. For runtime detection, the team uses Falco rules configured to raise alerts when PSA violations occur.

Check 3: Encrypt secrets at rest

Frameworks: ✅ CIS 1.2.27 ✅ NIST SC-28 ✅ ISO27001 A.8.24 ✅ PCI Req. 3 ✅ SOC2 C1/CC6 ✅ NIS2 Art. 21 ✅ GDPR Art. 32

Kubernetes secrets are just base64-encoded data, not encrypted. Etcd encryption needs to be enabled explicitly.

On vanilla Kubernetes, encryption is off by default. To enable it, configure --encryption-provider-config on the API server. The configuration specifies which resources get encrypted rather than merely encoded. Operators should be selective: etcd's default storage quota is 2 GB, with a suggested maximum of 8 GB, so the overhead of encrypting and decrypting everything is a practical concern. A reasonable approach encrypts only secrets, configmaps, and specific CRDs:
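```yaml
# Sketch of a selective encryption config; the key is a placeholder.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      # custom resources can be listed as <plural>.<group>, e.g.:
      # - mycrds.example.com
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder
      - identity: {}   # fallback so still-unencrypted data stays readable
```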

For external secret management, options include Sealed Secrets, External Secrets Operator, and Vault CSI Provider.

Because Talos Linux runs upstream Kubernetes, it fully supports etcd encryption and integrates with KMS providers, including AWS KMS, HashiCorp Vault, and GCP KMS. Talos stores etcd data at /var/lib/etcd/.

Check 4: Enable audit logging

Frameworks: ✅ CIS 3.2.1 ✅ NIST AU-2 ✅ ISO27001 A.8.15 ✅ PCI Req. 10 ✅ SOC2 CC7/CC6 ✅ NIS2 Art. 21 ✅ GDPR Art. 33-34

The practical question isn't whether to log, but what to log. Does an organization really need 10 terabytes of logs for 19 days of data? The standard approach uses the --audit-policy-file and --audit-log-path flags on the API server. Retention requirements vary: PCI DSS requires twelve months of audit history with the most recent three months immediately available, while NIST guidance calls for one year.

Log level selection matters significantly. The Kubernetes documentation defines four levels (None, Metadata, Request, and RequestResponse), and the right choice depends on what your auditors and internal SOC actually need.
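A sketch of a tiered policy along those lines (which resources warrant RequestResponse is a judgment call; these picks are illustrative):

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
rules:
  # Never log secret payloads; metadata is enough for an audit trail
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Full request/response for RBAC changes, which auditors care about
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Everything else at metadata level
  - level: Metadata
```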

Talos Linux supports audit logging through the machine config's apiServer section with extraArgs. Logs are streamed to stdout/stderr by default and are ephemeral, since the OS itself is immutable. The Talos documentation includes built-in Fluent Bit examples for shipping logs to external storage, as logs should not remain on the Talos machine itself.
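A minimal sketch of the machine config fragment (the inline auditPolicy field is assumed available in current Talos versions):

```yaml
cluster:
  apiServer:
    auditPolicy:               # assumption: inline policy field in the machine config
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        - level: Metadata
    extraArgs:
      audit-log-path: "-"      # '-' streams to stdout for Fluent Bit to ship off-node
```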

Check 5: Enforce network policies

Frameworks: ✅ CIS 5.3.1 ✅ NIST AC-4 ✅ ISO27001 A.8.19 ✅ PCI Req. 1 ✅ SOC2 CC6 ✅ NIS2 Art. 21 ✅ GDPR Art. 31

Restrict pod-to-pod traffic using NetworkPolicy with a deny-all default, then allow only explicit connections. The default implementation is straightforward:
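```yaml
# Deny-all baseline, applied per namespace (namespace name is a placeholder)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-namespace   # placeholder
spec:
  podSelector: {}             # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```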

The real-world complications come from two directions. First, CNI choice matters. Cilium and Calico are good options, and when using cloud providers, it's worth verifying what CNI they actually use. Second, autonomous product teams with namespace-level access can simply delete the default network policy.

Rather than building configurations from scratch, Ross recommended Ahmet Alp Balkan's repository of network policy recipes, which pairs traffic flow diagrams with manifest templates. For enforcement, the team uses Kyverno's generator functionality: when a developer modifies or deletes a network policy, Kyverno detects the change and restores the baseline configuration from a template. Kubewarden with Common Expression Language is another option worth watching.
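As a sketch of that generator pattern (the policy name and generated spec are illustrative, not the team's exact configuration):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restore-default-deny
spec:
  rules:
    - name: default-deny-per-namespace
      match:
        any:
          - resources:
              kinds: ["Namespace"]
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        synchronize: true   # Kyverno re-creates the policy if it is modified or deleted
        data:
          spec:
            podSelector: {}
            policyTypes: ["Ingress", "Egress"]
```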

Because Talos Linux runs vanilla upstream Kubernetes, it works with any CNI. The minimal OS footprint also means fewer host-level networking packages to worry about.

Check 6: Disable anonymous auth & use strong RBAC

Ensure no anonymous access to the API server and enforce least-privilege RBAC.

Frameworks: ✅ CIS 1.2.1 ✅ NIST AC/IA ✅ ISO27001 A.5.X ✅ PCI Req. 7/8 ✅ SOC2 CC6/7 ✅ NIS2 Art. 21 ✅ GDPR Art. 32

Setting --anonymous-auth=false seems straightforward: requests without credentials get rejected. But examining audit logs on a default cluster, where the flag is still true, reveals what actually happens: unauthenticated requests pass through the API server mapped to the username system:anonymous with the group system:unauthenticated. RBAC denies the actions afterward, but the requests are being processed.

The danger is in misconfigured RBAC rules. If anyone grants permissions to system:anonymous or system:unauthenticated, those anonymous requests gain real access. Closing the front door isn't enough; RBAC rules need regular auditing.

For auditing, teams can export roles and cluster roles manually with kubectl get roles --all-namespaces and kubectl get clusterroles. The rbac-view plugin (available via Krew) can visualize RBAC configurations in a format accessible to both engineers and management, which is useful for compliance documentation and for making the security posture legible to CIOs and CTOs.
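For example (assuming the Krew plugin manager is installed):

```bash
# Dump RBAC objects for manual review
kubectl get roles --all-namespaces -o yaml > roles.yaml
kubectl get clusterroles -o yaml > clusterroles.yaml

# Install and launch the rbac-view plugin, which serves a local web UI
kubectl krew install rbac-view
kubectl rbac-view
```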

Talos Linux reinforces this check structurally: with no SSH access and no local users, RBAC becomes the only path to cluster resources. Anonymous auth is disabled via machine config:
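```yaml
# A sketch of the relevant machine config fragment
cluster:
  apiServer:
    extraArgs:
      anonymous-auth: "false"   # reject credential-less requests outright
```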

Check 7: Scan images for vulnerabilities

Ensure all container images are scanned for CVEs before deployment and automatically rejected if critical vulnerabilities exist.

Frameworks: ✅ CIS Control 5.2 ✅ NIST RA-5/SR-11 ✅ ISO27001 A.8.8 ✅ PCI Req. 6/11 ✅ SOC2 CC7.1-3 ✅ NIS2 Art. 21 ✅ GDPR Art. 32

The implementation pattern involves integrating scanners like Trivy, Clair, or Grype into the CI/CD pipeline, enforcing image signing with Cosign or Notary, and generating SBOMs (Software Bills of Materials) for supply chain compliance, which is increasingly required under the EU Cyber Resilience Act. SBOMs can be generated with tools like cdxgen:
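```bash
# Generate a CycloneDX SBOM for a container image (image name is a placeholder)
cdxgen -t docker -o sbom.json registry.example.com/app:1.2.3
```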

Talos Linux doesn't have a machine-config switch to enforce signed images at the OS level. The solution is cluster-level policy enforcement: Sigstore's policy-controller, Kyverno image verification policies, or Ratify with Gatekeeper/ValidatingAdmissionPolicy. Registry mirror and auth configuration in Talos can restrict which registries are allowed, keeping pulls on trusted paths.
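A sketch of a Kyverno image-verification rule (the registry pattern and public key are placeholders):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # placeholder registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key>
                      -----END PUBLIC KEY-----
```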

Check 8: Enable automatic updates for K8s components

Ensure the Kubernetes control plane and nodes are automatically patched.

Frameworks: ✅ NIST SI-2 ✅ ISO27001 A.8.8/9 ✅ PCI Req. 6.2 ✅ SOC2 CC7/8 ✅ NIS2 Art. 21 ✅ GDPR Art. 32

On vanilla Kubernetes, the options include running kubeadm upgrade on a regular schedule (automatable with Ansible), using managed Kubernetes services with auto-upgrade enabled, or adopting Cluster API for lifecycle management. Umbrella Helm charts for cluster services like External Secrets Operator and cert-manager help keep the broader ecosystem current.
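For the kubeadm route, the core loop is short (the version is an example):

```bash
# Check which versions are available and what will change
kubeadm upgrade plan

# Apply the upgrade on the first control plane node
kubeadm upgrade apply v1.31.0

# On each remaining node
kubeadm upgrade node
```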

Talos Linux makes this significantly more tractable. Updates are atomic. There's no apt or yum to maintain separately. Upgrades and rollbacks are single commands:
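```bash
# Upgrade the OS on one node; node IP and image tag are examples
talosctl upgrade --nodes 10.0.0.2 \
  --image ghcr.io/siderolabs/installer:v1.8.0

# Upgrade Kubernetes itself across the cluster (example version)
talosctl --nodes 10.0.0.2 upgrade-k8s --to 1.31.0

# Roll back to the previous OS image if something goes wrong
talosctl rollback --nodes 10.0.0.2
```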

Check 9: Enforce immutable pods & read-only root filesystems

Ensure pods run with read-only root filesystems and immutable containers where possible.

This doesn't map to any of the compliance frameworks covered in this talk, but it's the right thing to do.

The implementation pattern treats pods as replaceable rather than modifiable: make ConfigMaps and Secrets immutable (immutable: true), mount configs read-only, and use GitOps tooling like ArgoCD or FluxCD to enforce "replace not modify" deployment behavior.
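A minimal sketch of both pieces (names and image are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
immutable: true            # edits require replacing the object, not mutating it
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.2.3   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp    # writable scratch space only where the app needs it
  volumes:
    - name: tmp
      emptyDir: {}
```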

Talos Linux extends this philosophy to the OS itself. The entire operating system runs as an immutable, read-only filesystem. Talos was built this way from the ground up, so immutable pods are a natural extension of the underlying platform rather than an afterthought.

Check 10: Backup etcd & test disaster recovery

Ensure etcd backups are automated, encrypted, and tested for recovery.

Frameworks: ✅ NIST CP-9/10 ✅ NIS2 Art. 21 ✅ SOC2 A1.2/A1.3 ✅ GDPR Art. 32 ✅ ISO27001 A.8.12

A note on tooling: etcd's offline snapshot subcommands (status and restore) are deprecated in etcdctl and have moved to etcdutl, so check which binary your runbooks call. Beyond the backup itself, the critical work is the recovery playbook and regular DR drills: restoring to a staging cluster on a schedule so that the procedure is tested before it's needed. Velero and Kasten are solid options for broader cluster backup coverage.
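In practice the split looks like this (endpoints and paths are placeholders):

```bash
# Taking a snapshot from a live member still goes through etcdctl
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/client.crt --key=/etc/etcd/client.key

# Offline verification and restore now live in etcdutl
etcdutl snapshot status /backup/etcd-snapshot.db
etcdutl snapshot restore /backup/etcd-snapshot.db --data-dir /var/lib/etcd-restored
```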

Talos Linux has built-in etcd backup and restore via talosctl:
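```bash
# Stream a consistent etcd snapshot from a control plane node (IP is an example)
talosctl --nodes 10.0.0.2 etcd snapshot ./etcd.snapshot

# Recover the cluster from that snapshot during bootstrap
talosctl --nodes 10.0.0.2 bootstrap --recover-from ./etcd.snapshot
```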

Watch the whole talk here.