Kubernetes And Kernel Panics

How Netflix’s Container Platform Connects Linux Kernel Panics to Kubernetes Pods

By Kyle Anderson

As part of a recent effort to reduce pain for the customers (engineers, not end users) of our container platform <a href="https://netflixtechblog.com/tagged/titus" rel="noopener ugc nofollow" target="_blank">Titus</a>, I started investigating “orphaned” pods. These are pods that never got to finish and had to be garbage collected without a satisfactory final status. Owners of our Service jobs (think <a href="https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/" rel="noopener ugc nofollow" target="_blank">ReplicaSet</a>) don’t care much, but our Batch users care a lot. Without a real return code, how can they know whether it is safe to retry? These orphaned pods represent real pain for our users, even if they make up only a small percentage of the total pods in the system. Where are they going, exactly? Why did they go away?

<a href="https://netflixtechblog.com/kubernetes-and-kernel-panics-ed620b9c6225">Click Here</a>