Kubernetes And Kernel Panics
<p>How Netflix’s Container Platform Connects Linux Kernel Panics to Kubernetes Pods</p>
<p><em>By Kyle Anderson</em></p>
<p>As part of a recent effort to reduce customer (engineers, not end users) pain on our container platform <a href="https://netflixtechblog.com/tagged/titus" rel="noopener ugc nofollow" target="_blank">Titus</a>, I started investigating “orphaned” pods. These are pods that never got to finish and had to be garbage collected with no satisfactory final status. Our Service job (think <a href="https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/" rel="noopener ugc nofollow" target="_blank">ReplicaSet</a>) owners don’t care too much, but our Batch users care a lot. Without a real return code, how can they know whether it is safe to retry?</p>
<p>These orphaned pods represent real pain for our users, even though they are a small percentage of the total pods in the system. Where are they going, exactly? Why did they go away?</p>
<p><a href="https://netflixtechblog.com/kubernetes-and-kernel-panics-ed620b9c6225"><strong>Read More</strong></a></p>