Understanding Kubernetes pod evicted and scheduling problems

Pod eviction and scheduling problems are side effects of how Kubernetes limits and requests are set, and they usually stem from a lack of planning.

Beginners tend to think limits are optional and merely an obstacle to getting their workloads running. Why should I set a limit if I can run without one? I may need all the CPU eventually.

With this way of thinking, Kubernetes wouldn’t have gone far. Fortunately, the Kubernetes developers had this in mind, and the quota mechanism is designed to prevent misuse of resources.

When you create a pod for your application, you can set requests and limits for CPU and memory for every container inside.

Setting these values properly is how you tell Kubernetes to manage your app’s resources. Kubernetes ranks every pod based on its limits and requests, and is ready to evict the pods that don’t comply with the fair use rules.

Setting requests declares how much of each resource your containers need to run in normal operation.

Setting limits declares how much memory or CPU they may occasionally use.

Pod eviction

Kubernetes QoS depending on requests and limits

There are two different quotas for containers in Kubernetes:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 768Mi
  • Requests: This value is used for scheduling. It’s the minimum amount of resources a container needs to run. Be careful: a request does not mean these resources are always dedicated to the container.
  • Limits: This is the maximum amount of a resource that the node will allow the container to use.

Both of them can be used with CPU and memory with different implications, as we covered in this post.

There are three kinds of pods depending on their quota settings:

  • Guaranteed: Pods in which every container has both a CPU and a memory request and limit, and each request equals its limit.
  • Burstable: Non “guaranteed” pods with at least one CPU or memory request or limit in one of the containers.
  • Best effort: Pods without requests or limits of any kind.
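
The three rules above can be sketched as a small function. This is only an illustration of the classification logic, not the code Kubernetes runs: the name `qos_class` and the dict layout (mirroring the `resources` stanza) are assumptions made for this example, and quantities are compared as raw strings, so `1` and `1000m` would be treated as different values.

```python
def qos_class(containers):
    """Return the QoS class of a pod from its containers' resources.

    `containers` is a list of dicts shaped like the `resources` stanza,
    e.g. {"requests": {"cpu": "100m"}, "limits": {"cpu": "500m"}}.
    Hypothetical helper for illustration only.
    """
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req, lim = requests.get(resource), limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # A missing request defaults to the limit when a limit is set.
            effective_req = req if req is not None else lim
            # Guaranteed needs a limit on every resource, equal to the request.
            if lim is None or effective_req != lim:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# The container from the snippet above: requests differ from limits.
example = [{"requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "768Mi"}}]
print(qos_class(example))  # Burstable
```

Raising the requests to match the limits (500m and 768Mi) would move the same pod into the Guaranteed class.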

As we will see in this post, a pod’s QoS class is important when allocating and reclaiming resources.

Scheduling problems due to poorly set request values

Cluster pod allocation is based on requests (CPU and memory). If a pod requests more CPU or memory than a node has available, the pod can’t run on that node. If none of the cluster nodes has enough resources to run the pod, the pod will remain pending until there are enough resources.
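
A minimal sketch of that fit check, assuming resources are already expressed as plain integers (millicores and MiB). The function name, the node names, and all the numbers are made up for illustration, and the real scheduler applies many more filters than this one.

```python
def nodes_that_fit(pod_request, nodes):
    """Return the names of nodes whose unreserved resources cover the request.

    Free capacity is allocatable minus the sum of requests already placed
    on the node; actual usage plays no part in the decision.
    """
    fitting = []
    for name, node in nodes.items():
        free = {r: node["allocatable"][r] - node["requested"][r]
                for r in pod_request}
        if all(free[r] >= pod_request[r] for r in pod_request):
            fitting.append(name)
    return fitting


# Hypothetical two-node cluster.
nodes = {
    "node-a": {"allocatable": {"cpu_m": 2000, "memory_mi": 4096},
               "requested": {"cpu_m": 1500, "memory_mi": 3900}},
    "node-b": {"allocatable": {"cpu_m": 2000, "memory_mi": 4096},
               "requested": {"cpu_m": 1950, "memory_mi": 1024}},
}
pod = {"cpu_m": 200, "memory_mi": 256}
print(nodes_that_fit(pod, nodes))  # [] -> the pod stays Pending
```

Here node-a is short on memory and node-b is short on CPU, so the pod fits nowhere even though the cluster as a whole has enough of both.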

Requesting too many resources will make your pods difficult to schedule. In addition, if your request is higher than your regular use of resources, those resources requested and not used cannot be allocated to other pods, reducing operational capacity of the cluster. Your cluster admin won’t be happy, and they will probably let you know.

If a pod is waiting to be allocated and the scheduler can’t find a node with enough resources to run it, it stays in the “Pending” phase:

NAME       READY   STATUS    RESTARTS   AGE
frontend   0/2     Pending   0          10s

A “kubectl describe pod” command will give information about the issue:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  44s (x2 over 44s)  default-scheduler  0/4 nodes are available: 4 Insufficient memory.

Be aware that the resources needed to deploy a pod are sometimes not what they seem. For each resource, the effective pod request used for allocation is the higher of these two values:

  • The sum of the requests of the containers inside the pod.
  • The largest request of any single init container.

For example, if you have a pod with this definition:

apiVersion: v1
kind: Pod
...
  containers:
  - name: myapp-container
    image: busybox:1.28
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 768Mi
  - name: myapp2-container
    image: busybox:1.28
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 768Mi
...
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'sleep 3']
    resources:
      requests:
        cpu: 300m
        memory: 750Mi

The containers that will run the application add up to 200 millicores of CPU and 256Mi of memory in requests. However, the effective pod request used to schedule the pod, and the amount of resources marked as occupied, will be 300m and 750Mi, as requested by the init container.
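
The same arithmetic as a short sketch. The helper `effective_request` and the integer units (millicores, MiB) are assumptions for this example. Init containers run one at a time, which is why their requests are compared rather than added.

```python
def effective_request(containers, init_containers):
    """Per resource: max(sum of app requests, largest init request)."""
    result = {}
    for resource in ("cpu_m", "memory_mi"):
        app_sum = sum(c.get(resource, 0) for c in containers)
        init_max = max((c.get(resource, 0) for c in init_containers),
                       default=0)
        result[resource] = max(app_sum, init_max)
    return result


# Requests taken from the pod definition above.
containers = [{"cpu_m": 100, "memory_mi": 128},
              {"cpu_m": 100, "memory_mi": 128}]
init_containers = [{"cpu_m": 300, "memory_mi": 750}]
print(effective_request(containers, init_containers))
# {'cpu_m': 300, 'memory_mi': 750}
```

Without the init container, the same call would return 200 millicores and 256 MiB.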

Having a clear view of the allocatable resources in your cluster enables cluster admins to better plan their needs depending on present and expected workloads. Clear insight into which pods are really using their requested resources is an invaluable tool to maximize cluster occupation and the density of applications per node.

Kubernetes cluster capacity
Find these metrics in Sysdig Monitor in the dashboard: Kubernetes → Resource usage → Kubernetes cluster and node capacity

Pod evicted problems

When a node in a Kubernetes cluster is running out of memory or disk, it activates a flag signaling that it is under pressure. This blocks any new allocation on the node and starts the eviction process.

Tip: You can find this information in Sysdig Monitor dashboards.

node under pressure information

At that moment, kubelet starts to reclaim resources, killing containers and declaring pods as failed until the resource usage is under the eviction threshold again.

First, kubelet tries to free node resources, especially disk, by deleting dead pods and their containers, and then unused images. If this isn’t enough, kubelet starts to evict end-user pods in the following order:

  • Best effort pods.
  • Burstable pods using more of the starved resource than they request.
  • Burstable pods using less of the starved resource than they request.
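
As a rough sketch, that order can be expressed as a sort key. This encodes only the three steps listed above and is not kubelet’s actual ranking, which takes more signals into account. The function name, the pod names, and the numbers are hypothetical, and placing Guaranteed pods last is an assumption of this example.

```python
def eviction_order(pods, resource="memory_mi"):
    """Sort pod names from first evicted to last, per the list above."""
    def rank(pod):
        if pod["qos"] == "BestEffort":
            return 0
        if pod["qos"] == "Burstable":
            over = pod["usage"][resource] > pod["request"][resource]
            return 1 if over else 2
        return 3  # Guaranteed: assumed to be the last candidates

    return [p["name"] for p in sorted(pods, key=rank)]


# Hypothetical pods on a node that is low on memory.
pods = [
    {"name": "db", "qos": "Burstable",
     "usage": {"memory_mi": 1520}, "request": {"memory_mi": 200}},
    {"name": "cache", "qos": "Burstable",
     "usage": {"memory_mi": 100}, "request": {"memory_mi": 256}},
    {"name": "batch", "qos": "BestEffort",
     "usage": {"memory_mi": 300}, "request": {}},
    {"name": "api", "qos": "Guaranteed",
     "usage": {"memory_mi": 400}, "request": {"memory_mi": 512}},
]
print(eviction_order(pods))  # ['batch', 'db', 'cache', 'api']
```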

You may see messages like these if one of your pods is evicted because of memory use:

NAME       READY   STATUS    RESTARTS   AGE
frontend   0/2     Evicted   0          10s

Events:
  Type     Reason               Age    From                                                  Message
  ----     ------               ----   ----                                                  -------
  Normal   Scheduled            12m    default-scheduler                                     Successfully assigned test/frontend to gke-lab-kube-gke-default-pool-02126501-qcbb
  Normal   Pulling              12m    kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  pulling image "nginx"
  Normal   Pulled               12m    kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  Successfully pulled image "nginx"
  Normal   Created              12m    kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  Created container
  Normal   Started              12m    kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  Started container
  Warning  Evicted              4m8s   kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  The node was low on resource: memory. Container db was using 1557408Ki, which exceeds its request of 200Mi.
  Warning  ExceededGracePeriod  3m58s  kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  Container runtime did not kill the pod within specified grace period.
  Normal   Killing              3m27s  kubelet, gke-lab-kube-gke-default-pool-02126501-qcbb  Killing container with id docker://db:Need to kill Pod
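
The eviction message reports usage in Ki and the request in Mi, which makes the gap hard to judge at a glance. A small sketch of the conversion, assuming a hypothetical `to_bytes` helper that handles only whole numbers with the Ki, Mi, and Gi suffixes (a subset of the Kubernetes quantity syntax):

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity):
    """Convert a memory quantity such as '200Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already a number of bytes


usage = to_bytes("1557408Ki")    # value from the Evicted event above
request = to_bytes("200Mi")
print(usage > request)           # True
print(round(usage / request, 1)) # 7.6
```

In other words, the `db` container was using roughly 7.6 times the memory it had requested when the node ran low.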

Guaranteed pods are supposed to be safe in case of eviction. If you are not setting requests and limits in your pods, this is a very good reason to do so. Setting those values properly can protect you from unexpected outages.

There is a special exception to this. If some of the system services require more resources than the amount reserved for them, and there are only guaranteed pods, kubelet will evict those pods in order of resource usage until the pressure is gone.

Conclusion

A good configuration of your requests and limits is important for the coexistence of different applications in the cluster. Understanding those limits will allow you to have happy on-call rotations.

Some lessons you should learn from this are:

  • Protect your critical pods by setting requests and limits so they are classified as Guaranteed.
  • Burstable pods are fine for most tasks.
  • Be careful and set reasonable pod limits and requests. This will help you size cluster capacity and reduce pod evictions.
  • Try to avoid Best effort pods; keep them for workloads that have no time constraints or are fault tolerant.

A good monitoring system like Sysdig Monitor will help you avoid evicted and pending pods.
