Kubernetes Security Guide, Chapter 2: Kubernetes Security Context and Kubernetes Network Policy

April 4, 2018

Once you have defined credentials and permissions for Kubernetes users and services, you can start leveraging Kubernetes orchestration capabilities to configure security at the pod level. This chapter covers how to use the Kubernetes Security Context, Pod Security Policy and Network Policy resources to define container privileges, permissions, capabilities and network communication rules. It also discusses how to limit resource starvation with Kubernetes resource allocation management.

Kubernetes admission controllers

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. Admission controllers pre-process the requests and can provide utility functions (like filling out empty parameters with default values), but they can also be used to enforce further security checks.

Again, this configuration can be found in the kube-apiserver configuration file we mentioned before:

--admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota

Let's focus on the admission controllers that can help you strengthen your cluster security:

DenyEscalatingExec: Forbids executing commands on an "escalated" container. This includes pods that run as privileged, have access to the host IPC namespace, or have access to the host PID namespace. Without this admission controller, a regular user can escalate privileges over the Kubernetes node just by spawning a terminal on these containers.

NodeRestriction: This admission controller limits the node and pod objects a kubelet can modify. Using this controller, a Kubernetes node will only be able to modify the API representation of itself and the pods bound to this node.

PodSecurityPolicy: This admission controller acts on creation and modification of the pod and determines if it should be admitted based on the requested Security Context and the available Pod Security Policies. The PodSecurityPolicy objects define a set of conditions and security context that a pod must declare in order to be accepted into the cluster. We will cover PSP in more detail below.

ValidatingAdmissionWebhook: This admission controller (in beta state as of Kubernetes 1.9) is very flexible in the sense that it lets you call any external service implementing your custom security policies to decide if a pod should be accepted in your cluster. You can, for example, pre-validate container images using Grafeas, a container-oriented auditing and compliance engine.
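To make the webhook idea more concrete, the following is a minimal sketch of how an external validation service might be registered for pod creation. It assumes the admissionregistration.k8s.io/v1 API found in newer Kubernetes releases (clusters from the 1.9 era used a v1beta1 version of this resource), and every name, namespace and path in it is hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: image-policy-example            # hypothetical name
webhooks:
- name: image-policy.example.com        # hypothetical; must look like a domain name
  rules:
  # Call the webhook every time a pod is created
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: default                # hypothetical service location
      name: image-policy-webhook
      path: /validate
    caBundle: "<base64-encoded CA bundle>"   # placeholder, not a real value
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail                   # reject pods if the webhook is unreachable
```

Choosing failurePolicy: Fail favors security over availability; a cluster that cannot tolerate a webhook outage might prefer Ignore instead.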

There is a recommended set of admission controllers to run depending on your Kubernetes version.

Kubernetes Security Context

When you declare a pod or deployment, you can group several security-related parameters, such as the SELinux profile or Linux capabilities, in a security context block:

...
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
...

You can configure the following parameters as part of your security context:

Privileged: Perhaps the most common security context flag is privileged. Processes within a privileged container get almost the same privileges that are available to processes outside a container. They are able, for example, to directly configure the host kernel or the host network stack.
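As a minimal sketch, the flag is set per container. The pod name below is hypothetical. A container is unprivileged unless you say otherwise, so the explicit false only makes the intent visible to anyone reviewing the manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-demo        # hypothetical name
spec:
  containers:
  - name: privileged-demo
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: false        # never grant host-level privileges to this container
```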

Other context parameters that you can enforce include:

User and Group ID for the processes, containers and volumes: When you run a container without any security context, the 'entrypoint' command runs as the user defined in the image, which is root unless the image says otherwise. This is easy to verify:

$ kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 sh

Using the runAsUser parameter you can modify the user ID of the processes inside a container. For example:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: gcr.io/google-samples/node-hello:1.0
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:
      allowPrivilegeEscalation: false

If you spawn a container using this definition you can check that the initial process is using UID 1000.

USER   PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
1000     1  0.0  0.0   4336   724 ?     Ss   18:16   0:00 /bin/sh -c node server.js

And any file you create inside the /data/demo volume will use GID 2000 (due to the fsGroup parameter).

Security Enhanced Linux (SELinux): You can assign SELinuxOptions objects using the seLinuxOptions field. Note that the SELinux module needs to be loaded on the underlying Linux nodes for these policies to take effect.
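A minimal sketch of the seLinuxOptions field at the pod level could look like this. The pod name is hypothetical and the level value is only an example label; the right value depends on the SELinux policy of your nodes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-demo             # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"      # example MCS label, adapt to your node policy
  containers:
  - name: selinux-demo
    image: busybox
    command: ["sleep", "3600"]
```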

Capabilities: Linux capabilities break down root's full, unrestricted access into a set of separate permissions. This way, you can grant some privileges to your software, like binding to a port < 1024, without granting full root access.

There is a default set of capabilities granted to any container if you don't modify the security context. For example, chown lets a process change file ownership and net_raw lets it craft raw network packets.

Using the pod security context, you can drop default Linux capabilities and/or add non-default Linux capabilities. Again, by applying the principle of least privilege, you can greatly reduce the damage of any malicious attack taking over the pod.

As a quick example, you can spawn the flask-cap pod:

$ kubectl create -f flask-cap.yaml

The flask-cap.yaml file contains:

apiVersion: v1
kind: Pod
metadata:
  name: flask-cap
  namespace: default
spec:
  containers:
  - image: mateobur/flask
    name: flask-cap
    securityContext:
      capabilities:
        drop:
          - NET_RAW
          - CHOWN

Note that some securityContext fields are set at the pod level, while others (such as capabilities) are set at the container level.

If you spawn a shell, you can verify that these capabilities have been dropped:

$ kubectl exec -it flask-cap bash
root@flask-cap:/# ping 8.8.8.8
ping: Lacking privilege for raw socket.
root@flask-cap:/# chown daemon /tmp
chown: changing ownership of '/tmp': Operation not permitted
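The opposite approach is to drop every capability and add back only the ones the workload needs. The following is a sketch under assumptions: the pod name is hypothetical, and whether a given image (including the one used here) still works with such a reduced set has to be tested.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flask-minimal-caps     # hypothetical name
  namespace: default
spec:
  containers:
  - image: mateobur/flask
    name: flask-minimal-caps
    securityContext:
      capabilities:
        drop:
        - ALL                  # start from an empty capability set
        add:
        - NET_BIND_SERVICE     # only allow binding to ports below 1024
```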

AppArmor and Seccomp: You can also apply the profiles of these security frameworks to Kubernetes pods. This feature is in beta state as of Kubernetes 1.9, and profile configurations are referenced using annotations for the time being.
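As an illustration of the annotation-based configuration, this sketch applies the container runtime's default AppArmor profile and the default Docker seccomp profile. The pod name is hypothetical, and the annotation keys and accepted values have changed between Kubernetes versions, so treat them as an assumption to check against the documentation for your release.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: profiles-demo          # hypothetical name
  annotations:
    # AppArmor: the key ends with the name of the container it applies to
    container.apparmor.security.beta.kubernetes.io/profiles-demo: runtime/default
    # Seccomp: this key applies the profile to every container in the pod
    seccomp.security.alpha.kubernetes.io/pod: docker/default
spec:
  containers:
  - name: profiles-demo
    image: busybox
    command: ["sleep", "3600"]
```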

AppArmor, Seccomp or SELinux allow you to define run-time profiles for your containers, but if you want to define run-time profiles at a higher level with more context, Sysdig Falco and Sysdig Secure can be better options. Sysdig Falco monitors the run-time security of your containers according to a set of user-defined rules. It has some similarities and some important differences with the other tools we just mentioned (reviewed in the "SELinux, Seccomp, Sysdig Falco, and you" article).

AllowPrivilegeEscalation: The execve system call can grant a newly-started program privileges that its parent did not have. A classic example is a binary with the setuid or setgid flag set. The allowPrivilegeEscalation boolean controls whether or not this behavior is allowed. Enable it with care and only when required.

ReadOnlyRootFilesystem: Controls whether a container will be able to write into the root filesystem. Containers commonly only need to write on mounted volumes that persist the state, while their root filesystem is supposed to be immutable. You can enforce this behavior using the readOnlyRootFilesystem flag:

$ kubectl create -f https://raw.githubusercontent.com/mateobur/kubernetes-securityguide/master/readonly/flask-ro.yaml
$ kubectl exec -it flask-ro bash
root@flask-ro:/# mount | grep "\/\ "
none on / type aufs (ro,relatime,si=e6100da9e6227a70,dio,dirperm1)
root@flask-ro:/# touch foo
touch: cannot touch 'foo': Read-only file system
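A pod definition of this kind might look like the following sketch. It is not the content of the flask-ro.yaml file used above; the names are hypothetical, and the emptyDir volume is an assumption that shows how to keep one writable path for scratch data.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-demo          # hypothetical name
spec:
  containers:
  - name: readonly-demo
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      readOnlyRootFilesystem: true   # the root filesystem is mounted read-only
    volumeMounts:
    - name: scratch
      mountPath: /tmp                # the only writable path in the container
  volumes:
  - name: scratch
    emptyDir: {}
```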

Kubernetes Pod Security Policy

Pod Security Policies (PSP) are implemented as an admission controller. Using security policies, you can restrict the pods that are allowed to run on your cluster: a pod is admitted only if it follows the policies you have defined.

The cluster administrator can control the following aspects:

Control Aspect                                        Field Names
Running of privileged containers                      privileged
Usage of the root namespaces                          hostPID, hostIPC
Usage of host networking and ports                    hostNetwork, hostPorts
Usage of volume types                                 volumes
Usage of the host filesystem                          allowedHostPaths
White list of FlexVolume drivers                      allowedFlexVolumes
Allocating an FSGroup that owns the pod's volumes     fsGroup
Requiring the use of a read only root file system     readOnlyRootFilesystem
The user and group IDs of the container               runAsUser, supplementalGroups
Restricting escalation to root privileges             allowPrivilegeEscalation, defaultAllowPrivilegeEscalation
Linux capabilities                                    defaultAddCapabilities, requiredDropCapabilities, allowedCapabilities
The SELinux context of the container                  seLinux
The AppArmor profile used by containers               annotations
The seccomp profile used by containers                annotations
The sysctl profile used by containers                 annotations


As you can see, there is a direct relation between the Kubernetes pod Security Context fields and the Kubernetes Pod Security Policies. Your Security Policy will filter the allowed pod security contexts by defining:

  • Default pod security context values (e.g. defaultAddCapabilities)
  • Mandatory pod security flags and values (e.g. allowPrivilegeEscalation: false)
  • Whitelists and blacklists for the list-based security flags (e.g. the list of allowed host paths to mount).
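The three kinds of settings can be combined in a single policy. The following is only a sketch: the policy name and the specific choices are hypothetical, and the apiVersion matches the one used elsewhere in this guide.

```yaml
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-sketch              # hypothetical name
spec:
  privileged: false
  # Mandatory flags and values
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - NET_RAW
  # Default value applied when the pod does not set the flag
  defaultAllowPrivilegeEscalation: false
  # Whitelists for list-based flags
  allowedCapabilities:
  - NET_BIND_SERVICE
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/foo"
  # Rules that every policy has to declare
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```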

For example, to specify that containers can only mount a specific host path, you would use:

allowedHostPaths:
  # This allows "/foo", "/foo/", "/foo/bar" etc., but
  # disallows "/fool", "/etc/foo" etc.
  # "/foo/../" is never valid.
  - pathPrefix: "/foo"

You need the PodSecurityPolicy admission controller enabled in your API server to enforce these policies.

If you plan to enable PodSecurityPolicy, first make sure that you configure (or already have present) a default PSP and the associated RBAC permissions. Otherwise, the cluster will fail to create new pods.
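The RBAC side of this can be sketched as a role that grants the use verb on a policy, bound to the subjects that create pods. All names here are hypothetical, including the default-psp policy, and binding to every service account is an assumption made for brevity rather than a recommendation.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-default-user               # hypothetical name
rules:
# The API group of PodSecurityPolicy depends on the Kubernetes version,
# so both candidates are listed here
- apiGroups: ["extensions", "policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["default-psp"]       # hypothetical policy name
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-default-binding            # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-default-user
subjects:
# Every service account in the cluster may use the policy
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
```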

If your cloud provider / deployment design already supports and enables PSP, it will come pre-populated with a default set of policies, for example:

$ kubectl get psp
NAME                           PRIV      CAPS      SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
gce.event-exporter             false     []        RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            [hostPath secret]
gce.fluentd-gcp                false     []        RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            [configMap hostPath secret]
gce.persistent-volume-binder   false     []        RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            [nfs secret]
gce.privileged                 true      [*]       RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            [*]
gce.unprivileged-addon         false     []        RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            [emptyDir configMap secret]

If you enabled PSP for a cluster that didn't have any pre-populated rules, you can always start by creating a permissive policy to avoid run-time disruption and then adjust your configuration iteratively.

As an example, this policy will prevent the execution of any pod that tries to use the root user or group, allowing any other security context:

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: true
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: 'MustRunAsNonRoot'
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      - min: 1
        max: 65535
  volumes:
  - '*'

$ kubectl create -f psp.yaml
podsecuritypolicy "example" created

$ kubectl get psp
NAME                           PRIV      CAPS      SELINUX    RUNASUSER          FSGROUP     SUPGROUP   READONLYROOTFS   VOLUMES
example                        true      []        RunAsAny   MustRunAsNonRoot   MustRunAs   RunAsAny   false            [*]

If you try to create new pods without the runAsUser directive you will get:

$ kubectl create -f https://raw.githubusercontent.com/mateobur/kubernetes-securityguide/master/readonly/flask-ro.yaml
$ kubectl describe pod flask-ro
...
Failed        Error: container has runAsNonRoot and image will run as root
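A pod that declares a non-root user and a group inside the allowed fsGroup range should pass a policy of this kind. The following is a minimal sketch: the pod name is hypothetical, and the image is the one that was shown running with UID 1000 earlier in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo           # hypothetical name
spec:
  securityContext:
    runAsUser: 1000            # any non-zero UID avoids the root user
    fsGroup: 2000              # inside the 1-65535 range required by the policy
  containers:
  - name: nonroot-demo
    image: gcr.io/google-samples/node-hello:1.0
```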

Kubernetes Network Policies

Kubernetes also defines security at the pod networking level. A network policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints.

You can compare Kubernetes network policies with classic network firewalling (à la iptables), with one important advantage: the rules can use Kubernetes context such as pod labels and namespaces.

Kubernetes supports several third-party plugins that implement pod overlay networks. Check your provider's documentation (for example, the Calico or Weave documentation) to make sure that Kubernetes network policies are supported and enabled. Otherwise, the configuration will show up in your cluster but will not have any effect.

To show how these network policies work, let's use the Kubernetes example scenario guestbook:

kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/examples/guestbook/all-in-one/guestbook-all-in-one.yaml

This will create 'frontend' and 'backend' pods:

$ kubectl describe pod frontend-685d7ff496-7s6kz | grep tier
        tier=frontend
$ kubectl describe pod redis-master-7bd4d6ccfd-8dnlq | grep tier
        tier=backend

You can use these logical groupings to configure your network policy, abstracting away concepts like IP addresses or physical nodes, which wouldn't work here because Kubernetes can change them dynamically.

Let's apply the following network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-backend-egress
  namespace: default
spec:
  # The policy applies to the pods labeled tier=backend
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Egress
  # Outgoing traffic is only allowed towards other backend pods
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend

You can also find this policy in the repository:

$ kubectl create -f netpol/guestbook-network-policy.yaml

Then you can get the pod names and local IP addresses using:

$ kubectl get pods -o wide
[...]

In order to check that the policy is working as expected, you can 'exec' into the 'redis-master' pod and try to ping first a 'redis-slave' (same tier) and then a 'frontend' pod:

$ kubectl exec -it redis-master-7bd4d6ccfd-8dnlq bash
$ ping 10.28.4.21
PING 10.28.4.21 (10.28.4.21) 56(84) bytes of data.
64 bytes from 10.28.4.21: icmp_seq=1 ttl=63 time=0.092 ms
$ ping 10.28.4.23
PING 10.28.4.23 (10.28.4.23) 56(84) bytes of data.
(no response, blocked)

As we mentioned before, note that this policy will be enforced even if the pods migrate to another node or they are scaled up/down.
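The same label-based approach works for incoming traffic. As a sketch under assumptions, the two policies below first deny all ingress in the namespace and then allow only the frontend tier to reach the backend tier on the default Redis port (6379, assumed here). The policy names are hypothetical, and a working guestbook would need further rules, for instance for traffic between backend pods and for external access to the frontend.

```yaml
# Deny all incoming traffic to every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress           # hypothetical name
  namespace: default
spec:
  podSelector: {}                      # an empty selector matches all pods
  policyTypes:
  - Ingress
---
# Allow the frontend tier to reach the backend tier on the Redis port
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend      # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 6379                       # assumed default Redis port
```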

Kubernetes resource allocation management

Resource limits are most typically established to avoid unintended saturation due to design limitations or software bugs, but they can also protect you from malicious resource abuse. Unauthorized resource consumption that tries to remain undetected is becoming much more common due to cryptojacking attempts.

There are two basic concepts: requests and limits.

Requests: The Kubernetes scheduler checks that a node has enough resources left to fully satisfy the request before placing the pod on it.

Limits: Kubernetes makes sure that the actual resource consumption never goes over the configured limits. A container that exceeds its memory limit is killed, as the example below shows.

You can run a quick example from the resources/flask-resources.yaml repository file:

apiVersion: v1
kind: Pod
metadata:
  name: flask-resources
  namespace: default
spec:
  containers:
  - image: mateobur/flask
    name: flask-resources
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 700Mi

$ kubectl create -f resources/flask-resources.yaml

Let's use the stress load generator to test the limits:

root@flask-resources:/# stress --cpu 1 --io 1 --vm 2 --vm-bytes 800M
stress: info: [79] dispatching hogs: 1 cpu, 1 io, 2 vm, 0 hdd
stress: FAIL: [79] (416) <-- worker 83 got signal 9

The resources that you can reserve and limit by default using the pod description include CPU and memory.

Some third-party plugins and cloud providers extend the Kubernetes API to allow defining requests and limits over any other kind of logical resource using the Extended Resources interface. You can also configure resource quotas bound to a namespace context.
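A namespace-level quota can be sketched as follows. The object name and all the values are hypothetical and only illustrate the shape of the resource; the quota caps the total that all pods in the namespace can request or be limited to.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota          # hypothetical name
  namespace: default
spec:
  hard:
    pods: "10"                 # maximum number of pods in the namespace
    requests.cpu: "2"          # sum of CPU requests across all pods
    requests.memory: 4Gi       # sum of memory requests across all pods
    limits.cpu: "4"            # sum of CPU limits across all pods
    limits.memory: 8Gi         # sum of memory limits across all pods
```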
