Performing Image Scanning on Admission Controller with OPA

In this post, we will talk about performing image scanning in an admission controller to scan your container images on demand, right before your workloads are scheduled in the cluster.

Ensuring that all the runtime workloads have been scanned and have no serious vulnerabilities is not an easy task. Let’s see how we can block any pod that doesn’t pass the scanning policies before it even runs in your cluster.

Sysdig’s OPA Image Scanner combines the Sysdig Secure enterprise-level image scanner with the power of OPA’s policy language, Rego, to evaluate the scan results and the admission context, providing great flexibility in the admission decision.

Use cases and limitations of Image Scanning

Image scanners work by analyzing the contents of container images and searching for Common Vulnerabilities and Exposures (CVEs), normally by searching in databases like the National Vulnerability Database (NVD), as well as misconfigurations, bad practices, etc. The scanner will either need to pull the image from the registry, or scan it inline right after the build in the CI/CD pipeline. The scan result will be a report with a list of found issues which can be further evaluated by defining some policies, like the criticality of vulnerabilities or blocking some specific practices but allowing others.

If you are already using an image scanner, you might be aware of some limitations. As you need to explicitly configure the set of scanned images, you might miss some images that end up running in your cluster, or end up scanning images that are never run at all. Additionally, the image scanner has limited information about the image it is scanning: only the registry, image name and tag. With such a narrow context, it’s impossible to make more advanced decisions. For example, you cannot have a “dev” namespace with more permissive rules alongside a very restrictive production namespace that only allows images from a trusted registry with no vulnerabilities.

The scan of images can be triggered in the following ways:

  • Manually, if a user requests to scan a specific image.
  • Automatically, in response to an event (e.g., a webhook, an image pushed to the registry, etc.).
  • By polling, checking for new images in the known registries and scanning all of them.

Our end goal in a cluster environment is to know if we can deploy an image based on the result of the scan and additional information. Some common image scanning use cases include:

  • Allow the image if scanner policy evaluation accepted the image
  • Always allow images in a specific namespace (e.g., “dev”)
  • Deny images from untrusted registries
  • Allow images in a namespace if they come from a trusted registry and have no vulnerabilities

Using Kubernetes extensibility to perform image scanning in an admission controller, and evaluating the result there, addresses the previous limitations. The scan is triggered only when required, for every image that is about to be deployed in the cluster, so no time is spent on images that will never be used and no deployed image is missed. And the admission decision does not rely solely on the image name and tag, but also on additional context from the admission review, which includes the namespace, pod metadata, etc.

Furthermore, using OPA and the power of the Rego language to define the admission policy rules and evaluate the scan result, along with the admission context, provides great flexibility for making informed decisions and covering the most common image scanning use cases.

OPA enters the room – What is OPA

Open Policy Agent (OPA) is an open source, general-purpose policy engine that uses a high-level declarative language called Rego. One of the key ideas behind OPA is decoupling decision-making from policy enforcement. This enables policies to be defined and distributed apart from the software, as code. Whenever the application needs to make a policy decision, it queries OPA with the relevant data so the rules can be evaluated on that input.
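
As a minimal sketch of that query model (the package name, rule name and input field below are hypothetical and not part of the Sysdig integration), a policy can be as small as a default decision plus one rule evaluated against the input document. It is written in the same pre-1.0 Rego syntax as the rules later in this post; OPA 1.0 and later expect the "if" keyword before rule bodies by default.

```rego
package example

# Decision returned when no "allow" rule matches.
default allow = false

# The caller supplies "input"; the policy only decides, it enforces nothing.
allow {
    input.namespace == "dev"
}
```

Querying data.example.allow with the input {"namespace": "dev"} yields true, and any other input yields false. Acting on that answer is left to the caller, which is the decoupling described above.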

Note that OPA is just the evaluation engine, but it complements other tools, like a Kubernetes admission controller, to enforce policies on admission. OPA Gatekeeper is an implementation of an extensible Kubernetes policy controller that uses native Kubernetes CRDs for defining templates and constraints.

The combination of a Kubernetes Admission Controller + Sysdig Image Scanner + OPA Engine provides a way to detach image scanning from the admission decision, defining either simple or advanced rules in Rego and evaluating them against both the scan result report and the Kubernetes context.

Performing Image Scanning on Admission Controller

Kubernetes admission controllers are pieces of code that intercept Kubernetes API calls before the objects are created, so different controllers can validate the object or mutate it.

The Sysdig OPA Image Scanner registers as a dynamic admission controller plugin and intercepts the creation or update of Pods in the cluster. When a Pod resource is intercepted, the controller performs several steps for every container inside the pod:

  • It triggers a scan of the image used for that container.
  • It modifies (mutates) the pod definition by changing the tag of the image and replacing it with the @sha256:digest, so the image is selected by digest and not by tag.
  • It retrieves the scan report of the image.
  • It evaluates the scan report and the admission review context using a set of OPA rules that can be modified and extended if required.

In case the OPA evaluation of the rules determines that the image of any of the containers should be denied, the Pod creation/update is rejected.

If the scan report is not yet available, as it can take a few minutes to scan big images, the condition can be detected and evaluated in the OPA rules. The scan result is reported as “report_not_available”, and you can decide to either admit the pod or reject it.

When a pod is rejected by the admission controller, the usual Kubernetes retry mechanisms apply, and the creation will be retried after some time according to the exponential backoff mechanism.

You might wonder: why mutate the pod spec to use the image digest instead? This will be covered in a future post, but basically, if you keep the tag, you are exposed to a TOCTOU (time-of-check to time-of-use) threat:

  1. A new pod creation/update request is sent to Kubernetes API.
  2. The admission controller intercepts the request and sends it to Sysdig Secure.
  3. The image is requested from the registry…
  4. … and pulled in Sysdig Secure, where it is scanned.
  5. The scan report is sent back to the admission controller. If the rules validate this image and the scan report, the pod will be admitted.
  6. Right after the admission controller admits the pod in Step 5, a different image for the same tag is pushed to the registry.
  7. The pod is scheduled and the image is pulled from the registry in the node.
  8. A different image than the one that was scanned in Step 4 is deployed in the cluster.

Mutating the pod to use the image digest totally prevents this issue. This ensures that the same image that is scanned is deployed in the cluster, no matter what scheduling events occur in the future. The image and tag names are kept as annotations in the pod, in case you want to retrieve the original image tag.

Some rule examples

You can find some more examples of rules defined using the OPA Rego language in the README.md file, but let’s analyze how it works and some use cases.

The OPA evaluation engine receives an input object that contains the ScanReport and the AdmissionRequest as fields. You can find the details of the AdmissionRequest structure in the official Go documentation and in the admission controllers reference. Both fields are available to the OPA rules for evaluation.

Let’s see an example ScanReport in JSON (some parts removed for brevity):

{
    "ImageAndTag": "nginx:1.17.9",
    "Status": "accepted",
    "InnerReport": {
        "detail": {
            "policy": {
                                    ...
                ],
                "name": "Default Sysdig policy bundle",
                "policies": [
                    {
                        "comment": "System default policy",
                        "id": "default",
                        "name": "DefaultPolicy",
                        "rules": [
                            {
                                "action": "WARN",
                                "gate": "dockerfile",
                                "id": "rule_1FlJOnK9qdRSRcTNrfz3IUZXbou",
                                "params": [
                                    {
                                        "name": "instruction",
                                        "value": "HEALTHCHECK"
                                    },
                                    {
                                        "name": "check",
                                        "value": "not_exists"
                                    }
                                ],
                                "trigger": "instruction"
                            },
                            ...
                        ],
                        "version": "1_0"
                    },
                    ...
                ],
                "version": "1_0",
                "whitelisted_images": [],
                "whitelists": [
                    {
                        "comment": "Default global whitelist",
                        "id": "global",
                        "items": [],
                        "name": "Global Whitelist",
                        "version": "1_0"
                    }
                ]
            },
            "result": {
                "bundle": {
                    ...
                },
                "created_at": 1582633822,
                "evaluation_problems": [],
                "final_action": "warn",
                "final_action_reason": "policy_evaluation",
                "image_id": "c7460dfcab502275e9c842588df406444069c00a48d9a995619c243079a4c2f7",
                "last_modified": 1582633822,
                "matched_blacklisted_images_rule": false,
                "matched_mapping_rule": {
                    "id": "mapping_1CI5tw3zxNL9b344sSsXBfth3dW",
                    "image": {
                        "type": "tag",
                        "value": "*"
                    },
                    "name": "default",
                    "policy_ids": [
                        "default"
                    ],
                    "registry": "*",
                    "repository": "*",
                    "whitelist_ids": [
                        "global"
                    ]
                },
                "matched_whitelisted_images_rule": false,
                "result": {
                    "c7460dfcab502275e9c842588df406444069c00a48d9a995619c243079a4c2f7": {
                        "result": {
                            "final_action": "warn",
                            "header": [
                                "Image_Id",
                                "Repo_Tag",
                                "Trigger_Id",
                                "Gate",
                                "Trigger",
                                "Check_Output",
                                "Gate_Action",
                                "Whitelisted",
                                "Policy_Id"
                            ],
                            "row_count": 17,
                            "rows": [
                                [
                                    "c7460dfcab502275e9c842588df406444069c00a48d9a995619c243079a4c2f7",
                                    "docker.io/nginx:1.17.7",
                                    "41cb7cdf04850e33a11f80c42bf660b3",
                                    "dockerfile",
                                    "instruction",
                                    "Dockerfile directive 'HEALTHCHECK' not found, matching condition 'not_exists' check",
                                    "warn",
                                    false,
                                    "default"
                                ],
                                [
                                    "c7460dfcab502275e9c842588df406444069c00a48d9a995619c243079a4c2f7",
                                    "docker.io/nginx:1.17.7",
                                    "1571e70ee221127984dcf585a56d4cff",
                                    "dockerfile",
                                    "instruction",
                                    "Dockerfile directive 'USER' not found, matching condition 'not_exists' check",
                                    "warn",
                                    false,
                                    "default"
                                ],
                                ...
                            ]
                        }
                    },
                    "policy_data": [],
                    "policy_name": "",
                    "whitelist_data": [],
                    "whitelist_names": []
                },
                "tag": "docker.io/nginx:1.17.7",
                "user_id": "tenant_1TqQxfrhMuzrTAkZ5X7smleHiRe"
            }
        },
        "last_evaluation": "2020-02-25T12:30:22Z",
        "policyId": "default",
        "status": "pass"
    }
}

There is a lot of information in there! But usually you can focus on the Status field at the root of the document, which the image scanner populates with one of the following values:

  • accepted
  • rejected
  • scan_failed
  • report_not_available

The value depends on the result of the scan and the applied policies. “scan_failed” indicates that an error prevented the scan from being performed, while “report_not_available” means that the scan has not finished yet. The InnerReport field holds many more details that can also be evaluated in the OPA rules for advanced use cases, such as policy information or the specific vulnerabilities or bad practices that were detected.

The admission controller will create a scan task if it doesn’t exist already, retrieve the image digest, and query the result of the scan for that specific image digest. Then, it will assign the AdmissionRequest and the ScanReport to the corresponding fields in the input object and evaluate the existing rules. It will deny the admission of the image if any deny_image[msg] rule evaluates to true (and it will show the msg in the log).

So, to put everything together, you could write a rule that allows only images accepted by the image scanner:

allow_image {
  input.ScanReport.Status == "accepted"
}
deny_image[msg] {
  not allow_image
  msg := sprintf("Denying images that are not accepted. Status: %s", [input.ScanReport.Status])
}

Or allow images that are either accepted, or in progress:

allow_image {
  input.ScanReport.Status == "accepted"
}
allow_image {
  input.ScanReport.Status == "report_not_available"
}
deny_image[msg] {
  not allow_image
  msg := sprintf("Denying images that are not accepted or in progress. Status: %s", [input.ScanReport.Status])
}
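
The same deny_image pattern can look inside the InnerReport. The sketch below assumes the report is exposed to the rules with the structure of the sample ScanReport shown earlier, where each entry in rows follows the column order of header (index 3 is Gate, index 5 is Check_Output). The check it picks is purely illustrative.

```rego
# Illustrative only: deny images whose report flags a missing USER
# directive in the Dockerfile, whatever the overall Status is.
deny_image[msg] {
    # Iterate over every per-image result and every row inside it.
    # Entries that have no "result.rows" path simply do not match.
    row := input.ScanReport.InnerReport.detail.result.result[_].result.rows[_]
    row[3] == "dockerfile"                  # Gate column
    contains(row[5], "'USER' not found")    # Check_Output column
    msg := sprintf("Dockerfile check triggered: %s", [row[5]])
}
```

When the report is not available yet, the InnerReport path is undefined and the rule does not fire, so pending scans still need to be handled through the Status field.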

If you want to include criteria from the AdmissionRequest, you can allow either images that are accepted by the image scanner policy or allow anything in “dev” namespace with:

allow_image {
  input.ScanReport.Status == "accepted"
}
allow_image {
  input.AdmissionRequest.object.metadata.namespace == "dev"
}
deny_image["Denying images otherwise"] {
  not allow_image
}

In the previous example we just used the object.metadata.namespace attribute of the AdmissionRequest.
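
Conditions inside one rule body are combined with AND, while several rules with the same name act as OR. As a sketch that builds only on the fields already used above, this variant tolerates a pending scan report in the “dev” namespace but nowhere else:

```rego
# Accepted images are allowed everywhere.
allow_image {
    input.ScanReport.Status == "accepted"
}

# A report that is still pending is tolerated only in "dev":
# both lines must hold for this rule to match.
allow_image {
    input.AdmissionRequest.object.metadata.namespace == "dev"
    input.ScanReport.Status == "report_not_available"
}

deny_image[msg] {
    not allow_image
    msg := sprintf("Image denied. Status: %s", [input.ScanReport.Status])
}
```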

An example of a complete AdmissionRequest object in JSON is:

{
  "uid": "6870143b-55da-40be-b42f-3fc64799bd5d",
  "kind": {
    "group": "",
    "version": "v1",
    "kind": "Pod"
  },
  "resource": {
    "group": "",
    "version": "v1",
    "resource": "pods"
  },
  "requestKind": {
    "group": "",
    "version": "v1",
    "kind": "Pod"
  },
  "requestResource": {
    "group": "",
    "version": "v1",
    "resource": "pods"
  },
  "name": "test-dep-76758659b5-4qq69",
  "namespace": "default",
  "operation": "CREATE",
  "userInfo": {
    "username": "system:serviceaccount:kube-system:replicaset-controller",
    "uid": "3745a732-c159-4a58-8a8e-61b07e27573b",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:kube-system",
      "system:authenticated"
    ]
  },
  "object": {
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
      "name": "test-dep-76758659b5-4qq69",
      "generateName": "test-dep-76758659b5-",
      "namespace": "default",
      "uid": "aac24f4d-d63f-4ab5-b95b-56cd57579071",
      "creationTimestamp": "2020-02-24T19:15:01Z",
      "labels": {
        "app": "test-dep",
        "pod-template-hash": "76758659b5"
      },
      "ownerReferences": [
        {
          "apiVersion": "apps/v1",
          "kind": "ReplicaSet",
          "name": "test-dep-76758659b5",
          "uid": "4c925465-d86a-447b-a6d0-e5a7cefba425",
          "controller": true,
          "blockOwnerDeletion": true
        }
      ]
    },
    "spec": {
      "volumes": [
        {
          "name": "default-token-pnrxp",
          "secret": {
            "secretName": "default-token-pnrxp"
          }
        }
      ],
      "containers": [
        {
          "name": "nginx",
          "image": "nginx:1.17.9",
          "resources": {},
          "volumeMounts": [
            {
              "name": "default-token-pnrxp",
              "readOnly": true,
              "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
            }
          ],
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File",
          "imagePullPolicy": "Always"
        }
      ],
      "restartPolicy": "Always",
      "terminationGracePeriodSeconds": 30,
      "dnsPolicy": "ClusterFirst",
      "serviceAccountName": "default",
      "serviceAccount": "default",
      "securityContext": {},
      "schedulerName": "default-scheduler",
      "tolerations": [
        {
          "key": "node.kubernetes.io/not-ready",
          "operator": "Exists",
          "effect": "NoExecute",
          "tolerationSeconds": 300
        },
        {
          "key": "node.kubernetes.io/unreachable",
          "operator": "Exists",
          "effect": "NoExecute",
          "tolerationSeconds": 300
        }
      ],
      "priority": 0,
      "enableServiceLinks": true
    },
    "status": {
      "phase": "Pending",
      "qosClass": "BestEffort"
    }
  },
  "oldObject": null,
  "dryRun": false,
  "options": {
    "kind": "CreateOptions",
    "apiVersion": "meta.k8s.io/v1"
  }
}

So, you can also check things like volumeMounts, restartPolicy, securityContext or similar attributes and define rules depending on these values.
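
For instance, a rule along these lines would reject pods that request privileged containers. It is a sketch that assumes the pod is available under input.AdmissionRequest.object exactly as in the JSON above. securityContext.privileged is the standard Kubernetes container field, and the message text is arbitrary.

```rego
deny_image[msg] {
    container := input.AdmissionRequest.object.spec.containers[_]
    # Undefined when securityContext or privileged is absent, so such
    # containers never match this rule.
    container.securityContext.privileged == true
    msg := sprintf("Container '%s' requests privileged mode", [container.name])
}
```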

Apart from ScanReport and AdmissionRequest fields, two additional fields are added to the input object:

  • PodObject: contains the same object that is inside the AdmissionRequest.object field, that is, the pod object specification.
  • ContainerObject: contains the container being evaluated in this step. As a pod can have multiple containers, they are evaluated individually.
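
A sketch of how the per-container field could be combined with the scan result follows. It assumes ContainerObject exposes the container's image under the same JSON key ("image") as in the AdmissionRequest above, which is not confirmed here. The “prod” namespace and the registry prefix are hypothetical values.

```rego
# Hypothetical registry prefix; adjust to your environment.
trusted_registry = "my-trusted-registry.com/"

image_from_trusted_registry {
    startswith(input.ContainerObject.image, trusted_registry)
}

# In "prod", the image must come from the trusted registry...
deny_image[msg] {
    input.AdmissionRequest.namespace == "prod"
    not image_from_trusted_registry
    msg := sprintf("Image is not from %s", [trusted_registry])
}

# ...and it must also have been accepted by the scanner.
deny_image[msg] {
    input.AdmissionRequest.namespace == "prod"
    input.ScanReport.Status != "accepted"
    msg := sprintf("Image not accepted in prod. Status: %s", [input.ScanReport.Status])
}
```

Because each container is evaluated on its own, a single container from another registry is enough to reject the whole pod in that namespace.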

Having the full Scan Report and all of the AdmissionReview information available for evaluation makes it possible to elaborate very advanced and powerful admission criteria.

Installing the integration

The easiest way to deploy this admission controller on your cluster is using the helm charts available in the repository. The admission controller is registered as a Kubernetes aggregated API server with mutual TLS authentication, and then it registers a Dynamic Admission Control webhook to intercept the creation or update of pod resources.

Just customize the settings in the values.yaml file, create the namespace and deploy using Helm 3:

$ kubectl create ns sysdig-image-scanner
$ helm install -n sysdig-image-scanner sysdig-image-scanner . 

After a few seconds, this chart will have completed all the required steps:

  • Create certificates for webhook service authentication.
  • Register the aggregated API Service.
  • Register the mutating admission controller webhook.
  • Create the required TLS certificates secret and Secure Token secret.
  • Create a service account for the webhook service pod(s).
  • Create roles and permissions that allow the service account to authenticate the API server, as well as permissions to delegate auth decisions to the Kubernetes core API server.
  • Create the webhook deployment and service.
  • Create a ConfigMap with a predefined set of rules to cover the most common use cases.

Customize the settings

The default settings in values.yaml will be fine for most cases, but at the very least, you need to provide:

  • sysdigSecureToken – the Sysdig Secure Token for your account
  • sysdigSecureApiUrl – if the default SaaS URL does not fit your environment (for example, if you are using the on-premises version of Sysdig Secure)

If you set the value verboseLog to true, the OPA engine will include additional information in the output logs, like the input data (AdmissionReview and ScanReport) and the rules being evaluated. This can help you debug issues with the rules by copying the information and testing it in the Rego Playground.

In the values.yaml, you will find a scanRules section where you can set default actions for evaluating the images and scan reports, as well as a customRules section:

scanRules:
  # If set to "true", a default set of rules will be generated from this YAML values.
  # Otherwise, no rules will be generated, and only "customRules" below will apply
  autoGenerate: true
  # Default admission policy to apply: [accept | reject | scan-result]
  defaultPolicy: scan-result
  # What should we do if the Scan Result is not yet available (scan in progress): [accept | reject]
  reportPending: reject
  # What should we do if the Scan has failed (wrong credentials, misconfiguration, etc.): [accept | reject]
  scanFailed: reject
  alwaysAccept: []
  # These 2 registries will always be rejected unless 
  alwaysReject:
    - "bad-registry.com/"
    - "malware-registry.io/"
  alwaysScanResult: []
  byNamespace: {}
  #  ns-dev:
  #    # By default, images will be accepted in this NS regardless of the scan result
  #    defaultPolicy: accept
  #  ns-prod:
  #    # All images rejected by default in this namespace
  #    defaultPolicy: reject
  #    # Images from "my-trusted-registry.com/" will be always accepted
  #    alwaysAccept:
  #      - "my-trusted-registry.com/"
  #  ns-playground:
  #    defaultPolicy: accept
  #    alwaysReject: []
# Define a set of custom rego rules. If scanRules.autoGenerate is true,
# these customRules are appended to the set of generated rules.
# Otherwise, these customRules are the only rules definition.
customRules: |
      ###### Begin: Custom rules ######
      my_example_rule {
          # Some conditions... 
          false
      }
      other_rule {
          # Some other conditions...
          true
      }
      deny_image["This is a custom deny message"] {
          my_example_rule
          other_rule
      }
      ###### End: Custom rules ######

The defaults (with scanRules.autoGenerate: true) will create a ConfigMap with a set of OPA rules that use the settings defined in this section. In this case, images will be denied until the scan report is available and the image scanner reports the image as “accepted”. You can always accept or always reject registries by their prefix (alwaysAccept and alwaysReject), and you can make exceptions per namespace. If a setting is not specified for a namespace, the default option defined at the top level is used.

You can add additional custom rules as a raw string in the customRules section. In case you want to customize the rules entirely you can set scanRules.autoGenerate to “false” and just write your own rules. You can see the default OPA ruleset in the repository.
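
As an illustration of a rule that could go in customRules, the sketch below rejects images referenced with the “latest” tag. It assumes ImageAndTag holds the image reference as requested by the pod, as in the sample ScanReport. The helper name uses_latest_tag is hypothetical.

```rego
# Hypothetical helper: true when the requested reference ends in ":latest".
uses_latest_tag {
    endswith(input.ScanReport.ImageAndTag, ":latest")
}

deny_image[msg] {
    uses_latest_tag
    msg := sprintf("Image '%s' uses the mutable 'latest' tag", [input.ScanReport.ImageAndTag])
}
```

Since deny_image collects the messages of every rule that matches, a rule like this only adds one more reason to deny an image. Note that an image reference with no tag at all also resolves to latest and is not caught by this check.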

Testing: examples in action

After deploying the image scanner with scanRules settings like this:

scanRules:
  autoGenerate: true
  defaultPolicy: scan-result
  reportPending: reject
  scanFailed: reject
  alwaysAccept: []
  alwaysReject: []
  alwaysScanResult: []
  byNamespace:
    dev:
      defaultPolicy: accept

We can try to create a deployment in the “default” namespace:

$ kubectl create deployment test-dep --image=airadier/test:bad
deployment.apps/test-dep created

But the pod creation will be rejected, because we have defined a policy in Sysdig Image Scanner that this image does not pass, so the “defaultPolicy: scan-result” setting applies. We can see the error details in the log of the image scanner pod:

I0402 11:56:31.291171       1 mutationhook.go:31] [mutation-server] mutating Pod admission request
I0402 11:56:31.291240       1 evaluate.go:28] [admission-server] Admission review 969345db-8e45-44ae-b373-24a190112580 - evaluating admission of pod '<Not yet generated>'
I0402 11:56:31.301618       1 admissionevaluatorimpl.go:35] Checking container 'test' image 'airadier/test:bad'
I0402 11:56:31.301822       1 client.go:149] [Anchore] Sending POST request to https://api.sysdigcloud.com/api/scanning/v1/anchore/images, with params map[tag:airadier/test:bad]
I0402 11:56:33.551378       1 client.go:131] [Anchore] Added image to Anchore Engine: airadier/test:bad
I0402 11:56:33.551627       1 client.go:149] [Anchore] Sending GET request to https://api.sysdigcloud.com/api/scanning/v1/anchore/images/sha256:b3787bd182d60ee3bd8d0bb53064e7eaa1073b817c31769dba3822895f9254d6/check?tag=airadier/test:bad&history=false&detail=true, with params map[]
I0402 11:56:34.093250       1 evaluate.go:38] [admission-server] Admission review 969345db-8e45-44ae-b373-24a190112580 - finished evaluating admission of pod 'test-dep-6f78bb8cc4-*'
I0402 11:56:34.093285       1 evaluate.go:46] [admission-server] Admission review 969345db-8e45-44ae-b373-24a190112580 - pod 'test-dep-6f78bb8cc4-*' rejected. Reasons:
image 'airadier/test:bad' for container 'test' failed policy check
Error: Image denied by OPA rules:
- Image rejected by scan-result
I0402 11:56:34.093297       1 mutationhook.go:63] [mutation-server] Patching container image: airadier/test:bad -> airadier/test@sha256:b3787bd182d60ee3bd8d0bb53064e7eaa1073b817c31769dba3822895f9254d6

However, if we try to deploy in the “dev” namespace, it will succeed as our rules define a defaultPolicy of accept for pods in “dev” namespace:

I0402 11:57:46.087798       1 evaluate.go:38] [admission-server] Admission review 7643d522-9055-4471-9358-de6993ad7e77 - finished evaluating admission of pod 'test-dep-6f78bb8cc4-*'
I0402 11:57:46.087816       1 evaluate.go:41] [admission-server] Admission review 7643d522-9055-4471-9358-de6993ad7e77 - pod 'test-dep-6f78bb8cc4-*' accepted
I0402 11:57:46.087825       1 mutationhook.go:63] [mutation-server] Patching container image: airadier/test:bad -> airadier/test@sha256:b3787bd182d60ee3bd8d0bb53064e7eaa1073b817c31769dba3822895f9254d6

Conclusions

With our OPA-based image scanner admission controller, we can ensure that every image that is deployed into the cluster is scanned and verified to not have serious vulnerabilities, even in the presence of tag mutations. We provide a powerful and flexible way of defining evaluation rules that go far beyond the scanner policies, providing a context enriched with the Admission Review specification to apply highly customized rules.

If you are not 100% sure that you can trust your registries and the images being deployed into the cluster (and probably nobody is), then you can prevent many security threats by scanning all of your images with Sysdig Secure during pod creation by installing our admission controller.

Once your images are running in the cluster, learn how to detect anomalous activity in your cluster using Falco in our featured webinar.
