Comparing GuardDuty & Falco on EKS

Security is usually the highest priority for cloud providers like AWS. With Amazon EKS, unlike bring-your-own vanilla Kubernetes instances, you benefit from a data center and network architecture that is built to meet the requirements of the most security-sensitive organizations. One of the best ways to build on that foundation is to use every security layer available to you. In this post, we explain how to use GuardDuty and Falco to speed up threat detection.

However, security is a shared responsibility between you and AWS. The shared responsibility model describes this as ‘security of the cloud’ and ‘security in the cloud.’

AWS is responsible for protecting the infrastructure that runs AWS services, like EKS, in the AWS Cloud. The customer’s responsibility includes areas such as:

The configuration of the nodes and the containers themselves.
Setting up and managing network controls, such as firewall rules.
Managing platform-level identity and access management, either with or in addition to IAM.

The sensitivity of your data, your company’s requirements, and applicable laws and regulations are also the customer’s responsibility under the shared responsibility model. You can address them either through managed services on AWS, like GuardDuty, or through open source security solutions, like Falco.

In this blog, we will detail the differences between GuardDuty and Falco on Amazon EKS to better understand what each tool can do for us, what threats are detected, and what kind of metadata is returned for further forensic analysis.

Falco Plugins to add runtime security in the cloud

Falco, the cloud-native runtime security project, is the de facto Kubernetes threat detection engine. Falco detects threats at runtime by observing the behavior of your applications and containers. Falco can extend threat detection across cloud environments (such as AWS) with Falco Plugins.

Unlike Falco, which can ingest syscalls from containers running in the cluster (to detect malicious activity on the container itself), GuardDuty currently only alerts on activity against the Kubernetes control plane of the cluster. Even saying “control plane” overstates its capability: GuardDuty does not offer complete threat detection on the control plane node(s) or any other special capability for the control plane.

We will focus on the following two Falco rules as scenarios where GuardDuty cannot provide visibility into container-specific events.

Container drift detection

Amazon EKS works with services like Amazon Elastic Container Registry (ECR) so that images can be scanned and vetted before they are deployed.

But how do we ensure changes are not made to a running container in production? Since GuardDuty does not collect syscalls from those running containers, and instead focuses on Kubernetes control plane activity, GuardDuty would need help to detect the case of container drift below, where chmod was used to make a new file executable within the container.

- rule: Container Drift Detected (chmod)
  desc: New executable created in a container due to chmod
  condition: >
    chmod and consider_all_chmods and container and not
    runc_writing_var_lib_docker and not user_known_container_drift_activities
    and evt.rawres>=0 and ((evt.arg.mode contains "S_IXUSR") or (evt.arg.mode
    contains "S_IXGRP") or (evt.arg.mode contains "S_IXOTH"))
  output: >-
    Drift detected (chmod), new executable created in a container
    (user.name=%user.name user.loginuid=%user.loginuid
    proc.cmdline=%proc.cmdline filename=%evt.arg.filename name=%evt.arg.name
    mode=%evt.arg.mode evt.res=%evt.res proc.pid=%proc.pid proc.cwd=%proc.cwd
    proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid
    proc.exepath=%proc.exepath user.uid=%user.uid user.loginname=%user.loginname
    group.gid=%group.gid group.name=%group.name container.id=%container.id
    container.name=%container.name evt.type=%evt.type)
  priority: error
  source: syscall
...

Contact EC2 instance metadata service from container

Another solid use case for syscall activity is the ability to see whether a running container is potentially performing data exfiltration.

By monitoring syscalls from a running container, we can see whether the container reaches out to the EC2 instance it is running on within the EKS cluster. An attacker can do this to probe the EC2 metadata service for more information about the instance, such as the tags assigned to the host, the IP address of the host, and which security groups control the Virtual Private Cloud (VPC) traffic for that instance.

- rule: Contact EC2 Instance Metadata Service From Container
  desc: >-
    Detect attempts to contact the EC2 Instance Metadata Service from a
    Container
  condition: >-
    outbound and fd.sip="169.254.169.254" and container and not
    ec2_metadata_containers
  output: >-
    Outbound connection to EC2 instance metadata service
    (proc.cmdline=%proc.cmdline connection=%fd.name %container.info
    evt.type=%evt.type evt.res=%evt.res proc.pid=%proc.pid proc.cwd=%proc.cwd
    proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid
    proc.exepath=%proc.exepath user.uid=%user.uid user.loginuid=%user.loginuid
    user.loginname=%user.loginname user.name=%user.name group.gid=%group.gid
    group.name=%group.name container.id=%container.id
    container.name=%container.name
    image=%container.image.repository:%container.image.tag)
  priority: notice
  source: syscall
...

For a full list of supported syscalls in Falco, check out our official Falco documentation.

Kubernetes Audit Events

Falco v0.13.0 added Kubernetes Audit Events to the list of supported event sources, in addition to the existing support for system call events. An improved implementation of audit events was introduced in Kubernetes v1.11, and it provides a log of requests and responses to kube-apiserver.

Because almost all of the cluster management tasks are performed through the API server, the audit log can effectively track the changes made to your cluster. Both Falco and GuardDuty utilize Kubernetes Audit Logging, and that’s why it’s important for us to compare their implementations.

To cover these scenarios, an additional set of Falco rules has been added to monitor for notable or suspicious activity, including:

Granting overly broad permissions, such as cluster-admin, to users.
Creating pods that are privileged, mount sensitive host paths, or use host networking.
Creating ConfigMaps with sensitive information.

Once your cluster is configured with audit logging and the events are selected to be sent to Falco, you can write Falco rules that can read these events and send notifications for suspicious or other notable activity.
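As a minimal sketch of the first item in the list above, the rule below flags the creation of a ClusterRoleBinding that references cluster-admin. It is modeled on the community Kubernetes audit ruleset and assumes that the kevt, clusterrolebinding, and kcreate macros from that ruleset are already loaded. Treat the ka.req.binding.* field names as assumptions to verify against the Falco version you run.

```yaml
# Sketch only: assumes the kevt, clusterrolebinding, and kcreate macros
# from the community k8s_audit ruleset are already loaded.
- rule: Attach to cluster-admin Role (sketch)
  desc: Detect creation of a ClusterRoleBinding that grants cluster-admin
  condition: >
    kevt and clusterrolebinding and kcreate
    and ka.req.binding.role=cluster-admin
  output: >-
    ClusterRoleBinding to cluster-admin created (user=%ka.user.name
    subject=%ka.req.binding.subjects)
  priority: warning
  source: k8s_audit
```

The rule only sees events that your audit policy forwards to Falco, so ClusterRoleBinding create requests need to be included in that policy.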

Full K8s administrative access (Falco)

To detect K8s operations by a username that has full admin permissions but is NOT an approved full admin user, Falco can categorize users into individual macros like allowed_full_admin_users. The following Falco rule is an example:

- rule: Full K8s Administrative Access
  desc: >-
    Detect any k8s operation by a user name that may be an administrator with
    full access.
  condition: >
    kevt and non_system_user and ka.user.name in (full_admin_k8s_users) and not
    allowed_full_admin_users
  output: >-
    K8s Operation performed by full admin user (user=%ka.user.name
    target=%ka.target.name/%ka.target.resource verb=%ka.verb uri=%ka.uri
    resp=%ka.response.code)
  priority: warning
  source: k8s_audit
...
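The list and macro that this rule references might be defined along the following lines. This is a sketch under stated assumptions: the user names are hypothetical placeholders, and approved_full_admin_users is a list name introduced here purely for illustration.

```yaml
# Hypothetical values: replace the user names with identities that
# actually exist in your cluster.
- list: full_admin_k8s_users
  items: ["admin", "kubernetes-admin"]

# Hypothetical helper list of admins that have been explicitly approved.
- list: approved_full_admin_users
  items: ["kubernetes-admin"]

# The rule above ends in "not allowed_full_admin_users", so any user
# matched by this macro is skipped.
- macro: allowed_full_admin_users
  condition: (ka.user.name in (approved_full_admin_users))
```

With these definitions, an operation by "admin" would raise the alert, while the same operation by "kubernetes-admin" would not.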

Where Falco gives users the flexibility to build their own detection ruleset, AWS GuardDuty has a baked-in system called ‘Finding Types.’ Can we write custom detections in GuardDuty? Unfortunately, no.

GuardDuty removes the heavy lifting and complexity of developing and maintaining your own custom rule set. That said, there is a long list of ‘Finding Types’ for Kubernetes Audit Logs.

On GuardDuty, if you want to alert on potential administrative access activity, you would rely on the pre-built Finding for privilege escalation:

PrivilegeEscalation:Kubernetes/PrivilegedContainer (GuardDuty)

This finding informs you that a privileged container was launched on your Kubernetes cluster using an image that has never been used before to launch privileged containers in your cluster. While there’s no pre-configured Falco rule for “privilege escalation” specifically, you can absolutely use Falco to build rules that detect this MITRE ATT&CK technique.

A privileged container has root-level access to the host. Adversaries can launch privileged containers as a privilege escalation tactic to gain access to and then compromise the host by escaping the container. By being able to detect all of the syscalls, as well as Kubernetes audit logs, we can get a clearer picture of how the privileged container was created:

- rule: Create Privileged Pod
  desc: |
    Detect an attempt to start a pod with a privileged container
  condition: >-
    kevt and pod and kcreate and ka.req.pod.containers.privileged intersects
    (true)
  output: >-
    Pod started with privileged container (user=%ka.user.name pod=%ka.resp.name
    ns=%ka.target.namespace images=%ka.req.pod.containers.image)
  priority: warning
  source: k8s_audit
...

One common method to get additional permissions at the node/host level is via setuid/setgid. The following Falco rule would detect changes to the SUID and SGID bits. The ownership of files also depends on the uid (user ID) and the gid (group ID) of the creator. Dig deeper into the article on Kernel parameters if you want to learn more.

Without the syscall insights, we wouldn’t be able to observe cases where attackers gave themselves permissions to perform privilege escalation on workloads in EKS.

- rule: Set Setuid or Setgid bit
  desc: >
    setuid or setgid bits are set for an application.
  condition: >
    consider_all_chmods and chmod and (evt.arg.mode contains "S_ISUID" or evt.arg.mode contains "S_ISGID")
    and not proc.name in (user_known_chmod_applications)
    and not exe_running_docker_save
    and not user_known_set_setuid_or_setgid_bit_conditions
  output: >
    Setuid or setgid bit is set via chmod (fd=%evt.arg.fd filename=%evt.arg.filename mode=%evt.arg.mode user=%user.name user_loginuid=%user.loginuid process=%proc.name
    command=%proc.cmdline container_id=%container.id container_name=%container.name image=%container.image.repository:%container.image.tag)
  priority: NOTICE
  tags: [process, mitre_persistence]
  source: syscall
...

Falco is not limited to syscalls or Kubernetes audit logs. The Falco libraries and Falco itself can be extended with plugins, which allow Falco to consume additional event sources such as AWS CloudTrail through the CloudTrail plugin.

AWS CloudTrail data events

The Falco cloudtrail plugin can read AWS CloudTrail logs and emit events for each CloudTrail log entry.

This plugin also includes out-of-the-box rules that can be used to identify interesting, suspicious, or notable events in CloudTrail logs, including:

Disabling encryption for S3 buckets
Disabling multi-factor authentication for users
Console logins that do not use multi-factor authentication
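As a minimal sketch of the second item in the list above, a rule for MFA deactivation could look like the following. The aws.* field names mirror the convention used by the other CloudTrail rules in this post and are an assumption here; the open source cloudtrail plugin may expose different field names, so check the plugin's field list before using it.

```yaml
# Sketch only: field names follow the aws.* convention used elsewhere in
# this post and may differ in the open source cloudtrail plugin.
- rule: Deactivate MFA for User (sketch)
  desc: Detect an MFA device being deactivated for an IAM user.
  condition: >-
    aws.eventName="DeactivateMFADevice" and not aws.errorCode exists
  output: >-
    MFA device deactivated (requesting user=%aws.user, requesting
    IP=%aws.sourceIP, AWS region=%aws.region)
  priority: warning
  source: aws_cloudtrail
```

The "not aws.errorCode exists" clause limits the alert to calls that succeeded, so failed attempts do not trigger it.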

Again, GuardDuty can access the same AWS CloudTrail events to understand these IAM activities. The below GuardDuty finding informs you that multiple successful console logins for the same IAM user were observed around the same time in various geographical locations.

Discover our article on cloud log management to learn more about how services manage security events.

UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B (GuardDuty)

Such anomalous and risky access location patterns indicate potential unauthorized access to your AWS resources. Although this finding does not focus on MFA the way the Falco rule does, it could certainly complement the Falco alerts listed above.

Console login without MFA (Falco)

For console logins made without Multi-Factor Authentication (MFA), the Falco rule would look something like this. Data is sourced from aws_cloudtrail, and the rule matches events where ConsoleLogin="Success" and MFAUsed="No".

- rule: Console Login Without MFA
  desc: Detects a console login without MFA.
  condition: >-
    aws.eventName="ConsoleLogin" and not aws.errorCode exists and
    jevt.value[/userIdentity/type]!="AssumedRole" and
    jevt.value[/responseElements/ConsoleLogin]="Success" and
    jevt.value[/additionalEventData/MFAUsed]="No"
  output: >-
    Detected a console login without MFA (requesting user=%aws.user, requesting
    IP=%aws.sourceIP, AWS region=%aws.region)
  priority: critical
  source: aws_cloudtrail
...

In addition, Falco is able to detect MFA fatigue or spamming attacks.

Policy:S3/AccountBlockPublicAccessDisabled (GuardDuty)

This finding informs you that Amazon S3 Block Public Access was disabled at the account level. When S3 Block Public Access settings are enabled, they are used to filter the policies or access control lists (ACLs) on buckets as a security measure to prevent inadvertent public exposure of data.

AWS S3 versioning disabled (Falco)

Similar to how GuardDuty detects when S3 Block Public Access is disabled, Falco rules can cover S3 bucket changes, such as S3 bucket versioning being disabled.

- rule: AWS S3 Versioning Disabled
  desc: Detect disabling of S3 bucket versioning.
  condition: >-
    aws.eventSource = "s3.amazonaws.com" and aws.eventName =
    "PutBucketVersioning" and
    jevt.value[/requestParameters/VersioningConfiguration/Status] = "Suspended"
    and not aws.errorCode exists
  output: >-
    The file versioning for a bucket has been disabled. (requesting
    user=%aws.user, requesting IP=%aws.sourceIP, AWS region=%aws.region,
    arn=%jevt.value[/userIdentity/arn], bucket
    name=%jevt.value[/requestParameters/bucketName])
  priority: warning
  source: aws_cloudtrail
...

Typically, S3 Block Public Access is turned off in an account to allow public access to a bucket or to the objects in the bucket. When S3 Block Public Access is disabled for an account, access to your buckets is controlled by the policies, ACLs, or bucket-level Block Public Access settings applied to your individual buckets.

This does not necessarily mean that the buckets are shared publicly, but that you should audit the permissions applied to the buckets to confirm that they provide the appropriate level of access. The same can be said for S3 bucket versioning. S3 versioning allows users to keep multiple versions of an object in one bucket. A rogue user might disable versioning to remove potential backups of S3 data in the case of a ransomware attack. This would be another clear Indicator of Compromise (IoC) for cloud tenants.

DNS logs

Domain Name System (DNS) activity is one of the more useful Indicators of Compromise (IoCs). By identifying connections sent to expected, unexpected, or unwanted domain names, we can quickly detect potential data exfiltration attempts to known bad C2 servers. For this reason, it’s important to protect DNS. Falco can be used to detect this type of activity.

- rule: Malicious IPs or domains detected on command line
  desc: >-
    Malicious commands detected in pod/host. The rule was triggered by an IP
    or domains in proc_cmdline
  condition: >
    evt.type = execve and evt.dir = < and (proc.name="curl" or proc.name="wget")
    and proc_args_with_malicious_domain_ip
  output: >-
    Malicious connections to IP or domains detected in pod or host.
    proc.cmdline=%proc.cmdline evt.type=%evt.type evt.res=%evt.res
    proc.pid=%proc.pid proc.cwd=%proc.cwd proc.ppid=%proc.ppid
    proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid proc.exepath=%proc.exepath
    user.uid=%user.uid user.loginuid=%user.loginuid
    user.loginname=%user.loginname user.name=%user.name group.gid=%group.gid
    group.name=%group.name container.id=%container.id
    container.name=%container.name %evt.args
  priority: warning
  tags:
  - ioc
  source: syscall
  exceptions: []
...

The above rule looks for curl or wget processes whose command-line arguments contain an IP address or domain name listed in the macro proc_args_with_malicious_domain_ip.

If we open the above macro, we can see a short snippet of the IP addresses and domains listed:

- macro: proc_args_with_malicious_domain_ip
  condition: (proc.args contains "pool.minexmr.com" or proc.args contains "pool.supportxmr.com" or
    proc.args contains "us1.ethermine.org" or proc.args contains "xmr-us-east1.nanopool.org" or
    proc.args contains "xmr-us-west1.nanopool.org" or proc.args contains "xmr.pool.minergate.com" or
    proc.args contains "51.15.67.17" or proc.args contains "142.44.242.100" or
    proc.args contains "104.140.244.186" or
    proc.args contains "46.105.31.147" or
    proc.args contains "us-west.minexmr.com" or
    proc.args contains "xmr.hashcity.org" or
    proc.args contains "pool.hashvault.pro" or
    proc.args contains "fastpool.xyz" or
    proc.args contains ...

As mentioned earlier, GuardDuty provides a number of ‘findings’ for network-related activity. These findings are tied to specific threat patterns such as Command & Control (C2 / C&C) server connections, Denial of Service (DoS), or Bitcoin mining tools. Examples can be seen below:

Backdoor:EC2/C&CActivity.B!DNS

This finding informs you that the listed instance within your AWS environment is querying a domain name associated with a known command and control (C&C) server. The listed instance might be compromised. Command and control servers are computers that issue commands to members of a botnet.

In Falco, we can similarly detect known C2 server IP addresses or domain names. We take the above Falco example for generic malicious IP/DNS connections from the CLI, and apply this logic to any connections (CLI or not) that go specifically to a C2 server list:

- rule: Outbound Connection to C2 Servers
  desc: Detect outbound connection to command & control servers
  condition: outbound and (fd.sip in (c2_server_ip_list) or fd.sip in (ti_c2_ip_list))
  output: >-
    Outbound connection to C2 server (dest=%fd.sip dport=%fd.sport
    dproto=%fd.sproto proc.cmdline=%proc.cmdline connection=%fd.name
    user.name=%user.name user.loginuid=%user.loginuid container.id=%container.id
    evt.type=%evt.type evt.res=%evt.res proc.pid=%proc.pid proc.cwd=%proc.cwd
    proc.ppid=%proc.ppid proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid
    proc.exepath=%proc.exepath user.uid=%user.uid user.loginname=%user.loginname
    group.gid=%group.gid group.name=%group.name container.name=%container.name
    image=%container.image.repository)
  priority: warning
  source: syscall
  append: false
...

The Falco approach adds some clear advantages.

In the above scenario, we are relying on syscalls, so we can determine whether these C2 connections are made at the container level. While GuardDuty can certainly alert on unwanted C2 outbound connections, Falco provides the granularity to tell us which container is generating the traffic and which network namespace it resides in. We can extract all the relevant metadata from the syscall event.

Backdoor:EC2/DenialOfService.Dns

This finding informs you that the listed EC2 instance within your AWS environment is generating a large volume of outbound DNS traffic. This may indicate that the listed instance is compromised and being used to perform DoS attacks using the DNS protocol.

We have covered this topic in depth in a separate blog post on detecting and preventing DoS attacks in Kubernetes using the open source tools Falco and Calico. There, we noted that most DoS incidents involve an attacker trying to maintain some form of anonymity. A common way to achieve this is by using Tor exit nodes. A similar Falco rule can be built with a macro that references a list of Tor exit node IP addresses.
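A minimal sketch of such a rule follows. It assumes the inbound macro from the default Falco ruleset is loaded, and it checks the client address (fd.cip) because traffic hidden behind Tor arrives from an exit node. The list name tor_exit_node_ips and its entries are hypothetical; the addresses are documentation-range placeholders, not real exit nodes.

```yaml
# Hypothetical list: placeholder addresses from the RFC 5737
# documentation ranges. Populate it from a feed you trust.
- list: tor_exit_node_ips
  items: ["192.0.2.10", "198.51.100.23"]

# Sketch only: assumes the "inbound" macro from the default ruleset.
- rule: Inbound Connection from Tor Exit Node (sketch)
  desc: Detect inbound connections whose client IP is on the Tor list
  condition: inbound and fd.cip in (tor_exit_node_ips)
  output: >-
    Inbound connection from Tor exit node (source=%fd.cip
    connection=%fd.name proc.cmdline=%proc.cmdline
    container.id=%container.id container.name=%container.name)
  priority: warning
  source: syscall
```

Tor exit node addresses change frequently, so a list like this is only useful if it is refreshed regularly.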

CryptoCurrency:EC2/BitcoinTool.B!DNS

This finding informs you that the listed EC2 instance in your AWS environment is querying a domain name that is associated with Bitcoin or other cryptocurrency-related activity. Bitcoin is a worldwide cryptocurrency and digital payment system that can be exchanged for other currencies, products, and services. Bitcoin is a reward for bitcoin-mining and is highly sought after by threat actors.

Again, we have discussed cryptocurrency detection at length in Falco. Miners typically connect to miner pools on common ports, like 3333, 4444, and 8333.

- rule: Detect outbound connections to common miner pool ports
  desc: >-
    Miners typically connect to miner pools on common ports
  condition: net_miner_pool and not trusted_images_query_miner_domain_dns
  output: >-
    Outbound connection to IP/Port flagged as mining activity (dest=%fd.sip
    proc.cmdline=%proc.cmdline port=%fd.sport domain=%fd.sip.name
    container=%container.info evt.type=%evt.type evt.res=%evt.res
    proc.pid=%proc.pid proc.cwd=%proc.cwd proc.ppid=%proc.ppid
    proc.pcmdline=%proc.pcmdline proc.sid=%proc.sid proc.exepath=%proc.exepath
    user.uid=%user.uid user.loginuid=%user.loginuid
    user.loginname=%user.loginname user.name=%user.name group.gid=%group.gid
    group.name=%group.name container.id=%container.id
    container.name=%container.name image=%container.image.repository)
  priority: critical
  source: syscall
  append: false
...

Inside the net_miner_pool macro, we can see two further lists:

miner_ports lists the port numbers commonly used for bitcoin mining
miners_ip lists the IP addresses associated with these bitcoin/cryptomining pools

- list: miner_ports
  items: [25, 80, 443, 3333, 3334, 3335, 3336, 3357, 4444,
    5555, 5556, 5588, 5730, 6099, 6666, 7777, 7778,
    8000, 8001, 8008, 8080, 8118, 8333, 8888, 8899,
    9332, 9999, 14433, 14444, 45560, 45700]

- list: miners_ip
  # Further entries omitted.
  items: ["51.15.39.52", "163.172.162.51", "51.15.89.69", "213.32.74.230",
    "213.32.74.219", "151.80.59.84", "51.15.39.186", "144.217.14.109",
    "192.99.69.170", "142.44.242.100", "142.44.243.6", "144.217.14.139",
    "128.199.55.158", "46.101.236.153", "18.184.127.10", "188.166.16.158",
    "18.184.174.79", "18.184.182.16", "3.68.113.29", "46.101.145.131"]
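One plausible way to combine the two lists is sketched below. The macro name net_miner_pool_sketch is hypothetical, and the shipped net_miner_pool macro may contain further conditions that are not shown in this post. The sketch assumes the outbound macro from the default ruleset is loaded.

```yaml
# Sketch only: not the shipped macro. Requires both a known pool IP and
# a known miner port before the condition matches.
- macro: net_miner_pool_sketch
  condition: >
    (outbound and fd.sport in (miner_ports) and fd.sip in (miners_ip))
```

Requiring both matches matters because miner_ports includes everyday ports such as 80 and 443, which would generate constant noise if matched on their own.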

Unlike with most of the GuardDuty ‘findings,’ you can add your own good/bad IP addresses to these feeds, but nothing more. This allows users to whitelist domains or IPs they don’t wish to get detections on. GuardDuty presents this as a “feature.” Falco takes the opposite approach: users can bring their own threat feeds and simply plug them into a YAML-formatted rule file.
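As a sketch of what plugging in a feed can look like, the fragment below appends entries to the c2_server_ip_list list used by the C2 rule shown earlier. The addresses are hypothetical documentation-range placeholders. The append key matches the style of the rules in this post; newer Falco releases offer an override key for the same purpose, so check the documentation for your version.

```yaml
# Hypothetical entries from your own threat feed. With append: true the
# items are added to the existing list rather than replacing it.
- list: c2_server_ip_list
  append: true
  items: ["203.0.113.7", "203.0.113.8"]
```

Because this fragment extends a list that must already exist, it belongs in a rules file that Falco loads after the file defining c2_server_ip_list.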

Bespoke operations

If your security team needs to generate security events from your own applications or third-party applications you use, GuardDuty won’t provide a solution at the time of writing.

How do you currently generate alerts from the dozens of open source tools running in your Kubernetes cluster? Many organizations ship their events to a SIEM tool for log aggregation and further forensics.

With Falco, we offer a plugin developer guide to build your own Falco plugins. If that’s something you or your developers are interested in building, check out the Falco community session with HashiCorp, where we build a HashiCorp plugin on the fly.

What happens with the money?

Unlike Falco, AWS GuardDuty is not free. For instance, GuardDuty continuously analyzes CloudTrail management events. At the time of writing, CloudTrail management event analysis is charged per 1 million events per month and is prorated.

Falco is a free, community-backed open source project. You can deploy Falco on a local machine, in the cloud, on a managed Kubernetes cluster, or on a Kubernetes distribution such as K3s running on IoT and edge devices. If you’re working on Linux systems, there are very few limitations on where Falco can run.

That said, managing any open source rule engine at scale can become time-consuming and make it difficult to identify blind spots. That’s why Sysdig Secure extends and scales the existing Falco rule engine by adding out-of-the-box workflows for security and compliance teams.

While GuardDuty charges based on the number of logs/events ingested, Sysdig Secure offers a simplified per-node licensing model. If you find that your organization is generating large bills due to the volume of security events triggered in your AWS environment, it might be worth considering Sysdig Secure.

Conclusion

If you are managing security solely for AWS and the services that run on AWS, like EKS, then you have a valid decision to make between Falco and GuardDuty. If you need to extend support to other infrastructure, non-cluster workloads, or a multi-cloud strategy, that’s where Falco becomes a clear winner: it can run on-prem and on standalone VMs, and it can collect cloud audit data from the likes of Google Cloud Platform, Microsoft Azure, and IBM Cloud.

From a management perspective, GuardDuty certainly provides a simplified approach to managing security incidents and response. In three clicks, you will have GuardDuty up and running. However, it comes with its own limitations. In general, you have to go with what GuardDuty offers. Whether it is S3, EC2, or EKS, you can only act on what GuardDuty provides out of the box. Its ‘findings’ are not really customizable. There is no policy language or other way for you, as the user, to mark an event as a violation or not a violation. You just have to go with GuardDuty’s opinion.

On the flip side, Falco’s customizable YAML rule language allows users to pull arbitrary data out of their events and build relevant alerts. There’s no limitation to what Falco can alert on within the constraints of syscalls, Kubernetes audit logs, and the CloudTrail logs discussed above.

Setup was rather simple for both Falco and GuardDuty. Arguably, it is a lot simpler in GuardDuty, where you follow the workflow below in a single standalone cloud environment:

Open the GuardDuty console at https://console.aws.amazon.com/guardduty/
Choose Get Started
Choose Enable GuardDuty
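For teams that prefer infrastructure as code over console clicks, the same enablement can be sketched as a CloudFormation template. This is a minimal sketch, assuming a single account and region; the logical ID GuardDutyDetector is a hypothetical name chosen for illustration.

```yaml
# Minimal sketch: creates and enables one GuardDuty detector in the
# region where the stack is deployed.
AWSTemplateFormatVersion: "2010-09-09"
Description: Enable Amazon GuardDuty in this account and region
Resources:
  GuardDutyDetector:  # hypothetical logical ID
    Type: AWS::GuardDuty::Detector
    Properties:
      Enable: true
```

GuardDuty is a regional service, so a template like this would need to be deployed in each region you want covered.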

Based on this blog post, you should be able to decide whether you wish to pursue using Falco alongside GuardDuty, or using Falco instead of GuardDuty. If the need is for a fully-customizable policy engine, then Falco will likely come out as the stronger option. However, if you wish to have a fully-managed security offering to take the burden off your security incident and event management teams, you might see GuardDuty as the safer option at first glance.

As a third option, you could use a fully managed implementation of Falco. Sysdig Secure offers a managed, enterprise solution built upon scalable open source projects like Falco, Sysdig, OPA, and Prometheus. Together, these make up a managed Cloud Detection & Response (CDR) platform. Users won’t need to worry about managing Falco instances. The graphical user interface shows the criticality of alerts and includes an intuitive policy builder.

If you’d like to know more about Sysdig’s CDR platform, you can try it out for free today.