In the world of cybersecurity, noise is a critical issue associated with Day 2 operations. The complex nature of noise and its impact on detection accuracy and false positives make it a challenging topic to address when creating detection rules, including in tools like Falco. This article will provide some guidelines on tuning Falco container security rules to eliminate noise.
The tension between detection accuracy and false positives is a constant challenge in the industry, and it’s often said that the only ruleset with no false positives is one with no rules at all. While completely avoiding false positives may be an unrealistic goal, there are guidelines that can be followed to minimize their impact and reduce noise.
Test and Validate
Before using a rule in production, make sure you test it extensively in as many environments as possible (different OS distributions, kernels, container engines, and orchestrators). A great example of this is the rule that detects suspicious outbound connections to the EC2 metadata service in AWS.
By default, this rule is disabled in Falco, and there’s a good reason for that! On AWS EC2 instances, 169.254.169.254 is a special IP used to fetch metadata about the instance. It may be desirable to prevent access to this IP from specific containers. However, there are legitimate cases where an operator pod may need to connect to the AWS EC2 metadata service.
- rule: Contact EC2 Instance Metadata Service From Container
  desc: Detect attempts to contact the EC2 Instance Metadata Service from a container
  condition: outbound and fd.sip="169.254.169.254" and container and not ec2_metadata_containers
  output: Outbound connection to EC2 instance metadata service (command=%proc.cmdline pid=%proc.pid connection=%fd.name %container.info image=%container.image.repository:%container.image.tag)
  priority: NOTICE
  enabled: false
  tags: [network, aws, container, mitre_discovery, T1565]
Regarding validation, Falco cannot inherently know which workloads need to communicate with the EC2 metadata service, and therefore has no idea what should be considered “suspicious.” The idea would be to enable this rule in a test environment, see what detections are generated, and then learn what needs to be excluded from future detections. This way, we can test and validate rules before blindly enabling them in large production environments, which goes a long way in helping to reduce noise.
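As a sketch of what that learning step might produce, the ec2_metadata_containers macro referenced in the rule's condition can be redefined in a local rules file that Falco loads after the default ruleset. The image name below is a hypothetical placeholder, and the sketch assumes the rule itself has been enabled separately.

```yaml
# Local rules file, assumed to be loaded after the default ruleset.
# Redefines the macro so that only the listed images may contact the
# metadata service without raising an alert.
# The image name is a hypothetical placeholder; replace it with the
# workloads observed during testing.
- macro: ec2_metadata_containers
  condition: (container.image.repository in (example.com/platform/aws-node-agent))
```

Any container whose image is not in the list still triggers the rule, so the allow-list can grow one validated workload at a time.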
Priority-based Filtering
Avoid deploying a rule for the first time with ERROR or CRITICAL as the priority. Start with DEBUG or INFO, see what happens, and raise the priority if the rule is not too noisy. Lower-priority rules can be easily filtered out at different stages of the output pipeline, so they don’t run the risk of waking up the security operations center team in the middle of the night.
- rule: reading sensitive file with incorrect priority
  desc: Detects when the file secret.env is read
  condition: evt.type = open and fd.name = /etc/secret.env
  output: "Reading of cryptographic symmetric key from environmental variable"
  priority: ERROR
  tags: [incorrect_priority, sensitive_file]
Every Falco rule has a priority which indicates how serious a violation of the rule is. This is similar to what we know as the severity of a syslog message. The priority is included in the message/JSON output/etc.
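One place where this filtering can happen is Falco's own configuration file. The excerpt below is a minimal sketch of the relevant falco.yaml settings; the notice threshold is only an example value, and in a test environment you would likely leave it at debug so that newly added low-priority rules still fire.

```yaml
# falco.yaml (excerpt)

# Minimum rule priority to load and run. Rules with a lower priority
# than this level are not evaluated at all.
priority: notice

# Emit alerts as JSON so that downstream tools can filter on the
# "priority" field of each event.
json_output: true
```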
The general guidelines used to assign priorities to rules should be:
- If a rule is related to writing state (e.g., filesystem, etc.), its priority is ERROR.
- If a rule is related to an unauthorized read of state (e.g., reading sensitive files, etc.), its priority is WARNING.
- If a rule is related to unexpected behavior (spawning an unexpected shell in a container, opening an unexpected network connection, etc.), its priority is NOTICE.
- If a rule is related to behaving against good practices (unexpected privileged containers, containers with sensitive mounts, running interactive commands as root), its priority is INFO.
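Applying these guidelines to the earlier example: reading /etc/secret.env is an unauthorized read of state, so WARNING fits better than ERROR. The sketch below shows one way the rule might be adjusted; the rule name, the tag, and the expanded output string are illustrative choices, not part of the default ruleset.

```yaml
# Illustrative rewrite of the earlier example rule.
- rule: Read of secret.env
  desc: Detects when the file /etc/secret.env is opened
  condition: evt.type = open and fd.name = /etc/secret.env
  output: "Sensitive file opened (file=%fd.name command=%proc.cmdline user=%user.name)"
  # Unauthorized read of state maps to WARNING under the guidelines above.
  priority: WARNING
  tags: [sensitive_file]
```

When deploying the rule for the first time, you might still start at INFO and raise it to WARNING once its noise level is understood.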
Leverage Tags
The tags that you assign to your rules are included in Falco’s gRPC and JSON outputs. This means that you can use them to complement priorities and filter Falco’s outputs in an even more flexible way. A good example is using a tag for the appropriate team who should handle the relevant alert notifications.
- rule: Detect outbound connections to common miner pool ports
  desc: Miners typically connect to miner pools on common ports.
  condition: net_miner_pool and not trusted_images_query_miner_domain_dns
  enabled: false
  output: Outbound connection to IP/Port flagged by https://cryptoioc.ch (command=%proc.cmdline pid=%proc.pid port=%fd.rport ip=%fd.rip container=%container.info image=%container.image.repository)
  priority: CRITICAL
  tags: [host, container, network, mitre_execution, T1496]
A Security Operations Center (SOC) team may not need to see every alert notification. In the case of cryptojacking, the SOC team might prefer to know when a crypto-mining binary was installed or started, so that they can remove the miner from the environment. The SOC team might not, however, have control over network activity in containers and Kubernetes.
Instead, it might make sense for the network engineers to receive notifications related to network activity. In the case of the above Falco rule, a network team would see which IP address, Fully-Qualified Domain Name (FQDN), and/or port number the container performed egress traffic to. The network team can then apply a Network Policy on the namespace associated with that pod or container to block the connection.
Correct tagging does two things: (1) it sends the most important alerts to the team best placed to act on them, and (2) it reduces noise for each team by routing only the alerts they can act on, rather than routing all alerts to all teams.
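To illustrate, a team-specific tag can sit alongside the existing ones. The sketch below redefines the miner pool rule with an extra team_network tag. That tag name is a hypothetical convention rather than anything Falco defines, and the routing itself would be implemented in whatever consumes Falco's JSON or gRPC output.

```yaml
# Local redefinition of the rule with an added routing tag.
- rule: Detect outbound connections to common miner pool ports
  desc: Miners typically connect to miner pools on common ports.
  condition: net_miner_pool and not trusted_images_query_miner_domain_dns
  # Enabled here on the assumption that the rule has already been validated.
  enabled: true
  output: Outbound connection to IP/Port flagged by https://cryptoioc.ch (command=%proc.cmdline pid=%proc.pid port=%fd.rport ip=%fd.rip container=%container.info image=%container.image.repository)
  priority: CRITICAL
  # team_network is a hypothetical tag that an alert router could match on.
  tags: [host, container, network, mitre_execution, T1496, team_network]
```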
Different Rules for Different Infrastructure
You’ll inevitably need to write different rules for different infrastructure, such as staging versus production environments, due to the inherent differences and specific requirements of each context. Staging environments often serve as testing grounds for new features and updates, where developers can freely experiment and identify potential issues. In this case, Falco rules can be more permissive to avoid hindering development velocity, allowing for quicker iteration and feedback cycles.
- rule: Disallowed SSH Connection
  desc: Detect any new ssh connection to a host other than those in an allowed group of hosts
  condition: (inbound_outbound) and ssh_port and not allowed_ssh_hosts
  enabled: false
  output: Disallowed SSH Connection (command=%proc.cmdline pid=%proc.pid connection=%fd.name user=%user.name user_loginuid=%user.loginuid container_id=%container.id image=%container.image.repository)
  priority: NOTICE
  tags: [host, container, network, mitre_cc, mitre_lateral_movement, T1021.004]
In the above staging Falco rules file, there isn’t any way to know the specific production hosts for which SSH access is allowed, so the macro below just repeats ssh_port, which effectively allows SSH from all hosts.
- macro: ssh_port
  condition: fd.sport=22
- macro: allowed_ssh_hosts
  condition: ssh_port
In Day 2 operations, you will most likely need to override this macro to enumerate the servers for which SSH connections are allowed. For example, you might have an SSH gateway host for which SSH connections are allowed. The condition would look something like:
- macro: allowed_ssh_hosts
  condition: (fd.sip="a.b.c.d" or fd.sip="e.f.g.h")
Production environments require a higher level of security and stability, so Falco rules there should be more stringent in order to detect and prevent malicious or unauthorized activity. That’s why we modify the macros associated with each environment: the rules stay much the same from staging to production, while the macros are tailored to each one.
These rules and supporting macros are more of an example of how to use the fd.*ip and fd.*ip.name fields to match connection information against IPs, netmasks, and complete domain names. To use the aforementioned Falco rule, you should enable it and populate allowed_{source,destination}_{ipaddrs,networks,domains} with the values that make sense for your environment.
- list: allowed_outbound_destination_ipaddrs
  items: ['"127.0.0.1"', '"8.8.8.8"']
- list: allowed_outbound_destination_networks
  items: ['"127.0.0.1/8"']
- list: allowed_outbound_destination_domains
  items: [google.com, www.yahoo.com]
Therefore, tailored Falco rules are necessary to account for the unique characteristics and potential risks associated with each environment, ensuring effective monitoring and protection.
Plan for Upgrades
Falco container security users need to carefully consider various aspects when it comes to upgrades, particularly from a “Day 2” operations perspective.
Firstly, using Helm is the safest way to upgrade automatically and roll back easily. Falco’s Helm chart adds Falco to all nodes in your Kubernetes cluster using a DaemonSet. Each deployed Falco pod then tries to install the driver on its own node. This is the default configuration for syscall instrumentation. Using Helm is quick and reliable: if anything goes wrong between versions, you can roll back to a previous version in seconds and avoid potential downtime in Day 2 operations.
Secondly, Falcoctl is provided as an out-of-the-box solution to manage the lifecycle of rules (installation, updates). As the name suggests, Falcoctl is a CLI tool that can perform several useful tasks for Falco admins, one of which is helping container security teams smoothly install the relevant Falco plugins for event handling from different sources (GitHub Audit Logging Services, AWS CloudTrail, Kubernetes Audit Logs, etc.).
Falcoctl can automatically pull rules from a personal repository or from shared community repositories, with no associated downtime. Using known CI/CD techniques, we can pack the latest rules in a distributable object. Whether you prefer to stick to a stable version or plan on being flexible with multiple versions, combining the power of Git with the standards of OCI lets Falco selectively retrieve the most suitable rules for each platform. Furthermore, Falcoctl can run as a daemon that periodically checks the artifact repositories and automatically installs new versions.
indexes:
- name: falcosecurity
  url: https://falcosecurity.github.io/falcoctl/index.yaml
artifact:
  install:
    refs:
    - k8saudit:0.5.0
  follow:
    every: 6h0m0s
    falcoVersions: http://localhost:8765/versions
    refs:
    - k8saudit-rules:0.5
The configuration of this behavior is also visible in /etc/falcoctl/falcoctl.yaml.
Falco Container Security Performance Tuning
Performance is another important topic to consider when writing and deploying rules, because Falco typically operates with high-frequency data sources. When you are using Falco with a system call source, such as the kernel module or the eBPF probe, your whole ruleset might be evaluated millions of times per second. At such frequencies, rule performance is key.
Having a tight ruleset is definitely a good practice to keep Falco’s CPU utilization under control. It is also important, however, to make sure every new rule you create is optimized for performance. The overhead of your rule is more or less proportional to the number of field comparisons that the rule’s condition needs to perform for every input event. Therefore, you should expect that a simple condition like this:
proc.name=p1
requires far less CPU than a more complex condition that uses intersects, like the one seen below:
- macro: mount_info
  condition: (proc.args="" or proc.args intersects ("-V", "-l", "-h"))
Therefore, optimizing a rule is all about making sure that, in most common situations, it requires the Falco engine to perform the smallest possible number of comparisons. In order to reduce the CPU overhead associated with these rules, we would recommend the below considerations:
- Rules should always start with event type checks
  Falco understands when your rule is restricted to only some event types, and therefore evaluates the rule only when it receives a matching event. For example, if your rule starts with evt.type=open, Falco won’t even start evaluating it for any event that is not an open system call. Pay attention to warnings about rules that lack an event type check, so that such rules are fixed before they reach production.
- Falco conditions work like ‘if’ statements in software programming
  Falco rules are evaluated left to right until something fails. The sooner the condition fails, the less work the evaluation requires. Try to find simple ways to restrict the scope of your rule.
- Push heavy, complex logic to the right
  Start with the restrictive comparisons described in the previous point, which have a high probability of failing early, and only then add heavy, complex logic. Long exception lists are an example of complex logic that belongs at the end of the rule.
- Use multiple value operators instead of multiple comparisons
  Value operators include in and pmatch. Writing multiple comparisons would look something like evt.type=mkdir or evt.type=mkdirat. It’s better for performance to use a value operator: evt.type in (mkdir, mkdirat).
- Keep rules as small as possible
  This doesn’t just speed up processing of your rules; from a Day 2 operations perspective, it also ensures they are readable and maintainable.
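The recommendations above can be combined in a single condition. The following is a minimal sketch rather than a rule from the default ruleset: the rule, the list, and the program names in it are hypothetical, and the ordering of the checks is the point of the example.

```yaml
# Hypothetical list of programs that are expected to create
# directories below /etc.
- list: expected_etc_mkdir_programs
  items: [config-agent, installer]

# Hypothetical rule. The condition is ordered from cheapest and most
# selective to most expensive:
#   1. event type check first, written with a value operator
#   2. simple comparisons that fail quickly for most events
#   3. the exception list last
- rule: Directory Created Below Etc In Container
  desc: Detect a directory being created below /etc inside a container
  condition: >
    evt.type in (mkdir, mkdirat)
    and container.id != host
    and evt.arg.path startswith /etc/
    and not proc.name in (expected_etc_mkdir_programs)
  output: "Directory created below /etc (path=%evt.arg.path command=%proc.cmdline container_id=%container.id)"
  priority: INFO
  tags: [container, filesystem]
```

Because the event type check comes first, events other than mkdir and mkdirat never reach the remaining comparisons.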
Plan for Inevitable Exceptions
Good rules are designed to account for known and unknown exceptions in a way that is readable, modular, and can easily be extended. Take a look, for example, at the Write below rpm database rule from the default ruleset:
- rule: Write below rpm database
  desc: an attempt to write to the rpm database by any non-rpm related program
  condition: >
    fd.name startswith /var/lib/rpm and open_write
    and not rpm_procs
    and not ansible_running_python
    and not python_running_chef
    and not exe_running_docker_save
    and not amazon_linux_running_python_yum
    and not user_known_write_rpm_database_activities
  output: "Rpm database opened for writing by a non-rpm program (command=%proc.cmdline pid=%proc.pid file=%fd.name parent=%proc.pname pcmdline=%proc.pcmdline container_id=%container.id image=%container.image.repository)"
  priority: ERROR
  tags: [host, container, filesystem, software_mgmt, mitre_persistence, T1072]
Note how known exceptions are included in the rule as macros (rpm_procs, ansible_running_python, etc.), but the rule also includes a macro (user_known_write_rpm_database_activities) that lets the user add their own exceptions through the override mechanism.
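As a sketch of that override mechanism, the user-facing macro can be redefined in a local rules file that Falco loads after the default ruleset. The process name below is a hypothetical in-house tool, and the sketch assumes the default definition of the macro is an empty placeholder meant to be replaced.

```yaml
# Local rules file, assumed to be loaded after the default ruleset.
# Exempts one hypothetical in-house tool from the
# "Write below rpm database" rule without editing the rule itself.
- macro: user_known_write_rpm_database_activities
  condition: (proc.name = "inventory-tool")
```

Keeping the exception in a separate macro means the default rule can be upgraded without losing local changes.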
Conclusion
In conclusion, Falco offers a runtime security tool that is well-designed to address common Day 2 operations issues. By providing a rule-based engine, Falco allows security teams to define and tune security policies to detect and respond to real-time threats in a dynamic cloud-native environment.
Falco’s priority-based filtering helps security teams to distinguish between serious security violations and less critical ones, reducing alert fatigue and enabling them to focus on the most important issues. Leveraging tags further reduces noise within network and security teams, helping them to easily identify and prioritize relevant alerts.
Testing and validating the rules before deployment is also critical, ensuring that they are effective and aligned with organizational security policies. Finally, applying exceptions to your rules is necessary! Since not all environments were built equally, we need to allow for exceptions to be made based on the unique characteristics of your environment.